Regards, I am analyzing some bacterial strains in which I am sure there are RGPs an

How can the search parameters be modified? (ppanggolin issue, closed)

labgem commented on June 19, 2024

How can the search parameters be modified?

Comments (4)

axbazin commented on June 19, 2024

Yes, absolutely.
Your previous analysis should have generated a pangenome.h5 file, which you can reuse to rerun parts of the workflow.
The simplest option is to change the minimum length required for an RGP to be predicted. The default is 3000 bp. If you want the predicted RGPs to be at least 5000 bp long, for example, you can use the '--min_length' option with this pangenome.h5 file:

ppanggolin rgp -p pangenome.h5 --min_length 5000
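To make the effect of '--min_length' concrete, here is a minimal Python sketch of a length filter. The Region class, its 1-based inclusive coordinates, and filter_by_length are hypothetical illustrations, not ppanggolin's API; when you run ppanggolin rgp, the tool applies this kind of filter for you.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical record: 1-based, inclusive coordinates on a contig.
    name: str
    start: int
    stop: int

    @property
    def length(self) -> int:
        return self.stop - self.start + 1


def filter_by_length(regions, min_length=3000):
    """Keep only regions at least min_length bp long (3000 bp is the default mentioned above)."""
    return [r for r in regions if r.length >= min_length]


candidates = [Region("rgp_1", 1, 3500), Region("rgp_2", 10_001, 16_000)]
print([r.name for r in filter_by_length(candidates)])        # ['rgp_1', 'rgp_2']
print([r.name for r in filter_by_length(candidates, 5000)])  # ['rgp_2'], like --min_length 5000
```

A 3500 bp region survives the default threshold but is discarded at 5000 bp, while the 6000 bp region passes both.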

Another possibility is to modify the minimum score threshold. The default threshold is 4, which roughly means that, with the other parameters at their defaults, you need at least 4 shell or cloud genes close together to get an RGP.

If this is not strict enough and you only want regions with many more genes, you can raise this threshold:

ppanggolin rgp -p pangenome.h5 --min_score 8

This will set the threshold to 8 instead of the default 4.
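The arithmetic behind "at least 4 shell or cloud genes close together" can be sketched in Python, assuming the scoring described in section 2.1.1 of the preprint linked later in this thread: each shell or cloud gene adds the variable gain (default 1), each persistent gene subtracts the persistent penalty (default 3), and the running score never drops below 0. This is a simplified illustration, not ppanggolin's code; it ignores refinements such as the handling of multi-copy families. The 'P'/'S'/'C' encoding and running_scores are hypothetical.

```python
def running_scores(partitions, persistent_penalty=3, variable_gain=1):
    """Cumulative score along a contig, floored at 0.

    partitions: one letter per gene in genomic order,
    'P' = persistent, 'S' = shell, 'C' = cloud (hypothetical encoding).
    """
    scores = []
    current = 0
    for part in partitions:
        # Persistent genes are penalised, shell/cloud genes are rewarded.
        step = -persistent_penalty if part == "P" else variable_gain
        current = max(0, current + step)
        scores.append(current)
    return scores


print(running_scores("PPSSCCPP"))  # [0, 0, 1, 2, 3, 4, 1, 0]
print(running_scores("SSPSS"))     # [1, 2, 0, 1, 2]
```

Four consecutive shell/cloud genes reach a peak of 4, enough for the default --min_score 4 but not for --min_score 8, while two shell genes on either side of a single persistent gene never get past 2.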

There are other parameters, but they are less straightforward to explain. You can list them all by running ppanggolin rgp -h.

Afterward, you can regenerate the 'plastic_regions.tsv' file by running:

ppanggolin write -p pangenome.h5 --regions --output MyNewRegionsOutputDir

If you do start tweaking the parameters, you might find the following command useful:

ppanggolin info -p pangenome.h5 --parameters

This lists the parameters used to compute the results currently stored in the .h5 file, for every step of the analysis.

axbazin commented on June 19, 2024

Hello

Taken on their own, these two parameters have roughly opposite effects.

The default persistent penalty is 3. Decreasing it may fuse two RGPs that are close together along the genome but separated by a few persistent genes. Increasing it may split an RGP into several parts if it contains persistent genes.

The default variable gain is 1. Increasing it may fuse two RGPs that are close together along the genome, while decreasing it may split an RGP into several parts if it contains persistent genes.
Both parameters also affect the score of the predicted RGPs.
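One way to see how the two parameters interact is a simplified Python sketch of region extraction. It assumes the scoring from section 2.1.1 of the preprint cited below (shell or cloud genes add variable_gain, persistent genes subtract persistent_penalty, and the running score is floored at 0) and repeatedly takes the highest-scoring stretch that reaches min_score, marking its genes as used. find_rgps and the 'P'/'S'/'C' encoding are hypothetical; this is not ppanggolin's implementation, which handles further cases (multi-copy families, contig borders, region length).

```python
def find_rgps(partitions, persistent_penalty=3, variable_gain=1, min_score=4):
    """Return (first, last) gene indices of predicted regions, in genomic order.

    partitions: one letter per gene along a contig,
    'P' = persistent, 'S' = shell, 'C' = cloud (hypothetical encoding).
    """
    n = len(partitions)
    if n == 0:
        return []
    weights = [-persistent_penalty if p == "P" else variable_gain for p in partitions]
    used = [False] * n
    regions = []
    while True:
        # Running score floored at 0; genes already assigned to a region reset it.
        running, current = [], 0
        for i, w in enumerate(weights):
            current = 0 if used[i] else max(0, current + w)
            running.append(current)
        # Highest-scoring position (first one on ties).
        best = max(range(n), key=running.__getitem__)
        if running[best] <= 0 or running[best] < min_score:
            break
        # The region starts right after the last position whose score was 0.
        start = best
        while start > 0 and running[start - 1] > 0:
            start -= 1
        for i in range(start, best + 1):
            used[i] = True
        regions.append((start, best))
    return sorted(regions)


print(find_rgps("SSSSPSSSS"))                        # [(0, 8)]
print(find_rgps("SSSSPSSSS", persistent_penalty=5))  # [(0, 3), (5, 8)]
print(find_rgps("SSPSS"))                            # []
print(find_rgps("SSPSS", variable_gain=2))           # [(0, 4)]
```

With the defaults, a single persistent gene between two runs of four shell genes is absorbed into one region; raising the penalty to 5 splits it in two, and raising the gain to 2 turns a stretch that scored too low into a region that spans its persistent gene.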

In any case, having persistent genes in the middle of RGPs is relatively rare, so modifying these parameters slightly should not have much impact, while changing them drastically may no longer give biologically meaningful results, as you may end up grouping RGPs together across long stretches of persistent genes.

If you want to understand in more detail how all of these parameters interact, the full method is described in section "2.1 - panRGP method" of this preprint: https://www.biorxiv.org/content/10.1101/2020.03.26.007484v1.full

In section 2.1.1, parameter p in the formula corresponds to the persistent penalty and parameter v to the variable gain.
In section 2.1.2, parameter s_min is "min_score" and l_min is "min_length", the two parameters I mentioned previously.
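For reference, the scoring these four symbols plug into can be written as below. This is a paraphrase based on the descriptions in this thread, not a copy of the preprint's equations, whose notation may differ; g_i denotes the i-th gene along a contig and S_i its running score.

```latex
w_i =
\begin{cases}
  v  & \text{if } g_i \text{ is a shell or cloud gene} \\
  -p & \text{if } g_i \text{ is a persistent gene}
\end{cases}
\qquad S_0 = 0, \qquad S_i = \max\left(0,\; S_{i-1} + w_i\right)

\text{a candidate region ending at } g_j \text{ is kept if } S_j \ge s_{\min}
\text{ and its length in bp is at least } l_{\min}
```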

Only sections 2.1.1 and 2.1.2 are relevant for understanding how RGPs are predicted.

If something is unclear, do not hesitate to ask more questions :)

mikkelbregovic commented on June 19, 2024

Hello!
Could you briefly explain the options

--persistent_penalty

--variable_gain

If I increase or decrease these values, what should I expect?

Thanks for taking the time to answer these basic things.
Really appreciated

axbazin commented on June 19, 2024

Since this is from May and there have been no other questions since, I will close this issue. If you have any other questions, please do not hesitate to reopen it.
