

axbazin commented on June 19, 2024

Yes absolutely.
Your previous analysis should have generated a pangenome.h5 file, which you can reuse to rerun parts of the workflow.
The simplest option is to change the minimum length required for an RGP to be predicted. The default is 3000 bp. If you want the predicted RGPs to be at least 5000 bp long, for example, you can use the '--min_length' option with this pangenome.h5 file, like this:

ppanggolin rgp -p pangenome.h5 --min_length 5000

Another possibility is to modify the minimum score threshold. The default threshold is 4, which roughly means that, with the other parameters at their defaults, you need at least 4 shell or cloud genes close together to get an RGP.

If you feel this is not strict enough and only want regions with many more genes, you can raise this threshold, like this:

ppanggolin rgp -p pangenome.h5 --min_score 8

This will set the threshold to 8 instead of the default 4.
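To make the "4 shell or cloud genes close together" rule concrete, here is a minimal bash sketch of a running score. It is a simplified reading of the panRGP scoring, assumed for illustration only (it ignores contig borders, duplicated gene families and the length filter), and the function names are hypothetical, not part of ppanggolin. Each shell or cloud gene adds the variable gain (default 1), each persistent gene subtracts the persistent penalty (default 3), and the score never drops below zero.

```bash
#!/usr/bin/env bash
# Simplified sketch of a panRGP-style running score (assumption for
# illustration: ignores contig borders, duplicated families and --min_length).
# Genes are given in genome order as a string of letters:
#   P = persistent, S = shell, C = cloud.

# Print the running score after each gene.
# Usage: rgp_scores GENES [persistent_penalty] [variable_gain]
rgp_scores() {
  local genes=$1 p=${2:-3} v=${3:-1}
  local score=0 i
  local -a out=()
  for (( i = 0; i < ${#genes}; i++ )); do
    if [[ ${genes:i:1} == P ]]; then
      score=$(( score - p ))   # persistent gene: subtract the penalty
    else
      score=$(( score + v ))   # shell or cloud gene: add the gain
    fi
    if (( score < 0 )); then
      score=0                  # the score is floored at zero
    fi
    out+=("$score")
  done
  echo "${out[*]}"
}

# Succeed if the best running score reaches MIN_SCORE.
# Usage: reaches_min_score GENES MIN_SCORE [persistent_penalty] [variable_gain]
reaches_min_score() {
  local best=0 s
  for s in $(rgp_scores "$1" "${3:-3}" "${4:-1}"); do
    if (( s > best )); then best=$s; fi
  done
  (( best >= $2 ))
}

rgp_scores PPSSSSPP   # prints: 0 0 1 2 3 4 1 0
```

With these defaults, four consecutive shell/cloud genes reach a score of 4, while a threshold of 8 requires eight of them in a row (or more, if persistent genes interrupt the stretch).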

There are other parameters, but they are less straightforward to explain. You can see them all by running ppanggolin rgp -h.

Afterward, you can regenerate the 'plastic_regions.tsv' file by running:

ppanggolin write -p pangenome.h5 --regions --output MyNewRegionsOutputDir

If you do start tweaking the parameters, you might find the following command useful:

ppanggolin info -p pangenome.h5 --parameters
which will list the parameters used to compute the results currently stored in the .h5 file for all the steps of the analysis.
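If you want to compare several thresholds side by side, one option is to run each setting on its own copy of pangenome.h5, so the original results stay intact. The bash function below is a hypothetical helper (not part of ppanggolin) that only chains the rgp, write and info commands shown above; the file and directory names are arbitrary choices. Depending on your version, ppanggolin may refuse to overwrite existing RGPs or an existing output directory, so check ppanggolin rgp -h and ppanggolin write -h if a step fails. Each copy also takes as much disk space as the original file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ppanggolin): rerun RGP prediction with
# several --min_length values, each on its own copy of the pangenome file,
# so the original file and its results stay untouched.
# Set PPANGGOLIN=echo to print the commands instead of running them.
#
# Usage: sweep_min_length PANGENOME.h5 LENGTH [LENGTH ...]
sweep_min_length() {
  local src=$1 len copy outdir
  shift
  local ppg=${PPANGGOLIN:-ppanggolin}
  for len in "$@"; do
    copy="pangenome_minlen${len}.h5"
    outdir="regions_minlen${len}"
    cp -- "$src" "$copy" || return 1
    # Predict RGPs of at least $len bp on the copy.
    "$ppg" rgp -p "$copy" --min_length "$len" || return 1
    # Regenerate plastic_regions.tsv in a separate directory per value.
    "$ppg" write -p "$copy" --regions --output "$outdir" || return 1
    # Keep a record of the parameters stored in this copy.
    "$ppg" info -p "$copy" --parameters > "${outdir}.parameters.txt" || return 1
  done
}

# Example: sweep_min_length pangenome.h5 3000 5000 8000
```

Working on copies trades disk space for safety: every setting keeps its own .h5 file, so you can inspect or rerun any of them later without recomputing the earlier steps.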

from ppanggolin.

axbazin commented on June 19, 2024

Hello

Taken alone, these two parameters work in roughly opposite directions.

The default persistent penalty is 3. Decreasing it might fuse two RGPs that are close together along the genome but separated by some persistent genes. Increasing it might split an RGP into multiple components if it contains persistent genes.

The default variable gain is 1. Increasing it might fuse two RGPs that are close together along the genome, while decreasing it might split an RGP into multiple components if it contains persistent genes.
Both parameters also affect the score of the predicted RGPs.
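The fusing and splitting described above can be reproduced with a small, simplified score sketch in bash. This is an assumption for illustration, not ppanggolin's actual code, and the function name is hypothetical. A stretch starts when the score becomes positive, ends when it falls back to zero, is trimmed to its highest-scoring gene, and is kept if that best score reaches min_score. Regions are printed as 0-based gene positions.

```bash
#!/usr/bin/env bash
# Simplified sketch (assumption, not ppanggolin's implementation) of how
# persistent_penalty (p) and variable_gain (v) fuse or split regions.
# Genes in genome order: P = persistent, S = shell, C = cloud.
# Usage: rgp_regions GENES [p] [v] [min_score]
rgp_regions() {
  local genes=$1 p=${2:-3} v=${3:-1} min_score=${4:-4}
  local score=0 start=-1 best=0 best_end=-1 i
  local -a regions=()
  for (( i = 0; i < ${#genes}; i++ )); do
    if [[ ${genes:i:1} == P ]]; then
      score=$(( score - p ))          # persistent gene lowers the score
    else
      score=$(( score + v ))          # shell/cloud gene raises the score
    fi
    if (( score <= 0 )); then
      score=0
      # The stretch has ended: keep it if its best score is high enough.
      if (( start >= 0 && best >= min_score )); then
        regions+=("${start}-${best_end}")
      fi
      start=-1
      best=0
    else
      if (( start < 0 )); then start=$i; fi   # a new stretch begins here
      if (( score > best )); then             # remember the best-scoring gene
        best=$score
        best_end=$i
      fi
    fi
  done
  # Close a stretch that is still open at the end of the contig.
  if (( start >= 0 && best >= min_score )); then
    regions+=("${start}-${best_end}")
  fi
  echo "${regions[*]}"
}

rgp_regions SSSSPPSSSS        # prints: 0-3 6-9  (two regions)
rgp_regions SSSSPPSSSS 1 1    # prints: 0-9      (lower penalty fuses them)
rgp_regions SSSSSPSSSSS 6 1   # prints: 0-4 6-10 (higher penalty splits)
```

With the defaults (penalty 3, gain 1), two blocks of four shell/cloud genes separated by two persistent genes give two regions; lowering the penalty to 1 or raising the gain to 2 fuses them. A single persistent gene inside a block stays within one region at the default penalty but splits it at penalty 6. The sketch only handles integer values, since bash arithmetic is integer-only.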

In any case, persistent genes in the middle of RGPs are relatively rare, so changing these parameters slightly should not have much impact. Changing them a lot, however, might no longer give biologically meaningful results, as RGPs could be grouped together over long stretches of persistent genes.

If you want to understand in more detail how all of these parameters interact, the full method is described in part "2.1 - panRGP method" of this preprint: https://www.biorxiv.org/content/10.1101/2020.03.26.007484v1.full

In part 2.1.1, parameter p in the formula corresponds to the persistent penalty and parameter v to the variable gain.
In part 2.1.2, s_min is the "min_score" and l_min the "min_length" parameter I mentioned earlier.

Only parts 2.1.1 and 2.1.2 are relevant to understanding how RGPs are predicted.
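As a rough illustration of how p, v and s_min combine, here is a simplified reading of the score, stated as an assumption rather than the exact formula from the preprint (it ignores contig borders and duplicated gene families). For genes g_1, ..., g_n in genome order:

```latex
S_0 = 0, \qquad
S_i =
\begin{cases}
\max\left(0,\; S_{i-1} + v\right) & \text{if } g_i \text{ is a shell or cloud gene},\\
\max\left(0,\; S_{i-1} - p\right) & \text{if } g_i \text{ is a persistent gene.}
\end{cases}
```

A stretch of genes is then kept as an RGP when its highest score reaches s_min and its length is at least l_min. For example, with p = 3 and v = 1, the sequence shell, shell, shell, shell, persistent, persistent, shell, shell, shell, shell gives S = 1, 2, 3, 4, 1, 0, 1, 2, 3, 4: two separate stretches, each just reaching the default s_min = 4.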

If something is unclear, do not hesitate to ask more questions :)


mikkelbregovic commented on June 19, 2024

Hello!
Could you briefly explain the options --persistent_penalty and --variable_gain?

If I increase or decrease these values, what should I expect?

Thanks for taking the time to answer these basic questions.
Really appreciated!


axbazin commented on June 19, 2024

Since this dates from May and there have been no other questions since, I will close this issue. If you have any other questions, please do not hesitate to reopen it.

