Comments (7)

umahsn commented on August 25, 2024

Hi Jonatan,

Thank you for pointing this out. We made some changes to the indel candidate site selection and reorganized the code to make it more modular, and this might be causing the increase in runtime. I reran some tests, and runtime does appear to have increased. While I fix this issue, you can try an older release (earlier than v0.4), which has similar indel performance and the same SNP performance. Additionally, I can try to create a branch that offers the older candidate selection method as an option while using the same API as the v0.4 release.

umahsn commented on August 25, 2024

Another thing: if you used a human reference genome for variant calling, did you set the --exclude_bed option to hg19 or hg38? Setting this parameter excludes telomeric and centromeric regions from variant calling and can significantly increase speed, because these regions have very high alignment error rates that give rise to too many variant candidates. The chr1 centromere in particular can take several hours by itself.

The runtimes reported in our paper were measured with this parameter set.
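To illustrate what excluding regions means at the level of candidate sites, here is a minimal Python sketch that drops candidate positions falling inside BED intervals. It assumes a standard BED file (first three columns: chrom, start, end) with 0-based, half-open coordinates. The function names `load_bed`, `is_excluded` and `filter_candidates` are hypothetical, and this is not NanoCaller's actual implementation of --exclude_bed.

```python
import bisect


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} of sorted, merged intervals.

    BED coordinates are 0-based and half-open: [start, end).
    """
    raw = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        starts, ends = [], []
        for s, e in ivs:
            if starts and s <= ends[-1]:
                # overlapping or touching intervals are merged into one
                ends[-1] = max(ends[-1], e)
            else:
                starts.append(s)
                ends.append(e)
        merged[chrom] = (starts, ends)
    return merged


def is_excluded(regions, chrom, pos):
    """True if 0-based position `pos` on `chrom` lies inside an excluded interval."""
    if chrom not in regions:
        return False
    starts, ends = regions[chrom]
    # index of the last merged interval that starts at or before pos
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def filter_candidates(regions, candidates):
    """Keep only (chrom, pos) candidate sites outside the excluded regions."""
    return [(c, p) for c, p in candidates if not is_excluded(regions, c, p)]
```

With made-up toy intervals (not real hg38 coordinates), `filter_candidates(load_bed(["chr1\t0\t100", "chr1\t500\t800", "chr1\t700\t900"]), [("chr1", 50), ("chr1", 100), ("chr1", 850), ("chr2", 50)])` keeps `("chr1", 100)` and `("chr2", 50)`. Merging overlapping intervals first keeps each lookup a single binary search, which matters when millions of candidate sites are checked.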

Akazhiel commented on August 25, 2024

Hello!

Thanks for all the input and tips. I'll try a previous version and check how much faster it runs. Eventually we'll be running this on an HPC cluster with access to more CPUs, which will increase the speed considerably, but I still found it odd that it was so slow on 8 CPUs: it took 5 days just to complete one sample.

Regarding your second reply: yes, I did use the --exclude_bed option with hg38.

umahsn commented on August 25, 2024

Just for context, can you tell me the coverage of your BAM file and, if you know, which Guppy version was used to basecall the reads?

Akazhiel commented on August 25, 2024

Hello!

The average coverage of the BAM file is 20x, assuming I calculated it correctly; there are many ways to compute it, and I have never found an easy, straightforward one. The reads were basecalled with Guppy 3.4.5.
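One common definition of average coverage is the mean per-base depth over reference positions. Below is a minimal Python sketch of that definition, assuming its input is the tab-separated output of `samtools depth -a` (contig, 1-based position, depth per line). The function name `mean_coverage` is hypothetical, and this is not necessarily how NanoCaller computes the coverage it reports in its logs.

```python
from collections import defaultdict


def mean_coverage(depth_lines):
    """Per-contig and genome-wide mean depth from `samtools depth -a` style lines.

    Each line is: contig <TAB> 1-based position <TAB> depth.
    Only the first depth column is used if several BAMs were given.
    """
    total_depth = defaultdict(int)  # sum of depths per contig
    n_positions = defaultdict(int)  # number of positions seen per contig
    for line in depth_lines:
        if not line.strip():
            continue
        contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        total_depth[contig] += int(depth)
        n_positions[contig] += 1
    per_contig = {c: total_depth[c] / n_positions[c] for c in n_positions}
    n_total = sum(n_positions.values())
    overall = sum(total_depth.values()) / n_total if n_total else 0.0
    return per_contig, overall
```

The `-a` flag matters: without it, `samtools depth` omits zero-depth positions and the mean is inflated. With it, gaps and poorly mappable regions count as zero depth and pull the mean down. This is one reason different tools report slightly different "average coverage" for the same BAM.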

umahsn commented on August 25, 2024

Hi Jonatan,

It turns out the problem was caused by commit 2546959, so I have reverted the changes from that commit in v0.4.1 (both in this repo and in the Docker image). You should see a ~40% reduction in runtime compared to v0.4.0, and the performance should be similar to that reported in our paper.

During this testing I found several other opportunities to improve runtime, for instance replacing Biopython's pairwise alignment algorithm with one implemented in C. I will release these improvements over the next few weeks.

Also, NanoCaller's logs report the coverage it calculates for SNP calling. If you use NanoCaller_WGS.py, these logs are written to the output/logs/ directory; with NanoCaller.py they are simply printed to stdout, as in this example: https://github.com/WGLab/NanoCaller/blob/master/sample/log

Akazhiel commented on August 25, 2024

Hello Mian!

That's great news! Thanks for looking into it and fixing it so quickly. I take it the Biopython replacement will come in a future version and is not yet implemented in v0.4.1? I'm looking forward to it!

On another note, I've checked the logs, and the average coverage for the tested sample was indeed 20x. In a discussion with the author of PEPPER, another SNP/indel calling tool, we agreed that 20x seems low for calling these types of variants, since the caller will end up calling almost everything it finds. Hopefully the samples I've asked to be sequenced will have more coverage, and the calling will be more precise.

Best regards,

Jonatan
