Comments (10)

SHuang-Broad commented on August 26, 2024

Hi Brian, thanks for the report.

Do you know the coverage of this input bam?

I suspect this BAM has low coverage, in which case Sniffles cannot estimate the error rate and other parameters from so few reads.

from long-read-pipelines.

brigranger commented on August 26, 2024

@SHuang-Broad I think you're right.

I wasn't able to find any specific coverage report for this bam, so I ran samtools idxstats on it and it appears it has one read aligned on chr2, and that's it. So it's not terribly surprising it failed.
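
For reference, that read count can also be pulled out of the idxstats output programmatically. This is a minimal sketch, assuming the standard tab-separated `samtools idxstats` layout (reference name, sequence length, mapped count, unmapped count). The function name and the sample text are made up for illustration and are not part of the pipeline:

```python
def mapped_reads_per_contig(idxstats_text: str) -> dict[str, int]:
    """Parse `samtools idxstats` text into {contig: mapped read count}."""
    counts: dict[str, int] = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The "*" row only reports unplaced, unmapped reads.
            continue
        counts[name] = int(mapped)
    return counts


# Illustrative input resembling the case described above: one read on chr2.
SAMPLE_IDXSTATS = (
    "chr1\t248956422\t0\t0\n"
    "chr2\t242193529\t1\t0\n"
    "*\t0\t0\t0\n"
)

counts = mapped_reads_per_contig(SAMPLE_IDXSTATS)
total_mapped = sum(counts.values())
contigs_with_reads = [name for name, n in counts.items() if n > 0]
print(total_mapped, contigs_with_reads)
```

The text passed in would come from running `samtools idxstats` on an indexed BAM and capturing its standard output.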

I guess it would be nice if the pipeline failed a little more gracefully and maybe was able to continue on? But there's really nothing here to do much with...

kvg commented on August 26, 2024

Wow, one read? Is this a test BAM file or real data?

SHuang-Broad commented on August 26, 2024

@brigranger hmm... Variant calling is typically the last step of the pipelines now, so if this one task fails, no downstream tasks are seriously affected in practice.

But I do think there could be more QC done at the beginning (or maybe middle) of the workflow, i.e. quit early when things are really suspicious.

What do you think, @kvg ?

kvg commented on August 26, 2024

This is an interesting one. Thinking ahead to amplicon sequencing, there will certainly be cases where we parallelize over chromosomes and some will have no data. So it looks like we'll have to figure out a way to protect ourselves from those kinds of failures when the tools we're running don't have those protections built in.
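
One possible shape for such a guard is to check read counts before launching the caller on each shard. This is only a sketch, assuming per-contig mapped read counts are already available (for example from `samtools idxstats`). `partition_shards` and `min_reads` are hypothetical names, and the threshold used here is arbitrary, not a documented Sniffles requirement:

```python
def partition_shards(
    read_counts: dict[str, int], min_reads: int
) -> tuple[list[str], list[str]]:
    """Split contigs into (runnable, skipped) by mapped read count.

    A contig is runnable only if it has at least `min_reads` mapped reads.
    """
    runnable: list[str] = []
    skipped: list[str] = []
    for contig, n_reads in read_counts.items():
        if n_reads >= min_reads:
            runnable.append(contig)
        else:
            skipped.append(contig)
    return runnable, skipped


# Hypothetical per-chromosome counts: two shards are empty or nearly empty.
example_counts = {"chr1": 0, "chr2": 1, "chr3": 5000}
runnable, skipped = partition_shards(example_counts, min_reads=100)

for contig in skipped:
    # Report the skip instead of letting the downstream tool crash.
    print(f"skipping {contig}: fewer than 100 mapped reads")
```

Only the contigs in `runnable` would then be handed to the variant caller, and the skipped ones could be logged or emitted as empty outputs so that the rest of the workflow continues.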

How to do that robustly will require some thought. @brigranger can you send us a link to the original input BAM?

brigranger commented on August 26, 2024

@kvg You should be able to find it here: gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/

SHuang-Broad commented on August 26, 2024

I looked at the subreads BAM; it seems to be a low-yield CCS flowcell.

gsutil du -sh gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/m64020_200118_025318.subreads.bam
151.93 GiB   gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/m64020_200118_025318.subreads.bam

So it looks like something in the CCS step produced the ultralow depth.

SHuang-Broad commented on August 26, 2024

@kvg, yes, I believe this ultimately comes down to what QC we put in each step of the pipeline, which by nature has to be implemented as we encounter issues, considering how many places things can go wrong.

SHuang-Broad commented on August 26, 2024

@brigranger is this the only occasion where you see the error?
If so, I'm going to close this and deal with #113 instead as a solution.

brigranger commented on August 26, 2024
