Comments (10)
Hi Brian, thanks for the report.
Do you know the coverage of this input bam?
I suspect this BAM has low coverage, in which case Sniffles cannot estimate the error rate and other parameters from so few reads.
from long-read-pipelines.
@SHuang-Broad I think you're right.
I wasn't able to find any specific coverage report for this bam, so I ran `samtools idxstats` on it and it appears it has one read aligned on chr2, and that's it. So it's not terribly surprising it failed.
I guess it would be nice if the pipeline failed a little more gracefully and maybe was able to continue on? But there's really nothing here to do much with...
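For anyone wanting to repeat that check programmatically, here is a minimal sketch that totals mapped reads from `samtools idxstats` output. It assumes the standard four tab-separated columns (reference name, length, mapped, unmapped); the function name and the example text are made up for illustration and are not part of the pipeline.

```python
def mapped_reads_per_contig(idxstats_text):
    """Parse `samtools idxstats` text into {contig: mapped_read_count}.

    Assumes the standard tab-separated columns:
    reference name, sequence length, mapped count, unmapped count.
    """
    counts = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # the "*" row holds reads with no reference assigned
        counts[name] = int(mapped)
    return counts


# Illustrative text only, shaped like the case described above
# (a single read aligned to chr2); not the real file's output.
example = "chr1\t248956422\t0\t0\nchr2\t242193529\t1\t0\n*\t0\t0\t0\n"
counts = mapped_reads_per_contig(example)
total_mapped = sum(counts.values())
```

In practice the text could come from capturing the command's stdout, and `total_mapped` is the number a pipeline could inspect before running a caller.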
Wow, one read? Is this a test BAM file or real data?
@brigranger hmm.... Typically variant calling is the last step of the pipelines now, so if this one particular task fails, no downstream tasks are severely affected, practically speaking.
But I do think there could be more QC done at the beginning (or maybe middle) of the workflow, i.e. quit early when things are really suspicious.
What do you think, @kvg ?
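One possible shape for such an early check, purely as a sketch: estimate depth as reads × mean read length ÷ genome size and stop if it falls below a threshold. The function names, the default threshold of 1x, and the example numbers are all assumptions for illustration, not anything the pipeline currently does.

```python
def estimated_coverage(n_reads, mean_read_len, genome_size):
    """Rough mean depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_size


def qc_gate(n_reads, mean_read_len, genome_size, min_coverage=1.0):
    """Return (ok, message). min_coverage is a hypothetical threshold."""
    cov = estimated_coverage(n_reads, mean_read_len, genome_size)
    if cov < min_coverage:
        return False, f"estimated coverage {cov:.6f}x is below {min_coverage}x; quitting early"
    return True, f"estimated coverage {cov:.2f}x; continuing"


# A single 15 kb read against a ~3.1 Gb genome does not pass.
ok_low, msg_low = qc_gate(1, 15_000, 3_100_000_000)

# Two million 15 kb reads against a 3 Gb genome is about 10x and passes.
ok_high, msg_high = qc_gate(2_000_000, 15_000, 3_000_000_000)
```

A sensible threshold would depend on the caller and the data type, so it would probably need to be a workflow input rather than a constant.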
This is an interesting one. Thinking ahead to amplicon sequencing, there will certainly be cases where we parallelize over chromosomes and some will have no data. So it looks like we'll have to figure out a way to protect ourselves from those kinds of failures when the tools we're running don't have those protections built in themselves.
How to do that robustly will require some thought. @brigranger can you send us a link to the original input BAM?
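As a starting point for that discussion, here is a rough sketch of one option: split the contigs into those worth scattering over and those to skip, based on per-contig mapped read counts. The function name, the `min_reads` default, and the example counts are hypothetical.

```python
def split_contigs(mapped_counts, min_reads=1):
    """Partition contigs into (runnable, skipped) by mapped read count.

    mapped_counts: dict of contig name -> number of mapped reads.
    min_reads: hypothetical minimum below which a shard is skipped.
    """
    runnable = [c for c, n in mapped_counts.items() if n >= min_reads]
    skipped = [c for c, n in mapped_counts.items() if n < min_reads]
    return runnable, skipped


# Made-up counts: chr1 has no data, so only chr2 and chr3 would be scattered.
runnable, skipped = split_contigs({"chr1": 0, "chr2": 1, "chr3": 250})
```

Skipped contigs could still be reported in the final outputs so that an empty shard is visible rather than silently dropped. This only guards against empty inputs; a tool can still fail on a shard with very few reads, so raising `min_reads` or catching the failure inside the task may also be needed.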
@kvg You should be able to find it here: gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/
I looked at the subreads BAM; it seems to be a low-yield CCS flowcell.
gsutil du -sh gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/m64020_200118_025318.subreads.bam
151.93 GiB gs://broad-gp-pacbio/r64020_20200116_203442/2_B01/m64020_200118_025318.subreads.bam
So it looks like something in CCS produced the ultralow depth.
@kvg, yes, I believe this is ultimately tied to what QC we put in each step of the pipeline, which by nature has to be an implement-as-we-encounter issue, considering that there are many places where things can go wrong.
@brigranger is this the only occasion where you see the error?
If so, I'm going to close this and deal with #113 instead as a solution.