Comments (4)
You could also stream the data into the various metrics collection tools. For example, something like:
samtools view -b <in.bam> <region> | java -jar CollectWgsMetrics.jar I=/dev/stdin ...
This would allow you to select only the regions in which you are interested.
If you are up for it, we would be happy to take a look at a pull request for this tool.
from picard.
Fair enough. Streaming is a good approach, though I'm not sure if the reference will also need modifying to get sensible output. That is something I will play with when I get the time and try to adjust/fix if needed.
Thanks for the quick response.
from picard.
Sorry, I don't think your suggestion (streaming from samtools view) will actually work.
The problem is the header and reference.
To get the proper GENOME_TERRITORY (and several other things, I suspect), the reference needs to be modified to include just the region/chrom of interest. That isn't difficult (samtools faidx), but it does remove the possibility of a single command without intermediate files.
However, a critical problem occurs when picard looks up regions in the reference based upon the bam file. Samtools doesn't remove entries outside of the selected region from the bam header. I've tried modifying the bam header (removing all chroms but the one of interest), but just can't get it to work. CollectWgsMetrics keeps looking for entries which aren't in the sequence dictionary... I have no idea why, since everything but a single chromosome has been removed from the bam header and alignments as well as the reference.
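For illustration only, the header pruning described above can be sketched in a few lines of Python operating on SAM text (for example, the output of samtools view -h). The function names prune_header and foreign_references and the sample records are hypothetical and not part of picard or samtools. The second helper relies only on the SAM column layout (column 3 is RNAME, column 7 is RNEXT) to list contigs that alignment records still mention after pruning. Whether such leftover mate references explain the lookups described above is an assumption, not something confirmed in this thread.

```python
def prune_header(sam_text, contig):
    """Drop @SQ header lines for every contig except `contig`.

    All other lines (other header records and alignments) are kept as-is.
    """
    kept = []
    for line in sam_text.splitlines():
        if line.startswith("@SQ"):
            tags = line.split("\t")[1:]
            if f"SN:{contig}" not in tags:
                continue  # sequence dictionary entry for another contig
        kept.append(line)
    return "\n".join(kept)


def foreign_references(sam_text, contig):
    """Return contig names, other than `contig`, used as RNAME or RNEXT."""
    foreign = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header records
        cols = line.split("\t")
        if len(cols) < 11:
            continue  # not a complete alignment record
        # "=" means "same as RNAME"; "*" means "no reference".
        for name in (cols[2], cols[6]):
            if name not in (contig, "=", "*"):
                foreign.add(name)
    return foreign


# Hypothetical two-contig example: read r2 has its mate on chr2.
sample = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:2000",
    "r1\t99\tchr1\t10\t60\t4M\t=\t50\t44\tACGT\tFFFF",
    "r2\t65\tchr1\t20\t60\t4M\tchr2\t70\t0\tACGT\tFFFF",
])

pruned = prune_header(sample, "chr1")
leftover = foreign_references(pruned, "chr1")
```

If leftover is non-empty after pruning, some records still point at contigs that are no longer in the header, which would be worth ruling out before blaming the tool.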
Anyway, even if I could get this to work (I've put several hours of trying into it already), the complexity required makes it far from a viable option IMO.
It seems like it would be much simpler to just add functionality to the Collect*Metrics commands to enable selecting a region. I imagine that the readers (BAMFileReader and such) already have most of the required functionality, but I'm not familiar enough with the code (nor have the time) to do it myself.
from picard.
I've added the functionality needed (and fixed a minor bug).
Please see my existing pull request, which was originally for a different (very minor) issue.
#87
Specifically, see this diff.
travc@2b039dc
At least a couple of people on BioStars and elsewhere have asked for this functionality, so it would probably be good if it could be merged soon. The changes are rather simple.
from picard.