Comments (16)
Emailed the user for a data snippet.
from picard.
This issue is not yet resolved. I am willing to share a data snippet so that you can reproduce the error. Please let me know how I can upload the BAM file. The reference is Streptomyces rapamycinicus, available from NCBI at http://www.ncbi.nlm.nih.gov/nuccore/NC_022785.1 (I can also upload that if you need it).
from picard.
@visivas, thanks for being willing to share your data. Please follow the instructions given at https://www.broadinstitute.org/gatk/guide/article?id=1894
If you could please include the reference genome in the same data upload that would make it much easier for us, thanks.
from picard.
Alright, I have uploaded all the required files as a tar.gz file (GcBiasTroubleShoot.tar.gz). Please let me know if you need anything else. Thanks for working on this issue.
from picard.
Thanks @visivas, we'll try to look at this soon.
from picard.
To recap the issue, the user was running CollectGcBiasMetrics on some bacterial genomes, but it was failing with the error:
Exception in thread "main" net.sf.samtools.SAMException: Exception counting mismatches for read MAC1-WDL30341:66:000000000-ABTGC:1:2111:13507:4770 2/2 0b aligned read.
The user's input file validates with no errors (with IGNORE=MATE_NOT_FOUND, since the input is a snippet).
But running GATK PrintReads to subset the culprit read fails with error:
##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/Users/vdauwera/codespace/sandbox/GcBiasTroubleShoot/AS_ATCC.bam} is malformed: the BAM file has a read with no stored bases (i.e. it uses '*')
I'm not sure what Picard's position is on reads with no stored bases. Looks like ValidateSamFile is letting it go but CollectGcBiasMetrics chokes on it. I haven't specifically tested whether that is the actual culprit, but it seems likely since the error occurs when counting bases and the tool reports seeing 0b.
@nh13 Would you care to opine? Should ValidateSamFile catch this?
from picard.
From the sam spec:
SEQ: segment SEQuence. This field can be a ‘*’ when the sequence is not stored.
So the SAM is valid. CollectGcBiasMetrics should protect against empty bases.
The issue seems to be here: htsjdk/samtools/util/SequenceUtil.java:336
where, if there are alignment blocks but no SEQ string, there will indeed be an out-of-bounds (OOB) exception.
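The failure mode described above can be sketched in a few lines. This is a simplified stand-in, not htsjdk's actual code: the class `EmptySeqGuardSketch` and both method names are hypothetical, and a read is modeled as a single gap-free alignment block. It only illustrates why a block length taken from the alignment, combined with an empty base array (SEQ stored as `*`), runs off the end of the array, and how an up-front length check avoids that.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

/** Simplified stand-in for a mismatch counter; hypothetical, not htsjdk's code. */
class EmptySeqGuardSketch {

    /** Counts mismatches for one gap-free alignment block, with no protection. */
    static int countMismatchesUnguarded(byte[] readBases, byte[] refBases,
                                        int refStart, int blockLength) {
        int mismatches = 0;
        for (int i = 0; i < blockLength; i++) {
            // If SEQ was '*', readBases is empty and readBases[0] already throws.
            if (readBases[i] != refBases[refStart + i]) {
                mismatches++;
            }
        }
        return mismatches;
    }

    /** Same count, but returns empty when the read has no stored bases. */
    static OptionalInt countMismatchesGuarded(byte[] readBases, byte[] refBases,
                                              int refStart, int blockLength) {
        if (readBases.length == 0) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(
                countMismatchesUnguarded(readBases, refBases, refStart, blockLength));
    }

    public static void main(String[] args) {
        byte[] ref = "ACGTACGT".getBytes(StandardCharsets.US_ASCII);
        byte[] stored = "ACGA".getBytes(StandardCharsets.US_ASCII);
        byte[] notStored = new byte[0]; // models SEQ == '*'

        System.out.println(countMismatchesGuarded(stored, ref, 0, 4));    // OptionalInt[1]
        System.out.println(countMismatchesGuarded(notStored, ref, 0, 4)); // OptionalInt.empty
    }
}
```

Returning an empty `OptionalInt` rather than 0 is a design choice for the sketch: it keeps "no bases to compare" distinguishable from "zero mismatches".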
from picard.
I am glad that someone is taking a look at this. Any idea when a fix will be issued?
thank you!
from picard.
An exception is thrown while processing reads that have the symbol * in the SEQ field. The SAM/BAM format specification allows this, but CollectGcBiasMetrics cannot handle such reads correctly. We suggest omitting reads with * in the SEQ field and logging a message. We have also added a new unit test to check correctness. The fix for this issue will be included in the pull request for the related issue #342.
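A minimal sketch of the "omit and log" approach suggested here, under stated assumptions: the `Read` class below is a hypothetical two-field stand-in for a SAM record (name plus the SEQ column as text), and `readsWithStoredBases` is an illustrative name, not a Picard or htsjdk method. The actual fix lives in the pull request mentioned in the thread and may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical illustration of skipping reads whose SEQ field is '*'. */
class SkipStarSeqSketch {
    private static final Logger LOG =
            Logger.getLogger(SkipStarSeqSketch.class.getName());

    /** Minimal stand-in for a SAM record: read name and SEQ column. */
    static final class Read {
        final String name;
        final String seq;

        Read(String name, String seq) {
            this.name = name;
            this.seq = seq;
        }
    }

    /** Returns only reads with stored bases, logging each read that is omitted. */
    static List<Read> readsWithStoredBases(List<Read> reads) {
        List<Read> kept = new ArrayList<>();
        for (Read read : reads) {
            if ("*".equals(read.seq)) {
                LOG.warning("Skipping read with no stored bases: " + read.name);
                continue;
            }
            kept.add(read);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
                new Read("read1", "ACGT"),
                new Read("read2", "*"));
        // The warning for read2 goes to the logger; only read1 is kept.
        System.out.println(readsWithStoredBases(reads).size()); // 1
    }
}
```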
from picard.
@vdauwera I can take care of this but I do not have permission/access for this repository.
from picard.
can you do it with a fork?
On Thu, Sep 29, 2016 at 12:36 PM, Ron Levine wrote:
> @vdauwera I can take care of this but I do not have permission/access for this repository.
from picard.
@yfarjoun We have the pull request which already fixes this issue: #646
from picard.
should be fixed. thanks @ZLyanov
from picard.
Hello,
I am using CollectGcBiasMetrics from the latest Picard (2.18.27), but I still get this error.
java -XX:ParallelGCThreads=8 -Djava.io.tmpdir=/tmp -jar /software/picard_2.18.27/picard.jar CollectGcBiasMetrics I=M0.A.005.sorted.marked.filtered.bam O=gc_bias_metrics.txt CHART=gc_bias_metrics.pdf S=summary_metrics.txt R=/RefData/Mus_musculus/Bowtie2Index/mm10.fa VALIDATION_STRINGENCY=SILENT
Exception in thread "main" htsjdk.samtools.SAMException: Exception counting mismatches for read ST-E00494:563:HTKTKCCXY:7:1217:7334:52625 2/2 150b aligned to chr1_GL456210_random:16155-16304.
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:476)
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:452)
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:490)
at picard.analysis.GcBiasMetricsCollector.addRead(GcBiasMetricsCollector.java:389)
at picard.analysis.GcBiasMetricsCollector.access$600(GcBiasMetricsCollector.java:48)
at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.addReadToGcData(GcBiasMetricsCollector.java:221)
at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:155)
at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:100)
at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
at picard.analysis.CollectGcBiasMetrics.acceptRead(CollectGcBiasMetrics.java:179)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:145)
at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 16299
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:468)
... 15 more
The read pair looks like this:
samtools view M0.A.005.sorted.marked.filtered.bam | grep 'ST-E00494:563:HTKTKCCXY:7:1217:7334:52625'
ST-E00494:563:HTKTKCCXY:7:1217:7334:52625 99 chr1_GL456210_random 15967 30 150M = 16155 338 CATACTCAGACGAAGTCATAGAGGCAGAACCAAGGATCAAAATAATGGGGAATGAGATGTGTAATGAAGGAACAGATGGGTTATAGATCAATGGTTTGAGAGAATTGTGTCTGTGGATAGGGAGACAAAAGGAAAGTTATAGTATTTTAA AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJFJJJJJJJJJAAJJJJFFFJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJAJJJJJJJJJJJJJJJJJFJJF-FJJJ MD:Z:150 PG:Z:MarkDuplicates XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:-12 YS:i:0 YT:Z:CP
ST-E00494:563:HTKTKCCXY:7:1217:7334:52625 147 chr1_GL456210_random 16155 30 150M = 15967 -338 TGATAGTATTCTTTTCCCATCACCAACTCATCCCAGATCCTCTCCAAATTCTGTCTCTTGCCTGTATTTGTGTGTCTCTTGCTCTCTCTTAAAAAATAAAAACAAAGCGAAAGAACAAAAAAAATACTCAAACAAGAGAAGCCTCCACCC JJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFFAA MD:Z:150 PG:Z:MarkDuplicates XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:-6 YS:i:0 YT:Z:CP
With ValidateSamFile, I got one error: "Read groups is empty". I guess this does not matter.
The reference used was downloaded from ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip.
I don't know how to solve this. Any help or suggestions would be appreciated.
Bests,
Yiwei Niu
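The numbers in this report can be checked with some simple arithmetic. The thread does not diagnose this second failure, so the following is only one possible reading of the trace. It assumes that the index in `ArrayIndexOutOfBoundsException: 16299` is a 0-based index into the reference bases loaded for the contig, and that the read is a single gap-free block (150M). Under those assumptions, the reference array for chr1_GL456210_random would hold 16299 bases, which is shorter than the read's alignment end. The class and method names are hypothetical.

```java
/** Hypothetical arithmetic on the values reported in the stack trace. */
class TraceArithmeticSketch {

    /** 1-based inclusive end of a gap-free alignment. */
    static int alignmentEnd(int start1Based, int alignedLength) {
        return start1Based + alignedLength - 1;
    }

    /** Bases compared successfully before the failing 0-based reference index. */
    static int basesComparedBeforeFailure(int start1Based, int failingIndex0Based) {
        return failingIndex0Based - (start1Based - 1);
    }

    /** True if the alignment extends past a contig of the given length. */
    static boolean overrunsContig(int start1Based, int alignedLength, int contigLength) {
        return alignmentEnd(start1Based, alignedLength) > contigLength;
    }

    public static void main(String[] args) {
        int start = 16155;        // POS of the 2/2 read
        int length = 150;         // CIGAR 150M
        int failingIndex = 16299; // from the ArrayIndexOutOfBoundsException

        System.out.println(alignmentEnd(start, length));                     // 16304
        System.out.println(basesComparedBeforeFailure(start, failingIndex)); // 145
        // Assumption: the loaded contig holds exactly failingIndex bases.
        System.out.println(overrunsContig(start, length, failingIndex));     // true
    }
}
```

If this reading is right, comparing the contig length in the BAM header (the `LN` field of the `@SQ` line) with the length recorded for the same contig in the FASTA index (`.fai`) would show whether the BAM and the reference passed via `R=` disagree.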
from picard.
Hello @YiweiNiu,
Thanks for your interest.
To maximize the quality and efficiency of the response to your questions, please:
- Do not reuse closed tickets for new issues.
- Post user problems like this in the GATK forum: https://gatkforums.broadinstitute.org/gatk/categories/gatk-support-forum
from picard.
Sorry. I did not know that.
from picard.