
Comments (16)

vdauwera avatar vdauwera commented on August 16, 2024

Emailed the user for a data snippet.

from picard.

visivas avatar visivas commented on August 16, 2024

This issue is not yet resolved. I am willing to share a data snippet for you to reproduce the error. Please let me know how I can upload the BAM file. Also the reference is streptomyces rapamycinicus available from NCBI http://www.ncbi.nlm.nih.gov/nuccore/NC_022785.1 (I can also upload that if you need).

vdauwera avatar vdauwera commented on August 16, 2024

@visivas, thanks for being willing to share your data. Please follow the instructions given at https://www.broadinstitute.org/gatk/guide/article?id=1894

If you could please include the reference genome in the same data upload that would make it much easier for us, thanks.

visivas avatar visivas commented on August 16, 2024

Alright, I have uploaded all the required files as a tar.gz file (GcBiasTroubleShoot.tar.gz). Please let me know if you need anything else. Thanks for working on this issue.

vdauwera avatar vdauwera commented on August 16, 2024

Thanks @visivas, we'll try to look at this soon.

vdauwera avatar vdauwera commented on August 16, 2024

To recap the issue, the user was running CollectGcBiasMetrics on some bacterial genomes but it was failing with error:

Exception in thread "main" net.sf.samtools.SAMException: Exception counting mismatches for read MAC1-WDL30341:66:000000000-ABTGC:1:2111:13507:4770 2/2 0b aligned read.

The user's input file validates with no errors (with IGNORE=MATE_NOT_FOUND, since the input is a snippet).

But running GATK PrintReads to subset the culprit read fails with error:

##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/Users/vdauwera/codespace/sandbox/GcBiasTroubleShoot/AS_ATCC.bam} is malformed: the BAM file has a read with no stored bases (i.e. it uses '*')

I'm not sure what Picard's position is on reads with no stored bases. Looks like ValidateSamFile is letting it go but CollectGcBiasMetrics chokes on it. I haven't specifically tested whether that is the actual culprit, but it seems likely since the error occurs when counting bases and the tool reports seeing 0b.

@nh13 Would you care to opine? Should ValidateSamFile catch this?

yfarjoun avatar yfarjoun commented on August 16, 2024

From the SAM spec:

SEQ: segment SEQuence. This field can be a ‘*’ when the sequence is not stored.

So the SAM is valid. CollectGcBiasMetrics should protect against empty bases.

The issue seems to be here: htsjdk/samtools/util/SequenceUtil.java:336, where, if there are alignment blocks but no SEQ string, there will indeed be an OOB exception.
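
To illustrate the failure mode described above, here is a minimal Java sketch, not the htsjdk code itself. The class `MismatchSketch`, its `Block` type, and the `-1` return convention are all hypothetical names and choices made for this example. It shows how walking alignment blocks over an empty base array would go out of bounds, and how an early guard avoids that.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for mismatch counting; not the htsjdk SequenceUtil class. */
final class MismatchSketch {

    /** One gapless alignment block, with 1-based start positions as in SAM. */
    static final class Block {
        final int readStart;
        final int referenceStart;
        final int length;

        Block(int readStart, int referenceStart, int length) {
            this.readStart = readStart;
            this.referenceStart = referenceStart;
            this.length = length;
        }
    }

    /**
     * Counts mismatching bases across the alignment blocks.
     * Returns -1 (an assumed convention) when the read has no stored bases,
     * i.e. SEQ was '*', instead of indexing into an empty array.
     */
    static int countMismatches(byte[] readBases, byte[] referenceBases, List<Block> blocks) {
        if (readBases == null || readBases.length == 0) {
            return -1; // nothing to compare; the caller can skip or log the read
        }
        int mismatches = 0;
        for (Block b : blocks) {
            int readOffset = b.readStart - 1;      // convert to 0-based
            int refOffset = b.referenceStart - 1;  // convert to 0-based
            for (int i = 0; i < b.length; i++) {
                if (readBases[readOffset + i] != referenceBases[refOffset + i]) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        byte[] ref = "ACGTACGT".getBytes(StandardCharsets.US_ASCII);
        List<Block> blocks = Arrays.asList(new Block(1, 3, 4));
        // Read GTAA against reference GTAC: one mismatch at the last base
        System.out.println(countMismatches("GTAA".getBytes(StandardCharsets.US_ASCII), ref, blocks));
        // Read with no stored bases: guarded, no exception
        System.out.println(countMismatches(new byte[0], ref, blocks));
    }
}
```

Without the guard at the top, the second call would throw an ArrayIndexOutOfBoundsException on the first block, which matches the kind of failure reported in this thread.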

visivas avatar visivas commented on August 16, 2024

I am glad that someone is taking a look at this. Any idea when a fix will be issued?

thank you!

 avatar commented on August 16, 2024

An exception is thrown while processing reads because of the * symbol in the SEQ field. The SAM/BAM format specification allows this, but CollectGcBiasMetrics cannot handle it correctly. We suggest omitting reads with * in the SEQ field and logging a message. We also added a new unit test to check correctness. The fix for this issue will be included in the pull request for the related issue #342.
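
As a rough illustration of the "omit and log" approach suggested above, the following Java sketch filters plain-text SAM lines rather than using Picard or htsjdk classes. The class and method names (`SeqFilterSketch`, `hasStoredBases`, `keepReadsWithBases`) are hypothetical. It relies only on the SAM convention that SEQ is the tenth tab-separated mandatory field, and it drops header lines for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter over SAM text lines; not part of Picard or htsjdk. */
final class SeqFilterSketch {

    /** SEQ is the 10th mandatory tab-separated field of an alignment line (0-based index 9). */
    static final int SEQ_FIELD = 9;

    /** True when the alignment line carries an actual sequence rather than '*'. */
    static boolean hasStoredBases(String samLine) {
        String[] fields = samLine.split("\t", -1);
        return fields.length > SEQ_FIELD && !fields[SEQ_FIELD].equals("*");
    }

    /** Keeps alignment lines that have stored bases; logs and skips the rest. */
    static List<String> keepReadsWithBases(List<String> samLines) {
        List<String> kept = new ArrayList<>();
        for (String line : samLines) {
            if (line.startsWith("@")) {
                continue; // header line, not an alignment record
            }
            if (hasStoredBases(line)) {
                kept.add(line);
            } else {
                // Log the read name (first field) so skipped reads can be traced
                System.err.println("Skipping read with no stored bases: " + line.split("\t", 2)[0]);
            }
        }
        return kept;
    }
}
```

Skipping rather than failing keeps the metrics run going, at the cost of silently excluding those reads from the GC counts, which is why the log message matters.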

ronlevine avatar ronlevine commented on August 16, 2024

@vdauwera I can take care of this but I do not have permission/access for this repository.

yfarjoun avatar yfarjoun commented on August 16, 2024

can you do it with a fork?

On Thu, Sep 29, 2016 at 12:36 PM, Ron Levine wrote:

> @vdauwera I can take care of this but I do not have permission/access for this repository.

 avatar commented on August 16, 2024

@yfarjoun We have the pull request which already fixes this issue: #646

yfarjoun avatar yfarjoun commented on August 16, 2024

should be fixed. thanks @ZLyanov

YiweiNiu avatar YiweiNiu commented on August 16, 2024

Hello,

I am using CollectGcBiasMetrics of the latest picard (2.18.27), but I still have this error.

java -XX:ParallelGCThreads=8 -Djava.io.tmpdir=/tmp -jar /software/picard_2.18.27/picard.jar  CollectGcBiasMetrics I=M0.A.005.sorted.marked.filtered.bam O=gc_bias_metrics.txt CHART=gc_bias_metrics.pdf S=summary_metrics.txt R=/RefData/Mus_musculus/Bowtie2Index/mm10.fa VALIDATION_STRINGENCY=SILENT

Exception in thread "main" htsjdk.samtools.SAMException: Exception counting mismatches for read ST-E00494:563:HTKTKCCXY:7:1217:7334:52625 2/2 150b aligned to chr1_GL456210_random:16155-16304.
	at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:476)
	at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:452)
	at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:490)
	at picard.analysis.GcBiasMetricsCollector.addRead(GcBiasMetricsCollector.java:389)
	at picard.analysis.GcBiasMetricsCollector.access$600(GcBiasMetricsCollector.java:48)
	at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.addReadToGcData(GcBiasMetricsCollector.java:221)
	at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:155)
	at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:100)
	at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
	at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
	at picard.analysis.CollectGcBiasMetrics.acceptRead(CollectGcBiasMetrics.java:179)
	at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:145)
	at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 16299
	at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:468)
	... 15 more

The read pair looks like this:

samtools view M0.A.005.sorted.marked.filtered.bam | grep 'ST-E00494:563:HTKTKCCXY:7:1217:7334:52625'
ST-E00494:563:HTKTKCCXY:7:1217:7334:52625	99	chr1_GL456210_random	15967	30	150M	=	16155	338	CATACTCAGACGAAGTCATAGAGGCAGAACCAAGGATCAAAATAATGGGGAATGAGATGTGTAATGAAGGAACAGATGGGTTATAGATCAATGGTTTGAGAGAATTGTGTCTGTGGATAGGGAGACAAAAGGAAAGTTATAGTATTTTAA	AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJFJJJJJJJJJAAJJJJFFFJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJAJJJJJJJJJJJJJJJJJFJJF-FJJJ	MD:Z:150	PG:Z:MarkDuplicates	XG:i:0	NM:i:0	XM:i:0	XN:i:0	XO:i:0	AS:i:0	XS:i:-12	YS:i:0	YT:Z:CP
ST-E00494:563:HTKTKCCXY:7:1217:7334:52625	147	chr1_GL456210_random	16155	30	150M	=	15967	-338	TGATAGTATTCTTTTCCCATCACCAACTCATCCCAGATCCTCTCCAAATTCTGTCTCTTGCCTGTATTTGTGTGTCTCTTGCTCTCTCTTAAAAAATAAAAACAAAGCGAAAGAACAAAAAAAATACTCAAACAAGAGAAGCCTCCACCC	JJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFFAA	MD:Z:150	PG:Z:MarkDuplicates	XG:i:0	NM:i:0	XM:i:0	XN:i:0	XO:i:0	AS:i:0	XS:i:-6	YS:i:0	YT:Z:CP

With ValidateSamFile, I got one error, "Read groups is empty". I guess this does not matter.

The reference used was downloaded from ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip.

I don't know how to solve this. Any help or suggestions would be appreciated.

Best,
Yiwei Niu
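
One possible reading of the trace above, offered as a hypothesis and not a confirmed diagnosis: the failing index 16299 lies inside the read's aligned span (16155-16304, 1-based), which is what one would expect if the reference bases handed to countMismatches ended before the alignment did, for example if the reference FASTA differs from the one used for alignment. The Java sketch below uses hypothetical names (`AlignmentBoundsSketch`, `fitsReference`, `firstOutOfBoundsIndex`), and the reference length of 16299 is an assumed value chosen only because it reproduces the index in the trace.

```java
/** Hypothetical bounds check; not taken from Picard or htsjdk. */
final class AlignmentBoundsSketch {

    /** True when the 1-based, inclusive alignment end falls within the reference sequence. */
    static boolean fitsReference(int alignmentEnd, int referenceLength) {
        return alignmentEnd <= referenceLength;
    }

    /**
     * First 0-based reference index that a left-to-right scan of the aligned span
     * would fail on, or -1 when the whole span fits inside the reference.
     */
    static int firstOutOfBoundsIndex(int alignmentStart, int alignmentEnd, int referenceLength) {
        if (fitsReference(alignmentEnd, referenceLength)) {
            return -1;
        }
        return Math.max(alignmentStart - 1, referenceLength);
    }

    public static void main(String[] args) {
        // Span from the trace: 16155-16304; the 16299-base reference length is an assumption
        System.out.println(fitsReference(16304, 16299));
        System.out.println(firstOutOfBoundsIndex(16155, 16304, 16299));
    }
}
```

If this reading holds, comparing the contig lengths in the BAM header against those in the reference's sequence dictionary or .fai index would be a reasonable first check.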

yfarjoun avatar yfarjoun commented on August 16, 2024

Hello @YiweiNiu,

Thanks for your interest.

To maximize the quality and efficiency of the response to your questions, please:

  1. Do not reuse closed tickets for new issues,
  2. Do post user problems like this in the GATK forum https://gatkforums.broadinstitute.org/gatk/categories/gatk-support-forum

YiweiNiu avatar YiweiNiu commented on August 16, 2024

Sorry. I did not know that.
