Code Monkey home page Code Monkey logo

Comments (11)

slowkow avatar slowkow commented on August 16, 2024 5

The problem: my sequence dictionary lines are space-delimited, and they must be tab-delimited, i.e. \t.

I apologize for the confusion, and for wasting your time. I recommend that you document the format of the interval list file, or refer the user to an example of a properly-formatted file. It would be useful to point them to a ready-to-use file like this one.

Strangely, the current documentation for CollectRnaSeqMetrics refers the user to the IntervalList javadoc. The comment at the top of the page does not mention that the SAM-style header must be tab-delimited. In my opinion, developers -- not users -- should be referred to javadoc documentation.

Wrong (each item is separated by a space ' '):

@SQ SN:chrM LN:16571

Right (each item is separated by a tab '\t'):

@SQ SN:chrM LN:16571

from picard.

slowkow avatar slowkow commented on August 16, 2024

By the way, you can see the full ribosomal intervals file and my script for creating it here: https://gist.github.com/slowkow/b11c28796508f03cdf4b

It would be nice to maintain a public repository of commonly used files such as this one.

from picard.

slowkow avatar slowkow commented on August 16, 2024

The current implementation of Picard throws a new exception, discarding the information provided by htsjdk describing the difference between the two sequence dictionaries.

The new exception hides information from the htsjdk exception:

throw new PicardException("Sequence dictionaries differ in " + samFile.getAbsolutePath() + " and " + ribosomalIntervalsFile.getAbsolutePath(),

The htsjdk exception includes information: https://github.com/samtools/htsjdk/blob/a30bc12df2aeb3b4312a9c236c6a639025d5b596/src/java/htsjdk/samtools/util/SequenceUtil.java#L100

from picard.

yfarjoun avatar yfarjoun commented on August 16, 2024

Can you please post the output of the following:

$ cat hg19.genome.dict
$ samtools view -H file.bam

Thanks.

Yossi.

On Wed, Dec 10, 2014 at 4:57 PM, Kamil Slowikowski <[email protected]

wrote:

I'm running:

java -Xmx4g -jar picard.jar CollectRnaSeqMetrics
REFERENCE_SEQUENCE=hg19.genome.fa
REF_FLAT=gencode.v19.annotation.refFlat
RIBOSOMAL_INTERVALS=gencode.v19.rRNA.interval_list
STRAND_SPECIFICITY=NONE
INPUT=file.bam
ASSUME_SORTED=true
OUTPUT=file.rnaseq_metrics
CHART_OUTPUT=file.rnaseq.pdf

I get this error:

Exception in thread "main" picard.PicardException: Sequence dictionaries differ in file.bam and gencode.v19.rRNA.interval_list

However, I copied the sequence dictionary from the BAM file. So, the
sequence dictionaries are identical.

Here's the sequence dictionary along with the first 5 lines of the
ribosomal intervals file:

@hd VN:1.4 SO:coordinate
@sq SN:chrM LN:16571
@sq SN:chr1 LN:249250621
@sq SN:chr2 LN:243199373
@sq SN:chr3 LN:198022430
@sq SN:chr4 LN:191154276
@sq SN:chr5 LN:180915260
@sq SN:chr6 LN:171115067
@sq SN:chr7 LN:159138663
@sq SN:chr8 LN:146364022
@sq SN:chr9 LN:141213431
@sq SN:chr10 LN:135534747
@sq SN:chr11 LN:135006516
@sq SN:chr12 LN:133851895
@sq SN:chr13 LN:115169878
@sq SN:chr14 LN:107349540
@sq SN:chr15 LN:102531392
@sq SN:chr16 LN:90354753
@sq SN:chr17 LN:81195210
@sq SN:chr18 LN:78077248
@sq SN:chr19 LN:59128983
@sq SN:chr20 LN:63025520
@sq SN:chr21 LN:48129895
@sq SN:chr22 LN:51304566
@sq SN:chrX LN:155270560
@sq SN:chrY LN:59373566
chr1 9497728 9497837 - ENST00000517147.1
chr1 13949679 13949779 - ENST00000411020.1
chr1 34578550 34578664 + ENST00000364278.1
chr1 37730278 37730387 - ENST00000516559.1
chr1 39619836 39619968 - ENST00000410446.1


Reply to this email directly or view it on GitHub
#126.

from picard.

slowkow avatar slowkow commented on August 16, 2024

Hi Yossi, I believe I already posted it. See above.

You can see my ribosomal intervals file, in full, at the gist link. The header of my bam file is already posted in the first comment of this thread, except for the single line that specifies the program which created the bam file. Here it is a second time, with that line included:

$ samtools view -H file.bam
@HD     VN:1.4  SO:coordinate
@SQ     SN:chrM LN:16571
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566
@PG     ID:subread      PN:subread      VN:1.4.5-p1

When you say hg19.genome.dict, are you referring to something else that is not the ribosomal intervals file? My understanding is that the CollectRnaSeqMetrics tool requires a RIBOSOMAL_INTERVALS file formatted as I have shown in my previous comment.

from picard.

nh13 avatar nh13 commented on August 16, 2024

@slowkow I only see the sequence dictionary from the ribosmal interval list, not the sequence dictionary associated with the FASTA file.

from picard.

slowkow avatar slowkow commented on August 16, 2024

@nh13 There is no mention of a sequence dictionary associated with the FASTA file in the error, and also no mention in the documentation for CollectRnaSeqMetrics.

Here are the chromosome names in my FASTA file:

grep "^>" hg19.genome.fa
>chrM
>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr20
>chr21
>chr22
>chrX
>chrY

from picard.

nh13 avatar nh13 commented on August 16, 2024

Do you have a file hg19.genome.dict or hg19.genome.fa.dict? If not, could you create one and post the sequence dictionary created (use CreateSequenceDictionary).

Regarding documentation and error messages, we would be grateful if you could submit a pull request so we can review (@vdauwera).

from picard.

slowkow avatar slowkow commented on August 16, 2024

I created the file as you suggested.

I also tried replacing the dictionary in my ribosomal intervals file with the generated dictionary, and I receive the same error as in my first post..

cat hg19.genome.fa.dict
@HD     VN:1.4  SO:unsorted
@SQ     SN:chrM LN:16571        UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:d2ed829b8a1628d16cbeee88e88e39eb
@SQ     SN:chr1 LN:249250621    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ     SN:chr2 LN:243199373    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:a0d9851da00400dec1098a9255ac712e
@SQ     SN:chr3 LN:198022430    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ     SN:chr4 LN:191154276    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:23dccd106897542ad87d2765d28a19a1
@SQ     SN:chr5 LN:180915260    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:0740173db9ffd264d728f32784845cd7
@SQ     SN:chr6 LN:171115067    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:1d3a93a248d92a729ee764823acbbc6b
@SQ     SN:chr7 LN:159138663    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:618366e953d6aaad97dbe4777c29375e
@SQ     SN:chr8 LN:146364022    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:96f514a9929e410c6651697bded59aec
@SQ     SN:chr9 LN:141213431    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:3e273117f15e0a400f01055d9f393768
@SQ     SN:chr10        LN:135534747    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:988c28e000e84c26d552359af1ea2e1d
@SQ     SN:chr11        LN:135006516    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ     SN:chr12        LN:133851895    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:51851ac0e1a115847ad36449b0015864
@SQ     SN:chr13        LN:115169878    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:283f8d7892baa81b510a015719ca7b0b
@SQ     SN:chr14        LN:107349540    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:98f3cae32b2a2e9524bc19813927542e
@SQ     SN:chr15        LN:102531392    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:e5645a794a8238215b2cd77acb95a078
@SQ     SN:chr16        LN:90354753     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ     SN:chr17        LN:81195210     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ     SN:chr18        LN:78077248     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ     SN:chr19        LN:59128983     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:1aacd71f30db8e561810913e0b72636d
@SQ     SN:chr20        LN:63025520     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ     SN:chr21        LN:48129895     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ     SN:chr22        LN:51304566     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:a718acaa6135fdca8357d5bfe94211dd
@SQ     SN:chrX LN:155270560    UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ     SN:chrY LN:59373566     UR:file:/medpop/rnaseq/genome/hg19.genome.fa    M5:1e86411d73e6f00a10590f976be01623

from picard.

nh13 avatar nh13 commented on August 16, 2024

Ah, it says the interval list and file.bam dictionaries are different. My apologies, could you post the sequence dictionary for the header in file.bam (ex. samtools view -H file.bam).

from picard.

slowkow avatar slowkow commented on August 16, 2024

#126 (comment)

from picard.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.