Comments (11)
The problem: my sequence dictionary lines are space-delimited, and they must be tab-delimited, i.e. \t
.
I apologize for the confusion, and for wasting your time. I recommend that you document the format of the interval list file, or refer the user to an example of a properly-formatted file. It would be useful to point them to a ready-to-use file like this one.
Strangely, the current documentation for CollectRnaSeqMetrics
refers the user to the IntervalList javadoc. The comment at the top of the page does not mention that the SAM-style header must be tab-delimited. In my opinion, developers -- not users -- should be referred to javadoc documentation.
Wrong (each item is separated by a space ' '
):
@SQ SN:chrM LN:16571
Right (each item is separated by a tab '\t'
):
@SQ SN:chrM LN:16571
from picard.
By the way, you can see the full ribosomal intervals file and my script for creating it here: https://gist.github.com/slowkow/b11c28796508f03cdf4b
It would be nice to maintain a public repository of commonly used files such as this one.
from picard.
The current implementation of Picard throws a new exception, discarding the information provided by htsjdk describing the difference between the two sequence dictionaries.
The new exception hides information from the htsjdk exception:
The htsjdk exception includes information: https://github.com/samtools/htsjdk/blob/a30bc12df2aeb3b4312a9c236c6a639025d5b596/src/java/htsjdk/samtools/util/SequenceUtil.java#L100
from picard.
Can you please post the output of the following:
$ cat hg19.genome.dict
$ samtools view -H file.bam
Thanks.
Yossi.
On Wed, Dec 10, 2014 at 4:57 PM, Kamil Slowikowski <[email protected]
wrote:
I'm running:
java -Xmx4g -jar picard.jar CollectRnaSeqMetrics
REFERENCE_SEQUENCE=hg19.genome.fa
REF_FLAT=gencode.v19.annotation.refFlat
RIBOSOMAL_INTERVALS=gencode.v19.rRNA.interval_list
STRAND_SPECIFICITY=NONE
INPUT=file.bam
ASSUME_SORTED=true
OUTPUT=file.rnaseq_metrics
CHART_OUTPUT=file.rnaseq.pdfI get this error:
Exception in thread "main" picard.PicardException: Sequence dictionaries differ in file.bam and gencode.v19.rRNA.interval_list
However, I copied the sequence dictionary from the BAM file. So, the
sequence dictionaries are identical.Here's the sequence dictionary along with the first 5 lines of the
ribosomal intervals file:@hd VN:1.4 SO:coordinate
@sq SN:chrM LN:16571
@sq SN:chr1 LN:249250621
@sq SN:chr2 LN:243199373
@sq SN:chr3 LN:198022430
@sq SN:chr4 LN:191154276
@sq SN:chr5 LN:180915260
@sq SN:chr6 LN:171115067
@sq SN:chr7 LN:159138663
@sq SN:chr8 LN:146364022
@sq SN:chr9 LN:141213431
@sq SN:chr10 LN:135534747
@sq SN:chr11 LN:135006516
@sq SN:chr12 LN:133851895
@sq SN:chr13 LN:115169878
@sq SN:chr14 LN:107349540
@sq SN:chr15 LN:102531392
@sq SN:chr16 LN:90354753
@sq SN:chr17 LN:81195210
@sq SN:chr18 LN:78077248
@sq SN:chr19 LN:59128983
@sq SN:chr20 LN:63025520
@sq SN:chr21 LN:48129895
@sq SN:chr22 LN:51304566
@sq SN:chrX LN:155270560
@sq SN:chrY LN:59373566
chr1 9497728 9497837 - ENST00000517147.1
chr1 13949679 13949779 - ENST00000411020.1
chr1 34578550 34578664 + ENST00000364278.1
chr1 37730278 37730387 - ENST00000516559.1
chr1 39619836 39619968 - ENST00000410446.1—
Reply to this email directly or view it on GitHub
#126.
from picard.
Hi Yossi, I believe I already posted it. See above.
You can see my ribosomal intervals file, in full, at the gist link. The header of my bam file is already posted in the first comment of this thread, except for the single line that specifies the program which created the bam file. Here it is a second time, with that line included:
$ samtools view -H file.bam
@HD VN:1.4 SO:coordinate
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@PG ID:subread PN:subread VN:1.4.5-p1
When you say hg19.genome.dict
, are you referring to something else that is not the ribosomal intervals file? My understanding is that the CollectRnaSeqMetrics
tool requires a RIBOSOMAL_INTERVALS
file formatted as I have shown in my previous comment.
from picard.
@slowkow I only see the sequence dictionary from the ribosmal interval list, not the sequence dictionary associated with the FASTA file.
from picard.
@nh13 There is no mention of a sequence dictionary associated with the FASTA file in the error, and also no mention in the documentation for CollectRnaSeqMetrics
.
Here are the chromosome names in my FASTA file:
grep "^>" hg19.genome.fa
>chrM
>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr20
>chr21
>chr22
>chrX
>chrY
from picard.
Do you have a file hg19.genome.dict
or hg19.genome.fa.dict
? If not, could you create one and post the sequence dictionary created (use CreateSequenceDictionary
).
Regarding documentation and error messages, we would be grateful if you could submit a pull request so we can review (@vdauwera).
from picard.
I created the file as you suggested.
I also tried replacing the dictionary in my ribosomal intervals file with the generated dictionary, and I receive the same error as in my first post..
cat hg19.genome.fa.dict
@HD VN:1.4 SO:unsorted
@SQ SN:chrM LN:16571 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:d2ed829b8a1628d16cbeee88e88e39eb
@SQ SN:chr1 LN:249250621 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:a0d9851da00400dec1098a9255ac712e
@SQ SN:chr3 LN:198022430 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ SN:chr4 LN:191154276 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:23dccd106897542ad87d2765d28a19a1
@SQ SN:chr5 LN:180915260 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:0740173db9ffd264d728f32784845cd7
@SQ SN:chr6 LN:171115067 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:1d3a93a248d92a729ee764823acbbc6b
@SQ SN:chr7 LN:159138663 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:618366e953d6aaad97dbe4777c29375e
@SQ SN:chr8 LN:146364022 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:96f514a9929e410c6651697bded59aec
@SQ SN:chr9 LN:141213431 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:3e273117f15e0a400f01055d9f393768
@SQ SN:chr10 LN:135534747 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:988c28e000e84c26d552359af1ea2e1d
@SQ SN:chr11 LN:135006516 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ SN:chr12 LN:133851895 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:51851ac0e1a115847ad36449b0015864
@SQ SN:chr13 LN:115169878 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:283f8d7892baa81b510a015719ca7b0b
@SQ SN:chr14 LN:107349540 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:98f3cae32b2a2e9524bc19813927542e
@SQ SN:chr15 LN:102531392 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:e5645a794a8238215b2cd77acb95a078
@SQ SN:chr16 LN:90354753 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ SN:chr17 LN:81195210 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ SN:chr18 LN:78077248 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ SN:chr19 LN:59128983 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:1aacd71f30db8e561810913e0b72636d
@SQ SN:chr20 LN:63025520 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ SN:chr21 LN:48129895 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ SN:chr22 LN:51304566 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:a718acaa6135fdca8357d5bfe94211dd
@SQ SN:chrX LN:155270560 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ SN:chrY LN:59373566 UR:file:/medpop/rnaseq/genome/hg19.genome.fa M5:1e86411d73e6f00a10590f976be01623
from picard.
Ah, it says the interval list and file.bam
dictionaries are different. My apologies, could you post the sequence dictionary for the header in file.bam
(ex. samtools view -H file.bam
).
from picard.
from picard.
Related Issues (20)
- MarkDuplicates: Instead of adding the read group of a read to the key, picard always adds "RG"
- java.lang.NegativeArraySizeException HOT 2
- genotypeConcordance irregular behavior in counting after applying filters HOT 3
- MarkDuplicates with single end data does not write the full report HOT 3
- Picard Build Failure HOT 6
- java environment not detected when using option OUTPUT in BuilBamIndex HOT 1
- Getting Segfault when running ReorderSam via GATK HOT 1
- How to control the number of threads HOT 1
- Edge case in LiftoverVcf causes cryptic string exception HOT 8
- TAGGING_POLICY=OpticalOnly tags everything HOT 4
- Issue Running Picard MarkDuplicates: File Compression Error with Snappy HOT 1
- about MarkDuplicates HOT 1
- UnMarkDuplicate HOT 1
- Running picard in parallel issue HOT 3
- `--USE_END_IN_UNPAIRED_READS` and `--USE_UNPAIRED_CLIPPED_END` options for `picard MarkDuplicates` have no effect HOT 9
- MarkDuplicates SAM validation error HOT 6
- Optical duplicates HOT 4
- ValidateSamFile wrong NM tag computation HOT 3
- CollectWgsMetrics error, out of bound of the index? HOT 8
- Lost output/information during RenameSampleInVcf HOT 11
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from picard.