
picard's Introduction

User Support:

For user questions, please look for answers first, and then ask, in the GATK forum.



A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.

Picard is implemented using the HTSJDK Java library to support accessing file formats that are commonly used for high-throughput sequencing data, such as SAM and VCF.

As of version 3.0, Picard requires Java 17.

Building Picard

  • First, clone the repo:
    git clone https://github.com/broadinstitute/picard.git
    cd picard/
  • Picard is now built using Gradle. A wrapper script (gradlew) is included, which will download the appropriate version of Gradle on the first invocation.

  • To build a fully-packaged, runnable Picard jar with all dependencies included, run:

    ./gradlew shadowJar
  • The resulting jar will be in build/libs. To run it, the command is:
    java -jar build/libs/picard.jar
    
    or
    
    java -jar build/libs/picard-<VERSION>-all.jar 
  • To build a jar containing only Picard classes (without its dependencies), run:
    ./gradlew jar
  • To clean the build directory, run:
    ./gradlew clean

Running Tests

  • To run all tests, the command is:
    ./gradlew test
  • To run a specific test, the command is:
    ./gradlew legacyTest --tests "*TestClassName*"
    
    or
    
    ./gradlew barclayTest --tests "*TestClassName*"

Running legacyTest uses the legacy commandline parser while barclayTest uses the new parser.

Changing the released version of HTSJDK that Picard depends on

To switch Picard's HTSJDK dependency to a different released version:

  • Open build.gradle
  • Edit VERSION in the following line to be a different released version of HTSJDK. HTSJDK releases are listed here
    final htsjdkVersion = System.getProperty('htsjdk.version', 'VERSION')
  • Open a pull request with this change
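
For illustration, if the chosen release were HTSJDK 3.0.5 (an arbitrary example version, not a recommendation), the edited line would read:

```groovy
final htsjdkVersion = System.getProperty('htsjdk.version', '3.0.5')
```

The second argument to System.getProperty is only a default; a -Dhtsjdk.version=... flag on the Gradle command line still overrides it.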

Building Picard with a Custom Version of HTSJDK

During development in Picard, it is sometimes necessary to build locally against an unreleased version or branch of HTSJDK.

  • To build against an unreleased version of HTSJDK's master branch:

    • Go to the Broad artifactory, where continuous snapshots of HTSJDK's master branch are published, and select the version you want to use. For example, 2.5.1-9-g5740ca1-SNAPSHOT. You can search by tag or short git commit hash.
    • In your Picard clone, run ./gradlew shadowJar -Dhtsjdk.version=VERSION, where VERSION is the version of the HTSJDK master branch snapshot you want to use.
  • To build against a version of HTSJDK that has not yet been merged into HTSJDK's master branch:

    • Clone HTSJDK, and in your clone check out the tag or branch you want to build Picard with.
    • Run ./gradlew install printVersion in your htsjdk clone to install that version to your local maven repository. Take note of the version number that gets printed at the end.
    • Switch back to your Picard clone, and run ./gradlew shadowJar -Dhtsjdk.version=VERSION, where VERSION is the version of HTSJDK you installed to your local maven repository.

Releasing Picard

Full instructions on how to create a new release of Picard are here

Path providers

Picard has limited support for reading from Path providers. Currently only Google's API is supported, and only a few tools support this. To run with this support, you need to compile the cloudJar target with Gradle:

./gradlew cloudJar

then run picard as follows:

java -jar build/libs/picardcloud.jar <Picard arguments starting from program>

For example:

java -jar build/libs/picardcloud.jar CrosscheckFingerprints \
   I=gs://sample1.vcf \
   I=gs://sample2.vcf \
   CROSSCHECK_BY=FILE \
   H=Haplotype_db.txt \
   O=crosscheck.out

Alternatively, you can run the tool via GATK, which bundles the Google Cloud jar and should thus "Just Work".


Citing

Please cite this repository when using Picard tools for your publications.

“Picard Toolkit.” 2019. Broad Institute, GitHub Repository. https://broadinstitute.github.io/picard/; Broad Institute

@misc{Picard2019toolkit,
  title = {Picard toolkit},
  year = {2019},
  publisher = {Broad Institute},
  journal = {Broad Institute, GitHub repository},
  howpublished = {\url{https://broadinstitute.github.io/picard/}}
}

Identifiers from software registries are increasingly accepted by journals, as in (biotools:picard_tools) or (RRID:SCR_006525).

Picard is migrating to semantic versioning. We will eventually adhere to it strictly and bump our major version whenever there are breaking changes to our API, but until we more clearly define what constitutes our official API, clients should assume that every release potentially contains at least minor changes to public methods.

Please see the Picard Documentation for more information.

picard's People

Contributors

akiezun, alecw, bw2, cmnbroad, cristynkells, danxmoran, droazen, eitanbanks, fleharty, gbggrant, geoffjentry, jacarey, jamesemery, jmthibault79, jrobinso, kachulis, kbergin, ktibbett, lbergelson, lindenb, magicdgs, mattsooknah, mccowan, meganshand, nh13, ramsey-watchmaker, ronlevine, tfenne, tlangs, yfarjoun


picard's Issues

Picard now has a HuBoard

I've set up a HuBoard to track Picard issues at https://huboard.com/broadinstitute/picard/

For all those issues that fall within my scope, we'll be using labels to track status and progress of issues and pull requests. For the rest, feel free to use them or not as you prefer.

Note that this has labeled everything "Backlog" by default, including anything that may be in progress. Feel free to ignore it or relabel your work as you see fit.

Missing metric definitions page

Hi,

I can't find a working copy of the metric definitions page. The link in sourceforge[1] is broken. At least it is for me using Chrome, Firefox and Safari in OS X 10.9.3. I also tried to trace the source of this page to this github repository and I couldn't find it. It sounds like it should be in 'src/www/inc', but it is not there. Also, 'git log --all -- src/www/inc/picard-metric-definitions.shtml' came out empty.

[1]http://picard.sourceforge.net/picard-metric-definitions.shtml

Thanks,
Carlos

Add .fna to the list of recognized extensions

Picard .dict file creation doesn’t recognize the FASTA file extension .fna. This needs to be added to the list of recognized extensions (check for .fasta, .fa, .fsa, or .fna).
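
The requested check is plain extension matching; a minimal shell sketch (the is_fasta helper is hypothetical, not Picard code):

```shell
# Hypothetical helper mirroring the requested list: .fasta, .fa, .fsa, or .fna
is_fasta() {
  case "$1" in
    *.fasta|*.fa|*.fsa|*.fna) return 0 ;;
    *) return 1 ;;
  esac
}

is_fasta genome.fna && echo "recognized"   # .fna would now be accepted
```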

CollectTargetedPcrMetrics concatenates overlapping intervals

Hello,
I’ve noticed that Picard’s CollectTargetedPcrMetrics will concatenate overlapping intervals in the PER_TARGET_COVERAGE output. For example:
chr3 93692457 93692581 + 1_PROS1_4
chr3 93692546 93692664 + 1_PROS1_5

becomes:
chr3 93692457 93692664 208 1_PROS1_4|1_PROS1_5

Is there a way of switching this behavior off? The reason being that I’m trying to collate other statistics to that file and this causes the rows not to match.

Thank you,
Stathis
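
For reference, the 208 in the merged row is the span of the union of the two overlapping intervals, which can be checked with the coordinates above:

```shell
# Union span of the two overlapping PROS1 intervals: max(end) - min(start) + 1
start1=93692457; end1=93692581   # 1_PROS1_4
start2=93692546; end2=93692664   # 1_PROS1_5
echo $(( end2 - start1 + 1 ))    # prints 208, matching the concatenated row
```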

Feature request: Collect* only for specified chromosomes/contigs

Apologies if this isn't the right forum to make a feature request... I couldn't find anyplace better.

It would be immensely useful to me if the picard Collect* commands (especially CollectWgsMetrics) could be run on a specific set of chroms/contigs instead of the whole genome. Yes, it is possible to make a new bam file (and reference) containing only those chroms/contigs, but that is a hell of a lot of unnecessary IO.

Implementation-wise, I was thinking of a command-line argument (let's call it CHROMS) where you give a list of chroms/contigs to process. The default (null) would process all the chroms/contigs (i.e. the whole genome).

To generate stats for each chrom/contig (which is something I suspect many people would like to do), you could run the command multiple times, specifying a single chrom/contig each time. Maybe not ideal, but it would be trivial to implement.
Alternatively, if the framework allows it, perhaps the CHROMS argument could be given multiple times, and the stats for each one would be output separately (and perhaps overall stats too).

PS: We are working on non-model species (and strains) with rather complex genomes and relatively poor assemblies. The full set of contigs doubtlessly contains cruft such as large repeats, chimeras, and quite possibly contamination. For reliable stats, we normally only want to look at a few selected contigs which are cleaner and better characterized.
Additionally, we (and some other groups) occasionally map to multiple organisms at once (something I'm calling "competitive mapping"). Obviously, looking at metrics across the whole "genome" in that case makes no sense.

Sequence dictionaries differ

I'm running:

java -Xmx4g -jar picard.jar CollectRnaSeqMetrics \
REFERENCE_SEQUENCE=hg19.genome.fa \
REF_FLAT=gencode.v19.annotation.refFlat \
RIBOSOMAL_INTERVALS=gencode.v19.rRNA.interval_list \
STRAND_SPECIFICITY=NONE \
INPUT=file.bam \
ASSUME_SORTED=true \
OUTPUT=file.rnaseq_metrics \
CHART_OUTPUT=file.rnaseq.pdf

I get this error:

Exception in thread "main" picard.PicardException: Sequence dictionaries differ in file.bam and gencode.v19.rRNA.interval_list

However, I copied the sequence dictionary from the BAM file. So, the sequence dictionaries are identical.

Here's the sequence dictionary along with the first 5 lines of the ribosomal intervals file:

@HD     VN:1.4  SO:coordinate
@SQ     SN:chrM LN:16571
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566
chr1    9497728 9497837 -   ENST00000517147.1
chr1    13949679    13949779    -   ENST00000411020.1
chr1    34578550    34578664    +   ENST00000364278.1
chr1    37730278    37730387    -   ENST00000516559.1
chr1    39619836    39619968    -   ENST00000410446.1

Update the home page

Picard's source code is now hosted on GitHub, but the old Subversion repository is still referenced on http://picard.sourceforge.net/.

Also, it may be a good idea to either deactivate the subversion repository or to remove all files within it and commit a single README file that points to the new Git repository. Makes it easier for people who just run svn update regularly.

CompareSAMs sometimes reports a file as being different from itself

Suppose you have a coordinate-sorted file foo.sam with multiple entries having the same read name and start coordinate (for whatever reason). Calling CompareSAMs foo.sam foo.sam will tell you that foo.sam differs from itself!

This is because the code currently assumes that we only have one entry for each read name at a given coordinate. Moreover, it traverses the "left" and "right" files asymmetrically, so that duplicate entries will get respected in one file but not the other.

Picard markduplicates and supplementary alignments

From: https://sourceforge.net/p/samtools/mailman/message/32800811/

Hi,

I am having the "Value was put into PairInfoMap more than once" error from MarkDuplicates, and it seems to be caused by chimeric reads/supplementary alignments produced by bwa mem:

Error message:
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 3: 1:HW-ST997:617:HADL0ADXX:1:1114:15556:79712

Reads causing problem:
HW-ST997:617:HADL0ADXX:1:1114:15556:79712 161 chr10 124174955 0 45S56M chr13 91710177 0 TATCTATATAAAAATACATACACAGACAGACAGACAGAAAGAGAGACAGACAGACAGACAGACAGACAGACAGACGGACGGACGGACGGACGGACGGACGG CCCFFFFFHHHHHJIJJJJJJJJJJJJJJJJJIIJJIIJJIIJIJJJJIIJJJJJIIJJIJJIIIIJHHFFHFFFDDDDBBDDDDBDDDDDDDDDDDDDDB NM:i:1 AS:i:51 XS:i:51 SA:Z:chr13,91710177,-,43S31M27S,0,0; MQ:i:0 RG:Z:1
HW-ST997:617:HADL0ADXX:1:1114:15556:79712 81 chr10 124174955 0 38S63M chr13 91710177 0 ATAAAAATACATACACAGACAGACAGACAGAAAGAGAGACAGACAGACAGACAGACAGACAGACAGACGGACGGACGGACGGACGGACGGACGGACGGACG CDDEEDDDDDEEDDDDDDDDDDDDDDEDDDCDCDDECCEDEEFFFFFFFHHHHHHFJJIGIIJJJJJIGIGJJJIIJIFEGEHGJJJJGHHHHDDFFFCCC NM:i:3 AS:i:52 XS:i:51 SA:Z:chr13,91710177,+,50S31M20S,0,0; MQ:i:0 RG:Z:1
HW-ST997:617:HADL0ADXX:1:1114:15556:79712 2145 chr13 91710177 0 50H31M20H chr10 124174955 0 TCTGTCTGTCTGTCTCTCTTTCTGTCTGTCT HHFFFFFFFEEDECCEDDCDCDDDEDDDDDD NM:i:0 AS:i:31 XS:i:29 SA:Z:chr10,124174955,-,38S63M,0,3; MQ:i:0 RG:Z:1
HW-ST997:617:HADL0ADXX:1:1114:15556:79712 2193 chr13 91710177 0 43H31M27H chr10 124174955 0 TCTGTCTGTCTGTCTCTCTTTCTGTCTGTCT JIIJJJJJIIJJJJIJIIJJIIJJIIJJJJJ NM:i:0 AS:i:31 XS:i:29 SA:Z:chr10,124174955,+,45S56M,0,1; MQ:i:0 RG:Z:1

The version of Picard I am using is 1.119. I think Picard has supported the 0x800 flag/supplementary alignments since version 1.96, but perhaps not all modules handle those reads correctly?

Is there a work-around of this issue at the moment?

Many thanks,
Ni
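
For what it's worth, the flags in the four records above confirm that the last two are supplementary alignments: the 0x800 bit (decimal 2048) is set in 2145 and 2193 but not in 161 or 81. A quick shell check:

```shell
# Test the supplementary-alignment bit (0x800 = 2048) of each SAM flag
for flag in 161 81 2145 2193; do
  if (( flag & 2048 )); then
    echo "$flag: supplementary"
  else
    echo "$flag: not supplementary"
  fi
done
```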

Minor clone-hts issue

This is probably an easy issue to fix, but it did not occur as of last week. Below is what I am experiencing now:

$ git clone https://github.com/broadinstitute/picard
Initialized empty Git repository in /home/pgrosu/me/gg_gatk/gatk-tools-java/picard-tst/picard/.git/
remote: Counting objects: 34471, done.
remote: Compressing objects: 100% (804/804), done.
remote: Total 34471 (delta 709), reused 0 (delta 0), pack-reused 33662
Receiving objects: 100% (34471/34471), 38.01 MiB | 17.22 MiB/s, done.
Resolving deltas: 100% (19021/19021), done.
$
$ cd picard/
$ ant -lib lib/ant clone-htsjdk package-commands
Buildfile: /home/pgrosu/me/gg_gatk/gatk-tools-java/picard-tst/picard/build.xml

clone-htsjdk:
     [exec] Initialized empty Git repository in /home/pgrosu/me/gg_gatk/gatk-tools-java/picard-tst/picard/htsjdk/.git/
     [exec] Permission denied (publickey).
     [exec] fatal: The remote end hung up unexpectedly

BUILD FAILED
/home/pgrosu/me/gg_gatk/gatk-tools-java/picard-tst/picard/build.xml:129: exec returned: 128

Total time: 0 seconds
$

I'm on a cluster where I cannot change the operating system settings. I get a feeling that it probably has something to do with the git configuration. Any recommendations would be appreciated :)

Thanks,
Paul

Make HTSJDK project a proper dependency

The dependency between Picard and HTSJDK needs to be improved.

$ git clone https://github.com/broadinstitute/picard.git
$ ant
Buildfile: /Users/xxx/working/picard/build.xml

set-htsjdk-version:

BUILD FAILED
/Users/xxx/working/picard/build.xml:108: Basedir /Users/xxx/working/picard/htsjdk does not exist

Total time: 0 seconds

There are many mechanisms for dependency resolution in java projects (e.g. Maven, Ivy, Gradle, etc.).

After SortSam, file can't be validated with ValidateSam or processed with MarkDuplicates

Hi,

I have some DNA-Seq data for SNP calling. I did bwa mem (or aln + sampe) alignment, but after I sorted the alignment BAM file using SortSam (or samtools sort), the sorted file can't be validated with ValidateSam or further processed with MarkDuplicates. The same error was thrown:
Exception in thread "main" java.lang.NoClassDefFoundError: java/lang/ref/Finalizer$2
at java.lang.ref.Finalizer.runFinalization(Finalizer.java:144)
at java.lang.Runtime.runFinalization0(Native Method)
at java.lang.Runtime.runFinalization(Runtime.java:705)
at java.lang.System.runFinalization(System.java:967)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:58)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
at picard.sam.DiskReadEndsMap.put(DiskReadEndsMap.java:67)
at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:449)
at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)

Does anyone know how to get around this error? Thanks!

CollectAlignmentSummaryMetrics: incorrect use of adapter sequence parameters?

See: http://sourceforge.net/p/samtools/mailman/message/33435794/

So this may be more challenging than initially thought. I am working off the following branch: https://github.com/broadinstitute/picard/tree/nh_mark_duplicates_with_low_q_end
Dear all,

After various unsuccessful attempts, and browsing the archive (partially fixed my problem), I am afraid I have to post here for help fixing my issue.

I am trying to use CollectAlignmentSummaryMetrics for bisulfite libraries. My problem is that adapters are not detected by Picard while I can clearly find 3.5% of the first 1,000,000 read pairs with the first mate containing the reverse complement of the reverse strand adapter (i.e. perfect match to the adapter sequence detected by grep command). Maybe I am using the ADAPTER_SEQUENCE of Picard wrong?

Similarly to the post https://sourceforge.net/p/samtools/mailman/message/32771613/ I was getting a lot of zeros when the reference genome sequence was not specified. However, giving a reference genome did not improve the detection of adapter sequences. Is there anything I am missing to detect the adapter sequences in my command line below?

Here is the command I used (when in doubt, I gave the two adapter sequences plus the two reverse-complement sequences, but apparently none are detected):
java -jar /usr/local/src/picard-tools-1.128/picard.jar CollectAlignmentSummaryMetrics INPUT=/workspace/scratch/krue/Methylation/bwa_8lanes/C12.bam OUTPUT=/workspace/scratch/krue/Methylation/bwa_8lanes/C12.test_picard_summary.txt ADAPTER_SEQUENCE=null ADAPTER_SEQUENCE=GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ADAPTER_SEQUENCE=TACACTCTTTCCCTACACGACGCTCTTCCGATCT ADAPTER_SEQUENCE=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ADAPTER_SEQUENCE=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA IS_BISULFITE_SEQUENCED=true REFERENCE_SEQUENCE=/workspace/storage/genomes/bostaurus/UMD3.1.75/source_file/Bos_taurus.UMD3.1.75.dna.toplevel.fa STOP_AFTER=100000

When I tested the SAM file, PicardValidateSamFile returned only one error:
ERROR: Read name C12_TAGCTT_L002_R_001, A platform (PL) attribute was not found for read group

Many thanks in advance, and I hope I didn't miss an embarrassingly obvious explanation somewhere

MEAN_TARGET_COVERAGE underestimated in v1.124

When running CalculateHsMetrics using the same interval file for baits and targets, I am getting two different numbers for MEAN_TARGET_COVERAGE and MEAN_BAIT_COVERAGE when coverage levels are very high (> 30,000x). When compared to coverage calculated using samtools mpileup, I see coverage levels more similar to MEAN_BAIT_COVERAGE.

Any idea why this might be occurring?

Some example numbers are below:

Sample MEAN_TARGET_COVERAGE MEAN_BAIT_COVERAGE Calculated from samtools (>Q30)
A 28775 42528 38479
B 29396 59039 52709
C 30117 68338 61293

SAMFileReader InputStream

Hi Picard developers,

I wonder if it is possible to expose an InputStream from the SAMFileReader so that I can just do:

InputStream inputStream = samFileReader.queryOverlapping(queryIntervals).getInputStream();
IOUtils.copy(inputStream, outputstream);

My experience with the SAMRecordIterator (returned by queryOverlapping) is not too satisfactory. I can only get at most 1 MB/s from a remote server. If I download the whole file from the remote server, I can get 10 MB/s.

I was thinking that keeping everything in binary and making use of streams where possible might improve performance.

I'm open to any advice on how I can get better performance.

Thank you!

Jerry

new site doesn't show command usage

The include directives in the file _includes/command-line-usage.html use server-side include syntax, which I suppose was carried over from the SourceForge site.

<!--#include virtual="AddCommentsToBam.html" -->

To make it work with Jekyll, you need to change it to look like this:

{% include AddCommentsToBam.html %}

picard CalculateHsMetrics not working with me

Please help with this error!
my lists start with the matchong .dict content followed by bed rows from SeqCapEZ_Exome_v3

java -jar /opt/biotools/picard/picard.jar CalculateHsMetrics BI=/opt/biodata/SeqCapEZ_Exome_v3/bait.list TI=/opt/biodata/SeqCapEZ_Exome_v3/target.list I=Patient1_results/Patient1_gatk.bam R=/opt/biodata/reference/human/GRCh37.73.fa O=test
[Mon Dec 22 22:11:52 CET 2014] picard.analysis.directed.CalculateHsMetrics BAIT_INTERVALS=[/opt/biodata/SeqCapEZ_Exome_v3/bait.list] TARGET_INTERVALS=[/opt/biodata/SeqCapEZ_Exome_v3/target.list] INPUT=Patient1_results/Patient1_gatk.bam OUTPUT=test REFERENCE_SEQUENCE=/opt/biodata/reference/human/GRCh37.73.fa METRIC_ACCUMULATION_LEVEL=[ALL_READS] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Mon Dec 22 22:11:52 CET 2014] Executing as splaisan@r710bits on Linux 3.10.0-123.13.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_71-mockbuild_2014_10_03_09_36-b00; Picard version: 1.126(4691ee611ac205d4afe2a1b7a2ea975a6f997426_1417447214) IntelDeflater
[Mon Dec 22 22:11:53 CET 2014] picard.analysis.directed.CalculateHsMetrics done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1515716608
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Invalid interval record contains 1 fields: track name=target_region description="Target Regions"
at htsjdk.samtools.util.IntervalList.fromReader(IntervalList.java:367)
at htsjdk.samtools.util.IntervalList.fromFile(IntervalList.java:293)
at htsjdk.samtools.util.IntervalList.fromFiles(IntervalList.java:322)
at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:87)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:89)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:99)

extract one chromosome from a BAM

Hi,
Is there a picard equivalent to
samtools view -h $bam chrN:from-to
Apparently the (Broad) Picard ViewSam command does not handle the region arguments commonly used with samtools view.

thanks!

PERCENT_DUPLICATION Calculation Bug

The METRICS_FILE generated by MarkDuplicates does not show PERCENT_DUPLICATION as expected. It is in fact a fraction, not a percentage; i.e., it is two orders of magnitude smaller than the column name suggests.
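
As a concrete illustration (0.0432 is a made-up value): a library with 4.32% duplicates appears as 0.0432 in that column, so the reported value must be multiplied by 100 to get an actual percentage:

```shell
# Convert the reported fraction to a percentage (the input value is hypothetical)
echo "0.0432" | awk '{ printf "%.2f%%\n", $1 * 100 }'   # prints 4.32%
```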

FixMateInformation crash

Hi,
There seems to be a problem with FixMateInformation crashing with:

Exception in thread "main" java.lang.NullPointerException
at htsjdk.samtools.SamPairUtil.setMateInformationOnSupplementalAlignment(SamPairUtil.java:300)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.advance(SamPairUtil.java:442)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.next(SamPairUtil.java:454)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.next(SamPairUtil.java:360)
at picard.sam.FixMateInformation.doWork(FixMateInformation.java:194)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:185)
at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:125)
at picard.sam.FixMateInformation.main(FixMateInformation.java:93)

The problem first appeared in version 1.121 and is present in version 1.128. Versions up to 1.120 worked and continue to work fine. I am currently using Java 1.7.0_75, but I observed the same problem with earlier versions of Java. The problem occurs under several different versions of Fedora.

The command lines I am using are:

java -jar picard-1.128/picard.jar FixMateInformation INPUT=test.bam OUTPUT=fixed.bam VALIDATION_STRINGENCY=SILENT
(fails)

java -jar picard-1.121/FixMateInformation.jar INPUT=test.bam OUTPUT=fixed.bam VALIDATION_STRINGENCY=SILENT
(fails)

java -jar picard-1.120/FixMateInformation.jar INPUT=test.bam OUTPUT=fixed.bam VALIDATION_STRINGENCY=SILENT
(succeeds)

I have observed the problem with various BAM files. This one is (a small subset of) the output of an indel realignment with GATK.

Best regards,

Bernt

MarkDuplicates question

User is confused about exactly what criteria are used by MarkDuplicates. GATK docs say it marks as dupes reads that have the same start pos and identical CIGAR, but it looks like Picard docs say nothing about CIGAR, and just mentions length. We need to reconcile this (I expect the GATK doc needs to be corrected).

A secondary question is how it chooses which read is kept as the non-duplicate: at random, or the one with the highest MAPQ?

This Issue was generated from your forums

Negative values in "reads processed" count for BuildBamIndex

Hi, not sure if this is a bug, but it seems odd to me and I wanted to know if there is something wrong with my BAM file. I noticed when indexing a large BAM file (341GB) that after about 2.1 billion reads, the numbers in the tally switch to negative, and then the tally continues in a positive direction towards zero.

Here's a snippet of the screen output:

INFO    2015-05-04 18:29:46 BuildBamIndex   2138000000 reads processed ...
INFO    2015-05-04 18:29:51 BuildBamIndex   2139000000 reads processed ...
INFO    2015-05-04 18:29:55 BuildBamIndex   2140000000 reads processed ...
INFO    2015-05-04 18:30:00 BuildBamIndex   2141000000 reads processed ...
INFO    2015-05-04 18:30:05 BuildBamIndex   2142000000 reads processed ...
INFO    2015-05-04 18:30:09 BuildBamIndex   2143000000 reads processed ...
INFO    2015-05-04 18:30:14 BuildBamIndex   2144000000 reads processed ...
INFO    2015-05-04 18:30:19 BuildBamIndex   2145000000 reads processed ...
INFO    2015-05-04 18:30:23 BuildBamIndex   2146000000 reads processed ...
INFO    2015-05-04 18:30:28 BuildBamIndex   2147000000 reads processed ...
INFO    2015-05-04 18:30:32 BuildBamIndex   -2147000000 reads processed ...
INFO    2015-05-04 18:30:37 BuildBamIndex   -2146000000 reads processed ...
INFO    2015-05-04 18:30:43 BuildBamIndex   -2145000000 reads processed ...
INFO    2015-05-04 18:30:48 BuildBamIndex   -2144000000 reads processed ...
INFO    2015-05-04 18:30:53 BuildBamIndex   -2143000000 reads processed ...
INFO    2015-05-04 18:30:58 BuildBamIndex   -2142000000 reads processed ...
INFO    2015-05-04 18:31:03 BuildBamIndex   -2141000000 reads processed ...
INFO    2015-05-04 18:31:08 BuildBamIndex   -2140000000 reads processed ...
INFO    2015-05-04 18:31:13 BuildBamIndex   -2139000000 reads processed ...
INFO    2015-05-04 18:31:18 BuildBamIndex   -2138000000 reads processed ...
INFO    2015-05-04 18:31:23 BuildBamIndex   -2137000000 reads processed ...

Everything ran fine at the end (except for this little detail) -- i.e., no error messages:

INFO    2015-05-04 18:44:08 BuildBamIndex   -1982000000 reads processed ...
INFO    2015-05-04 18:44:11 BuildBamIndex   -1981000000 reads processed ...
INFO    2015-05-04 18:44:14 BuildBamIndex   -1980000000 reads processed ...
INFO    2015-05-04 18:44:17 BuildBamIndex   Successfully wrote bam index file /hpcdata/pid/aws/bam_files/new_files2/849NIH.bai
[Mon May 04 18:44:17 EDT 2015] net.sf.picard.sam.BuildBamIndex done. Elapsed time: 291.94 minutes.
Runtime.totalMemory()=257097728

At first glance, it appears that the index is okay. I.e., when I run samtools idxstats, I see for chr22:

22  51304566    29566913    142722

And when I run samtools view, I get the sum of the last two columns from above (mapped + unmapped):

samtools view 849NIH.bam 22 | wc
29709635

Do you think this output from BuildBamIndex is normal? Or, should I be worried about the BAM file and/or the index file?

Let me know if you need any more information.

Thanks,

Andrew
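
The sign flip in the log is consistent with the progress counter being a signed 32-bit integer, which wraps after 2^31 - 1 = 2,147,483,647 reads; by itself it does not indicate a corrupt BAM or index. A wrapped count displays as the true count minus 2^32:

```shell
# The 2,147,967,296th read, as displayed by a signed 32-bit counter
true_count=2147967296
echo $(( true_count - 4294967296 ))   # prints -2147000000, matching the log
```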

Where do I find CreateSequenceDictionary.jar?

Hi,

I've installed picard and have picard.jar in my dist/ directory. However, I'm trying to find CreateSequenceDictionary.jar and cannot locate it.

I can find CreateSequenceDictionary.java in src/java/picard/sam but not the .jar file. Am I supposed to manually compress it myself? Or are there additional installation procedures for creating the .jar file?

Finally, I'd like to express my frustration for how ridiculously little information I've been able to find in the documentation. I'm really unfamiliar with Java, and I'm not sure why it's assumed that we'd be familiar with it when working with Picard.

Thanks.

Duplicates missed when mate has low mapping quality

When a paired-end read has one mate with good mapping quality and the other with low (e.g. 0) mapping quality, the low-quality mate may be placed in several locations, chosen randomly by the aligner (even with the same aligner). This means that a duplicate of such a fragment will incorrectly not be marked as a duplicate if the two versions of the low-quality mate are aligned differently (note that this has nothing to do with secondary or supplementary alignments).

Perhaps the solution could be as simple as adding another condition to line 283 in MarkDuplicates.java (and perhaps a similar condition in MarkDuplicatesWithCigar?) that verifies that the mate is well mapped. I suspect that MQ != 0 would suffice, though it depends on the aligner... so perhaps it could be an @Option.

building picard against existing htsjdk

I'm packaging htsjdk and picard as separate libraries for GNU Guix. In the build environment it is neither possible nor desirable to download dependencies during the build process. It would be preferable to build picard against an existing installation of htsjdk.

Is it possible to add an option to build.xml that would allow one to not compile htsjdk when building picard but to specify the path to already built htsjdk libraries?

A similar issue has previously been raised here.

PicardCommandLine USAGE listing of commands

The listing of available commands is useful and very nice. Here are some minor issues and potential improvements:

  • It would be good if redirecting the listing to a file suppressed the colourisation.

    (As a data point, I think bedtools eventually gave up colourising their usage listing entirely as being more trouble than it was worth. It would also be good to get the terminal control strings via termcap or terminfo, but I'm sure that's no fun in Java...)

  • There should be a way (a --list-commands option or similar) to get a listing of just the names of available commands without any formatting or descriptions. This would be useful for writing shell completion scripts.

  • It would be nice if the listing could be formatted to at least somewhat fit on an 80-column terminal. At present, even the horizontal rules are a little too wide:

USAGE: PicardCommandLine <program name> [-h]

Available Programs:
--------------------------------------------------------------------------------
------
Fasta:                                           Tools for manipulating FASTA, o
r related data.
    CreateSequenceDictionary                     Creates a SAM or BAM file from 
reference sequence in fasta format
    ExtractSequences                             Extracts intervals from a refer
ence sequence, writing them to a FASTA file
    NormalizeFasta                               Normalizes lines of sequence in
 a fasta file to be of the same length

SamToFastq: add option ReadGroup tag

By default (hardcoded), when using "OUTPUT_PER_RG" the output will be split on PU. If PU is not present in the read group, it will split on ID. This does not work for us: we would like to split on ID, but also have a PU tag in our read groups.

Is it possible to add an option, for example RG_TAG=ID or RG_TAG=PU, to select which tag is used to split the fastq files?
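The requested behaviour could be sketched as follows (RG_TAG is the hypothetical option proposed in this issue, not an existing SamToFastq parameter): today the split token is effectively PU-with-ID-fallback, whereas an RG_TAG option would select the tag explicitly:

```java
import java.util.Map;

// Sketch of read-group token selection for per-read-group FASTQ splitting.
public class RgSplitSketch {
    /** Current hardcoded behaviour: use PU if present, otherwise fall back to ID. */
    public static String currentToken(Map<String, String> rg) {
        String pu = rg.get("PU");
        return pu != null ? pu : rg.get("ID");
    }

    /** Proposed behaviour: a hypothetical RG_TAG option selects the tag directly. */
    public static String proposedToken(Map<String, String> rg, String rgTag) {
        return rg.get(rgTag);
    }

    public static void main(String[] args) {
        Map<String, String> rg = Map.of("ID", "sample1", "PU", "flowcell.lane1");
        System.out.println(currentToken(rg));        // splits on PU when PU exists
        System.out.println(proposedToken(rg, "ID")); // splits on ID as requested
    }
}
```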

Unit tests fail due to changes in HashMap iteration order

Java 8 introduced changes that can affect HashSet/HashMap iteration order:
http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html

Some unit tests that compare files will break because of this. For example, the write order of read group header params may change:

   [testng] FAILED: testNonBarcoded
   [testng] htsjdk.samtools.SAMException: Files /var/folders/c_/gb26gz6s6rngcxblwp1sb31h0000gp/T/nonBarcoded.5177585563167724806.sam and testdata/picard/illumina/25T8B25T/sams/nonBarcoded.sam differ.
   [testng]     at htsjdk.samtools.util.IOUtil.assertFilesEqual(IOUtil.java:404)
   [testng]     at picard.illumina.IlluminaBasecallsToSamTest.testNonBarcoded(IlluminaBasecallsToSamTest.java:80)
   [testng] ... Removed 21 stack frames
   [testng] FAILED: testMultiplexedWithAlternateBarcodeName
   [testng] htsjdk.samtools.SAMException: Files /var/folders/c_/gb26gz6s6rngcxblwp1sb31h0000gp/T/singleBarcodeAltName.4683165887051716028.dir/AAAAAAAA.sam and testdata/picard/illumina/25T8B25T/sams/AAAAAAAA.sam differ.
   [testng]     at htsjdk.samtools.util.IOUtil.assertFilesEqual(IOUtil.java:404)
   [testng]     at picard.illumina.IlluminaBasecallsToSamTest.runStandardTest(IlluminaBasecallsToSamTest.java:165)
   [testng]     at picard.illumina.IlluminaBasecallsToSamTest.testMultiplexedWithAlternateBarcodeName(IlluminaBasecallsToSamTest.java:91)
   [testng] ... Removed 21 stack frames
   [testng] FAILED: testDualBarcodes
   [testng] htsjdk.samtools.SAMException: Files /var/folders/c_/gb26gz6s6rngcxblwp1sb31h0000gp/T/dualBarcode.2258278280539031805.dir/AACTTGAC.sam and testdata/picard/illumina/25T8B8B25T/sams/AACTTGAC.sam differ.
   [testng]     at htsjdk.samtools.util.IOUtil.assertFilesEqual(IOUtil.java:404)
   [testng]     at picard.illumina.IlluminaBasecallsToSamTest.runStandardTest(IlluminaBasecallsToSamTest.java:165)
   [testng]     at picard.illumina.IlluminaBasecallsToSamTest.testDualBarcodes(IlluminaBasecallsToSamTest.java:96)
   [testng] ... Removed 21 stack frames
   [testng] FAILED: testMultiplexed
   [testng] htsjdk.samtools.SAMException: Files /var/folders/c_/gb26gz6s6rngcxblwp1sb31h0000gp/T/multiplexedBarcode.3415989947968785924.dir/AAAAAAAA.sam and testdata/picard/illumina/25T8B25T/sams/AAAAAAAA.sam differ.
   [testng]     at htsjdk.samtools.util.IOUtil.assertFilesEqual(IOUtil.java:404)
   [testng]     at picard.illumina.IlluminaBasecallsToSamTest.runStandardTest(IlluminaBasecallsToSamTest.java:165)
   [testng]     at picard.illumina.IlluminaBasecallsToSamTest.testMultiplexed(IlluminaBasecallsToSamTest.java:85)
   [testng] ... Removed 21 stack frames
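A common fix for this class of test failure (a sketch of the general technique, not a patch to Picard's actual serialisation code) is to stop depending on HashMap iteration order when writing output, for example by iterating keys in sorted order:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: make attribute serialisation independent of HashMap iteration order.
public class OrderSketch {
    /** Serialise attributes in sorted-key order so output is stable across JDK versions. */
    public static String serialize(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(attrs).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rg = new HashMap<>();
        rg.put("SM", "sample1");
        rg.put("ID", "rg1");
        rg.put("PL", "illumina");
        // Sorted-key output is identical no matter how HashMap orders its buckets.
        System.out.println(serialize(rg)); // prints "ID=rg1;PL=illumina;SM=sample1;"
    }
}
```

A LinkedHashMap with a defined insertion order would work equally well; the point is that the expected-output files then stay valid across JDK upgrades.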

CollectQualityYieldMetrics has problems on RNASeq BAM file from tophat(bowtie1)

I have 12 RNASeq bam files. On 11 of them, CollectQualityYieldMetrics aborts (without printing stats) after it has seemingly finished processing the reads on the last chromosome.

Runtime.totalMemory()=1665204224
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 78218867, Read name R0251321:126:H7NGRADXX:2:1101:1537:1996, Mapped mate should have mate reference name
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:439)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:643)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:515)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:489)
at picard.analysis.CollectQualityYieldMetrics.doWork(CollectQualityYieldMetrics.java:94)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:89)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:99)

CollectInsertSizeMetrics infers incorrect PairOrientation

CollectInsertSizeMetrics incorrectly infers the PairOrientation (RF; it should be FR) of a library when the average fragment size is less than the read length, local alignment is used, and adapter sequences are not removed. The majority of reads map to the same position, with a significant number mapping in what appears to be the RF orientation due to reference and adapter microhomology at the aligned position.

The pair orientation returned by CollectInsertSizeMetrics should reflect the FR orientation of the library prep and sequencing.

Release picard to Maven Central repository

Please release picard and its upstream dependencies to the Maven Central repository.

Currently downstream projects (e.g. ADAM, the Google Genomics API, etc.) depend on a rather out-of-date, unofficial build of picard distributed by the UTGB (University of Tokyo Genome Browser) Toolkit project:
http://search.maven.org/#search|ga|1|picard

Sonatype OSSRH (Open Source Software Repository Hosting) provides a service to sync to Maven Central
http://central.sonatype.org/pages/ossrh-guide.html

MAX_OUTPUT='null' kills ValidateSamFile

$ java -Xmx4096m -jar "$PICARD_HOME/ValidateSamFile.jar" INPUT="$bam" OUTPUT="${base}.ValidateSamFile.out" MAX_OUTPUT='null'

stderr:
[Thu Apr 30 12:27:00 EDT 2015] Executing as goldba06@interactive5 on Linux 2.6.32-358.23.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_03-b04; Picard version: 1.112(1930) IntelDeflater
[Thu Apr 30 12:27:00 EDT 2015] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1009909760
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" java.lang.NullPointerException
at net.sf.picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:148)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:179)
at net.sf.picard.sam.ValidateSamFile.main(ValidateSamFile.java:101)

This happens even though MAX_OUTPUT is documented in http://broadinstitute.github.io/picard/command-line-overview.html
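The crash could be avoided by treating a missing MAX_OUTPUT value as "unlimited" before it is dereferenced. A defensive sketch (the helper below is hypothetical, not the actual ValidateSamFile code):

```java
// Sketch: guard against a null MAX_OUTPUT instead of throwing NullPointerException.
public class MaxOutputSketch {
    /** Interpret a null MAX_OUTPUT value as "no limit on reported errors". */
    public static int effectiveMaxOutput(Integer maxOutput) {
        return maxOutput == null ? Integer.MAX_VALUE : maxOutput;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxOutput(100));  // prints "100"
        System.out.println(effectiveMaxOutput(null)); // prints "2147483647"
    }
}
```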
