Comments (8)
Turns out there are some tests for this case already (and they seem to pass):
https://github.com/broadinstitute/picard/blob/master/testdata/picard/util/BedToIntervalListTest/zero_base_interval.bed
https://github.com/broadinstitute/picard/blob/master/testdata/picard/util/BedToIntervalListTest/zero_base_interval.bed.interval_list
https://github.com/broadinstitute/picard/blob/master/testdata/picard/util/BedToIntervalListTest/zero_length_interval_at_first_position_in_contig.bed
https://github.com/broadinstitute/picard/blob/master/testdata/picard/util/BedToIntervalListTest/zero_length_interval_at_first_position_in_contig.bed.interval_list
@rickymagner Do you believe it would be better (and more correct in a sense) if the output interval lists for the test cases above were empty instead (besides the header)?
from picard.
Interval lists should be able to represent empty intervals, but since they use 1-based inclusive coordinates, as opposed to 0-based half-open coordinates (like BED), the conversion needs a coordinate change on "start" (but not "end"). The result isn't illegal...
Which tools fail on empty intervals in an interval list?
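To make the coordinate change concrete, here is a minimal sketch of the conversion described above. The function names are hypothetical and this is not Picard's actual code; it only illustrates why a zero-length BED interval ends up with an end one less than its start in 1-based inclusive coordinates.

```python
from typing import Tuple


def bed_to_interval_list(start: int, end: int) -> Tuple[int, int]:
    """Convert 0-based half-open BED coordinates to 1-based inclusive ones.

    Hypothetical helper for illustration, not a Picard API.
    """
    # Only the start shifts; the half-open end already equals the inclusive end.
    return start + 1, end


def inclusive_length(start: int, end: int) -> int:
    """Number of bases covered by a 1-based inclusive interval."""
    return end - start + 1


# An ordinary BED interval covering the first 10 bases of a contig.
print(bed_to_interval_list(0, 10))     # (1, 10)
# A zero-length BED interval at offset 100: the end is now less than the start.
print(bed_to_interval_list(100, 100))  # (101, 100)
print(inclusive_length(101, 100))      # 0
```

The (101, 100) result is a consistent encoding of an empty interval, but any consumer that requires end >= start will reject it.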
Perhaps the problem isn't this tool, then, but how GATK uses the -L flag. The example that I saw this failing on was running gatk CountVariants -V example.vcf -L test.interval_list, but it also seems to fail on test.bed. The error is:
A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 100 is less than start 101 in contig chr1
Or maybe it's just localized to CountVariants? If we're confident most tools should be able to handle this edge case properly, I'd be OK treating this as a CountVariants bug rather than a BedToIntervalList bug.
GATK interval processing will reject empty intervals by design.
The last time we looked into this, all the empty intervals we found in bed files were errors in the bed file, not meaningful data. I know there's an ongoing "but what about insertions!" argument, but I thought we had agreed that insertions were going to be represented like in vcf.
If you use empty intervals, htsjdk will process them inconsistently and your results will be wrong, because the math for empty intervals is inconsistent and no one has ever cared enough to fix it.
@yfarjoun Is there a new use case that actually needs empty intervals?
See samtools/htsjdk#1320 for a list of ongoing issues.
Let's add a --remove-zero-length flag to these tools.
Consensus is that we should just add a --keep-zero-length-intervals flag to the tool, and skip them by default.
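As a rough illustration of the agreed behavior (skip zero-length intervals unless asked to keep them), the sketch below filters BED-style records. The function name, the record layout, and the keep_zero_length parameter are assumptions made for this example; the actual implementation lives in the Picard change referenced in this thread and may differ.

```python
from typing import Iterable, Iterator, Tuple

# (contig, 0-based start, half-open end), as in a BED file.
BedRecord = Tuple[str, int, int]


def filter_zero_length(
    records: Iterable[BedRecord], keep_zero_length: bool = False
) -> Iterator[BedRecord]:
    """Yield BED records, dropping zero-length ones unless keep_zero_length is set.

    Hypothetical helper mirroring the proposed default; not Picard's code.
    """
    for contig, start, end in records:
        # In 0-based half-open coordinates, start == end means an empty interval.
        if start == end and not keep_zero_length:
            continue
        yield contig, start, end


records = [("chr1", 0, 10), ("chr1", 100, 100), ("chr2", 5, 6)]
print(list(filter_zero_length(records)))
print(list(filter_zero_length(records, keep_zero_length=True)))
```

With the default, the chr1 100-100 record is dropped, so downstream tools that reject intervals with end less than start never see it.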
This discussion should now be resolved by #1928.