Comments (11)
A good question is whether Hadoop-BAM/Picard handle that conversion for us. @AndreSchumacher, would you know? I believe that the SAMRecord type in Picard does that conversion (to the SAM 1-based coordinate system).
We use a 0-based coordinate system in ADAM, and yes, we should document this. A clear place to add the documentation is in the schema — I'll add it there shortly.
from adam.
This is related to the question that Uri was bringing up in email too, about "slice-based" coordinates (which I've seen called "gap-based" coordinates, in opposition to the "residue-based" coordinates that are both the 0- and 1-based schemes). Uri is right that the gap-based coordinates make it easier to indicate insertions and deletions -- you need a convention (such as "the coordinate of the insertion is the location of the last reference base prior to the insertion," etc.) to make it clear what you mean when you use residue-based coordinates. On the other hand, point locations are more annoying to specify when using gap-based coordinates; everything becomes a range, and classes like our ReferencePosition really would need to be modified.
Anyway, coordinate systems are one of those things that lots of bioinformatics people have opinions about, but it's all convention -- what matters is that we specify what choice we have made loud and clear, and make sure that all coordinate-based code sticks to it.
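To make the distinction concrete, here is a minimal Python sketch of the two conventions. The `GapInterval` type and the helper names are hypothetical, chosen only for illustration; they are not part of ADAM, and this is not a proposal for how `ReferencePosition` should change. The sketch assumes gap k sits immediately before the 0-based residue k.

```python
from typing import NamedTuple


class GapInterval(NamedTuple):
    """Hypothetical gap-based interval: gap k sits just before 0-based residue k."""
    start: int
    end: int


def residue_to_gaps(pos0: int) -> GapInterval:
    # A single residue is no longer a point: it spans the gaps on either side of it.
    return GapInterval(pos0, pos0 + 1)


def insertion_after_residue(pos0: int) -> GapInterval:
    # An insertion following residue pos0 is a zero-length interval at one gap,
    # so no "last reference base before the insertion" convention is needed.
    return GapInterval(pos0 + 1, pos0 + 1)


def deletion_of_residues(first0: int, count: int) -> GapInterval:
    # A deletion of `count` residues starting at residue first0.
    return GapInterval(first0, first0 + count)


reference = "ACGTAC"

point = residue_to_gaps(2)
assert reference[point.start:point.end] == "G"

ins = insertion_after_residue(2)
assert reference[ins.start:ins.end] == ""  # covers no reference bases

dele = deletion_of_residues(1, 3)
assert reference[dele.start:dele.end] == "CGT"
```

Note how the point lookup has to go through a range, which is the inconvenience mentioned above, while the insertion needs no extra convention.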
Perhaps someone can propose a convention for us for Wednesday? We can then make whatever changes we feel are necessary, agree on it as a group, and then document it and revise whatever code needs to be revised.
I'm giving up the assignment on this issue, since I don't think I've been pushing the issue forward. Maybe we can take it back up as part of the discussion around variant representations (since this will be important there).
Did this ever get addressed? I'd like to know if I should use different indexing depending on whether my input is a SAM or BAM file.
I believe the understanding is that Samtools performs the conversion, so your indexing should be the same independent of whether your input is SAM/BAM. I suppose this should be simple to check, but I haven't done the check myself.
The design goal is to have 0-based indexing throughout regardless of whether you are reading SAM, BAM or ADAM files. If you find otherwise, please let us know since that should be considered a bug.
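For anyone checking this by hand: the SAM specification defines the text `POS` field as 1-based, while the BAM binary `pos` field is stored 0-based, so a reader has to normalize one of them. Below is a minimal Python sketch of that normalization. The function names are hypothetical and this is not ADAM's or Picard's actual code, just an illustration of the arithmetic under the assumption that the target is a 0-based, end-exclusive range.

```python
from typing import Tuple


def sam_pos_to_zero_based(pos1: int) -> int:
    """Convert a 1-based SAM text POS to a 0-based position (hypothetical helper)."""
    # In SAM, POS 0 marks an unavailable position; this sketch leaves that
    # case to the caller and only accepts real 1-based coordinates.
    if pos1 < 1:
        raise ValueError("expected a 1-based position >= 1")
    return pos1 - 1


def closed_to_half_open(start1: int, end1: int) -> Tuple[int, int]:
    """Convert a 1-based closed range [start1, end1] to 0-based [start, end)."""
    if start1 < 1 or end1 < start1:
        raise ValueError("expected 1 <= start1 <= end1")
    # Only the start shifts; the inclusive 1-based end equals the exclusive 0-based end.
    return (start1 - 1, end1)


# A read covering 1-based bases 101..150 is 50 bases long in both systems.
start0, end0 = closed_to_half_open(101, 150)
assert (start0, end0) == (100, 150)
assert end0 - start0 == 150 - 101 + 1
```

If a value read from a SAM file and the same record read from a BAM file disagree by one after loading, that would be the kind of bug described above.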
@massie @fnothaft This looks old and "resolved" -- any objection to closing it out?
+1, let's close.
Timothy Danford [email protected] wrote:
@massie @fnothaft This looks old and "resolved" -- any objection to closing it out?
+1
-Matt
On Sat, Jun 7, 2014 at 2:21 PM, Frank Austin Nothaft <[email protected]> wrote:
+1, let's close.
Closing it.