
ga4gh-schemas's People

Contributors

adamnovak, awblocker, benedictpaten, buske, calbach, cassiedoll, cmungall, dcolligan, delagoya, dglazer, diekhans, fnothaft, hammer, hershman, heuermh, hjellinek, jeromekelleher, jnguyenx, kellrott, lh3, macieksmuga, massie, maximilianh, mfiume, nlwashington, pcingola, pgrosu, richarddurbin, sarahhunt, skeenan


ga4gh-schemas's Issues

./generate_sphinx_docs.sh has confusing output

When the build script generates the docs, it prints a confusing transfer-progress message with no context. Please add a message saying that avro tools is being downloaded. Also, if we put this under maven, can maven handle getting avro tools?

./generate_sphinx_docs.sh 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                                      Dload  Upload   Total   Spent    Left  Speed
100 11.9M  100 11.9M    0     0  1133k      0  0:00:10  0:00:10 --:--:-- 1086k

ReferenceName should be pointer

In schema variants, record Variant, field referenceName should probably be a pointer to schema reference, record Reference, field 'id'. This would be consistent with other records.

ReadAlignment should point to reference, not fragment

To be consistent with other records, the fragmentId field in the ReadAlignment record of the reads schema should be replaced with ReferenceId, pointing to a Reference record. The field fragmentName should be removed from ReadAlignment, and the ill-defined Fragment record should be deleted.

OntologyTerm duplicated and misplaced

There's a file named ontologies.avdl in the avro directory that contains one record: OntologyTerm. This record is duplicated in common.avdl.

OntologyTerm is only used in record Attributes of schema sequenceannotations, so it should probably be moved there.

Note: OntologyTerm also occurs in the Description field of record Individual in schema metadata, but there it refers to the NCBI taxon ontology. It might be useful to rename one of the two.
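
A duplicate like this could be caught mechanically. The following is a minimal sketch, not part of the repository: it assumes records are declared as `record Name {` in the .avdl text, and the file contents shown are trimmed, hypothetical stand-ins for the real files.

```python
import re
from collections import defaultdict

# Matches "record Name {" declarations in Avro IDL text.
RECORD_RE = re.compile(r"^\s*record\s+(\w+)\s*\{", re.MULTILINE)


def find_duplicate_records(sources):
    """Map each record name defined in more than one file to those files.

    `sources` maps a file name to its Avro IDL text.
    """
    seen = defaultdict(list)
    for filename, text in sorted(sources.items()):
        for name in RECORD_RE.findall(text):
            seen[name].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}


# Hypothetical, heavily trimmed file contents for illustration only.
example_sources = {
    "ontologies.avdl": "record OntologyTerm {\n  string id;\n}\n",
    "common.avdl": (
        "record OntologyTerm {\n  string id;\n}\n"
        "record Position {\n  long position;\n}\n"
    ),
}

duplicates = find_duplicate_records(example_sources)
print(duplicates)
```

In a real checkout the `sources` mapping would be built by reading the .avdl files from disk; that step is left out here to keep the sketch self-contained.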

Wiggle and WiggleSets orphaned

Wiggle and WiggleSets are part of SequenceAnnotations, but nothing points to them and they don't point to any other records (not even each other).
I'm assuming this is a work in progress but there's little info in the .avdl file on where these belong.

VariantSetMetadata ill defined

In the Variant schema, the record VariantSetMetadata looks like an example schema. It has the fields
key: string
value: string
etc. No other record in Variant appears to point to this record except VariantSet, which has a key metadata: array

Link up the whole doc build

Ideal situation:

At the top level, once dependencies are installed:
mvn doc
or
make doc

This generates a directory containing the Sphinx output built from the manual documentation and the avro schemas, in a form that can be copied to the website or uploaded to readthedocs.org. A clean target resets everything.

create a standard INSTALL.md file

Move doc/GeneratingDocumentation.md to toplevel INSTALL.md and start evolving it towards
general INSTALL.md style instructions with a section on documentation.

Please make changes in documentation branch.

Orphaned record common/Region

No other record points to this record, and it does not point elsewhere.
The description is rather vague:
An abstraction for referring to a genomic region, in relation to some already known reference. This will require some significant rework as we move to graph coordinates.

The word region appears in the Wiggle record (SequenceAnnotations schema) and in the description of the SequenceAnnotations/Feature record, but I'm not sure if those are related.
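
Orphans like Region could be found automatically once the pointers between records are known. Below is a rough sketch under stated assumptions: the reference graph is hand-written and hypothetical, and a record counts as orphaned when it neither points to another record nor is pointed to by one.

```python
def find_orphans(references):
    """Return records that neither reference nor are referenced by another record.

    `references` maps each record name to the set of record names its fields
    point to. Self-references are ignored.
    """
    referenced = set()
    for source, targets in references.items():
        referenced.update(t for t in targets if t != source)
    return sorted(
        name
        for name, targets in references.items()
        if not (targets - {name}) and name not in referenced
    )


# Hypothetical reference graph for illustration; not extracted from the schemas.
example_graph = {
    "ReadGroupSet": {"ReadGroup"},
    "ReadGroup": {"Dataset"},
    "Dataset": set(),
    "Region": set(),
    "Wiggle": set(),
}

orphans = find_orphans(example_graph)
print(orphans)
```

Dataset is not reported because ReadGroup points to it, even though it points nowhere itself.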

clean up source files

  • What is sphinx/build.py? It is undocumented and appears unused. Delete it if so.
  • What happened to Adam's graphviz UML builder that was in contrib? It would be good to keep that in the doc.
  • mvn should handle fetching the avrotools jar.

Create UML accessible pointers

Once the Sphinx plugin for creating documentation from markdown is finished, make sure records contain consistent pointers that allow the UML to point from a field to another field or another complete record.

It may also be useful to put all pointers near the top of each record, and maybe all contained values to the bottom.

put make in charge

Implement the documentation build using the make command directly rather than a script that runs make.
Or move it to mvn.

Record id field should be present and consistent

Most records have an 'id' field. However, the description of that field varies a lot between records, from 'User specified ID' to 'The id of this annotation node'. The field is also not always first, and it's not always defined as 'string' (sometimes it is union<null|string>, which seems unwise).
In the metadata schema, ids are consistently in the first field and described as 'The UUID. This is globally unique'. This seems like a good template for all the other records.

The following records do not have an 'id' field:

  • in schema variants:
    • AlleleCall
    • Call
  • in schema reads:
    • ReadStats
    • GraphAlignment
    • LinearAlignment
  • in schema sequenceannotations:
    • Wiggle
    • ExternalIdentifier
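
The convention proposed above (an 'id' field that is present, first, and a plain string) could be checked with a small script. This is only a sketch: it assumes the schemas are available as parsed Avro JSON (.avsc) dictionaries, and the example records and their field names are hypothetical.

```python
def check_id_field(record):
    """Return a list of problems with the 'id' field of a parsed Avro record schema."""
    fields = record.get("fields", [])
    names = [f["name"] for f in fields]
    if "id" not in names:
        return ["missing 'id' field"]
    problems = []
    if names[0] != "id":
        problems.append("'id' is not the first field")
    id_type = fields[names.index("id")]["type"]
    if id_type != "string":
        problems.append(f"'id' has type {id_type!r}, expected 'string'")
    return problems


# Hypothetical records for illustration; field lists are not taken from the schemas.
good_record = {
    "type": "record",
    "name": "ExampleGood",
    "fields": [{"name": "id", "type": "string"}],
}
nullable_record = {
    "type": "record",
    "name": "ExampleNullable",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "id", "type": ["null", "string"]},
    ],
}
no_id_record = {
    "type": "record",
    "name": "ExampleNoId",
    "fields": [{"name": "value", "type": "string"}],
}

for rec in (good_record, nullable_record, no_id_record):
    print(rec["name"], check_id_field(rec))
```

The check does not look at the field description, so consistent wording such as 'The UUID. This is globally unique' would still need a separate review.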

ReadGroup description duplicate?

In schema reads, record ReadGroup has a field 'datasetId' that points to Dataset. The Dataset record has two fields: 'id' and 'description'. ReadGroup itself also has a description field. Is this a duplicate? In other words: is there some description of the ReadGroup that does not belong in Dataset?

Inconsistent directionality

In the Reads schema, in the ReadGroupSet record there's a readGroupsId that points to (actually contains an array of) ReadGroups. So the group container holds a list of individual records.

However in Sequence Annotations, the Feature contains a FeatureSetID (a string). So here the individual record points to the group, instead of the other way around (FeatureSet has no array or ID referencing Features).

Shouldn't this be consistent? Either works fine, but together they're confusing.

ReadStats should contain more information

Record ReadStats in the reads schema should probably contain the information that comes from samtools flagstat (or equivalent). This is:

  • total reads
  • duplicates
  • mapped
  • paired in sequencing
  • read1 count
  • read2 count
  • properly paired (both mates of a read pair map to the same chromosome, oriented towards each other, and with a sensible insert size)
  • with itself and mate mapped (meaning both reads of a pair are mapped)
  • singletons
  • with mate mapped to a different chr
  • with mate mapped to a different chr (mapQ>=5)

In samtools flagstat, each field has two numbers: one for reads that passed quality checks (QC-passed) and another for reads that failed them (QC-failed).
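
To make the proposal concrete, here is one possible shape for such a record, sketched as Python dataclasses rather than Avro IDL. All class and field names are assumptions made for illustration; none of them exist in the reads schema.

```python
from dataclasses import dataclass, field


@dataclass
class QcCount:
    """A pair of counts: reads that passed QC and reads that failed it."""
    passed: int = 0
    failed: int = 0


@dataclass
class FlagstatReadStats:
    """Hypothetical ReadStats layout mirroring the flagstat categories listed above."""
    total: QcCount = field(default_factory=QcCount)
    duplicates: QcCount = field(default_factory=QcCount)
    mapped: QcCount = field(default_factory=QcCount)
    paired_in_sequencing: QcCount = field(default_factory=QcCount)
    read1: QcCount = field(default_factory=QcCount)
    read2: QcCount = field(default_factory=QcCount)
    properly_paired: QcCount = field(default_factory=QcCount)
    with_itself_and_mate_mapped: QcCount = field(default_factory=QcCount)
    singletons: QcCount = field(default_factory=QcCount)
    mate_on_different_chr: QcCount = field(default_factory=QcCount)
    mate_on_different_chr_mapq5: QcCount = field(default_factory=QcCount)

    def mapped_fraction_passed(self):
        """Fraction of QC-passed reads that are mapped, or None if there are none."""
        if self.total.passed == 0:
            return None
        return self.mapped.passed / self.total.passed


# Made-up numbers, for illustration only.
stats = FlagstatReadStats(total=QcCount(1000, 20), mapped=QcCount(950, 5))
print(stats.mapped_fraction_passed())
```

Keeping each category as a passed/failed pair preserves the two-number structure of the flagstat report instead of flattening it into separate fields.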

Organizing top-level overview documentation

This ticket is to capture ideas on the organization of the GA4GH schema documentation.

The API really consists of

  • A conceptual data model
  • Schemas and a wire protocol for exchanging data

The audiences are:

  • server developers - they need a normative API spec
  • client developers - they need a normative API spec
  • data users - they need to understand the conceptual model and the basics of the API, but communication protocols are below the level they need

Dataset should be in Metadata, not Reads

Currently the following records point to datasetId:

  • reads/ReadGroup
  • reads/ReadGroupSet
  • variant/VariantSet
  • sequenceannotations/FeatureSet

It seems that dataset should be moved to the metadata schema.

Common/CigarUnit should be under Reads

CigarUnit is only referenced by the GraphAlignment and LinearAlignment records in the Reads schema and should probably be moved there together with the ENUM CigarOperation.

warnings in ga4gh topic branch

Lots of warnings need to be fixed. To deal with git merge problems, this is now in jetlje/documentation2 or the ga4gh topic branch documentation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/common.rst:59: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:8: WARNING: duplicate Avro object description of Strand.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:11: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:16: WARNING: duplicate Avro object description of Position.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:38: WARNING: duplicate Avro object description of ExternalIdentifier.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:56: WARNING: duplicate Avro object description of CigarOperation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:59: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:105: WARNING: duplicate Avro object description of CigarUnit.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:4: WARNING: duplicate Avro object description of Strand.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:7: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:12: WARNING: duplicate Avro object description of Position.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:34: WARNING: duplicate Avro object description of ExternalIdentifier.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:52: WARNING: duplicate Avro object description of CigarOperation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:55: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:101: WARNING: duplicate Avro object description of CigarUnit.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:122: WARNING: duplicate Avro object description of GAException.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:127: WARNING: duplicate Avro object description of Experiment.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:208: WARNING: duplicate Avro object description of Dataset.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:358: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:387: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:391: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/readmethods.rst:414: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:84: WARNING: duplicate Avro object description of Strand.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:87: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:92: WARNING: duplicate Avro object description of Position.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:114: WARNING: duplicate Avro object description of ExternalIdentifier.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:132: WARNING: duplicate Avro object description of CigarOperation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:135: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:181: WARNING: duplicate Avro object description of CigarUnit.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:202: WARNING: duplicate Avro object description of Experiment.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:283: WARNING: duplicate Avro object description of Dataset.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:318: WARNING: duplicate Avro object description of Program.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:351: WARNING: duplicate Avro object description of ReadStats.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:376: WARNING: duplicate Avro object description of ReadGroup.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:433: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:441: WARNING: duplicate Avro object description of ReadGroupSet.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:462: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:466: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:476: WARNING: duplicate Avro object description of LinearAlignment.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:489: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:498: WARNING: duplicate Avro object description of Fragment.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/reads.rst:512: WARNING: duplicate Avro object description of ReadAlignment.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:4: WARNING: duplicate Avro object description of Strand.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:7: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:12: WARNING: duplicate Avro object description of Position.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:34: WARNING: duplicate Avro object description of ExternalIdentifier.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:52: WARNING: duplicate Avro object description of CigarOperation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:55: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:101: WARNING: duplicate Avro object description of CigarUnit.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:122: WARNING: duplicate Avro object description of GAException.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:182: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/referencemethods.rst:194: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:7: WARNING: duplicate Avro object description of Strand.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:10: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:15: WARNING: duplicate Avro object description of Position.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:37: WARNING: duplicate Avro object description of ExternalIdentifier.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:55: WARNING: duplicate Avro object description of CigarOperation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:58: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:104: WARNING: duplicate Avro object description of CigarUnit.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:125: WARNING: duplicate Avro object description of Reference.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:180: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:183: WARNING: duplicate Avro object description of ReferenceSet.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/references.rst:192: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:4: WARNING: duplicate Avro object description of GAException.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:9: WARNING: duplicate Avro object description of Strand.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:12: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:17: WARNING: duplicate Avro object description of Position.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:39: WARNING: duplicate Avro object description of ExternalIdentifier.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:57: WARNING: duplicate Avro object description of CigarOperation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:60: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:106: WARNING: duplicate Avro object description of CigarUnit.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variantmethods.rst:215: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:9: WARNING: duplicate Avro object description of Strand.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:12: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:17: WARNING: duplicate Avro object description of Position.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:39: WARNING: duplicate Avro object description of ExternalIdentifier.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:57: WARNING: duplicate Avro object description of CigarOperation.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:60: WARNING: Field list ends without a blank line; unexpected unindent.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:106: WARNING: duplicate Avro object description of CigarUnit.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:127: WARNING: duplicate Avro object description of VariantSetMetadata.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:168: WARNING: duplicate Avro object description of VariantSet.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:198: WARNING: duplicate Avro object description of CallSet.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:215: WARNING: Inline literal start-string without end-string.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:232: WARNING: duplicate Avro object description of Call.

/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/variants.rst:292: WARNING: duplicate Avro object description of Variant.

ok

testSchemaProperties (test_protocol.TestValidateSchemas) ... ok
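
A log this long is easier to triage when grouped by warning type. The sketch below assumes the `path.rst:line: WARNING: message` layout seen above; the sample log is a short excerpt of those lines, and the grouping rule (dropping the trailing record name from the duplicate warnings) is a choice made for this example.

```python
import re
from collections import Counter

# Matches lines such as "/path/to/file.rst:59: WARNING: some message".
WARNING_RE = re.compile(
    r"^(?P<path>\S+\.rst):(?P<line>\d+): WARNING: (?P<message>.+)$"
)


def summarize_warnings(log_text):
    """Count warnings per category, ignoring lines that are not warnings."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if not match:
            continue
        message = match.group("message")
        # Collapse "duplicate Avro object description of X." into one category.
        category = re.sub(r" of \w+\.$", "", message)
        counts[category] += 1
    return counts


# Short excerpt of the log above.
sample_log = """
/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/common.rst:59: WARNING: Field list ends without a blank line; unexpected unindent.
/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:8: WARNING: duplicate Avro object description of Strand.
/home/travis/build/ga4gh/schemas/target/generated-docs/rst/schemas/metadata.rst:16: WARNING: duplicate Avro object description of Position.
ok
"""

summary = summarize_warnings(sample_log)
for category, count in summary.most_common():
    print(count, category)
```

Run over the full log, this would show that the output is dominated by a few repeated categories, which suggests the shared definitions are being documented once per schema file that includes them.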

