phenopackets / phenopacket-schema Goto Github PK
View Code? Open in Web Editor NEWRepository for the GA4GH phenopacket schema
Home Page: https://phenopacket-schema.readthedocs.io/
License: BSD 3-Clause "New" or "Revised" License
Repository for the GA4GH phenopacket schema
Home Page: https://phenopacket-schema.readthedocs.io/
License: BSD 3-Clause "New" or "Revised" License
// Sequence position at which the change is located, in the specified coordinate system.
int32 position = 4; //start, end...
Why start and end? this is not a repeated element?
For this element,
message Evidence {
OntologyClass evidence_code = 1;
ExternalReference reference = 2;
}
I have written this
For example, in order to describe the evidence for a phenotypic observation that is derived from a publication, one might use
the ECO term `author statement from published clinical study used in manual assertion` <https://www.ebi.ac.uk/ols/ontologies/eco/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FECO_0006017>`_ and record a PubMed id in the reference field
(See :ref:`External Reference element`).
Is this correct? Are there any other examples we can provide here?
There is a typo: suspect conditions
should be suspected conditions
.
For instance, prostatocystectomy is used to remove bladder cancer, and a biopsy is taken from the resulting specimen. On the other hand, a smaller biopsy could be taken by cystsocopy.
There are NCI thesaurus terms to represent these procedures.
The readthedocs is an excellent intro to how to use the java code, what is the status of generated python? Is the java canonical? Sounds like some groups (cc @drseb ) are using generated python. Should we generate python and release on pypi with each release, or is that taking on too much?
I propose we remove the Analysis
message. This is only used in AttributeValue
and I think is out of scope.
// An analysis contains an interpretation of one or several experiments. (e.g.
// SNVs, copy number variations, methylation status) together with information
// about the methodology used.
message Analysis {
// Formats of id | name | description | accessions are described in the
// documentation on general attributes and formats.
string id = 1;
string name = 2;
string description = 3;
// The time at which this record was created, in ISO 8601 format.
string created = 4;
// The time at which this record was last updated, in ISO 8601 format.
string updated = 5;
// The type of analysis.
string type = 6;
// The software run to generate this analysis.
repeated string software = 7;
// A map of additional analysis information.
Attributes attributes = 9;
}
message Individual {
string id = 1;
// is this not too precise a field?
google.protobuf.Timestamp date_of_birth = 2;
Sex sex = 3;
repeated Phenotype phenotypes = 4;
repeated Disease diagnoses = 5;
}
Yes! However it has temporality baked-in, which is good. If we were to be less precise I guess year_of_birth
and month_of_birth
fields could be used, but this becomes a pain to model.
This is not the way to include the package in a different project
<dependency>
<groupId>org.phenopackets</groupId>
<artifactId>phenopacket-schema</artifactId>
<version>${project.version}</version>
</dependency>
This is because ${project.version} is defined for the project and not for the phenopackets schema.
Suggest revising the documentation to say
add the following line to the properties section of the pom file.
<phenopackets.version>0.1.0</phenopackets.version>
and then add the following to the dependencies section.
<dependency>
<groupId>org.phenopackets</groupId>
<artifactId>phenopacket-schema</artifactId>
<version>${phenopackets.version}</version>
</dependency>
I would recommend abstracting this into a general object for external references. For ga4gh-metadata
, we prototyped ExternalIdentifier
:
// Identifier from a public database, describing a representation of
// the record, and the relation between record and external reference
// TODO: relation as OntologyTerm object (e.g. as IAO property).
// Example:
// {
// "relation" : "reported_in",
// "identifier" : "pubmed:25752754"
// }
//
message ExternalIdentifier {
// The CURIE of the identifier.
string identifier = 1;
// The relation between identifier and the object ist was reported from.
string relation = 2;
}
This could serve as a blueprint here, instead of the specific "Publication".
Could be important for future computational cancer cases
We had one before, we cold have one again.
Could be defined as an aggregated individual:
message Cohort {
string id = 1;
string description = 2;
int size = 3;
repeated Phenotype phenotypes = 4;
repeated Cohort sub_groups = 5;
}
or maybe an actual set of individuals:
message Cohort {
string id = 1;
string description = 2;
int size = 3;
repeated Individual members = 4
repeated Phenotype phenotypes = 5;
repeated Cohort sub_groups = 6;
}
or something else.
Consider update from version 3.5.1 to 3.6.1 of protobuf.
I believe this might fix the following error warning (see the change log https://github.com/protocolbuffers/protobuf/blob/master/CHANGES.txt).
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/home/robinp/.m2/repository/com/google/protobuf/protobuf-java/3.6.1/protobuf-java-3.6.1.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
The External reference is not 100% clear. Here is the current definition
message ExternalReference {
// e.g. ISBN, PMID:123456, DOI:...
string id = 1;
string description = 2;
}
Do we intend the id to be something like "PMID:12345" and the description to be something like "Patient B" from the same article? Can we provide a few examples of the intended use cases?
I propose we remove the Experiment
message. It is only used as part of AttributeValue
. It seems a bit out of scope, however it might also have some utility for the Biosample
message. In the latter case it might be better to re-model some of the fields as OntologyClass
instead of string
.
Thoughts @mcourtot @pnrobinson?
// An experimental preparation of a sample.
message Experiment {
// The experiment ID
string id = 1;
// The name of the experiment.
string name = 2;
// A description of the experiment.
string description = 3;
// The time at which the record of this experiment was last updated.
// Format: ISO 8601, YYYY-MM-DDTHH:MM:SS.SSS (e.g. 2015-02-10T00:03:42.123Z)
string updated = 5;
// The time at which this experiment was performed.
// Granularity here is variable (e.g. date only).
// Format: ISO 8601, YYYY-MM-DDTHH:MM:SS (e.g. 2015-02-10T00:03:42Z)
string run_time = 6;
// The molecule examined in this experiment. (e.g. genomics DNA, total RNA)
string molecule = 7;
// The experiment technique or strategy applied to the sample.
// (e.g. whole genome sequencing, RNA-seq, RIP-seq)
string strategy = 8;
// The method used to enrich the target. (e.g. immunoprecipitation, size
// fractionation, MNase digestion)
string selection = 9;
// The name of the library used as part of this experiment.
string library = 10;
// The configuration of sequenced reads. (e.g. Single or Paired).
string library_layout = 11;
// The instrument model used as part of this experiment.
// This maps to sequencing technology in BAM.
string instrument_model = 12;
// The data file generated by the instrument.
// Should this be `instrumentData` instead?
File instrument_data_file = 13;
// Name of the laboratory where this experiment was performed.
string sequencing_center = 14;
// The address coded as geolocation object of the laboratory.
GeoLocation location = 18;
// The platform unit used as part of this experiment. This is a
// flowcell-barcode
// or slide unique identifier.
string platform_unit = 15;
// A map of additional experiment information.
Attributes attributes = 17;
}
The current comment needs some attention, methinks.
// Biosample(s) derived from the patient or a collection of biosamples in isolation
With the recent revisions, we have decided that a PhenoPacket describes one individual. It seems to me that if we allow a Phenopacket to describe a collection of isolated biosamples, that it is scope creep, and many of the other fields do not make sense (e.g., patient describes one individual and not a cohort).
We would simplify things enormously if we make another element called biosample-collection
that has the required metadata. Both the phenopackets and this object would then be interoperable, and we would greatly simplify the use of the PhenoPacket.
Suggested revision of the comment
// Biosample(s) derived from the patient
Medium-term: also consider creation of biosample-collection message.
I am wondering if we can comment here or elsewhere that a PhenoPacket is meant to represent one moment in time (one particular examination, which may include past medical history), and that if we want to describe multiple visits at different time points, then different phenopackets should be used.
@julesjacobsen @mellybelly @cmungall @drseb
individuals
pedigree
Cohort
#5 in which multiple individuals would be represented by repeated phenopacket
In the Age element we suggest that HPO terms can be used to represent age.
// age_class : { term_id : "HP:0003596", term : "Middle age onset" }
OntologyClass age_class = 2;
But of course they do not represent age, they represent age of onset. Is there some other ontology we can recommend here? If not, we should remove this option for now because it will lead to errors in data entry.
This was part of the original phenopacket idea, but has not made it into the new schema yet.
seem to indicate they are similar.
Disease onset is used a lot by GEL clinicians as a means of filtering out potential diagnoses based on the proband's age.
message Disease {
// The identifier of this disease e.g. MONDO:0007043, OMIM:101600, Orphanet:710, DOID:14705 (note these are all equivalent)
string id = 1;
// Textual name of the disease e.g. Pfeiffer syndrome
string label = 2;
// Free-text description
string description = 3;
// The mode of inheritance for this disease, in preference a sub-class of http://purl.obolibrary.org/obo/HP_0000005
OntologyClass modeOfInheritance = 4;
// The disease phenotype
repeated Phenotype phenotypes = 5;
//TODO: onset here?
OntologyClass onset = 6;
}
Shouldn't it be possible to store multiple updates? If I look here https://phenopackets-schema.readthedocs.io/en/latest/individual.html#updated it seem that only one update is possible.
Should the naming be consistent between individual and biosample?
// Deleted bases on the forward strand - equivalent to the VCF REF
string deletion = 5;
// Inserted bases on the forward strand - equivalent to the VCF ALT
string insertion = 6;
Can we go over exactly what we want people to do with these fields and document them?
I am not a SPDI expert, but I do not think that deletions require a string. According to the documentation, When a nucleotide deletion occurs, nothing is inserted. In SPDI syntax, InsertedSequence is simply an empty string. Thus, the deletion of the same A nucleotide as above is represented as:
Seq1:4:1:
Thus, is this a mistake, or do we want people to add EITHER SPDI OR VCF (which would be a recipe for disaster!!!)
Can we document the format conventions for Age, as in:
Age.newBuilder().setAge("P48Y3M").build()
Does this just mean the proband (P) is 48 years and 3 months old? Where does the P come from? It seems the code is treating it as a string
Can people use curies from GeoNames or ENVO?
Currently
// NCBI taxonomic identifier NCBITaxon e.g. NCBITaxon:9606 or NCBITaxon:1337
// For resources where there may be more than one organism being studied it is advisable to indicate the taxonomic
// identifier of that organism, to it's most specific level
OntologyClass taxonomy = 7;
it's is incorrect, please remove the apostrophe.
Variant {
// genome assembly isn't strictly necessary if using the sequence accession as that will be unique for the assembly/sequence
GenomeAssembly genome_assembly = 1;
// Accession of the sequence e.g. NC_000010.10 (GRCh37 chromosome 10)
string sequence =
I think it would be better to rename "sequence" to sequence_accession or accession_number, because I thought sequence would be a sequence of nucleotides -- the label is confusing.
If I create a phenopacket that includes this,
HtsFile htsFile = HtsFile.newBuilder()
.setHtsFormat(HtsFile.HtsFormat.VCF)
.setGenomeAssembly(org.phenopackets.schema.v1.core.GenomeAssembly.GRCH_37)
// in this case the proband identifier is different to the sample
// identifier used in the VCF file
.putIndividualToSampleIdentifiers(proband.getId(), "manuel")
.setFile(File.newBuilder().setPath("/home/robinp/data/exomiser-cli-9.0.1/examples/Pfeiffer.vcf").build())
.build();
then the output is missing the part about "GRCH_37"
"htsFormat": "VCF",
"individualToSampleIdentifiers": {
"manuel": "manuel"
},
"file": {
"path": "/home/robinp/data/exomiser-cli-9.0.1/examples/Pfeiffer.vcf"
}
}],
Is my code wrong or what is the way of indicating the genome build?
Open Location Code aka PlusCodes if you want the glossy overview, looks to be a nicely engineered solution to the issue of location.
From their README:
Open Location Code is a technology that gives a way of encoding location into a form that is easier to use than latitude and longitude. The codes generated are called plus codes, as their distinguishing attribute is that they include a "+" character.
The technology is designed to produce codes that can be used as a replacement for street addresses, especially in places where buildings aren't numbered or streets aren't named.
Plus codes represent an area, not a point. As digits are added to a code, the area shrinks, so a long code is more precise than a short code.
Codes that are similar are located closer together than codes that are different.
A location can be converted into a code, and a code can be converted back to a location completely offline.
There are no data tables to lookup or online services required. The algorithm is publicly available and can be used without restriction.
Currently location is defined like this:
// A GeoLocation object provides information about a geographic position
// related to a record. Examples could be:
// - an address, e.g. of a lab performing an analysis
// - provenance of an individual, obfuscated to a larger order administrative
// entity (Suffolk, U.K.)
// - a lat/long/alt position where an environmental sample was collected
// The geographic point object uses the default units from the DCMI point scheme
// http://dublincore.org/documents/dcmi-point/
// and avoids optional representation in non-standard units.
message GeoLocation {
// a text representation, preferably using standard geographic identification
// elements, of the corresponding latitude,longitude(,altitude)
// This representation serves the purposes to
// - capture standard data entry parameters
// - provide a sanity check for latitude,longitude values
// Example:
// - 34 Washington Blvd, Marina del Rey, CA 90292, United States
// - Str Marasesti 5, 300077 Timisoara, Romania
// - Heidelberg, Deutschland
string label = 1;
// an optional indication of the maximum precision to be derived from the
// latitude,longitude values
// Example:
// Given a street address "Winterthurerstrasse 190, 8057 Zürich, Switzerland",
// a privacy driven (destructive) obfuscation approach could recode this
// to
// "latitude": 47.37, "longitude": 8.54
// while providing
// "precision":"city", "label": "Zürich, Switzerland"
// ... indicating that the original location could correspond to any
// latitude,longitude point value inside the administrative boundaries of
// the city of Zürich, Switzerland
string precision = 2;
// signed decimal degrees (North, relative to Equator)
double latitude = 3;
// signed decimal degrees (East, relative to IERS Reference Meridian)
double longitude = 4;
// optional, e.g. for environmental samples
double altitude = 5;
}
It could, using the plus code, be represented like this for the Zurich example:
{
"geoLocation": {
"plusCode": "8FVC9G00+"
}
}
Which resolves to this area: https://plus.codes/8FVC9G00+
The only snag with this is that if the thing you're trying to fit into a box doesn't fully fit, then what? In this example Zurick straddles approximately four boxes at this scale:
https://plus.codes/8FVCCG00+
https://plus.codes/8FVCCH00+
https://plus.codes/8FVC9G00+
https://plus.codes/8FVC9H00+
but the next box up is too big. Even so, these are computable compared to the existing 'precision' string, which is completely undefined. If the goal is purely approximation and obfuscation, then the pluscode will work better then the existing setup.
<dependency>
<groupId>com.google.openlocationcode</groupId>
<artifactId>openlocationcode</artifactId>
<version>1.0</version>
<type>pom</type>
</dependency>
These would represent somatic variants in a cancer sample, for instance @cmungall @mellybelly @julesjacobsen @mbaudis
IMO for some attributes it is dangerous to have them defined with strong dependency on the specific serialisation.
The typical case here are the time definitions, which refer to the internally handled timestamp:
This does not give any indication to the user of how to I/O time.
Time formats are something where a single, clean definition exists, and which has been well documented in the metadata schema. I am strongly for defining time (poits, spans e.g. ages ...) as string/ISO8601, and leave the deparsing to the I/O handlers.
This is just one example where we easy implementation NE clear definition.
For instance, com.google.guava:guava:19.0 could be updated to 27.1-jre.
We should use the maven versions plugin to bring all of the dependencies up to date.
This element is complicated and we should provide a number of examples of the intended use cases (currently unclear). Also, item 9 is commented out and should either be removed or repaired prior to the official release.
Currently the comment for the sex element in individual states the following
A PATO term for the biological sex of the individual
// http://purl.obolibrary.org/obo/PATO_0000047
OntologyClass sex = 5;
I think that biological sex and phenotypic sex are nearly synonymous, but it is confusing that we state biological sex in the comment and the PATO term is phenotypic sex.
Also, there are situations where one might want to indicate the genetic sex of an individual and this would be possible by using a different ontology term. I have written the following description in the documentation, do we agree on this?
Phenopackets make use of ontology terms to denote the sex of an individual. We recommend using
`PATO terms <https://www.ebi.ac.uk/ols/ontologies/pato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPATO_0000047>`_ for this purpose. There are various ways to define sex including phenotypically and genetically, and Phenopackets allows
any ontology term that represents sex in any way to be used. The PATO terms `female <https://www.ebi.ac.uk/ols/ontologies/pato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPATO_0000383>`_ and `male <https://www.ebi.ac.uk/ols/ontologies/pato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPATO_0000384>`_ represent phenotypic sex. There may be situations when it is more appropriate to use a term such as `male genotypic sex <https://www.ebi.ac.uk/ols/ontologies/pato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPATO_0020001>`_.
The rare disease FHIR example seems to use a lot of FHIR code to construct the FHIR message. Can we work on creating the FHIR export function directly from the Phenopacket of the main rare diseas example?
I just cloned the repo and started a ./mvnw clean install
, which leads to two errors:
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 24 source files to /Users/koehlers/git/phenopacket-schema/target/test-classes
[ERROR] error reading /Users/koehlers/.m2/repository/org/apache/commons/commons-lang3/3.6/commons-lang3-3.6.jar; invalid LOC header (bad signature)
The other one is in Tests:
[INFO] Running org.phenopackets.schema.v1.FhirTest
[ERROR] Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.055 s <<< FAILURE! - in org.phenopackets.schema.v1.FhirTest
[ERROR] org.phenopackets.schema.v1.FhirTest.testCancerExample() Time elapsed: 0.051 s <<< FAILURE!
java.lang.NoClassDefFoundError: org/apache/commons/lang3/StringUtils
at org.hl7.fhir.r4.model.IdType.setValue(IdType.java:420)
at org.hl7.fhir.r4.model.Resource.setId(Resource.java:133)
at org.phenopackets.schema.v1.examples.CancerFhirExample.cancerBundle(CancerFhirExample.java:25)
at org.phenopackets.schema.v1.examples.TestExamples.cancerBundle(TestExamples.java:42)
at org.phenopackets.schema.v1.FhirTest.testCancerExample(FhirTest.java:19)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
[ERROR] org.phenopackets.schema.v1.FhirTest.testRareDiseaseExample() Time elapsed: 0.001 s <<< FAILURE!
java.lang.NoClassDefFoundError: org/apache/commons/lang3/StringUtils
at org.hl7.fhir.r4.model.IdType.setValue(IdType.java:420)
at org.hl7.fhir.r4.model.Resource.setId(Resource.java:133)
at org.phenopackets.schema.v1.examples.RareDiseaseFhirExample.rareDiseaseBundle(RareDiseaseFhirExample.java:24)
at org.phenopackets.schema.v1.examples.TestExamples.rareDiseaseBundle(TestExamples.java:28)
at org.phenopackets.schema.v1.FhirTest.testRareDiseaseExample(FhirTest.java:25)
I had to comment out the test case to get it to work
Currently we have
// Is the specimen tissue from the primary tumor, a metastasis or a recurrence?
// Most likely a child term of NCIT:C7062 (Neoplasm by Special Category)
// NCIT:C3677 (Benign Neoplasm)
// NCIT:C84509 (Primary Malignant Neoplasm)
// NCIT:C95606 (Second Primary Malignant Neoplasm)
// NCIT:C3261 (Metastatic Neoplasm)
// NCIT:C4813 (Recurrent Malignant Neoplasm)
OntologyClass tumor_progression = 18;
I think the word Placeholder is confusing in the comment for the genes element:
// Placeholder for gene concepts - could be used for either describing candidate genes or causative genes. The
// resources using these fields should define what this represents in their context.
suggested revision
This field defines a list of one or more genes that can be used to denote a collection of
candidate genes or causative genes. The resources using these fields should define what this
represents in their context.
@cmungall we had discussions on that in the context of GO and HPO. We decided that it makes sense to clearly separate positive from negative annotations, because it is otherwise likely that users will just iterate the list of phenotypes and forget to check if the phenotype is actually negated. This is why we have separate annotation files.
It seems to me that this danger exists in the current phenopacket schema as well. https://phenopackets-schema.readthedocs.io/en/latest/phenotype.html#phenotype-element
I.e. users will iterate through the list of phenotypes and would have to check for each phenotype if negated is true. I would favour a solution where users have to iterate the positive and the negative phenotypes separately. Opinions? Is this doable?
What are we trying to convey? Originally Phenotype was defined as a collection of positive and negative OntologyTerms.
// Rename to PhenotypicFeature?
message Phenotype {
string description = 1;
// perhaps we can simplify this and have a single class and a single boolean flag?
repeated OntologyClass types = 2;
repeated OntologyClass negated_types = 3;
// the values of this will come from the HPO onset hierarchy
// i.e. subclasses of HP:0003674
OntologyClass onset = 4;
// subclasses of HP:0012823 ! Clinical modifier
repeated OntologyClass modifiers = 5;
OntologyClass assay = 6;
Evidence evidence = 7;
//Measurement
//Environment
//Frequency? Do we want to be able to encode and HPO annotation?
}
This was used like this in an Individual
message Individual {
string id = 1;
google.protobuf.Timestamp date_of_birth = 2;
Sex sex = 3;
repeated Phenotype phenotypes = 4;
repeated Disease diagnoses = 5;
}
@cmungall what alternative are you proposing?
//perhaps this would be better as a GeneticFeature which would wrap 1-N variants. This would make it easier to group comp-het alleles.
message Gene {
// a CURIE with a prefix that is NCBIGene, HGNC, ENSEMBL, UCSC e.g. HGNC:347, ENSG00000120705, uc003ldc.6
// https://www.genenames.org/cgi-bin/gene_symbol_report?q=data/hgnc_data.php&hgnc_id=HGNC:3477
string id = 1;
// The official gene symbol as designated by the organism gene nomenclature committee e.g. ETF1 or Etf1
string symbol = 2;
// Taxonomy id from the NCBI e.g. NCBI:txid9606
// https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606
int32 ncbi_taxon_id = 3;
}
//SPDI format https://www.ncbi.nlm.nih.gov/variation/notation/
message Variant {
//do we want to consider explicitly defining strand and zero/one-based coordinates?
string sequence = 1;
int32 position = 2;
string deletion = 3;
string insertion = 4;
//genotype of this variant using the GENO ontology
OntologyClass genotype = 5;
//do we want to capture variant consequence with SO terms? This would also require the transcript.
}
Becomes...
//perhaps this would be better as a GeneticFeature which would wrap 1-N variants. This would make it easier to group comp-het alleles.
message GeneticFeature {
// a CURIE with a prefix that is NCBIGene, HGNC, ENSEMBL, UCSC e.g. HGNC:347, ENSG00000120705, uc003ldc.6
// https://www.genenames.org/cgi-bin/gene_symbol_report?q=data/hgnc_data.php&hgnc_id=HGNC:3477
string gene_id = 1;
// The official gene symbol as designated by the organism gene nomenclature committee e.g. ETF1 or Etf1
string gene_symbol = 2;
// Taxonomy id from the NCBI e.g. NCBI:txid9606
// https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606
int32 ncbi_taxon_id = 3;
repeated Variant variants = 4;
//genotype of this variant using the GENO ontology
OntologyClass genotype = 5;
}
in this case I've moved the genotype into the GeneticFeature. Perhaps better would be to use both the Gene and the Variant messages and simply create a new Genotype message which could be attached to an individual?
message Genotype {
Gene gene = 1;
repeated Variant alleles = 2;
//genotype of this variant using the GENO ontology
OntologyClass genotype = 3;
}
This is the main pinned repo for phenopackets, the README is quite developer centric. Should have links to proto schema, examples and ideally some kind of visual display
We seem to be defining both a list and a map
message AttributeValues {
repeated AttributeValue values = 1;
}
message Attributes {
map<string, AttributeValues>
}
This seems confusing and will make client code more complicated. It seems it might be better to demand a map whose key represents what the attribute is about? Otherwise can we specify when the list should be used and when the map should be used?
Consider adding 'submitted_by' or 'submitter_identifier' as fields to the MetaData message
Check NCIT terms and document
Currently, the comment for this item seems slightly dodgy.
Individuals genetically related to the patient - e.g. mother, father, siblings or members of a cohort.
It is expected that a pedigree be included if this and the patient fields are included.
In general, members of a cohort are not genetically related to one another, and so I would propose the following revision
Individuals related in some way to the patient. For instance, the individuals may
be genetically related or may be members of a cohort. If this field is used, then
it is expected that a pedigree will be included for genetically related individuals
for use cases such as genomic diagnostics. If a phenopacket is being used to
describe one member of a cohort, then in general one phenopacket will be
created for each of the individuals in the cohort. In general, applications that
process such files are expected to perform Q/C to ensure data consistency,
but this is not and cannot be enforced on the level of the protobuf message itself.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.