
ggbn's People

Contributors

stanblum avatar


ggbn's Issues

Some questions about what ratification means for this standard

My questions about the proposed GGBN standard are related to how it will operate as a TDWG Standard. If I am understanding the situation correctly, the GGBN Data Standard is a standard that was created by an organization outside of TDWG (the Global Genome Biodiversity Network), and that standard is now being recommended for adoption by TDWG. This approach is not without precedent in TDWG. A large number of TDWG Prior standards were developed by non-TDWG groups, then adopted by TDWG as a standard. However, there are no examples of that sort among the TDWG "Current standards", particularly among the vocabulary standards.

So that leads me to wonder about some issues related to maintenance of this standard. The existing precedent for Current Standards that are vocabularies (Darwin and Audubon Cores) involves the minting of some terms whose namespaces fall within the http://rs.tdwg.org/ subdomain. The terms in those namespaces "belong" to TDWG and the responsibility for maintenance of those terms falls to the vocabulary maintenance group chartered to maintain that particular vocabulary. There is a well-defined process, found in the TDWG Vocabulary Maintenance Specification (VMS), that lays out the criteria for making decisions about changes to existing terms and adding new terms in TDWG namespaces.

Both Darwin and Audubon Cores "borrow" terms from other vocabularies whose IRIs fall within namespaces outside of the http://rs.tdwg.org/ subdomain. In the case of Darwin Core, the small number of borrowed terms are from Dublin Core, and those terms are exceptionally stable. Audubon Core borrows a large number of terms from numerous other namespaces, and the exact procedure for handling changes in these borrowed terms is not yet entirely clear to the Audubon Core Maintenance Group. Many of these borrowed terms are not managed by a standards organization, so changes in those terms can happen without warning and without public input. Should Audubon Core automatically update the versions of those terms whenever they change? The current policy (decision 2.1) of the Maintenance Group is that those versions NOT be updated automatically unless the Maintenance Group has assessed the likely effect of such changes on the stability and interoperability of Audubon Core. This has not yet been done for any of the existing borrowed terms, so they are effectively "frozen" at the versions which were initially borrowed. The situation is further complicated by the fact that term versioning is not clear in some of the vocabularies that serve as sources for the borrowed terms.

So my question is how this standard will be maintained if it were to be adopted. The terms in the proposed standard are in the ggbn: namespace (http://data.ggbn.org/schemas/ggbn/terms/), which is not within the TDWG-controlled http://rs.tdwg.org/ subdomain. So for all practical purposes, the entire GGBN vocabulary is composed of "borrowed" terms. Presumably the maintenance of these terms falls to GGBN and not TDWG. If GGBN decides to change or add terms, what will be the effect on the actual TDWG standard? The reason for the adoption of the TDWG VMS was to eliminate the ambiguity surrounding the process of vocabulary changes and additions. If we simply say that TDWG is "ratifying" the GGBN standard, what does that mean about the change process for that vocabulary? My impression is that the GGBN vocabulary was developed by a community-based process. Is it GGBN's intention that this will continue, or is GGBN turning over the control of the vocabulary to TDWG? Will there be a GGBN Maintenance Group within TDWG that will serve as the gatekeeper for the changes to the GGBN standard that are actually "accepted" by TDWG? What happens if the GGBN community decides to change some terms, but the TDWG GGBN Maintenance Group rejects those changes based on the criteria set out in the VMS?
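
To make the namespace distinction concrete, here is a minimal Python sketch that sorts term IRIs by whether they fall under http://rs.tdwg.org/. The function name and the two labels it returns are illustrative assumptions and are not part of any TDWG or GGBN tooling; the Darwin Core IRI is included only as an example of a TDWG-minted term.

```python
from urllib.parse import urlparse

# Host of the TDWG-controlled subdomain discussed above.
TDWG_HOST = "rs.tdwg.org"


def namespace_owner(iri: str) -> str:
    """Return "tdwg" if the IRI is under rs.tdwg.org, otherwise "borrowed".

    Hypothetical helper: the labels are illustrative only.
    """
    host = urlparse(iri).hostname or ""
    return "tdwg" if host == TDWG_HOST else "borrowed"


terms = [
    "http://rs.tdwg.org/dwc/terms/catalogNumber",
    "http://data.ggbn.org/schemas/ggbn/terms/purificationMethod",
    "http://gensc.org/ns/mixs/sop",
]
for iri in terms:
    print(namespace_owner(iri), iri)
```

Under this check, every term in the ggbn: and mixs: namespaces comes out as "borrowed", which is the situation the maintenance questions above are about.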

I am not saying that the GGBN Data Standard should not be adopted as a TDWG standard. What I am saying is that it would be a really bad idea to adopt it without clear answers to these questions of vocabulary maintenance. These questions need to be addressed transparently and publicly before this proposal moves further through the ratification process.

I'm also concerned about the level of transparency in the review process itself. The TDWG Process, which constitutes official by-laws of the organization, designates that a Review Manager be appointed to manage the review process. The text of the process document does not specify the exact role that the Review Manager plays in the public review, but the flow chart document indicates that the Review Manager is responsible for managing the public review. In all recent standards ratifications, the person who has been the review manager has had a very public-facing role in managing the ratification process and has served independently from the Executive Committee in managing the review and making determinations about whether the public comments have been addressed by the Task Group proposing the draft. Who is the review manager for this ratification? For that matter, who are the members of the Task Group proposing the draft? The processes specified in the Process document aren't suggestions; they are the actual by-laws of TDWG, and it isn't clear to me that they are being followed here.

It is also the review manager's responsibility to make sure that the proposed standards documents conform to the TDWG Standards Documentation Specification (SDS) before they are submitted for ratification. The proposal should clearly designate specific documents that are to be included in the standard and each of those documents should be examined to make sure that they conform to the SDS. Based on the public review announcement, the "actual specification" is on the TDWG Terms Wiki, but the link I found was to the ggbn wiki. The description on the landing page says it describes "Draft Standard version 1" released in October 2016 but that a new stable version of the standard will be released at the end of 2019. What version are we actually adopting here?

The document at https://wiki.ggbn.org/ggbn/GGBN_Data_Standard_v1 does not conform to the SDS as required. In particular, there is no indication of the Type of the terms (Class, Property, or Concept). Some terms in the vocabulary seem to be properties, while others appear to be controlled vocabulary terms (concepts). The authors need to go through this document and check for conformance to the SDS.

There is another linked document at https://wiki.ggbn.org/ggbn/Mandatory_and_recommended_fields_for_sharing_data_with_GGBN . Is this intended to be included as part of the standard? If so, it should have headers indicating this. There is also no indication of what parts of the document are normative, non-normative, etc. Again, if this document is intended to be part of the standard, it needs to conform to the SDS.

To summarize:

  1. The responsible parties (composition of the task group and the name of the review manager) need to be made public.
  2. This proposal needs to state clearly what documents are included in the standards proposal.
  3. Each document designated as part of the proposed standard needs to conform to the SDS.
  4. There needs to be clarity about how this vocabulary will be maintained in a manner consistent with TDWG policy as laid out in the VMS.

Mapping of terms: MIxS, GGBN, DwC

Hi,

Thanks for the great work!
May I know if there is a page that shows how the terms from MIxS are mapped to GGBN and Darwin Core?
Thank you!

Class:GGBN Permit Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in a broader context too. The permit vocabulary terms are actually specific to the legal aspects of sample acquisition, loaning, and use.
Repeatable No

Class:GGBN Material Sample Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in other molecular contexts. The material sample vocabulary terms are specific to basic lab facts about a physical DNA or tissue sample. It also contains terms from MIxS.
Repeatable No

Class:GGBN Preparation Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in a broader context. The preparation vocabulary terms are specific to the aspects of specimen or tissue sample preparation or DNA extraction (handled as a preparation).
Repeatable No

Property:mixs:finishing_strategy

Label finishing strategy
Term mixs:finishing_strategy
IRI http://gensc.org/ns/mixs/finishing_strategy
Definition Was the genome project intended to produce a complete or draft genome, Coverage, the fold coverage of the sequencing expressed as 2x, 3x, 18x etc, and how many contigs were produced for the genome
Format text
Required No
Repeatable Yes

Property:mixs:chimera_check

Label chimera check
Term mixs:chimera_check
IRI http://gensc.org/ns/mixs/chimera_check
Definition A chimeric sequence, or chimera for short, is a sequence comprised of two or more phylogenetically distinct parent sequences. Chimeras are usually PCR artifacts thought to occur when a prematurely terminated amplicon reanneals to a foreign DNA strand and is copied to completion in the following PCR cycles. The point at which the chimeric sequence changes from one parent to the next is called the breakpoint or conversion point
Format text
Required No
Repeatable No

Property:mixs:adapters

Label adapters
Term mixs:adapters
IRI http://gensc.org/ns/mixs/adapters
Definition Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters
Format text
Required No
Repeatable Yes

Property:mixs:sop

Label relevant standard operating procedures
Term mixs:sop
IRI http://gensc.org/ns/mixs/sop
Definition Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences
Format text
Required No
Repeatable Yes

Property:ggbn:purificationMethod

Label Purification Method (DNA or Amplification)
Term ggbn:purificationMethod
IRI http://data.ggbn.org/schemas/ggbn/terms/purificationMethod
Definition Method or protocol used for secondary purification of already extracted genomic DNA or of PCR product
Format text
Required No
Repeatable Yes
Examples QIAamp DNA Mini Kit
Usage no controlled vocabulary; this element should be used both for DNA and Amplification product; repeatable only within GGBN Amplification Vocabulary
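
As an illustration of how the Term and IRI fields relate, here is a minimal Python sketch that expands a prefixed term name into its full IRI. The ggbn: and mixs: namespace IRIs are taken from the term records on this page; the `PREFIXES` table and the `expand_term` function are hypothetical names introduced for this example.

```python
# Namespace IRIs as they appear in the term records on this page.
PREFIXES = {
    "ggbn": "http://data.ggbn.org/schemas/ggbn/terms/",
    "mixs": "http://gensc.org/ns/mixs/",
}


def expand_term(term: str) -> str:
    """Expand a prefixed name such as "ggbn:purificationMethod" to its IRI.

    Hypothetical helper; raises ValueError for a missing or unknown prefix.
    """
    prefix, sep, local_name = term.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in term: {term!r}")
    return PREFIXES[prefix] + local_name


print(expand_term("ggbn:purificationMethod"))
print(expand_term("mixs:sop"))
```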

Class:GGBN Single Read Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in a broader context. The single read vocabulary terms are specific to the aspects of a single read, including chromatograms.
Repeatable No

Class:GGBN Amplification Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in any other molecular context. The amplification vocabulary terms are specific to the aspects of amplification, sequencing, and genetic accession numbers. It also contains terms from MIxS. It does not cover single read information (see GGBN Single Read Vocabulary).
Repeatable Yes

Property:mixs:assembly

Label assembly
Term mixs:assembly
IRI http://gensc.org/ns/mixs/assembly
Definition How was the assembly done (e.g. with a text based assembler like phrap or a flowgram assembler); estimated error rate associated with the finished sequences (e.g. error rate of 1 in 1000 bp); and the method of calculation
Format text
Required No
Repeatable No

Class:GGBN Loan Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in a broader context. The loan vocabulary terms are specific to the aspects of loaning information on specimens, tissue or DNA samples.
Repeatable Yes
Notes From the perspective of the owner of materials, loans can be characterized as incoming (also can be called “borrows”) or outgoing. The party providing access is thus the lender, and the party receiving access is the borrower. Lenders send outgoing loans, borrowers receive incoming loans.

Property:ggbn:primerName

Label Primer Name
Term ggbn:primerName
IRI http://data.ggbn.org/schemas/ggbn/terms/primerName
Definition Name of primer used for this amplification, cloning, or single read
Format text
Required No
Repeatable Yes
Usage can be used for ABCD implementation; for Darwin Core-Archive implementation please use ggbn:primerNameForward or ggbn:primerNameReverse
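
The usage note above can be sketched in Python as a small selection function. The function name, the "abcd" and "dwca" labels, and the "forward"/"reverse" direction values are assumptions made for this example; only the three term names come from the usage note.

```python
from typing import Optional


def primer_name_term(implementation: str, direction: Optional[str] = None) -> str:
    """Pick the primer-name term for a given implementation.

    implementation: "abcd" or "dwca" (labels are assumptions of this sketch).
    direction: "forward" or "reverse"; only needed for "dwca".
    """
    if implementation == "abcd":
        return "ggbn:primerName"
    if implementation == "dwca":
        if direction == "forward":
            return "ggbn:primerNameForward"
        if direction == "reverse":
            return "ggbn:primerNameReverse"
        raise ValueError("Darwin Core Archive needs direction 'forward' or 'reverse'")
    raise ValueError(f"unknown implementation: {implementation!r}")


print(primer_name_term("abcd"))
print(primer_name_term("dwca", "forward"))
```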

Class:GGBN Gel Image Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in other molecular contexts. The gel image vocabulary terms are specific to gel image facts only.
Repeatable No

Class:GGBN DNA Cloning Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in any other molecular context. The DNA cloning vocabulary terms are specific to the aspects of DNA cloning. It also contains terms from MIxS.
Repeatable Yes

Property:mixs:mid

Label multiplex identifiers
Term mixs:mid
IRI http://gensc.org/ns/mixs/mid
Definition Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters
Format text
Required No
Repeatable Yes
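
Because the definition asks for sequences in uppercase letters, a data provider might normalize values before sharing them. The following is a minimal Python sketch under that assumption; both function names are hypothetical, and no particular sequence alphabet is enforced.

```python
def normalize_sequence(value: str) -> str:
    """Strip surrounding whitespace and upper-case a reported sequence."""
    return value.strip().upper()


def is_reported_uppercase(value: str) -> bool:
    """Return True if the value is non-empty and contains no lowercase letters."""
    stripped = value.strip()
    return stripped != "" and stripped == stripped.upper()


print(normalize_sequence(" acgtacgt "))
print(is_reported_uppercase("ACGTacgt"))
```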

Good work

I have reviewed the GGBN standard, and I don't find any issues that prevent it from being ratified by TDWG. I think GGBN makes a fine addition to the TDWG and GSC standards collections, especially in how it bridges those two organizations and others.

One suggestion, which does not directly impact the standard vocabularies, is to move away from the use of "tissue sample". Most tissue samples (especially outside humans) are mixtures of many tissues, either an organ (like a leaf) or part of an organ (like a tail or toe clip). For example, in GSC's Plant specimen contextual data consensus (https://doi.org/10.1093/gigascience/giw002), we replaced "tissue type" with "plant structure" so that researchers could be more specific in describing the type of specimen that came from an organism.

Class:GGBN Preservation Vocabulary

Definition These terms have been developed for specific GGBN purposes but can be used in a broader context. The preservation vocabulary terms are specific to the aspects of sample preservation in a physical collection. This is to cover change of preservation during time.
Repeatable Yes

Property:mixs:seq_quality_check

Label sequence quality check
Term mixs:seq_quality_check
IRI http://gensc.org/ns/mixs/seq_quality_check
Definition Indicate if the sequence has been called by automatic systems (none) or undergone a manual editing procedure (e.g. by inspecting the raw data or chromatograms). Applied only for sequences that are not submitted to SRA or DRA e.g. none or manually edited
Format text
Required No
Repeatable Yes
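
The definition gives "none" and "manually edited" as example values. A minimal Python sketch of a check against those examples follows; since the format is free text, treating the two examples as the complete allowed set is an assumption of this sketch, and the names are hypothetical.

```python
# Example values quoted in the definition. Treating them as the complete
# allowed set is an assumption of this sketch; the field format is free text.
EXAMPLE_VALUES = {"none", "manually edited"}


def matches_example_value(value: str) -> bool:
    """Return True if the value equals one of the example values, ignoring case."""
    return value.strip().lower() in EXAMPLE_VALUES


print(matches_example_value("Manually edited"))
print(matches_example_value("checked"))
```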
