
dcc-metadata's People

Contributors

aksh77, alexey-ebi, davidrichardson, elowy01, lauraclarke, mil-m, peterwharrison, wizardfan, yroochun

dcc-metadata's Issues

FAANG metadata suggestions

@davidrichardson @elowy01

These are some issues I picked up on when reading the final version of the documents. They are stylistic rather than about content, but it would be nice to see them fixed before we push out the final PDFs on the website and to everyone in FAANG. I have also made some phrasing changes directly in the documents. This possibly gets a bit rambly, but I am happy to talk through what I put here when in the office.

These are based on my reading of the FAANG metadata documents found at:

https://github.com/FAANG/faang-metadata/blob/master/docs/

In all docs

We use a very readable description for attributes, in the form:

  • attribute name (data type) a brief description

Not all fields follow this form. I think many could, especially fields whose value is a BioSamples ID.

e.g. from the experiment metadata:

  • sample_id (biosample_id) the biosample id of the sample the experiment was run on

Also, in the list of attributes, some names are in backticks and some are not. This alters the formatting in the PDF inconsistently, so it would be good to make this consistent.
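
One quick way to spot fields that drift from this form, or lists that mix backticked and plain names, is a small lint over the attribute bullets. The sketch below is a hypothetical helper, not part of any FAANG tooling. It assumes each attribute sits on a single line as `name (type) description`, that names are single tokens of letters, digits and underscores, and that a name may optionally be wrapped in backticks.

```python
import re

# Hypothetical pattern for one attribute line: optional bullet marker,
# a name (optionally wrapped in backticks), the data type in parentheses,
# then a non-empty description.
ATTRIBUTE_LINE = re.compile(
    r"^\s*(?:[-*\u2022]\s*)?(?P<tick>`?)(?P<name>[A-Za-z0-9_]+)(?P=tick)\s+"
    r"\((?P<dtype>[^)]+)\)\s+(?P<desc>\S.*)$"
)


def lint_attributes(lines):
    """Return (problems, backticked, plain) for a list of attribute lines.

    problems   -- (line number, text) for lines not in 'name (type) description' form
    backticked -- names written in backticks
    plain      -- names written without backticks
    """
    problems, backticked, plain = [], [], []
    for number, line in enumerate(lines, start=1):
        match = ATTRIBUTE_LINE.match(line)
        if match is None:
            problems.append((number, line.strip()))
            continue
        # Sort well-formed names by whether they used backticks.
        (backticked if match.group("tick") else plain).append(match.group("name"))
    return problems, backticked, plain


if __name__ == "__main__":
    sample = [
        "- sample_id (biosample_id) the biosample id of the sample the experiment was run on",
        "- `assay_type` (text) the type of assay",
        "- library_name the name of the library",
    ]
    problems, backticked, plain = lint_attributes(sample)
    print("not in 'name (type) description' form:", problems)
    if backticked and plain:
        print("mixed backtick usage:", backticked, "vs", plain)
```

The example lines other than `sample_id` are made up for illustration; running the check over the real attribute lists would show which fields still need a data type and where backtick usage differs.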

In the sample doc

Do we want a controlled list of species codes for the field? Something like:

Cow BTA
Pig SSC
Sheep OAR
Chicken GGA
Horse ECA
Goat CHR
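
If the field became a controlled list, validation at submission time would reduce to a lookup. A minimal sketch, assuming the codes above are adopted as-is; `SPECIES_CODES` and `validate_species_code` are hypothetical names, not part of an existing FAANG validator.

```python
# Species codes as proposed above. The list is still open for discussion,
# so treat this mapping as a draft rather than an agreed standard.
SPECIES_CODES = {
    "Cow": "BTA",
    "Pig": "SSC",
    "Sheep": "OAR",
    "Chicken": "GGA",
    "Horse": "ECA",
    "Goat": "CHR",
}


def validate_species_code(code):
    """Return the species name for a code, or raise ValueError if it is unknown."""
    by_code = {value: name for name, value in SPECIES_CODES.items()}
    # Tolerate stray whitespace and lower-case input before the lookup.
    normalised = code.strip().upper()
    if normalised not in by_code:
        allowed = ", ".join(sorted(by_code))
        raise ValueError(f"unknown species code {code!r}; expected one of: {allowed}")
    return by_code[normalised]


print(validate_species_code("ssc"))  # Pig
```

Normalising case before the lookup is a design choice; a stricter validator could instead reject anything that is not already upper case.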

In the experiment doc

The indentation of bullets in the PDF seems a bit skewed. I have tried to improve it, but I am not sure I have succeeded.

Is it reasonable to require the RNA purity info?

For the Hi-C experiments, is there any way to make these use ontology terms or a controlled vocabulary?

When archiving experimental protocols, should we talk to BioStudies about archiving the protocols there rather than just hosting them on the DCC FTP site? This might give us more control over versioning protocols.

In the analysis doc

Do we have a destination for this metadata yet?

This document seems to have no statement about what is or isn't required, or if it does, it is much less obvious than in the sample or experiment documents.

In this list

  1. Input data - a list of files used as input and references to the experiment records in a data archive
  2. Reference data - genome assembly, gene set, etc.
  3. Analysis protocol

You mention references in the input data description (point 1) and reference data at point 2. What are the references in point 1, and how are they different from the references in point 2?

Why are we asking for an explicit statement of the percentage of reads mapped if we get the total read and mapped read numbers? Surely this value can be derived.
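
The derivation is a single ratio, percent mapped = 100 × mapped reads / total reads, so the DCC could compute it on ingest instead of asking submitters for it. A minimal sketch, assuming both submitted counts use the same definition of a read (both count pairs, or both count individual reads); `percent_mapped` is a hypothetical helper and the counts below are illustrative, not real data.

```python
def percent_mapped(total_reads, mapped_reads):
    """Derive the percentage of mapped reads from the two submitted counts.

    Assumes both counts use the same definition of a read
    (both count read pairs, or both count individual reads).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


# Illustrative counts only.
print(f"{percent_mapped(40_000_000, 36_500_000):.2f}%")  # 91.25%
```

The range checks also catch inconsistent submissions, such as a mapped count larger than the total, which a separately reported percentage would not reveal.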

This file lacks data types and submission info. There should be something here, even if it is just a statement that we will collect this information in the first instance.
