Comments (11)
I think there is a bug with updating the `timestamp` field because it should just be the last time the document was updated or inserted. I'll open up a separate issue for that.
from fauna.
So the `inclusion_date` is just the very first time the document is added to the database. This would be formatted like `collection_date`, as `YYYY-MM-DD`.
from fauna.
Yes. Exactly. `inclusion_date` is the first time a document appears and is formatted just like `collection_date`.
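A minimal sketch of that rule, assuming an in-memory dict stands in for the database table. The `upsert` helper and the `subtype` and `host` fields are hypothetical, for illustration only, and are not part of fauna:

```python
def upsert(table, key, fields, today):
    """Insert or update a document in an in-memory 'table' (a plain dict)."""
    doc = table.setdefault(key, {})
    doc.update(fields)
    # Set inclusion_date only when the document first appears;
    # later updates leave the original value untouched.
    doc.setdefault("inclusion_date", today)
    return doc


table = {}
upsert(table, "A/HongKong/4801/2014", {"subtype": "h3n2"}, "2014-04-01")
# A later update changes other fields but not inclusion_date.
upsert(table, "A/HongKong/4801/2014", {"host": "human"}, "2014-08-31")
print(table["A/HongKong/4801/2014"]["inclusion_date"])
```

The second call runs on a later date, yet the stored `inclusion_date` stays at the date of the first insert.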
from fauna.
Another thing to consider: each virus document and sequence document will have an `inclusion_date`. When downloading, we merge together the virus document and sequence document with this command:
command = r.table(sequence_table).merge(lambda sequence: r.table(virus_table).get(sequence[index]))
If the virus and sequence documents have different `inclusion_date` values, then RethinkDB's `merge` command defaults to the rightmost document in the merge, which would keep the virus `inclusion_date`. I think it makes more sense to keep the sequence `inclusion_date` since there might be multiple sequences per virus, but this would require some work on the download side to adjust the merge command.
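To make the conflict concrete, here is a rough sketch that uses plain Python dicts in place of RethinkDB documents. The `merge` helper is a stand-in that only mimics the "rightmost document wins" behavior for top-level fields, and the document values are illustrative:

```python
def merge(left, right):
    """Shallow merge: on a key conflict, the value from `right` wins."""
    return {**left, **right}


virus = {"strain": "A/HongKong/4801/2014", "inclusion_date": "2014-04-01"}
sequence = {
    "accession": "EPI2",
    "strain": "A/HongKong/4801/2014",
    "inclusion_date": "2014-08-31",
}

# Same order as the download command: the virus document is rightmost.
virus_wins = merge(sequence, virus)
# Swapping the order would keep the sequence's inclusion_date instead.
sequence_wins = merge(virus, sequence)

print(virus_wins["inclusion_date"], sequence_wins["inclusion_date"])
```

Either order silently drops one of the two dates, which is the trade-off under discussion here.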
from fauna.
Hmm.... I think I'm okay with attaching `inclusion_date` to virus when downloading FASTAs. As a concrete example, we'll often want to know something like when did `A/HongKong/4801/2014` first appear in the database. More sequences can appear later, but that's not the main interest.
I do like having an `inclusion_date` for each sequence and an `inclusion_date` for each virus in the table. This just becomes a question of how to do the merge when downloading.
from fauna.
Okay so say:
`A/HongKong/4801/2014` has an `inclusion_date` of `2014-04-01`.
The first sequence uploaded with it, `EPI1`, also has an `inclusion_date` of `2014-04-01`.
A second sequence uploaded, `EPI2`, has a later `inclusion_date` of `2014-08-31`.
Right now the command above would download `EPI1` and `EPI2`, and they would both keep `A/HongKong/4801/2014`'s `inclusion_date` of `2014-04-01`.
But this seems to be okay because we care more about when the virus is first uploaded. Both the sequence and virus will have an `inclusion_date` field though.
from fauna.
Hmm.... I see. Thinking more, what if we had `virus_inclusion_date` and `sequence_inclusion_date` fields? The merge could include one or both in the resulting FASTA. Seems a bit cleaner perhaps. What do you think?
from fauna.
I like that! They'd both be left after the merge and can be downloaded to the resulting fasta if needed.
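As a rough illustration of why distinct field names avoid the conflict, the sketch below again uses plain dicts in place of RethinkDB documents. The document values and the `|`-separated header layout are hypothetical and are not fauna's actual FASTA format:

```python
virus = {
    "strain": "A/HongKong/4801/2014",
    "virus_inclusion_date": "2014-04-01",
}
sequence = {
    "accession": "EPI2",
    "strain": "A/HongKong/4801/2014",
    "sequence_inclusion_date": "2014-08-31",
}

# The date fields no longer share a name, so neither overwrites the other.
merged = {**sequence, **virus}

# Hypothetical header layout: choose which date fields to export.
header_fields = ["strain", "accession", "virus_inclusion_date", "sequence_inclusion_date"]
header = ">" + "|".join(merged[name] for name in header_fields)
print(header)
```

Dropping either date from `header_fields` would export only the other one, so the choice moves from the merge step to the download step.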
from fauna.
Exactly. I like it.
from fauna.
I believe this works now. I also added the fields to current documents in `vdb` and `tdb`, defaulted to `2016-09-03`. Also, a reminder that the `inclusion_date` and `timestamp` fields are based off UTC time.
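For reference, a small sketch of deriving both values from UTC with the standard library. The `utc_stamps` helper is hypothetical, and the `timestamp` format shown is an assumption, since only the `YYYY-MM-DD` format of `inclusion_date` is stated in this thread:

```python
from datetime import datetime, timedelta, timezone


def utc_stamps(now=None):
    """Return date fields computed in UTC, whatever the local time zone is."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    return {
        "inclusion_date": now.strftime("%Y-%m-%d"),
        # Assumed format for illustration; the real one may differ.
        "timestamp": now.strftime("%Y-%m-%d-%H-%M"),
    }


# 23:30 on 2016-09-03 at UTC-8 is already 2016-09-04 in UTC.
local_evening = datetime(2016, 9, 3, 23, 30, tzinfo=timezone(timedelta(hours=-8)))
stamps = utc_stamps(local_evening)
print(stamps["inclusion_date"])
```

An upload made late in the evening in a time zone behind UTC can therefore carry the next calendar day as its `inclusion_date`.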
from fauna.
Fantastic! Thanks so much for making this happen @chacalle.
from fauna.