Comments (7)
@joverlee521 Maybe we can work on this together? It seems like a good opportunity for me to learn more about fauna's internal workings...
from fauna.
Yup, I would want to keep the human-specific parsing in each respective upload script because I'm expecting each CC to provide them in different formats... If there's any parsing logic that can be shared, then we can refactor it into a new function.
Here's the current parsing of the serum passage category for CDC titers:
- The original `sr_passage` column in the CDC TSV is mapped to `serum_antigen_passage`.
- Within tdb/cdc_upload, the `serum_antigen_passage` column is used to infer `serum_passage_category`.
- The `format_passage` method is inherited from vdb/flu_upload, which uses a series of regexes to parse the passage category.
We can special case the human pool titers and use the `lot_number` to format the `serum_passage_category`. (`lot_number` is the column that contains the names like `21/22 H3-EGG HUMAN POOL` since the `serum_id` formatting happens after the serum passage formatting)
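A minimal sketch of what that special case might look like, assuming the passage type appears as an `EGG` or `CELL` token inside the lot number. The helper name `human_pool_passage_category`, the `CELL` variant, and the returned category strings are assumptions for illustration, not existing fauna code:

```python
import re

# Hypothetical helper, not part of fauna. It assumes human pool lot numbers
# look like "21/22 H3-EGG HUMAN POOL" and that the passage type shows up
# as a standalone EGG or CELL token.
HUMAN_POOL_PATTERN = re.compile(r"HUMAN\s+POOL", re.IGNORECASE)


def human_pool_passage_category(lot_number):
    """Return a passage category for a human pool lot number.

    Returns None when the lot number does not describe a human pool, so the
    caller can fall back to the regular passage parsing.
    """
    if not lot_number or not HUMAN_POOL_PATTERN.search(lot_number):
        return None
    upper = lot_number.upper()
    if re.search(r"\bEGG\b", upper):
        return "egg"
    if re.search(r"\bCELL\b", upper):
        return "cell"
    # Human pool without a recognizable passage token (assumed fallback).
    return "undetermined"


print(human_pool_passage_category("21/22 H3-EGG HUMAN POOL"))
```

Returning `None` for non-human lots keeps the special case separate from the existing regex-based parsing, which would still handle every other serum.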
Thank you for laying out the steps so clearly, @joverlee521! Special casing the human pool titers sounds reasonable. Would that logic live in the `format_passage` function?
> Special casing the human pool titers sounds reasonable. Would that logic live in the `format_passage` function?

Hmm, I'm a little hesitant to make `format_passage` any more complicated 😅
Maybe we can just keep all the human pool specific logic in one place within tdb/cdc_upload:
diff --git a/tdb/cdc_upload.py b/tdb/cdc_upload.py
index 3a007c2..7aa6b3d 100644
--- a/tdb/cdc_upload.py
+++ b/tdb/cdc_upload.py
@@ -72,6 +72,7 @@ class cdc_upload(upload):
             self.test_virus_strains.add(meas['virus_strain'])
             if "Human" in meas['serum_id']:
                 meas['serum_host'] = 'human'
+                self.format_passage(meas, 'serum_id', 'serum_passage_category')
             self.rethink_io.check_optional_attributes(meas, self.optional_fields)
             self.remove_fields(meas)
         if len(self.new_different_date_format) > 0:
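To show the effect of the change outside of fauna, here is a rough, self-contained sketch of the same control flow. `format_passage_stub` is a simplified stand-in for the real `format_passage` (which uses many more regexes), and the sample records, including the title-cased serum ID, are hypothetical:

```python
def format_passage_stub(doc, initial_field, new_field):
    """Simplified stand-in for fauna's format_passage (assumption).

    Reads doc[initial_field] and writes a passage category to doc[new_field].
    """
    value = (doc.get(initial_field) or "").upper()
    if "EGG" in value:
        doc[new_field] = "egg"
    elif "CELL" in value:
        doc[new_field] = "cell"
    else:
        doc[new_field] = "undetermined"


def annotate_human_sera(measurements):
    """Set host and passage category for human sera only, as in the diff."""
    for meas in measurements:
        if "Human" in meas["serum_id"]:
            meas["serum_host"] = "human"
            # Re-derive the category from serum_id for human sera only.
            format_passage_stub(meas, "serum_id", "serum_passage_category")
    return measurements


# Hypothetical records: one human pool serum and one ferret serum.
records = annotate_human_sera([
    {"serum_id": "21/22 H3-Egg Human Pool", "serum_passage_category": "undetermined"},
    {"serum_id": "F23/21", "serum_host": "ferret", "serum_passage_category": "cell"},
])
for record in records:
    print(record["serum_id"], record["serum_host"], record["serum_passage_category"])
```

Only the human record is touched, so the category already inferred for the ferret serum is left as it was.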
I know what you mean! That function is among the hairier I've seen in this repo. If we start getting human data from other CCs, though, would you want to encode the human-specific parsing in each respective upload script? Or just refactor any shared parsing logic into a new function when we need to?
Sounds good to me!