Comments (7)

huddlej commented on June 21, 2024

@joverlee521 Maybe we can work on this together? It seems like a good opportunity for me to learn more about fauna's internal workings...

from fauna.

joverlee521 commented on June 21, 2024

Yup, I would want to keep the human-specific parsing in each respective upload script because I'm expecting each CC to provide the data in a different format. If any parsing logic can be shared, we can refactor it into a new function.

joverlee521 commented on June 21, 2024

Here's the current parsing of the serum passage category for CDC titers:

  1. The original sr_passage column in the CDC TSV is mapped to serum_antigen_passage.
  2. Within tdb/cdc_upload, the serum_antigen_passage column is used to infer serum_passage_category.
  3. The format_passage method is inherited from vdb/flu_upload, which uses a series of regexes to parse the passage category.

We can special-case the human pool titers and use the lot_number to set the serum_passage_category. (lot_number is the column that holds names like 21/22 H3-EGG HUMAN POOL at this point, because the serum_id formatting happens after the serum passage formatting.)
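
As a rough illustration of that special case, here is a minimal, standalone sketch. It is not fauna's actual code: the helper names (`human_pool_passage_category`, `apply_human_pool_special_case`), the regex, and the assumption that pool names carry an `EGG` or `CELL` token before `HUMAN POOL` are all hypothetical, inferred only from the single example name `21/22 H3-EGG HUMAN POOL`.

```python
import re

# Hypothetical pattern, based only on the example "21/22 H3-EGG HUMAN POOL".
# The "CELL" alternative is an assumption; real CDC names may differ.
HUMAN_POOL_PATTERN = re.compile(
    r"-(?P<passage>EGG|CELL)\s+HUMAN\s+POOL", re.IGNORECASE
)


def human_pool_passage_category(lot_number):
    """Return 'egg' or 'cell' for a human pool name, or None if it does not match."""
    if not lot_number:
        return None
    match = HUMAN_POOL_PATTERN.search(lot_number)
    if match is None:
        return None
    return match.group("passage").lower()


def apply_human_pool_special_case(meas):
    """Set serum_passage_category from lot_number for human pool titers only.

    Non-matching measurements are left untouched so the regular
    passage parsing can still handle them.
    """
    category = human_pool_passage_category(meas.get("lot_number"))
    if category is not None:
        meas["serum_passage_category"] = category
    return meas
```

Calling `apply_human_pool_special_case({"lot_number": "21/22 H3-EGG HUMAN POOL"})` would set `serum_passage_category` to `"egg"`, while a measurement whose lot_number does not look like a human pool passes through unchanged.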

huddlej commented on June 21, 2024

Thank you for laying out the steps so clearly, @joverlee521! Special casing the human pool titers sounds reasonable. Would that logic live in the format_passage function?

joverlee521 commented on June 21, 2024

> Special casing the human pool titers sounds reasonable. Would that logic live in the format_passage function?

Hmm, I'm a little hesitant to make format_passage any more complicated 😅
Maybe we can just keep all the human pool specific logic in one place within tdb/cdc_upload:

diff --git a/tdb/cdc_upload.py b/tdb/cdc_upload.py
index 3a007c2..7aa6b3d 100644
--- a/tdb/cdc_upload.py
+++ b/tdb/cdc_upload.py
@@ -72,6 +72,7 @@ class cdc_upload(upload):
                 self.test_virus_strains.add(meas['virus_strain'])
             if "Human" in meas['serum_id']:
                 meas['serum_host'] = 'human'
+                self.format_passage(meas, 'serum_id', 'serum_passage_category')
             self.rethink_io.check_optional_attributes(meas, self.optional_fields)
             self.remove_fields(meas)
         if len(self.new_different_date_format) > 0:

huddlej commented on June 21, 2024

I know what you mean! That function is among the hairier I've seen in this repo. If we start getting human data from other CCs, though, would you want to encode the human-specific parsing in each respective upload script? Or just refactor any shared parsing logic into a new function when we need to?

huddlej commented on June 21, 2024

Sounds good to me!
