Comments (10)

chacalle avatar chacalle commented on May 28, 2024

In the tsv_header branch, parse_tsv_file uses the header to define the document fields. I couldn't find a good way to ensure that the file actually has a header, though.
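One possible check, as a rough sketch only: compare the first row against a list of field names that are allowed to appear. `KNOWN_FIELDS` and `has_known_header` are hypothetical names for illustration, not part of fauna, and the approach assumes such a list exists for the table.

```python
import csv
import io

# Hypothetical set of allowed field names; not taken from the fauna code.
KNOWN_FIELDS = {"strain", "virus", "date", "country"}

def has_known_header(handle, known_fields=KNOWN_FIELDS):
    """Return True if every cell in the first row is a known field name."""
    reader = csv.reader(handle, delimiter="\t")
    first_row = next(reader, None)
    if not first_row:
        # Empty file or blank first line: no header to speak of.
        return False
    return all(cell.strip() in known_fields for cell in first_row)

with_header = io.StringIO("strain\tdate\nA/1\t2016-01-01\n")
without_header = io.StringIO("A/1\t2016-01-01\n")
print(has_known_header(with_header), has_known_header(without_header))
```

This only catches a missing header when the data values do not happen to look like field names, so it is a sanity check rather than a guarantee.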

from fauna.

trvrb avatar trvrb commented on May 28, 2024

Looks good. Thanks @chacalle.

trvrb avatar trvrb commented on May 28, 2024

@chacalle ---

I just noticed that if you now upload a tsv with header fields a, b, c against a table whose documents have fields a, b, c, d, e, the newly uploaded documents lack fields d and e. They should have fields d and e, set to null.

chacalle avatar chacalle commented on May 28, 2024

Yes. One option is to define all the fields in the code, like here, and then use that list to set any field missing from a document to null.

But now that we don't filter the documents on any field besides strain, we may not need to define all the fields for each table. Instead, we could look through all the documents already in the database, build the set of fields that are present, and then make sure the documents to be inserted have those fields, assigning null where they are missing.
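A minimal sketch of that second idea, working on plain Python dicts rather than on the database itself. `fill_missing_fields` is a hypothetical helper name, and fetching `existing_docs` from the table is left out.

```python
def fill_missing_fields(existing_docs, new_docs):
    """Give each new document every field seen in existing_docs, defaulting to None."""
    known_fields = set()
    for doc in existing_docs:
        known_fields.update(doc.keys())
    # Start from an all-None template, then overlay the document's own values.
    return [{**{field: None for field in known_fields}, **doc} for doc in new_docs]

existing = [{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}]
incoming = [{"a": 9, "b": 8, "c": 7}]
print(fill_missing_fields(existing, incoming))
```

Fields that appear only in the new documents are kept as they are; only fields missing from them are added.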

trvrb avatar trvrb commented on May 28, 2024

I think I'm changing my mind here (sorry, sorry). Let's have an easy way to define a schema for a table. For example:

https://github.com/blab/nextstrain-db/blob/master/vdb/zibra_download.py#L14

An individual tsv may have some subset of these fields in its header in whatever order.
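Roughly what I have in mind, as a sketch rather than the actual vdb code. The `SCHEMA` list and `parse_with_schema` are made-up names for illustration, and the field names are placeholders.

```python
import csv
import io

# Hypothetical table schema; the real field list would live with the table's code.
SCHEMA = ["strain", "date", "country", "minion_barcode"]

def parse_with_schema(handle, schema=SCHEMA):
    """Parse a tsv whose header is any subset of schema, in any order."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    unknown = [field for field in header if field not in schema]
    if unknown:
        raise ValueError("fields not in schema: " + ", ".join(unknown))
    # Every document gets every schema field; absent ones become None.
    return [{field: row.get(field) for field in schema} for row in reader]

tsv = io.StringIO("date\tstrain\n2016-01-01\tA/1\n")
print(parse_with_schema(tsv))
```

Rejecting unknown header fields is one choice; silently dropping them would be another.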

chacalle avatar chacalle commented on May 28, 2024

I'm not sure I understand this.

chacalle avatar chacalle commented on May 28, 2024

Thinking about this more, one of the strengths of rethinkdb/NoSQL databases is that the documents inserted can be flexible in their schema. It doesn't really matter if a document is missing a field; in chateau it's displayed as undefined. We'd have to write a simple filter function for download.py, but this would allow more flexibility when adding new documents to the database with new fields, like you're doing for zibra. This also goes along with making filter in upload.py more flexible.
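Something like the following could serve as that filter, as a sketch only. `filter_docs` is a hypothetical name and this is not the code in download.py; it treats a missing field as a non-match instead of raising an error.

```python
_MISSING = object()  # sentinel so a missing field never equals a wanted value, even None

def filter_docs(docs, criteria):
    """Keep documents whose fields equal every value in criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field, _MISSING) == wanted for field, wanted in criteria.items())
    ]

docs = [
    {"strain": "A", "country": "brazil"},
    {"strain": "B"},  # no country field at all
]
print(filter_docs(docs, {"country": "brazil"}))
```

With an empty `criteria` every document passes, so documents with missing fields are only dropped when the filter actually asks about that field.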

trvrb avatar trvrb commented on May 28, 2024

Hmm... what's happening is that the schema is rapidly evolving. So, for example, I just added a field for minion_barcode. Doing a vdb/upload with a set of new documents that have this field resulted in only the new documents having it. We need to either:

  1. Allow missing fields within documents and have vdb handle this situation gracefully. Chateau seems to work pretty okay.
  2. When adding a new field, update previous documents to have a value of null for this new field.
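Option 2 could look roughly like this, sketched on in-memory dicts. `backfill_field` is a hypothetical helper; against the real table this would be an update query rather than a Python loop.

```python
def backfill_field(docs, field, default=None):
    """Add field to every document that lacks it; return how many were changed."""
    changed = 0
    for doc in docs:
        if field not in doc:
            doc[field] = default
            changed += 1
    return changed

table = [
    {"strain": "A"},                           # older document
    {"strain": "B", "minion_barcode": "NB01"}, # newly uploaded document
]
print(backfill_field(table, "minion_barcode"), table)
```

Documents that already carry the field are left untouched, so the backfill is safe to rerun.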

This is a separate issue from enforcing a schema. At the moment, I like the idea of enforcing a schema, like what's accomplished here:

https://github.com/blab/nextstrain-db/blob/master/vdb/zibra_metadata_upload.py#L16
https://github.com/blab/nextstrain-db/blob/master/vdb/zibra_download.py#L14

This makes it so that I can run zibra_download to get the entire table and zibra_metadata_upload to repair the entire table. This has been very useful.

So, I might suggest having an enforced schema for a table, but make it easy to adjust this schema on the fly (adding and removing fields as appropriate).

chacalle avatar chacalle commented on May 28, 2024

0fc0ed1 filters even when fields are undefined, so documents without a given field get filtered out.

trvrb avatar trvrb commented on May 28, 2024

Closing this as schema has moved on.
