Possible approaches for a v2 curation pipeline architecture. The sample pipeline implemented here illustrates the typical work done by the curation code.
Input: a BigQuery OMOP dataset with a few domain tables from 2 sites.
Pipeline stages:
- Merge multiple tables together
- Move data from person table elsewhere
- Row-level table transform
- Retract participants by ID
- Group-by-participant transforms, e.g. remove duplicate measurements, observations
- Generate derived tables, e.g. mapping

Output: BigQuery OMOP CDR dataset.
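
The stages listed above can be sketched in plain Python to show how they chain together. This is only an illustration under stated assumptions: the rows are in-memory dicts rather than BigQuery tables, and the field names (`unit`, `src_site`), the duplicate rule, and every function name are hypothetical, not taken from the curation code. The "move data from person table" stage is left out because its target is not specified here.

```python
# Hypothetical in-memory rows standing in for a measurement table from two sites.
SITE_A = [
    {"person_id": 1, "measurement_id": 10, "value": 5.0, "unit": "mg"},
    {"person_id": 2, "measurement_id": 11, "value": 7.5, "unit": "MG"},
]
SITE_B = [
    {"person_id": 1, "measurement_id": 20, "value": 5.0, "unit": "mg"},
    {"person_id": 3, "measurement_id": 21, "value": 9.1, "unit": "MG"},
]


def merge_tables(tables):
    """Merge per-site tables into one list, tagging each row with its source site."""
    return [{**row, "src_site": site} for site, rows in tables.items() for row in rows]


def transform_row(row):
    """Row-level transform; lowercasing the unit stands in for any per-row rule."""
    return {**row, "unit": row["unit"].lower()}


def retract(rows, retracted_ids):
    """Drop every row that belongs to a retracted participant."""
    return [r for r in rows if r["person_id"] not in retracted_ids]


def dedupe_by_participant(rows):
    """Keep the first row per (person_id, value, unit); an assumed duplicate rule."""
    seen, kept = set(), []
    for r in rows:
        key = (r["person_id"], r["value"], r["unit"])
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def build_mapping(rows):
    """Derived table recording which site each surviving row came from."""
    return [{"measurement_id": r["measurement_id"], "src_site": r["src_site"]} for r in rows]


def run_pipeline(tables, retracted_ids):
    """Run the stages in order and return (cleaned rows, mapping table)."""
    rows = merge_tables(tables)
    rows = [transform_row(r) for r in rows]
    rows = retract(rows, retracted_ids)
    rows = dedupe_by_participant(rows)
    return rows, build_mapping(rows)


if __name__ == "__main__":
    cleaned, mapping = run_pipeline({"site_a": SITE_A, "site_b": SITE_B}, retracted_ids={2})
    for row in cleaned:
        print(row)
    print(mapping)
```

Keeping each stage as a small function that takes rows and returns rows makes the order easy to rearrange, which is one of the things an architecture comparison would need to exercise.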
To load the test tables, run:
./load-test-tables.sh [path/to/curation/repo/root]