cytomining / profiling-template
Image-based Profiling Template
License: BSD 3-Clause "New" or "Revised" License
This file should be abstracted away from the config.yaml. Currently, the config.yaml contains both experimental data and pipeline info.
The experiment.yml file should specify the following:
When generating profiles, this warning comes up repeatedly:
/home/ubuntu/work/projects/{PROJECT}/workspace/software/{PROJECT}/profiling-recipe/profiles/profile.py:636: FutureWarning:
The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
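The change the warning asks for can be sketched as follows. This is a minimal illustration, not the actual code at profile.py:636 (which is not shown here): it assumes the recipe builds up a frame by appending one DataFrame at a time. The function name `combine_profiles` and the column names are hypothetical.

```python
import pandas as pd


def combine_profiles(frames):
    """Concatenate a list of DataFrames in a single call.

    Stands in for a loop that grows a frame with DataFrame.append,
    which pandas deprecated in favor of pandas.concat.
    """
    return pd.concat(frames, ignore_index=True)


# Hypothetical per-plate frames; the column names are illustrative only.
plate_1 = pd.DataFrame({"Metadata_Plate": ["plate_1"], "Cells_AreaShape_Area": [101.0]})
plate_2 = pd.DataFrame({"Metadata_Plate": ["plate_2"], "Cells_AreaShape_Area": [98.5]})

combined = combine_profiles([plate_1, plate_2])
print(combined)
```

Collecting the frames in a list and concatenating once also avoids copying the accumulated frame on every iteration, which is what repeated appends do.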
A major shortcoming of the current approach is that forking profiling-recipe means you are limited to one fork (for a given GitHub account). Consider this scenario:
1. I am user-A and I create a repo user-A/dataset-1 using cytomining/profiling-template.
2. I fork cytomining/profiling-recipe into user-A/profiling-recipe and weld it to user-A/dataset-1, following the instructions in README.
3. I create a second repo user-A/dataset-2 using cytomining/profiling-template.
4. I reuse user-A/profiling-recipe and weld it to user-A/dataset-2.

So user-A/dataset-1 and user-A/dataset-2 have to use the same fork user-A/profiling-recipe; there is no way around this.
One could create a branch per dataset, i.e.,

- user-A/profiling-recipe has a branch dataset-1, which is welded to user-A/dataset-1
- user-A/profiling-recipe has a branch dataset-2, which is welded to user-A/dataset-2

and so on. But this is getting messy!
Did you already ponder this @gwaygenomics ?
We need to run submodule update after every git pull so that the recipe is updated correctly:
git pull
git submodule update --init --recursive
Currently, we use a single config yaml file called config.yml.
We should consider splitting this file into three different config yaml files:
| File | Contents | Notes |
|---|---|---|
| pipeline.yml | Modular block design of pipeline steps | Depends on cytomining/profiling-recipe#11 and cytomining/profiling-recipe#12 |
| experiment.yml | Batch and plate info, decision on whether or not to process | This is the primary reason for the config split. For large experiments, the plate and batch specifications can get huge! |
| advanced.yml | Anything outside of the block design and plate info | e.g. #8 |
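One way the three files could be recombined at load time is sketched below. It assumes each file has already been parsed into a dictionary (the YAML parsing step is omitted), and every key name in the example is hypothetical. `merge_configs` is an illustrative helper, not a function in profiling-recipe.

```python
def merge_configs(**sections):
    """Merge parsed config sections into one dict.

    Raises ValueError if two files define the same top-level key,
    so that a setting cannot silently live in two places.
    """
    merged = {}
    owner = {}
    for source, section in sections.items():
        for key, value in section.items():
            if key in merged:
                raise ValueError(
                    f"'{key}' is defined in both {owner[key]} and {source}"
                )
            merged[key] = value
            owner[key] = source
    return merged


# Hypothetical contents standing in for the three parsed YAML files.
pipeline = {"steps": ["aggregate", "annotate", "normalize"]}
experiment = {"batches": {"batch_1": ["plate_1", "plate_2"]}, "process": True}
advanced = {"plate_column": "Assay_Plate_Barcode"}

config = merge_configs(pipeline=pipeline, experiment=experiment, advanced=advanced)
print(sorted(config))
```

Failing on duplicate keys is a design choice for this sketch; a real loader might instead define a precedence order between the files.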
In https://github.com/jump-cellpainting/profiling-recipe/pull/14/files#diff-b9d876056b6394f85313599f3b2cf8500f517537c9a8f519afb74c6b13712f0aR47 (note, private repo) the variable Assay_Plate_Barcode is hardcoded.
We should add this variable to an advanced.yaml config file to avoid the scenario in which this column name changes, breaking the pipeline.
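A minimal sketch of what reading the column name from config could look like, assuming the advanced config has been parsed into a dictionary. The key `plate_column` and the helper name are hypothetical; only the default value `Assay_Plate_Barcode` comes from the issue above.

```python
# Current hardcoded value, kept as the fallback so existing configs keep working.
DEFAULT_PLATE_COLUMN = "Assay_Plate_Barcode"


def get_plate_column(advanced_config):
    """Return the plate barcode column name, falling back to the default."""
    return advanced_config.get("plate_column", DEFAULT_PLATE_COLUMN)


# No override: the pipeline behaves as it does today.
print(get_plate_column({}))

# Hypothetical override for a dataset that names the column differently.
print(get_plate_column({"plate_column": "Plate_Barcode"}))
```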
I created this repo by copying nearly verbatim from https://github.com/broadinstitute/pooled-cell-painting-profiling-template. We should figure out how to attribute this work to the author (@gwaygenomics) and related discussions (broadinstitute/pooled-cell-painting-profiling-recipe#3)
@gwaygenomics said this in broadinstitute/lincs-cell-painting#60 (comment):
We might at some point also consider moving from GitLFS to dvc. It was super easy to set up, and plays very nicely with AWS. I did this in the grit-benchmark repo (in broadinstitute/grit-benchmark#28)
The file pointer is in a readable format (YAML file):
outs:
- md5: c53856c1596f00a67a636389716d8219
  size: 26948901
  path: cellhealth_single_cell_umap_embeddings_SQ00014610_chr2.tsv.gz
- dvc and dvcs3 dependencies
- .gitignore to ignore the files you previously tracked using GitLFS

A lot of valuable documentation and notes have been taken in https://github.com/jump-cellpainting/pilot-data (private repo).
We should port over the common documentation from the JUMP repo to this upstream branch. It is likely that some of the documentation added in the JUMP repo is too specific and/or proprietary. Only the common documentation should be ported.
Also note that many of the instructions in pilot-data will be deprecated once the single cell aggregation options are improved via cytomining/pycytominer#111 and cytomining/pycytominer#112.
This flag is currently only available at the batch level
I'm not sure why this happens, but when I generate profiles on a new machine, I've noticed that despite the profiling environment being active and despite plotly and kaleido clearly appearing in conda list, I get errors about the plotly and kaleido modules not being found. The workaround is to pip install them rather than conda install them, but it is very strange that they wouldn't be available in the conda environment. Not sure what the proper fix is, but flagging this for future people in case anyone else sees this error.
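A quick diagnostic that may help narrow this down is sketched below. It does not explain the cause. It only reports which Python interpreter is running and whether that interpreter can locate each module, which would show whether the script is being run by a Python outside the active conda environment. The helper name `locate_modules` is hypothetical.

```python
import importlib.util
import sys


def locate_modules(names):
    """Map each top-level module name to the path it would load from.

    The value is None when the running interpreter cannot find the module.
    """
    locations = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        locations[name] = spec.origin if spec is not None else None
    return locations


# The interpreter path should point inside the active conda environment.
print("interpreter:", sys.executable)
for module, origin in locate_modules(["plotly", "kaleido"]).items():
    print(module, "->", origin)
```

If the interpreter path is not the one inside the profiling environment, the modules listed by conda list belong to a different Python than the one executing the recipe.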