
profiling-template's Issues

Add experiment.yml config

This file should be split out from config.yaml, which currently mixes experimental data and pipeline info.

The experiment.yml file should specify the following:

  • batch
  • plate
  • process (true/false)
  • pipeline (this is new! With a split of pipeline and data, we should have the ability to process different plates with different and/or multiple pipelines; see the sketch below)
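
A minimal sketch of what such an experiment.yml could look like. The key names and layout here are hypothetical, not a settled schema; the plate IDs are illustrative:

# experiment.yml -- illustrative only; field names are not finalized
batch: 2020_11_04_Batch1
plates:
  - name: SQ00014610          # plate identifier
    process: true             # whether to process this plate
    pipelines:                # one or more pipelines per plate
      - pipeline_default.yml
  - name: SQ00014611
    process: false
    pipelines:
      - pipeline_default.yml
      - pipeline_alternate.yml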

frame.append method deprecation

When generating profiles, this warning comes up repeatedly:

/home/ubuntu/work/projects/{PROJECT}/workspace/software/{PROJECT}/profiling-recipe/profiles/profile.py:636: FutureWarning:

The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
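
The fix is mechanical. Assuming profile.py builds up a DataFrame by appending inside a loop (the actual code around line 636 may differ), the pattern changes roughly like this:

import pandas as pd

# stand-in for the per-plate frames that profile.py accumulates
per_plate_frames = [pd.DataFrame({"a": [1]}), pd.DataFrame({"a": [2]})]

# deprecated pattern: result = result.append(df) inside a loop
# replacement: collect the pieces in a list, then concatenate once
result = pd.concat(per_plate_frames, ignore_index=True)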

Revisit the instructions to fork profiling-recipe

A major shortcoming of the current approach is that forking profiling-recipe means you are limited to one fork per GitHub account. Consider this scenario:

  1. I am user-A and I create a repo user-A/dataset-1 using cytomining/profiling-template
  2. I then fork cytomining/profiling-recipe into user-A/profiling-recipe and weld it to user-A/dataset-1, following the instructions in the README
  3. I later want to create a new repo user-A/dataset-2 using cytomining/profiling-template
  4. I now have to use the same fork user-A/profiling-recipe and weld it to user-A/dataset-2

So user-A/dataset-1 and user-A/dataset-2 have to use the same fork user-A/profiling-recipe; there is no way around this.

One could create a branch per dataset, i.e.,

  • user-A/profiling-recipe has a branch dataset-1 which is welded to user-A/dataset-1
  • user-A/profiling-recipe has a branch dataset-2 which is welded to user-A/dataset-2

and so on. But this is getting messy!
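
If we do go that route, the mechanics would look roughly like this (a sketch only; the repo and branch names are taken from the scenario above):

git clone git@github.com:user-A/profiling-recipe.git
cd profiling-recipe
git checkout -b dataset-2       # branch dedicated to user-A/dataset-2
git push -u origin dataset-2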

Did you already ponder this, @gwaygenomics?

Config file migration decisions

Currently, we use a single config yaml file called config.yml.

We should consider splitting this file into three different config yaml files:

| File | Contents | Notes |
| --- | --- | --- |
| pipeline.yml | Modular block design of pipeline steps | Depends on cytomining/profiling-recipe#11 and cytomining/profiling-recipe#12 |
| experiment.yml | Batch and plate info, plus the decision on whether or not to process | This is the primary reason for the config split: for large experiments, the plate and batch specifications can get huge! |
| advanced.yml | Anything outside of the block design and plate info | e.g. #8 |
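
To make the proposal concrete, a hypothetical pipeline.yml under this split might look like the sketch below. The block layout is invented for illustration (it depends on how cytomining/profiling-recipe#11 and cytomining/profiling-recipe#12 land), though the method names mirror existing pycytominer options:

pipeline:                       # hypothetical modular block design
  - step: aggregate
    operation: median
  - step: normalize
    method: mad_robustize
  - step: feature_select
    operations:
      - variance_threshold
      - correlation_threshold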

Add instructions for using DVC for versioning data

@gwaygenomics said this in broadinstitute/lincs-cell-painting#60 (comment):

We might at some point also consider moving from Git LFS to dvc. It was super easy to get set up, and it plays very nicely with AWS. I did this in the grit-benchmark repo (in broadinstitute/grit-benchmark#28).

The file pointer is in a readable format (a YAML file):

outs:
- md5: c53856c1596f00a67a636389716d8219
  size: 26948901
  path: cellhealth_single_cell_umap_embeddings_SQ00014610_chr2.tsv.gz

Steps

  1. Read the docs https://dvc.org/doc/start
  2. Create a destination prefix (a "folder") on S3, which will be the remote storage location for dvc.
  3. Add the dvc and dvc[s3] dependencies
  4. Update your .gitignore so it ignores the files you previously tracked with Git LFS
  5. Follow steps here https://dvc.org/doc/start and here https://dvc.org/doc/start/data-versioning
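
Concretely, the setup might look like the following sketch. The bucket, prefix, and file paths are placeholders; the dvc commands themselves are standard:

# install DVC with S3 support
pip install "dvc[s3]"

# initialize DVC in the repo and register the S3 prefix as the default remote
dvc init
dvc remote add -d s3remote s3://my-bucket/profiling-data

# track a large file with DVC instead of Git LFS; dvc add writes a .dvc
# pointer file and appends the data file to .gitignore
dvc add profiles/plate1.csv.gz
git add profiles/plate1.csv.gz.dvc .gitignore
git commit -m "Track profiles with DVC"

# upload the data to the S3 remote
dvc push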

Move JUMP pilot data documentation to cytomining/profiling-template

A lot of valuable documentation and notes have been taken in https://github.com/jump-cellpainting/pilot-data (private repo).

We should port the common documentation from the JUMP repo over to this upstream repo. Some of the documentation added in the JUMP repo is likely too specific and/or proprietary; only the common documentation should be ported.

Also note that many of the instructions in pilot-data will be deprecated once the single-cell aggregation options are improved via cytomining/pycytominer#111 and cytomining/pycytominer#112.

plotly and kaleido not available in conda environment

I'm not sure why this happens, but when I generate profiles on a new machine, I get errors that the plotly and kaleido modules cannot be found, despite the profiling environment being active and despite plotly and kaleido clearly appearing in conda list. The workaround is to install them with pip (pip install plotly kaleido) rather than conda, but it is very strange that they would not be importable from the conda environment. I don't know the root cause, but I'm flagging this in case anyone else hits the same error.
