
profiling-template's Issues

Add experiment.yml config

This file should be split out from config.yaml, which currently mixes experimental data and pipeline info.

The experiment.yml file should specify the following:

  • batch
  • plate
  • process (true/false)
  • pipeline (this is new! With a split of pipeline and data, we should have the ability to process different plates with different and/or multiple pipelines; see the sketch below)
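
A minimal sketch of what such an experiment.yml could look like. The key names and layout here are hypothetical, not a settled schema; the plate IDs are illustrative:

# experiment.yml -- illustrative only; field names are not finalized
batch: 2020_11_04_Batch1
plates:
  - name: SQ00014610          # plate identifier
    process: true             # whether to process this plate
    pipelines:                # one or more pipelines per plate
      - pipeline_default.yml
  - name: SQ00014611
    process: false
    pipelines:
      - pipeline_default.yml
      - pipeline_alternate.yml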

frame.append method deprecation

When generating profiles, this warning comes up repeatedly:

/home/ubuntu/work/projects/{PROJECT}/workspace/software/{PROJECT}/profiling-recipe/profiles/profile.py:636: FutureWarning:

The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
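
The fix is mechanical. Assuming profile.py builds up a DataFrame by appending inside a loop (the actual code around line 636 may differ), the pattern changes roughly like this:

import pandas as pd

# stand-in for the per-plate frames that profile.py accumulates
per_plate_frames = [pd.DataFrame({"a": [1]}), pd.DataFrame({"a": [2]})]

# deprecated pattern: result = result.append(df) inside a loop
# replacement: collect the pieces in a list, then concatenate once
result = pd.concat(per_plate_frames, ignore_index=True)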

Revisit the instructions to fork profiling-recipe

A major shortcoming of the current approach is that forking profiling-recipe means you are limited to one fork per GitHub account. Consider this scenario:

  1. I am user-A and I create a repo user-A/dataset-1 using cytomining/profiling-template
  2. I then fork cytomining/profiling-recipe into user-A/profiling-recipe and weld it to user-A/dataset-1, following the instructions in the README
  3. I later want to create a new repo user-A/dataset-2 using cytomining/profiling-template
  4. I now have to use the same fork user-A/profiling-recipe and weld it to user-A/dataset-2

So user-A/dataset-1 and user-A/dataset-2 have to use the same fork user-A/profiling-recipe; there is no way around this.

One could create a branch per dataset, i.e.,

  • user-A/profiling-recipe has a branch dataset-1 which is welded to user-A/dataset-1
  • user-A/profiling-recipe has a branch dataset-2 which is welded to user-A/dataset-2

and so on. But this is getting messy!
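
If we do go that route, the mechanics would look roughly like this (a sketch only; the repo and branch names are taken from the scenario above):

git clone git@github.com:user-A/profiling-recipe.git
cd profiling-recipe
git checkout -b dataset-2       # branch dedicated to user-A/dataset-2
git push -u origin dataset-2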

Did you already ponder this, @gwaygenomics?

Config file migration decisions

Currently, we use a single config yaml file called config.yml.

We should consider splitting this file into three different config yaml files:

| File | Contents | Notes |
| --- | --- | --- |
| pipeline.yml | Modular block design of pipeline steps | Depends on cytomining/profiling-recipe#11 and cytomining/profiling-recipe#12 |
| experiment.yml | Batch and plate info, plus the decision on whether or not to process | This is the primary reason for the config split: for large experiments, the plate and batch specifications can get huge! |
| advanced.yml | Anything outside of the block design and plate info | e.g. #8 |
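
To make the proposal concrete, a hypothetical pipeline.yml under this split might look like the sketch below. The block layout is invented for illustration (it depends on how cytomining/profiling-recipe#11 and cytomining/profiling-recipe#12 land), though the method names mirror existing pycytominer options:

pipeline:                       # hypothetical modular block design
  - step: aggregate
    operation: median
  - step: normalize
    method: mad_robustize
  - step: feature_select
    operations:
      - variance_threshold
      - correlation_threshold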

Add instructions for using DVC for versioning data

@gwaygenomics said this in broadinstitute/lincs-cell-painting#60 (comment):

We might at some point also consider moving from Git LFS to dvc. It was super easy to get set up, and it plays very nicely with AWS. I did this in the grit-benchmark repo (in broadinstitute/grit-benchmark#28).

The file pointer is in a readable format (a YAML file):

outs:
- md5: c53856c1596f00a67a636389716d8219
  size: 26948901
  path: cellhealth_single_cell_umap_embeddings_SQ00014610_chr2.tsv.gz

Steps

  1. Read the docs https://dvc.org/doc/start
  2. Create a destination prefix (a "folder") on S3, which will be the remote storage location for dvc.
  3. Add the dvc and dvc[s3] dependencies
  4. Update your .gitignore so it ignores the files you previously tracked with Git LFS
  5. Follow steps here https://dvc.org/doc/start and here https://dvc.org/doc/start/data-versioning
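
Concretely, the setup might look like the following sketch. The bucket, prefix, and file paths are placeholders; the dvc commands themselves are standard:

# install DVC with S3 support
pip install "dvc[s3]"

# initialize DVC in the repo and register the S3 prefix as the default remote
dvc init
dvc remote add -d s3remote s3://my-bucket/profiling-data

# track a large file with DVC instead of Git LFS; dvc add writes a .dvc
# pointer file and appends the data file to .gitignore
dvc add profiles/plate1.csv.gz
git add profiles/plate1.csv.gz.dvc .gitignore
git commit -m "Track profiles with DVC"

# upload the data to the S3 remote
dvc push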

Move JUMP pilot data documentation to cytomining/profiling-template

A lot of valuable documentation and notes have been taken in https://github.com/jump-cellpainting/pilot-data (private repo).

We should port the common documentation from the JUMP repo over to this upstream repo. Some of the documentation added in the JUMP repo is likely too specific and/or proprietary; only the common documentation should be ported.

Also note that many of the instructions in pilot-data will be deprecated once the single-cell aggregation options are improved via cytomining/pycytominer#111 and cytomining/pycytominer#112.

plotly and kaleido not available in conda environment

I'm not sure why this happens, but when I generate profiles on a new machine, I get errors that the plotly and kaleido modules cannot be found, despite the profiling environment being active and despite plotly and kaleido clearly appearing in conda list. The workaround is to install them with pip (pip install plotly kaleido) rather than conda, but it is very strange that they would not be importable from the conda environment. I don't know the root cause, but I'm flagging this in case anyone else hits the same error.
