mwc_mutants's Introduction

The Energetics of Molecular Adaptation in Transcriptional Regulation

Welcome to the GitHub repository for the MWC mutants project! This repository contains the entire project history as well as curated scripts to make reproducibility feasible.

Branches

This repository consists of three branches -- master, gh-pages, and publication. The master branch is the primary working branch the authors used during the research. It contains all project history, data processing scripts, preliminary figure scripts, and other exploratory data analysis files. The gh-pages branch contains all of the website files, including hosting of the interactive figures. Finally, the publication branch (on which you are reading this) contains all final processed data, figure-generating scripts, data analysis scripts, and the software module mut. If you are interested in reproducing the work in this publication, you can execute the scripts on this branch. Please see the individual directories for more information.

Installation

To reproduce this work, you will need to use the mut module -- a homegrown Python software package written explicitly for this work. The requirements can be installed by executing the following command using pip in the command line:

pip install -r requirements.txt

The software module itself can be installed locally by executing the following command in the root directory:

pip install -e ./

We have written a file install.sh that performs both of these steps. To install, simply run

sh install.sh

from the root directory. Once installation finishes, a new folder mut.egg-info will have been created. This folder is necessary to run any of the code.
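
As a quick sanity check after installation, the sketch below tests whether a package can be found by the active Python environment. The helper name `is_installed` is hypothetical and not part of this repository; it relies only on the standard-library `importlib` module.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be located in the current environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # `mut` should be found once `pip install -e ./` has completed.
    print("mut installed:", is_installed("mut"))
```

If this reports `False`, the most likely cause is that the install commands were run outside the root directory or in a different Python environment.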

Repository Architecture

This repository is broken up into several directories and subdirectories. Please see each directory for information regarding each file.

code

The name says it all. This directory contains all code used in this work to analyze data and generate figures. It is broken up into several subdirectories which separate the scripts by function.

  1. analysis | Contains all Python scripts which perform data analysis procedures. This includes parameter inference and calibration of various statistical models.
  2. figures | Contains all Python scripts which perform data visualization procedures. This includes generation of all main and supplementary text figures as well as the interactive figures on the paper website.
  3. stan | Contains all inferential models and associated functions written in the Stan probabilistic programming language.
  4. processing | This folder contains a single script (example_processing.py) that illustrates how processing of the raw data was performed. It is kept very generic such that only a few parameters at the top of the script need to be changed from experiment to experiment.

data

This directory contains all processed and simulated data used in this work. The raw flow cytometry data is stored on the CaltechDATA research data repository under the DOI: 10.22002/D1.1241.

figures

This folder contains .pdf files of all main and supplementary text figures. No code exists in this directory.

mut

This is the heart and soul of the repository. It contains a series of Python files which define the myriad functions used in the processing, analysis, and visualization of data associated with this work. Please navigate to the directory to see a description of its contents.

tests

We tested a selection of functions we believe are critical to this work. This includes the Gaussian gating routine for processing of flow cytometry data, computation of MWC active probabilities, and scraping of processed datasets to collect only the accepted experiments. This repository uses Travis Continuous Integration, which periodically runs the test functions to ensure that everything passes.
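
To illustrate the kind of quantity these tests cover, here is a minimal sketch of the textbook Monod-Wyman-Changeux (MWC) expression for the probability that an allosteric repressor is in its active state. The function name, argument names, and numerical values are assumptions chosen for illustration; the implementation in mut may be organized differently.

```python
import math


def mwc_active_probability(c, ka, ki, ep_ai, n_sites=2):
    """Probability that an MWC repressor is active at inducer concentration c.

    c, ka, and ki share the same concentration units; ep_ai is the
    active/inactive energy difference in units of k_BT.
    """
    active = (1 + c / ka) ** n_sites
    inactive = math.exp(-ep_ai) * (1 + c / ki) ** n_sites
    return active / (active + inactive)


# Illustrative parameter values only (not taken from this repository).
for conc in (0.0, 10.0, 1000.0):
    p = mwc_active_probability(conc, ka=139.0, ki=0.53, ep_ai=4.5)
    print(f"c = {conc:7.1f}  p_act = {p:.4f}")
```

With these values the inducer binds the inactive state more tightly (ki < ka), so the active probability falls as the inducer concentration rises.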

License

All creative works (writing, figures, etc.) are licensed under the Creative Commons CC-BY 4.0 license. All software is distributed under the standard MIT license as follows:

Copyright 2019 The Authors 

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

mwc_mutants's People

Contributors

gchure, mrazomej, sloosbarnes

mwc_mutants's Issues

Initial Meeting

Here's a list of what needs to be done and what we learned from the last round.

Overall, the paper should focus on how changes in the protein correlate to changes in the induction profiles / data collapse.

Strain Construction

We have three DNA binding domain mutants and three inducer binding domain mutants:

  • Q21M, Q21A, Y20I -- DNA binding domain mutants
  • Q294V, Q294K, F164T -- Inducer binding domain mutants.
  • All pairwise combinations between DNA and inducer domain mutants.
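
The full set of double mutants implied by the list above can be enumerated as follows. This is only a sketch: the hyphenated naming convention is a hypothetical choice, not necessarily the one used for the actual strains.

```python
from itertools import product

dna_mutants = ["Q21M", "Q21A", "Y20I"]
inducer_mutants = ["Q294V", "Q294K", "F164T"]

# Pair every DNA binding domain mutant with every inducer binding domain mutant.
double_mutants = [f"{dna}-{ind}" for dna, ind in product(dna_mutants, inducer_mutants)]

for name in double_mutants:
    print(name)
```

Three mutants in each domain give nine double mutants in total.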

All mutants are in the RBS1027 strain with 260 repressors per cell. We may not need to do the experiment on all operators for the double mutants, but we'll keep it in mind.

  • Sequence all integrations and nicely collate the sequences into a single file.

Data Collection

  • Reduce number of replicates to 5 or 6 runs per mutant. We do not think that 10 is necessary given our results from the induction paper.

  • Plates should always include a WT strain to make sure that things are working as expected. The plate layout should always look something like the following:

[Image: screenshot of the plate layout, 2017-02-23]

  • We should always move the data to the data server immediately after collection. If there is a suspicious data set, mark it as such on the server.

Data Processing

  • Immediately process the data by converting from .fcs to .csv, computing the fold-change, and generating quality control plots.

    • Write a script that automatically compiles all of the data into a single flow_master.csv file.
  • Standardize the scripts such that all variables are defined at the beginning. Files from the example folder should be used for each new
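
As a rough sketch of the fold-change step mentioned above, the snippet below assumes the common convention of subtracting the mean autofluorescence and normalizing by a repressor-free (constitutive) control. The function name, argument names, and intensity values are invented for illustration and are not taken from the project's scripts.

```python
from statistics import mean


def fold_change(sample, autofluorescence, constitutive):
    """Background-subtracted fold-change in mean fluorescence.

    Each argument is a sequence of single-cell intensities: the strain of
    interest, an autofluorescence control, and a repressor-free control.
    """
    auto = mean(autofluorescence)
    return (mean(sample) - auto) / (mean(constitutive) - auto)


# Made-up intensities (arbitrary units), for illustration only.
auto_cells = [90.0, 100.0, 110.0]
constitutive_cells = [1000.0, 1100.0, 1200.0]
mutant_cells = [300.0, 350.0, 400.0]

fc = fold_change(mutant_cells, auto_cells, constitutive_cells)
print(f"fold-change = {fc:.2f}")
```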

Paper Writing

  • It will be useful to have drafts of artistic figures from the beginning.
    • Fig 1: States & Weights
    • Fig 2: Diagrams of what protein level mutants alter in the model and how they influence the model predictions. We can cast this to our predictions of the change of the Bohr parameter.
    • Fig 2: Plot predictions of leakiness, dynamic range, EC50, etc.
    • Fig 3: Data / fitting
    • Fig 4: Matrix of plots using predictions and data from all double mutants.
    • Plotting of the odds ratios from WT to mutant predictions (maybe in appendix).
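
The quantities named in the figure list above (Bohr parameter, leakiness, dynamic range) can be sketched from the standard thermodynamic model of simple repression with an MWC repressor. All function names and parameter values below are assumptions for illustration, apart from the 260 repressors per cell quoted earlier; energies are in units of k_BT.

```python
import math

N_NS = 4.6e6  # assumed number of nonspecific binding sites (E. coli genome length in bp)


def p_act(c, ka, ki, ep_ai, n=2):
    """MWC probability that the repressor is active at inducer concentration c."""
    active = (1 + c / ka) ** n
    return active / (active + math.exp(-ep_ai) * (1 + c / ki) ** n)


def fold_change(c, R, ep_ra, ka, ki, ep_ai):
    """Fold-change in expression for R repressors with DNA binding energy ep_ra."""
    return 1 / (1 + p_act(c, ka, ki, ep_ai) * (R / N_NS) * math.exp(-ep_ra))


def bohr_parameter(c, R, ep_ra, ka, ki, ep_ai):
    """Free energy F (in k_BT) such that fold-change = 1 / (1 + exp(-F))."""
    return -math.log(p_act(c, ka, ki, ep_ai)) - math.log(R / N_NS) + ep_ra


def saturation(R, ep_ra, ka, ki, ep_ai, n=2):
    """Fold-change in the limit of saturating inducer concentration."""
    p_sat = 1 / (1 + math.exp(-ep_ai) * (ka / ki) ** n)
    return 1 / (1 + p_sat * (R / N_NS) * math.exp(-ep_ra))


# Illustrative parameter values; only R = 260 comes from the notes above.
params = dict(R=260, ep_ra=-13.9, ka=139.0, ki=0.53, ep_ai=4.5)

leakiness = fold_change(0.0, **params)
sat = saturation(**params)
dynamic_range = sat - leakiness
print(f"leakiness = {leakiness:.3f}, saturation = {sat:.3f}, "
      f"dynamic range = {dynamic_range:.3f}")
```

Because the fold-change depends on the inputs only through the Bohr parameter, a mutation that changes any single parameter shifts F, which is what makes the data collapse a natural way to compare mutants.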

Outstanding Issues

Here are the outstanding experimental issues. We need at least 5 replicates of each.

  • Flow Cytometry Measurements.
  • Y20I double mutants in O1 and O3
  • Y20I single mutant only in O3

F164T

  • O1
  • O2
  • O3

Q294R

  • O1
  • O2
  • O3

Q294V

  • O1
  • O2
  • O3

Q294K

  • O1
  • O2
  • O3

Missing Mutants and Molecular Biology

We are missing the following strains. We need to remake them or find them and ensure they are stored in an organized format. Please check these off as they are remade.
