MosaicSim

Dependencies

Install Python 3 first if you do not already have it. The simulator requires the following Python packages: pandas, numpy, biopython, tskit, and msprime. It also uses glob, which is part of the Python standard library and needs no separate installation.

Each package can be installed with pip by running "pip3 install PACKAGE" for each of the packages above.

Alternatively, these packages can be installed in a conda environment.
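Before a long run it can help to confirm that the dependencies are importable. The following is a minimal sketch, not part of MosaicSim; the function name find_missing is hypothetical. It relies on the fact that biopython is imported under the module name "Bio", and it leaves out glob because that module ships with Python.

```python
import importlib.util

# Map pip package name -> import module name.
# biopython is the one case where the two differ (it is imported as "Bio").
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "biopython": "Bio",
    "tskit": "tskit",
    "msprime": "msprime",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages; install with: pip3 install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```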

Data Dependency

The simulator requires a reference genome to simulate from. We use the hg38 reference genome, which can be downloaded by running "wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz" and then "gunzip hg38.fa.gz". Make sure the unzipped hg38.fa file is in the MosaicSim/data directory to avoid a runtime error. To use a different reference genome, update the "full_genome" parameter in the parameters.py file; the replacement must be in standard FASTA format. If the alternative FASTA file does not follow the standard labeling conventions of hg38, you may have to edit the chromosome preprocessing code in sim.py.
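For systems without gunzip, or to check how an alternative FASTA file labels its sequences, a small Python helper can do the same job. This is a sketch under assumptions: the helper names gunzip and fasta_headers are hypothetical and not part of MosaicSim. The UCSC hg38.fa file labels its sequences as chr1, chr2, and so on, so different names in the output suggest the preprocessing code in sim.py may need adjusting.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Decompress a .gz file to dst, streaming so the genome never sits fully in memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def fasta_headers(path, limit=5):
    """Return the first `limit` sequence names (first token after '>') in a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
                if len(names) == limit:
                    break
    return names
```

For example, gunzip("hg38.fa.gz", "MosaicSim/data/hg38.fa") followed by fasta_headers("MosaicSim/data/hg38.fa") lists the first few sequence names.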

Execution

Each run of the simulator is defined by first setting the study design and simulation parameters in "parameters.py" and then running the "sim.py" script. In particular, set the storage_dir variable to the directory where you want to store your data, and adjust the other parameters to match your biological and study design choices. Each parameter is described in the comments of the parameters.py script and in the paper.
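A quick check of the two paths named in this README can catch setup mistakes before a multi-hour run. The sketch below is not part of MosaicSim: check_setup is a hypothetical helper, and it assumes that storage_dir and full_genome both hold filesystem paths.

```python
from pathlib import Path


def check_setup(storage_dir, full_genome):
    """Return a list of problems found; an empty list means both paths look usable."""
    problems = []
    if not Path(storage_dir).is_dir():
        problems.append(f"storage_dir does not exist: {storage_dir}")
    if not Path(full_genome).is_file():
        problems.append(f"reference FASTA not found: {full_genome}")
    return problems
```

If parameters.py exposes these values as module-level variables (an assumption), the check could be called as check_setup(parameters.storage_dir, parameters.full_genome) after "import parameters".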

To run the simulation, execute "python3 sim.py [DIR_NAME]" from the repository directory. DIR_NAME is the name of the directory under storage_dir where all data for the simulation run will be stored.
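To launch several runs with the same parameters, the command above can be wrapped in a short script. This is a sketch, not part of MosaicSim: build_command and run_batch are hypothetical names, and the runs execute one after another, each as its own process.

```python
import subprocess
import sys


def build_command(dir_name, script="sim.py"):
    """Build the argument list equivalent to 'python3 sim.py DIR_NAME'."""
    return [sys.executable, script, dir_name]


def run_batch(dir_names, repo_dir="."):
    """Run the simulator once per directory name; stop at the first failing run."""
    for name in dir_names:
        subprocess.run(build_command(name), cwd=repo_dir, check=True)
```

For example, run_batch(["run1", "run2"], repo_dir="MosaicSim") would create two output directories under storage_dir.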

The file optimization_sim.py provides an alternative single-pass simulation method.

System Requirements

For standard sequencing settings, the program should use less than 40 GB of memory and only one core per execution. Storage requirements depend on the simulation parameters. Run time was approximately 3.5 hours for three 30x WGS samples.
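As a rough guide to how storage scales with the parameters, the number of reads needed for a target coverage follows the standard relation reads = coverage x genome length / read length. The sketch below only illustrates that arithmetic. The 150 bp read length and the rounded 3 Gb genome length in the example are assumptions, not MosaicSim defaults.

```python
import math


def reads_for_coverage(coverage, genome_length, read_length):
    """Number of reads needed so that total sequenced bases equal coverage * genome_length."""
    return math.ceil(coverage * genome_length / read_length)


# Illustrative values only: 30x coverage, 3 Gb genome, 150 bp reads.
n_reads = reads_for_coverage(30, 3_000_000_000, 150)
```

With these illustrative values, one 30x sample corresponds to 600 million reads, so read length and coverage dominate the output size.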

mosaicsim's People

Contributors

arjunsrivatsa, leovam

mosaicsim's Issues

Questions about paired exon sequencing

Hi,
I have a few questions regarding the exonrunPairedSim function and wonder if you could provide some insights.

  1. The target coverage is reached when cov exceeds coverage, using cov += 2*batch*subblock*ratio where ratio = 2*rl/fl. Can you briefly explain why the multiplication by 2 is necessary in both places?
  2. It seems that you add all the mutations for a tumor clone and create a .gz file first, then sample the exon regions using that file. How do you handle the fragments/reads for positions/regions where there is a CNV or DEL before the position?
  3. The coverage can be achieved by repeatedly sequencing the exon regions. Does that mean each exon region will have similar coverage even if there are CNV or DEL events? That is, will the generated reads differ in coverage so that copy number calling tools can infer CNVs from the exon sequencing data?

Thank you in advance for helping with the questions!
