Analysis 2

Analysis of inference methods on standard population models, including selection. Here's how to get going if you'd like to run this analysis.

Get the Analysis repo

Now clone the analysis2 repo and install its dependencies:

$ git clone git@github.com:popsim-consortium/analysis2.git
$ cd analysis2/

Set up your python environment to run the analysis

We recommend you start by creating a new conda environment for the analysis. This can be done using the commands below, which will create and activate a new conda env called analysis2. Currently the workflow is targeted to run on Python 3.9.

$ conda env create -f environment.yml
$ conda activate analysis2
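
As a quick sanity check, you can confirm that the environment is active and provides the targeted Python version (the exact patch version reported may differ):

$ python --version    # should report Python 3.9.x
$ which snakemake     # should resolve to a path inside the analysis2 conda env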

Set up the config file

With the environment in place, the next step is to set the workflow parameters using a config file. analysis2 currently ships with three example config files, each found in workflows/config/snakemake/: tiny_config.yaml, config.yaml, and production-config.yaml. Respectively, these represent a very small run, a small run, and the final production settings used for the paper (TBD).

The workflow can be pointed at one of these config files by editing the following line in the Snakefile:

configfile: "workflows/config/snakemake/tiny_config.yaml"

Having set the config file, and perhaps edited it to your liking, you are now ready to try a dry run.

Perform a dry run of the workflow

To make sure that things are working, first perform a dry run of the complete workflow:

$ snakemake -c1 all -n

If the dry run checks out, you should be ready to run.

Run the workflow

You should now be set to run the complete workflow. This will consist broadly of: 1) simulating the chromosomes of interest, 2) downloading and installing the tools to be used in the analysis of the simulated data, 3) analyzing the simulated tree sequences using the aforementioned tools, and 4) summarizing the analyses into figures.

The Snakemake workflow has a number of targets:

  • all -- run the complete analysis
  • clean_all -- remove all simulations, downloads, and analysis output
  • clean_ext -- remove all downloaded external tools
  • clean_output -- remove all simulation and analysis output

To run the complete workflow on M cores, use the following:

$ snakemake -c M all

One can run the clean_ targets of the workflow similarly.
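
For example, to remove all simulation and analysis output (using the target names from the list above):

$ snakemake -c1 clean_output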

Running only a portion of the workflow

Sometimes the user only wants to run a subsection of the workflow. This is possible using Snakemake's --snakefile option along with the component workflows we have included. For instance, to perform just the simulation steps of the workflow using 10 CPUs, the user can run

$ snakemake -c 10 --snakefile workflows/simulation.snake

and only that part of the analysis pipeline will run. We currently have 3 sub-workflows: simulation.snake, which does the simulations; n_t.snake, which performs N(t)-type analyses (e.g., msmc); and dfe.snake, which houses the portion of the workflow that estimates the distribution of fitness effects (DFE).
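
The other sub-workflows are invoked the same way; for example (the core count here is arbitrary):

$ snakemake -c 10 --snakefile workflows/n_t.snake
$ snakemake -c 10 --snakefile workflows/dfe.snake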

Running the workflow on a cluster

Currently we have provided two example Snakemake profiles that allow a user to run the analysis2 workflow on a cluster quite easily. These can be found in workflows/config/snakemake/oregon_profile and workflows/config/snakemake/arizona_profile.

Cluster profile files

Each of those directories contains a single file, config.yaml, that lays out the cluster-specific settings needed to launch jobs.

For instance, oregon_profile/config.yaml, which is meant to run on a cluster with a Slurm scheduler, looks like this:

cluster:
        mkdir -p logs/{rule} &&
        sbatch
                --partition=kern,kerngpu
                --account=kernlab
                --cpus-per-task={threads}
                --mem={resources.mem_mb}
                --time={resources.time}
                --job-name=smk-{rule}-{wildcards}
                --output=logs/{rule}/{rule}-{wildcards}-%j.out
default-resources:
        - time=60
        - mem_mb=5000
        - threads=1
restart-times: 3
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 500
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-conda: True

To adapt this to a new Slurm cluster, a user would have to do the following (see the sketch after this list):

  • change the partition value to appropriately named partitions
  • change the account name
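
As a minimal sketch, assuming a hypothetical cluster with a partition named short and an account named mylab (both names are placeholders), the cluster block would become the following, with the rest of the profile left unchanged:

cluster:
        mkdir -p logs/{rule} &&
        sbatch
                --partition=short
                --account=mylab
                --cpus-per-task={threads}
                --mem={resources.mem_mb}
                --time={resources.time}
                --job-name=smk-{rule}-{wildcards}
                --output=logs/{rule}/{rule}-{wildcards}-%j.out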

Run command

At the command line this eases things tremendously. We can launch the entire workflow simply with

$ snakemake --profile workflows/config/snakemake/oregon_profile/

Snakemake will then take care of all of the communication with the cluster, launching jobs and monitoring them through to completion.
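
The profile also composes with the options shown earlier; for example, a dry run using the profile's settings (a useful sanity check before launching real cluster jobs) looks like:

$ snakemake --profile workflows/config/snakemake/oregon_profile/ -n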

Further reading on profiles

There are a lot of great examples of how to set up profiles for running workflows on various cluster architectures. One excellent resource is this repository of publicly available profiles: https://github.com/snakemake-profiles/doc

Contributors

andrewkern, chriscrsmith, petrelharp, mufernando, xin-huang, grahamgower, stsmall, izabelcavassim, apragsdale, lntran26, ryangutenkunst, nspope
