Code Monkey home page Code Monkey logo

finemap-cancer's Introduction

Introduction

Step 0. Setting up

Most software is already installed on our directory. The only things you will need to install are the R packages.

On midway2 do:

module load R/3.5.1

install.packages('tidyverse')
install.packages('bigsnpr')
install.packages("vroom")
devtools::install_github("stephenslab/[email protected]")
BiocManager::install(c("GenomicRanges", "plyranges","rtracklayer"))

Step 1. Summary statistics

Obtain summary statistics and place them here /project2/xinhe/CANCER_GERMLINE/SUMMARY_STATISTICS/

You will need summary statistics with the following 8 columns:

  • chromosome (just the number, no "chr", hg19!)
  • position (base pair position, hg19!)
  • beta (if you have Odds Ratio, you will need to transform it to log(Odds Ratio))
  • standard error (SE)
  • reference allele (A,C,T,G)
  • association/effect allele (A,C,T,G)
  • some kind of SNP ID (rsID, or chr:position:a1:a2)
  • p-value

Write down the column names somewhere for you will need them next.

Step 2. Clean summary statistics

Create a working directory and clone this repo into it

mkdir my_project_directory
cd my_project_directory
git clone [email protected]:aselewa/finemap-cancer.git .

Run the preprocessing step with THREE arguments:

./preprocess.sh SUMSTATS COLUMNS prefix
  • SUMSTATS : full path to your summary statistics
  • COLUMNS: a comma(,) delimited string with the columns we described above! Make sure they are in the order described above.
  • prefix: a meaningful name for the cancer/data type. i.e. BrCa for breast cancer

Example:

./preprocess.sh /project2/xinhe/alan/Cancer/GERMLINE/SUMMARY_STATISTICS/colorectal_raw_sumstats.txt.gz \
                chr,position_b37,ocac_overall_OR,ocac_overall_se,a0,a1,1KG_Phase3_ID,ocac_overall_pvalue \
                "ColCa"

This can take about 10 minutes depending on the size of your GWAS. If this worked, you should see a folder called cleaned_sumstats and inside there is {prefix}_cleaned_sumstats.txt.gz inside. The first few lines of the new file should look like:

zcat ColCa_cleaned_sumstats.txt.gz | head -n 5
chr     pos     a0      a1      beta    se      snp     pval    zscore  og_index        bigSNP_index    locus
1       13110   A       G       -0.02066        0.03231 1       0.52245 -0.6394305168678428     3       4       1
1       13116   G       T       -0.01255        0.01895 rs201725126     0.50772 -0.662269129287599      4       5       1
1       13118   G       A       -0.01255        0.01895 rs200579949     0.50772 -0.662269129287599      5       6       1
1       13550   A       G       0.13184 0.11236 1       0.24066 1.173371306514774       7       11      1

IMPORTANT: you can have multiple summary statistics in the cleaned_sumstats/ folder. Just run the preprocess.sh recipe for each summary statistics trait. MAKE SURE THEY HAVE DIFFERENT PREFIX!

Step 3. Annotations

Annotations should be a .bed file with only the first 3 columns: chromosome, start, end. Chromosome should just be the number, no "chr" and should be in hg19/b37 coordinates. In your project directory, you should see a folder called annotations/ Go into this. You should now see a folder called {prefix}/ where prefix is what you choose in step 2. Place your annotation file (.bed extension in here).

It may be useful to look at existing annotations that I placed at /project2/xinhe/CANCER_GERMLINE/ANNOTATIONS/

Your annotation should look like this:

head mybed.bed

1       564999  566395
1       568591  569959
1       569105  570623
1       713254  715133
1       762120  763780

IMPORTANT: you can place as many bed-files/annotations in the prefix directory as you want! Just make sure they have different names. These are the annotations that will be used for this specific trait.

Step 4. Running finemapping

We are now ready to run the main pipeline for finemapping.

With your project directory (my_project_directory) from earlier, you should now see a file called torus_susie_snakemake.py. Open this file using Vim or nano and edit the variable called pd, by placing the full path to your working directory.

pd = 'YOUR PROJCT DIRECTORY'

Example:

pd = '/project2/xinhe/alan/Cancer/mydir'

To run the pipeline, execute the following:

sbatch submit_snake.sbatch

Make sure the job is running:

qstat -u CNETID

Once it is over, you should see a directory called susie_finemapping which contains the finemapped results for each chromosome.

finemap-cancer's People

Contributors

aselewa avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.