ppg

Plasmodium Pop-Gen

Preprocessing the Plasmodium falciparum Community Project V7.0 Dataset

Bgzip and Tabix

# Decompress each chromosome VCF, recompress it with bgzip, and build a tabix index (replace YOUR_EMAIL).
mkdir -p logs  # Slurm does not create the directory used by -o/-e.
for CHR in 01 02 03 04 05 06 07 08 09 10 11 12 13 14; do
sbatch -J Pf3D7_${CHR}_v3.pf.clean_vcf -N 1 -n 1 -t 1-0 --mem=2G -p batch --account=ccmb-condo \
  -o logs/Pf3D7_${CHR}_v3.pf.clean_vcf-%A.out -e logs/Pf3D7_${CHR}_v3.pf.clean_vcf-%A.err \
  --mail-type=FAIL --mail-user=YOUR_EMAIL \
  --wrap="module load tabix; gunzip ./vcf_data/init_vcfs/Pf3D7_${CHR}_v3.pf7.vcf.gz && bgzip ./vcf_data/init_vcfs/Pf3D7_${CHR}_v3.pf7.vcf && tabix -p vcf ./vcf_data/init_vcfs/Pf3D7_${CHR}_v3.pf7.vcf.gz"
done
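Tabix can only index BGZF files (the blocked gzip format written by `bgzip`), not ordinary gzip, which is presumably why each downloaded VCF is decompressed and recompressed before indexing. The following is a minimal sketch, not part of ppg, for checking whether a `.gz` file already begins with a BGZF block header. It assumes POSIX `od` and assumes the `BC` subfield is the first gzip extra field, as `bgzip` writes it; `is_bgzf` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of ppg): return 0 if the file in $1 begins with
# a BGZF block header, i.e. a gzip header carrying the "BC" extra subfield.
is_bgzf() {
  # Read the first 16 bytes as unsigned decimal numbers.
  header=$(od -An -tu1 -N16 "$1" 2>/dev/null)
  # Re-use the positional parameters: $1..$16 now hold bytes 0..15.
  set -- $header
  [ "$#" -eq 16 ] || return 1
  # Bytes 0-3: gzip magic (31 139), CM=8 (deflate), FLG=4 (FEXTRA set).
  # Bytes 10-15: XLEN=6, subfield ID 'B' 'C' (66 67), SLEN=2.
  [ "$1" -eq 31 ] && [ "$2" -eq 139 ] && [ "$3" -eq 8 ] && [ "$4" -eq 4 ] &&
    [ "${11}" -eq 6 ] && [ "${12}" -eq 0 ] &&
    [ "${13}" -eq 66 ] && [ "${14}" -eq 67 ] && [ "${15}" -eq 2 ]
}
```

For example, `for f in ./vcf_data/init_vcfs/*.vcf.gz; do is_bgzf "$f" && echo "$f: BGZF" || echo "$f: not BGZF"; done` would flag any file that still needs recompressing before `tabix -p vcf` can index it.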

Split SNV Records

# Split each multiallelic SNP site into separate biallelic records (replace YOUR_EMAIL).
for CHR in 01 02 03 04 05 06 07 08 09 10 11 12 13 14; do
sbatch -J Pf3D7_${CHR}_v3.pf.split_records -N 1 -n 1 -t 1-0 --mem=100G -p batch --account=ccmb-condo \
  -o logs/Pf3D7_${CHR}_v3.pf.split_records-%A.out -e logs/Pf3D7_${CHR}_v3.pf.split_records-%A.err \
  --mail-type=FAIL --mail-user=YOUR_EMAIL \
  --wrap="module load bcftools; bcftools norm -m -snps -Oz -o ./vcf_data/init_vcfs/Pf3D7_${CHR}_v3.pf7.split_records.vcf.gz ./vcf_data/init_vcfs/Pf3D7_${CHR}_v3.pf7.vcf.gz"
done
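`bcftools norm -m -snps` splits each multiallelic SNP site into one biallelic record per ALT allele. As a rough illustration of that effect only, the hypothetical `split_snv_sites` function below does the same with awk on the first five columns (CHROM to ALT) of VCF text read from stdin. Unlike bcftools, it does not touch QUAL, FILTER, INFO, or genotype columns, so it is a sketch for intuition and not a substitute for the command above.

```shell
# Hypothetical illustration (not part of ppg): split multiallelic SNV sites.
# Reads VCF text on stdin and prints header lines plus CHROM..ALT per record.
split_snv_sites() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                  # pass header lines through unchanged
    {
      n = split($5, alts, ",")            # ALT alleles at this site
      snv = (length($4) == 1)             # REF must be a single base...
      for (i = 1; i <= n; i++)            # ...and so must every ALT
        if (length(alts[i]) != 1) snv = 0
      if (n > 1 && snv) {
        for (i = 1; i <= n; i++) print $1, $2, $3, $4, alts[i]
      } else {
        print $1, $2, $3, $4, $5          # biallelic or non-SNV: keep as one record
      }
    }'
}
```

For instance, a record with `REF=A` and `ALT=C,G` at position 100 becomes two records at position 100, one with `ALT=C` and one with `ALT=G`, while a record with `REF=AT` is left as a single line.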

vcfs

identify_duplicate_records.py

  • sys.argv[1] = vcf_prefix: VCF name up to the .vcf.gz; e.g., Pf3D7_${CHR}_v3.pf7.
  • sys.argv[2] = chrom: Chromosome ID from the VCF for the regions file; e.g., Pf3D7_${CHR}_v3.
  • sys.argv[3] = vcf_path: Directory containing the VCF; e.g., /users/dpeede/data/dpeede/13_plasmodium/vcf_data/init_vcfs.
  • sys.argv[4] = dup_path: Directory in which to write the duplicate-record regions file; e.g., /users/dpeede/data/dpeede/13_plasmodium/vcf_data/dup_info.
# Example usage.
for CHR in 01 02 03 04 05 06 07 08 09 10 11 12 13 14; do
sbatch -J Pf3D7_${CHR}_v3.pf.identify_dups -N 1 -n 1 -t 1-0 --mem=2G -p batch --account=ccmb-condo \
  -o logs/Pf3D7_${CHR}_v3.pf.identify_dups-%A.out -e logs/Pf3D7_${CHR}_v3.pf.identify_dups-%A.err \
  --mail-type=FAIL --mail-user=YOUR_EMAIL \
  --wrap="module load tabix python; python ./ppg/vcfs/identify_duplicate_records.py Pf3D7_${CHR}_v3.pf7 Pf3D7_${CHR}_v3 /users/dpeede/data/dpeede/13_plasmodium/vcf_data/init_vcfs /users/dpeede/data/dpeede/13_plasmodium/vcf_data/dup_info | bgzip > /users/dpeede/data/dpeede/13_plasmodium/vcf_data/dup_info/Pf3D7_${CHR}_v3.pf7.dup_records.vcf.gz"
done
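The internals of `identify_duplicate_records.py` are not shown here, so the following is an independent, hypothetical sketch of one way to find positions that are covered by more than one record (for example, a multiallelic site after splitting, or an SNV and an indel starting at the same base). `list_duplicate_positions` reads uncompressed VCF text on stdin and prints each CHROM/POS pair that occurs more than once as a two-column, tab-delimited list, a layout that bcftools' `-R` option accepts as a regions file.

```shell
# Hypothetical helper (not part of ppg): print each CHROM<TAB>POS pair that
# appears in more than one VCF record. Input: uncompressed VCF text on stdin.
list_duplicate_positions() {
  # Drop header lines and keep only CHROM and POS.
  awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $2 }' |
    # Sort by chromosome, then numerically by position, so duplicates are adjacent.
    sort -k1,1 -k2,2n |
    # Print one copy of every line that occurs more than once.
    uniq -d
}
```

A possible usage, with a hypothetical output file name: `zcat ./vcf_data/init_vcfs/Pf3D7_01_v3.pf7.split_records.vcf.gz | list_duplicate_positions > Pf3D7_01_v3.dup_positions.txt`.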