leofountain / weaver
Allele-Specific Quantification of Structural Variations in Cancer Genomes
Weaver
Allele-specific base-pair resolution quantification of structural variations in cancer genomes
[email protected]
[email protected]
Version 0.20

----------------------------
INSTALL
----------------------------
Bamtools (https://github.com/pezmaster31/bamtools) libraries are needed (included in Weaver_SV/lib and Weaver_SV/inc)
export LD_LIBRARY_PATH=<PREFIX>/Weaver/Weaver_SV/lib/:$LD_LIBRARY_PATH
libz required //-lz flag
Parallel::ForkManager (http://search.cpan.org/~szabgab/Parallel-ForkManager-1.06/lib/Parallel/ForkManager.pm) perl package is needed
Bedtools (https://github.com/arq5x/bedtools)
Samtools (http://samtools.sourceforge.net/)
BOOST C++ library (http://www.boost.org/)
BWA (http://bio-bwa.sourceforge.net/)
Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)

1 Modify the required BOOST directory in src/Makefile
2 ./INSTALL.sh

-----------------------------
DATA
-----------------------------
wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz

-----------------------------
EXAMPLE DATA
-----------------------------
wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_example.tar.gz
RUN:
Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64
solo_ploidy TARGET 2
Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0

----------------------------
Weaver_SV.pl
----------------------------
SV finding
Input: BAM file from BWA
Output: VCF file for SV

----------------------------
Weaver_pipeline.pl
----------------------------
Master program:
1 Generate SV
2 Generate other inputs needed for Weaver
INPUTS
DATA package: 1000 Genomes Project Phase 1 haplotypes

----------------------------
Weaver
----------------------------
Core PGM program
INPUTS:
1 SV
Outputs:
1 Purity and haploid-level sequencing coverage
2 Allele-specific copy number of genomic regions
3 Allele-specific copy number of structural variations
4 Relative timing of structural variations
5 Cancer scaffolds
6 Phasing of germline SNPs in CNV regions

----------------------------
Weaver_lite
----------------------------
Core PGM program, with SNP phasing disabled to speed it up
INPUTS:
1 SV
2 reference
3 Mappability (available for hg19)
4 Region (available for hg19)
5 wig (from bam)

----------------------------
Weaver PLOIDY
----------------------------
Weaver PLOIDY -f -S -s ../SNP_dens -g GAP_20140416_num -w -r 1 -m -p 16
INPUTS:
-f reference file (fasta); it should match the reference used for the original bam file. In particular, for most TCGA datasets the alignment was performed on //www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta, which does not have the "chr" prefix [MANDATORY]
-S SV file, with a format consistent with Weaver_SV [MANDATORY]
-s SNP file, with ref and alt mappings [MANDATORY]
-w wig file from bam, storing the coverage information [MANDATORY]
-r 1 if running for the first time (generates temp files); 0 to reuse existing temp files [default 1]
-m mappability file, download from http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz [MANDATORY]
-p number of cores [default 1]

-----------------------------
FILE FORMAT DECLARATIONS
-----------------------------
Wiggle file:
Wiggle files need to be declared with fixedStep, step 1 and span 1:
fixedStep chrom=chr1 start=9994 step=1 span=1
If a chromosome has multiple declaration lines, they need to be sorted by position. The following is not allowed:
fixedStep chrom=chr1 start=9994 step=1 span=1
X
X
X
fixedStep chrom=chr1 start=100 step=1 span=1
X
X
X

Bam file:
Must be sorted and indexed.

SNP file:
NGS SNP link file
1KGP SNP link

SV:

Genome region file:
GAP regions in the assembly are annotated.
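The wiggle rules above (fixedStep declarations with step 1 and span 1, sorted by position within each chromosome) can be checked before running Weaver. The following Python sketch is not part of Weaver; the function name check_wiggle and the choice to treat a missing span as an error are assumptions made for illustration.

```python
import re

# Matches declaration lines such as:
#   fixedStep chrom=chr1 start=9994 step=1 span=1
# The span field is optional in the pattern so a missing span can be reported.
DECL = re.compile(
    r"^fixedStep\s+chrom=(\S+)\s+start=(\d+)\s+step=(\d+)(?:\s+span=(\d+))?\s*$"
)


def check_wiggle(lines):
    """Return a list of (line_number, message) for rule violations.

    Hypothetical helper, not part of Weaver. It only inspects declaration
    lines; data value lines are skipped without being validated.
    """
    problems = []
    last_start = {}  # chromosome -> start of its most recent declaration
    for n, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line.startswith("variableStep"):
            problems.append((n, "variableStep declaration; fixedStep is required"))
            continue
        if not line.startswith("fixedStep"):
            continue  # blank line, track line, or data value
        m = DECL.match(line)
        if m is None:
            problems.append((n, "unparseable fixedStep declaration"))
            continue
        chrom, start, step, span = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
        if step != 1 or span != "1":
            # Assumption: span must be written out explicitly as span=1.
            problems.append((n, "step and span must both be declared as 1"))
        if chrom in last_start and start < last_start[chrom]:
            problems.append((n, f"{chrom} declarations are not sorted by position"))
        last_start[chrom] = start
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for line_no, message in check_wiggle(handle):
            print(f"line {line_no}: {message}")
```

Run as a script with the wig file path as its only argument; no output means no violations were found by this check.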

###################
Output:
###################
REGION_CN_PHASE: stores the phased allele-specific copy number of the genome
CHR BEGIN END ALLELE_1_CN ALLELE_2_CN
SV_CN_PHASE: structural variation copy number and phasing, category
CHR_1 POS_1 ORI_1 ALLELE_ CHR_2 POS_2 ORI_2 ALLELE_ CN germline/somatic_post_aneuploidy/somatic_pre_aneuploidy

###############
CONTACT
###############
Yang Li
Ma Lab
Bioengineering Dept., University of Illinois at Urbana-Champaign
[email protected]
https://github.com/leofountain/Weaver
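
The REGION_CN_PHASE columns listed under Output (CHR BEGIN END ALLELE_1_CN ALLELE_2_CN) can be read with a few lines of Python. This is a minimal sketch, not part of Weaver. It assumes the file is whitespace-delimited with one region per line and that copy numbers parse as numbers; the names RegionCN, parse_region_cn_phase and total_cn are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class RegionCN(NamedTuple):
    """One REGION_CN_PHASE row (hypothetical container, not part of Weaver)."""

    chrom: str
    begin: int
    end: int
    allele_1_cn: float
    allele_2_cn: float

    @property
    def total_cn(self) -> float:
        # Total copy number of the region is the sum of the two alleles.
        return self.allele_1_cn + self.allele_2_cn


def parse_region_cn_phase(lines: Iterable[str]) -> Iterator[RegionCN]:
    """Yield one RegionCN per data line.

    Assumption: columns are whitespace-delimited in the order
    CHR BEGIN END ALLELE_1_CN ALLELE_2_CN. Blank lines and a header line
    starting with CHR are skipped; a line with fewer than five fields
    raises ValueError.
    """
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHR":
            continue
        chrom, begin, end, a1, a2 = fields[:5]
        yield RegionCN(chrom, int(begin), int(end), float(a1), float(a2))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for region in parse_region_cn_phase(handle):
            print(region.chrom, region.begin, region.end, region.total_cn)
```

Copy numbers are read as floats so that both integer and decimal values are accepted; switch to int if the output is known to be integral.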