

Weaver

Allele-specific, base-pair-resolution quantification of structural variations in cancer genomes


Version 0.20

----------------------------
INSTALL
----------------------------

Bamtools (https://github.com/pezmaster31/bamtools) libraries are required; they are included in Weaver_SV/lib and Weaver_SV/inc. Add the bundled library directory to the loader path:

export LD_LIBRARY_PATH=<PREFIX>/Weaver/Weaver_SV/lib/:$LD_LIBRARY_PATH

zlib (libz) is required (linked with the -lz flag)

The Perl module Parallel::ForkManager (http://search.cpan.org/~szabgab/Parallel-ForkManager-1.06/lib/Parallel/ForkManager.pm) is required

Bedtools (https://github.com/arq5x/bedtools)

Samtools (http://samtools.sourceforge.net/)

BOOST C++ library (http://www.boost.org/)

BWA (http://bio-bwa.sourceforge.net/)

Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)




1	Edit src/Makefile so that the BOOST directory points to your Boost installation

2	Run ./INSTALL.sh


-----------------------------
DATA
-----------------------------

wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz





-----------------------------
EXAMPLE DATA
-----------------------------

wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_example.tar.gz



RUN:

Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64
solo_ploidy TARGET 2
Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
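The example passes -t 20 -n 0 to LITE without explaining them. In the LITE log quoted in the issue "Unable to reproduce example results" further down, these flags are echoed as "TUMOR coverage" and "NORMAL", and PLOIDY prints "Estimated cancer haplotype coverage" and "Estimated normal haplotype coverage". The sketch below assumes (this is not documented) that -t and -n take those two PLOIDY estimates: it parses them from PLOIDY's output and builds the matching LITE command. The function names are hypothetical, the intermediate solo_ploidy step is not modeled, and the sample output in the code is made up to reproduce the example's values.

```python
import re
import shlex

# Assumption (not documented): LITE's -t/-n take the cancer/normal haplotype
# coverage that PLOIDY prints, e.g. "Estimated cancer haplotype coverage:    0".
ESTIMATE_RE = re.compile(
    r"Estimated (cancer|normal) haplotype coverage:\s*(\d+(?:\.\d+)?)"
)


def parse_ploidy_estimates(stdout: str) -> dict:
    """Return {"cancer": float, "normal": float} parsed from PLOIDY's stdout."""
    found = {kind: float(value) for kind, value in ESTIMATE_RE.findall(stdout)}
    missing = {"cancer", "normal"} - set(found)
    if missing:
        raise ValueError(f"PLOIDY output lacks estimates for: {sorted(missing)}")
    return found


def lite_command(ploidy_argv: list, estimates: dict) -> list:
    """Reuse the PLOIDY arguments, switch the mode to LITE and append -t/-n."""
    if len(ploidy_argv) < 2 or ploidy_argv[1] != "PLOIDY":
        raise ValueError("expected an argv like ['Weaver', 'PLOIDY', ...]")
    return [ploidy_argv[0], "LITE", *ploidy_argv[2:],
            "-t", format(estimates["cancer"], "g"),
            "-n", format(estimates["normal"], "g")]


if __name__ == "__main__":
    ploidy = shlex.split("Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION "
                         "-w X.bam.wig -r 0 -m map100mer.bd -p 64")
    # Hypothetical PLOIDY output, chosen to reproduce the example's -t 20 -n 0.
    stdout = ("Estimated cancer haplotype coverage:    20\n"
              "Estimated normal haplotype coverage:    0\n")
    print(shlex.join(lite_command(ploidy, parse_ploidy_estimates(stdout))))
```

In a real run, PLOIDY's output could be captured with subprocess.run(ploidy, capture_output=True, text=True).stdout. Note that the issue below shows PLOIDY estimating 0 for both values, so a zero cancer estimate is worth treating as a failed run rather than passing it on.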

----------------------------
Weaver_SV.pl
----------------------------
Structural variation (SV) discovery
Input:
	1	BAM file from BWA
	2	Reference FASTA (the program also requires it at run time)

Output:
	VCF file of SVs

----------------------------
Weaver_pipeline.pl
----------------------------
Master program that:
	1	Calls SVs
	2	Generates the other inputs needed by Weaver

INPUTS:
	DATA package (1000 Genomes Project Phase 1 haplotypes)



----------------------------
Weaver
----------------------------
Core probabilistic graphical model (PGM) program

INPUTS:
	1	SV

Outputs:
	1	Purity and haploid-level sequencing coverage
	2	Allele specific copy number of genomic regions
	3	Allele specific copy number of structural variations
	4	Relative timing of structural variations
	5	Cancer scaffolds
	6	Phasing of germline SNPs in CNV regions



----------------------------
Weaver_lite
----------------------------
Core PGM program with SNP phasing disabled, for faster runs

INPUTS:
	1	SV
	2	Reference genome (FASTA)
	3	Mappability file (available for hg19)
	4	Genome region file (available for hg19)
	5	Wiggle coverage file (generated from the BAM)



----------------------------
Weaver PLOIDY
----------------------------

Weaver PLOIDY -f <reference.fa> -S <SV_file> -s ../SNP_dens -g GAP_20140416_num -w <coverage.wig> -r 1 -m <mappability_file> -p 16



INPUTS:

-f Reference genome (FASTA); must match the reference used to produce the original BAM file. In particular, most TCGA datasets were aligned to //www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta, whose chromosome names have no "chr" prefix. [MANDATORY]

-S SV file in the format produced by Weaver_SV. [MANDATORY]

-s SNP file with ref and alt mappings [MANDATORY]

-g genome region file annotating assembly gaps (see "Genome region file" below)

-w Wiggle file generated from the BAM, storing the coverage information [MANDATORY]

-r 1 on the first run (generates temporary files); 0 to reuse existing temporary files. [default 1]

-m mappability file, download from http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz [MANDATORY]

-p number of cores [default 1]
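A small wrapper can catch a missing mandatory flag before a long run starts. The helper below, ploidy_argv, is hypothetical (it is not part of Weaver) and encodes only what the list above states: -f, -S, -s, -w and -m are mandatory, -r must be 1 or 0 and defaults to 1, and -p defaults to 1. Any other flag, such as -g, is passed through unchanged.

```python
from pathlib import Path

# Encodes only what the README states for Weaver PLOIDY.
MANDATORY = ("-f", "-S", "-s", "-w", "-m")
DEFAULTS = {"-r": "1", "-p": "1"}


def ploidy_argv(options: dict, weaver: str = "Weaver",
                check_files: bool = True) -> list:
    """Build a PLOIDY command line from {flag: value} (hypothetical helper)."""
    missing = [flag for flag in MANDATORY if flag not in options]
    if missing:
        raise ValueError(f"missing mandatory flags: {missing}")
    merged = {flag: str(value) for flag, value in options.items()}
    for flag, value in DEFAULTS.items():
        merged.setdefault(flag, value)  # fill in the documented defaults
    if merged["-r"] not in ("0", "1"):
        raise ValueError("-r must be 1 (first run) or 0 (reuse temp files)")
    if not merged["-p"].isdigit() or int(merged["-p"]) < 1:
        raise ValueError("-p must be a positive number of cores")
    if check_files:
        # Only the mandatory inputs are checked; they are all file paths.
        for flag in MANDATORY:
            if not Path(merged[flag]).exists():
                raise FileNotFoundError(f"{flag} {merged[flag]}: no such file")
    argv = [weaver, "PLOIDY"]
    for flag, value in merged.items():
        argv += [flag, value]
    return argv
```

The returned list can be handed to subprocess.run(argv, check=True). Keep -r at its default of 1 for the first run on a dataset, then pass "-r": 0 to reuse the temporary files.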


-----------------------------
FILE FORMAT DECLARATIONS
-----------------------------

Wiggle file:

Wiggle files must use fixedStep declarations with step=1 and span=1, for example:
fixedStep chrom=chr1 start=9994 step=1 span=1

If a chromosome has multiple declaration lines, they must appear in increasing order of start position. The following is NOT allowed, because the second block (start=100) begins before the first (start=9994):
fixedStep chrom=chr1 start=9994 step=1 span=1
X
X
X
fixedStep chrom=chr1 start=100 step=1 span=1
X
X
X
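The two rules above can be checked before running Weaver. The sketch below is a hypothetical checker, not Weaver's own parser: it looks only at declaration lines, conservatively requires step=1 and span=1 to appear explicitly (as in the example above), rejects variableStep, and reports any chromosome whose start positions do not increase. It does not check whether consecutive blocks overlap.

```python
from typing import Iterable, List


def check_wig(lines: Iterable[str]) -> List[str]:
    """Report declaration lines that break the README's wiggle rules."""
    problems = []
    max_start = {}  # chrom -> largest start seen so far in its declarations
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line.startswith("variableStep"):
            problems.append(f"line {lineno}: use fixedStep, not variableStep")
            continue
        if not line.startswith("fixedStep"):
            continue  # data values, track lines and blank lines are not checked
        fields = dict(tok.split("=", 1) for tok in line.split()[1:] if "=" in tok)
        chrom, start = fields.get("chrom"), fields.get("start", "")
        if chrom is None or not start.isdigit():
            problems.append(f"line {lineno}: missing chrom= or numeric start=")
            continue
        if fields.get("step") != "1" or fields.get("span") != "1":
            problems.append(f"line {lineno}: step=1 and span=1 are required")
        start_pos = int(start)
        previous = max_start.get(chrom)
        if previous is not None and start_pos <= previous:
            problems.append(
                f"line {lineno}: {chrom} start={start_pos} is not after start={previous}")
        max_start[chrom] = start_pos if previous is None else max(previous, start_pos)
    return problems
```

Because it iterates lazily, a large file can be streamed: with open("X.bam.wig") as fh: problems = check_wig(fh).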



Bam file:

Must be coordinate-sorted and indexed.

SNP file:

NGS SNP link file


1KGP SNP link


SV:


Genome region file:

GAP regions in assembly are annotated.

###################
Output:
###################

REGION_CN_PHASE: phased allele-specific copy number of genomic regions

CHR	BEGIN	END	ALLELE_1_CN	ALLELE_2_CN
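A minimal reader for this file, assuming whitespace-separated columns in the order shown, integer copy numbers, and at most one header row (none of which is stated above). The class and function names are hypothetical, and the total_cn and is_loh helpers are illustrative additions: a region where exactly one allele has copy number 0 is treated as loss of heterozygosity.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RegionCN:
    chrom: str
    begin: int
    end: int
    allele_1_cn: int
    allele_2_cn: int

    @property
    def total_cn(self) -> int:
        return self.allele_1_cn + self.allele_2_cn

    @property
    def is_loh(self) -> bool:
        # Loss of heterozygosity: exactly one allele has copy number 0.
        return (self.allele_1_cn == 0) != (self.allele_2_cn == 0)


def read_region_cn(lines: Iterable[str]) -> List[RegionCN]:
    """Parse REGION_CN_PHASE rows (CHR BEGIN END ALLELE_1_CN ALLELE_2_CN)."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # skip blank lines and a possible header row
        chrom, begin, end, cn_1, cn_2 = fields[:5]
        records.append(RegionCN(chrom, int(begin), int(end), int(cn_1), int(cn_2)))
    return records
```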




SV_CN_PHASE: copy number, phasing, and category of each structural variation

CHR_1	POS_1	ORI_1	ALLELE_	CHR_2	POS_2	ORI_2	ALLELE_	CN	germline/somatic_post_aneuploidy/somatic_pre_aneuploidy
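A similar sketch for SV_CN_PHASE, under the same assumptions about separators, integer copy numbers and an optional header row. The two allele column names are truncated in the header above ("ALLELE_"), so the field names allele_1 and allele_2 are hypothetical and their values are kept as uninterpreted strings, as are the orientation columns. Counting SVs per category gives a quick overview of the germline and somatic timing calls.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

CATEGORIES = {"germline", "somatic_post_aneuploidy", "somatic_pre_aneuploidy"}


@dataclass
class SVCN:
    chrom_1: str
    pos_1: int
    ori_1: str
    allele_1: str  # hypothetical name: the README header reads "ALLELE_"
    chrom_2: str
    pos_2: int
    ori_2: str
    allele_2: str  # hypothetical name, as above
    cn: int
    category: str


def read_sv_cn(lines: Iterable[str]) -> List[SVCN]:
    """Parse SV_CN_PHASE rows, skipping blanks and a possible header row."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 10 or not fields[1].isdigit():
            continue
        c1, p1, o1, a1, c2, p2, o2, a2, cn, category = fields[:10]
        if category not in CATEGORIES:
            raise ValueError(f"unexpected category: {category!r}")
        records.append(SVCN(c1, int(p1), o1, a1, c2, int(p2), o2, a2, int(cn), category))
    return records


def count_by_category(records: Iterable[SVCN]) -> Counter:
    """Number of SVs in each germline/somatic timing category."""
    return Counter(record.category for record in records)
```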



###############
CONTACT
###############

Yang Li
Ma Lab
Bioengineering Dept., University of Illinois at Urbana-Champaign

https://github.com/leofountain/Weaver


weaver's Issues

What are the -t and -n values in Weaver LITE?

In the last step, Weaver LITE (taken from the Weaver GitHub README):
Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0

What are the -t and -n values?
I am running Weaver on whole-exome sequencing data from a single sample with no matched control. What values should I enter for the -t and -n options, and how can I obtain them?

Unable to reproduce example results

I have installed weaver from github to ~/weaver and unpacked Weaver_data.tar.gz and Weaver_example.tar.gz to the Weaver_data and Weaver_example sub-directories respectively.
I have fixed the broken symlinks in ~/weaver/data

I am attempting to run the example. When I rerun Weaver_example/cmd I get the following output:

$ ../bin/bam2bw.pl lite X.bam 64 M
$ ~/weaver/bin/Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
RUN MODE        PLOIDY
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
Estimated cancer haplotype coverage:    0
Estimated normal haplotype coverage:    0
$ ~/weaver/bin/Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
RUN MODE        LITE
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
TUMOR coverage was set to 20.
NORMAL was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
base_mean = 20
best_norm = 0
LBP scan
LBP
LBP init
LBP print
LBP scan
$ ls -l
total 1231020
-rw-r----- 1 cameron.d allstaff        405 Oct 30  2014 cmd
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 EACH_REGION
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 EACH_REGION_1
drwxr-x--- 2 cameron.d allstaff          9 Oct 30  2014 INPUT
drwxr-x--- 2 cameron.d allstaff         11 Nov  4  2014 OUTPUT
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 REGION_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SNP_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SV_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SV_REMOVED
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SV_SELECTED
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 TARGET
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 tempfile
-rw-r----- 1 cameron.d allstaff 1097298665 Oct 30  2014 X.bam
-rw-r----- 1 cameron.d allstaff     161816 Oct 30  2014 X.bam.bai
-rw-r--r-- 1 cameron.d allstaff         28 May 22 15:19 X.bam.G1
-rw-r--r-- 1 cameron.d allstaff         28 May 22 15:19 X.bam.G2
-rw-r--r-- 1 cameron.d allstaff  162603325 May 22 15:24 X.bam.wig

Note that the files are almost all empty files and are located in the ~/weaver/Weaver_example directory, not the ~/weaver/Weaver_example/OUTPUT subdirectory.

What else is required to reproduce the example output for the supplied example?

Command-line documentation does not match source code

I am attempting to run Weaver, and for the programs I have tried, neither the online GitHub documentation nor the programs' usage help matches the command-line arguments they actually require. To run them, I have had to read the source code of the relevant .pl scripts to work out how to invoke them.

For example:

  • Weaver_SV.pl
    • github documentation lists only a BAM as the required input. When running the program, it indicates that the reference fasta is also required.
  • Weaver_pipeline.pl
    • the first two arguments loaded into the $RUN_TYPE and $MODEFLAG variables are not mentioned anywhere in the documentation.
    • --gap documentation appears to be for --fa
    • multiple arguments not marked as [MANDATORY] are mandatory
    • Setting -F still requires --fa as the check for an empty $FA occurs before $FULLFA is set. Is this parameter intended to allow for the bwa & bowtie indexes to not require colocation with the reference fasta?

Does there exist an all-in-one wrapper that can go from bam to weaver output for hg19 and/or hg38?
