Code Monkey home page Code Monkey logo

phantompeakqualtools's Introduction

WARNING: If you see the following error, then downgrade your spp to 1.14 or 1.15.x.

Error in lwcc(x, y, s, e, return.peaks = return.peaks, bg.x = bg.x, bg.y = bg.y,  :
  INTEGER() can only be applied to a 'integer', not a 'character'

Summary

This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.

Introduction

This set of programs operate on mapped Illumina single-end read datasets in tagAlign or BAM format. They can be used to

  1. Compute the predominant insert-size (fragment length) based on strand cross-correlation peak
  2. Compute Data quality measures based on relative phantom peak
  3. Call Peaks and regions for punctate binding datasets

Citations

If you are using the code or results in any formal publication please cite

[1] Landt SG1, Marinov GK, Kundaje A et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012 Sep;22(9):1813-31. doi: 10.1101/gr.136184.111.

[2] Kharchenko PK, Tolstorukov MY, Park PJ, Design and analysis of ChIP-seq experiments for DNA-binding proteins Nat Biotechnol. 2008 Dec;26(12):1351-9

Dependencies

NOTE: The current package does not run on a MacOS or Windows.

  • unix, bash, R (>=3.1), awk, samtools, boost C++ libraries
  • R packages: spp (1.14, 1.15.x), caTools, snow, snowfall, bitops, Rsamtools (in Bioconductor)

Files

  1. spp_1.14.tar.gz : modified SPP peak-caller package (The original SPP-peak caller package was written by Peter Kharchenko [2], https://github.com/hms-dbmi/spp)
  2. run_spp.R : The script to compute the frag length, data quality characteristics based on cross-correlation analysis and/or peak calling

Installation

  1. First make sure that you have installed R (version 3.1 or higher)

  2. Also, you must have the Boost C++ libraries installed. Most linux distributions have these preinstalled. If not, you can easily get these from your standard package manager for your linux distribution. e.g synaptic package manager (apt-get) for ubuntu or emerge for gentoo.

  3. Clone the repo.

    $ git clone https://github.com/kundajelab/phantompeakqualtools
    
  4. Install the following R packages

    • snow (if you want parallel processing)
    • snowfall
    • bitops
    • caTools
    • Rsamtools (in the Bioconductor package)
    $ cd phantompeakqualtools
    $ R
    (From within R)
    > install.packages("snow", repos="http://cran.us.r-project.org")
    > install.packages("snowfall", repos="http://cran.us.r-project.org")
    > install.packages("bitops", repos="http://cran.us.r-project.org")
    > install.packages("caTools", repos="http://cran.us.r-project.org")
    > source("http://bioconductor.org/biocLite.R")
    > biocLite("Rsamtools",suppressUpdates=TRUE)
    > install.packages("./spp_1.14.tar.gz")
    
  5. If your alignment files are BAM, you must have the samtools executable in your path so that the R script run_spp.R can call it using the system() command You can get samtools from (here)[http://samtools.sourceforge.net/] You can add the following line to your ~/.bashrc file

    export PATH="<path_to_samtools_executable>:${PATH}"
    
  6. Run run_spp.R

    Rscript run_spp.R <options>
    

General usage

Usage: Rscript run_spp.R <options>

  • Mandatory arguments

    argument description
    -c=<ChIP_alignFile> full path and name (or URL) of tagAlign/BAM file (can be gzipped) (FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz)
  • Mandatory arguments for peak calling

    argument description
    -i=<Input_alignFile> full path and name (or URL) of tagAlign/BAM file (can be gzipped) (FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz)
  • Optional arguments

    argument description
    -s=<min>:<step>:<max> strand shifts at which cross-correlation is evaluated, default=-500:5:1500
    -speak=<strPeak> user-defined cross-correlation peak strandshift
    -x=<min>:<max> strand shifts to exclude (This is mainly to avoid region around phantom peak) default=10:(readlen+10)
    -p=<nodes> number of parallel processing nodes, default=0
    -fdr=<falseDisoveryRate> false discovery rate threshold for peak calling
    -npeak=<numPeaks> threshold on number of peaks to call
    -tmpdir=<tempdir> temporary directory (if not specified R function tempdir() is used)
    -filtchr=<chrnamePattern> pattern to use to remove tags that map to specific chromosomes e.g. _ will remove all tags that map to chromosomes with _ in their name
  • Output arguments

    argument description
    -odir=<outputDirectory> name of output directory (If not set same as ChIP file directory is used)
    -savn=<narrowpeakfilename> NarrowPeak file name (fixed width peaks)
    -savn
    -savr=<regionpeakfilename> RegionPeak file name (variable width peaks with regions of enrichment around peak summits)
    -savr
    -savd=<rdatafile> save Rdata file
    -savd
    -savp=<plotdatafile> save cross-correlation plot
    -savp
    -out=<resultfile> append peakshift/phantomPeak results to a file
    -rf if plot or rdata or narrowPeak file exists replace it. If not used then the run is aborted if the plot or Rdata or narrowPeak file exists
    -clean if used it will remove the original chip and control files after reading them in. CAUTION: Use only if the script calling run_spp.R is creating temporary files

Typical usage

  1. Determine strand cross-correlation peak / predominant fragment length OR print out quality measures. -out=<outFile> will create and/or append to a file named several important characteristics of the dataset.

    Rscript run_spp.R -c=<tagAlign/BAMfile> -savp -out=<outFile>
    

    The file contains 11 tab delimited columns.

    col. abbreviation description
    1 Filename tagAlign/BAM filename
    2 numReads effective sequencing depth i.e. total number of mapped reads in input file
    3 estFragLen comma separated strand cross-correlation peak(s) in decreasing order of correlation.
    4 corr_estFragLen comma separated strand cross-correlation value(s) in decreasing order (COL2 follows the same order)
    5 phantomPeak Read length/phantom peak strand shift
    6 corr_phantomPeak Correlation value at phantom peak
    7 argmin_corr strand shift at which cross-correlation is lowest
    8 min_corr minimum value of cross-correlation
    9 NSC Normalized strand cross-correlation coefficient (NSC) = COL4 / COL8
    10 RSC Relative strand cross-correlation coefficient (RSC) = (COL4 - COL8) / (COL6 - COL8)
    11 QualityTag Quality tag based on thresholded RSC (codes= -2:veryLow, -1:Low, 0:Medium, 1:High, 2:veryHigh)

    The top 3 local maxima locations that are within 90% of the maximum cross-correlation value are output. In almost all cases, the top (first) value in the list represents the predominant fragment length. If you want to keep only the top value simply run

    sed -r 's/,[^\t]+//g' <outFile> > <newOutFile>
    

    You can run the program on multiple datasets in parallel and append all the quality information to the same for a summary analysis.

    NSC values range from a minimum of 1 to larger positive numbers. 1.1 is the critical threshold. Datasets with NSC values much less than 1.1 (< 1.05) tend to have low signal to noise or few peaks (this could be biological eg.a factor that truly binds only a few sites in a particular tissue type OR it could be due to poor quality)

    RSC values range from 0 to larger positive values. 1 is the critical threshold. RSC values significantly lower than 1 (< 0.8) tend to have low signal to noise. The low scores can be due to failed and poor quality ChIP, low read sequence quality and hence lots of mismappings, shallow sequencing depth (significantly below saturation) or a combination of these. Like the NSC, datasets with few binding sites (< 200) which is biologically justifiable also show low RSC scores.

    Qtag is a thresholded version of RSC.

  2. Peak calling

    Rscript run_spp.R -c=<ChIP_tagalign/BAM_file> -i=<control_tagalign/BAM_file> -fdr=<fdr> -odir=<peak_call_output_dir> -savr -savp -savd -rf
    Rscript run_spp.R -c=<ChIP_tagalign/BAM_file> -i=<control_tagalign/BAM_file> -npeak=<npeaks> -odir=<peak_call_output_dir> -savr -savp -savd -rf
    
  3. For IDR analysis you want to call a large number of peaks (relaxed threshold) so that the IDR model has access to a sufficient noise component.

    Rscript run_spp.R -c=<ChIP_tagalign/BAM_file> -i=<control_tagalign/BAM_file> -npeak=300000 -odir=<peak_call_output_dir> -savr -savp -rf -out=<resultFile>
    

Notes

  • It is EXTREMELY important to filter out multi-mapping reads from the BAM/tagAlign files. Large number of multimapping reads can severly affect the phantom peak coefficient and peak calling results.

  • If a dataset seems to have high PCR bottlenecking, then you might want to actually clamp the number of unique mappping reads per position to 1 or upto 5. If not the phantom peak coefficient can be artificially good.

  • For the IDR rescue strategy, one needs to pool reads from replicates and then shuffle and subsample the mapped reads to create two balanced pseudoReplicates. This is much easier to implement on tagAlign/BED read-mapping files using the unix 'shuf' command. So it is recommended to use the tagAlign format.

  • In most cases, you can simply use the maximum reported strand correlation peak as the predominant fragment length. However, it is useful to manually take a look at the cross-correlation plot to make sure the selected max peak is not an artifact.

  • Also, if there are problems with library size-selection, a dataset's cross-correlation profile can have multiple strong cross-correlation peaks. This is currently not autodetected.

Input file formats

  1. BAM format: This is a binary alignment format specified in http://samtools.sourceforge.net/SAM-1.3.pdf You MUST have samtools installed to use run_spp.R with BAM files

  2. TagAlign files: This a text-based BED3+3 alignment format that is easier to manipulate. It contains 6 tab delimited columns.

    col. abbrv. type description
    1 chrom string Name of the chromosome
    2 chromStart int The starting position of the feature in the chromosome. The first base in a chromosome is numbered 0.
    3 chromEnd int The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
    4 sequence string Sequence of this read
    5 score int Indicates uniqueness or quality (preferably 1000/alignmentCount).
    6 strand char Orientation of this read (+ or -)

NOTE: You dont have to store the sequence of reads in the sequence field as the peak caller never really uses that field. You can just put the letter 'N' in that field. This saves space significantly.

For the IDR rescue strategy, one needs to use shuffled and subsampled version of the alignment files. This is much easier to implement on tagAlign text files using the unix shuf command. So it is recommended to preferably use the tagAlign format.

Converting BAM to TAGALIGN files

It is very quick to convert pre-filtered BAM files (after removing unmapped, low quality reads, multimapping reads and duplicates) to gzipped tagAlign files using

samtools view -F 0x0204 -o - <bamFile> | awk 'BEGIN{OFS="\t"}{if (and($2,16) > 0) {print $3,($4-1),($4-1+length($10)),"N","1000","-"} else {print $3,($4-1),($4-1+length($10)),"N","1000","+"} }' | gzip -c > <gzip_TagAlignFileName>

Output file formats

  1. NarrowPeak/RegionPeak format: The output peak file is in BED6+4 format known as tagAlign. It consists of 10 tab-delimited columns

    col. abbrv. type description
    1 chrom string Name of the chromosome
    2 chromStart int The starting position of the feature in the chromosome. The first base in a chromosome is numbered 0.
    3 chromEnd int The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
    4 name string Name given to a region (preferably unique). Use '.' if no name is assigned.
    5 score int Indicates how dark the peak will be displayed in the browser (1-1000). If '0', the DCC will assign this based on signal value. Ideally average signalValue per base spread between 100-1000.
    6 strand char +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned.
    7 signalValue float Measurement of overall (usually, average) enrichment for the region.
    8 pValue float Measurement of statistical signficance (-log10). Use -1 if no pValue is assigned.
    9 qValue float Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
    10 peak int Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.

Releted pipelines and tools

References

ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Landt SG*, Marinov GK*, Kundaje A*, Kheradpour P*, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, Chen Y, DeSalvo G, Epstein C, Fisher-Aylor KI, Euskirchen G, Gerstein M, Gertz J, Hartemink AJ, Hoffman MM, Iyer VR, Jung YL, Karmakar S, Kellis M, Kharchenko PV, Li Q, Liu T, Liu XS, Ma L, Milosavljevic A, Myers RM, Park PJ, Pazin MJ, Perry MD, Raha D, Reddy TE, Rozowsky J, Shoresh N, Sidow A, Slattery M, Stamatoyannopoulos JA, Tolstorukov MY, White KP, Xi S, Farnham PJ, Lieb JD, Wold BJ, Snyder M. Genome Res. 2012 Sep;22(9):1813-31. doi: 10.1101/gr.136184.111.

Contributors

  • Anshul Kundaje - Assistant Professor, Dept. of Genetics, Stanford University (email: anshul AT kundaje DOT net)
  • Youngsook Lucy Jung - Postdoctoral Fellow, Park Lab, Harvard Medical School
  • Peter Kharchenko - Assistant Professor, Dept. of Biomedical Informatics, Harvard Medical School
  • Peter Park - Associate Professor, Dept. of Biomedical Informatics, Harvard Medical School

phantompeakqualtools's People

Contributors

akundaje avatar annashcherbina avatar leepc12 avatar russhyde avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

phantompeakqualtools's Issues

Negative Strand length

Hello,

I am performing peak calling with various tools (macs2, PeakSeq, CSAR, SPP). In order to perform peak calling in macs2 without model building (--nomodel and --ext size) we need to provide an estimated fragment length. Since run_spp.R provides the information of the estimated fragment length, I performed the analysis.

It is observed that the peak quality of the chip bam file is "2" and strand shift value with maximum correlation at "-5". I couldn't understand the result. The next two-strand shift values are 190 and 215. Can you or anyone help me to understand the result?

I posted a similar question on "[biostar]" but I didn't get any response.

Thank you

Getting lots of compilation errors on install

# ... many more error messages removed - last one fatal
bed2vector.cpp:2530:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:2532:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:2532:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:2536:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
/mnt/work/endrebak/software/anaconda/lib/R/etc/Makeconf:167: recipe for target 'bed2vector.o' failed
make: *** [bed2vector.o] Error 1
ERROR: compilation failed for package 'spp'
* removing '/mnt/work/endrebak/software/anaconda/lib/R/library/spp'

another problem: Error: protect(): protection stack overflow

Hi,

another problem with phantompeakqual or with R

Decompressing ChIP file
Decompressing control file
Loading required package: caTools
Reading ChIP tagAlign/BAM file Hs_940_temp//M940_Crep2.tagAlign.gz 
opened Hs_940_temp/M940_Crep2_spp_tmp//M940_Crep2.tagAlign144c7b15c3b9
done. read 42273553 fragments
ChIP data read length 37 
[1] TRUE
Reading Control tagAlign/BAM file Hs_940_.temp/control.tagAlign.gz 
opened Hs_940_.temp/M940_Crep2_spp_tmp//control.tagAlign144c62adab27
done. read 2922304 fragments
Error: protect(): protection stack overflow
Execution halted
Error: protect(): protection stack overflow
Execution halted

I googled and found http://stackoverflow.com/questions/28728774/how-to-set-max-ppsize-in-r

after adding one line

options("experssion" = 500000)

on top of the run_spp.R script, same error.
Thanks for looking into this.
Tommy

multiple datasets

Hi,

How do you run phantompeakqualtools for multiple dataset? How to specify multiple input files?

Use phantompeakqualtools in CUT&Tag?

The software phantompeakqualtools could be used in CUT&Tag data? Here is the plot created by phantompeakqualtools. The antibody used was H3K27ac, and the library was sequenced based on PE150 strategy. I can't understand why the highest peak shown in the left of the so-called phantom peak, which marked by the blue dashed line. And what's the meaning of the highest peak. Any advice would be appreciated!
Plot

run_spp, not tag.lwcc error

Hi,

I have a problem when I run run_spp.R:
Error in argfun(job) : object 'tag.lwcc' not found
Calls: find.binding.positions ... sendData -> sendData.SOCKnode -> serialize -> argfun
Execution halted
Thanks a lot!

Kun

Unable to install spp_1.14.tar.gz - boost version minimum?

Installation of spp R package gives many errors:

bed2vector.cpp: In function 'SEXPREC* read_bed_ends(SEXP)':
bed2vector.cpp:118:55: error: template argument 3 is invalid
hash_map<string, int, hash,equal_to > cind_map;
^
bed2vector.cpp:163:59: error: template argument 3 is invalid
hash_map<string, int, hash,equal_to >::const_iterator li=cind_map.find(chr);
^
bed2vector.cpp:163:77: error: qualified-id in declaration before 'li'
hash_map<string, int, hash,equal_to >::const_iterator li=cind_map.find(chr);
^~
bed2vector.cpp:165:10: error: 'li' was not declared in this scope
if(li==cind_map.end()) {
^~
bed2vector.cpp:165:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
if(li==cind_map.end()) {
^~~
bed2vector.cpp:169:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string}')
cind_map[chr]=cind;

Is this perhaps related to wrong boost version? What version of boost libs is needed?

CentOS 6
R version 3.5.1
boost 1.69.0 py37h8619c78_1001 conda-forge
boost-cpp 1.69.0 h11c811c_1000 conda-forge

need to load caTools

In the script run_spp.R at line 650 you also need to load caTools otherwise you get an error

library(caTools)

Adding flags to simulate SET from PET bams in lines 448-450 (run_spp.R)

I am looking at lines 448-450 in run_spp.R and it looks like one can simulate RSC and NSC computation on PET bam file, by running adding samtools flags to select only the first mate or second mate in lines 448-450. Ideally one could compute both ways and report the average? Is there something I missed?

It would save some compute if one doesn't have to write bam files containing only one mate when run_spp.R does the tagaAlign conversion anyway.

awk: line 2: function and never defined

Hello,
The conversion from BAM to tagAlign is not working for me. It gives me the following error: awk: line 2: function and never defined. Any ideas? I'm sure I'm just forgetting something in the command line...

samtools view -F 0x0204 -o - CNVR192.R1.filtered01.bam | awk 'BEGIN{OFS="\t"}{if (and($2,16) > 0) {print $3,($4-1),($4-1+length($10)),"N","1000","-"} else {print $3,($4-1),($4-1+length($10)),"N","1000","+"} }' | gzip -c > CNVR192.tagAlign.gz

Thanks.
Rita

install.packages("./spp_1.14.tar.gz") Error

Hello,
While installing the spp_1.14.tar.gz package with R 3.6.1 on CentOS 7, compilation of bed2vector.cpp fails. Here is the R output:

> install.packages("./spp_1.14.tar.gz")
inferring 'repos = NULL' from 'pkgs'
Warning in untar2(tarfile, files, list, exdir, restore_times) :
  skipping pax global extended headers
* installing *source* package ‘spp’ ...
** using staged installation
checking for gcc... /home/msimenc/software/anaconda3/envs/R/bin/x86_64-conda_cos6-linux-gnu-cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /home/msimenc/software/anaconda3/envs/R/bin/x86_64-conda_cos6-linux-gnu-cc accepts -g... yes
checking for /home/msimenc/software/anaconda3/envs/R/bin/x86_64-conda_cos6-linux-gnu-cc option to accept ISO C89... none needed
checking for BZ2_bzDecompressInit in -lbz2... yes
checking whether we are using the GNU C++ compiler... yes
checking whether /home/msimenc/software/anaconda3/envs/R/bin/x86_64-conda_cos6-linux-gnu-c++ accepts -g... yes
checking for grep that handles long lines and -e... /usr/bin/grep
checking how to run the C++ preprocessor... /home/msimenc/software/anaconda3/envs/R/bin/x86_64-conda_cos6-linux-gnu-c++ -E
checking for a sed that does not truncate output... /home/msimenc/software/anaconda3/envs/R/bin/sed
checking for Boost headers version >= 1.41.0... /usr/include
checking for Boost's header version... 1_53
configure: creating ./config.status
config.status: creating src/Makevars
** libs
x86_64-conda_cos6-linux-gnu-c++ -std=gnu++11 -I"/home/msimenc/software/anaconda3/envs/R/lib/R/include" -DNDEBUG   -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/msimenc/software/anaconda3/envs/R/include -I/home/msimenc/software/anaconda3/envs/R/include -Wl,-rpath-link,/home/msimenc/software/anaconda3/envs/R/lib -I./ -D_FASTMAP -DMAQ_LONGREADS  -I/usr/include -fpic  -fvisibility-inlines-hidden  -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/msimenc/software/anaconda3/envs/R/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base_1574178861180/work=/usr/local/src/conda/r-base-3.6.1 -fdebug-prefix-map=/home/msimenc/software/anaconda3/envs/R=/usr/local/src/conda-prefix  -c bed2vector.cpp -o bed2vector.o
In file included from /home/msimenc/software/anaconda3/envs/R/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/ext/hash_map:60:0,
                 from pc.h:4,
                 from bed2vector.cpp:1:
/home/msimenc/software/anaconda3/envs/R/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/backward/backward_warning.h:32:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated. [-Wcpp]
 #warning \
  ^~~~~~~
bed2vector.cpp: In function 'SEXPREC* read_bed_ends(SEXP)':
bed2vector.cpp:118:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:163:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:163:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:165:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:165:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:169:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_meland_old(SEXP)':
bed2vector.cpp:254:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:296:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:296:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:298:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:298:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:302:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_meland(SEXP, SEXP)':
bed2vector.cpp:432:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:476:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:476:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:478:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:478:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:482:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_eland_mismatches(SEXP)':
bed2vector.cpp:624:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:692:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:692:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:694:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:694:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:698:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_eland(SEXP, SEXP, SEXP)':
bed2vector.cpp:817:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:868:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:868:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:870:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:870:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:874:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_eland_extended(SEXP, SEXP, SEXP)':
bed2vector.cpp:1004:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:1078:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:1078:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:1080:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:1080:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:1084:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_eland_multi(SEXP, SEXP, SEXP)':
bed2vector.cpp:1213:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:1366:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:1366:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:1368:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:1368:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:1372:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_bowtie(SEXP, SEXP)':
bed2vector.cpp:1500:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:1575:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:1575:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:1577:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:1577:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:1581:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_helicostabf(SEXP, SEXP)':
bed2vector.cpp:1714:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:1792:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:1792:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:1794:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:1794:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:1798:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_maqmap(SEXP, SEXP)':
bed2vector.cpp:1941:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:1984:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:1984:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:1986:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:1986:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:1990:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_tagalign(SEXP)':
bed2vector.cpp:2119:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:2161:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:2161:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:2163:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:2163:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:2167:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_arachne(SEXP)':
bed2vector.cpp:2270:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:2329:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:2329:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:2331:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:2331:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:2335:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
bed2vector.cpp: In function 'SEXPREC* read_arachne_long(SEXP)':
bed2vector.cpp:2441:55: error: template argument 3 is invalid
   hash_map<string, int, hash<string>,equal_to<string> > cind_map;
                                                       ^
bed2vector.cpp:2530:59: error: template argument 3 is invalid
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                           ^
bed2vector.cpp:2530:77: error: qualified-id in declaration before 'li'
       hash_map<string, int, hash<string>,equal_to<string> >::const_iterator li=cind_map.find(chr);
                                                                             ^~
bed2vector.cpp:2532:10: error: 'li' was not declared in this scope
       if(li==cind_map.end()) {
          ^~
bed2vector.cpp:2532:23: error: request for member 'end' in 'cind_map', which is of non-class type 'int'
       if(li==cind_map.end()) {
                       ^~~
bed2vector.cpp:2536:10: error: no match for 'operator[]' (operand types are 'int' and 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}')
  cind_map[chr]=cind;
          ^
make: *** [/home/msimenc/software/anaconda3/envs/R/lib/R/etc/Makeconf:175: bed2vector.o] Error 1
ERROR: compilation failed for package ‘spp’
* removing ‘/home/msimenc/software/anaconda3/envs/R/lib/R/library/spp’
Warning message:
In install.packages("./spp_1.14.tar.gz") :
  installation of package ‘./spp_1.14.tar.gz’ had non-zero exit status

Peak Calling - Error when calculating statistical thresholds

I have been trying to call peaks/generate narrow peaks files for a number of chip replicates and psuedo-replicates using the following settings:

-npeak=300000
-speak={Fragment_Length_Estimated_Previously_from_spp}
-x=-500:{Max_Exclusion_Range_Estimated_From_First_100_Reads}

It has run without errors on about half of the files but for others I am getting the following error:

""
calculating statistical thresholds
Error in if (!is.null(k) & dim(k)[1] > 0) { : argument is of length zero
Calls: find.binding.positions -> lwcc.prediction -> do.call -> lapply -> FUN
Execution halted
""

Based on a similar issue posted here where npeak files where not saved I have tried:

  • changing the -fdr to 0.95
  • removing the -npeak threshold
  • specifying the npeak file with -savn='filename'
  • a combination of all of the above.

But I still get the same error.

Any help on fixing this issue would be much appreciated!

unable to save narrowpeak data

Trying to run a bam file, but I get errors when I add the "-savn" option (everything else runs fine when I omit the "-savn").

$ Rscript run_spp.R -c= -savp -savn -savd -savp -odir=files/phantompeak -out=files/phantompeak/peakshift -rf
Loading required package: caTools
Error in if (is.na(control.file) && !is.na(output.npeak.file)) { :
missing value where TRUE/FALSE needed
Calls: parse.arguments
In addition: Warning message:
In is.na(output.npeak.file) :
is.na() applied to non-(list or vector) of type 'NULL'
Execution halted

Also, is there a way to identify later on which peaks are likely to be phantompeaks after peak calling?

Add to bioconda?

Should this be added to bioconda?

If so, how modified is the SPP version? Would it be possible to make use of the SPP already in bioconda?
In your atac-seq-pipeline, you use r-spp v1.13. Is that sufficient?

cannot find runmean function

In this line:
c$y <- runmean(cc$y,sbw,alg="fast")

in newer R version(atleast, 3.4.4), runmean is not found. Instead, library(caTools) should be called, then runmean can be used.

install.packages("./spp_1.14.tar.gz"), error.

I googled this chat, install.packages("./spp_1.14.tar.gz"). The post here complained that the package cannot be installed. I replicated the same error <inferring 'repos = NULL' from 'pkgs' Error in getOctD(x, offset, len) : invalid octal digit>. Is there any progress on this issue or I should upgrade some additional packages? Thanks.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.