kundajelab / chipseq_pipeline Goto Github PK

View Code? Open in Web Editor NEW

105.0 40.0 40.0 1.55 MB

AQUAS TF and histone ChIP-seq pipeline

License: BSD 3-Clause "New" or "Revised" License

Shell 5.17% CSS 0.17% JavaScript 4.23% HTML 0.29% Python 17.18% Java 70.98% Perl 1.21% AngelScript 0.78%

chip-seq-pipelines chip-seq genomics

chipseq_pipeline's Issues

Fatal error: /data/fmao/soft/chipseq_pipeline/modules/graphviz.bds, line 126, pos 18. Trying to access element number 1 from list 'etc' (list size: 1).

Final parameter values: [0.86 1.24 0.53 0.01]
Number of reported peaks - 24706/24706 (100.0%)

Number of peaks passing IDR cutoff of 0.05 - 7/24706 (0.0%)

Task has finished (119 seconds).
62944 (process ID) old priority 0, new priority 0
Task has finished (0 seconds).

== Done do_idr()
63010 (process ID) old priority 0, new priority 0
63265 (process ID) old priority 0, new priority 0
Fatal error: /data/fmao/soft/chipseq_pipeline/modules/graphviz.bds, line 126, pos 18. Trying to access element number 1 from list 'etc' (list size: 1).
chipseq.bds, line 67 : main()
chipseq.bds, line 70 : void main() { // chipseq pipeline starts here
chipseq.bds, line 92 : report()
chipseq.bds, line 1204 : void report() {
chipseq.bds, line 1211 : html += html_graph() // graphviz workflow diagram
graphviz.bds, line 83 : string html_graph() { // graph diagram
graphviz.bds, line 90 : dot := _make_dot("$rpt_aux_dir/$prefix"+"workflow.dot")
graphviz.bds, line 108 : string _make_dot( string file ) {
graphviz.bds, line 120 : for ( int k=0; k<_tmp_in.size(); k++ ) {
graphviz.bds, line 126 : box_name := etc[1]

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.

Fatal error: TF_chipseq_pipeline/modules/align_bwa.bds, line 87, pos 3. Task/s failed.

Fatal error: TF_chipseq_pipeline/modules/align_bwa.bds, line 87, pos 3. Task/s failed.
chipseq.bds, line 76 : main()
chipseq.bds, line 79 : void main() { // chipseq pipeline starts here
chipseq.bds, line 87 : align() // align and postalign
chipseq.bds, line 318 : void align() {
chipseq.bds, line 355 : for ( int ctl=0; ctl <= 1; ctl++) { // iterate through inputs (ctl==0 : exp. replicate, ctl==1 : control)
chipseq.bds, line 358 : for ( int rep=1; rep <= get_num_rep( ctl ); rep++) {
chipseq.bds, line 364 : if ( no_par ) align( ctl, rep, nth_rep{group} ) // parallel jobs for align() for each replicate and each control
chipseq.bds, line 374 : void align( int ctl, int rep, int nth_rep ) {
chipseq.bds, line 376 : if ( is_se( ctl, rep ) ) align_SE( ctl, rep, nth_rep )
chipseq.bds, line 377 : else align_PE( ctl, rep, nth_rep )
chipseq.bds, line 550 : void align_PE( int ctl, int rep, int nth_rep ) {
chipseq.bds, line 563 : if ( is_input_fastq( ctl, rep ) ) {
chipseq.bds, line 582 : if ( aligner == "bwa" ) {
chipseq.bds, line 583 : ( bam_, flagstat_qc_ ) = bwa_PE( pooled_fastq_pair1, pooled_fastq_pair2, aln_o_dir, qc_o_dir, group, nth_rep )
align_bwa.bds, line 66 : string[] bwa_PE( string fastq1, string fastq2, string o_dir, string log_o_dir, string group, int nth_bwa ) {
align_bwa.bds, line 76 : if ( out <- in ) { // compare file timestamps of in and out (to check if job is already done or not)
align_bwa.bds, line 87 : wait

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
00:00:51.033 Writing report file 'chipseq.bds.20170529_194354_069.report.html'
00:00:51.155 Program 'chipseq.bds.20170529_194354_069' finished, exit value: 1, tasks executed: 1, tasks failed: 1, tasks failed names: bwa_sam_PE rep1.
00:00:51.156 Finished. Exit code: 1
00:00:51.156 ExecutionerLocal 'Local[41]': Killed

Installing Genomes: TwoBitToFa not found

Hi!

I installed the pipeline then tried to install a genome but came across this

install_genome_data.sh: line 171: twoBitToFa: command not found

I tried to install twobittofa using conda but ran into some issues. Is there any workaround?

Fatal error: chipseq.bds, line 1354, pos 2. Task/s failed.

Hi @leepc12 ,

Hope you're doing well.

I'm running into the below error in call_peaks():

Fatal error: /path/to/chipseq_pipeline/chipseq.bds, line 1354, pos 2. Task/s failed.
chipseq.bds, line 91 :	main()
chipseq.bds, line 94 :	void main() { // chipseq pipeline starts here
chipseq.bds, line 103 :		if ( !pe_xcor_only ) {
chipseq.bds, line 105 :			call_peaks() // call peaks in parallel (MACS2,SPP)
chipseq.bds, line 929 :	void call_peaks() {
chipseq.bds, line 1354 :		wait

Here are the commands I used:

srun -p part1 --mem=1G --pty /bin/bash

python /path/to/chipseq.py -type histone -final_stage idr -q part1 -system slurm -title job_1 -screen job_1 -species hg19 -nth 8 -no_jsd -mem_macs2 20G -fastq1 /path/to/sample_1.fastq -fastq2 /path/to/sample_2.fastq -ctl_fastq1 /path/to/ctl_1.fastq -ctl_fastq2 /path/to/ctl_2.fastq -out_dir /path/to/peaks

Full job log attached here:
job_1.txt

I would really appreciate any help with debugging this issue.

-Easwaran

Issue with bds

Hi, I installed the piepline as suggested in README here without any error, but I get the below error when I just type bds or try to run the pipeline. Do you know how can I fix this? I have installed bds as recommended in the GitHub readme. I am not sure if it's a bds issue or something with our cluster. Thanks in advance for your help.

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/bds/Bds : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)

[fread] Unexpected end of file (bwa_sai2sam_pe_core)

I am getting the following error for a paired end chipseq data.
.
.
.
[bwa_read_seq] 2.6% bases are trimmed.
[bwa_read_seq] 1.8% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[fread] Unexpected end of file

00:18:26.669 Wait: Task 'chipseq.bds.20180516_143724_219_parallel_30/task.align_bwa.bwa_sam_PE_ctl1.line_181.id_10' finished.
Fatal error: /path/TF_chipseq_pipeline/modules/align_bwa.bds, line 87, pos 3. Task/s failed.

I checked that the fastq files (i.e. for both replicate and control replicate, i checked the pairing of reads, between read1 and read2, it looks ok).

Interestingly this error is coming at the end of aligning a control replicate. The same control replicate I have used as control for a different replicate. There the whole pipeline ran till completion. How do I fix this? Thanks in advance...

Can't download hg19 genome data

Hello,

There is an issue downloading hg19 genome data. Stuck at "HTTP request sent, awaiting response..."
hg38 can be downloaded.

Can you please solve? Thanks!

How to modify bds if my SGE cluster using vf=memory,p=n_cpu?

I succeed in submitting my jobs using this pipeline. However, I found the resource claimed in bds.config does not work so the pipeline would run even if my cluster is full of jobs. My SGE cluster used vf=memory,p=n_cpu instead of h_vmem and slots. We used to submit jobs like:

qsub -cwd -l vf=10g,p=8 -V run.sh

In my bds.config, I used:

sge.pe = make
sge.mem = vf
sge.timeout = h_rt
sge.timeout2 = s_rt
clusterRunAdditionalArgs = -V

The question is how could I modify my config file to adding something like vf=10g,p=8 in qsub command? Thank you.

software installation

bioconda might help make the software installation much faster and reproducible. For example if you write a requirements.txt that looks something like:

bwa ==0.7.3
samtools ==0.1.19
bedtools ==2.19
ucsc-wigtobigwig
ucsc-bedgraphtobigwig
ucsc-bigwiginfo
ucsc-bedclip
matplotlib
numpy
scipy
# ...and everything else you need

Then everything can be installed with:

conda install \
--prefix $SOFTWARE_HOME/software_bds \
--file requirements.txt \
--channel bioconda

A couple details:

it looks like in install_dependencies.sh you manually separate Python environments, which can also be handled by conda. You would need a second requirements file and would need to choose a different directory
there are a couple of tools not yet on bioconda, but that's an easy fix. Exception to that might be MCR due to licensing
--prefix $HOME/software_bds in the conda command creates the environment there, minimizing changes to the pipeline

Just a suggestion. I used to have installation scripts like this but then found conda. Then bioconda came along and helped more. Made things much easier for other people to get things running and much less brittle.

Missing `-c` for `macs2 peakcall` ?

Seems that -c is missing between $tag and $ctl
https://github.com/kundajelab/chipseq_pipeline/blob/master/modules/callpeak_macs2_chipseq.bds#L188

sys macs2 callpeak -t $tag $ctl -f BED -n $prefix.tmp \
	-g $gensz -p $pval_thresh_macs2 --broad --nomodel --shift $shift_macs2 $extsize_param \
	--keep-dup $keep_dup_macs2 $extra_param_macs2

Encode pipeline description has the -c: https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#

macs2 callpeak -t ${REP1_TA_FILE}.tagAlign.gz -c ${CONTROL_TA_PREFIX}.tagAlign.gz -f BED -n ${PEAK_OUTPUT_DIR}/${CHIP_TA_PREFIX} -g ${GENOMESIZE} -p 1e-2 --nomodel --shift 0 --extsize ${FRAGLEN} --keep-dup all -B --SPMR

shuf: end of file

Hi, I am running the pipeline for the first time for a histone chipseq data paired-end with two replicates and two control replicates. I am getting this error:
shuf: /data/path/prefix.trim_50bp.tagAlign.gz: end of file
Program & line : '/software/path/TF_chipseq_pipeline/modules/postalign_bed.bds', line 42

When I checked the latest version of that bds file on github I found an inserted line.

"no_random_source := false help Disable --random-source for UNIX shuf. Hot fix for end of file error."

Do I need to get the latest changes and rerun the pipeline from beginning ?
On a related note, how do I keep the pipeline up to date? Using
'git pull' ?
I did 'git pull origin master', which was probably a mistake, as I am getting this message now.

"error: Your local changes to the following files would be overwritten by merge:
default.env
Please, commit your changes or stash them before you can merge.
Aborting"
Sorry about the naive questions, and thanks in advance.

phantompeakqual Error in names(cc) < names(otl) :attempt to set an attribute on NULL

hi,
I run phantompeakqual on my duplicates removed bam files and got an error shown in the title.
later, I re-run the same command but on bam files before samtools rmdup -s, it worked fine.

seems that samtools rmdup did something phantompeakqual did not like.

Thanks,
Tommy

Fatal error: /TF_chipseq_pipeline/chipseq.bds, line 1529, pos 2. Task/s failed.

I am running chipseq_pipeline on our data and the following two commands have been used. The first one has no problems but the second one cannot go through with the error message I attached below too. Please help/advise. Thanks a lot.

COMMANDS:

bds /home/TF_chipseq_pipeline/chipseq.bds -species hg38 -nth 4 -out_dir FOXA2GAII -se -fastq1 SRR074923.fastq.gz -fastq2 SRR074926.fastq.gz -ctl_fastq1 SRR074928.fastq.gz (GOOD)

bds /home/TF_chipseq_pipeline/chipseq.bds -species hg38 -nth 4 -out_dir FOXA2GAI -se -fastq1:1 SRR074921.fastq.gz -fastq1:2 SRR074922.fastq.gz -fastq2:1 SRR074924.fastq.gz -fastq2:2 SRR074925.fastq.gz -ctl_fastq1 SRR074927.fastq.gz -ctl_fastq2 SRR074929.fastq.gz (ERROR)

ERROR:

== git info
Latest git commit : eb3e236 (Thu Aug 3 20:12:41 2017)
Reading parameters from section (default) in file(/home/ysun/TF_chipseq_pipeline/default.env)...

== configuration file info
Hostname : ****************
Configuration file :
Environment file : /home/ysun/TF_chipseq_pipeline/default.env

== parallelization info
No parallel jobs : false
Maximum # threads : 4

== cluster/system info
Walltime (general) : 5h50m
Max. memory (general) : 7G
Force to use a system : local
Process priority (niceness) : 0
Retiral for failed tasks : 0
Submit tasks to a cluster queue :
Unlimited cluster mem./walltime : false

== shell environment info
Conda env. : aquas_chipseq
Conda env. for python3 : aquas_chipseq_py3
Conda bin. directory :

Shell cmd. for init. : if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq | wc -l) != "0" ]]; then source activate aquas_chipseq; sleep 5; fi; export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysun/TF_chi
pseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for init.(py3) : if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq_py3 | wc -l) != "0" ]]; then source activate aquas_chipseq_py3; sleep 5; fi; export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysu
n/TF_chipseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for fin. : TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0

Cluster task min. len. : 60

Cluster task delay : 0

== output directory/title info
Output dir. : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI
Title (prefix) : FOXA2GAI
Reading parameters from section (default) in file(/home/ysun/TF_chipseq_pipeline/default.env)...
Reading parameters from section (hg38) in file(/home/ysun/bds_pipeline_genome_data/aquas_chipseq_species.conf)...

== species settings
Species : hg38
Species file : /home/ysun/bds_pipeline_genome_data/aquas_chipseq_species.conf

Species name (WashU browser) : hg38
Ref. genome seq. fasta : /home/ysun/bds_pipeline_genome_data/hg38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta
Chr. sizes file : /home/ysun/bds_pipeline_genome_data/hg38/hg38.chrom.sizes
Black list bed : /home/ysun/bds_pipeline_genome_data/hg38/hg38.blacklist.bed.gz
Ref. genome seq. dir. :

== ENCODE accession settings
ENCODE experiment accession :
ENCODE award RFA :
ENCODE assay category :
ENCODE assay title :
ENCODE award :
ENCODE lab :
ENCODE assembly genome :
ENCODE alias prefix : KLAB_PIPELINE

== report settings
URL root for output directory :
Genome coord. for browser tracks :

== align bwa settings
Param. for bwa : -q 5 -l 32 -k 2
BWA index : /home/ysun/bds_pipeline_genome_data/hg38/bwa_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta
Walltime (bwa) : 47h
Max. memory (bwa) : 12G

== align multimapping settings

alignments reported for multimapping : 0

== postalign bam settings
MAPQ reads rm thresh. : 30
Rm. tag reads with str. :
No dupe removal in filtering raw bam : false
Walltime (bam filter) : 23h
Max. memory (bam filter) : 12G
Dup marker : picard
Use sambamba markdup (instead of picard) : false

== postalign bed/tagalign settings
Max. memory for UNIX shuf : 12G

== postalign cross-corr. analysis settings
Max. memory for UNIX shuf : 12G
User-defined cross-corr. peak strandshift : -1
Extra parameters for cross-corr. analysis :

== callpeak spp settings
Threshold for # peak : 300000
Walltime (spp) : 47h
Max. memory (spp) : 12G
Stack size for run_spp.R :
Use-defined cross-corr. peak strandshift; if -1, use frag. len. :-1
Extra parameters for run_spp.R :

== callpeak gem settings
Threshold for # peak in GEM : 300000
Min. length of k-mers in GEM : 6
Max. length of k-mers in GEM : 13
Q-value threshold for GEM : 0.0
Read distribution txt for GEM : /home/ysun/TF_chipseq_pipeline/etc/Read_Distribution_default.txt
Extra parameters for GEM :
Walltime (GEM) : 47h
Max. memory (GEM) : 15G

== callpeak PeakSeq settings
Target FDR for PeakSeq :0.05
Number of simulations for PeakSeq :10
Enrichment mapped frag. len. for PeakSeq :-1
Minimum interpeak distance for PeakSeq :-1
Mappability map file for PeakSeq :
Maximum Q-value for PeakSeq :0.1
Background model for PeakSeq :Simulated
Extra parameters for PeakSeq :
Walltime (PeakSeq) : 47h
Max. memory (PeakSeq) : 12G

== callpeak macs2 settings
Genome size (hs,mm) : hs
Walltime (macs2) : 23h
Max. memory (macs2) : 15G
P-value cutoff (macs2 callpeak) : 0.01
--keep-dup (macs2 callpeak) : all
--extsize (macs2 callpeak); if -1 then use frag. len. : -1
--shift (macs2 callpeak) : 0
Extra parameters for macs2 callpeak :

== callpeak naiver overlap settings
Bedtools intersect -nonamecheck : false

== IDR settings
Append IDR threshold to IDR out_dir : false

== chipseq pipeline settings

replicates : 2

Type of ChIP-Seq pipeline : TF
Final stage for ChIP-Seq :
Signal tracks for pooled rep. only : false
Aligner to map raw reads : bwa

reads to subsample for cross-corr. analysis : 15000000

reads to subsample exp. replicates (0: no subsampling): 0

reads to subsample controls (0: no subsampling) : 0

Generate anonymized filt. bam : false
Peak caller for IDR analysis : spp
Control rep. depth ratio : 1.2
Scoring column for IDR : signal.value
IDR threshold : 0.05
Force to use pooled ctl : false
Peak calling for true reps only : false
No peak calling for self pseudo reps : false
Disable cross-correlation analysis : false
Disable g. peak filt. thru. n. peak : false
Disable genome browser tracks : false

== checking chipseq parameters ...

== checking input files ...

Rep1 fastq (SE) :
/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074921.fastq.gz
/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074922.fastq.gz
Rep2 fastq (SE) :
/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074924.fastq.gz
/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074925.fastq.gz
Control Rep1 fastq (SE) :
/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074927.fastq.gz
Control Rep2 fastq (SE) :
/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074929.fastq.gz
Distributing 4 to ...
{rep2=1, rep1=1, ctl1=1, ctl2=1}

== Done align()

== Done pool_tags()

Fewer reads in control 1 than experiment replicate 1. Using pooled controls for replicate 1.

Fewer reads in control 2 than experiment replicate 2. Using pooled controls for replicate 2.
Distributing 4 to ...
[1, 1, 1, 1]

== Done call_peaks()

== Done naive_overlap()
Task failed:
Program & line : '/home/ysun/TF_chipseq_pipeline/modules/callpeak_idr.bds', line 74
Task Name : 'idr2 rep2-pr'
Task ID : 'chipseq.bds.20170920_084412_704/task.callpeak_idr.idr2_rep2_pr.line_74.id_38'
Task PID : '30964'
Task hint : 'idr --samples /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz /home/ysun/published/GEO/GSE25836_Soccio
MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr2/SRR074924'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/F
OXA2GAI/peak/spp/pseudo_reps/rep2/pr2/SRR074924_SRR074925.nodup.pr2.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/rep2/SRR074924_SRR074925.nodup.tagAlign_x_SRR074927.nodup_SRR07492
9.nodup.tagAlign.regionPeak.gz]'
Output files : '[/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.narrowPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-
pr.unthresholded-peaks.txt.png, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.log.txt, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthres
holded-peaks.txt.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.12-col.bed.gz]'
Script file : '/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/chipseq.bds.20170920_084412_704/task.callpeak_idr.idr2_rep2_pr.line_74.id_38.sh'
Exit status : '1'
Program :

	# SYS command. line 76
	
	 if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq_py3 | wc -l) != "0" ]]; then source activate aquas_chipseq_py3; sleep 5; fi;  export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysun/TF_chipseq_pipe

line/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

	# SYS command. line 78
	
	 idr --samples /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FO

XA2GAI/peak/spp/pseudo_reps/rep2/pr2/SRR074924_SRR074925.nodup.pr2.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz --peak-list /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/rep2/SRR074924_SRR074925.nodup.tagAlign_x_SRR074927.nodu
p_SRR074929.nodup.tagAlign.regionPeak.gz --input-file-type narrowPeak
--output-file /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.txt --rank signal.value --soft-idr-threshold 0.05
--plot --use-best-multisummit-IDR --log-output-file /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.log.txt

	# SYS command. line 82
	
	 idr_thresh_transformed=$(awk -v p=0.05 'BEGIN{print -log(p)/log(10)}')
	
	# SYS command. line 85
	
	 awk 'BEGIN{OFS="\t"} $12>='"${idr_thresh_transformed}"' {if ($2<0) $2=0; print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,"0"}' /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.tx

t
| sort | uniq | sort -k7n,7n | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz

	# SYS command. line 88
	
	 zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_Mo

lEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.narrowPeak.gz

	# SYS command. line 89
	
	 zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | gzip -nc > /home/ysun/published/GEO/GSE25836_S

occio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.12-col.bed.gz

	# SYS command. line 91
	
	 bedtools intersect -v -a <(zcat -f /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz) -b <(zcat -f /home/ysun/bds_pipeline_genome_data/hg38/hg38.blacklist.bed.gz) | grep -P '

chr[\dXY]+[ \t]' | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.13-col.bed.gz

	# SYS command. line 92
	
	 zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' | gzip -nc > /home/ysun/published/GEO/GSE25836_Socc

io_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.narrowPeak.gz

	# SYS command. line 93
	
	 zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | gzip -nc > /home/ysun/published/GEO/GSE25

836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.12-col.bed.gz

	# SYS command. line 95
	
	 gzip -f /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.txt
	
	# SYS command. line 96
	
	 rm -f /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.fil

t.13-col.bed.gz

	# SYS command. line 98
	
	 TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0

Fatal error: /home/ysun/TF_chipseq_pipeline/chipseq.bds, line 1529, pos 2. Task/s failed.
chipseq.bds, line 79 : main()
chipseq.bds, line 82 : void main() { // chipseq pipeline starts here
chipseq.bds, line 102 : do_idr() // IDR
chipseq.bds, line 1465 : void do_idr() {
chipseq.bds, line 1529 : wait

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.

Building custom genome data

Are there any instructions on how to build custom genome data for a species that is not available for the pipeline? What are the usual resources to find relevant data, like the black list, ATAC specific annotation data?

Documentation for macs2?

This pipeline allows for use of macs2 in peak calling, but in the googledoc, I can't find what parameters are used. What parameters are given to macs2 for peak calling? Thanks!

.fq extension not removed from prefix in bwa alignement step

bwa_aln(), bwa_sam() and bwa_sam_PE() do not remove the "fq" extension of the fastq file while generating the prefix for the output files.
This causes the ouput bam file to be saved with the "fq" extension, which in turn causes failure in the postalign_bam.bds step with the following error:

[E::hts_open] fail to open file '/path/to/out/samplename.PE2SE.bam' samtools: failed to open "/path/to/out/samplename.PE2SE.bam" for reading: No such file or directory sambamba-sort: not enough data in stream

SPP error in one replicate

Hi,
when I run your pipeline with 2 treatment and 2 control conditions, I get the following error after supposably several attempts to run SPP on a rep2 (see below).

Do you have an idea why? Thanks for helping out.

| 9116 | running (RUNNING) | spp rep2 | | if [ $(which run_spp_nodups.R 2> /dev/null | wc -l || echo) == "1" ]; then RUN_SPP=$(which run_spp_nodups.R);; else RUN_SPP=$(which run_spp.R);; fi; Rscript ${RUN_SPP} -c=data/align/rep2/n-flag-n2.nodup.tagAlign.gz -p=1 -i=/ |

03:48:51.697 Writing report file 'chipseq.bds.20171126_120808_625.report.html'
There were 23 warnings (use warnings() to see them)
data/krishnakumar2016/chipseq.bds.20171126_120808_625/task.callpeak_spp.spp_rep2.line_68.id_18.sh: line 38: error_in_spp_output_peak_does_not_exist: command not found
Task failed:
Program & line : 'TF_chipseq_pipeline/modules/callpeak_spp.bds', line 68
Task Name : 'spp rep2'
Task ID : 'chipseq.bds.20171126_120808_625/task.callpeak_spp.spp_rep2.line_68.id_18'
Task PID : '9116'
Task hint : 'if [ $(which run_spp_nodups.R 2> /dev/null | wc -l || echo) == "1" ]; then RUN_SPP=$(which run_spp_nodups.R);; else RUN_SPP=$(which run_spp.R);; fi; Rscript ${RUN_SPP} -c=data/align/rep2/n-flag-n2.nodup.tagAlign.gz -p=1 -i=/'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[data/align/rep2/n-flag-n2.nodup.tagAlign.gz, data/align/ctl2/n-ctr-n2.nodup.tagAlign.gz]'
Output files : '[data/peak/spp/rep2/n-flag-n2.nodup.tagAlign_x_n-ctr-n2.nodup.tagAlign.regionPeak.gz, data/peak/spp/rep2/n-flag-n2.nodup.tagAlign_x_n-ctr-n2.nodup.tagAlign.ccscore, data/peak/spp/rep2/n-flag-n2.nodup.tagAlign_x_n-ctr-n2.nodup.tagAlign.pdf]'
Script file : 'data/chipseq.bds.20171126_120808_625/task.callpeak_spp.spp_rep2.line_68.id_18.sh'
Exit status : '1'
Program :

chipseq pipeline set up

If Error occured in jsd step as showed below.

Fatal error: /home/liangma1/TF_chipseq_pipeline/chipseq.bds, line 410, pos 3. Task/s failed.
chipseq.bds, line 90 : main()
chipseq.bds, line 93 : void main() { // chipseq pipeline starts here
chipseq.bds, line 100 : jsd() // plot fingerprint and compute synthetic JS distance
chipseq.bds, line 384 : void jsd() { // plot fingerprint
chipseq.bds, line 385 : if ( filt_bam.hasKey("ctl1") && filt_bam.hasKey("rep1") ) {

There are two options to solve this:

you can try with -species hg38
try to disable JSD plot (-no_jsd).

If use -bam file and -fastq file at the same time, don't use the same replicate number.

bds_scr h3k27ac_noncellcycle_dmso
~/TF_chipseq_pipeline/chipseq.bds
-se
-species hg19
-dup_marker sambamba
-nth 4
-type histone
-mem_macs2 30G
-fastq1 R1.2.fastq.gz
-fastq2 R1.3.fastq.gz
-fastq3 R1.fastq.gz
-bam4 ac_0.bam
-ctl_fastq1 o2.gz
-ctl_fastq2 o3.gz
-ctl_bam3 o1.bam \

-trim_bp is for paired end dataset only and it's for cross-correlation analysis only. There is no way to trim fastqs in the pipeline. You need to trim it yourself.
-fastq1 ... -bam1 ... means that you are specifying two files for replicate 1, which is not allowed in the pipeline.
ChIP-Seq pipeline does not trim adapters.
When your job is done successfully, you will see '== Done report()' at the end of the log file.

NA

aquas_chipseq pipline idr error

Hello

could you kindly help me to fix an error i keep getting when running aquas_chipseq pipeline , the error jam getting is as below:

00:16:53.803 Wait: Waiting for task to finish: chipseq.bds.20180325_124007_960/task.callpeak_idr.idr2_rep1_pr.line_80.id_36, state: ERROR
Task failed:
Program & line : '/home/adroubsa/TF_chipseq_pipeline/modules/callpeak_idr.bds', line 80
Task Name : 'idr2 rep1-pr'
Task ID : 'chipseq.bds.20180325_124007_960/task.callpeak_idr.idr2_rep1_pr.line_80.id_36'
Task PID : '41355'
Task hint : 'idr --samples /scratch/dragon/intel/adroubsa/HDAC2_Chip_Seq_DMD/HDAC_T6_DMD/DMD_T6_HDAC2_sam_files/HDAC2_DMD_T6_Chip_Seq_analysis/peak/macs2/pseudo_reps/rep1/pr1/D9813_T6_HDAC2_merged_Rep1.nodup.pr1.tagAlign_x_D9813_T6_HDAC2_Input1.nodup.tagAlign.pval0.01.500K.narrowPeak.gz /scratch/dragon/intel/adr'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'

your kind help is much appreciated

"-q [SLURM_ACCOUNT_NAME]" interpreted as file

Hey Jin,

I can't figure out how to correctly specify the --account for SLURM job submission on SCG.

Your example: "python chipseq.py -system slurm -q_for_slurm_account -q [SLURM_ACCOUNT_NAME]"

I used:
python /srv/gsfs0/projects/snyder/chappell/TF_chipseq_pipeline/chipseq.py -type histone --screen GFP.H3K4me1 -system slurm -q_for_slurm_account -q mpsnyder -pe -species hg19 -nth 12 -fastq1_1 GFP.H3K4me1.repA.trim.R1.fq.gz -fastq1_2 GFP.H3K4me1.repA.trim.R2.fq.gz -fastq2_1 GFP.H3K4me1.repB.trim.R1.fq.gz -fastq2_2 GFP.H3K4me1.repB.trim.R2.fq.gz -ctl_fastq1_1 GFP.Input.repAB.trim.R1.fq.gz -ctl_fastq1_2 GFP.Input.repAB.trim.R2.fq.gz -out_dir GFP.H3K4me1 -mem_dedup 15G

Get the following error in .log output:
"sbatch: error: Unable to open file mpsnyder"

aquas_chipseq pipline SPP error

Hello

could you kindly help me to fix an error I keep getting when running aquas_chipseq pipeline , the error jam getting is as below：
04:55:10.240 Wait: Task 'chipseq.bds.20180405_152211_048/task.callpeak_spp.spp_rep1_pr1.line_70.id_33' finished.
04:55:10.240 Wait: Waiting for task to finish: chipseq.bds.20180405_152211_048/task.callpeak_spp.spp_rep1.line_70.id_31, state: FINISHED
04:55:10.240 Wait: Task 'chipseq.bds.20180405_152211_048/task.callpeak_spp.spp_rep1.line_70.id_31' finished.
04:55:10.240 Waiting for all 'parrallel' to finish.
04:55:10.240 Waiting for parallel 'chipseq.bds.20180405_152211_048_parallel_22' to finish. RunState: FINISHED
04:55:10.240 Waiting for parallel 'chipseq.bds.20180405_152211_048_parallel_23' to finish. RunState: FINISHED
Fatal error: /data/Workspace/Haichao/tool/chipseq_pipeline-master/chipseq.bds, line 1354, pos 2. Task/s failed.
chipseq.bds, line 91 : main()
chipseq.bds, line 94 : void main() { // chipseq pipeline starts here
chipseq.bds, line 103 : if ( !pe_xcor_only ) {
chipseq.bds, line 105 : call_peaks() // call peaks in parallel (MACS2,SPP)
chipseq.bds, line 929 : void call_peaks() {
chipseq.bds, line 1354 : wait

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
04:55:10.298 Writing report file 'chipseq.bds.20180405_152211_048.report.html'
04:55:10.316 Program 'chipseq.bds.20180405_152211_048' finished, exit value: 1, tasks executed: 31, tasks failed: 1, tasks failed names: spp rep1-pr1.
04:55:10.317 Finished. Exit code: 1
04:55:10.321 ExecutionerLocal 'Local[24]': Killed
bds -c /data/Workspace/Haichao/tool/chipseq_pipeline-master/bds.config -v /data/Workspace/Haichao/tool/chipseq_pipeline-master/chipseq.bds -out_dir ./SCI_final_notrim5_out -pe_no_trim_fastq -pe -type TF -species mm10 -fastq1_1:1 /data/NGS.Labdata/HiSeq_RUN/2018_03_13/trim/mouse-primary-SCI-ChIPseq-STAT3-Day7-rep1.paired.read1.fastq -fastq1_2:1 /data/NGS.Labdata/HiSeq_RUN/2018_03_13/trim/mouse-primary-SCI-ChIPseq-STAT3-Day7-rep1.paired.read2.fastq -ctl_fastq1_1:1 /data/NGS.Labdata/HiSeq_RUN/2018_03_13/trim/stat3-input.paired.read1.fastq -ctl_fastq1_2:1 /data/NGS.Labdata/HiSeq_RUN/2018_03_13/trim/stat3-input.paired.read2.fastq

Thank you!

Trimmed data-sets

Hello,

Is it correct to run the pipeline after trimming adaptors in paired-end data-sets? I get different read lengths.

Raquel

dead loop

Hi Jin,
I've updated the pipeline on SCG, and ran the pipeline again using the same sample files. It took 8h to run successfully last time. However, it's almost 2 days, the pipeline is still do the alignment, no error pop up though. I attached the two logs, could you let me know what is wrong? Thanks.

h3k27ac_noncellcycle_dmso.BDS20180129.log
h3k27ac_noncellcycle_dmso20180516.BDS.log

Read length

We have been using the Aquas pipeline successfully with 75bp SE reads. However, our sequencing core is adding NovaSeq machines that will at minimum give 100bp reads. Is there an advantage/disadvantage to the extra read length? Will we need to change anything in the pipeline to adjust for this difference?

Thank you in advance!

software installation

I have encountered the following problem.

Error: ERROR: placeholder '/root/miniconda3/envs/_build_placehold_placehold_placehold_placehold_placehold_p' too short in: glib-2.43.0-2

And then I followed your instruction: "If you see the following error, then update your Anaconda with conda update conda and downgrade it to 4.0.5 conda install conda==4.0.5." But it didn't work.
I found that the problem came from

conda create -n aquas_chipseq --file requirements.txt -y -c defaults -c bioconda -c r

in your install_dependencies.sh.
Then I read conda/conda-build#877 and https://groups.google.com/a/continuum.io/forum/#!topic/anaconda/fgDBJ2YwETI.
My anaconda path is

/share_bio/nas5/amsszhangsh_group/wangcan/software/python/anaconda3

It seemed that when I run

conda create -n aquas_chipseq --file requirements.txt -y -c defaults -c bioconda -c r

the prefix was

/share_bio/nas5/amsszhangsh_group/wangcan/software/python/anaconda3/envs/aquas_chipseq

which consisted of 86 characters. Then I tried to limit it to 80 characters. So I trimmed the trailing 6 characters in aquas_chipseq and run

conda create -n aquas_c --file requirements.txt -y -c defaults -c bioconda -c r

And I succeeded.
I found that conda 4.0.5 and 4.1.11 are no different on this problem. So the version of conda is not the point.
To solve this problem without modifying your script, I re-installed the anaconda in a directory with a shorter absolute path

/share_bio/nas5/amsszhangsh_group/wangcan/anaconda3

Then there were no problem when I run install_dependencies.sh.
If one does not want to re-install anaconda, I think to use conda create -p your_short_path can also work. For example, use

conda create -p /share_bio/nas5/amsszhangsh_group/wangcan/aquas_chipseq

to replace

conda create -n aquas_chipseq

conda create -n aquas_chipseq --file requirements.txt -y -c defaults -c bioconda -c r

ERROR:root:--extsize must >= 1!

more than 2 replicates can not be used for chips_pipeline

Hi, I am trying to run chipseq_pipeline using more than 2 replicates using the command as below. But it seems that the program can not take more than 2 replicates. I also attached part of the log file to show the files the program took. Is there any parameter I should set to using more replicates? Thanks.

$ bds /home/ysun/TF_chipseq_pipeline/chipseq.bds -trimmed_fastq TRUE -species hg38 -nth 12 -out_dir CTCF -se -fastq1 CTCF_islet6-1.fastq.gz -fastq2 CTCF_islet6-2.fastq.gz -ctl_fastq1 ctrl_islet2.fastq.gz -ctl_fastq2 ctrl_islet3.fastq.gz -ctl_fastq3 ctrl_islet4.fastq.gz -ctl_fastq4 ctrl_islet5.fastq.gz -ctl_fastq5 ctrl_islet6.fastq.gz

========================================================================
== git info
......

== checking chipseq parameters ...

== checking input files ...

Rep1 fastq (SE) :
/home/ysun/published/GEO/GSE23784_Stitzel_CellMetab/CTCF_islet6-1.fastq.gz
Rep2 fastq (SE) :
/home/ysun/published/GEO/GSE23784_Stitzel_CellMetab/CTCF_islet6-2.fastq.gz
Control Rep1 fastq (SE) :
/home/ysun/published/GEO/GSE23784_Stitzel_CellMetab/ctrl_islet2.fastq.gz
Control Rep2 fastq (SE) :
/home/ysun/published/GEO/GSE23784_Stitzel_CellMetab/ctrl_islet3.fastq.gz
Distributing 12 to ...
{rep2=1, rep1=4, ctl1=6, ctl2=1}

Suggestion on how to analysis the ChIP-seq data

Hi, I have ChIP-seq data for one patient and one normal person as well as input for both of them (they run at different batch and each batch has a input). I am wondering what is the best way to do the analysis? I have some thinkings but think could get better idea from here.

compare patient and normal person respectively to the batch input to see if there is any different in between?
compare patient to combine controls (input and normal person together) to see if there are any differences?

Thanks.

permission denied / file not found

Hey,

I installed everything according to your instructions without noticing any errors, but I cannot get chipseq.py to work. If I use fastq or bam as input, I get an error of the following form

fork/exec /media/nico/data/home2/projects/macrophages/chip_ana2/8-ENCODE-pipeline/chipseq.bds.20171207_115700_844/task.postalign_bam.dedup_bam_1_rep1.line_155.id_10.sh: permission denied

If I use peaks as input, the error has the following form:
Error (/home/nico/TF_chipseq_pipeline/modules/input_peak.bds, line 31, pos 33): File not found!
The files pass the initial check, though.

I don't know what to do next.

Here is the stderr output when running with a fastq and with a peak input.

log-17_12_07-11:56:57.txt

log-17_12_07-11:54:38.txt

pipeline deprecated?

Hello,

Thanks for making this software available.

I have been using the current version of chipseq_pipeline for the past few months and would like to continue using it. However, last week I noticed that the readme.md has the message 'This pipeline has been deprecated'.
I took a quick look at the new version of the pipeline, it seems that the steps are largely the same though the installation is different. Does this sound correct?

Thanks!

Undefined symbol: PyUnicodeUCS2_AsASCIIString

Hi there !

I followed all the very clear instructions you provide to install the pipeline and everything went fine.
I am giving my very first try on some toy datasets and while mapping / filtering / cross-correlation steps are running without any issue, I get the following error at the step right after, which forces the pipeline to exit :

= Done align()
00:00:37.633	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_33' to finish. RunState: FINISHED
00:00:37.634	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_32' to finish. RunState: FINISHED
00:00:41.753	Waiting for all 'parrallel' to finish.
00:00:41.754	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_31' to finish. RunState: FINISHED
00:00:41.754	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_30' to finish. RunState: FINISHED
00:00:41.755	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_33' to finish. RunState: FINISHED
00:00:41.755	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_32' to finish. RunState: FINISHED
00:00:42.504	Executioner factory: Creating new executioner type 'LOCAL'
00:00:42.668	ExecutionerLocal 'Local[34]': Queuing task: chipseq.bds.20180117_161156_433/task.postalign_bam.jsd.line_812.id_10
00:00:42.669	Waiting for all tasks to finish.
00:00:42.669	Wait: Waiting for task to finish: chipseq.bds.20180117_161156_433/task.postalign_bam.jsd.line_812.id_10, state: SCHEDULED
00:00:42.669	ExecutionerLocal 'Local[34]': Started running
00:00:42.670	ExecutionerLocal 'Local[34]': Task selected 'chipseq.bds.20180117_161156_433/task.postalign_bam.jsd.line_812.id_10' on host 'localhost'
Traceback (most recent call last):
  File "/pasteur/homes/piroux/miniconda3/envs/aquas_chipseq/bin/plotFingerprint", line 4, in <module>
    from deeptools.plotFingerprint import main
  File "/pasteur/homes/piroux/miniconda3/envs/aquas_chipseq/lib/python2.7/site-packages/deeptools/plotFingerprint.py", line 4, in <module>
    import numpy as np
  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/__init__.py", line 180, in <module>
    from . import add_newdocs
  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/core/__init__.py", line 14, in <module>
    from . import multiarray
ImportError: /mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString
Task failed:
	Program & line     : '/pasteur/homes/piroux/TF_chipseq_pipeline/modules/postalign_bam.bds', line 812
	Task Name          : 'jsd'
	Task ID            : 'chipseq.bds.20180117_161156_433/task.postalign_bam.jsd.line_812.id_10'
	Task PID           : '30942'
	Task hint          : 'plotFingerprint -b /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/rep1/D0K4Me1_Rep1.nodup.bam /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/rep2/D0K4Me1_Rep2.nodup.bam /pasteur/projet'
	Task resources     : 'cpus: 4	mem: -1.0 B	wall-timeout: 8640000'
	State              : 'ERROR'
	Dependency state   : 'ERROR'
	Retries available  : '1'
	Input files        : '[/pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/rep1/D0K4Me1_Rep1.nodup.bam, /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/rep2/D0K4Me1_Rep2.nodup.bam, /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/ctl1/D0Input_Rep1.nodup.bam]'
	Output files       : '[/pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./qc/._jsd.png, /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./qc/._jsd.qc]'
	Script file        : '/pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/chipseq.bds.20180117_161156_433/task.postalign_bam.jsd.line_812.id_10.sh'
	Exit status        : '1'
	Program            :

		# SYS command. line 813

		 if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq | wc -l) != "0" ]]; then source activate aquas_chipseq; sleep 5; fi;  export PATH=/pasteur/homes/piroux/TF_chipseq_pipeline/.:/pasteur/homes/piroux/TF_chipseq_pipeline/modules:/pasteur/homes/piroux/TF_chipseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

		# SYS command. line 814

		 plotFingerprint -b /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/rep1/D0K4Me1_Rep1.nodup.bam /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/rep2/D0K4Me1_Rep2.nodup.bam /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/ctl1/D0Input_Rep1.nodup.bam --JSDsample /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./align/ctl1/D0Input_Rep1.nodup.bam \
					--labels rep1 rep2 ctl1 \
					--outQualityMetrics /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./qc/._jsd.qc \
					--minMappingQuality 30 \
					-T "Fingerprints of different samples" \
					--blackListFileName /pasteur/homes/piroux/TF_chipseq_pipeline/genome_data/hg19/wgEncodeDacMapabilityConsensusExcludable.bed.gz \
					--numberOfProcessors 4 \
					--plotFile /pasteur/projets/policy01/Bischof-NGS/work/PROJET_SENESCENCE/ChIP-seq_HISTONE/RAS_OIS/TST_AQUA_2/./qc/._jsd.png

		# SYS command. line 823

		 TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
	StdErr (100000000 lines)  :
		Traceback (most recent call last):
		  File "/pasteur/homes/piroux/miniconda3/envs/aquas_chipseq/bin/plotFingerprint", line 4, in <module>
		    from deeptools.plotFingerprint import main
		  File "/pasteur/homes/piroux/miniconda3/envs/aquas_chipseq/lib/python2.7/site-packages/deeptools/plotFingerprint.py", line 4, in <module>
		    import numpy as np
		  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/__init__.py", line 180, in <module>
		    from . import add_newdocs
		  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/add_newdocs.py", line 13, in <module>
		    from numpy.lib import add_newdoc
		  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/lib/__init__.py", line 8, in <module>
		    from .type_check import *
		  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/lib/type_check.py", line 11, in <module>
		    import numpy.core.numeric as _nx
		  File "/mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/core/__init__.py", line 14, in <module>
		    from . import multiarray
		ImportError: /mount/gensoft2/adm/lib/python2.7/site-packages/numpy-1.10.1-py2.7-linux-x86_64.egg/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString

00:01:00.967	Wait: Task 'chipseq.bds.20180117_161156_433/task.postalign_bam.jsd.line_812.id_10' finished.
00:01:00.968	Waiting for all 'parrallel' to finish.
00:01:00.968	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_31' to finish. RunState: FINISHED
00:01:00.968	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_30' to finish. RunState: FINISHED
00:01:00.968	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_33' to finish. RunState: FINISHED
00:01:00.968	Waiting for parallel 'chipseq.bds.20180117_161156_433_parallel_32' to finish. RunState: FINISHED
Fatal error: /pasteur/homes/piroux/TF_chipseq_pipeline/chipseq.bds, line 410, pos 3. Task/s failed.
chipseq.bds, line 90 :	main()
chipseq.bds, line 93 :	void main() { // chipseq pipeline starts here
chipseq.bds, line 100 :		jsd() // plot fingerprint and compute synthetic JS distance
chipseq.bds, line 384 :	void jsd() { // plot fingerprint
chipseq.bds, line 385 :		if ( filt_bam.hasKey("ctl1") && filt_bam.hasKey("rep1") ) {
chipseq.bds, line 410 :			wait

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
00:01:01.076	Writing report file 'chipseq.bds.20180117_161156_433.report.html'
00:01:01.176	Program 'chipseq.bds.20180117_161156_433' finished, exit value: 1, tasks executed: 1, tasks failed: 1, tasks failed names: jsd.
00:01:01.176	Finished. Exit code: 1
00:01:01.177	ExecutionerLocal 'Local[34]': Killed
00:01:01.192	ExecutionerLocal 'Local[34]': Finished running

I assume it is somehow related to numpy library, but wasn't able to find any useful thread on the internet. Plus, I must confess my understanding in handling Python libraries is limited ...

Thx a lot for this awesome tool and for your previous help.

Pef

Crashing on bam_to_bed_atac step

@leepc12:

(This issue is for bds_atac, but I'm filing it here because the crash is caused by a module in TF_chipseq_pipeline)

The bds_atac pipeline is reproducibly crashing on nandi and mitra at the bam_to_bed step, possibly due to bad path input syntax for samtools view. This happens for multiple different samples, both when run as replicates and when run individually.

The source code line where the pipeline crashes with samtools complaining failed to open file for a file that exists and is readable is here: https://github.com/kundajelab/TF_chipseq_pipeline/blob/master/modules/postalign_bam.bds#L444

An example log file (from running bds_atac on mitra) is here: https://gist.github.com/chrisprobert/08a63955967426df1b4aace8d40c6eed, and the associated bds.conf file is here: https://gist.github.com/chrisprobert/f7658b4c92f153e37199ac7f86c718a7

From the log file (relevant line here), it looks like the input to samtools view is:
nonMitoChromosomes=$(samtools view -H "{1=/srv/persistent/cprobert/projects/jcharalel_atac_022616_ATAC_SKBR3_SKRT-28854828/analysis/SKRT-3-ATAC-022616_S1/SKRT-3-ATAC-022616_S1_R1.trim.PE2SE.nodup.bam}" | grep chr | cut -f2 | sed 's/SN://g' | grep -v chrM)

To me, the input path there for samtools view looks suspicious: browsing other parts of the BDS log the path syntax is a regular path like samtools view -H /srv/.../....bam, not wrapped in brackets like "{1=/srv/.../....bam}" is in the samtools view command above.

For example, here's a samtools view command from elsewhere in the log that executes ok without samtools crashing (note that the input file path is just provided as a path, not with the "{1=...}" syntax):
samtools view -F 1804 -f 2 -q 30 -u /srv/persistent/cprobert/projects/jcharalel_atac_022616_ATAC_SKBR3_SKRT-28854828/analysis/SKRT-3-ATAC-022616_S1/SKRT-3-ATAC-022616_S1_R1.trim.PE2SE.bam | samtools sort -n - /srv/persistent/cprobert/projects/jcharalel_atac_022616_ATAC_SKBR3_SKRT-28854828/analysis/SKRT-3-ATAC-022616_S1/SKRT-3-ATAC-022616_S1_R1.trim.PE2SE.dupmark

Any thoughts what could be causing this? Perhaps filt_bam is not being set correctly by whatever calls _bam_to_bed_atac?

job submission command

Hi Jin,
I have 2 tested files ran successfully, the command I used is as below. N2 results look weird as shown in washU browser. I guess my command is not right. What do you think? And I'm not sure the controls are needed, we don't have controls.

N1: single-end with replicates and controls

bds_scr h3k27ac_noncellcycle_dmso20180519 ~/TF_chipseq_pipeline/chipseq.bds -q chettys -se -species hg19 -dup_marker sambamba -nth 4 -type histone -mem_bwa 30G -mem_dedup 30G -mem_macs2 30G -out_dir out20180519 -fastq1 Sample_Hues6_P35_DMSO_H3K27ac_R1_2_TGTCGGAT.R1.2.fastq.gz -fastq2 Sample_Hues6_P35_DMSO_H3K27ac_R1_2_TGTCGGAT.R1.3.fastq.gz -fastq3 Sample_Hues6_P35_DMSO_H3K27ac_R1_2_TGTCGGAT.R1.fastq.gz -fastq4 Sample_Hues6_P37_DMSO_H3K27ac_GACCGTTG.R1.fastq.gz -ctl_fastq1 /labs/chettys/liangma1/test/chipseq/wce/wce_for_dmso2.gz -ctl_fastq2 /labs/chettys/liangma1/test/chipseq/wce/wce_for_dmso3.gz

N2: paired-end without replicates and controls

bds_scr SRR3554631 ~/TF_chipseq_pipeline/chipseq.bds -q chettys -pe -species hg19 -dup_marker sambamba -nth 4 -type histone -mem_bwa 30G -mem_dedup 30G -mem_macs2 30G -out_dir out -fastq1_1 SRR3554631_1.fastq.gz -fastq1_2 SRR3554631_2.fastq.gz

bigwig files

N1: out20180519/signal/macs2/pooled_rep/*
N2: out/signal/macs2/rep1/*

[Error] two same tag align file in (z)cat and shuf in pipe

zcat sampleXXX.tagAlign.gz | grep -v "chrM" | shuf -n 15000000 --random-source=sampleXXX.tagAlign.gz | gzip -nc > sampleXXX.no_chrM.15M.tagAlign.gz

shuf: ‘sampleXXX.tagAlign.gz’: end of file

Some reps get ERROR:root:--extsize must >= 1!

We have run several ChIP-seq datasets through the Aquas pipeline, and occasionally we will get ERROR:root:--extsize must >= 1!
It will cause any rep with this error to fail.
What causes this error and what can be done to correct it?

I have found that if the pipeline shows this error, we can change the macs2 --extsize parameter to 200, and restart the job. The pipeline will pick up from macs2 and apparently complete the job without error. The resulting output seems to be correct when looking at the number of peaks called, passing IDR, and visualizing bigwigs with IGV. However, the cross-correlation plot is clearly effected and the estimated fragment length is negative. Is this the correct or best way to address this issue?

I have attached the output log from such a run, along with the summary html and an example image of bigwig visualization between reps. Rep1 ran normally, Rep2 had the error.
Thanks.

JSON key error in example config

In the example JSON config ("example_conf_full.json"), two of the parameters don't match the chipseq.py file - npeaks_spp (should be cap_num_peak_spp) and use_system (should be system). This throws an error (line 583 of chipseq.py)

JSD error: "RuntimeError: module compiled against API version 0xc but this version of numpy is 0xa"

Got an error at the JSD step. I used nth=1 (I tried with more but they somehow did not work either, or said "not enough resources"), and on a server with 1 core, 80gb memory.
I think the pipeline install_dependencies.sh installed numpy 1.18.5.

Error pasted below:

03:18:53.464	ExecutionerLocal 'Local[28]': Task selected 'chipseq.bds.20200801_100653_737/task.postalign_bam.jsd.line_812.id_32' on host 'localhost'
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xa
Traceback (most recent call last):
  File "/home/unix/levgenio/software/miniconda3/envs/aquas_chipseq/bin/plotFingerprint", line 4, in <module>
    from deeptools.plotFingerprint import main
  File "/home/unix/levgenio/software/miniconda3/envs/aquas_chipseq/lib/python2.7/site-packages/deeptools/plotFingerprint.py", line 15, in <module>
    import deeptools.countReadsPerBin as countR
  File "/home/unix/levgenio/software/miniconda3/envs/aquas_chipseq/lib/python2.7/site-packages/deeptools/countReadsPerBin.py", line 13, in <module>
    import pyBigWig
ImportError: numpy.core.multiarray failed to import
Task failed:
	Program & line     : '/home/unix/levgenio/software/TF_chipseq_pipeline/modules/postalign_bam.bds', line 812
	Task Name          : 'jsd'
	Task ID            : 'chipseq.bds.20200801_100653_737/task.postalign_bam.jsd.line_812.id_32'
	Task PID           : '22031'
	Task hint          : 'plotFingerprint -b /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/rep1/10222853_Microglia_H3K27ac1_161020Tsa_D16-11125_hg19_bestmap.nodup.bam /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/rep2/10222853_Microglia_H3K27ac2_161020Tsa_D16-11126_hg19_bestmap'
	Task resources     : 'cpus: 1	mem: -1.0 B	wall-timeout: 8640000'
	State              : 'ERROR'
	Dependency state   : 'ERROR'
	Retries available  : '1'
	Input files        : '[/home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/rep1/10222853_Microglia_H3K27ac1_161020Tsa_D16-11125_hg19_bestmap.nodup.bam, /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/rep2/10222853_Microglia_H3K27ac2_161020Tsa_D16-11126_hg19_bestmap.nodup.bam, /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/ctl1/10222853_Microglia_Input1_161020Tsa_D16-11119_hg19_bestmap.nodup.bam]'
	Output files       : '[/home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/qc/10222853_Microglia_out_jsd.png, /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/qc/10222853_Microglia_out_jsd.qc]'
	Script file        : '/broad/compbio/levgenio/code/AQUAS_pipeline/chipseq.bds.20200801_100653_737/task.postalign_bam.jsd.line_812.id_32.sh'
	Exit status        : '1'
	Program            :

		# SYS command. line 813

		 if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq | wc -l) != "0" ]]; then source activate aquas_chipseq; sleep 5; fi;  export PATH=/home/unix/levgenio/software/TF_chipseq_pipeline/.:/home/unix/levgenio/software/TF_chipseq_pipeline/modules:/home/unix/levgenio/software/TF_chipseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

		# SYS command. line 814

		 plotFingerprint -b /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/rep1/10222853_Microglia_H3K27ac1_161020Tsa_D16-11125_hg19_bestmap.nodup.bam /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/rep2/10222853_Microglia_H3K27ac2_161020Tsa_D16-11126_hg19_bestmap.nodup.bam /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/ctl1/10222853_Microglia_Input1_161020Tsa_D16-11119_hg19_bestmap.nodup.bam --JSDsample /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/align/ctl1/10222853_Microglia_Input1_161020Tsa_D16-11119_hg19_bestmap.nodup.bam \
					--labels rep1 rep2 ctl1 \
					--outQualityMetrics /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/qc/10222853_Microglia_out_jsd.qc \
					--minMappingQuality 30 \
					-T "Fingerprints of different samples" \
					--blackListFileName /home/unix/levgenio/data/hg19/wgEncodeDacMapabilityConsensusExcludable.bed.gz \
					--numberOfProcessors 1 \
					--plotFile /home/unix/levgenio/data/AQUAS/AQUAS_out/10222853_Microglia_out/qc/10222853_Microglia_out_jsd.png

		# SYS command. line 823

		 TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
	StdErr (100000000 lines)  :
		RuntimeError: module compiled against API version 0xc but this version of numpy is 0xa
		Traceback (most recent call last):
		  File "/home/unix/levgenio/software/miniconda3/envs/aquas_chipseq/bin/plotFingerprint", line 4, in <module>
		    from deeptools.plotFingerprint import main
		  File "/home/unix/levgenio/software/miniconda3/envs/aquas_chipseq/lib/python2.7/site-packages/deeptools/plotFingerprint.py", line 15, in <module>
		    import deeptools.countReadsPerBin as countR
		  File "/home/unix/levgenio/software/miniconda3/envs/aquas_chipseq/lib/python2.7/site-packages/deeptools/countReadsPerBin.py", line 13, in <module>
		    import pyBigWig
		ImportError: numpy.core.multiarray failed to import

03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_bam.jsd.line_812.id_32' finished.
03:19:03.543	Wait: Waiting for task to finish: chipseq.bds.20200801_100653_737/task.postalign_bam.dedup_bam_2_rep1.line_205.id_12, state: FINISHED
03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_bam.dedup_bam_2_rep1.line_205.id_12' finished.
03:19:03.543	Wait: Waiting for task to finish: chipseq.bds.20200801_100653_737/task.postalign_xcor.xcor_rep1.line_102.id_16, state: FINISHED
03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_xcor.xcor_rep1.line_102.id_16' finished.
03:19:03.543	Wait: Waiting for task to finish: chipseq.bds.20200801_100653_737/task.postalign_bam.markdup_bam_picard_ctl1.line_409.id_25, state: FINISHED
03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_bam.markdup_bam_picard_ctl1.line_409.id_25' finished.
03:19:03.543	Wait: Waiting for task to finish: chipseq.bds.20200801_100653_737/task.postalign_bam.dedup_bam_2_ctl2.line_205.id_30, state: FINISHED
03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_bam.dedup_bam_2_ctl2.line_205.id_30' finished.
03:19:03.543	Wait: Waiting for task to finish: chipseq.bds.20200801_100653_737/task.postalign_bam.dedup_bam_1_ctl2.line_155.id_28, state: FINISHED
03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_bam.dedup_bam_1_ctl2.line_155.id_28' finished.
03:19:03.543	Wait: Waiting for task to finish: chipseq.bds.20200801_100653_737/task.postalign_bam.bam_to_tag_rep1.line_595.id_14, state: FINISHED
03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_bam.bam_to_tag_rep1.line_595.id_14' finished.
03:19:03.543	Wait: Waiting for task to finish: chipseq.bds.20200801_100653_737/task.postalign_bam.bam_to_tag_rep1.line_595.id_13, state: FINISHED
03:19:03.543	Wait: Task 'chipseq.bds.20200801_100653_737/task.postalign_bam.bam_to_tag_rep1.line_595.id_13' finished.
Fatal error: /home/unix/levgenio/software/TF_chipseq_pipeline/chipseq.bds, line 414, pos 3. Task/s failed.

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
03:19:03.705	Writing report file 'chipseq.bds.20200801_100653_737.report.html'
03:19:03.739	Program 'chipseq.bds.20200801_100653_737' finished, exit value: 1, tasks executed: 23, tasks failed: 1, tasks failed names: jsd.
03:19:03.739	Finished. Exit code: 1
03:19:03.739	ExecutionerLocal 'Local[28]': Killed

Citing the AQUAS Pipeline

Hello,

I used the AQUAS TF Chip-seq peak calling pipeline for a work that I will be submitting very soon and I can't find which DOI to officially cite in the paper? Which DOI is the best to associate with the pipeline?

-Michael

MACS2 KeyError

Hello,
sorry to bother you, but is it possible in some way to update MACS2?
I get a KeyError with MACS2 if there is a conting which doesn't contain any peaks.
It is possible to get around this problem with the following fix:
def get_data_from_chrom (self, str chrom):
if not self.peaks.has_key(chrom):
self.peaks[chrom]=[]
return self.peaks[chrom]

In PeakIO.pyx file.

This issue is fixed in MACS 2.1.1, but if I change the requirements.txt file, MACS2 doesn't work anymore.

phantompeakqualtools missing from install_dependencies

When installing dependencies, it reports an error on line 63 for cloning phantompeakqualtools.
When attempting to run the pipeline, it errors at xcor with an stderr message of no run_spp.R in directories.

mm10

I tried installing the mm10 genome today and encountered issues retrieving data from http://mitra.stanford.edu/kundaje/genome_data/mm10/. I looked and it appears as though there is no more data here. Have the data for generating genome files been removed?

ERROR:root:--extsize must >= 1! and chipseq.bds can not get all replicates go through

I am running chipseq with 9 replicates and 11 controls as

bds chipseq.bds -final_stage peak -no_pseudo_rep -peak_caller macs2 -species hg19 -nth 12 -out_dir test_hg19 -title B10 -fastq1 SRR8701918.fastq.gz -fastq2 SRR8701919.fastq.gz -fastq3 SRR8701920.fastq.gz -fastq4 SRR8701921.fastq.gz -fastq5 SRR8701922.fastq.gz -fastq6 SRR8701923.fastq.gz -fastq7 SRR8701925.fastq.gz -fastq8 SRR8701926.fastq.gz -fastq9 SRR8701924.fastq.gz -ctl_fastq1 SRR8702298.fastq.gz -ctl_fastq2 SRR8702299.fastq.gz -ctl_fastq3 SRR8702300.fastq.gz -ctl_fastq4 SRR8702301.fastq.gz -ctl_fastq5 SRR8702302.fastq.gz -ctl_fastq6 SRR8702303.fastq.gz -ctl_fastq7 SRR8702304.fastq.gz -ctl_fastq8 SRR8702305.fastq.gz -ctl_fastq9 SRR8702306.fastq.gz -ctl_fastq10 SRR8702307.fastq.gz -ctl_fastq11 SRR8702308.fastq.gz

The alignment seemed fine. But in the "align" folder, I can only see 2 controls as "ctl1 ctl2 pooled_ctl pooled_pseudo_reps pooled_rep pseudo_reps rep1 rep2 rep3 rep4 rep5 rep6 rep7 rep8 rep9". Were other controls pooled together? Or 2 controls are the most control chipseq.bds can take?
macs2 peak call seemed failed and I got a lot of error message "ERROR:root:--extsize must >= 1!"

Any suggestion will be appreciated. Thanks.

kundajelab / chipseq_pipeline Goto Github PK

chipseq_pipeline's Issues

COMMANDS:

bds /home/TF_chipseq_pipeline/chipseq.bds -species hg38 -nth 4 -out_dir FOXA2GAI -se -fastq1:1 SRR074921.fastq.gz -fastq1:2 SRR074922.fastq.gz -fastq2:1 SRR074924.fastq.gz -fastq2:2 SRR074925.fastq.gz -ctl_fastq1 SRR074927.fastq.gz -ctl_fastq2 SRR074929.fastq.gz (ERROR)

ERROR:

alignments reported for multimapping : 0

replicates : 2

reads to subsample for cross-corr. analysis : 15000000

reads to subsample exp. replicates (0: no subsampling): 0

reads to subsample controls (0: no subsampling) : 0

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.

N1: single-end with replicates and controls

N2: paired-end without replicates and controls

bigwig files

Recommend Projects

Recommend Topics

Recommend Org