I am getting the following error for paired-end ChIP-seq data...

[fread] Unexpected end of file (bwa_sai2sam_pe_core) about chipseq_pipeline (closed)

kundajelab commented on July 21, 2024

[fread] Unexpected end of file (bwa_sai2sam_pe_core)

from chipseq_pipeline.

Comments (6)

leepc12 commented on July 21, 2024

Please search for the error on the internet before posting an issue.
https://www.biostars.org/p/186585/

Please run ls -l on the genome data directory, for both the bwa index and the .fa file. Also, please post a full log.
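A minimal sketch of the directory check suggested above, assuming bash and that the index was built with "bwa index" on the FASTA, so that the five index files share the FASTA's path as their prefix. The function name check_bwa_index is hypothetical and not part of chipseq_pipeline. It only confirms that each file exists and is non-empty; it cannot detect an index that was cut off mid-write, so comparing sizes and timestamps in the ls -l output is still worthwhile.

```shell
# Hypothetical helper (not part of chipseq_pipeline): check that every file
# "bwa index" produces for a given prefix exists and is non-empty.
# Usage: check_bwa_index /path/to/bwa_index/male.hg19.fa
check_bwa_index() {
  local prefix="$1" ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -s "${prefix}.${ext}" ]; then
      echo "missing or empty: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It returns 0 when all five files are present, so it can be chained, e.g. check_bwa_index /path/to/bwa_index/male.hg19.fa && echo "index files present" (with the real path substituted).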


sbstatgen commented on July 21, 2024

I am attaching the full nohup output and error files (I was not aware of the attach option; I have just started using GitHub). This time I ran with 1 replicate and 1 control replicate and got the same error.

Regarding the Biostars thread: I had actually gone through it and several other Biostars posts for a few days before posting the issue. Accordingly, I had checked my FASTQ files for read pairing, which is why I mentioned in my original post that
"I checked the FASTQ files (i.e. for both the replicate and the control replicate, I checked the pairing of reads between read1 and read2, and it looks OK)."
To be specific, I checked that all sequences have the same length (100) and that all read1 and read2 IDs are paired correctly. I may still be missing something in that Biostars thread; please point it out (I am fairly new to sequencing analysis and its terminology).
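A sketch of the pairing check described above, assuming bash, awk and gzip; the helper names are hypothetical. It takes the first whitespace-delimited field of each record's header line as the read ID and strips a trailing /1 or /2 mate suffix (older Illumina naming), while newer headers of the form "@ID 1:N:0:..." already share that first field between mates. It also flags a line count that is not a multiple of 4, which is one symptom of a truncated FASTQ.

```shell
# Hypothetical helpers for checking that two mate FASTQ files are in sync.
# Read a file, decompressing it first if the name ends in .gz.
read_maybe_gz() {
  case "$1" in
    *.gz) gzip -cd -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

# Print one read ID per record: first field of lines 1, 5, 9, ...,
# with a trailing /1 or /2 mate suffix removed.
fastq_ids() {
  read_maybe_gz "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Return 0 if R1 and R2 have the same number of complete records and every
# pair of read IDs matches; otherwise report the first problem and return 1.
check_fastq_pairing() {
  local r1="$1" r2="$2" n1 n2 tmp status
  n1=$(read_maybe_gz "$r1" | awk 'END { print NR }')
  n2=$(read_maybe_gz "$r2" | awk 'END { print NR }')
  if [ $((n1 % 4)) -ne 0 ] || [ $((n2 % 4)) -ne 0 ]; then
    echo "line count not a multiple of 4 (truncated file?): $n1, $n2" >&2
    return 1
  fi
  if [ "$n1" -ne "$n2" ]; then
    echo "record counts differ: $((n1 / 4)) vs $((n2 / 4))" >&2
    return 1
  fi
  tmp=$(mktemp -d)
  fastq_ids "$r1" > "$tmp/r1.ids"
  fastq_ids "$r2" > "$tmp/r2.ids"
  # paste puts the two IDs side by side; appending "" forces a string
  # comparison in awk, which stops at the first mismatch
  paste "$tmp/r1.ids" "$tmp/r2.ids" | awk -F '\t' '
    ($1 "") != ($2 "") { print "ID mismatch at record " NR ": " $1 " vs " $2 > "/dev/stderr"; bad = 1; exit }
    END { exit bad }'
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Usage would be check_fastq_pairing rep1_R1.fastq.gz rep1_R2.fastq.gz, where the file names are placeholders for the actual replicate and control files.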

Regarding the genome data folder: I had come across that suggestion somewhere and checked that it has the ".fa" and ".fai" files, and that bwa_index has the amb, ann, bwt, pac and sa files. Here is the "ls -l" output:

sb1@sb-hpz800:bwa_index $ ls -l
total 5290524
lrwxrwxrwx 1 sb1 sb1 15 Apr 10 10:23 male.hg19.fa -> ../male.hg19.fa
-rw-rw-r-- 1 sb1 sb1 6548 Apr 10 11:10 male.hg19.fa.amb
-rw-rw-r-- 1 sb1 sb1 944 Apr 10 11:10 male.hg19.fa.ann
-rw-rw-r-- 1 sb1 sb1 3095694072 Apr 10 11:10 male.hg19.fa.bwt
-rw-rw-r-- 1 sb1 sb1 773923497 Apr 10 11:10 male.hg19.fa.pac
-rw-rw-r-- 1 sb1 sb1 1547847040 Apr 10 11:24 male.hg19.fa.sa

sb1@sb-hpz800:hg19 $ ls -l
total 3997972
drwxr-xr-x 2 sb1 root 4096 Apr 9 17:04 ataqc
drwxrwxr-x 2 sb1 sb1 4096 Apr 10 11:24 bwa_index
-rw-rw-r-- 1 sb1 sb1 376 Apr 10 10:23 hg19.chrom.sizes
-rw-rw-r-- 1 sb1 sb1 3157608038 Apr 10 10:19 male.hg19.fa
-rw-rw-r-- 1 sb1 sb1 788 Apr 10 10:23 male.hg19.fa.fai
-rw-r--r-- 1 sb1 root 936272240 Jan 28 2010 male.hg19.fa.gz
drwxr-xr-x 2 sb1 root 4096 Apr 10 10:23 seq
-rw-r--r-- 1 sb1 root 4731 May 5 2011 wgEncodeDacMapabilityConsensusExcludable.bed.gz

Also, at the end of the run I get a valid T0L1R1.PE2SE.sam.gz file. It looks complete, but no SAM or .sai file appears for the second read (R2). For rep1, I get all files for read 1, including the BAM and tagAlign, but nothing for read 2.
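One way to back up "it looks complete" is to test the integrity of each gzip output, since a gzip file that was cut off mid-write fails "gzip -t". A sketch assuming bash and gzip; check_gz_outputs is a hypothetical name, and its argument is whichever directory holds the pipeline's outputs. This only covers compressed files, not the uncompressed .sai files bwa writes.

```shell
# Hypothetical helper: run "gzip -t" on every .gz file in a directory and
# list each one as ok or CORRUPT. Returns 1 if any file fails the test.
check_gz_outputs() {
  local dir="$1" f bad=0
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue        # the glob matched nothing
    if gzip -t -- "$f" 2>/dev/null; then
      echo "ok       $f"
    else
      echo "CORRUPT  $f"
      bad=1
    fi
  done
  return "$bad"
}
```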

Please suggest how I can go about debugging this. Thanks again for patiently reading all this.

eT4.txt
oT4.txt


leepc12 commented on July 21, 2024

Did you have enough disk space on your temporary directories and working dir?

$ echo $TMP
$ echo $TMPDIR
$ df -h $TMP
$ df -h $TMPDIR
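The same checks can be scripted with a free-space threshold. A sketch assuming bash and a df that supports -P; the helper names are hypothetical. It picks TMPDIR, then TMP, then /tmp, which is a common convention rather than a guarantee of where bwa or the pipeline actually writes temporary files.

```shell
# Hypothetical helpers for the disk-space checks above.
# Directory most tools would use for temporary files: TMPDIR, then TMP,
# then /tmp. A given tool may choose differently.
tmp_dir() {
  echo "${TMPDIR:-${TMP:-/tmp}}"
}

# Free space on the filesystem holding DIR, in whole GiB.
# df -P prints one line per filesystem; -k makes column 4 ("Available")
# a count of 1024-byte blocks.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Return 1 (with a message) if DIR has fewer than NEED GiB free.
require_free_gib() {
  local dir="$1" need="$2" have
  have=$(free_gib "$dir")
  if [ "$have" -lt "$need" ]; then
    echo "$dir: ${have} GiB free, need ${need} GiB" >&2
    return 1
  fi
}
```

For example, require_free_gib "$(tmp_dir)" N && require_free_gib . N checks both the temporary directory and the working directory, with N a threshold of your choosing.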

How much memory does your system have? If it's a cluster and you are submitting jobs to a cluster engine like SGE or SLURM, use a higher memory setting, e.g. -mem_bwa 30G. Let's see if this helps.


sbstatgen commented on July 21, 2024

Both $TMP and $TMPDIR are unset in my bash shell, but since the other replicates ran successfully, I guess that is not a problem. I don't think there is a storage issue:

[sb1@bladeamd-4 ssg-chipseq]$ df -h /data

Filesystem Size Used Avail Use% Mounted on
/dev/data 1.1P 453T 667T 41% /data

[sb1@bladeamd-4 ssg-chipseq]$ df -h /tmp

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos_bladeamd--4-root 100G 6.8G 94G 7% /

I am using a single blade server, and sometimes also a workstation (same error on both). On the server:
MemTotal: 65774440 kB
MemFree: 48467220 kB
MemAvailable: 52434928 kB

There may be one or two other users occasionally running jobs on the server (not under my control). On the workstation, I am the only user:

MemTotal: 24671836 kB
MemFree: 21266696 kB
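For the memory question, a sketch that reads /proc/meminfo (or a saved copy of it) and reports available memory in GiB, assuming bash and awk; the helper names are hypothetical. It uses MemAvailable when present and falls back to MemFree, since the workstation figures above show no MemAvailable line. As a rough check on the posted numbers: the workstation's MemTotal of 24671836 kB is about 23.5 GiB (about 25.3 GB), below a 30G request whichever unit G means, while the server's MemAvailable of 52434928 kB is about 50 GiB.

```shell
# Hypothetical helpers for checking memory against a request such as 30G.
# Print available memory in whole GiB from a meminfo-format file
# (default /proc/meminfo). Uses MemAvailable when present and falls back
# to MemFree on older kernels that do not report MemAvailable.
mem_available_gib() {
  awk '
    $1 == "MemAvailable:" { avail = $2 }
    $1 == "MemFree:"      { free = $2 }
    END {
      kb = (avail != "") ? avail : free
      printf "%d\n", kb / 1048576    # values are in kB; 1 GiB = 1048576 kB
    }' "${1:-/proc/meminfo}"
}

# Succeed if at least NEED GiB is available, e.g. mem_fits 30
mem_fits() {
  local need="$1" have
  have=$(mem_available_gib "$2")
  [ "$have" -ge "$need" ]
}
```

Fed the figures posted above, this reports 50 GiB for the server and, from MemFree, 20 GiB for the workstation.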

As you suggested, I tried mem_bwa 30G on the server and got the same error. What should the number of threads ("nth") be? Here is the lscpu output for the server and the workstation:

Server
CPU(s): 6
Thread(s) per core: 1
Core(s) per socket: 6

Workstation

CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6

Another suspicion I have is that, for this replicate, the wall time is somehow exceeded after the .sai for the first read is created. I am now trying this with nth=8 on the server:

    "wt_bwa" : "71h",
    "mem_bwa" : "30G",

Is there any way to skip the index already done for read1 and proceed to read2?


sbstatgen commented on July 21, 2024

Sorry, it seems there are two unrelated issues here. I discovered that although my other runs completed without error, they only produced files for R1 (read 1). This run probably failed because of a memory, storage, or wall-time limit. I am now running with higher memory etc. as you suggested, and it has gotten closer to completion (not yet finished), so I will close this issue. The primary issue now is why the pipeline processes only read 1 separately, so I will probably post that as a new issue with a different title. Thanks.


leepc12 commented on July 21, 2024

FYI, the read1 (R1) FASTQ is processed separately (as single-ended) for cross-correlation analysis only; it is not used for other downstream analyses such as peak calling and IDR.
