
nchhstp-dtbe-varpipe-wgs's Introduction

Installing and Using the varpipe_wgs pipeline

Overview

This repository contains the Varpipe_wgs pipeline developed by the Division of TB Elimination. The pipeline cleans the data and performs analyses, including typing and variant detection. While originally built to analyze Tuberculosis data, the pipeline accepts other references, allowing it to be used more broadly.

End users can run the pipeline using Docker, Singularity, or their local machine.

Prepare the Data

First, copy the gzipped fastq files you wish to analyze into the data/ directory of this repository. Fastq files should be named using the standard Illumina naming format.
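For example, paired-end files for a hypothetical sample named SampleA following the Illumina convention would look like:

SampleA_S1_L001_R1_001.fastq.gz
SampleA_S1_L001_R2_001.fastq.gz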

Use Docker

Start the container

To use Docker to run the pipeline, first choose whether you will use the container image with or without references included.

To use the image with references, run the following commands:

docker pull ghcr.io/cdcgov/varpipe_wgs_with_refs:latest
docker run -it -v <path to data>:/varpipe_wgs/data ghcr.io/cdcgov/varpipe_wgs_with_refs:latest
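For example, if your gzipped fastq files are in /home/user/fastq (a hypothetical path), the second command would be:

docker run -it -v /home/user/fastq:/varpipe_wgs/data ghcr.io/cdcgov/varpipe_wgs_with_refs:latest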

To use the image without references, you will need to change the container name in the command to varpipe_wgs_without_refs:latest, and you will also need to specify a folder with the references to be used by clockwork. This folder must be mounted into the container as /varpipe_wgs/tools/clockwork-0.11.3/OUT.

docker pull ghcr.io/cdcgov/varpipe_wgs_without_refs:latest
docker run -it -v <path to data>:/varpipe_wgs/data -v <path to references>:/varpipe_wgs/tools/clockwork-0.11.3/OUT ghcr.io/cdcgov/varpipe_wgs_without_refs:latest
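Judging from the clockwork remove_contam commands shown in the troubleshooting logs further down this page, the mounted references folder is expected to contain at least a reference fasta and a contamination-metadata TSV (names as they appear in those logs; the exact requirements may vary):

<path to references>/ref.fa
<path to references>/remove_contam_metadata.tsv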

Run the pipeline

Those commands will download the most recent version of the pipeline, start the container, and connect to it.

When connected to the container you will be in the directory /varpipe_wgs/data. From there, simply start the pipeline with the following command, where <threads> is the number of threads you would like to use (default: 4):

cd /varpipe_wgs/data
./runVarpipeline.sh <threads>

That will identify all gzipped fastq files in the directory and run the pipeline on them, creating a results folder named "Output_<MM>_<DD>_<YYYY>" with subfolders containing the results for each sample.
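For example, a run started on January 30, 2023 for a hypothetical sample named SampleA would produce a layout similar to this (subfolder names taken from the troubleshooting logs further down this page; the exact set may vary):

Output_01_30_2023/SampleA/clockwork/
Output_01_30_2023/SampleA/trimmomatic/
Output_01_30_2023/SampleA/tmp/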

When the pipeline is finished you can disconnect from and close the container by pressing CTRL+D. You can then review the results.

Use Singularity

Obtain the Singularity image

The Singularity images are too large to include in this repository, but the version with references is available on the Singularity Container Library. To access it, run the command:

singularity pull library://reagank/varpipe_wgs/pipeline_with_refs

The Singularity images can also be built locally using the provided build_singularity.sh script.

./build_singularity.sh

The script builds the image without references by default; to build the image including references, provide the argument "with_references":

./build_singularity.sh with_references

Start the Singularity image

Once you have downloaded or built the .sif file containing the Singularity image, the command to start and connect to the container that includes references is:

singularity shell --bind <path to data>:/varpipe_wgs/data pipeline_with_refs.sif

As with Docker, to run the pipeline without references you will need to supply clockwork-compatible references and bind them into the image as the /varpipe_wgs/tools/clockwork-0.11.3/OUT directory:

singularity shell --bind ./data:/varpipe_wgs/data --bind <path-to-references>:/varpipe_wgs/tools/clockwork-0.11.3/OUT pipeline_without_refs.sif

Please note, if you are running this pipeline using CDC SciComp resources, security settings will require specifying the SINGULARITY_TMPDIR environment variable, like this:

SINGULARITY_TMPDIR=~/.tmp singularity shell --bind ./data:/varpipe_wgs/data --bind <path-to-references>:/varpipe_wgs/tools/clockwork-0.11.3/OUT pipeline_without_refs.sif

Run the pipeline

This command starts the container and connects to it.

When connected to the container you will be in your home directory. You must cd to the directory /varpipe_wgs/data. From there, start the pipeline with the command:

cd /varpipe_wgs/data
./runVarpipeline.sh <threads>

That will identify all gzipped fastq files in the directory and run the pipeline on them, creating a results folder named "Output_<MM>_<DD>_<YYYY>" with subfolders containing the results for each sample.

When the pipeline is finished you can disconnect from and close the container by pressing CTRL+D. You can then review the results.

Use Local

Prerequisites

To run the pipeline locally, you will need to have the following programs installed:

  • Python 2.7
  • Python 3
  • Java 1.8
  • Singularity >=3.5

The remaining programs used by the pipeline are included in this repository in the tools/ directory.
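A quick way to confirm the prerequisites are available on your PATH:

python2 --version
python3 --version
java -version
singularity --version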

Install the Pipeline

First, clone this repository with the command:

git clone https://github.com/CDCGov/NCHHSTP-DTBE-Varpipe-WGS.git

Then, simply run setup.sh to finish the installation (see the example after this list). This script runs several steps:

  • Downloads the clockwork singularity image
  • Downloads GATK
  • Builds a reference fasta and creates BWA indexes
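For example, from the freshly cloned repository:

cd NCHHSTP-DTBE-Varpipe-WGS
./setup.sh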

Lastly, update the script tools/clockwork-0.11.3/clockwork to point correctly to the clockwork 0.11.3 image.
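The exact contents of that wrapper are not shown here, but assuming it invokes the containerized clockwork CLI via Singularity, the updated script might look something like this hypothetical sketch (adjust the image path to wherever setup.sh placed it):

#!/bin/bash
# Hypothetical wrapper: forward all arguments to clockwork inside the image.
# The image path below is an example and must match your local installation.
singularity exec /full/path/to/clockwork_v0.11.3.img clockwork "$@"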

Run the Pipeline

After the data has been added to the data/ directory, cd into data/ and run the pipeline with the following command, where <threads> is the number of threads you would like to use (default: 4):

cd data/
./runVarpipeline.sh <threads>

That will identify all gzipped fastq files in the directory and run the pipeline on them, creating a results folder named "Output_<MM>_<DD>_<YYYY>" with subfolders containing the results for each sample.

Troubleshooting

This pipeline requires a large amount of memory; a minimum of 32GB of RAM is necessary for the clockwork decontamination step. If the log includes a line with the word "Killed" in the clockwork output and trimmomatic shows an error that the fastq.gz files are not found, the most likely cause is insufficient RAM.
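Before launching a run, you can check how much memory the host has available with a standard tool such as:

free -h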

If you have difficulty downloading this image (specifically, a message beginning with "failed to register layer: ApplyLayer exit status 1" and ending with "no space left on device"), you may need to increase the base device size to something larger than 18GB. This problem should only occur when using the devicemapper storage driver with an older version of the Docker engine.
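If you do hit this with the devicemapper driver, one common remedy (a general Docker example, not specific to this pipeline) is to raise the base device size via the dm.basesize storage option in /etc/docker/daemon.json, then restart the Docker daemon; note that devicemapper may require removing existing local images before the new size takes effect:

{
  "storage-driver": "devicemapper",
  "storage-opts": ["dm.basesize=25G"]
}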

Public Domain Standard Notice

This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.

License Standard Notice

The repository utilizes code licensed under the terms of the Apache Software License and therefore is licensed under ASL v2 or later.

The source code in this repository is free: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.

The source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.

You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html

Source code forked from other open source projects inherits the license of the original project.

Privacy Standard Notice

This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/other/privacy.html.

Contributing Standard Notice

Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.

All comments, messages, pull requests, and other submissions received through CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.

Records Management Standard Notice

This repository is not a source of government records, but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.

Additional Standard Notices

Please refer to CDC's Template Repository for more information about contributing to this repository, public domain notices and disclaimers, and code of conduct.

Disclaimer

The Laboratory Branch (LB) of the Division of Tuberculosis Elimination developed this bioinformatic pipeline for analyzing whole genome sequencing data generated on Illumina platforms. This is not a controlled document. The performance characteristics as generated at Centers for Disease Control and Prevention (CDC) are specific to the version as written. These documents are provided by LB solely as an example for how this test performed within LB. The recipient testing laboratory is responsible for generating validation or verification data as applicable to establish performance characteristics as required by the testing laboratory’s policies, applicable regulations, and quality system standards. These data are only for the sample and specimen types and conditions described in this procedure. Tests or protocols may include hazardous reagents or biological agents. No indemnification for any loss, claim, damage, or liability is provided for the party receiving an assay or protocol. Use of trade names and commercial sources are for identification only and do not constitute endorsement by the Public Health Service, the United States Department of Health and Human Services, or the Centers for Disease Control and Prevention.

nchhstp-dtbe-varpipe-wgs's People

Contributors

hzp9-cdc, krt7-cdc, reagank


nchhstp-dtbe-varpipe-wgs's Issues

python2 or python 3?

The introduction says Python 2 is required. However, the clockwork script at tools/clockwork-0.11.3/python/scripts/ appears to use Python 3:

#!/usr/bin/env python3

import argparse
import logging

import clockwork
.....

Docker image is not compatible with gitpod

Describe the bug
There's a chown error in the Docker image that probably only impacts those using gitpod.

Impact
It makes me sad.

To Reproduce
Steps to reproduce the behavior:

  1. Go to gitpod and set up a working environment
  2. Run docker pull ghcr.io/cdcgov/varpipe_wgs_with_refs:latest
  3. See error

Expected behavior
I was hoping to pull and run the container.

Screenshots
Logs

$ docker pull ghcr.io/cdcgov/varpipe_wgs_with_refs:latest
latest: Pulling from cdcgov/varpipe_wgs_with_refs
675920708c8b: Pull complete 
369f07e1bfcc: Pull complete 
396ca6e8bff5: Extracting [==================================================>]  1.493GB/1.493GB
fdbf54dfdd92: Download complete 
d33475f0e2a1: Download complete 
1cb02021a3bb: Download complete 
a7949e93dce0: Download complete 
71e0629fd528: Download complete 
a8ab3ebdb29c: Download complete 
239eb7efc411: Download complete 
25d56755ff10: Download complete 
d1135ae50cb4: Download complete 
failed to register layer: ApplyLayer exit status 1 stdout:  stderr: failed to Lchown "/bioinf-tools/minimap2-2.24" for UID 54818, GID 1048853 (try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): lchown /bioinf-tools/minimap2-2.24: invalid argument

Desktop (please complete the following information):

  • gitpod

Additional context
Add any other context about the problem here.

runVarpipeline.sh isn't located in /varpipe_wgs/data in ghcr.io/cdcgov/varpipe_wgs_with_refs:latest

Describe the bug
runVarpipeline.sh doesn't exist in the Docker image.

Impact
The README steps cannot be followed. A workaround is to download runVarpipeline.sh from GitHub and add it at runtime.

To Reproduce
Steps to reproduce the behavior:

docker pull ghcr.io/cdcgov/varpipe_wgs_with_refs:latest
docker run -it -v $(pwd):/varpipe_wgs/data ghcr.io/cdcgov/varpipe_wgs_with_refs:latest
cd /varpipe_wgs/data
./runVarpipeline.sh

Expected behavior
I expected runVarpipeline.sh to be located at /varpipe_wgs/data in the docker image

In https://github.com/CDCgov/NCHHSTP-DTBE-Varpipe-WGS/blob/master/Dockerfile.with_refs, it looks like there needs to be another copy command.

How to run the singularity image of the pipeline in a script?

I want to call the singularity image of the pipeline from my own script. Could you tell me how to do that?
The introduction only shows how to enter a singularity shell and then run the commands interactively. How should I run the singularity image from a script?

Many thanks
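A minimal sketch of a non-interactive invocation, assuming the with-references image: replace singularity shell with singularity exec and run the pipeline command directly:

singularity exec --bind <path to data>:/varpipe_wgs/data pipeline_with_refs.sif bash -c "cd /varpipe_wgs/data && ./runVarpipeline.sh 4"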

QC log : failed QC checks

When I ran the pipeline, the QC log showed "failed QC checks". What are the detailed QC check criteria, and why did it fail?

AttributeError: 'module' object has no attribute 'FullLoader'

What type of help do you need?

  • Question

Please describe how you'd like us to help.
$ ./runVarpipeline.sh
ls: cannot access _L001_R2_001: No such file or directory
ls: cannot access _L001_R1_001: No such file or directory
rm: cannot remove ‘_L00’: No such file or directory

../tools/Varpipeline -q A01_R1_001.fastq.gz -r ../tools/ref2.fa -n A01 -q2 A01_R2_001.fastq.gz -a -v

Traceback (most recent call last):
File "../tools/Varpipeline", line 97, in <module>
args.verbose, ' '.join(sys.argv))
File "/blue/bphl-florida/dongyibo/CDC_TB_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/Varpipeline.py", line 76, in __init__
cfg = yaml.load(ymlfile, Loader=yaml.FullLoader)
AttributeError: 'module' object has no attribute 'FullLoader'
../tools/Varpipeline -q A04_R1_001.fastq.gz -r ../tools/ref2.fa -n A04 -q2 A04_R2_001.fastq.gz -a -v

Traceback (most recent call last):
File "../tools/Varpipeline", line 97, in <module>
args.verbose, ' '.join(sys.argv))
File "/blue/bphl-florida/dongyibo/CDC_TB_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/Varpipeline.py", line 76, in __init__
cfg = yaml.load(ymlfile, Loader=yaml.FullLoader)
AttributeError: 'module' object has no attribute 'FullLoader'
../tools/Varpipeline -q A06_R1_001.fastq.gz -r ../tools/ref2.fa -n A06 -q2 A06_R2_001.fastq.gz -a -v

Traceback (most recent call last):
File "../tools/Varpipeline", line 97, in <module>
args.verbose, ' '.join(sys.argv))
File "/blue/bphl-florida/dongyibo/CDC_TB_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/Varpipeline.py", line 76, in __init__
cfg = yaml.load(ymlfile, Loader=yaml.FullLoader)
AttributeError: 'module' object has no attribute 'FullLoader'
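For context: yaml.FullLoader was introduced in PyYAML 5.1, so this error usually means the Python environment running Varpipeline has an older PyYAML. One possible fix, assuming pip manages that environment, is to upgrade it:

python -m pip install --upgrade "pyyaml>=5.1"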

Running Pipeline after Local install

Hello,

Besides singularity, I also tried the local installation option.

The download completed and I was able to run runVarpipeline.sh after copying my paired gzipped fastq files into the data/ directory.
The pipeline ran to completion without printing any error messages to the terminal, but looking into the output files I noticed that most of the created files were empty, although the Lineage and interpretation file seemed to be generated correctly.
Looking at the Nextflow log I see

Command error:
  .command.sh: line 2: clockwork: command not found

I still see this error after updating the tools/clockwork-0.11.3/clockwork script to point at the full path to the clockwork image.

-Wes
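A quick sanity check, assuming the wrapper simply forwards its arguments to the containerized, argparse-based clockwork CLI, is to run it directly from the repository root and confirm it prints usage information rather than "command not found":

./tools/clockwork-0.11.3/clockwork --help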

Retain Input Fastqs

Hello,

Is it necessary to delete the input fastqs from the data folder after the script finishes running?
This seems to occur whether or not the script actually ran correctly, making it time-consuming to re-copy the desired input files.
Could the input fastqs be retained in data/ or moved to another folder under data/, something like "fastq"?
If deletion of these files is deemed more appropriate, could the readme be clearer that this will occur when running runVarpipeline.sh?

Thanks,
Wes

Process `map_reads (1)` terminated with an error exit status (1)

../tools/Varpipeline -q JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz -r ../tools/ref2.fa -n JTT20001087-FL-M04613-200220_S40 -q2 JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz -a -v

---[ nextflow remove contamination ]---
Command:
/varpipe_wgs/tools/clockwork-0.11.3/results/nextflow run /varpipe_wgs/tools/clockwork-0.11.3/nextflow/remove_contam.nf --ref_fasta /varpipe_wgs/tools/clockwork-0.11.3/OUT/ref.fa --ref_metadata_tsv /varpipe_wgs/tools/clockwork-0.11.3/OUT/remove_contam_metadata.tsv --reads_in1 JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz --reads_in2 JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz --outprefix Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/clockwork/JTT20001087-FL-M04613-200220_S40 --mapping_threads 12

Standard Output:
N E X T F L O W ~ version 20.07.1
Launching /varpipe_wgs/tools/clockwork-0.11.3/nextflow/remove_contam.nf [berserk_mahavira] - revision: b5b8e4e89e
executor > local (2)
[ea/bd3185] process > make_jobs_tsv [100%] 1 of 1 ✔
[19/887305] process > map_reads (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > sam_to_fastq_files -
Error executing process > 'map_reads (1)'

Caused by:
Process map_reads (1) terminated with an error exit status (1)

Command executed:

clockwork map_reads --threads 12 --unsorted_sam sample_name /varpipe_wgs/tools/clockwork-0.11.3/OUT/ref.fa contam_sam /varpipe_wgs/data/JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz /varpipe_wgs/data/JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz

Command exit status:
1

Command output:
(empty)

Command error:
[2023-01-30T10:15:03 - clockwork map_reads - INFO] Run command: fqtools count /varpipe_wgs/data/JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz /varpipe_wgs/data/JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz
[2023-01-30T10:15:08 - clockwork map_reads - INFO] Return code: 0
[2023-01-30T10:15:08 - clockwork map_reads - INFO] stdout:
493552
[2023-01-30T10:15:08 - clockwork map_reads - INFO] stderr:

[2023-01-30T10:15:08 - clockwork map_reads - INFO] Run command: minimap2 --split-prefix contam_sam.tmp.map_reads_set.i4vo5o1q/map.0.tmp -a -t 12 -x sr -R '@rg\tLB:LIB\tID:1\tSM:sample_name' /varpipe_wgs/tools/clockwork-0.11.3/OUT/ref.fa /varpipe_wgs/data/JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz /varpipe_wgs/data/JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz | awk '/^@/ || !(and($2,256) || and($2,2048))' > contam_sam.tmp.map_reads_set.i4vo5o1q/map.0
[2023-01-30T10:17:30 - clockwork map_reads - INFO] Return code: 0
[2023-01-30T10:17:31 - clockwork map_reads - INFO] stdout:

[2023-01-30T10:17:31 - clockwork map_reads - INFO] stderr:
[M::mm_idx_gen::93.012*1.85] collected minimizers
Killed
[2023-01-30T10:17:31 - clockwork map_reads - INFO] Run command: grep -c -v '^@' contam_sam.tmp.map_reads_set.i4vo5o1q/map.0
[2023-01-30T10:17:32 - clockwork map_reads - INFO] Return code: 1
Error running this command: grep -c -v '^@' contam_sam.tmp.map_reads_set.i4vo5o1q/map.0
Return code: 1
Output from stdout:
0

Output from stderr:

Traceback (most recent call last):
File "/usr/local/bin/clockwork", line 4, in <module>
__import__('pkg_resources').run_script('clockwork==0.11.3', 'clockwork')
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 667, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1470, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.3-py3.8.egg/EGG-INFO/scripts/clockwork", line 1019, in <module>
File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.3-py3.8.egg/clockwork/tasks/map_reads.py", line 13, in run
File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.3-py3.8.egg/clockwork/read_map.py", line 133, in map_reads_set
File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.3-py3.8.egg/clockwork/read_map.py", line 74, in map_reads
File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.3-py3.8.egg/clockwork/utils.py", line 124, in sam_record_count
File "/usr/local/lib/python3.8/dist-packages/clockwork-0.11.3-py3.8.egg/clockwork/utils.py", line 36, in syscall
Exception: Error in system call. Cannot continue

Work dir:
/varpipe_wgs/data/work/19/887305a2a7107db282285662fdf09d

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run


---[ trimmomatic ]---
Command:
java -jar /varpipe_wgs/tools/trimmomatic-0.39.jar PE -threads 12 -trimlog Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/trimLog.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/clockwork/JTT20001087-FL-M04613-200220_S40.remove_contam.1.fq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/clockwork/JTT20001087-FL-M04613-200220_S40.remove_contam.2.fq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_paired_1.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_1.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_paired_2.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_2.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:40

Standard Error:
TrimmomaticPE: Started with arguments:
-threads 12 -trimlog Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/trimLog.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/clockwork/JTT20001087-FL-M04613-200220_S40.remove_contam.1.fq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/clockwork/JTT20001087-FL-M04613-200220_S40.remove_contam.2.fq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_paired_1.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_1.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_paired_2.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_2.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:40
Exception in thread "main" java.io.FileNotFoundException: Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/clockwork/JTT20001087-FL-M04613-200220_S40.remove_contam.1.fq.gz (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:265)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)

---[ rm ]---
Command:
rm Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_1.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_2.fastq.gz

Standard Error:
rm: cannot remove 'Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_1.fastq.gz': No such file or directory
rm: cannot remove 'Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_unpaired_2.fastq.gz': No such file or directory

########## Running BWA. ##########
---[ mkdir ]---
Command:
mkdir -p Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa

---[ mkdir ]---
Command:
mkdir -p Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index

---[ cp ]---
Command:
cp ../tools/ref2.fa Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa

---[ bwa index ]---
Command:
/varpipe_wgs/tools/bwa index Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa

Standard Error:
[bwa_index] Pack FASTA... 0.05 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1.31 seconds elapse.
[bwa_index] Update BWT... 0.03 sec
[bwa_index] Pack forward-only FASTA... 0.03 sec
[bwa_index] Construct SA from BWT and Occ... 0.60 sec
[main] Version: 0.7.17-r1188
[main] CMD: /varpipe_wgs/tools/bwa index Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa
[main] Real time: 2.102 sec; CPU: 2.020 sec

---[ CreateSequenceDictionary ]---
Command:
java -jar /varpipe_wgs/tools/picard.jar CreateSequenceDictionary R=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa O=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.dict

Standard Error:
INFO 2023-01-30 15:18:19 CreateSequenceDictionary

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** CreateSequenceDictionary -R Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa -O Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.dict


15:18:21.156 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/varpipe_wgs/tools/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 30 15:18:21 UTC 2023] CreateSequenceDictionary OUTPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.dict REFERENCE=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jan 30 15:18:21 UTC 2023] Executing as [email protected] on Linux 3.10.0-1160.80.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.4+8-Ubuntu-120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
[Mon Jan 30 15:18:21 UTC 2023] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=432013312

---[ samtools faidx ]---
Command:
samtools faidx Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa

---[ bwa mem ]---
Command:
/varpipe_wgs/tools/bwa mem -t 12 -R @rg\tID:JTT20001087-FL-M04613-200220_S40\tSM:JTT20001087-FL-M04613-200220_S40\tPL:ILLUMINA Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_paired_1.fastq.gz Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_paired_2.fastq.gz

Standard Error:
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file `Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic/JTT20001087-FL-M04613-200220_S40_paired_1.fastq.gz'.

########## Filtering alignment with GATK and Picard-Tools. ##########
---[ mkdir ]---
Command:
mkdir -p Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK

---[ mkdir ]---
Command:
mkdir -p Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools

---[ SamFormatConverter ]---
Command:
java -Xmx4g -jar /varpipe_wgs/tools/picard.jar SamFormatConverter INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/bwa.sam VALIDATION_STRINGENCY=LENIENT OUTPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK.bam

Standard Error:
INFO 2023-01-30 15:18:22 SamFormatConverter

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** SamFormatConverter -INPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/bwa.sam -VALIDATION_STRINGENCY LENIENT -OUTPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK.bam


15:18:22.658 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/varpipe_wgs/tools/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 30 15:18:22 UTC 2023] SamFormatConverter INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/bwa.sam OUTPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK.bam VALIDATION_STRINGENCY=LENIENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jan 30 15:18:22 UTC 2023] Executing as [email protected] on Linux 3.10.0-1160.80.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.4+8-Ubuntu-120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
[Mon Jan 30 15:18:22 UTC 2023] picard.sam.SamFormatConverter done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=425721856

---[ SortSam ]---
Command:
java -Xmx8g -Djava.io.tmpdir=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/tmp -jar /varpipe_wgs/tools/picard.jar SortSam INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK.bam SORT_ORDER=coordinate OUTPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_s.bam VALIDATION_STRINGENCY=LENIENT TMP_DIR=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/tmp

Standard Error:
INFO 2023-01-30 15:18:23 SortSam

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** SortSam -INPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK.bam -SORT_ORDER coordinate -OUTPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_s.bam -VALIDATION_STRINGENCY LENIENT -TMP_DIR Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/tmp


15:18:23.545 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/varpipe_wgs/tools/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 30 15:18:23 UTC 2023] SortSam INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK.bam OUTPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_s.bam SORT_ORDER=coordinate TMP_DIR=[Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/tmp] VALIDATION_STRINGENCY=LENIENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jan 30 15:18:23 UTC 2023] Executing as [email protected] on Linux 3.10.0-1160.80.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.4+8-Ubuntu-120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
INFO 2023-01-30 15:18:23 SortSam Finished reading inputs, merging and writing to output now.
[Mon Jan 30 15:18:23 UTC 2023] picard.sam.SortSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=432013312

---[ MarkDuplicates ]---
Command:
java -Xmx8g -jar /varpipe_wgs/tools/picard.jar MarkDuplicates INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_s.bam OUTPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam METRICS_FILE=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/MarkDupes.metrics ASSUME_SORTED=true REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT

Standard Error:
INFO 2023-01-30 15:18:24 MarkDuplicates

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** MarkDuplicates -INPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_s.bam -OUTPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam -METRICS_FILE Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/MarkDupes.metrics -ASSUME_SORTED true -REMOVE_DUPLICATES false -VALIDATION_STRINGENCY LENIENT


15:18:24.487 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/varpipe_wgs/tools/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 30 15:18:24 UTC 2023] MarkDuplicates INPUT=[Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_s.bam] OUTPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam METRICS_FILE=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/MarkDupes.metrics REMOVE_DUPLICATES=false ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jan 30 15:18:24 UTC 2023] Executing as [email protected] on Linux 3.10.0-1160.80.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.4+8-Ubuntu-120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
INFO 2023-01-30 15:18:24 MarkDuplicates Start of doWork freeMemory: 115732352; totalMemory: 125829120; maxMemory: 8589934592
INFO 2023-01-30 15:18:24 MarkDuplicates Reading input file and constructing read end information.
INFO 2023-01-30 15:18:24 MarkDuplicates Will retain up to 31122951 data points before spilling to disk.
INFO 2023-01-30 15:18:24 MarkDuplicates Read 0 records. 0 pairs never matched.
INFO 2023-01-30 15:18:24 MarkDuplicates After buildSortedReadEndLists freeMemory: 227800464; totalMemory: 490733568; maxMemory: 8589934592
INFO 2023-01-30 15:18:24 MarkDuplicates Will retain up to 268435456 duplicate indices before spilling to disk.
INFO 2023-01-30 15:18:26 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2023-01-30 15:18:26 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2023-01-30 15:18:26 MarkDuplicates Sorting list of duplicate records.
INFO 2023-01-30 15:18:26 MarkDuplicates After generateDuplicateIndexes freeMemory: 1921945376; totalMemory: 4085252096; maxMemory: 8589934592
INFO 2023-01-30 15:18:26 MarkDuplicates Marking 0 records as duplicates.
INFO 2023-01-30 15:18:26 MarkDuplicates Found 0 optical duplicate clusters.
INFO 2023-01-30 15:18:26 MarkDuplicates Reads are assumed to be ordered by: coordinate
INFO 2023-01-30 15:18:26 MarkDuplicates Writing complete. Closing input iterator.
INFO 2023-01-30 15:18:26 MarkDuplicates Duplicate Index cleanup.
INFO 2023-01-30 15:18:26 MarkDuplicates Getting Memory Stats.
INFO 2023-01-30 15:18:26 MarkDuplicates Before output close freeMemory: 173314320; totalMemory: 184549376; maxMemory: 8589934592
INFO 2023-01-30 15:18:26 MarkDuplicates Closed outputs. Getting more Memory Stats.
INFO 2023-01-30 15:18:26 MarkDuplicates After output close freeMemory: 73334016; totalMemory: 83886080; maxMemory: 8589934592
[Mon Jan 30 15:18:26 UTC 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=83886080

---[ BuildBamIndex ]---
Command:
java -Xmx8g -jar /varpipe_wgs/tools/picard.jar BuildBamIndex INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam VALIDATION_STRINGENCY=LENIENT

Standard Error:
INFO 2023-01-30 15:18:27 BuildBamIndex

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** BuildBamIndex -INPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam -VALIDATION_STRINGENCY LENIENT


15:18:27.586 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/varpipe_wgs/tools/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 30 15:18:27 UTC 2023] BuildBamIndex INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam VALIDATION_STRINGENCY=LENIENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jan 30 15:18:27 UTC 2023] Executing as [email protected] on Linux 3.10.0-1160.80.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.4+8-Ubuntu-120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
INFO 2023-01-30 15:18:27 BuildBamIndex Successfully wrote bam index file /varpipe_wgs/data/Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bai
[Mon Jan 30 15:18:27 UTC 2023] picard.sam.BuildBamIndex done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=432013312

---[ samtools view ]---
Command:
samtools view -c Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam

---[ samtools view ]---
Command:
samtools view -bhF 4 -o Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/GATK_sdr.bam

---[ BuildBamIndex ]---
Command:
java -Xmx8g -jar /varpipe_wgs/tools/picard.jar BuildBamIndex INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam VALIDATION_STRINGENCY=LENIENT

Standard Error:
INFO 2023-01-30 15:18:28 BuildBamIndex

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** BuildBamIndex -INPUT Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam -VALIDATION_STRINGENCY LENIENT


15:18:28.576 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/varpipe_wgs/tools/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 30 15:18:28 UTC 2023] BuildBamIndex INPUT=Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam VALIDATION_STRINGENCY=LENIENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Jan 30 15:18:28 UTC 2023] Executing as [email protected] on Linux 3.10.0-1160.80.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.4+8-Ubuntu-120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.10
INFO 2023-01-30 15:18:28 BuildBamIndex Successfully wrote bam index file /varpipe_wgs/data/Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bai
[Mon Jan 30 15:18:28 UTC 2023] picard.sam.BuildBamIndex done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=432013312

---[ rm ]---
Command:
rm -r Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/tmp

---[ samtools view ]---
Command:
samtools view -c Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam

---[ samtools depth ]---
Command:
samtools depth -a Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam

---[ bedtools coverage ]---
Command:
/varpipe_wgs/tools/bedtools coverage -abam Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam -b /varpipe_wgs/tools/bed_1.txt

---[ bedtools coverage ]---
Command:
/varpipe_wgs/tools/bedtools coverage -abam Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam -b /varpipe_wgs/tools/bed_2.txt

---[ sort ]---
Command:
sort -nk 6 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/bed_1_coverage.txt

---[ sort ]---
Command:
sort -nk 6 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/bed_2_coverage.txt

---[ target region coverage estimator ]---
Command:
python /varpipe_wgs/tools/target_coverage_estimator.py /varpipe_wgs/tools/amp_bed.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/coverage.txt JTT20001087-FL-M04613-200220_S40

---[ sort ]---
Command:
sort -nk 3 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/target_region_coverage_amp.txt

---[ genome stats estimator ]---
Command:
python /varpipe_wgs/tools/genome_stats_estimator.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/coverage.txt JTT20001087-FL-M04613-200220_S40

Standard Error:
Traceback (most recent call last):
File "/varpipe_wgs/tools/genome_stats_estimator.py", line 19, in
av_depth = depth/count
ZeroDivisionError: integer division or modulo by zero

---[ genome region coverage estimator ]---
Command:
python /varpipe_wgs/tools/genome_coverage_estimator.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/bed_1_sorted_coverage.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/coverage.txt JTT20001087-FL-M04613-200220_S40

---[ genome region coverage estimator ]---
Command:
python /varpipe_wgs/tools/genome_coverage_estimator.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/bed_2_sorted_coverage.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/coverage.txt JTT20001087-FL-M04613-200220_S40

---[ cat ]---
Command:
cat Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/genome_region_coverage_1.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/genome_region_coverage_2.txt

---[ sort ]---
Command:
sort -nk 3 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/genome_region_coverage.txt

---[ sed ]---
Command:
sed -i 1d Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_genome_region_coverage.txt

---[ structural variant detector ]---
Command:
python /varpipe_wgs/tools/structvar_parser.py /varpipe_wgs/tools/BED.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_genome_region_coverage.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/coverage.txt JTT20001087-FL-M04613-200220_S40

---[ stats estimator ]---
Command:
python /varpipe_wgs/tools/stats_estimator.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/unmapped.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/mapped.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_target_region_coverage.txt JTT20001087-FL-M04613-200220_S40 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/SamTools/JTT20001087-FL-M04613-200220_S40_genome_stats.txt

Standard Error:
Traceback (most recent call last):
  File "/varpipe_wgs/tools/stats_estimator.py", line 47, in <module>
    percent_mapped = (float(mapped)/float(unmapped))*100.00
ZeroDivisionError: float division by zero
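
(This and the genome_stats_estimator failure above are the same symptom: no reads made it into the final BAM, so both scripts end up dividing by a zero count. A minimal zero-safe sketch of the calculation follows; the function name and mapped/unmapped inputs mirror the command line above, but the real stats_estimator.py may differ, and note that its line 47 divides by the unmapped count rather than by the total.)

    # Hypothetical zero-safe percent-mapped computation; mapped/unmapped are
    # read counts such as those parsed from the samtools output files above.
    def percent_mapped(mapped, unmapped):
        total = float(mapped) + float(unmapped)
        if total == 0.0:
            return 0.0  # empty BAM: report 0% instead of raising ZeroDivisionError
        return 100.0 * float(mapped) / total

    print(percent_mapped(0, 0))  # -> 0.0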

########## Calling SNPs/InDels with Mutect2. ##########
---[ Mutect2 ]---
Command:
/varpipe_wgs/tools/gatk-4.2.4.0/gatk Mutect2 -R Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa -I Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam -O Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/mutect.vcf --max-mnp-distance 2 -L /varpipe_wgs/tools/intervals.bed

Standard Error:
No local jar was found, please build one by running

/varpipe_wgs/tools/gatk-4.2.4.0/gradlew localJar

or
export GATK_LOCAL_JAR=<path_to_local_jar>
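
(Every GATK call below fails the same way: the gatk wrapper cannot find its companion jar, gatk-package-4.2.4.0-local.jar, which is apparently missing from the image. The "bug in the script get_gatk.sh" issue further down describes fetching that jar into tools/gatk-4.2.4.0/; alternatively, export GATK_LOCAL_JAR=<path_to_local_jar> as the message itself suggests.)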

---[ Mutect2 ]---
Command:
/varpipe_wgs/tools/gatk-4.2.4.0/gatk Mutect2 -R Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa -I Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam -O Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/full_mutect.vcf --max-mnp-distance 2

Standard Error:
(same "No local jar was found" message as above)

---[ LeftAlignAndTrimVariants ]---
Command:
/varpipe_wgs/tools/gatk-4.2.4.0/gatk LeftAlignAndTrimVariants -R Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa -V Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/mutect.vcf -O Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/gatk_mutect.vcf --split-multi-allelics

Standard Error:
(same "No local jar was found" message as above)

---[ mv ]---
Command:
mv Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/gatk_mutect.vcf Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/mutect.vcf

Standard Error:
mv: cannot stat 'Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/gatk_mutect.vcf': No such file or directory

---[ LeftAlignAndTrimVariants ]---
Command:
/varpipe_wgs/tools/gatk-4.2.4.0/gatk LeftAlignAndTrimVariants -R Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa -V Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/full_mutect.vcf -O Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/full_gatk_mutect.vcf --split-multi-allelics

Standard Error:
(same "No local jar was found" message as above)

---[ mv ]---
Command:
mv Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/full_gatk_mutect.vcf Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/full_mutect.vcf

Standard Error:
mv: cannot stat 'Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/full_gatk_mutect.vcf': No such file or directory

---[ FilterMutectCalls ]---
Command:
/varpipe_wgs/tools/gatk-4.2.4.0/gatk FilterMutectCalls -R Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa -V Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/mutect.vcf --min-reads-per-strand 1 --min-median-read-position 10 --min-allele-fraction 0.01 --microbial-mode true -O Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_filter.vcf

Standard Error:
(same "No local jar was found" message as above)

---[ FilterMutectCalls ]---
Command:
/varpipe_wgs/tools/gatk-4.2.4.0/gatk FilterMutectCalls -R Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/bwa/index/ref.fa -V Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/full_mutect.vcf --min-reads-per-strand 1 --min-median-read-position 10 --min-allele-fraction 0.01 --microbial-mode true -O Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_full_filter.vcf

Standard Error:
(same "No local jar was found" message as above)

---[ SnpEff ]---
Command:
java -Xmx4g -jar /varpipe_wgs/tools/snpEff/snpEff.jar NC_000962 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_filter.vcf

Standard Error:
Error : Cannot read input file 'Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_filter.vcf'
Command line : SnpEff NC_000962 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_filter.vcf

snpEff version SnpEff 4.3r (build 2017-09-06 16:41), by Pablo Cingolani
Usage: snpEff [eff] [options] genome_version [input_file]
(full snpEff option listing omitted)

---[ SnpEff ]---
Command:
java -Xmx4g -jar /varpipe_wgs/tools/snpEff/snpEff.jar NC_000962 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_full_filter.vcf

Standard Error:
Error : Cannot read input file 'Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_full_filter.vcf'
Command line : SnpEff NC_000962 Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp/GATK/JTT20001087-FL-M04613-200220_S40_full_filter.vcf

(same snpEff usage output as above)

---[ create annotation ]---
Command:
python /varpipe_wgs/tools/create_annotation.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_DR_loci_raw_annotation.txt JTT20001087-FL-M04613-200220_S40

---[ create annotation ]---
Command:
python /varpipe_wgs/tools/create_annotation.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_full_raw_annotation.txt JTT20001087-FL-M04613-200220_S40

---[ parse annotation ]---
Command:
python /varpipe_wgs/tools/parse_annotation.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_DR_loci_raw_annotation.txt /varpipe_wgs/tools/mutation_loci.txt JTT20001087-FL-M04613-200220_S40

---[ parse annotation ]---
Command:
python /varpipe_wgs/tools/parse_annotation.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_full_raw_annotation.txt /varpipe_wgs/tools/mutation_loci.txt JTT20001087-FL-M04613-200220_S40

---[ rm ]---
Command:
rm /varpipe_wgs/data/snpEff_genes.txt

Standard Error:
rm: cannot remove '/varpipe_wgs/data/snpEff_genes.txt': No such file or directory

---[ rm ]---
Command:
rm /varpipe_wgs/data/snpEff_summary.html

Standard Error:
rm: cannot remove '/varpipe_wgs/data/snpEff_summary.html': No such file or directory

---[ lineage parsing ]---
Command:
python /varpipe_wgs/tools/lineage_parser.py /varpipe_wgs/tools/lineage_markers.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_full_Final_annotation.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40.lineage_report.txt JTT20001087-FL-M04613-200220_S40

---[ create summary report ]---
Command:
python /varpipe_wgs/tools/create_report.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_stats.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_target_region_coverage.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_DR_loci_Final_annotation.txt

Standard Error:
Traceback (most recent call last):
  File "/varpipe_wgs/tools/create_report.py", line 21, in <module>
    print matrix[0][0] + ":" + "\t" + matrix[1][0]
IndexError: list index out of range

---[ run interpretation report ]---
Command:
python /varpipe_wgs/tools/interprete.py /varpipe_wgs/tools/reportable.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_summary.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_structural_variants.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_DR_loci_annotation.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_target_region_coverage.txt JTT20001087-FL-M04613-200220_S40

---[ print pdf report ]---
Command:
python /varpipe_wgs/tools/pdf_print.py Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_summary.txt Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_report.pdf

---[ rm ]---
Command:
rm -r Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/tmp

---[ rm ]---
Command:
rm Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bai

---[ rm ]---
Command:
rm Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/JTT20001087-FL-M04613-200220_S40_sdrcsm.bam

---[ rm ]---
Command:
rm -r Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/trimmomatic

---[ rm ]---
Command:
rm -r Output_01_30_2023/JTT20001087-FL-M04613-200220_S40/clockwork

---[ rm ]---
Command:
rm /varpipe_wgs/data/config.yml

---[ rm ]---
Command:
rm -r /varpipe_wgs/data/work

Singularity Installation Issues

Hello,

I tried following along with the README and ran into several issues when trying to install via Singularity.

First I tried: singularity pull library://reagank/varpipe_wgs/pipeline_with_refs
and got the following error message: Unable to get library client configuration: remote has no library client

Next I tried cloning the repository and running: ./build_singularity.sh with_references
and got this error message: Unable to create build: failed to create build parent dir: stat /Users/hottel/.tmp: no such file or directory

Looking at what the build_singularity.sh script was trying to do, I was able to run singularity build pipeline_with_references.sif docker://ghcr.io/cdcgov/varpipe_wgs_with_refs:latest, which built the image in my working directory.

Alternatively, I also ran singularity pull docker://ghcr.io/cdcgov/varpipe_wgs_with_refs:latest, and the image was downloaded to my working directory.

In either case I was unable to run ./runVarpipeline.sh. Looking into this with singularity shell, I noticed that /varpipe_wgs/data did not contain runVarpipeline.sh; however, /varpipe_wgs/tools did seem to contain all the expected files.

-Wes

Process `map_reads (1)` terminated with an error exit status (127) Command error: .command.sh: line 2: clockwork: command not found

../tools/Varpipeline -q JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz -r ../tools/ref2.fa -n JTT20001087-FL-M04613-200220_S40 -q2 JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz -a -v

---[ nextflow remove contamination ]---
Command:
/blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/clockwork-0.11.3/results/nextflow run /blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/clockwork-0.11.3/nextflow/remove_contam.nf --ref_fasta /blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/clockwork-0.11.3/OUT/ref.fa --ref_metadata_tsv /blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/clockwork-0.11.3/OUT/remove_contam_metadata.tsv --reads_in1 JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz --reads_in2 JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz --outprefix Output_01_31_2023/JTT20001087-FL-M04613-200220_S40/clockwork/JTT20001087-FL-M04613-200220_S40 --mapping_threads 12

Standard Output:
N E X T F L O W ~ version 20.07.1
Launching /blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/clockwork-0.11.3/nextflow/remove_contam.nf [trusting_engelbart] - revision: b2f84f8ad4
Error executing process > 'map_reads (1)'

Caused by:
Process map_reads (1) terminated with an error exit status (127)

Command executed:

clockwork map_reads --threads 12 --unsorted_sam sample_name /blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/tools/clockwork-0.11.3/OUT/ref.fa contam_sam /blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/data/JTT20001087-FL-M04613-200220_S40_R1_001.fastq.gz /blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/data/JTT20001087-FL-M04613-200220_S40_R2_001.fastq.gz

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: clockwork: command not found

Work dir:
/blue/bphl-florida/share/cdc_tb_pipeline/NCHHSTP-DTBE-Varpipe-WGS/data/work/10/85594a9b2d5231d548e3b1fbc61d51

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

executor > local (2)
[8c/453495] process > make_jobs_tsv [100%] 1 of 1 ✔
[10/85594a] process > map_reads (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > sam_to_fastq_files -
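
(Exit status 127 means the shell could not find the command: clockwork is not on the PATH in the environment where Nextflow launched map_reads. The paths above suggest the pipeline was run directly on an HPC host rather than inside the Docker/Singularity image, which ships clockwork under /varpipe_wgs/tools/clockwork-0.11.3.)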

Optional: Debugging the script "Varpipeline.py" in /tools/

The terminal sometimes shows "AttributeError: module 'yaml' has no attribute 'FullLoader'". In that situation the installed PyYAML is probably too old: FullLoader was only added in PyYAML 5.1. As a workaround you can pass yaml.Loader instead, which older versions provide and which also parses the full YAML language.
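
A minimal version-tolerant sketch of the workaround (the config.yml filename is just an illustration; the actual call site in Varpipeline.py may differ):

    import yaml

    def load_config(path):
        # PyYAML >= 5.1 provides FullLoader; older releases fall back to Loader.
        loader = getattr(yaml, "FullLoader", yaml.Loader)
        with open(path) as fh:
            return yaml.load(fh, Loader=loader)

    config = load_config("config.yml")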

Bug in the script "get_gatk.sh"

In the script "get_gatk.sh", the assignment "DL_URL=..." should be changed to "URL=..." (the script later expands $URL, so the download address is never set otherwise). Also, "mv gatk-package-4.2.4.0-local.jar ../tools/gatk-4.2.4.0/" should be changed to "mv ./gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar ../tools/gatk-4.2.4.0/", because unzip extracts the jar into a gatk-4.2.4.0/ subdirectory.

The corrected script is shown below:

    #DL_URL='https://github.com/broadinstitute/gatk/releases/download/4.2.4.0/gatk-4.2.4.0.zip'
    URL='https://github.com/broadinstitute/gatk/releases/download/4.2.4.0/gatk-4.2.4.0.zip'    #debug
    mkdir tmp
    cd tmp
    wget -O- $URL >gatk-4.2.4.0.zip
    unzip gatk-4.2.4.0.zip

    #mv gatk-package-4.2.4.0-local.jar ../tools/gatk-4.2.4.0/
    mv ./gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar ../tools/gatk-4.2.4.0/    #debug
    cd ../
    rm -rf tmp/
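
With those two changes the local jar ends up in tools/gatk-4.2.4.0/ next to the gatk wrapper, which should clear the repeated "No local jar was found" failures shown in the log above.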

Parallel processing of samples in Varpipe_wgs

Is your feature request related to a problem? Please describe.
I am running Varpipe_wgs with Singularity on ~150 TB (tuberculosis) samples on an AWS server.
I have set threads to 40 on the command line (./runVarpipeline.sh 40), and it is processing ~3-4 samples/hr.

Is there a way to set it to run multiple samples in parallel to speed things up?
