purple-nf's Introduction

purple-nf

Nextflow pipeline for CNV calling with PURPLE

Description

Pipeline using PURPLE for copy number calling from tumor/normal or tumor-only sequencing data.

Usage

# Using a tn_pairs file (CRAM mode, the default)
nextflow run iarcbioinfo/purple-nf -r v1.0 \
-profile singularity --tn_file tn_pairs.txt \
--cohort_dir $PWD/CRAM \
--ref hs38DH.fa --ref_dict hs38DH.dict \
--output_folder PURPLE


# Activate BAM mode
nextflow run iarcbioinfo/purple-nf -r v1.0 \
-profile singularity --tn_file tn_pairs.txt \
--cohort_dir $PWD/BAM \
--bam \
--ref hs38DH.fa --ref_dict hs38DH.dict \
--output_folder PURPLE

Dependencies

  1. This pipeline is based on Nextflow. As we have several Nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully, as it contains essential information for the installation, basic usage and configuration of Nextflow and our pipelines.
  2. External software:

You can avoid installing the external software by installing only Docker or Singularity. See the IARC-nf repository for more information.

Input (mandatory)

Name            Description
--cohort_dir    Folder containing all BAM/CRAM files
--tn_file       Tab-separated file listing the BAM/CRAM files to be processed
--ref           FASTA file of the reference genome [hg38.fa]; it must be indexed [hg38.fa.fai]
--ref_dict      Sequence dictionary (.dict) file for the reference genome [hg38.dict]
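
Before launching a run, it may be worth confirming that the reference, its index and its dictionary are all in place. The sketch below is not part of the pipeline: `check_ref` is a hypothetical helper, and it assumes the index sits next to the FASTA file as `<ref>.fai`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of purple-nf): check that the reference FASTA,
# its .fai index (assumed to sit next to the FASTA) and the .dict file exist
# and are non-empty. Returns 0 if all three are present, 1 otherwise.
check_ref() {
  local ref=$1 dict=$2 status=0 f
  for f in "$ref" "$ref.fai" "$dict"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_ref hs38DH.fa hs38DH.dict` reports any missing file on standard error. If the index or dictionary is missing, `samtools faidx hs38DH.fa` and `samtools dict hs38DH.fa -o hs38DH.dict` are one common way to create them.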

Example of Tumor/Normal pairs file (--tn_file)

A tab-separated text file with the following header:

tumor_id	sample	tumor	normal
sample1_T1	sample1	sample1_T.cram	sample1_N.cram
sample2_T1	sample2	sample2_T.cram	sample2_N.cram
sample3_T1	sample3	sample3_T.cram	sample3_N.cram
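
Because the columns must be separated by tabs, it can help to check the file before starting the pipeline. The following is a minimal sketch rather than anything shipped with purple-nf: the function name `check_tn_file` is hypothetical, and only the first three column names (`tumor_id`, `sample`, `tumor`) are checked, so it makes no assumption about whether a `normal` column is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of purple-nf): sanity-check a --tn_file.
# Returns 0 if the file looks tab-separated with the expected header, 1 otherwise.
check_tn_file() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    NR == 1 {
      ncol = NF
      # Assumed header start: tumor_id, sample, tumor (tab-separated).
      if ($1 != "tumor_id" || $2 != "sample" || $3 != "tumor") {
        print "header is not tab-separated or has unexpected column names" > "/dev/stderr"
        bad = 1
        exit
      }
      next
    }
    NF == 0 { next }   # ignore empty lines
    NF != ncol {
      # A row whose field count differs from the header is probably space-separated.
      printf "line %d has %d tab-separated fields, expected %d\n", NR, NF, ncol > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

A typical call would be `check_tn_file tn_pairs.txt && echo "tn_pairs.txt looks tab-separated"`. A header written with spaces is read as a single field, so it fails the first check.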

Optional parameters

Name              Type        Description
--tumor_only      [flag]      Activate tumor-only mode
--bam             [flag]      Activate BAM mode [default: CRAM]
--output_folder   [string]    Name of the output folder
--cpu             [integer]   Number of CPUs [default: 2]
--mem             [integer]   Maximum memory [default: 8 GB]

Output

results
├── AMBER                               # AMBER result directory
│   ├── S00016_T_AMBER
│   │   ├── amber.version
│   │   ├── S00016_T_N.amber.snp.vcf.gz
│   │   ├── S00016_T_N.amber.snp.vcf.gz.tbi
│   │   ├── S00016_T_T.amber.baf.pcf
│   │   ├── S00016_T_T.amber.baf.tsv
│   │   ├── S00016_T_T.amber.baf.vcf.gz
│   │   ├── S00016_T_T.amber.baf.vcf.gz.tbi
│   │   ├── S00016_T_T.amber.contamination.tsv
│   │   ├── S00016_T_T.amber.contamination.vcf.gz
│   │   ├── S00016_T_T.amber.contamination.vcf.gz.tbi
│   │   └── S00016_T_T.amber.qc
├── COBALT                              # COBALT result directory
│   ├── S00016_T_COBALT
│   │   ├── cobalt.version
│   │   ├── S00016_T_N.cobalt.gc.median.tsv
│   │   ├── S00016_T_N.cobalt.ratio.median.tsv
│   │   ├── S00016_T_N.cobalt.ratio.pcf
│   │   ├── S00016_T_T.chr.len
│   │   ├── S00016_T_T.cobalt.gc.median.tsv
│   │   ├── S00016_T_T.cobalt.ratio.pcf
│   │   └── S00016_T_T.cobalt.ratio.tsv
│   ├── .....
├── PURPLE                              # PURPLE result directory
│   ├── S00016_T_PURPLE
│   │   ├── circos                      # CIRCOS directory with files for plotting
│   │   ├── purple.version
│   │   ├── S00016_T_T.purple.cnv.gene.tsv
│   │   ├── S00016_T_T.purple.cnv.germline.tsv
│   │   ├── S00016_T_T.purple.cnv.somatic.tsv            # Somatic copy number segments
│   │   ├── S00016_T_T.purple.purity.range.tsv
│   │   ├── S00016_T_T.purple.purity.tsv
│   │   ├── S00016_T_T.purple.qc
│   │   ├── S00016_T_T.purple.segment.tsv
│   │   └── S00016_T_T.purple.somatic.clonality.tsv
│   ├── .....    
├── purple_summary.txt                  # Summary file for all tumors
└── nf-pipeline_info                    # Nextflow information directory
    ├── purple_dag.html
    ├── purple_report.html
    ├── purple_timeline.html
    ├── purple_trace.txt
    └── run_parameters_report.txt       # Custom file providing info for software versions and calling parameters
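
Given this layout, the per-tumor somatic copy number segment files can be collected with a plain `find`. The sketch below only assumes the directory and file names shown in the tree above; `list_somatic_cnv` is a hypothetical helper name, and its argument is whatever was passed as `--output_folder`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of purple-nf): list every somatic copy number
# segment file under <output_folder>/PURPLE, one path per line, sorted.
list_somatic_cnv() {
  find "$1/PURPLE" -type f -name '*.purple.cnv.somatic.tsv' | sort
}
```

For example, `list_somatic_cnv results` prints one path per tumor, which can then be fed to downstream scripts.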

Common errors

Singularity

The first time the container is built from the Docker image, TMPDIR should point to a non-parallel file system. You can set it like this:

export TMPDIR=/tmp

multiple input files for each of the following file names: null, null.bai (or null, null.crai if using cram mode)

If the input file is not tab-separated, Nextflow reports an input file name collision: it cannot find the tumor and normal columns, so it tries to create dead symlinks to two sets of files named "null" and "null.bai" (see issue #2):

Error executing process > 'AMBER'

Caused by:
  Process `AMBER` input file name collision -- There are multiple input files for each of the following file names: null, null.bai

The solution is to format the input file as a TSV, replacing the spaces between columns with tabs.
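
One way to do this conversion is sketched below. It assumes that no sample ID or file name contains spaces, so that any run of spaces can safely be treated as a column separator. The function name `to_tsv` is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of purple-nf): rewrite a whitespace-separated
# sample sheet as a tab-separated one, printed to standard output.
# Assumption: no field contains spaces, so any run of blanks separates columns.
to_tsv() {
  # awk's default field splitting collapses runs of spaces and tabs;
  # assigning $1 = $1 forces the record to be rebuilt with OFS (a tab).
  awk 'BEGIN { OFS = "\t" } NF > 0 { $1 = $1; print }' "$1"
}
```

For example, `to_tsv t.txt > t.tsv` writes a converted copy that can then be passed with `--tn_file t.tsv`. With GNU coreutils, `cat -A t.tsv` displays each tab as `^I`, which makes it easy to confirm the result.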

Contributions

Name              Email               Description
Matthieu Foll*    [email protected]   Developer to contact for support (link to specific gitter chatroom)
Alex Di Genova    [email protected]   Developer

purple-nf's People

Contributors

adigenova, nalcala

purple-nf's Issues

There are multiple input files for each of the following file names: null, null.bai.

Hi there,

I tried to run purple-nf on my WGS BAM files in tumor-only mode.
However, the command below produced an error.

#!/bin/bash

BAMDIR=/data/komura/project/NGS/MENI/WGS/results/fq2bam
ref=/data/share/resources/ref/Homo_sapiens_assembly38

export TMPDIR=/tmp
export NXF_DEFAULT_DSL=1

nextflow run iarcbioinfo/purple-nf -r v1.1 \
-profile docker --tn_file t.txt \
--cohort_dir $BAMDIR \
--ref ${ref}.fasta --ref_dict ${ref}.dict \
--tumor_only \
--bam \
--cpu 6 \
--mem 64 \
--output_folder PURPLE_out

The error message was as follows.

-------------------Calling PARAMETERS---------------------
output_folder     : PURPLE_out
ref               : /data/share/resources/ref/Homo_sapiens_assembly38.fasta
ref_dict          : /data/share/resources/ref/Homo_sapiens_assembly38.dict
tn_file           : t.txt
help              : false
debug             : false
cohort_dir        : /data/komura/project/NGS/MENI/WGS/results/fq2bam
tumor_only        : true
bam               : true
somatic_vcfs      : null
max_memory        : 128 GB
max_cpus          : 8
max_time          : 10d
cpu               : 6
mem               : 64
----------------------------------------------------------


-------------------Software versions---------------------
hmftools-cobalt   : 1.11
hmftools-amber    : 2.52
hmftools-purple   : 3.5
----------------------------------------------------------


[-        ] process > HQ_VCF -
[-        ] process > COBALT -
[-        ] process > AMBER  -
[-        ] process > PURPLE -
WARN: Operator `spread` is deprecated -- it will be removed in a future release
[-        ] process > HQ_VCF -
[-        ] process > COBALT -
[-        ] process > AMBER  -
[-        ] process > PURPLE -
WARN: Operator `spread` is deprecated -- it will be removed in a future release
Error executing process > 'AMBER (7)'

Caused by:
  Process `AMBER` input file name collision -- There are multiple input files for each of the following file names: null, null.bai

The input file list was like this.

tumor_id    sample  tumor
W115868-T1_QT_1.fq.gz  W115868-T1_QT_1.fq.gz  W115868-T1_QT.bam
W185949-T3_QT_1.fq.gz  W185949-T3_QT_1.fq.gz  W185949-T3_QT.bam
W124995-T4_QT_1.fq.gz  W124995-T4_QT_1.fq.gz  W124995-T4_QT.bam
W180651-T4_QT_1.fq.gz  W180651-T4_QT_1.fq.gz  W180651-T4_QT.bam
W185949-T1_QT_1.fq.gz  W185949-T1_QT_1.fq.gz  W185949-T1_QT.bam
W115868-T4_QT_1.fq.gz  W115868-T4_QT_1.fq.gz  W115868-T4_QT.bam
W180651-T1_QT_1.fq.gz  W180651-T1_QT_1.fq.gz  W180651-T1_QT.bam
W124995-T1_QT_1.fq.gz  W124995-T1_QT_1.fq.gz  W124995-T1_QT.bam
W153058-T1_QT_1.fq.gz  W153058-T1_QT_1.fq.gz  W153058-T1_QT.bam
W153058-T3_QT_1.fq.gz  W153058-T3_QT_1.fq.gz  W153058-T3_QT.bam

It seems the names of the BAM files were not processed properly.
How should I fix the problem?
