A simple Snakemake pipeline to call variants from NGS data of the SARS-CoV-2 genome. It depends on bwa, freebayes, SnpEff/SnpSift and samtools.
You can use conda to install dependencies.
git clone https://github.com/dridk/Sars-CoV-2-NGS-pipeline.git
cd Sars-CoV-2-NGS-pipeline
conda env create -f environment.yml
conda activate covid
You can test the pipeline with our toy dataset:
snakemake -p A.results.csv B.results.csv -j4
In config.yml, set the FASTQ_DIR variable to the folder containing your FASTQ files.
These files must follow this naming pattern:
- SAMPLENAME_1.fastq.gz
- SAMPLENAME_2.fastq.gz
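The naming rule above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the helper names `find_paired_samples` and `snakemake_command` are hypothetical, and the regular expression assumes the `SAMPLENAME_1.fastq.gz` / `SAMPLENAME_2.fastq.gz` pattern exactly as listed. It lists the samples that have both read files and builds the matching `snakemake` command line.

```python
import re
from pathlib import Path

# Assumed naming pattern, taken from the list above:
# SAMPLENAME_1.fastq.gz and SAMPLENAME_2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def find_paired_samples(fastq_dir):
    """Return the sorted sample names that have both a _1 and a _2 file."""
    reads_by_sample = {}
    for path in Path(fastq_dir).iterdir():
        match = FASTQ_PATTERN.match(path.name)
        if match:
            reads_by_sample.setdefault(match["sample"], set()).add(match["read"])
    # Keep only samples for which both mates were found.
    return sorted(s for s, reads in reads_by_sample.items() if reads == {"1", "2"})


def snakemake_command(samples, jobs=4):
    """Build the argument list requesting SAMPLENAME.results.csv for each sample."""
    targets = [f"{sample}.results.csv" for sample in samples]
    return ["snakemake", "-p", *targets, f"-j{jobs}"]
```

For a folder holding complete pairs for samples A and B, `" ".join(snakemake_command(find_paired_samples(folder)))` gives `snakemake -p A.results.csv B.results.csv -j4`. Samples with only one of the two files are skipped rather than reported, which is a design choice of this sketch.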
To get the results for a specific SAMPLENAME:
snakemake -p SAMPLENAME.results.csv
To get the FASTA genome of a specific SAMPLENAME:
snakemake -p SAMPLENAME.fa
You can pass this consensus sequence to Pangolin to get the lineage.
Each sample comes with a CSV file with the following columns:
- Gene Name
- Feature ID
- Variant position
- Reference bases
- Alternative bases
- HGVS coding name
- HGVS protein name
- Impact
- Effect
ANN[*].GENE  ANN[*].FEATUREID  POS  REF  ALT  ANN[*].HGVS_C  ANN[*].HGVS_P  ANN[*].IMPACT  ANN[*].EFFECT
ORF1ab       GU280_gp01        490  T    A    c.225T>A       p.Asp75Glu     MODERATE       missense_variant
ORF1ab       YP_009725297.1    490  T    A    c.225T>A       p.Asp75Glu     MODERATE       missense_variant
ORF1ab       YP_009742608.1    490  T    A    c.225T>A       p.Asp75Glu     MODERATE       missense_variant
ORF1ab       GU280_gp01.2      490  T    A    c.225T>A       p.Asp75Glu     MODERATE       missense_variant
ORF1ab       YP_009725298.1    490  T    A    c.-316T>A                     MODIFIER       upstream_gene_variant
ORF1ab       YP_009742609.1    490  T    A    c.-316T>A                     MODIFIER       upstream_gene_variant
ORF1ab       YP_009725299.1    490  T    A    c.-2230T>A                    MODIFIER       upstream_gene_variant
ORF1ab       YP_009742610.1    490  T    A    c.-2230T>A                    MODIFIER       upstream_gene_variant
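The results file can be filtered downstream, for instance to keep only variants with a protein-level impact. The sketch below is an illustration under assumptions, not part of the pipeline: it assumes the file is tab-delimited with the header row shown above (pass another `delimiter` if your files differ), and the helper names `load_variants` and `filter_by_impact` are hypothetical.

```python
import csv


def load_variants(path, delimiter="\t"):
    """Read a results file into a list of dicts keyed by the header names.

    Assumption: the file is tab-delimited and starts with the header row
    (ANN[*].GENE, ANN[*].FEATUREID, POS, ...).
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def filter_by_impact(rows, impacts=("HIGH", "MODERATE")):
    """Keep only the rows whose ANN[*].IMPACT value is in `impacts`."""
    return [row for row in rows if row["ANN[*].IMPACT"] in impacts]
```

Applied to the example rows above, `filter_by_impact(load_variants("SAMPLENAME.results.csv"))` would keep the `missense_variant` rows and drop the `upstream_gene_variant` rows, whose impact is `MODIFIER`.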