RaIDeN is a pipeline for Rapid IDentification of causal gene with target motif using NGS technology. RaIDeN automatically filters candidate genes based on the patterns of structural variations and mutations.
Supplementary figure 1 in Shimizu et al. (2022)
Shimizu M, Hirabuchi A, Sugihara Y, Abe A, Takeda T, Kobayashi M, Hiraka Y, Kanzaki E, Oikawa K, Saitoh H, Langner T, Banfield MJ, Kamoun S, Terauchi R (2022) A genetically linked pair of NLR immune receptors shows contrasting patterns of evolution. Proceedings of the National Academy of Sciences, 119(27): e2116896119
RaIDeN is written in Python3.
- Python >= 3.5
- samtools >= 1.7
- bcftools >= 1.7
- hisat2
- bedtools
- gffread
- bamtools
- pigz
- stringtie
- faqcs
- prinseq-lite
- seqkit
Please run the following command lines to install RaIDeN.
git clone https://github.com/YuSugihara/RaIDeN.git
cd RaIDeN
pip install .
RaIDeN itself cannot currently be installed via bioconda. However, its dependencies are distributed via bioconda, so you can install them with the command below:
$ conda install -c bioconda samtools bcftools hisat2 bedtools gffread bamtools pigz stringtie faqcs prinseq seqkit
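Before running the pipeline, it can help to confirm that the dependencies listed above are reachable from your shell. The following is a minimal sketch rather than part of RaIDeN. The executable names in `TOOLS` are assumptions (for example, FaQCs and prinseq-lite may be installed under different names on your system), and `find_missing` is a hypothetical helper introduced only for this illustration.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation
# (for example, prinseq-lite may be installed as "prinseq-lite.pl").
TOOLS = [
    "samtools", "bcftools", "hisat2", "bedtools", "gffread", "bamtools",
    "pigz", "stringtie", "FaQCs", "prinseq-lite.pl", "seqkit",
]


def find_missing(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This only checks that each executable can be found; it does not check the minimum versions given for samtools and bcftools.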
$ raiden -r reference.fasta \
-a rnaseq.1.fastq,rnaseq.2.fastq \
-w wgs.1.fastq,wgs.2.fastq \
-o test \
-t 2
-r
RaIDeN requires an assembled reference genome, and this reference genome must contain the causal gene. As long as you are sure that the causal gene is present in the assembly, RaIDeN accepts a contig-level (not chromosome-scale) reference genome.
-a
RaIDeN requires RNA-seq reads for gene annotation. The RNA-seq data must contain the sequences of the causal gene because RaIDeN only analyzes annotated genomic regions. Paired FASTQ files have to be separated by a comma (e.g. fastq1,fastq2). RaIDeN accepts multiple RNA-seq samples. FASTQ files can be zipped.
-w
RaIDeN requires whole-genome sequencing (WGS) reads. The WGS sample should show the opposite trait to that of the reference genome. Because RaIDeN expects the trait difference to come from a structural variation or mutation in an annotated genomic region, this sample must carry a structural variation or mutation in the causal gene. Paired FASTQ files have to be separated by a comma (e.g. fastq1,fastq2). RaIDeN accepts multiple WGS samples. FASTQ files can be zipped.
-o
A directory with the specified output name must not already exist.
RaIDeN does not perform the selection by target motif; you have to run that step separately yourself.
RaIDeN generates 8 directories.
Output directory
├── 10_ref
├── 20_fastq
├── 30_bam
├── 40_bed
├── 50_annotation
├── 60_vcf
├── 70_result
└── log
The directory 70_result includes the final results.
70_result
├── all_candidate_genes.gff
├── candidate_genes_from_mutations.gff
├── candidate_genes_from_PA.gff
└── filtered_markers.bed
- all_candidate_genes.gff: GFF file including all candidate genes
- candidate_genes_from_mutations.gff: GFF file including candidate genes only from SNPs/INDELs
- candidate_genes_from_PA.gff: GFF file including candidate genes only from presence/absence markers
- filtered_markers.bed: BED file summarizing the mutations after filtering the VCF. Columns are in this order:
  - contig name
  - position - 1
  - position
  - reference base
  - mutation base
  - number of missing calls in the WGS samples
  - number of inconsistent markers in the WGS samples
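The column order above can be read into named fields with a few lines of Python. This is a minimal sketch, not part of RaIDeN. It assumes the file is tab-separated (as BED files normally are) and that the last two columns are integer counts. The field names in `Marker` and the function `read_markers` are hypothetical names chosen for this example.

```python
from collections import namedtuple

# Field names are hypothetical labels for the seven columns listed above.
Marker = namedtuple(
    "Marker",
    ["contig", "start", "end", "ref", "alt", "n_missing", "n_inconsistent"],
)


def read_markers(lines):
    """Parse tab-separated marker lines into a list of Marker tuples."""
    markers = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines, if any are present.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        markers.append(
            Marker(
                contig=fields[0],
                start=int(fields[1]),   # position - 1
                end=int(fields[2]),     # position
                ref=fields[3],
                alt=fields[4],
                n_missing=int(fields[5]),
                n_inconsistent=int(fields[6]),
            )
        )
    return markers
```

With the output name `test` from the usage example, the file could be loaded with `with open("test/70_result/filtered_markers.bed") as fh: markers = read_markers(fh)`.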
The directory 50_annotation includes the results of gene annotation.
50_annotation
├── annotation.fasta
├── annotation.gff
├── annotation.gtf
└── RNA-seq.bam
You can check the nucleotide sequences of the genes predicted from RNA-seq in annotation.fasta.
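Because the selection by target motif is left to the user, one possible starting point is to scan the sequences in annotation.fasta with a regular expression. The sketch below is only an illustration under stated assumptions and is not the method used by the RaIDeN authors. It assumes the motif can be written as a regular expression over the nucleotide sequence; a protein-level motif would need a translation step that is not shown here. The functions `read_fasta` and `records_with_motif` are hypothetical helpers.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def records_with_motif(lines, pattern):
    """Return the names of records whose sequence matches `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name, seq in read_fasta(lines) if regex.search(seq)]
```

For example, `records_with_motif(open("test/50_annotation/annotation.fasta"), "ATG[ACGT]{3}TAA")` would list the records containing that (made-up) pattern. The resulting names could then be compared with the candidates in 70_result.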