This pipeline analyzes multiplexed bulk RNA-seq data from C. elegans; the barcode/UMI structure follows the CelSeq method. The inputs are FASTQ files generated by bcl2fastq.
Steps:
- Extract valid reads with extractValidReads.py. This script extracts R1 reads that carry a valid cell barcode (allowing 1 mismatch) and attaches the cell barcode and UMI to the header of the corresponding R2 read.
- Quality control: FastQC, Picard, and RSeQC are used to create QC outputs.
- A STAR pipeline aligns the valid reads to the genome. This module includes quality control and genome alignment using STAR.
- An ESAT module counts reads for each feature.
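
The barcode extraction step can be sketched in Python as follows. This is a minimal illustration, not the code of extractValidReads.py: the UMI and barcode lengths, their order within R1, the header format, the rejection of ambiguous matches, and the function names `match_barcode` and `tag_r2_header` are all assumptions made for the example.

```python
from typing import Optional, Set

# Assumed R1 layout, for illustration only: UMI first, then cell barcode.
UMI_LEN = 6
BARCODE_LEN = 6


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed: str, valid: Set[str], max_mismatch: int = 1) -> Optional[str]:
    """Return the valid barcode matching `observed`, or None.

    An exact match wins. Otherwise the read is kept only if exactly one
    valid barcode lies within `max_mismatch` mismatches; ambiguous hits
    are rejected (an assumption of this sketch).
    """
    if observed in valid:
        return observed
    hits = [
        bc for bc in valid
        if len(bc) == len(observed) and hamming(bc, observed) <= max_mismatch
    ]
    return hits[0] if len(hits) == 1 else None


def tag_r2_header(r1_seq: str, r2_header: str, valid: Set[str]) -> Optional[str]:
    """Build an R2 header carrying barcode and UMI, or None if R1 is invalid."""
    if len(r1_seq) < UMI_LEN + BARCODE_LEN:
        return None
    umi = r1_seq[:UMI_LEN]
    barcode = match_barcode(r1_seq[UMI_LEN:UMI_LEN + BARCODE_LEN], valid)
    if barcode is None:
        return None
    # Append the tags to the read ID and keep the rest of the header untouched.
    read_id, _, rest = r2_header.partition(" ")
    tagged = f"{read_id}:{barcode}:{umi}"
    return f"{tagged} {rest}" if rest else tagged
```

Correcting to the nearest valid barcode, rather than keeping the observed sequence, means that reads with a single sequencing error are counted under the same barcode as error-free reads.
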
- a) Paired Reads: output of the bcl2fastq software, with file names structured as shown below.
S1_L001_R1_001.fastq.gz
S1_L001_R2_001.fastq.gz
- b) cellBarcodeFile: file listing the valid cell barcodes used for valid read extraction.
s3://viafoundry/run_data/genome_data_other/CelSeq/bcSet_full.txt
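
As an illustration of how paired read files with the naming shown above can be matched up before a run, here is a small Python sketch. The function name `pair_fastqs` and the regular expression are assumptions for this example and are not part of the pipeline.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Tuple

# Matches names such as S1_L001_R1_001.fastq.gz and captures the read number.
READ_RE = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$")


def pair_fastqs(names: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Group FASTQ file names into (R1, R2) pairs keyed by their shared prefix."""
    found: Dict[str, Dict[str, str]] = {}
    for name in names:
        m = READ_RE.match(Path(name).name)
        if m is None:
            continue  # not a read file in the expected naming scheme
        key = f"{m['prefix']}_{m['chunk']}"
        found.setdefault(key, {})[m["read"]] = name
    # Keep only keys for which both mates are present.
    return {k: (v["1"], v["2"]) for k, v in found.items() if "1" in v and "2" in v}


pairs = pair_fastqs(["S1_L001_R1_001.fastq.gz", "S1_L001_R2_001.fastq.gz"])
print(pairs)
```

Files without a mate are dropped from the result, which makes a missing R1 or R2 easy to spot before the pipeline is started.
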
UMI table: The output file (${name}_umi_count_.txt) is a tab-separated gene/transcript by cell barcode matrix filled with count data, as shown in the example below.
gene | ATCAATCGCGAACCGA | ACCCTCAACTCAAACA | ACTCATACCCGGAAAT |
---|---|---|---|
RNF14 | 0 | 0 | 0 |
MZT2B | 0 | 12 | 0 |
SPN | 0 | 2 | 8 |
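
A table with this layout can be read with Python's standard `csv` module. The sketch below sums the counts per cell barcode; it assumes that the first column holds the gene/transcript name, that the first row is the header, and that all counts are integers. The function name `umi_totals_per_barcode` is hypothetical, and the inline data simply repeats the example table above.

```python
import csv
import io
from typing import Dict, TextIO


def umi_totals_per_barcode(handle: TextIO) -> Dict[str, int]:
    """Sum counts per cell barcode from a tab-separated gene x barcode table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    barcodes = header[1:]  # first column holds the gene/transcript name
    totals = {bc: 0 for bc in barcodes}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for bc, value in zip(barcodes, row[1:]):
            totals[bc] += int(value)
    return totals


# Inline copy of the example table, tab-separated as in the output file.
example = io.StringIO(
    "gene\tATCAATCGCGAACCGA\tACCCTCAACTCAAACA\tACTCATACCCGGAAAT\n"
    "RNF14\t0\t0\t0\n"
    "MZT2B\t0\t12\t0\n"
    "SPN\t0\t2\t8\n"
)
print(umi_totals_per_barcode(example))
```

For a real run, the same function can be given an open file handle, for example `open(path, newline="")`, instead of the in-memory example.
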
- Docker: quay.io/viascientific/singlecell_esat:2.0
To start using the CelSeq pipeline, please go to the Foundry web page and click the run button.
To install and run the CelSeq pipeline from the command line, please follow these steps: Installation.
If you use Foundry in your research, please cite: Yukselen, O., Turkyilmaz, O., Ozturk, A.R. et al. DolphinNext: a distributed data processing platform for high throughput genomics. BMC Genomics 21, 310 (2020). https://doi.org/10.1186/s12864-020-6714-x