
allele_specific_rnaseq's Introduction

Allele-specific RNA-seq pipeline


This Nextflow[1]-based workflow aligns either single-end or paired-end reads to a reference transcriptome, described as a genome FASTA file plus an annotation in GTF format, using the STAR[2] aligner while taking into account the variant information provided as a VCF file.

STAR implements the WASP[3] method for filtering allele-specific alignments and reports, within the resulting alignments, which variant is found and from which allele. A Python script based on pysam[4] efficiently separates the alignments generated by STAR into four different files:

  • Allele A
  • Allele B
  • Neither A nor B
  • Ambiguous (variants from both alleles found in the same read or read pair)

The HTSeq-count[5] tool is then used to count the tags mapping to genes for each of these categories. Finally, a report is generated using MultiQC[6] that summarizes the results of each step together with an initial QC evaluation of the raw reads done using FastQC[7].
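As a quick sanity check, the WASP information is visible directly in the alignments. A minimal sketch, assuming the BAM path below (the actual file name depends on your sample names):

# Count the values of the vW tag added by STAR with --waspOutputMode SAMtag
# (vW:i:1 means the read passed WASP filtering)
samtools view output_test/Alignments/sample1Aligned.sortedByCoord.out.bam | grep -o 'vW:i:[0-9]*' | sort | uniq -c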

Install

You need to install Nextflow and either Docker or Singularity.
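For example, Nextflow can be installed with its one-line installer (moving it to ~/bin is just an assumption about your PATH):

curl -s https://get.nextflow.io | bash   # creates a "nextflow" launcher in the current directory
mv nextflow ~/bin/                       # put it somewhere in your PATH
nextflow -version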

Then you can clone the repository:

git clone --depth 1 git@github.com:biocorecrg/allele_specific_RNAseq.git

or

git clone --depth 1 https://github.com/biocorecrg/allele_specific_RNAseq.git 

depending on your GitHub configuration.

Prepare the input data

You need to extract the SNP information from the global VCF file, so first of all download the VCF file together with its index:

wget ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/mgp.v5.merged.snps_all.dbSNP142.vcf.gz
wget ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/mgp.v5.merged.snps_all.dbSNP142.vcf.gz.tbi

then the reference genome:

wget ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa

and the annotation from Ensembl. We used the version Mus_musculus.GRCm38.68, which is now only available from the Ensembl archive.
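For reference, the release-68 GTF can be fetched from the Ensembl archive FTP; the exact URL below is an assumption and may need adjusting:

wget ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus/Mus_musculus.GRCm38.68.gtf.gz
gunzip Mus_musculus.GRCm38.68.gtf.gz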

You can choose between Docker and Singularity containers when running the pipeline by using one of the options

-with-singularity

or

-with-docker

The makeAnno module can be used to generate a VCF file with the SNPs of the strains of interest and a genome with the SNP positions masked with Ns.

For using the module:

cd makeAnno
NXF_VER=20.01.0 nextflow run make_anno.nf -with-singularity -bg --vcffile mgp.v5.merged.snps_all.dbSNP142.vcf.gz --speciesA CAST_EiJ --speciesB 129S1_SvImJ --genome GRCm38_68.fa --outvcf CAST_EiJ-129S1_SvImJ.vcf > log

Or in case you want to compare against the reference genome:

cd makeAnno
NXF_VER=20.01.0 nextflow run make_anno.nf -with-singularity -bg --vcffile mgp.v5.merged.snps_all.dbSNP142.vcf.gz --speciesA CAST_EiJ --speciesB reference --genome GRCm38_68.fa --outvcf CAST_EiJ-C57BL_6J.vcf > log

Masking the genome can require a substantial amount of memory. In the output folder named filteredVCF you will find both the gzipped masked genome and the gzipped VCF annotation.
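With the example command above, the contents of that folder should look roughly like this (the file names follow the --genome and --outvcf arguments used):

ls filteredVCF/
# GRCm38_68.fa.masked.gz  CAST_EiJ-129S1_SvImJ.vcf.gz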

Run the pipeline

cd allele_specific_RNAseq; 
NXF_VER=20.01.0 nextflow run as_rnaseq.nf -with-singularity -bg > log

Optionally, you might want to monitor your pipeline on Nextflow's Tower website. You need to register using an institutional email and set the provided token in an environment variable as described:

export TOWER_ACCESS_TOKEN=<<<<<TOKEN NUMBER>>>>>>
NXF_VER=20.01.0 nextflow run as_rnaseq.nf -with-singularity -bg -with-tower > log 

You can then check the status of your pipeline live on the Tower website.

The parameters for running the pipeline are defined in the file params.config that can be changed accordingly.

parameter      value
reads          $baseDir/test/*_{1,2}.fastq.gz
genome         $baseDir/test/GRCm38_68_19.masked.fa.gz
annotation     $baseDir/test/Mus_musculus.GRCm38.68_19.gtf
strandness     reverse
variants       $baseDir/test/19_filt.vcf.gz
output         $baseDir/output_test
single         NO
varcut         1
title          Allele specific RNAseq project
subtitle       This is my wonderful RNA experiment
PI             Luca Cozzuto
User           Luca Cozzuto
UCSCgenomeID   mm10
email          [email protected]

Providing a real email address will deliver an email with the MultiQC report when the analysis is finished.
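As with any Nextflow pipeline, parameters defined in params.config can also be overridden on the command line instead of editing the file; the paths below are placeholders:

NXF_VER=20.01.0 nextflow run as_rnaseq.nf -with-singularity -bg \
    --reads "/path/to/reads/*_{1,2}.fastq.gz" \
    --output /path/to/output_dir > log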

Fastq reads

FASTQ reads can be either plain or gzipped. You need to use this syntax to capture the sample identifier from single-end samples:

PATH/*.fastq.gz

and this for paired end ones:

PATH/*_{1,2}.fastq.gz
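For example, a test folder with two paired-end samples matching the pattern above could look like this (sample names are hypothetical):

ls test/
# sampleA_1.fastq.gz  sampleA_2.fastq.gz  sampleB_1.fastq.gz  sampleB_2.fastq.gz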

Genome

Gzipped masked FASTA file of the genome obtained by running makeAnno.

Strandness

It can be "unstranded", "forward", or "reverse", depending on the sequencing protocol.

annotation

GTF file

variants

Gzipped VCF file obtained by running makeAnno.

single

YES: single-end reads. NO: paired-end reads.

varcut

Number of SNPs needed for assigning a read to an allele.
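Combining the parameters above, a single-end run only needs the single flag set to YES and a read pattern without the {1,2} suffix; a minimal sketch, with placeholder paths:

NXF_VER=20.01.0 nextflow run as_rnaseq.nf -with-singularity -bg \
    --single YES --reads "/path/to/reads/*.fastq.gz" > log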

Results

The output folder will contain the following final results:

  • Index: the indexed genome
  • QC: FastQC results
  • Alignments: sorted BAM files, as produced by a standard bulk RNA-seq run
  • Report: a detailed report of the pipeline run, which is also sent via email
  • Allele_alignments: BAM files with the variant-carrying reads marked as alleleA, alleleB, ref, or ambiguous. The counting takes into account the strand protocol specified as a parameter.
  • cut_N: a folder generated for each chosen SNP cut-off, containing the following sub-folders:
    • Allele_single_counts: counts per gene per sample for each single allele. The counting takes into account the strand protocol specified as a parameter.
    • Allele_merged_Counts: counts per gene per allele, merged in a single file for each sample. The counting takes into account the strand protocol specified as a parameter.
    • Counts: composite counts per gene per sample, with no distinction between alleles. Four fields: gene id, total counts assuming unstranded sequencing, total counts assuming sequencing on the forward strand, total counts assuming sequencing on the reverse strand. This information is useful for validating the strand-specific protocol used in the sequencing step.
    • Proportions: for each sample, the read count per allele is divided by the sum of the allele counts and multiplied by the composite count. The counting takes into account the strand protocol specified as a parameter.
propA = alleleA/(alleleA+alleleB) * composite 
propB = alleleB/(alleleA+alleleB) * composite 
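The Counts files can be used to double-check the strandness parameter. A minimal sketch, assuming the four-column layout described above and a placeholder file path; the column with by far the larger total corresponds to the correct setting:

awk '{unstr+=$2; fwd+=$3; rev+=$4} END {print "unstranded:", unstr, "forward:", fwd, "reverse:", rev}' \
    output_test/cut_1/Counts/sample1.counts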

allele_specific_rnaseq's People

Contributors

lucacozzuto, moritzbauer


allele_specific_rnaseq's Issues

Dependencies Installation Path

Again, to make things clear for people with less experience: where should the dependencies that need to be installed be saved? In the same folder as the pipeline, in a subfolder, or somewhere else? It probably doesn't matter, but clarifying these things helps avoid possible issues.

Nextflow install

So should I either

  1. Log in via ssh -Y [email protected] to then install nextflow there via curl -s https://get.nextflow.io | bash
    OR
  2. Get access to ssh -Y nextflow.linux.crg.es

SNP Cut-off output

Ciao Luca,
I wanted to load the .bam files generated with the new pipeline into IGV.
I saw that there is only one file with the cutoff.
Shouldn't there be two files, one for each allele?

Docker/Singularity install

Ciao Luca,

Is there a preference for the use of either Docker or Singularity?
Also, does that mean one has to install their respective Linux versions on your cluster folder?

Thanks!
(Again, trying to make things foolproof)

Singularity default in make_anno.nf config

Thanks for the Nextflow pipeline.

I am running

docker build .

and I am getting the error below:

Step 1/52 : FROM biocorecrg/centos-perlbrew-pyenv3-java:centos7
 ---> 669f4335d0ab
Step 2/52 : MAINTAINER Toni Hermoso Pulido <[email protected]>
 ---> Using cache
 ---> a614883ded8f
Step 3/52 : MAINTAINER Luca Cozzuto <[email protected]>
 ---> Using cache
 ---> f5ec43923789
Step 4/52 : ARG FASTQC_VERSION=0.11.9
 ---> Using cache
 ---> 890ae896d2f5
Step 5/52 : ARG STAR_VERSION=2.7.3a
 ---> Using cache
 ---> b6fa72b82912
Step 6/52 : ARG QUALIMAP_VERSION=2.2.1
 ---> Using cache
 ---> ef797180b8fb
Step 7/52 : ARG MULTIQC_VERSION=1.8
 ---> Using cache
 ---> b9b3b0ad0809
Step 8/52 : ARG SAMTOOLS_VERSION=1.10
 ---> Using cache
 ---> 086d4fdaf996
Step 9/52 : ARG TOOL_MULTIQC_VERSION=1.1
 ---> Using cache
 ---> 15e96d4b9131
Step 10/52 : ARG HTSEQ_VERSION=0.11.1
 ---> Using cache
 ---> ffb60001880e
Step 11/52 : ARG BEDTOOLS_VERSION=2.29.2
 ---> Using cache
 ---> 14c1d8aa72b3
Step 12/52 : ARG HTSLIB_VERSION=1.10.2
 ---> Using cache
 ---> 26740cc60139
Step 13/52 : RUN bash -c 'curl -k -L https://github.com/arq5x/bedtools2/releases/download/v${BEDTOOLS_VERSION}/bedtools.static.binary > /usr/local/bin/bedtools'
 ---> Using cache
 ---> fea7465f39d3
Step 14/52 : RUN chmod +x /usr/local/bin/bedtools
 ---> Using cache
 ---> 635a3d951d7c
Step 15/52 : RUN bash -c 'curl -k -L https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v${FASTQC_VERSION}.zip > fastqc.zip'
 ---> Using cache
 ---> 3da1252b8106
Step 16/52 : RUN unzip fastqc.zip; chmod 775 FastQC/fastqc; ln -s $PWD/FastQC/fastqc /usr/local/bin/fastqc
 ---> Using cache
 ---> 15d1984e09cb
Step 17/52 : RUN bash -c 'curl -k -L https://github.com/alexdobin/STAR/archive/${STAR_VERSION}.tar.gz > STAR.tar.gz'
 ---> Using cache
 ---> d1a2c53a1d9d
Step 18/52 : RUN tar -zvxf STAR.tar.gz
 ---> Using cache
 ---> 92582278affa
Step 19/52 : RUN cp STAR-${STAR_VERSION}/bin/Linux_x86_64/* /usr/local/bin/
 ---> Using cache
 ---> d6092c409df2
Step 20/52 : RUN rm STAR.tar.gz
 ---> Using cache
 ---> 45b35689a2f0
Step 21/52 : RUN bash -c 'curl -k -L https://bitbucket.org/kokonech/qualimap/downloads/qualimap_v${QUALIMAP_VERSION}.zip > qualimap.zip'
 ---> Using cache
 ---> a8ae3b0cb5f1
Step 22/52 : RUN unzip qualimap.zip
 ---> Using cache
 ---> e7129e401074
Step 23/52 : RUN rm qualimap.zip
 ---> Using cache
 ---> aff9a7243b63
Step 24/52 : RUN ln -s $PWD/qualimap_v${QUALIMAP_VERSION}/qualimap /usr/local/bin/
 ---> Using cache
 ---> 057398046625
Step 25/52 : RUN python3 -m pip install --upgrade --force-reinstall git+https://github.com/ewels/MultiQC.git
 ---> Running in d5a1b04d39eb
Collecting git+https://github.com/ewels/MultiQC.git
  Cloning https://github.com/ewels/MultiQC.git to /tmp/pip-apb_2fnm-build
Collecting matplotlib>=2.1.1 (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/09/03/b7b30fa81cb687d1178e085d0f01111ceaea3bf81f9330c937fb6f6c8ca0/matplotlib-3.3.4-cp36-cp36m-manylinux1_x86_64.whl (11.5MB)
Collecting networkx>=2.5.1 (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/f3/b7/c7f488101c0bb5e4178f3cde416004280fd40262433496830de8a8c21613/networkx-2.5.1-py3-none-any.whl (1.6MB)
Collecting numpy (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
Collecting click (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/4a/a8/0b2ced25639fb20cc1c9784de90a8c25f9504a7f18cd8b5397bd61696d7d/click-8.0.4-py3-none-any.whl (97kB)
Collecting coloredlogs (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/a7/06/3d6badcf13db419e25b07041d9c7b4a2c331d3f4e7134445ec5df57714cd/coloredlogs-15.0.1-py2.py3-none-any.whl (46kB)
Collecting future>0.14.0 (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)
Collecting jinja2>=3.0.0 (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/20/9a/e5d9ec41927401e41aea8af6d16e78b5e612bca4699d417f646a9610a076/Jinja2-3.0.3-py3-none-any.whl (133kB)
Collecting lzstring (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/ab/0c/28347673b45e5f0975cdf1f6d69ede6ad049be873194c4e164d79aecd34c/lzstring-1.0.4.tar.gz
Collecting markdown (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/f3/df/ca72f352e15b6f8ce32b74af029f1189abffb906f7c137501ffe69c98a65/Markdown-3.3.7-py3-none-any.whl (97kB)
Collecting pyyaml>=4 (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/b3/85/79b9e5b4e8d3c0ac657f4e8617713cca8408f6cdc65d2ee6554217cedff1/PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (603kB)
Collecting requests (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/2d/61/08076519c80041bc0ffa1a8af0cbd3bf3e2b62af10435d269a9d0f40564d/requests-2.27.1-py2.py3-none-any.whl (63kB)
Collecting rich>=10 (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/32/60/81ac2e7d1e3b861ab478a72e3b20fc91c4302acd2274822e493758941829/rich-12.6.0-py3-none-any.whl (237kB)
Collecting rich-click (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/2e/29/900393c3f7cc82104c6e93b3c57acbbd49d1a5da50a1c8fc52a3cc28543a/rich_click-1.2.1-py3-none-any.whl
Collecting simplejson (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/ec/e5/a4d5b615e4d3bf6ddde77b5f1c801adb869cb5ad565deda2402edc631388/simplejson-3.17.6-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (129kB)
Collecting spectra>=0.0.10 (from multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/50/50/ae8d87ab681a58e246215e51715566df8074a86d244fee31fa8585fe5f9f/spectra-0.0.11.tar.gz
Collecting python-dateutil>=2.1 (from matplotlib>=2.1.1->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl (247kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 (from matplotlib>=2.1.1->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/80/c1/23fd82ad3121656b585351aba6c19761926bb0db2ebed9e4ff09a43a3fcc/pyparsing-3.0.7-py3-none-any.whl (98kB)
Collecting cycler>=0.10 (from matplotlib>=2.1.1->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/5c/f9/695d6bedebd747e5eb0fe8fad57b72fdf25411273a39791cde838d5a8f51/cycler-0.11.0-py3-none-any.whl
Collecting pillow>=6.2.0 (from matplotlib>=2.1.1->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/7d/2a/2fc11b54e2742db06297f7fa7f420a0e3069fdcf0e4b57dfec33f0b08622/Pillow-8.4.0.tar.gz (49.4MB)
Collecting kiwisolver>=1.0.1 (from matplotlib>=2.1.1->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/a7/1b/cbd8ae738719b5f41592a12057ef5442e2ed5f5cb5451f8fc7e9f8875a1a/kiwisolver-1.3.1-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
Collecting decorator<5,>=4.3 (from networkx>=2.5.1->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/ed/1b/72a1821152d07cf1d8b6fce298aeb06a7eb90f4d6d41acec9861e7cc6df0/decorator-4.4.2-py2.py3-none-any.whl
Collecting importlib-metadata; python_version < "3.8" (from click->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/a0/a1/b153a0a4caf7a7e3f15c2cd56c7702e2cf3d89b1b359d1f1c5e59d68f4ce/importlib_metadata-4.8.3-py3-none-any.whl
Collecting humanfriendly>=9.1 (from coloredlogs->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/f0/0f/310fb31e39e2d734ccaa2c0fb981ee41f7bd5056ce9bc29b2248bd569169/humanfriendly-10.0-py2.py3-none-any.whl (86kB)
Collecting MarkupSafe>=2.0 (from jinja2>=3.0.0->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/fc/d6/57f9a97e56447a1e340f8574836d3b636e2c14de304943836bd645fa9c7e/MarkupSafe-2.0.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting idna<4,>=2.5; python_version >= "3" (from requests->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl (61kB)
Collecting urllib3<1.27,>=1.21.1 (from requests->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/6f/de/5be2e3eed8426f871b170663333a0f627fc2924cc386cd41be065e7ea870/urllib3-1.26.12-py2.py3-none-any.whl (140kB)
Collecting certifi>=2017.4.17 (from requests->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/1d/38/fa96a426e0c0e68aabc68e896584b83ad1eec779265a028e156ce509630e/certifi-2022.9.24-py3-none-any.whl (161kB)
Collecting charset-normalizer~=2.0.0; python_version >= "3" (from requests->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/06/b3/24afc8868eba069a7f03650ac750a778862dc34941a4bebeb58706715726/charset_normalizer-2.0.12-py3-none-any.whl
Collecting commonmark<0.10.0,>=0.9.0 (from rich>=10->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/b1/92/dfd892312d822f36c55366118b95d914e5f16de11044a27cf10a7d71bbbf/commonmark-0.9.1-py2.py3-none-any.whl (51kB)
Collecting pygments<3.0.0,>=2.6.0 (from rich>=10->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/4f/82/672cd382e5b39ab1cd422a672382f08a1fb3d08d9e0c0f3707f33a52063b/Pygments-2.13.0-py3-none-any.whl (1.1MB)
Collecting typing-extensions<5.0,>=4.0.0; python_version < "3.9" (from rich>=10->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/45/6b/44f7f8f1e110027cf88956b59f2fad776cca7e1704396d043f89effd3a0e/typing_extensions-4.1.1-py3-none-any.whl
Collecting dataclasses<0.9,>=0.7; python_version < "3.7" (from rich>=10->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl
Collecting colormath>=3.0.0 (from spectra>=0.0.10->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/ce/cf/70ea34103a76cc6fb1892289bda321cd0cc73b1a5500ee7fe9ef9f64acef/colormath-3.0.0.tar.gz
Collecting six>=1.5 (from python-dateutil>=2.1->matplotlib>=2.1.1->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl
Collecting zipp>=0.5 (from importlib-metadata; python_version < "3.8"->click->multiqc==1.14.dev0)
  Downloading https://files.pythonhosted.org/packages/bd/df/d4a4974a3e3957fd1c1fa3082366d7fff6e428ddb55f074bf64876f8e8ad/zipp-3.6.0-py3-none-any.whl
Installing collected packages: six, python-dateutil, pyparsing, numpy, cycler, pillow, kiwisolver, matplotlib, decorator, networkx, typing-extensions, zipp, importlib-metadata, click, humanfriendly, coloredlogs, future, MarkupSafe, jinja2, lzstring, markdown, pyyaml, idna, urllib3, certifi, charset-normalizer, requests, commonmark, pygments, dataclasses, rich, rich-click, simplejson, colormath, spectra, multiqc
  Running setup.py install for pillow: started
    Running setup.py install for pillow: finished with status 'error'
    Complete output from command /usr/local/python/versions/3.6.3/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-dir23ma_/pillow/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-v24jnr9g-record/install-record.txt --single-version-externally-managed --compile:
    /usr/local/python/versions/3.6.3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
      warnings.warn(msg)
    /usr/local/python/versions/3.6.3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'project_urls'
      warnings.warn(msg)
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/TiffImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/SunImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PpmImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/DcxImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageGrab.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ContainerIO.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageTransform.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/WmfImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageMath.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageMorph.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/IcoImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageFilter.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageFont.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/TiffTags.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/IcnsImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/XVThumbImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/BlpImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageQt.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/TarIO.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageMode.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/GifImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/_binary.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/EpsImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageShow.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/BmpImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/GimpGradientFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageChops.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/GribStubImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/_util.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/GdImageFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/FtexImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/MpegImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PcfFontFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/_version.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/WebPImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageOps.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/MpoImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/JpegPresets.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/FontFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageTk.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/WalImageFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PaletteFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PdfParser.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/FliImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/SgiImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageSequence.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/FpxImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/Hdf5StubImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/SpiderImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/GimpPaletteFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/IptcImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/XbmImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageCms.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PngImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PsdImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageWin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/CurImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageStat.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/features.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/TgaImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/FitsStubImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImagePalette.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/_tkinter_finder.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PalmImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PdfImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageDraw2.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/Image.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/MspImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImtImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PcdImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/BdfFontFile.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/JpegImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PixarImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageDraw.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ExifTags.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/__main__.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/DdsImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PSDraw.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageEnhance.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PcxImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/XpmImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/Jpeg2KImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImageColor.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/GbrImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/McIdasImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/__init__.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/MicImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/BufrStubImagePlugin.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/PyAccess.py -> build/lib.linux-x86_64-3.6/PIL
    copying src/PIL/ImagePath.py -> build/lib.linux-x86_64-3.6/PIL
    running egg_info
    writing src/Pillow.egg-info/PKG-INFO
    writing dependency_links to src/Pillow.egg-info/dependency_links.txt
    writing top-level names to src/Pillow.egg-info/top_level.txt
    warning: manifest_maker: standard file '-c' not found
    
    reading manifest file 'src/Pillow.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching '*.c'
    warning: no files found matching '*.h'
    warning: no files found matching '*.sh'
    warning: no previously-included files found matching '.appveyor.yml'
    warning: no previously-included files found matching '.clang-format'
    warning: no previously-included files found matching '.coveragerc'
    warning: no previously-included files found matching '.editorconfig'
    warning: no previously-included files found matching '.readthedocs.yml'
    warning: no previously-included files found matching 'codecov.yml'
    warning: no previously-included files matching '.git*' found anywhere in distribution
    warning: no previously-included files matching '*.pyc' found anywhere in distribution
    warning: no previously-included files matching '*.so' found anywhere in distribution
    no previously-included directories found matching '.ci'
    writing manifest file 'src/Pillow.egg-info/SOURCES.txt'
    running build_ext
    
    
    The headers or library files could not be found for jpeg,
    a required dependency when compiling Pillow from source.
    
    Please see the install instructions at:
       https://pillow.readthedocs.io/en/latest/installation.html
    
    Traceback (most recent call last):
      File "/tmp/pip-build-dir23ma_/pillow/setup.py", line 1024, in <module>
        zip_safe=not (debug_build() or PLATFORM_MINGW),
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 75, in run
        _build_ext.run(self)
      File "/usr/local/python/versions/3.6.3/lib/python3.6/distutils/command/build_ext.py", line 339, in run
        self.build_extensions()
      File "/tmp/pip-build-dir23ma_/pillow/setup.py", line 790, in build_extensions
        raise RequiredDependencyException(f)
    __main__.RequiredDependencyException: jpeg
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-dir23ma_/pillow/setup.py", line 1037, in <module>
        raise RequiredDependencyException(msg)
    __main__.RequiredDependencyException:
    
    The headers or library files could not be found for jpeg,
    a required dependency when compiling Pillow from source.
    
    Please see the install instructions at:
       https://pillow.readthedocs.io/en/latest/installation.html
    
    
    
    ----------------------------------------
Command "/usr/local/python/versions/3.6.3/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-dir23ma_/pillow/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-v24jnr9g-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-dir23ma_/pillow/
You are using pip version 9.0.1, however version 22.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c python3 -m pip install --upgrade --force-reinstall git+https://github.com/ewels/MultiQC.git' returned a non-zero code: 1



nextflow doesn't run

Hi Luca,
I will have to re-open this issue.... because I also get this:
-bash: nextflow: command not found

I entered the cluster via ant-login but then used the command ssh -Y nextflow.linux.crg.es, but when I try to run this

cd makeAnno
nextflow run make_anno.nf -bg --vcffile mgp.v5.merged.snps_all.dbSNP142.vcf.gz --speciesA CAST_EiJ --speciesB 129S1_SvImJ --genome GRCm38_68.fa --outvcf CAST_EiJ-129S1_SvImJ.vcf > log

I get the -bash: nextflow: command not found

So I was going through this issue and added export PATH=/software/bpayer/bin/:$PATH to my .bashrc, and when I tried again to do the annotations, I again got the "command not found" message...

What else could I try? I don't understand how I have to tell the computer where to go and look for nextflow... I think that's the issue..
Thanks a lot!

Originally posted by @llonchsilvia in #17 (comment)

how to install singularity

Hi Luca,

I have been trying to install singularity doing the following:

I found this website https://github.com/sylabs/singularity/releases?after=v3.3.0-rc.3

Then, to install, follow the instructions in the file INSTALL.md. Go to this website: https://sylabs.io/guides/3.0/user-guide/installation.html#install-on-windows-or-mac

First install Homebrew. Use this command:

which -s brew
if [[ $? != 0 ]] ; then
    # Install Homebrew
    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
else
    brew update
fi

I tried, but it is not allowing me to do so because I am supposed to update my curl system, and that might mess up my MacBook... So I am not really sure how to proceed.
What would you recommend?

Thanks!

groupCounts terminated for an unknown reason -- Likely terminated by the external system

Ciao Luca,

I have the following error when running the pipeline on all my samples:

[omissis]
[b4/da7fbf] Cached process > groupCounts (NP03_29643_CAGATC)
[omissis]
[d9/dc3a90] Cached process > makePropoportions (NP03_29643_CAGATC)
[74/e20a4a] Cached process > makePropoportions (5N1a_34794_TGACCA)
[e5/fada0e] Submitted process > groupCounts (ES03_29641_CGATGT)
[b2/17f4e6] Submitted process > groupCounts (EP1a_34792_CGATGT)
[48/004e20] Submitted process > groupCounts (ES1e_34795_ACAGTG)
[1b/e898cd] Submitted process > makePropoportions (NP01_27989_ACAGTG)
[9a/04a315] Submitted process > makePropoportions (ES2e_34803_AGTCAA)
[13/616524] Submitted process > makePropoportions (NP04_29644_GATCAG)
[b8/bc0702] Submitted process > makePropoportions (5N1e_34798_ACTTGA)
Error executing process > 'groupCounts (ES03_29641_CGATGT)'

Caused by:
  Process `groupCounts (ES03_29641_CGATGT)` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  join_counts.sh ES03_29641_CGATGT

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /nfs/users/bpayer/jseverino/projects/01_mRNAseq_PGCLC/allele_specific_RNAseq_v6/work/e5/fada0e6632dae7928a061b006801f7

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Pipeline BIOCORE@CRG Master of Pore completed!
Started at  2020-08-30T21:29:48.881+02:00
Finished at 2020-08-30T21:37:00.923+02:00
Time elapsed: 7m 12s
Execution status: failed
WARN: Killing pending tasks (5)

I have the feeling the jobs are being killed for 3-4 samples only, the others worked out fine.
-resume did not fix it.

After this is solved I'll send a general feedback about the pipeline.
Thanks,
Jackie

QConRawReads terminated for an unknown reason

Hi Luca,

I ran the pipeline and gave 48h to be sure it could be completed. After 12h, the only missing steps were QConRawReads of 2 samples (out of 32) and the multiQCreport. I waited until the 48h passed and it did not finish. This is the error I got:

[screenshot of the error message]

I am rerunning it with 24h allotted to complete the task, but I see that there are still some samples for which QConRawReads did not complete, while the mapping is finishing.

Do you know what the problem is?

Thanks,
Mercedes

make_anno with data from reference genome strain cross

Hi @lucacozzuto ,

I have a doubt about how to use the make_anno.nf script in a case in which the RNA-seq comes from mouse samples that are a cross between C57BL/6J and CAST/EiJ.

The thing is that the mouse genome that one can get from ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa is from the strain C57BL/6J (as stated in the readme of the mouse genomes project).

So, I believe, in the vcf header of mgp.v5.merged.snps_all.dbSNP142.vcf there's no C57BL_6J, while there are 129S1_SvImJ and CAST_EiJ for example.

With this type of data what should be specified in --speciesB in the following run:

nextflow run make_anno.nf -with-singularity -bg --vcffile mgp.v5.merged.snps_all.dbSNP142.vcf.gz \
                                             --speciesA CAST_EiJ \
                                             --speciesB XXX \
                                             --genome GRCm38_68.fa --outvcf CAST_EiJ-C57BL_6J.vcf > log

Thanks,
Nicco

parseVCF.py usage

Hi Luca,

For reproducibility and ease to use the pipeline, could you please add some information in the README about how to run parseVCF.py? So in the future other people in the lab can use it more easily.

For example: which python version should I use for parseVCF.py?

My default is version 3:

> python2 --version
Python 2.7.5

> python --version
Python 3.4.5

However I get this silly error when running it with python3:

> singularity exec -e asrnaseq_0.2.sif python ../projects/01_mRNAseq_PGCLC/allele_specific_RNAseq_v5/bin/parseVCF.py -h
  File "../projects/01_mRNAseq_PGCLC/allele_specific_RNAseq_v5/bin/parseVCF.py", line 36
    os.system("gzip "+ opts.wotus) 
                                  ^
TabError: inconsistent use of tabs and spaces in indentation

But when I run it with python2 I have no problems.

Also a minor thing regarding the usage of parseVCF.py

> singularity exec -e asrnaseq_0.2.sif python2 ../projects/01_mRNAseq_PGCLC/allele_specific_RNAseq_v5/bin/parseVCF.py -1 CAST_EiJ -2 129S1_SvImJ -i mgp.v5.merged.snps_all.dbSNP142.vcf.gz -o CAST_EiJ-129S1_SvImJ.vcf -g GRCm38_68.fa
Error: The requested file (CAST_EiJ-129S1_SvImJ.vcf) could not be opened. Error message: (No such file or directory). Exiting!
gzip: GRCm38_68.fa.masked: No such file or directory
gzip: CAST_EiJ-129S1_SvImJ.vcf: No such file or directory

So I did touch CAST_EiJ-129S1_SvImJ.vcf and run again and it went fine and I do get the masked genome and the compressed CAST_EiJ-129S1 VCF file.

> ls references
CAST_EiJ-129S1_SvImJ.vcf.gz  GRCm38_68.fa.masked.gz  cas                                          mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi
GRCm38_68.fa                 asrnaseq_0.2.sif        mgp.v5.merged.indels.dbSNP142.normed.vcf.gz  mus

Thanks!
Jackie

Mapping error

Hi Luca,

The pipeline was working until it got to the mapping step. Then I got this error:

Error executing process > 'mapping ()'

Caused by:
File /ReadsPerGene.out.tab is out of the scope of process working dir: /nfs/users/bpayer/mbarrero/allele_specific_RNAseq/work/96/64248a8b984465abda0413c3fb2bf7

Source block:
def output = "${pair_id}"
def variants = unzipBash("${variants_file}")
"""
if [ echo "${reads}"| cut -f 1 -d " " | grep ".gz" ]; then gzipped=" --readFilesCommand zcat "; else gzipped=""; fi
STAR --genomeDir ${STARgenome} \
     --readFilesIn ${reads} \
     $gzipped \
     --waspOutputMode SAMtag \
     --outSAMunmapped Within \
     --outSAMtype BAM SortedByCoordinate \
     --runThreadN ${task.cpus} \
     --outFileNamePrefix ${pair_id} \
     --quantMode GeneCounts \
     --outSAMattributes NH HI AS nM NM MD jM jI XS MC ch vA vW vG \
     --varVCFfile ${variants};
mkdir ${output}
mv *.out.tab ${output}/
mv Aligned ${output}/
mv Log ${output}/

      	samtools index ${pair_id}/*.out.bam
      """

Work dir:
/nfs/users/bpayer/mbarrero/allele_specific_RNAseq/work/96/64248a8b984465abda0413c3fb2bf7

Running on single end data

Ciao Luca,

how are you?

I would like to try running the pipeline on some published RNA-seq data, which however are single-end and not paired-end.

Do you think the pipeline is robust enough to figure this out on its own, or should I specify it somehow?

Thanks a lot, I hope to see you again soon!
Jackie

High percentage of duplicated reads

Hi Luca,

I have run the pipeline for all my samples and it worked, but in the multiQC report I observe a high percentage of duplicate reads for all the samples. Do you think this is a problem? Can it affect the further analysis?
[screenshot of the MultiQC report]

Thanks!

Counts output file description

For the counts file it says the following

Counts: composite counts per gene per sample. No distincion between alleles

However, the generated tab file includes 3 columns and not only 1 for the composite counts. Could you please update the description and add a short explainer on which column to use? To our understanding:
1st = reads from both strands -> should be used for unstranded prepared RNA
2nd = reads from the wrong strand
3rd = reads from the right strand -> and should therefore be used for total RNA (stranded)

Thanks :)

RNA SNP QC


---
title: "XCR RNAseq estimation"
author: "Enrique Vidal"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
  html_document:
    toc: true
    toc_float: true
    code_folding: hide
---

XCR RNAseq estimation

Settings and dependencies

We need to set up some dependencies and prepare auxiliary functions.

## library("knitr")
## opts_chunk$set(fig.path = 'figures_atac_peaks_lastsamples/', dev=c('png', 'pdf'))

library("data.table")
library("dplyr")
library("magrittr")
library("tidyr")
library("stringr")
library("parallel")

library("cowplot")
library("scales")
library("gghighlight")
library("viridis")

## library("Rtsne")
## library("umap")

library("DESeq2")

if(Sys.info()["nodename"] == "curro"){
    cluster_home <- "/mnt/cluster/"
    cluster_4dgenome <- "/mnt/cluster_4DGenome/"
    cluster_4dgenome_no_backup <- "/mnt/cluster_4DGenome_no_backup/"
    cluster_jquilez <- "/mnt/cluster_jquilez/"
    samtools_bin <- "samtools"
}else{
    cluster_home <- "/users/GR/mb/evidal/"
    cluster_4dgenome <- "/users/project/4DGenome/"
    cluster_4dgenome_no_backup <- "/users/project/4DGenome_no_backup/"
    cluster_jquilez <- "/users/project/jquilez/"
    samtools_bin <- "/software/mb/el7.2/samtools-1.8/samtools"
}
options(stringsAsFactors = F,
        device= "png")

pp <- function(p, ...){
    paste0(cluster_4dgenome,
           "public_docs/test/prueba.png") %>%
        save_plot(p, ...)
}

pp <- function(p, ...) print(p)

## chromosomes <- c(1:19, "X")

And some functions


Import data


load("processed/gene_matrices_rna.RData")
load("processed/annotation.RData")

annot %<>% mutate(chrom = str_remove(chrom, "chr"))

Bulk expression


do_bulk <- function(the_chrom, mat){
    X <- mat
    ii <- filter(annot, chrom == the_chrom) %>%
        pull(gene_id) %>%
        match(rownames(X))
    X <- X[ii,]
    covas <- meta[match(colnames(X), meta$sample_id),]
    X <- X[apply(X, 1, sd) > 0,]
    dds <- DESeqDataSetFromMatrix(X,
                                  covas,
                                  ~ 0 + condition)
    dds %<>% DESeq()
    res <- resultsNames(dds) %>%
        str_subset("condition") %>%
        mclapply(function(tt){
            cat(tt, "\n")
            lfcShrink(dds, coef = tt, type = "ashr") %>%
                as.data.frame() %>%
                mutate(condition = tt,
                       info = row.names(.))
        },mc.cores = 5, mc.preschedule = F) %>%
        rbindlist()
    res %<>% mutate(condition = str_remove(condition, "condition"),
                    ll = (log2FoldChange - 2 * lfcSE),
                    ul = (log2FoldChange + 2 * lfcSE),
                    ll50 = (log2FoldChange - 1 * lfcSE),
                    ul50 = (log2FoldChange + 1 * lfcSE),
                    est = (log2FoldChange)) %>%
        inner_join(dplyr::select(meta, condition, time, stage)) %>%
        group_by(time) %>%
        mutate(ii = as.numeric(stage) %>% min(),
               max_stage = levels(stage)[ii],
               ii = as.numeric(stage) %>% max(),
               min_stage = levels(stage)[ii],
               ii = NULL) %>%
        ungroup()
    ccs <- counts(dds, normalized = T)
    list(ccs = ccs, res = res, dds = dds)
}

out_bulk <- mclapply(list("13", "X"), do_bulk, mat = mat_bulk,
                     mc.cores = 2, mc.preschedule = F)

Allelic ratio


do_as <- function(the_chrom, mat){
    X <- mat
    ii <- filter(annot, chrom == the_chrom) %>%
        pull(gene_id) %>%
        match(rownames(X))
    X <- X[ii,]
    covas <- meta[match(colnames(X), meta$sample_id),]
    X <- X[apply(X, 1, sd) > 0,]
    dds <- DESeqDataSetFromMatrix(X,
                                  covas,
                                  ~ 0 + condition + condition:geno)
    sizeFactors(dds) <- rep(1, ncol(X))
    dds %<>% DESeq()
    res <- resultsNames(dds) %>%
        str_subset("mus") %>%
        mclapply(function(tt){
            lfcShrink(dds, coef = tt, type = "ashr") %>%
                as.data.frame() %>%
                mutate(condition = tt,
                       info = row.names(.))
        },mc.cores = 5, mc.preschedule = F) %>%
        rbindlist()
    log2fc2ap <- function(x) 2 ^ x / (1 + 2 ^ x)
    res %<>% mutate(condition = str_remove(condition, "condition") %>%
                        str_remove(".genomus"),
                    ll = log2fc2ap(log2FoldChange - 2 * lfcSE),
                    ul = log2fc2ap(log2FoldChange + 2 * lfcSE),
                    ll50 = log2fc2ap(log2FoldChange - 1 * lfcSE),
                    ul50 = log2fc2ap(log2FoldChange + 1 * lfcSE),
                    est = log2fc2ap(log2FoldChange)) %>%
        inner_join(dplyr::select(meta, condition, time, stage)) %>%
        group_by(time) %>%
        mutate(ii = as.numeric(stage) %>% min(),
               max_stage = levels(stage)[ii],
               ii = as.numeric(stage) %>% max(),
               min_stage = levels(stage)[ii],
               ii = NULL) %>%
        ungroup()
    ccs <- counts(dds, normalized = T)
    list(ccs = ccs, res = res, dds = dds)
}

out_as <- mclapply(list("13", "X"), do_as, mat = cbind(mat_mus, mat_cas),
                     mc.cores = 2, mc.preschedule = F)


Store results


save(out_as, file = paste0("processed/rnaseq_as_dds.RData"))
save(out_bulk, file = paste("processed/rnaseq_bulk_dds.RData"))


Bulk expression (all chromosomes)


X <- mat_bulk
ii <- filter(annot) %>%
    pull(gene_id) %>%
    match(rownames(X))
X <- X[ii,]
covas <- meta[match(colnames(X), meta$sample_id),]
X <- X[apply(X, 1, sd) > 0,]
dds <- DESeqDataSetFromMatrix(X,
                              covas,
                              ~ 0 + condition)
dds %<>% DESeq()
res <- resultsNames(dds) %>%
    str_subset("condition") %>%
    mclapply(function(tt){
        cat(tt, "\n")
        lfcShrink(dds, coef = tt, type = "ashr") %>%
            as.data.frame() %>%
            mutate(condition = tt,
                   info = row.names(.))
    },mc.cores = 5, mc.preschedule = F) %>%
    rbindlist()
res %<>% mutate(condition = str_remove(condition, "condition"),
                ll = (log2FoldChange - 2 * lfcSE),
                ul = (log2FoldChange + 2 * lfcSE),
                ll50 = (log2FoldChange - 1 * lfcSE),
                ul50 = (log2FoldChange + 1 * lfcSE),
                est = (log2FoldChange)) %>%
    inner_join(dplyr::select(meta, condition, time, stage)) %>%
    group_by(time) %>%
    mutate(ii = as.numeric(stage) %>% min(),
           max_stage = levels(stage)[ii],
           ii = as.numeric(stage) %>% max(),
           min_stage = levels(stage)[ii],
           ii = NULL) %>%
    ungroup()
ccs <- counts(dds, normalized = T)

full_bulk <- list(ccs = ccs, dds = dds, res = res)


save(full_bulk, file = paste("processed/rnaseq_fullbulk_dds.RData"))


Allelic ratio (all chromosomes)


X <- cbind(mat_mus, mat_cas)
ii <- filter(annot) %>%
    pull(gene_id) %>%
    match(rownames(X))
X <- X[ii,]
covas <- meta[match(colnames(X), meta$sample_id),]
X <- X[apply(X, 1, sd) > 0,]
dds <- DESeqDataSetFromMatrix(X,
                              covas,
                              ~ 0 + condition + condition:geno)
sizeFactors(dds) <- rep(1, ncol(X))
dds %<>% DESeq()
res <- resultsNames(dds) %>%
    str_subset("mus") %>%
    mclapply(function(tt){
        lfcShrink(dds, coef = tt, type = "ashr") %>%
            as.data.frame() %>%
            mutate(condition = tt,
                   info = row.names(.))
    },mc.cores = 5, mc.preschedule = F) %>%
    rbindlist()
log2fc2ap <- function(x) 2 ^ x / (1 + 2 ^ x)
res %<>% mutate(condition = str_remove(condition, "condition") %>%
                    str_remove(".genomus"),
                ll = log2fc2ap(log2FoldChange - 2 * lfcSE),
                ul = log2fc2ap(log2FoldChange + 2 * lfcSE),
                ll50 = log2fc2ap(log2FoldChange - 1 * lfcSE),
                ul50 = log2fc2ap(log2FoldChange + 1 * lfcSE),
                est = log2fc2ap(log2FoldChange)) %>%
    inner_join(dplyr::select(meta, condition, time, stage)) %>%
    group_by(time) %>%
    mutate(ii = as.numeric(stage) %>% min(),
           max_stage = levels(stage)[ii],
           ii = as.numeric(stage) %>% max(),
           min_stage = levels(stage)[ii],
           ii = NULL) %>%
    ungroup()
ccs <- counts(dds, normalized = T)

full_as <- list(ccs = ccs, dds = dds, res = res)

save(full_as, file = paste("processed/rnaseq_fullas_dds.RData"))


compare full with per-chromosome


genes <- filter(annot, chrom == "X") %>% pull(gene_id)

full_bulk_X <- filter(full_bulk$res, info %in% genes) %>% unique()
chrm_bulk_X <- lapply(out_bulk, "[[", "res") %>%
    rbindlist() %>%
    filter(info %in% genes) %>%
    unique()

p <- inner_join(full_bulk_X, chrm_bulk_X, by = c("info", "condition")) %>%
    filter(condition == "ES0") %>%
    ggplot(aes(x = (log2FoldChange.x),
               xend = (log2FoldChange.x),
               col = log2FoldChange.y > log2FoldChange.x,
               y = log2FoldChange.x,
               yend = log2FoldChange.y)) +
    geom_segment(arrow = arrow()) +
    labs(col = NULL)
pp(p)

p <- inner_join(full_bulk_X, chrm_bulk_X, by = c("info", "condition")) %>%
    filter(condition == "NP0") %>%
    ggplot(aes(x = est.x,
               y = est.y)) +
    geom_point(pch = ".") +
    labs(col = NULL)
pp(p)


inner_join(full_bulk_X, chrm_bulk_X, by = c("info", "condition"),
           suffix = c("_full", "_chrm")) %>%
    mutate(d = (est_full - est_chrm)) %>%
    group_by(condition, sign(est_full - .5)) %>%
    summarize(m = min(d),
              M = max(d)) %>%
    ungroup() %>%
    knitr::kable(digits = 2)

p <- data.frame(full = full_bulk_X$est,
                chrm = chrm_bulk_X$est) %>%
    gather(dataset, est) %>%
    group_by(dataset) %>%
    mutate(r = rank(est)) %>%
    ungroup() %>%
    ggplot(aes(x = r, y = est, col = dataset)) +
    geom_line()
pp(p)

p <- data.frame(full = full_bulk_X$est,
                chrm = chrm_bulk_X$est) %>%
    ggplot(aes(x = full,
               y = chrm)) +
    geom_point(pch = ".")
pp(p)

## sel <- "ENSMUSG00000070641.3"

the_gene <- "Mecp2"
sel <- filter(annot, gene_name == the_gene) %>% pull(gene_id)
cond <- "6NG"
cc_full <- filter(full_bulk$res, info == sel, condition == cond) %>%
    unique()
cc_chrm <- lapply(out_bulk, "[[", "res") %>%
    rbindlist() %>%
    filter(info == sel, condition == cond) %>%
    unique()
cc <- rbind(mutate(cc_full, dataset = "full"),
            mutate(cc_chrm, dataset = "chrm")) %>%
    dplyr::select(dataset, est, ll, ul, ll50, ul50)
nn_full <- full_bulk$ccs[sel,] %>%
    as.data.frame() %>%
    setNames("n") %>%
    mutate(info = rownames(.)) %>%
    separate(info, c("batch", "condition", "replicate", "geno"), sep = ";") %>%
    filter(condition == cond) %>%
    dplyr::select(batch, replicate, geno, n) %>%
    spread(geno, n) %>%
    mutate(p = mus / (mus + cas),
           dataset = "full")
nn_chrm <- lapply(out_bulk, "[[", "ccs") %>%
    do.call(rbind, .) %>%
    (function(x) x[sel,]) %>%
    as.data.frame() %>%
    setNames("n") %>%
    mutate(info = rownames(.)) %>%
    separate(info, c("batch", "condition", "replicate", "geno"), sep = ";") %>%
    filter(condition == cond) %>%
    dplyr::select(batch, replicate, geno, n) %>%
    spread(geno, n) %>%
    mutate(p = mus / (mus + cas),
           dataset = "chrm")
nn <- rbind(nn_full, nn_chrm)
cc
group_by(nn, dataset) %>%
    do({p <- sum(.$mus) / sum(.$mus + .$cas)
        glm(cbind(mus, cas) ~ 1, family = "binomial", data = .) %>%
       confint.default() %>%
       arm::invlogit() %>%
       as.data.frame() %>%
       setNames(c("ll", "ul")) %>%
       mutate(est = p)}) %>%
    ungroup()

SNP file not found

Ciao Luca,

There seems to be an issue: the SNP file does not exist.
No such file 'mgp.v5.merged.snps_all.dbSNP142.normed.vcf.gz'.

Should I use this one instead?
mgp.v5.merged.snps_all.dbSNP142.vcf.gz

Thanks!
Moritz

Sort and index bam files

Hi Luca,

at the moment I need to look at the alignment files in the IGV genome browser.
Would it be possible to add

samtools sort file_allele_[A-B].bam > file_allele_[A-B]_srtd.bam
samtools index file_allele_[A-B]_srtd.bam

to all the output/Allele_alignments/*.bam ?

Thanks a lot!
Jackie
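A minimal sketch of the requested post-processing, assuming the BAM files live under output/Allele_alignments/:

for bam in output/Allele_alignments/*.bam; do
    samtools sort -o "${bam%.bam}_srtd.bam" "$bam"   # coordinate-sort each allele BAM
    samtools index "${bam%.bam}_srtd.bam"            # build the .bai index for IGV
done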

Can't install it

git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Hey Luca, any idea on why I can't install the pipeline? Nextflow, Docker and bash are all installed. Thanks!

$baseDir Change

Do I only have to change $baseDir for the params.config files? And not the nextflow.config file?
(obviously for both the makeAnno and the main one to run the pipeline)

Description adding Singularity to bash

Let's add a small description of how to add the following to the bash profile:
module use /software/as/el7.2/EasyBuild/CRG/modules/all
module load Singularity/3.2.1
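One way to make these commands permanent, assuming a bash login shell that reads ~/.bashrc:

cat >> ~/.bashrc <<'EOF'
module use /software/as/el7.2/EasyBuild/CRG/modules/all
module load Singularity/3.2.1
EOF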

varcut = 1

varcut in params.config should be "1". Just in case someone forgets to change it. Just to keep things as close to the real setup as possible.

Error executing process > 'getReadLength'

Hi Luca,

I am trying to run the allele-specific pipeline for the first time and i got this error as soon as it started. Any ideas what is going wrong?

Thanks,
Tom

Workflow execution completed unsuccessfully
The exit status of the task that caused the workflow execution to fail was: 255

Error executing process > 'getReadLength'

Caused by:
Process getReadLength terminated with an error exit status (255)

Command executed:

if [ echo K0EP02_15186AAD_GCCGTGAAGT-GCATGGAGCG_R1_001.fastq.gz | grep "gz" ]; then cat="zcat"; else cat="cat"; fi
$cat K0EP02_15186AAD_GCCGTGAAGT-GCATGGAGCG_R1_001.fastq.gz | awk '{num++}{if (num%4==2){line++; sum+=length($0)} if (line==100) {printf "%.0f", sum/100; exit} }'

Command exit status:
255

Command output:
(empty)

Command error:
ERROR : Unknown image format/type: /nfs/users/bpayer/tmattimoe/allele_specific/allele_specific_RNAseq/singularity/biocorecrg-asrnaseq-0.3.img
ABORT : Retval = 255

Work dir:
/nfs/users/bpayer/tmattimoe/allele_specific/allele_specific_RNAseq/work/d4/96570abc650b44ada60f0dc9c50993

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Wrong title

ChIP and RIP seq pipeline
I don't know why the title is wrong.

Getting Scripts to observe a hybridness

Hello,

It's a very handy and wonderful tool, which I have successfully used in my allelic analysis. Thank you for that.

I saw testing the hybridness of the data mentioned in some issue threads; would it be possible to provide the scripts to test the hybridness of the data?

Thank you

Problem running nextflow

Hi Luca,

I am trying to run nextflow for a paired end sample with the following command:

nextflow run as_rnaseq.nf -bg -with-tower -with-singularity > log

The problem is that I constantly get errors at the first step:

Error executing process > 'getReadLength'

Caused by:
Process getReadLength terminated with an error exit status (255)

Command executed:

if [ echo NPC1_14191AAC_CGATGT_R1_001.fastq.gz | grep "gz" ]; then cat="zcat"; else cat="cat"; fi
$cat NPC1_14191AAC_CGATGT_R1_001.fastq.gz | awk '{num++}{if (num%4==2){line++; sum+=length($0)} if (line==100) {printf "%.0f", sum/100; exit} }'

My params config file is like this right now:

params {

        reads        = "/users/bpayer/mbarrero/allele_specific_RNAseq/test/{NPC1_14191AAC_CGATGT_R1_001,NPC1_14191AAC_CGATGT_R2_001}.fastq.gz"
        genome       = "/users/bpayer/mbarrero/allele_specific_RNAseq/makeAnno/filteredVCF/GRCm38_68.fa.masked.gz"
        annotation   = "/users/bpayer/sequencing_analysis/pipe_data/Mus_musculus.GRCm38.68.gtf"
        strandness   = "reverse"
        indexfolder  = "/users/bpayer/mbarrero/allele_specific_RNAseq/index"
        variants     = "/users/bpayer/mbarrero/allele_specific_RNAseq/makeAnno/filteredVCF/CAST_EiJ-129S1_SvImJ.vcf.gz"
        output       = "/users/bpayer/mbarrero/allele_specific_RNAseq/output_test"
        single       = "NO"
        varcut       = 1
        title        = "Allele specific RNAseq project PAYER_09"
        subtitle     = "PAYER_09_npc1"
        PI           = "Payer"
        User         = "Mercedes Barrero"
        UCSCgenomeID = "mm10"
        email        = "[email protected]"

Do you know which could be the issue?

Thanks!

Input data

Keep the VCF file with SNPs readily available, so that people do not have to generate it every time a new user runs the pipeline, since we will always use only mus and cas anyway (this applies to the private version only).

Symbolic link for data

Maybe for the private version we can add that the simplest approach is to use a symbolic link to your data, e.g. ln -s /users/bpayer/sequencing_data/Moritz_Bauer/2018-10-15_ACCTCFANXX /users/bpayer/mbauer/allele_specific_RNAseq
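
A minimal sketch of how this could look, keeping the example paths above; the reads pattern and file names below are hypothetical and only illustrate pointing params.config at the linked folder:

# link the sequencing run folder into the pipeline directory instead of copying it
ln -s /users/bpayer/sequencing_data/Moritz_Bauer/2018-10-15_ACCTCFANXX /users/bpayer/mbauer/allele_specific_RNAseq

# then set the reads parameter in params.config to the linked location, e.g.
#   reads = "/users/bpayer/mbauer/allele_specific_RNAseq/2018-10-15_ACCTCFANXX/*_read{1,2}.fastq.gz"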

strandness

Hi Luca,

I have a quick question. In my stranded total RNA-seq, if I have paired-end sequencing, what do I have to specify for "strandness" in the params.config? Is it "reverse"?

Thanks a lot,
Mercedes

mapping error

Hi Luca,

I am running new sequencing files through the pipeline and I got a mapping error in 1 out of 16 samples. The error is shown below; do you know how I could fix it? Thanks!

Error executing process > 'mapping (IMGFP2_00030AAD_GCCAAGAC)'

Caused by:
Process mapping (IMGFP2_00030AAD_GCCAAGAC) terminated with an error exit status (138)

Command executed:

if [ echo "IMGFP2_00030AAD_GCCAAGAC_R1_001.fastq.gz IMGFP2_00030AAD_GCCAAGAC_R2_001.fastq.gz"| cut -f 1 -d " " | grep ".gz" ]; then gzipped=" --readFilesCommand zcat "; else gzipped=""; fi
STAR --genomeDir GRCm38_68 --readFilesIn IMGFP2_00030AAD_GCCAAGAC_R1_001.fastq.gz IMGFP2_00030AAD_GCCAAGAC_R2_001.fastq.gz $gzipped --waspOutputMode SAMtag --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --runThreadN 8 --outFileNamePrefix IMGFP2_00030AAD_GCCAAGAC --quantMode GeneCounts --outSAMattributes NH HI AS nM NM MD jM jI XS MC ch vA vW vG --varVCFfile <(zcat CAST_EiJ-129S1_SvImJ.vcf.gz);
mkdir IMGFP2_00030AAD_GCCAAGAC
mv *.out.tab IMGFP2_00030AAD_GCCAAGAC/
mv Aligned* IMGFP2_00030AAD_GCCAAGAC/
mv Log* IMGFP2_00030AAD_GCCAAGAC/

samtools index IMGFP2_00030AAD_GCCAAGAC/*.out.bam

Command exit status:
138

Command output:
Mar 01 16:39:51 ..... started STAR run
Mar 01 16:39:51 ..... loading genome
Mar 01 20:54:48 ..... loading variations VCF
Mar 01 20:59:26 ..... started mapping
Mar 01 22:00:15 ..... finished mapping
Mar 01 22:00:16 ..... started sorting BAM
Mar 01 22:03:46 ..... finished successfully

Command wrapper:
Mar 01 16:39:51 ..... started STAR run
Mar 01 16:39:51 ..... loading genome
Mar 01 20:54:48 ..... loading variations VCF
Mar 01 20:59:26 ..... started mapping
Mar 01 22:00:15 ..... finished mapping
Mar 01 22:00:16 ..... started sorting BAM
Mar 01 22:03:46 ..... finished successfully

Work dir:
/nfs/users/bpayer/mbarrero/allele_specific_RNAseq/work/38/d151949ee46db3860317010d8a9c99

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
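
A note on the exit status, hedged since the logs themselves end with "finished successfully": an exit status above 128 conventionally means the task was killed by a signal (status minus 128), which often points at resource limits on the cluster rather than at STAR itself. Two quick checks, using the work directory reported above:

# show the name of the signal corresponding to exit status 138 (138 - 128 = 10)
kill -l 10

# replicate the failing task by hand, as the tip above suggests
cd /nfs/users/bpayer/mbarrero/allele_specific_RNAseq/work/38/d151949ee46db3860317010d8a9c99
bash .command.run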

Making it public

Since we are uploading the paper to bioRxiv these days and will send it out for publication, can we now make the repository public?

params.config reads

Change reads = "$baseDir/test/*_{1,2}.fastq.gz" to reads = "$baseDir/test/*_read{1,2}.fastq.gz" in params.config for the as_rnaseq execution, since that is the naming structure of fastq files from the CRG facility.
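
A quick way to check the new pattern against real file names before committing the change, using shell brace expansion as a stand-in for the Nextflow glob (run from the pipeline directory; adjust the path to wherever your fastq files live):

# both mates of every sample should be listed; an empty result means the pattern does not match
ls test/*_read{1,2}.fastq.gz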

ant-login vs. nextflow

Ciao Luca,

since we apparently need to use both the ant-login as well as the nextflow Linux cluster, would it be possible to add to the README in which of the two environments each step should be run? I am honestly a bit confused by this now. For example, does it mean that I have to install nextflow also in my ant-login mbauer folder to do the "run make_anno.nf"?

Thanks!

Singularity path error

Ciao Luca,

I have the following data that I downloaded as you wrote in the README

> cd /users/bpayer/jseverino/references 

> ls 
VCF
genome
asrnaseq-0.2.simg 

> ls VCF
mgp.v5.merged.indels.dbSNP142.normed.vcf.gz  
mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi

> ls genome
GRCm38_68.fa

And when I run singularity to mask the genome and make the new index file with:

singularity exec -e ./asrnaseq_0.2.simg parseVCF.py -1 CAST_EiJ -2 129S1_SvImJ -i VCF/mgp.v5.merged.snps_all.dbSNP142.vcf.gz -o VCF/CAST_EiJ-129S1_SvImJ.vcf -g genome/GRCm38_68.fa

I get the error:

ERROR  : Image path ./asrnaseq_0.2.simg doesn't exist: No such file or directory
ABORT  : Retval = 255

I believe this is a silly problem with my local PATH or with my Singularity installation not understanding where it is or what to do.
Could you please help me figure it out? I could not find any solution to this problem on Google.

Thanks a lot!
Jackie

PS: in your README you wrote singularity exec -e ./asrnaseq_0.2.**sif**, but I think you meant .simg; in any case, with .sif I get the same error.
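
One detail worth checking, based on the listings in this issue: ls shows asrnaseq-0.2.simg (with a hyphen) while the exec command references ./asrnaseq_0.2.simg (with an underscore), so Singularity may simply be pointed at a file name that does not exist. A minimal check along those lines, reusing the command from the issue:

# confirm the exact image file name and the directory you are running from
pwd
ls -l /users/bpayer/jseverino/references/*.simg

# then call the image by the name that ls actually reports, e.g.
singularity exec -e /users/bpayer/jseverino/references/asrnaseq-0.2.simg parseVCF.py -1 CAST_EiJ -2 129S1_SvImJ -i VCF/mgp.v5.merged.snps_all.dbSNP142.vcf.gz -o VCF/CAST_EiJ-129S1_SvImJ.vcf -g genome/GRCm38_68.fa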

-with-tower flag requires nextflow version 20?

Ciao Luca,

I tried running

nextflow run as_rnaseq.nf -bg -with-tower > log 
Unknown option: -with-tower -- Check the available commands and options and syntax with 'help'

And I saw that the help does not show any information regarding the `-with-tower` flag.

So I thought it might be a Nextflow version problem.

 nextflow -version
      N E X T F L O W
      version 19.07.0 build 5106
      created 27-07-2019 13:22 UTC (15:22 CEST)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

Indeed, if I set the Nextflow version to use with export NXF_VER=20.01.0, then

nextflow run as_rnaseq.nf -bg -with-tower > log 

CAPSULE: Downloading dependency org.codehaus.groovy:groovy-json:jar:2.5.8
CAPSULE: Downloading dependency org.codehaus.groovy:groovy-templates:jar:2.5.8
CAPSULE: Downloading dependency org.iq80.leveldb:leveldb-api:jar:0.12
CAPSULE: Downloading dependency io.nextflow:nf-commons:jar:20.01.0
CAPSULE: Downloading dependency org.codehaus.groovy:groovy-nio:jar:2.5.8
CAPSULE: Downloading dependency org.codehaus.groovy:groovy-xml:jar:2.5.8
CAPSULE: Downloading dependency io.nextflow:nextflow:jar:20.01.0
CAPSULE: Downloading dependency io.nextflow:nf-tower:jar:20.01.0
CAPSULE: Downloading dependency org.codehaus.groovy:groovy:jar:2.5.8
CAPSULE: Downloading dependency io.nextflow:nxf-s3fs:jar:1.0.7
CAPSULE: Downloading dependency jline:jline:jar:2.14.6
CAPSULE: Downloading dependency org.iq80.leveldb:leveldb:jar:0.12
CAPSULE: Downloading dependency io.nextflow:nf-httpfs:jar:20.01.0
CAPSULE: Downloading dependency com.amazonaws:aws-java-sdk-ecs:jar:1.11.542

and I can see the progress of the pipeline at https://tower.nf/watch/LQEOPE8w

Is what I did correct?
Let's also discuss it in person.

Private vs Public

As we discussed, the aim is to end up with a private, CRG-internal "for dummies" version with a more detailed description, as well as a public version that will be used for the publication.

Content of the `Counts` folder

Ciao Luca,

In the Results section of the README I was reading that the Counts folder
contains:

composite counts per gene per sample. No distincion between alleles

The concept of composite counts is not very clear to me. I would have expected a file with 2 columns, one with the gene ID and a second one with the counts. Instead there are 4 columns (one gene ID and 3 of counts) and I don't understand the difference between the three count columns.

Moreover, I don't quite understand what information the first 4 rows (which look like a header to me) report.

Here is an example from one file:

N_unmapped	231896	231896	231896
N_multimapping	1161389	1161389	1161389
N_noFeature	176413	458139	3269481
N_ambiguous	170644	46725	3691
ENSMUSG00000094121	0	0	0
ENSMUSG00000031166	41	37	4
ENSMUSG00000031167	85	33	52
ENSMUSG00000055188	8	7	1
ENSMUSG00000039201	81	75	6
ENSMUSG00000031168	159	146	13
ENSMUSG00000031169	688	661	27
ENSMUSG00000031171	200	191	9
ENSMUSG00000031170	14	13	1

Thanks a lot for your help,
Jackie
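
One possible reading, hedged since it depends on how the pipeline writes these files: the layout matches STAR's --quantMode GeneCounts output (ReadsPerGene.out.tab), where the four N_ rows are summary counters (unmapped, multimapping, no feature, ambiguous) and the three count columns are, respectively, unstranded, forward-stranded and reverse-stranded counts for the same gene. A sketch for extracting gene IDs plus one count column (the file name is hypothetical):

# drop the four summary rows and keep gene ID plus the reverse-stranded counts (column 4)
grep -v "^N_" sample_ReadsPerGene.out.tab | cut -f1,4 > sample_counts.tsv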

module makeAnno

Hi Luca,
I am having issues with the module makeAnno...
I am trying to run this command:
cd makeAnno

nextflow run make_anno.nf -bg --vcffile mgp.v5.merged.snps_all.dbSNP142.vcf.gz --speciesA CAST_EiJ --speciesB 129S1_SvImJ --genome GRCm38_68.fa --outvcf CAST_EiJ-129S1_SvImJ.vcf > log

and I get absolutely no result. I've been reading the issue Moritz opened about the $baseDir change and tried everything in there. I changed the paths to the VCF files and the reference genome, put them in different folders on the cluster and tested, but I can't get it to work.

What else would you suggest to try?

Thanks!
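
When a run started with -bg seems to do nothing, the redirected log and Nextflow's own hidden log usually explain why; a minimal sketch, assuming the command above was launched from the makeAnno directory:

# stdout of the background run was redirected here
cat log

# Nextflow's detailed execution log, including the reason any process failed
tail -n 50 .nextflow.log

# inspect the most recently created task work directories, if any
ls -lt work/ | head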

Ensembl Annotation

Ciao Luca,

I am missing some information on where and how to get the Ensembl annotation:

and the annotation from Ensembl. We used the version Mus_musculus.GRCm38.68 not available in Ensembl archive.

Do I get it from here?
ftp://ftp.ensembl.org/pub/release-68/fasta/mus_musculus/

Thanks!
Moritz

P.S. In case you are wondering, I am just serving as a test pilot right now to see whether less experienced people like Jackie could also easily run the pipeline.
