
galig's Introduction

Support this project by running your production jobs at BatchX

ASGAL

ASGAL (Alternative Splicing Graph ALigner) is a tool for detecting the alternative splicing events expressed in an RNA-Seq sample with respect to a gene annotation. The main idea behind ASGAL is that alternative splicing events can be detected by aligning the RNA-Seq reads against the splicing graph of the gene.

The instructions to install and use ASGAL are at http://asgal.algolab.eu.

Prerequisites

See here for more details.

Compiling

git clone --recursive https://github.com/AlgoLab/galig.git
cd galig
make prerequisites
make

Running

./asgal -g [genome] -a [annotation] -s [sample] -o outputFolder

In more detail:

# Align RNA-Seq reads to a splicing graph
./bin/SpliceAwareAligner -g [reference] -a [annotation] -s [sample] -o outputFolder/output.mem

# Convert alignments to SAM format
python3 ./scripts/formatSAM.py -m outputFolder/output.mem -g [reference] -a [annotation] -o outputFolder/output.sam

# Detect events from alignments
python3 ./scripts/detectEvents.py -g [reference] -a [annotation] -m outputFolder/output.mem -o outputFolder/output.events.csv
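
The three steps above can also be assembled programmatically. The following is a minimal sketch, not part of ASGAL: the helper name `build_asgal_steps` is hypothetical, and it assumes the `.mem` file written by the aligner is the same file passed to the two scripts with `-m`. It only builds the argument lists, using the flags shown above.

```python
from pathlib import Path


def build_asgal_steps(reference, annotation, sample, out_dir):
    """Return the three pipeline commands as argument lists, in run order."""
    out = Path(out_dir)
    mem = str(out / "output.mem")  # assumed to be shared by all three steps
    return [
        # Step 1: align RNA-Seq reads to the splicing graph
        ["./bin/SpliceAwareAligner", "-g", reference, "-a", annotation,
         "-s", sample, "-o", mem],
        # Step 2: convert the alignments to SAM format
        ["python3", "./scripts/formatSAM.py", "-m", mem, "-g", reference,
         "-a", annotation, "-o", str(out / "output.sam")],
        # Step 3: detect events from the alignments
        ["python3", "./scripts/detectEvents.py", "-g", reference,
         "-a", annotation, "-m", mem, "-o", str(out / "output.events.csv")],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them
    for cmd in build_asgal_steps("genome.fa", "annotation.gtf",
                                 "sample_1.fa", "outputFolder"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the galig directory so that a failing step stops the run.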

Example

cd example
tar xfz input.tar.gz
../asgal -g ./input/genome.fa -a ./input/annotation.gtf -s ./input/sample_1.fa -o outputFolder

This command will produce four files in the output folder:

  • a .mem file containing the alignments to the splicing graph
  • a .sam file, containing the alignments to the splicing graph mapped to the reference genome
  • a .events.csv file, containing the alternative splicing events detected in the RNA-Seq sample
  • a .log file, containing the log of the execution
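
A quick way to confirm that a run produced the four files listed above, and that none of them is empty, is sketched below. The function name `check_outputs` is hypothetical and the check relies only on the file suffixes named in the list.

```python
from pathlib import Path

# Suffixes of the four files described above
EXPECTED_SUFFIXES = (".mem", ".sam", ".events.csv", ".log")


def check_outputs(out_dir):
    """Map each expected suffix to 'missing', 'empty', or 'ok'."""
    files = [p for p in Path(out_dir).iterdir() if p.is_file()]
    report = {}
    for suffix in EXPECTED_SUFFIXES:
        matches = [p for p in files if p.name.endswith(suffix)]
        if not matches:
            report[suffix] = "missing"
        elif all(p.stat().st_size == 0 for p in matches):
            report[suffix] = "empty"
        else:
            report[suffix] = "ok"
    return report
```

Calling `check_outputs("outputFolder")` after the example run returns a small dictionary that makes an empty `.mem` or `.sam` file easy to spot.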

An extended explanation of this example can be found here.

The tool has been tested only on 64-bit Linux systems. You can find more information at http://asgal.algolab.eu.

Join the chat at https://gitter.im/AlgoLab/galig

galig's Issues

logs files

Hi,

I am trying to run ASGAL and would like to know how the files in the logs/ASGAL folder are produced and what they mean. In genome-wide mode, are they produced for every gene in the genome annotation or only for some of them? If only for some, which ones?

Thank you!

Need help to run the tool, no result for sample.mem

Hi,
I am currently working on paired-end RNA-Seq data from a human sample. I need to check the alternative splicing events of a specific gene, NEK1, in the sample, and maybe afterwards for all genes; my sample has no replicates. I was trying to use ASGAL, but when I give SpliceAwareAligner the FASTQ files to generate the sample.mem file, nothing comes out as output. I am using an annotation GTF for that gene only, extracted from the hg19 human genome GTF.
Here is the command I am using:

bin/SpliceAwareAligner -g hg19.fa -a annotation_NEK1.gtf -s NEK1_NM1.fastq -o asgal/NEK1_NM1_output.mem

Please help me run the tool; I need the data urgently.

Error when testing with example files

Hi, I encountered the following error when testing with your example data after installation. Do you know what caused it? Could it be due to an improper installation?
Thanks!

File "/opt/galig/asgal", line 54
eprint(f"command: '{' '.join(command)}'")
^
SyntaxError: invalid syntax
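
One likely cause, not confirmed in this thread: f-strings such as the one on line 54 of asgal are only valid syntax from Python 3.6 onward, so an older interpreter reports exactly this kind of SyntaxError. A minimal sketch for checking the interpreter follows; the helper name `supports_fstrings` is hypothetical, and the snippet deliberately avoids f-strings so that it also runs on older interpreters.

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were introduced in Python 3.6


def supports_fstrings(version_info=sys.version_info):
    """Return True if the given interpreter version can parse f-strings."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    # Show which interpreter is running and whether it is new enough
    print(sys.executable, sys.version.split()[0], supports_fstrings())
```

Running it with the same interpreter that launches asgal shows whether the version is the problem.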

Computation of PSI (percent spliced in)

I found that ASGAL generated only the number of supporting reads for each of the splicing events.
Could you please let me know (with code if possible) how to compute PSI values of each of these events?
Thank you,
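
A common definition, given here as a general sketch and not as ASGAL functionality: PSI is the fraction of reads supporting the inclusion isoform among all reads informative for the event, PSI = I / (I + E). The function name `psi` and both arguments are hypothetical. Which reads count as inclusion or exclusion depends on the event type, and the count for the other isoform is an assumption here, since it may have to be obtained outside the events file.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in as a fraction in [0, 1], or None with no reads."""
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # PSI is undefined without informative reads
    return inclusion_reads / total
```

For example, 30 inclusion reads and 10 exclusion reads give a PSI of 0.75.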

Issues with Dockerfile

Hi Luca,

We wrote earlier,

It seems that you don't have biopython installed. But from your first message, it seems that you installed it... Can you import the Bio module from the python3 shell?

I apologize for the delay. I was a little too quick in writing the Dockerfile I sent you; unfortunately, it still doesn't work.

I've been trying to use my Docker image with Biopython installed. However, even though the python shell can import pandas and Bio, asgal can't seem to:

I have no name!@2cfb5973d9f4:/myvol1$ asgal --multi -g Homo_sapiens.GRCh38.dna.primary_assembly.fa -a splicing_variants.gtf -s one_test_1.fastq -s2 one_test_2.fastq -t splicing_variants_transcripts.fa -o asgal_results
Traceback (most recent call last):
  File "/opt/galig/asgal", line 8, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
I have no name!@2cfb5973d9f4:/myvol1$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> 
>>> import Bio
>>> 
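
A possible explanation, stated as an assumption and not confirmed in this thread: the asgal script may be started by a different interpreter than the `python` found on the shell's PATH, in which case packages installed for one are invisible to the other. The sketch below compares the script's shebang line with the interpreters on PATH; the helper name `read_shebang` is hypothetical.

```python
import os
import shutil
import sys


def read_shebang(script_path):
    """Return the interpreter named on a script's first line, or None."""
    with open(script_path, "rb") as handle:
        first = handle.readline().decode("utf-8", errors="replace").strip()
    return first[2:].strip() if first.startswith("#!") else None


if __name__ == "__main__":
    # /opt/galig/asgal is the script path shown in the traceback above
    script = "/opt/galig/asgal"
    if os.path.isfile(script):
        print("shebang:", read_shebang(script))
    print("python on PATH:", shutil.which("python"))
    print("python3 on PATH:", shutil.which("python3"))
    print("this interpreter:", sys.executable)
```

If the shebang points to an interpreter other than the Anaconda one shown in the session above, installing pandas and Biopython for that interpreter, or starting the script explicitly with the intended one, would be the next thing to try.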

Furthermore, may I draw your attention to your own Dockerfile as well? It produced the following error when I built it.

...
In file included from /galig/sdsl-lite/compiled/include/sdsl/rrr_vector.hpp:27:0,
                 from /galig/sdsl-lite/compiled/include/sdsl/bit_vectors.hpp:10,
                 from /galig/src/SplicingGraph.hpp:12,
                 from /galig/src/SplicingGraph.cpp:1:
/galig/sdsl-lite/compiled/include/sdsl/rrr_helper.hpp: In constructor 'sdsl::binomial_coefficients<n>::impl::impl() [with short unsigned int n = 63]':
/galig/sdsl-lite/compiled/include/sdsl/rrr_helper.hpp:251:9: internal compiler error: Segmentation fault
         impl() {
         ^~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
/galig/Makefile:150: recipe for target 'SplicingGraph.o' failed
make[1]: *** [SplicingGraph.o] Error 1
target.mk:16: recipe for target '/galig/obj' failed
make: *** [/galig/obj] Error 2
The command '/bin/sh -c git clone --recursive https://github.com/AlgoLab/galig.git ;     cd galig ;     make prerequisites ;     make' returned a non-zero code: 2

I use the patch you gave me in my Dockerfile, in the RUN command, before I make salmon and asgal:

wget https://github.com/AlgoLab/galig/files/4437983/CMakeLists.txt.patch.txt ;\
git apply CMakeLists.txt.patch.txt ;\

Of course, you are welcome to a complete Dockerfile as soon as we can compose one together, if you can help.

Installation Error: ModuleNotFoundError: No module named 'Bio'

I installed python3, biopython, pysam, gffutils, pandas, cmake, samtools, and zlib in a newly created conda environment.

conda create -n asgal -y
conda activate asgal
conda install python=3.6 -y
conda install biopython
pip install pysam
pip install pandas
conda install samtools -y
conda install cmake -y
conda install gffutils -y
conda install zlib

I then executed the following commands:

git clone --recursive https://github.com/AlgoLab/galig.git
cd galig
make prerequisites
make

However, when I try to run asgal (./asgal -h), I get an error:

Traceback (most recent call last):
  File "./asgal", line 12, in <module>
    from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'

How can I fix this problem?

ALK ATI event

Hello everyone,
I have been using ASGAL for some time now and I'm very happy with the results; congratulations on the implementation.
Lately I have been working with samples that present the ALK ATI isoform. ASGAL hasn't been successful at calling this event. After running the program on a few samples (all having this alteration), I have started to think that ASGAL may not be designed to identify this type of event (based on my interpretation of the documentation), but before I jump to that conclusion I would like to know your thoughts on this. This image provides a nice description of the ALK ATI event.
Let me know if I should provide additional information.
Thanks in advance!

Asgal (from run in your provided docker): doesn't find Transcript file, looks for gtf instead.

Hi Luca,

It's me again. I ran ASGAL in the Docker image you gave me and ran into an issue that took me a while to fix because of a misleading error message:

Starting ASGAL run for /MOUNT/input/fastq/subsample_15_100K_1.fastq and /MOUNT/input/fastq/subsample_15_100K_2.fastq ...
[ Oct 28, 2020 -  2:28:46PM ] args Namespace(allevents=False, annoPath='/MOUNT/input/splicing_variants.gtf', debug=False, e='3', l='15', multiMode=True, outputPath='/MOUNT/output/asgal-output/subsample_15_100K_1-output', refPath='/MOUNT/input/Homo_sapiens.GRCh38.dna.primary_assembly.fa', sample1Path='/MOUNT/input/fastq/subsample_15_100K_1.fastq', sample2Path='/MOUNT/input/fastq/subsample_15_100K_2.fastq', split_only=False, threads='6', transPath='/MOUNT/input/custom_transcripts.fasta', verbose=False, w='3')

Transcripts file /MOUNT/input/splicing_variants.gtf not found. Halting...

I have no name!@061a56a9b51a:/$ ls /MOUNT/input/splicing_variants.gtf
/MOUNT/input/splicing_variants.gtf

The transcript file was the one that was missing, but the message told me that the GTF was missing.
I solved this just after posting the issue here. :P
I'll close this issue, but please note the misleading error message.
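
The kind of check that avoids this confusion can be sketched as follows. This is not ASGAL's actual code: the function name `find_missing_inputs` and the labels are hypothetical. The point is that each message is built from the label and path of the same input, so a missing transcripts file can never be reported with the annotation path.

```python
import os


def find_missing_inputs(inputs):
    """inputs maps a human-readable label to a path; return error messages."""
    return [
        "{} file {} not found.".format(label, path)
        for label, path in inputs.items()
        if not os.path.isfile(path)
    ]
```

With the paths from the log above, `find_missing_inputs({"Annotation": annoPath, "Transcripts": transPath})` would name the transcripts path when that is the file that does not exist.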

Salmon installation not found after pip installation in Asgal

I installed ASGAL in a virtual environment on a shared HPC resource using Python 3.8 and installed all the required packages with pip install. ASGAL fails with an error about not finding Salmon:

[ Mar 08, 2021 - 10:44:56AM ] args Namespace(allevents=False, annoPath='/GENOMEFILES/ensemble_genomefasta/Homo_sapiens.GRCh38.100.gtf', debug=False, e='3', l='15', multiMode=True, outputPath='/SOFTWARES/asgalvm/output/R01', refPath='/GENOMEFILES/ensemble_genomefasta/Homo_sapiens.GRCh38.dna.primary_assembly.fa', sample1Path='/U2OS/u2os_rawdata/63-Z01-F001/raw_data/R01/R01_1_val_1.fq.gz', sample2Path='/U2OS/u2os_rawdata/63-Z01-F001/raw_data/R01/R01_2_val_2.fq.gz', split_only=False, threads='2', transPath='/GENOMEFILES/ensemble_genomefasta/Homo_sapiens.GRCh38.cds.all.fa.gz', verbose=False, w='3')
[ Mar 08, 2021 - 10:44:56AM ] Opening input annotation...
[ Mar 08, 2021 - 10:44:56AM ] Splitting input annotation...
[ Mar 08, 2021 - 10:45:05AM ] number of genes 60683
[##################################################] 60683/60683
[ Mar 08, 2021 - 10:50:03AM ] Done.
[ Mar 08, 2021 - 10:50:03AM ] Splitting input reference...
[ Mar 08, 2021 - 10:50:54AM ] Done.
[ Mar 08, 2021 - 10:50:54AM ] Running Salmon indexing...
Traceback (most recent call last):
  File "/SOFTWARES/asgalvm/galig/asgal", line 585, in <module>
    main()
  File "/SOFTWARES/asgalvm/galig/asgal", line 576, in main
    runSalmon(args)
  File "SOFTWARES/asgalvm/galig/asgal", line 183, in runSalmon
    command_check_return(salmon_index_cmd, salmonIndexLog, salmonIndexLog, verbose=args.verbose)
  File "/SOFTWARES/asgalvm/galig/asgal", line 57, in command_check_return
    completed_process = subprocess.run(command,
  File "/cluster/software/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/cluster/software/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/cluster/software/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: /SOFTWARES/asgalvm/galig/salmon/bin/salmon

I would highly appreciate any help in fixing this error.

Thanks
best
Sa
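
The traceback shows asgal calling Salmon at a path inside the galig checkout (galig/salmon/bin/salmon). The build log in the libtbb issue in this same list shows that path as a Makefile target, which suggests (as an assumption, not confirmed here) that the binary is produced by the make steps and not by pip. A minimal sketch for checking whether the binary exists and is executable follows; the helper name `is_executable_file` is hypothetical.

```python
import os


def is_executable_file(path):
    """Return True if path is an existing file the current user can execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

For the run above, `is_executable_file("/SOFTWARES/asgalvm/galig/salmon/bin/salmon")` returning False would point to the build step not having completed.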

.mem and sam files generated as empty

Hi,
I am using ASGAL to find MET14 deletion and EGFR variation events in my samples. I am running the genome-wide analysis for these two genes. The ASGAL run is successful, but the .mem and .sam files are empty, so no events are reported in the final files.

The command I used is below:

./asgal --multi -g genome.fa -a annotation2.gtf -s sample1.fastq.gz -s2 sample2.fastq.gz -t transcript.fa --allevents -o output

Could you please help me with this as soon as possible?

Another permissions question

Hi, I'm attempting to run ASGAL via the Docker image and encounter an issue in the salmon quant step.

The error is

Traceback (most recent call last):
  File "/galig/asgal", line 585, in <module>
    main()
  File "/galig/asgal", line 576, in main
    runSalmon(args)
  File "/galig/asgal", line 210, in runSalmon
    command_check_return(salmon_quant_cmd, salmonBam, salmonQuantLog, shell=True, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.8/subprocess.py", line 444, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '/galig/salmon/bin/salmon quant -p 2 -i /data/output/salmon/salmon_index -l A -1 /data/sample_1.fq -2 /data/sample_2.fq -o /data/output/salmon/salmon_out --no-version-check --validateMappings --writeMappings --writeUnmappedNames | samtools view -Sb - | samtools sort -' returned non-zero exit status 1.

When I check the log/salmon_quant.log file, the cause appears to be a permission denied error:

[E::hts_open_format] Failed to open file "./samtools.71.441.tmp.0000.bam" : Permission denied
samtools sort: failed to create temporary file "./samtools.71.441.tmp.0000.bam": Permission denied

Do you have any idea how to fix this?

Thanks!

Rachel
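
The log shows samtools sort failing to create its temporary file under "./", that is, in the current working directory. A minimal sketch for testing whether the user running the command can write there follows; the helper name `can_write_here` is hypothetical, and this is a diagnostic, not a confirmed fix.

```python
import tempfile


def can_write_here(directory="."):
    """Try to create (and automatically delete) a temporary file."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False
```

If this returns False inside the container, running from a writable working directory is one thing to try. samtools sort also accepts `-T PREFIX` to place its temporary files elsewhere, but using it would require changing the command that asgal builds.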

Docker permissions issue

I am running into a group permissions issue with Docker.

docker run -v "$PWD"/input:/data algolab/asgal:v1.1.1
Starting with UID:GID 0:0
groupadd: GID '0' already exists
useradd: group 'group' does not exist
error: failed switching to "user:group": unable to find user user: no matching entries in passwd file

Error: Reference genome not found Halting...

Hi,

I would really appreciate it if you could help me with this. I keep getting an error saying that the reference genome was not found.

Here is the command I used:
dixi06@nia-login05:/scratch/a/amaclea3/dixi06/JeanData/Jdata$ singularity exec ./asgal_v1.1.6.sif /galig/asgal -g Medicago_truncatula.MedtrA17_4.0.dna.chromosome.1.fa -a annotation.gtf -s NRNSA_S103_R1_001.fastq.gz -o "LATD Medtr1g009200"

Thanks.

-Dixi

FAILED dependencies of target libtbb; tbb-2018_U3.tgz

Greetings, maintainers,

I need your tool in a Docker container. However, I haven't been able to install it. Could you help me out?

Commands run:

$ docker run -it --rm ubuntu:latest
root@<container-id>:/docker_main# cat /etc/issue
Ubuntu 18.04.4 LTS \n \l

root@<container-id>/docker_main# apt-get update && apt-get install build-essential git python3 python3-pip python3-setuptools python3-biopython python3-biopython-sql python3-pysam cmake libboost1.65-all-dev samtools unzip wget curl zlib1g-dev liblzma-dev libjemalloc-dev libjemalloc1 libghc-bzlib-dev libgff-dev libtbb-dev

root@<container-id>/docker_main# pip3 install gffutils; git clone --recursive https://github.com/AlgoLab/galig.git ; cd galig; make prerequisites

ERROR message

.
..
...
[ 23%] Completed 'libstadenio'
[ 23%] Built target libstadenio
Scanning dependencies of target libtbb
[ 24%] Creating directories for 'libtbb'
[ 25%] Performing download step for 'libtbb'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 125 100 125 0 0 679 0 --:--:-- --:--:-- --:--:-- 679
100 126 100 126 0 0 345 0 --:--:-- --:--:-- --:--:-- 345
100 2843k 0 2843k 0 0 2008k 0 --:--:-- 0:00:01 --:--:-- 5407k
tbb-2018_U3.tgz: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
tbb-2018_U3.tgz did not match expected SHA256! Exiting.
CMakeFiles/libtbb.dir/build.make:89: recipe for target 'libtbb-prefix/src/libtbb-stamp/libtbb-download' failed
make[4]: *** [libtbb-prefix/src/libtbb-stamp/libtbb-download] Error 1
CMakeFiles/Makefile2:178: recipe for target 'CMakeFiles/libtbb.dir/all' failed
make[3]: *** [CMakeFiles/libtbb.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make[2]: *** [all] Error 2
[ 8%] Built target libcereal
[ 15%] Built target libdivsufsort
[ 23%] Built target libstadenio
[ 24%] Performing download step for 'libtbb'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 125 100 125 0 0 1893 0 --:--:-- --:--:-- --:--:-- 1893
100 126 100 126 0 0 1482 0 --:--:-- --:--:-- --:--:-- 1482
100 2843k 0 2843k 0 0 2660k 0 --:--:-- 0:00:01 --:--:-- 3650k
tbb-2018_U3.tgz: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
tbb-2018_U3.tgz did not match expected SHA256! Exiting.
CMakeFiles/libtbb.dir/build.make:89: recipe for target 'libtbb-prefix/src/libtbb-stamp/libtbb-download' failed
make[4]: *** [libtbb-prefix/src/libtbb-stamp/libtbb-download] Error 1
CMakeFiles/Makefile2:178: recipe for target 'CMakeFiles/libtbb.dir/all' failed
make[3]: *** [CMakeFiles/libtbb.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make[2]: *** [all] Error 2
/galig/Makefile:126: recipe for target '/galig/salmon/bin/salmon' failed
make[1]: *** [/galig/salmon/bin/salmon] Error 2
target.mk:16: recipe for target '/galig/obj' failed
make: *** [/galig/obj] Error 2


Any help would be much appreciated,
Thanking you,
Amit
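
The log reports that the downloaded tbb-2018_U3.tgz did not match the expected SHA256. To see what was actually downloaded, the digest of the file can be computed and compared by hand with the value the build expects. The sketch below uses only the standard library; the function name `sha256_of` is hypothetical, and the expected digest is not reproduced here because it does not appear in the log.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A digest that differs on every download would suggest a transfer problem, while a stable but different digest would suggest that the archive served at that URL has changed.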

Novel intron retention events in genome-wide mode

Hello, I had a more general question about running ASGAL in genome-wide mode.

I have a dataset in which I know there are 3 novel retained intron events. I was wondering whether, after the pre-filtering step performed by quasi-mapping with Salmon, the reads mapping to these novel retained introns would still be included in the downstream alternative splicing analysis.

I'm asking because I have run ASGAL in genome-wide mode but fail to detect any events. When I look at the output SAM file for a gene that should have a retained intron, there is no coverage (whereas when I look at the output of a more typical spliced aligner against the reference genome, e.g. STAR, there is coverage). I have attached a screenshot of what I mean.
[Screenshot: Screen Shot 2021-06-17 at 3 19 55 PM]

Could you let me know if I am misunderstanding something, or should be running the tool differently?

Thanks!

Rachel
