

2013, July 4th RGASP_analyzer (developed for RGASP, for details see: http://www.gencodegenes.org/rgasp/)

This small program compares an automatically generated GTF file (the prediction) with a GTF annotation file. It reports how many features (exons, transcripts, and genes) are present in the annotation and in the prediction, and how many of the predicted features are correct.

Installation:
	- make sure Java 1.5 or newer is installed
	- jargs.jar must be on the Java classpath
	- bash must be available
	- awk must be installed and in the path

Usage: java -Xms1G -Xmx2G org.rgasp.PredictionAnalyser

-l, --level			... one of "cds", "exon", "both"

						if "cds" is specified, only cds records within the annotation and
						the prediction are considered
						
						if "exon" is specified, only exon records within the annotation and
						the prediction are considered
						
						if "both" is specified a "cds" and an "exon" analysis
						will be performed separately
						
-m, --mode			... one of "fixed", "flexible", "both"

						if "fixed" mode is specified, predicted records are verified
						only if there is an annotation record with the same borders
						
						if "flexible" mode is specified, predicted records that correspond to
						a first or last exon of a transcript in the annotation may have flexible
						outer borders
						
						if "both" is specified, fixed and flexible mode analysis will be performed
						separately
						
-a, --annotation	... path to annotation file
-p, --prediction	... path to prediction file / folder
-o, --output		... output folder

Example: java -Xms1G -Xmx2G -jar RGASP_analyzer.jar -l exon -m both -a path2annotation -p path2prediction -o outputfolder
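
The difference between the "fixed" and "flexible" modes described above can be illustrated with a small Python sketch. This is only one reading of the description, not the tool's actual code: the function names are hypothetical, exons are (start, end) tuples, "first" and "last" are taken in genomic coordinate order, and the requirement that a flexible match still overlaps the annotated exon is an added assumption.

```python
def matches_fixed(predicted, annotated):
    """Fixed mode: the predicted exon must have exactly the same borders."""
    return predicted == annotated


def matches_flexible(predicted, annotated, is_first, is_last):
    """Flexible mode (as interpreted here): outer borders of terminal exons may differ.

    is_first / is_last describe the annotated exon's position in its
    transcript, with exons ordered by genomic coordinate.
    """
    # Assumption: the two exons must still overlap to count as a match.
    overlaps = predicted[0] <= annotated[1] and predicted[1] >= annotated[0]
    # The start is the outer border of a first exon, the end that of a last exon.
    start_ok = is_first or predicted[0] == annotated[0]
    end_ok = is_last or predicted[1] == annotated[1]
    return overlaps and start_ok and end_ok
```

With this reading, a predicted exon (90, 200) does not match an annotated first exon (100, 200) in fixed mode, but does match it in flexible mode because only the outer border differs.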

File format: Both the annotation and the prediction files are expected to be in GTF format. Please note that the key-value pairs gene_id and transcript_id are required in every record (for details see: http://www.sanger.ac.uk/resources/databases/encode/gencodeformat.html)
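
Before running the analysis, it can be useful to check that every record carries both required keys. The following Python sketch is not part of RGASP_analyzer; the function names are hypothetical, and it assumes tab-separated GTF lines whose ninth column holds attributes written as key "value"; pairs. Unquoted values are ignored by this sketch.

```python
import re

# Matches quoted GTF attributes such as: gene_id "G1"; transcript_id "T1";
ATTRIBUTE_PATTERN = re.compile(r'([A-Za-z_][\w.]*)\s+"([^"]*)"')

REQUIRED_KEYS = ("gene_id", "transcript_id")


def parse_attributes(attribute_column):
    """Return the quoted key-value pairs of a GTF attribute column as a dict."""
    return dict(ATTRIBUTE_PATTERN.findall(attribute_column))


def missing_required_keys(gtf_line):
    """Return the required keys that a single GTF record does not define."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        # Without an attribute column, every required key is missing.
        return list(REQUIRED_KEYS)
    attributes = parse_attributes(fields[8])
    return [key for key in REQUIRED_KEYS if key not in attributes]
```

Calling missing_required_keys on each non-comment line of a file and reporting the lines with a non-empty result gives a quick format check.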

Please note that this analysis only considers GTF records with the feature type exon or CDS.
Make sure that you adapt your GTF file accordingly.
Example: If your GTF file describes coding exons and UTRs as separate records, use this information
to create exon records.
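
One way to derive exon records from coding exons and UTRs is to merge the overlapping or directly adjacent intervals of each transcript. The Python sketch below is an illustration only, not something shipped with the tool: merge_into_exons is a hypothetical helper, and it assumes that all intervals passed in belong to the same transcript and use the 1-based, inclusive coordinates of GTF.

```python
def merge_into_exons(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals.

    Coordinates are 1-based and inclusive, so (100, 150) and (151, 200)
    are adjacent and form the single exon (100, 200).
    """
    exons = []
    for start, end in sorted(intervals):
        if exons and start <= exons[-1][1] + 1:
            # Extend the previous exon instead of starting a new one.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    return exons
```

For a transcript with a 5' UTR at (100, 150), a coding segment at (151, 200), and a coding exon at (300, 400), this yields the two exons (100, 200) and (300, 400).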

Please make sure that the annotation and the prediction files are formatted consistently.
In particular, make sure that:
	- chromosome names are consistent between the two files
	- stop codons are handled consistently (either included in the coding exons in both files, or in neither)
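
The chromosome naming check from the list above can be sketched in a few lines of Python. This is a hedged illustration rather than part of the tool; both function names are hypothetical, and the input is assumed to be an iterable of tab-separated GTF lines with the chromosome name in the first column.

```python
def chromosome_names(gtf_lines):
    """Collect the names in the first column, skipping comments and blank lines."""
    names = set()
    for line in gtf_lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def unmatched_chromosomes(annotation_lines, prediction_lines):
    """Return chromosome names used by the prediction but absent from the annotation."""
    return sorted(chromosome_names(prediction_lines) - chromosome_names(annotation_lines))
```

A non-empty result, for example "1" in the prediction where the annotation uses "chr1", suggests that the names need to be harmonised before the comparison.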

Output: In the folder specified with the -o, --output parameter, the following tables will be created:

- LEVEL_MODE_statistics.txt
  lists for a single prediction or each prediction file in the prediction folder how many exons,
  transcripts, or genes have been predicted, and how many of them are true positives.
  Example:
  	sensitivity for (coding) exons can be calculated by: TP_Exons / Num_Exons_Ref
  	specificity for (coding) exons can be calculated by: TP_Exons_Pred / Num_Exons_Pred

- LEVEL_MODE_exon.tbl
  binary table that lists for each exon of an annotated transcript whether it has been
  predicted (1) or not (0) by a prediction file.
  Exons are identified by their gene and transcript identifier and their position within the transcript

- LEVEL_MODE_exon_unique.tbl
  binary table as LEVEL_MODE_exon.tbl except that exons that are shared between different transcript
  isoforms will only appear once

- LEVEL_MODE_transcript.tbl
  binary table that lists for each transcript of an annotation file whether it has been predicted (1) or not (0)
  by a prediction file.

- LEVEL_MODE_gene.tbl
  binary table that lists for each gene of an annotation file whether it has been predicted (1) or not (0)
  by a prediction file.
  
While the statistics table gives you quick access to sensitivity and specificity values, the other tables can
be used for more complex follow-up analyses.
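
The sensitivity and specificity formulas given for LEVEL_MODE_statistics.txt can be applied as in the Python sketch below. The column names are the ones used in those formulas. Since the exact layout of the statistics file is not described here, the sketch takes one already-parsed row as a dictionary; exon_metrics and ratio are hypothetical helper names. Note that specificity as defined here (true positives divided by the number of predictions) is the quantity often called precision elsewhere.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def exon_metrics(row):
    """Compute exon-level sensitivity and specificity from one statistics row.

    `row` maps column names to counts.
    """
    return {
        # Share of annotated exons that were predicted correctly.
        "sensitivity": ratio(row["TP_Exons"], row["Num_Exons_Ref"]),
        # Share of predicted exons that are correct.
        "specificity": ratio(row["TP_Exons_Pred"], row["Num_Exons_Pred"]),
    }
```

For a row with TP_Exons = 80, Num_Exons_Ref = 100, TP_Exons_Pred = 80, and Num_Exons_Pred = 160, this gives a sensitivity of 0.8 and a specificity of 0.5.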

Memory: The memory that Java allocates by default will most probably not be sufficient to run your analysis. You would therefore usually run the analysis with the -Xms1G -Xmx2G arguments. Depending on your input file sizes, you might want to increase these values.

Contact: If you run into any problems, please contact Tamara Steijger, European Bioinformatics Institute.
