Code Monkey home page Code Monkey logo

pgx's Introduction

PGx

PGx supports proteogenomic intregration of mass spectrometry (MS) proteomics data with next generation sequencing (NGS) by mapping identified peptides onto their putative genomic coordinates.

Installation

The ideal installation setup for the python script-set which constitutes PGx is as a set of executable scripts in the path of the shell. For this you will need to move the scripts to a repository on the $PATH of your shell and add a #! line to the top of the script pointing to your preferred python installation. For example, if your python is installed in /usr/bin/python you should add #!/usr/bin/python to the top of every script, then chmod (see explanation for the chmod command here) the scripts so they are executable (at which point it is customary to remove the .py suffix from the filename).

Proteome Directories

PGx expects custom proteomes to be located in directories, where each directory contains a file called proteome.fasta and a file called proteome.bed.

The file called proteome.fasta represent the sequence space that was searched by e.g. Mascot or Sequest during the proteomics phase of the Proteogenomics study. It must contain nothing but the entry name in the header line of each protein entry (the line that starts with: ">") and the protein sequence should span exactly one line (historically FASTA files have been limited to 60, 70 or 80 characters per line which is an unnecessary formatting requirement for modern bioinformatic pipelines.

The file called proteome.bed represents a mapping of the sequence space back to its genomic coordinates. In proteogenomic studies this file is usually a side-effect of the sequencing effort and is often exported as a BED file (the format utilized by the PGx framework).

A minimal example proteome is provided in this repository (oncoproteome). The custom proteome contains an unexpected peptide discovered to be expressed in cancer cells as part of the CPTAC research program (it is unexpected due to its presence in what is normally an intronic region of the gene).

Example Usage

Once the scripts are installed, custom proteomes can be indexed and then queried for the location of peptides. The locations can then be mapped into a derived BED file which can be visualized as a custom track on all major genome browsers.

The sequence of commands is:

pgx_index oncoproteome
pgx_query oncoproteome peptides.txt > hits.txt
pgx_bed oncoproteome hits.txt > hits.bed

If the scripts have not been installed as executable commands, the sequence can be run as follows:

python pgx_index.py oncoproteome
python pgx_query.py oncoproteome peptides.txt > hits.txt
python pgx_bed.py oncoproteome hits.txt > hits.bed

If the second argument is missing, the scripts assume the input will be piped into STDIN, which means it is possible to generate one liners such as:

echo SPPDSPTDALMQLAK | pgx_query oncoproteome | pgx_bed oncoproteome

Miscellaneous Scripts

The misc directory contains additional scripts which support the automated download of RefSeq as well as the ability to query peptides with a possible nsSNP (essentially a single amino acid change in the protein sequence).

pgx's People

Watchers

James Cloos avatar Anurag Raj avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.