
popnet's Introduction

###Getting started

PopNet is available at: www.compsysbio.org

Requirements:

Operating System:
	Ubuntu 14.04 or equivalent Linux-based OS
	Mac OS X

Dependencies:
	Python 3.0.0 or later
	Numpy package for Python3 (available from http://www.numpy.org/)
	Matplotlib package for Python3 (available from http://matplotlib.org/)
	MCL clustering software (available from http://micans.org/mcl/)

Visualization:
	Cytoscape 2.7.0 or later (available from www.cytoscape.org/)
	enhancedGraphics plugin for Cytoscape (available from http://apps.cytoscape.org/apps/enhancedgraphics)

Optional:
	MUMmer3 or later for generating Nucmer input files
	SplitsTree4 or later for neighbor-net output

Note: Program will not work on Windows!

To Run:
	Extract contents to local folder
	Edit config file
	cd /path/to/local_folder
	python PopNet/PopNet.py /path/to/config/file
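Before the first run, it can help to confirm that the requirements listed above are present. The following is a minimal sketch, not part of PopNet itself; the function name `check_dependencies` is hypothetical, and it assumes the MCL executable is installed under the name `mcl`.

```python
import importlib.util
import shutil
import sys


def check_dependencies():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # The README asks for Python 3.0.0 or later.
    if sys.version_info < (3, 0):
        problems.append("Python 3.0.0 or later is required")
    # Numpy and Matplotlib must be importable by this interpreter.
    for module in ("numpy", "matplotlib"):
        if importlib.util.find_spec(module) is None:
            problems.append("Python package not found: " + module)
    # Assumption: the MCL binary is called 'mcl' and should be on PATH.
    if shutil.which("mcl") is None:
        problems.append("MCL executable 'mcl' not found on PATH")
    # The README states that the program will not work on Windows.
    if sys.platform.startswith("win"):
        problems.append("Windows is not supported")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print("WARNING:", problem)
```

Cytoscape and the optional tools (MUMmer, SplitsTree) are not checked here, since they are not needed to produce the XGMML file.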

###Introduction

PopNet is a set of Python scripts that generates an XGMML file to be visualized in Cytoscape. The scripts are run by executing the runner script, FullRunner.py, with the path to a configuration file as its argument. All configurable options of the program are contained in the configuration file. The output XGMML file needs to be loaded manually into Cytoscape for visualization. Information from several steps of the program is written to other files in the output folder, which may be useful for extracting particular bits of information, optimizing parameters, or debugging.

###Input Requirements

PopNet accepts two types of input: individual .snps files as generated by the Nucmer pipeline, or tabular files containing SNPs for all samples. The two types cannot be mixed in a single analysis. For .snps files, one .snps file for each sample must be present in the folder specified by 'base_directory' in the config file. For the tabular format, a single .txt file matching the name specified by 'filename' must be present in the 'base_directory' folder. Examples of both file types are available in the Examples folder.
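As an illustration of the two input layouts, the sketch below lists the files that would be picked up for a given 'base_directory' and optional 'filename'. It is not PopNet's own input code; the function name `list_input_files` and the rule "a filename means tabular input" are assumptions made for this example.

```python
from pathlib import Path


def list_input_files(base_directory, filename=None):
    """Return (input_type, files) for a hypothetical pre-run check."""
    base = Path(base_directory)
    if filename is not None:
        # Tabular format: a single .txt file named by 'filename'.
        path = base / filename
        if not path.is_file():
            raise FileNotFoundError("Tabular file not found: " + str(path))
        return "tabular", [path]
    # Nucmer format: one .snps file per sample in the folder.
    snps_files = sorted(base.glob("*.snps"))
    if not snps_files:
        raise FileNotFoundError("No .snps files found in " + str(base))
    return "snps", snps_files
```

Calling `list_input_files("/path/to/input")` would return the sorted .snps files, while `list_input_files("/path/to/input", "samples.txt")` would return the single tabular file; both paths here are placeholders.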

PopNet expects all individual genomes to be aligned to a common reference, so that the same coordinates refer to the same location in every genome. In addition, all chromosome names should follow the format XXXX_ChrI, XXXX_ChrXIV, and so on.
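A quick way to catch misnamed chromosomes before a run is to test the names against a pattern. The sketch below checks only the general shape described above (a prefix, then `_Chr`, then a non-empty suffix); the exact rules PopNet applies are not documented here, so the pattern and the function name `invalid_chromosome_names` are assumptions.

```python
import re

# Assumed shape: <prefix>_Chr<suffix>, e.g. XXXX_ChrI or XXXX_ChrXIV.
CHROMOSOME_NAME = re.compile(r"^\S+_Chr\S+$")


def invalid_chromosome_names(names):
    """Return the names that do not look like XXXX_ChrI."""
    return [name for name in names if not CHROMOSOME_NAME.match(name)]


names = ["ABCD_ChrI", "ABCD_ChrXIV", "chr1", "ABCD-ChrII"]
print(invalid_chromosome_names(names))
```

Here `chr1` and `ABCD-ChrII` are reported, because neither contains the `_Chr` separator.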

###Configuration File

The locations of input files, output files, and a number of run-time parameters are set using a configuration file formatted according to the Python configparser module. Multiple instances of PopNet may be run in parallel using different (or the same) configuration files.

Each option in the configuration file is explained in the templates Example_Config_Toxo.txt and Example_Config_Yeast.txt.
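Because the file follows the configparser format, it can be inspected with a few lines of Python before a run. The sketch below is only an illustration: the section name `Settings` and the values are placeholders, and only the option names 'base_directory', 'filename' and 'output_directory' come from this document. The example templates remain the reference for the real layout.

```python
import configparser

# Hypothetical configuration text; the section name is an assumption.
EXAMPLE_CONFIG = """
[Settings]
base_directory = /path/to/input
filename = samples.txt
output_directory = /path/to/results
"""


def read_options(text):
    """Parse configparser-formatted text into {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


options = read_options(EXAMPLE_CONFIG)
for section, values in options.items():
    for key, value in values.items():
        print(section, key, value, sep="\t")
```

To inspect a real file, `parser.read("/path/to/config/file")` can be used in place of `read_string`.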

###Output

Output files are generated at the location specified in the config file, and are always named the same way.

The key file for network visualization is located at: /path/to/results/cytoscape/cytoscapeGenome.xgmml

Other potentially helpful outputs include:

Heatmaps.pdf               Metrics to help you decide on the Inflation (I) and Pre-inflation (pI) parameters for
                           secondary clustering

log.txt                    A log file that includes all the run parameters

Genome_nexus.nex           A neighbor-net of the population

groups.txt.mci             The raw matrix used in secondary clustering. You can quickly try out the effects of
                           different I and pI values.

persistentResult.txt       The clustering results of each individual chromosome segment during primary clustering

results.txt                A tab-delimited file containing all the SNPs used

/cytoscape/tabNetwork.tsv  A tab-delimited file containing all chromosome paintings in the network. Useful for
                           focusing on specific locations on the genome.	

The remaining files are either intermediate files or for debug purposes.
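To try out several I and pI values on groups.txt.mci, one option is to generate a set of MCL command lines and run them one by one. The sketch below only builds the commands; it assumes the file is in MCL's native matrix format and that the installed MCL accepts `-I` (inflation), `-pi` (pre-inflation) and `-o` (output file), which is worth confirming with `mcl -h`. The function name `mcl_commands` and the output naming scheme are hypothetical.

```python
import itertools


def mcl_commands(matrix_path, inflations, pre_inflations):
    """Build one mcl command (as an argument list) per (I, pI) combination."""
    commands = []
    for inflation, pre_inflation in itertools.product(inflations, pre_inflations):
        # Hypothetical output name that records the parameters used.
        out = "{}.I{}.pI{}.out".format(matrix_path, inflation, pre_inflation)
        commands.append([
            "mcl", matrix_path,
            "-I", str(inflation),
            "-pi", str(pre_inflation),
            "-o", out,
        ])
    return commands


for command in mcl_commands("groups.txt.mci", [2.0, 4.0], [1.5]):
    print(" ".join(command))
```

Each argument list can be passed to `subprocess.run` to execute it; comparing the resulting cluster files gives a fast view of how the parameters change the secondary clustering without rerunning PopNet.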

###Visualization

Start Cytoscape
Install the enhancedGraphics plugin if not installed
Select 'Import Network from File'
Find and Select 'cytoscapeGenome.xgmml' described in the Output section
Select Layout -> Prefuse Force Directed Layout or another of your choice
In the Control Panel -> Style -> Properties -> Paint -> Custom Paint 1 select Image/Chart 1
Image/Chart 1 will now be in the list of properties
Under Image/Chart 1, set Column = Gradient, Mapping Type = Passthrough Mapping
The chromosome paintings will now be visible
Adjust other visual properties as needed

###Additional Diagnostics

The NodeSummary.py script, included in the PopNet directory, generates stacked bar graphs (similar to those seen in supplemental figure 2) to aid in choosing parameters such as section length and gap penalty. The script has not yet been optimized for user experience and requires some direct editing by the user.

The first step is to run PopNet once under each of the conditions being compared (e.g. with section length = 2000, 4000, 6000, etc.), saving each resulting xgmml file as IXXPIXXSXXXX.xgmml, where the first two 'XX' are the I and PI values multiplied by 10, and the 'XXXX' following S is the value of the parameter being varied. For example, if section length is being varied and the run is done with I = 4, PI = 1.5 and section length = 8000, the file should be saved as I40PI15S8000.xgmml. All the numerical values need to be integers. A file that is not named in this format will not be recognized by NodeSummary.py.
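The naming convention can be applied and checked with a small helper. This sketch follows the convention as described above rather than NodeSummary.py's actual parsing code, which is not shown here; the function names `summary_filename` and `parse_summary_filename` are hypothetical.

```python
import re

# Pattern for names of the form I<I*10>PI<PI*10>S<value>.xgmml
FILENAME_PATTERN = re.compile(r"^I(\d+)PI(\d+)S(\d+)\.xgmml$")


def summary_filename(inflation, pre_inflation, variable_value):
    """Build a file name such as I40PI15S8000.xgmml."""
    i_part = round(inflation * 10)       # I multiplied by 10, as an integer
    pi_part = round(pre_inflation * 10)  # PI multiplied by 10, as an integer
    return "I{}PI{}S{}.xgmml".format(i_part, pi_part, int(variable_value))


def parse_summary_filename(name):
    """Return (I, PI, value), or None if the name does not follow the convention."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    i_part, pi_part, value = (int(group) for group in match.groups())
    return i_part / 10, pi_part / 10, value


print(summary_filename(4, 1.5, 8000))
print(parse_summary_filename("I40PI15S8000.xgmml"))
```

With I = 4, PI = 1.5 and section length = 8000, the first call produces the name used in the example above, and the second call recovers the three values from it.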

Place the generated xgmml files into a folder. This is the directory to be specified in NodeSummary.py. Open NodeSummary.py and go to the main function at the bottom. The parameters to be specified include the directory containing the input files, the title of the graph, the axis titles, the output file's name, and the bins. The bins control the size of the features represented by each stack on the stacked bar graph. If all the features fall into the same stack (e.g. because they are too large or too small), the bins can be adjusted to offer better resolution. Please note that the script only looks at recombinant features (i.e. regions where a sample has inherited genes from an ancestry other than its own).

Run the script to generate the graph. The output file will be placed in the directory of the input files.

###Examples

Two example datasets are provided, one for yeast in .snps format and one for Toxoplasma in tabular format. Either format can be used for any species; the examples simply pair two different species with the two formats.

The two example configuration scripts can be used to analyze the example datasets.

To run the yeast example:

First, edit the Example_Config_Yeast file to change base_directory and output_directory to the absolute paths of /examples/Nucmer and /examples/Nucmer/output on your computer. The prefilled defaults only illustrate the format and will not work.

Then:
	cd PopNet
	python scripts/FullRunner.py Example_Config_Yeast.txt
	Open Cytoscape
	Select 'Import Network from File'
	Find and select /examples/Nucmer/output/cytoscape/cytoscapeGenome.xgmml
	Follow the subsequent steps described in Visualization

A similar procedure can be used to run the Toxoplasma dataset using Example_Config_Toxo.txt and /examples/Tabular/Toxo20.txt.

Pregenerated results are provided in the corresponding /output folders for reference.
