pypanda's Introduction

Fork description

I updated the code to replace methods that have since been deprecated, and I added the imports for AnalyzePanda and AnalyzeLioness to this README.

PyPanda (Python Panda)

Python implementation of PANDA (Passing Attributes between Networks for Data Assimilation)

Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing Messages Between Biological Networks to Refine Predicted Interactions, PLoS One, 2013 May 31;8(5):e64832

Panda algorithm

To find agreement between the three input networks, the responsibility (R) is calculated first.

Thereafter availability (A) is calculated.

Availability and responsibility are combined with the following formula.

Protein cooperativity and gene co-regulatory networks are updated.

P and C are updated to satisfy convergence.

Hamming distance is calculated every iteration.
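
The update steps above can be sketched in NumPy. This is a minimal illustration based on the message-passing scheme described in the cited paper, not pypanda's own code: the function names t_function and panda_sketch, the update rate alpha, the stopping threshold and the toy inputs are all assumptions, and any special handling of the diagonals of P and C is left out.

```python
import numpy as np

def t_function(x, y):
    """Tanimoto-like similarity between the rows of x and the columns of y."""
    dot = x @ y
    row_sq = np.square(x).sum(axis=1, keepdims=True)  # shape (rows of x, 1)
    col_sq = np.square(y).sum(axis=0, keepdims=True)  # shape (1, cols of y)
    return dot / np.sqrt(row_sq + col_sq - np.abs(dot))

def panda_sketch(W, P, C, alpha=0.1, threshold=1e-3, max_steps=100):
    """W: TF x gene motif network, P: TF x TF PPI, C: gene x gene co-regulation."""
    W, P, C = W.astype(float), P.astype(float), C.astype(float)  # work on copies
    hamming, step = np.inf, 0
    while hamming > threshold and step < max_steps:
        R = t_function(P, W)                    # responsibility
        A = t_function(W, C)                    # availability
        combined = 0.5 * (R + A)                # combine R and A
        hamming = np.abs(W - combined).mean()   # mean absolute change (assumed form)
        W = (1 - alpha) * W + alpha * combined
        P = (1 - alpha) * P + alpha * t_function(W, W.T)  # update cooperativity
        C = (1 - alpha) * C + alpha * t_function(W.T, W)  # update co-regulation
        step += 1
    return W, hamming, step

# Toy inputs: 3 TFs, 5 genes, 8 samples (random values, for illustration only)
rng = np.random.default_rng(0)
W0 = rng.normal(size=(3, 5))
P0 = np.eye(3)
C0 = np.corrcoef(rng.normal(size=(5, 8)))
W_final, final_hamming, steps = panda_sketch(W0, P0, C0)
```

The loop stops either when the mean change falls below the threshold or after max_steps iterations, whichever comes first.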

Installation

PyPanda requires Python 2.7. We recommend the following commands to install PyPanda (tested on Ubuntu and Debian-derived systems; they also work on OS X):

With root access

git clone https://github.com/davidvi/pypanda.git
cd pypanda
sudo python setup.py install

Without root access

git clone https://github.com/davidvi/pypanda.git
cd pypanda
python setup.py install --user
# To run from the command line, make pypanda executable and add the bin directory to your PATH:
cd bin
chmod +x pypanda
echo "export PATH=\"$(pwd):\$PATH\"" >> ~/.bashrc
source ~/.bashrc

To run PyPanda on Windows (tested on Windows 10), install Git (https://git-scm.com/downloads) and Anaconda with Python 2.7 (https://www.continuum.io/downloads), then run the following from the Anaconda Prompt:

git clone https://github.com/davidvi/pypanda.git
cd pypanda
python setup.py install

Usage

Run from the terminal

PyPanda can be run directly from the terminal with the following options:

-h help
-e (required) expression values
-m (optional) pair file of motif edges, when not provided analysis continues with Pearson correlation matrix
-p (optional) pair file of PPI edges
-f (optional) remove missing values (default is False)
-o (required) output file
-q (optional) output lioness single sample network

To run PyPanda on the example data:

$ pypanda -e ToyData/ToyExpressionData.txt -m ToyData/ToyMotifData.txt -p ToyData/ToyPPIData.txt -f True -o test_panda.txt -q test_lioness.txt

To reconstruct a single sample Lioness Pearson correlation network:

$ pypanda -e ToyData/ToyExpressionData.txt -o test_panda_pearson.txt -q test_lioness_pearson.txt

Run from iPython notebook

Import PyPanda library:

from pypanda import Panda
from pypanda import Lioness
import pandas as pd
from pypanda.analyze_panda import AnalyzePanda
from pypanda.analyze_lioness import AnalyzeLioness

Run the Panda algorithm. Leave out the motif and PPI data to use a Pearson correlation network instead:

p = Panda('ToyData/ToyExpressionData.txt', 'ToyData/ToyMotifData.txt', 'ToyData/ToyPPIData.txt', remove_missing=False)

Save the results:

p.save_panda_results(file = 'Toy_Panda.pairs')

Return a network plot:

plot = AnalyzePanda(p)
plot.top_network_plot(top=100, file='top_100_genes.png')

Calculate indegrees for further analysis:

indegree = p.return_panda_indegree()

Calculate outdegrees for further analysis:

outdegree = p.return_panda_outdegree()

Run the Lioness algorithm for single sample networks:

l = Lioness(p)

Save Lioness results:

l.save_lioness_results(file = 'Toy_Lioness.txt')

Return a network plot for one of the Lioness single sample networks:

plot = AnalyzeLioness(l)
plot.top_network_plot(column=0, top=100, file='top_100_genes.png')

Results

Example Panda output:
TF  Gene  Motif Force
---------------------
CEBPA	AACSL	0.0	-0.951416589143
CREB1	AACSL	0.0	-0.904241609324
DDIT3	AACSL	0.0	-0.956471642313
E2F1	AACSL	1.0	3.6853160511
EGR1	AACSL	0.0	-0.695698519643

Example lioness output:
Sample1 Sample2 Sample3 Sample4
-------------------------------
-0.667452814003	-1.70433776179	-0.158129613892	-0.655795512803
-0.843366539284	-0.733709815256	-0.84849895139	-0.915217389738
3.23445386464	2.68888472802	3.35809757371	3.05297381396
2.39500370135	1.84608635425	2.80179804094	2.67540878165
-0.117475863987	0.494923925853	0.0518448588965	-0.0584810456421

TF, Gene and Motif order is identical to the panda output file.
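
Because the row order matches, the Lioness columns can be labelled with the TF and gene names from the Panda output. The sketch below uses pandas on the example rows shown above. It assumes both files are tab-separated with no header line (the header and dashed lines above are taken to be illustrative), and the column names are chosen here for readability.

```python
import io
import pandas as pd

# Example rows copied from the Panda and Lioness outputs above.
# With real files, pass the file paths to read_csv instead of StringIO objects.
panda_text = (
    "CEBPA\tAACSL\t0.0\t-0.951416589143\n"
    "CREB1\tAACSL\t0.0\t-0.904241609324\n"
    "DDIT3\tAACSL\t0.0\t-0.956471642313\n"
    "E2F1\tAACSL\t1.0\t3.6853160511\n"
    "EGR1\tAACSL\t0.0\t-0.695698519643\n"
)
lioness_text = (
    "-0.667452814003\t-1.70433776179\t-0.158129613892\t-0.655795512803\n"
    "-0.843366539284\t-0.733709815256\t-0.84849895139\t-0.915217389738\n"
    "3.23445386464\t2.68888472802\t3.35809757371\t3.05297381396\n"
    "2.39500370135\t1.84608635425\t2.80179804094\t2.67540878165\n"
    "-0.117475863987\t0.494923925853\t0.0518448588965\t-0.0584810456421\n"
)

panda = pd.read_csv(io.StringIO(panda_text), sep="\t", header=None,
                    names=["tf", "gene", "motif", "force"])
lioness = pd.read_csv(io.StringIO(lioness_text), sep="\t", header=None,
                      names=["Sample1", "Sample2", "Sample3", "Sample4"])

# Rows are in the same order, so a column-wise concat lines them up.
labelled = pd.concat([panda[["tf", "gene"]], lioness], axis=1)
print(labelled)
```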

pypanda's People

Contributors

davidvi, djaovx, aless80

pypanda's Issues

Error

I am getting this error when running this line:
p = Panda('ToyExpressionData.txt', 'ToyMotifData.txt', 'ToyPPIData.txt', remove_missing=False)

Environment: Jupyter Notebook under Anaconda, Python 3.

TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2441 try:
-> 2442 return self._engine.get_loc(key)
2443 except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: range(1, 51)

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in ()
----> 1 p = Panda('ToyExpressionData.txt', 'ToyMotifData.txt')

C:\ProgramData\Anaconda3\lib\site-packages\pypanda-0.1-py3.6.egg\pypanda\panda.py in __init__(self, expression_file, motif_file, ppi_file, remove_missing)
26 self.__remove_missing()
27 #expression data to matrix
---> 28 self.__expression_data_to_matrix()
29 #motif data to matrix
30 if self.motif_data is not None:

C:\ProgramData\Anaconda3\lib\site-packages\pypanda-0.1-py3.6.egg\pypanda\panda.py in __expression_data_to_matrix(self)
66 self.gene_names = list(self.expression_data[0])
67 self.num_genes = len(self.gene_names)
---> 68 self.expression_data = self.expression_data[range(1, len(self.expression_data.columns))]
69 self.expression_matrix = np.matrix(self.expression_data.as_matrix())
70 return None

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
1962 return self._getitem_multilevel(key)
1963 else:
-> 1964 return self._getitem_column(key)
1965
1966 def _getitem_column(self, key):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
1969 # get column
1970 if self.columns.is_unique:
-> 1971 return self._get_item_cache(key)
1972
1973 # duplicate columns & possible reduce dimensionality

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1643 res = cache.get(item)
1644 if res is None:
-> 1645 values = self._data.get(item)
1646 res = self._box_item_values(item, values)
1647 cache[item] = res

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3588
3589 if not isnull(item):
-> 3590 loc = self.items.get_loc(item)
3591 else:
3592 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2442 return self._engine.get_loc(key)
2443 except KeyError:
-> 2444 return self._engine.get_loc(self._maybe_cast_indexer(key))
2445
2446 indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: range(1, 51)
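
The KeyError: range(1, 51) in this traceback comes from line 68 of panda.py, where a range object is used as a column key. In Python 2, range returns a list; in Python 3 it returns a lazy range object, which the pandas version in the traceback appears to treat as a single column label. One possible workaround, sketched below on made-up data, is positional slicing with .iloc. Here .to_numpy() stands in for as_matrix(), which later pandas releases removed. This is a suggestion, not the maintainers' fix.

```python
import io
import pandas as pd

# Made-up expression table: first column is the gene name, the rest are samples.
text = "geneA\t1.0\t2.0\t3.0\ngeneB\t4.0\t5.0\t6.0\n"
expression_data = pd.read_csv(io.StringIO(text), sep="\t", header=None)

gene_names = list(expression_data[0])

# Select every column except the first by position, independent of column labels.
values = expression_data.iloc[:, 1:]
expression_matrix = values.to_numpy()

print(gene_names)
print(expression_matrix.shape)
```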

Error in Jupyter

Hi everyone,

I installed pypanda and tried it with the toy data.

I am using Python 3.7 with Anaconda on Windows 10, and I have just updated all the libraries.

from pypanda import Panda
from pypanda import Lioness
import pandas as pd
from pypanda.analyze_panda import AnalyzePanda
from pypanda.analyze_lioness import AnalyzeLioness

p = Panda('pypanda/ToyData/ToyExpressionData.txt', 'pypanda/ToyData/ToyMotifData.txt', 'pypanda/ToyData/ToyPPIData.txt', remove_missing=False)

I then received this error:

TypeError Traceback (most recent call last)
in
----> 1 p = Panda('pypanda/ToyData/ToyExpressionData.txt', 'pypanda/ToyData/ToyMotifData.txt', 'pypanda/ToyData/ToyPPIData.txt', remove_missing=False)

~\Anaconda3\lib\site-packages\pypanda-0.1-py3.7.egg\pypanda\panda.py in __init__(self, expression_file, motif_file, ppi_file, remove_missing)
29 #motif data to matrix
30 if self.motif_data is not None:
---> 31 self.__motif_data_to_matrix()
32 #ppi data to matrix
33 if self.motif_data is not None:

~\Anaconda3\lib\site-packages\pypanda-0.1-py3.7.egg\pypanda\panda.py in __motif_data_to_matrix(self)
81 idx_tfs = map(functools.partial(match, b = self.unique_tfs), self.motif_data[0])
82 idx_genes = map(functools.partial(match, b = self.gene_names), self.motif_data[1])
---> 83 idx = np.ravel_multi_index((idx_tfs, idx_genes), self.motif_matrix.shape)
84 self.motif_matrix.ravel()[idx] = self.motif_data[2]
85 return None

TypeError: Iterator operand or requested dtype holds references, but the REFS_OK flag was not enabled

Thank you for your help.
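
The likely cause is that map returns a lazy iterator in Python 3, so np.ravel_multi_index receives iterator objects rather than sequences of integers. Materialising the indices as lists is one way around it. The sketch below uses made-up TF and gene names, and a list comprehension replaces pypanda's match helper, whose definition is not shown in the traceback.

```python
import numpy as np

# Made-up motif edges: (TF, gene, weight)
unique_tfs = ["TF1", "TF2"]
gene_names = ["g1", "g2", "g3"]
motif_tfs = ["TF1", "TF2", "TF2"]
motif_genes = ["g2", "g1", "g3"]
motif_weights = [1.0, 1.0, 1.0]

# Build concrete lists of integer positions instead of lazy map objects.
idx_tfs = [unique_tfs.index(tf) for tf in motif_tfs]
idx_genes = [gene_names.index(gene) for gene in motif_genes]

motif_matrix = np.zeros((len(unique_tfs), len(gene_names)))
idx = np.ravel_multi_index((idx_tfs, idx_genes), motif_matrix.shape)

# ravel() returns a view for a contiguous array, so this writes into motif_matrix.
motif_matrix.ravel()[idx] = motif_weights
print(motif_matrix)
```

Wrapping the original calls as list(map(...)) should have the same effect while keeping the existing helper.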

Division issue

Hi,
I'm getting some division errors when trying to run PANDA.

/path/to/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:3167: RuntimeWarning: invalid value encountered in true_divide
c /= stddev[:, None]
/path/to/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:3168: RuntimeWarning: invalid value encountered in true_divide
c /= stddev[None, :]

This appears to be related to np.corrcoef(self.expression_matrix), i.e. there is something in my input counts matrix that means numpy cannot generate a proper correlation matrix.
I'm supplying a matrix of tissue aware normalised counts (using YARN).

Does PANDA expect normalised counts, TPMs, log2 counts?

Cheers

Zero expression error

It seems that pypanda fails when there are zero-expression genes (genes whose expression values are zero across all samples), e.g.,

numpy/lib/nanfunctions.py:1136: RuntimeWarning: Degrees of freedom <= 0 for slice.
  warnings.warn("Degrees of freedom <= 0 for slice.", RuntimeWarning)
step: 0, hamming: nan
running panda took: 106.649751902 seconds
Finished Panda run...
step: 0, hamming: nan
running panda took: 108.9292171 seconds
step: 0, hamming: nan
running panda took: 109.55636096 seconds
step: 0, hamming: nan
running panda took: 108.327214003 seconds
step: 0, hamming: nan
running panda took: 107.53530407 seconds
step: 0, hamming: nan
running panda took: 107.594507933 seconds
step: 0, hamming: nan
...

I think we should either pre-check for or filter out those zero-expression genes.
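
One way to do such a pre-check, sketched here with NumPy on made-up values, is to drop genes whose expression has zero variance across samples before computing the correlation matrix, since np.corrcoef divides by each row's standard deviation. The variable names are illustrative, and this is not part of pypanda.

```python
import numpy as np

# Made-up expression matrix: rows are genes, columns are samples.
gene_names = ["geneA", "geneB", "geneC"]
expression = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],   # zero expression in every sample
    [2.0, 1.0, 4.0, 3.0],
])

# Keep only genes that vary across samples.
keep = expression.std(axis=1) > 0
filtered = expression[keep]
kept_names = [name for name, flag in zip(gene_names, keep) if flag]

correlation = np.corrcoef(filtered)
print(kept_names)
print(correlation)
```

This also catches genes that are constant but non-zero, which cause the same division problem.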
