davidvi / pypanda

Python implementation of PANDA (Passing Attributes between Networks for Data Assimilation)


pypanda's Introduction

Fork description

I corrected the code because some of the methods it used have been deprecated. I also added the imports for AnalyzePanda and AnalyzeLioness to this README.

PyPanda (Python Panda)


Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing Messages Between Biological Networks to Refine Predicted Interactions. PLoS One. 2013 May 31;8(5):e64832.

Table of Contents
- Panda implementation
- Installation
- Usage
  - iPython
  - Terminal
- Results

Panda algorithm

To find agreement between the three input networks (the motif prior, the protein-protein interaction network and the gene co-expression network), the responsibility (R) is calculated first.

Next, the availability (A) is calculated.
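The formulas that accompanied these steps were images and did not survive. The following is a reconstruction based on our reading of the PANDA paper (Glass et al. 2013), not copied from this README, so check it against the paper. With W the regulatory network (initialised from the motif prior), P the protein cooperativity network and C the gene co-regulatory network, both messages use a Tanimoto-like similarity T between a row vector and a column vector:

```latex
T(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}
  {\sqrt{\lVert \mathbf{x} \rVert^2 + \lVert \mathbf{y} \rVert^2 - \lvert \mathbf{x} \cdot \mathbf{y} \rvert}}
\qquad
R_{ij} = T\left(P_{i\cdot}, W_{\cdot j}\right),
\qquad
A_{ij} = T\left(W_{i\cdot}, C_{\cdot j}\right)
```

Here P_{i·} is row i of P (the cooperativity profile of TF i), W_{·j} is column j of W (the TFs predicted to regulate gene j), W_{i·} is row i of W (the genes targeted by TF i) and C_{·j} is column j of C (the co-regulation profile of gene j).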

Availability and responsibility are then averaged and combined with the current regulatory network (W).
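As a reconstruction from the PANDA paper (an assumption, since the README's formula image is lost), the averaged message moves W a small step per iteration, controlled by an update parameter α (a small value such as 0.1 is assumed here):

```latex
W^{(t+1)} = (1 - \alpha)\, W^{(t)} + \alpha \cdot \tfrac{1}{2}\left(R + A\right)
```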

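The protein cooperativity network (P) and the gene co-regulatory network (C) are then recomputed from the updated regulatory network.

P and C are moved towards these new estimates in small steps so that the algorithm converges.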
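Reconstructed in the same way (our reading of the paper, not this README), with T the Tanimoto-like similarity from the PANDA paper and W the updated regulatory network, TFs become more cooperative when they target similar genes, and genes become more co-regulated when they are targeted by similar TFs. Implementations may treat the diagonal entries of P and C specially; that detail is not shown.

```latex
P^{(t+1)}_{ik} = (1-\alpha)\, P^{(t)}_{ik} + \alpha \, T\left(W_{i\cdot}, W_{k\cdot}\right),
\qquad
C^{(t+1)}_{jl} = (1-\alpha)\, C^{(t)}_{jl} + \alpha \, T\left(W_{\cdot j}, W_{\cdot l}\right)
```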

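A Hamming distance between the current regulatory network and the newly computed one is calculated at every iteration; the algorithm stops when it falls below a convergence threshold.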
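The steps above can be sketched as one NumPy loop. This is a simplified illustration of the published update rules, not pypanda's own code: the function names are hypothetical, α = 0.1 and the 0.001 threshold are assumed values, and any special handling of the diagonals of P and C is omitted. The printed line mimics the "step: N, hamming: X" log format seen in the issues below.

```python
import numpy as np


def tanimoto(x, y):
    """Tanimoto-like similarity between every row of x and every column of y."""
    dot = x @ y
    row_sq = np.square(x).sum(axis=1)[:, None]  # squared norm of each row of x
    col_sq = np.square(y).sum(axis=0)[None, :]  # squared norm of each column of y
    return dot / np.sqrt(row_sq + col_sq - np.abs(dot))


def panda_sketch(W, P, C, alpha=0.1, threshold=1e-3, max_steps=1000, verbose=True):
    """Simplified PANDA message passing (hypothetical helper).

    W: TF x gene regulatory prior, P: TF x TF cooperativity,
    C: gene x gene co-regulation. Returns the updated (W, P, C).
    """
    W, P, C = (np.array(m, dtype=float) for m in (W, P, C))
    hamming = np.inf
    step = 0
    while hamming > threshold and step < max_steps:
        R = tanimoto(P, W)  # responsibility: from TF cooperativity to genes
        A = tanimoto(W, C)  # availability: from gene co-regulation to TFs
        W_new = 0.5 * (R + A)
        hamming = np.abs(W - W_new).mean()  # mean absolute change of W
        W = (1 - alpha) * W + alpha * W_new
        # TFs are similar when they target similar genes (rows of W).
        P = (1 - alpha) * P + alpha * tanimoto(W, W.T)
        # Genes are similar when they share regulating TFs (columns of W).
        C = (1 - alpha) * C + alpha * tanimoto(W.T, W)
        if verbose:
            print("step: {}, hamming: {}".format(step, hamming))
        step += 1
    return W, P, C
```

For example, `panda_sketch(motif, ppi, np.corrcoef(expression))` with a TF x gene `motif` array, a TF x TF `ppi` array and a genes x samples `expression` array. A NaN in the correlation matrix propagates into every step, which matches the "hamming: nan" log in the zero-expression issue below.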

Installation

PyPanda requires Python 2.7. We recommend the following commands to install PyPanda (on Ubuntu and Debian-derived systems; they also work on OS X):

With root access

git clone https://github.com/davidvi/pypanda.git
cd pypanda
sudo python setup.py install

Without root access

git clone https://github.com/davidvi/pypanda.git
cd pypanda
python setup.py install --user
# To run from the command line, make pypanda executable and add the bin directory to your PATH:
cd bin
chmod +x pypanda
echo "export PATH=\"$(pwd):\$PATH\"" >> ~/.bashrc
source ~/.bashrc

To run PyPanda on Windows (tested on Windows 10), install Git (https://git-scm.com/downloads) and Anaconda with Python 2.7 (https://www.continuum.io/downloads), then run the following from the Anaconda Prompt:

git clone https://github.com/davidvi/pypanda.git
cd pypanda
python setup.py install

Usage

Run from the terminal

PyPanda can be run directly from the terminal with the following options:

-h help
-e (required) expression values file
-m (optional) pair file of motif edges; if not provided, the analysis continues with the Pearson correlation matrix
-p (optional) pair file of PPI edges
-f (optional) remove missing values (default is False)
-o (required) output file
-q (optional) output file for the Lioness single-sample networks

To run PyPanda on the example data:

$ pypanda -e ToyData/ToyExpressionData.txt -m ToyData/ToyMotifData.txt -p ToyData/ToyPPIData.txt -f True -o test_panda.txt -q test_lioness.txt

To reconstruct single-sample Lioness networks from the Pearson correlation network only (no motif or PPI data):

$ pypanda -e ToyData/ToyExpressionData.txt -o test_panda_pearson.txt -q test_lioness_pearson.txt
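To illustrate what a single-sample network is, here is a sketch of the LIONESS linear-interpolation idea applied to Pearson correlation: the network for sample q is estimated from the network built on all N samples and the network built with sample q left out. The function name is hypothetical and this is not pypanda's implementation.

```python
import numpy as np


def lioness_pearson(expression):
    """Single-sample Pearson networks via LIONESS interpolation (hypothetical helper).

    expression: genes x samples array.
    Returns an array of shape (samples, genes, genes).
    """
    expression = np.asarray(expression, dtype=float)
    n_samples = expression.shape[1]
    all_net = np.corrcoef(expression)  # network inferred from all samples
    networks = np.empty((n_samples,) + all_net.shape)
    for q in range(n_samples):
        # Network inferred with sample q left out.
        minus_q = np.corrcoef(np.delete(expression, q, axis=1))
        # LIONESS: e_q = N * (e_all - e_minus_q) + e_minus_q
        networks[q] = n_samples * (all_net - minus_q) + minus_q
    return networks
```

Each sample's network has ones on its diagonal, because both correlation matrices do. Genes that are constant across the remaining samples would produce NaN, as discussed in the issues below.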

Run from iPython notebook

Import PyPanda library:

from pypanda import Panda
from pypanda import Lioness
import pandas as pd
from pypanda.analyze_panda import AnalyzePanda
from pypanda.analyze_lioness import AnalyzeLioness

Run the Panda algorithm (leave out the motif and PPI files to use the Pearson correlation network only):

p = Panda('ToyData/ToyExpressionData.txt', 'ToyData/ToyMotifData.txt', 'ToyData/ToyPPIData.txt', remove_missing=False)

Save the results:

p.save_panda_results(file='Toy_Panda.pairs')

Return a network plot:

plot = AnalyzePanda(p)
plot.top_network_plot(top=100, file='top_100_genes.png')

Calculate indegrees for further analysis:

indegree = p.return_panda_indegree()

Calculate outdegrees for further analysis:

outdegree = p.return_panda_outdegree()
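We have not confirmed exactly how return_panda_indegree and return_panda_outdegree are computed. A common definition, assumed here, is the sum of edge weights (Force) pointing into each gene (in-degree) or leaving each TF (out-degree). A pandas sketch on the five example rows from the Results section:

```python
import pandas as pd

# The example Panda output rows (TF, Gene, Motif, Force) from the Results section.
panda_edges = pd.DataFrame(
    {
        "TF": ["CEBPA", "CREB1", "DDIT3", "E2F1", "EGR1"],
        "Gene": ["AACSL"] * 5,
        "Motif": [0.0, 0.0, 0.0, 1.0, 0.0],
        "Force": [-0.951416589143, -0.904241609324, -0.956471642313,
                  3.6853160511, -0.695698519643],
    }
)

# In-degree (assumed definition): summed edge weight targeting each gene.
indegree = panda_edges.groupby("Gene")["Force"].sum()
# Out-degree (assumed definition): summed edge weight leaving each TF.
outdegree = panda_edges.groupby("TF")["Force"].sum()

print(indegree)
print(outdegree.sort_values(ascending=False).head())
```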

Run the Lioness algorithm for single sample networks:

l = Lioness(p)

Save Lioness results:

l.save_lioness_results(file='Toy_Lioness.txt')

Return a network plot for one of the Lioness single sample networks:

plot = AnalyzeLioness(l)
plot.top_network_plot(column=0, top=100, file='top_100_genes_lioness.png')

Results

Example Panda output:
TF  Gene  Motif Force
---------------------
CEBPA	AACSL	0.0	-0.951416589143
CREB1	AACSL	0.0	-0.904241609324
DDIT3	AACSL	0.0	-0.956471642313
E2F1	AACSL	1.0	3.6853160511
EGR1	AACSL	0.0	-0.695698519643

Example lioness output:
Sample1 Sample2 Sample3 Sample4
-------------------------------
-0.667452814003	-1.70433776179	-0.158129613892	-0.655795512803
-0.843366539284	-0.733709815256	-0.84849895139	-0.915217389738
3.23445386464	2.68888472802	3.35809757371	3.05297381396
2.39500370135	1.84608635425	2.80179804094	2.67540878165
-0.117475863987	0.494923925853	0.0518448588965	-0.0584810456421

Rows follow the same TF, Gene and Motif order as the Panda output file; each column holds the edge weights of one sample's network.


pypanda's Issues

Error

I am getting this error while running this line:
p = Panda('ToyExpressionData.txt', 'ToyMotifData.txt', 'ToyPPIData.txt', remove_missing=False)

Environment: Anaconda, Jupyter Notebook, Python 3

TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2441 try:
-> 2442 return self._engine.get_loc(key)
2443 except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: range(1, 51)

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in ()
----> 1 p = Panda('ToyExpressionData.txt', 'ToyMotifData.txt')

C:\ProgramData\Anaconda3\lib\site-packages\pypanda-0.1-py3.6.egg\pypanda\panda.py in __init__(self, expression_file, motif_file, ppi_file, remove_missing)
26 self.__remove_missing()
27 #expression data to matrix
---> 28 self.__expression_data_to_matrix()
29 #motif data to matrix
30 if self.motif_data is not None:

C:\ProgramData\Anaconda3\lib\site-packages\pypanda-0.1-py3.6.egg\pypanda\panda.py in __expression_data_to_matrix(self)
66 self.gene_names = list(self.expression_data[0])
67 self.num_genes = len(self.gene_names)
---> 68 self.expression_data = self.expression_data[range(1, len(self.expression_data.columns))]
69 self.expression_matrix = np.matrix(self.expression_data.as_matrix())
70 return None

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
1962 return self._getitem_multilevel(key)
1963 else:
-> 1964 return self._getitem_column(key)
1965
1966 def _getitem_column(self, key):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
1969 # get column
1970 if self.columns.is_unique:
-> 1971 return self._get_item_cache(key)
1972
1973 # duplicate columns & possible reduce dimensionality

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1643 res = cache.get(item)
1644 if res is None:
-> 1645 values = self._data.get(item)
1646 res = self._box_item_values(item, values)
1647 cache[item] = res

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3588
3589 if not isnull(item):
-> 3590 loc = self.items.get_loc(item)
3591 else:
3592 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2442 return self._engine.get_loc(key)
2443 except KeyError:
-> 2444 return self._engine.get_loc(self._maybe_cast_indexer(key))
2445
2446 indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: range(1, 51)
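The KeyError comes from indexing the DataFrame with a range object in __expression_data_to_matrix, and the next line calls DataFrame.as_matrix(), which was removed in pandas 1.0. A possible fix, sketched on a hypothetical two-gene file (the file layout is inferred from the traceback, and this is not necessarily what the corrected fork does), is to select columns by position with iloc and convert with to_numpy():

```python
import io

import pandas as pd

# Hypothetical expression file: gene names in the first column, samples after, no header.
toy_file = io.StringIO("GENE1\t1.0\t2.0\t3.0\nGENE2\t4.0\t5.0\t6.0\n")
expression_data = pd.read_csv(toy_file, sep="\t", header=None)

gene_names = list(expression_data[0])  # with header=None, column label 0 holds gene names
num_genes = len(gene_names)
# Positional selection of every column after the first, instead of df[range(1, n)].
expression_data = expression_data.iloc[:, 1:]
# to_numpy() replaces the removed as_matrix(); wrap in np.matrix only if later code needs it.
expression_matrix = expression_data.to_numpy()
```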

Repository with basic corrections

Although the QuackenbushLab/pypanda repository contains updated code, I forked this repo and corrected some basic things that broke in newer versions of pandas. See https://github.com/aless80/pypanda-1

Error in Jupyter

Hi everyone,

I installed pypanda and tried it with the toy data.

I am using Python 3.7 (with all libraries just updated) on Windows 10 with Anaconda.

from pypanda import Panda
from pypanda import Lioness
import pandas as pd
from pypanda.analyze_panda import AnalyzePanda
from pypanda.analyze_lioness import AnalyzeLioness

p = Panda('pypanda/ToyData/ToyExpressionData.txt', 'pypanda/ToyData/ToyMotifData.txt', 'pypanda/ToyData/ToyPPIData.txt', remove_missing=False)

I then received this error:

TypeError Traceback (most recent call last)
in
----> 1 p = Panda('pypanda/ToyData/ToyExpressionData.txt', 'pypanda/ToyData/ToyMotifData.txt', 'pypanda/ToyData/ToyPPIData.txt', remove_missing=False)

~\Anaconda3\lib\site-packages\pypanda-0.1-py3.7.egg\pypanda\panda.py in __init__(self, expression_file, motif_file, ppi_file, remove_missing)
29 #motif data to matrix
30 if self.motif_data is not None:
---> 31 self.__motif_data_to_matrix()
32 #ppi data to matrix
33 if self.motif_data is not None:

~\Anaconda3\lib\site-packages\pypanda-0.1-py3.7.egg\pypanda\panda.py in __motif_data_to_matrix(self)
81 idx_tfs = map(functools.partial(match, b = self.unique_tfs), self.motif_data[0])
82 idx_genes = map(functools.partial(match, b = self.gene_names), self.motif_data[1])
---> 83 idx = np.ravel_multi_index((idx_tfs, idx_genes), self.motif_matrix.shape)
84 self.motif_matrix.ravel()[idx] = self.motif_data[2]
85 return None

TypeError: Iterator operand or requested dtype holds references, but the REFS_OK flag was not enabled

thank you for your help
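This error appears because, under Python 3, map() returns a lazy iterator instead of a list, so np.ravel_multi_index receives objects it cannot use as integer index arrays. Materialising the indices as lists (for example list(map(...)), or a list comprehension) avoids it. A self-contained sketch with toy data; match is a hypothetical stand-in for pypanda's helper of the same name:

```python
import numpy as np
import pandas as pd


def match(a, b):
    """Hypothetical helper: position of item a in list b."""
    return b.index(a)


unique_tfs = ["TF1", "TF2"]
gene_names = ["G1", "G2", "G3"]
# Toy motif pairs (TF, gene, weight) with integer column labels, as with header=None.
motif_data = pd.DataFrame([["TF1", "G2", 1.0], ["TF2", "G3", 1.0]])
motif_matrix = np.zeros((len(unique_tfs), len(gene_names)))

# Explicit lists of integer positions instead of map() iterators.
idx_tfs = [match(tf, unique_tfs) for tf in motif_data[0]]
idx_genes = [match(gene, gene_names) for gene in motif_data[1]]
idx = np.ravel_multi_index((idx_tfs, idx_genes), motif_matrix.shape)
# Write the edge weights into the flattened TF x gene matrix.
motif_matrix.flat[idx] = motif_data[2].to_numpy()
```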

Division issue

Hi,
I'm getting some division errors when trying to run PANDA.

/path/to/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:3167: RuntimeWarning: invalid value encountered in true_divide
c /= stddev[:, None]
/path/to/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:3168: RuntimeWarning: invalid value encountered in true_divide
c /= stddev[None, :]

This appears to be related to np.corrcoef(self.expression_matrix), i.e. there is something in my input counts matrix that means numpy cannot generate a proper correlation matrix.
I'm supplying a matrix of tissue-aware normalised counts (using YARN).

Does PANDA expect normalised counts, TPMs, log2 counts?

Cheers
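One likely cause of these warnings (a suggestion, not an answer from the thread) is a gene whose values do not vary across samples: its standard deviation is zero, so np.corrcoef divides by zero and fills that gene's row and column with NaN. A quick NumPy check on a toy matrix:

```python
import numpy as np

expression = np.array(
    [
        [1.0, 2.0, 3.0, 4.0],
        [0.0, 0.0, 0.0, 0.0],  # all-zero gene: standard deviation is 0
        [2.0, 1.0, 4.0, 3.0],
    ]
)

# Genes that are constant across samples make np.corrcoef divide by zero.
constant_rows = np.flatnonzero(expression.std(axis=1) == 0)
print("zero-variance rows:", constant_rows)

with np.errstate(invalid="ignore", divide="ignore"):
    corr = np.corrcoef(expression)
# NaN fills the row and the column of each constant gene; other pairs stay finite.
print(np.isnan(corr))
```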

Zero expression error

It seems that pypanda fails when there are zero-expression genes (genes whose expression values are zero across all samples), e.g.,

numpy/lib/nanfunctions.py:1136: RuntimeWarning: Degrees of freedom <= 0 for slice.
  warnings.warn("Degrees of freedom <= 0 for slice.", RuntimeWarning)
step: 0, hamming: nan
running panda took: 106.649751902 seconds
Finished Panda run...
step: 0, hamming: nan
running panda took: 108.9292171 seconds
step: 0, hamming: nan
running panda took: 109.55636096 seconds
step: 0, hamming: nan
running panda took: 108.327214003 seconds
step: 0, hamming: nan
running panda took: 107.53530407 seconds
step: 0, hamming: nan
running panda took: 107.594507933 seconds
step: 0, hamming: nan
...

I think we should either pre-check for or filter out those zero-expression genes.
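A sketch of the filtering option with pandas, assuming the expression file is tab-separated with gene names in the first column and no header (a layout inferred from the tracebacks above). It drops every gene with zero variance, which covers all-zero genes as well as other constant genes; the filtered matrix can then be written out and passed to Panda. In practice the StringIO objects would be replaced by real paths, such as the input expression file and a hypothetical 'filtered_expression.txt'.

```python
import io

import pandas as pd

# Hypothetical expression file: gene name, then one column per sample.
raw = io.StringIO(
    "GENE1\t1.0\t2.0\t3.0\n"
    "GENE2\t0.0\t0.0\t0.0\n"
    "GENE3\t5.0\t5.0\t5.0\n"
    "GENE4\t2.0\t0.5\t1.0\n"
)
expression = pd.read_csv(raw, sep="\t", header=None, index_col=0)

# Keep genes whose expression varies across samples; constant genes give NaN correlations.
varying = expression.std(axis=1) > 0
filtered = expression[varying]

# Write the filtered matrix back out in the same layout (gene name first, no header).
out = io.StringIO()
filtered.to_csv(out, sep="\t", header=False)
print(filtered.index.tolist())
```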
