ranganathanlab / pysca

Python3 implementation of Statistical Coupling Analysis (SCA)

Home Page: https://ranganathanlab.gitlab.io/pySCA

License: Other


pysca's Introduction

pySCA

09.2020

Copyright (C) 2019 Olivier Rivoire, Rama Ranganathan, and Kimberly Reynolds

This program is free software distributed under the BSD 3-clause license; please see the file LICENSE for details.

The current version of Statistical Coupling Analysis (SCA) is implemented in Python. This directory contains the code needed to run the SCA calculations, as well as examples/tutorials for the dihydrofolate reductase (DHFR) enzyme family, the S1A serine proteases, the small G-protein family, and the beta-lactamase enzyme family. The tutorials are distributed as Jupyter notebooks; for details, please see: https://jupyter.org/.

For installation instructions, and an introduction to using the toolbox, please refer to the website:

https://ranganathanlab.gitlab.io/pySCA

or look through the RST files included with the pySCA distribution.

Contents of `/`


bin/	Executables for running SCA analysis functions
data/	Input data (including those needed for the tutorials)
docs/	HTML documentation (generated by Sphinx)
figs/	Figures used for the notebooks and documentation
notebooks/	Example SCA notebooks
output/	Output files (empty at install, use `runAllNBCalcs.sh`)
pysca/	Python code for SCA
scripts/	Utility scripts used to generate example data

Contents of `bin/`


annotateMSA	Annotates alignments with phylogenetic/taxonomic information
scaProcessMSA	Conducts some initial processing of the sequence alignment
scaCore	Runs the core SCA calculations
scaSectorID	Defines sectors given the results of the calculations in scaCore

Contents of `pysca/`


scaTools.py	The SCA toolbox - functions for the SCA calculations
settings.py	Global configuration settings for the analysis

Contents of `notebooks/`


SCA_DHFR.ipynb	Example for DHFR
SCA_G.ipynb	Example for the small G proteins
SCA_betalactamase.ipynb	Example for the beta-lactamases
SCA_S1A.ipynb	Example for the S1A serine protease


pysca's Issues

annotateMSA

Hi,

We downloaded a PFAM alignment and ran it directly through annotateMSA, but I believe there may be a problem with how the new annotated file is created.
Everything runs smoothly until the analysis reaches the phylogenetic tree or taxonomic group step, where it gives an error. It seems that the '|' characters in the annotated database are not being read correctly.

An example error:

IndexError Traceback (most recent call last)
in
8 s1 = h.split('__')
9 s2 = s1[0].split('|')
---> 10 hs = s1[1].split('|')
11 tax = []
12 annot[s2[1]] = sca.Annot(s1[0], hs[2], ','.join(hs[3:-2]))

IndexError: list index out of range

We tried this with a couple of different PFAM alignments and got the same problem each time. Any suggestions?

Cheers,

Mehmet
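
In the traceback above, `s1[1]` can only raise an IndexError when a header contains no `__` separator, so at least one sequence header in the file appears to lack the annotation block. The following is a minimal sketch, not part of pySCA, for listing such headers before running the notebook cell. The function names and the example headers are hypothetical.

```python
def split_annotated_header(header):
    """Split 'id_part__annotation_part' into (id_fields, annot_fields).

    annot_fields is None when the '__' separator is missing, so the caller
    can skip or report the sequence instead of hitting an IndexError.
    """
    parts = header.split("__")
    id_fields = parts[0].split("|")
    if len(parts) < 2:
        return id_fields, None
    return id_fields, parts[1].split("|")


def find_unannotated(headers):
    """Return the headers that carry no '__' annotation block."""
    return [h for h in headers if split_annotated_header(h)[1] is None]


# Hypothetical headers: the first has an annotation block, the second does not.
headers = ["x|ID1|NAME1__a|b|Species one|Taxon1,Taxon2|c|d", "x|ID2|NAME2"]
print(find_unannotated(headers))
```

Any header this reports would need to be annotated or dropped before the parsing loop can index into the second half of the split.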

scaCore.py error "ValueError: pvals < 0, pvals > 1 or pvals contains NaNs"

Hi,

I am running scaCore.py with the command ./scaCore.py XXXX.db, and it reports the following error:
Computing the sequence projections.
Computing the SCA conservation and correlation values.
Computing matrix randomizations...
Traceback (most recent call last):
  File "./scaCore.py", line 91, in <module>
    Vrand, Lrand, Crand = sca.randomize(msa_num, options.Ntrials, seqw, options.lbda)
  File "/path/scaTools.py", line 1231, in randomize
    msa_rand = randAlg(fr01, Mseq)
  File "/path/scaTools.py", line 1190, in randAlg
    Maa = np.random.multinomial(Mseq, frq[i,:])
  File "mtrand.pyx", line 4249, in numpy.random.mtrand.RandomState.multinomial
  File "_common.pyx", line 376, in numpy.random._common.check_array_constraint
  File "_common.pyx", line 362, in numpy.random._common._check_array_cons_bounded_0_1
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
The XXXX.db used here was produced by running scaProcessMSA.py on the MSA XXXX.fa with all default settings. I have four MSAs: one was processed successfully by scaCore.py, but the other three report the same error shown above. Could you please help me with this? Thank you in advance for any input.
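
The error message says that the probability vector passed to `np.random.multinomial` (here a row `frq[i,:]` of a frequency matrix) contains a NaN or a value outside [0, 1]. As a diagnostic sketch only, the helper below scans a frequency matrix for such rows. The function name is hypothetical, and how the matrix is obtained from a particular alignment is left as an assumption. It uses only the standard library but also works on a 2-D NumPy array, since iterating over one yields its rows.

```python
import math


def find_invalid_frequency_rows(frq):
    """Return indices of rows that contain NaN or values outside [0, 1]."""
    bad = []
    for i, row in enumerate(frq):
        if any(math.isnan(p) or p < 0.0 or p > 1.0 for p in row):
            bad.append(i)
    return bad


# Toy matrix: row 0 is valid, row 1 holds a NaN, row 2 has out-of-range values.
frq = [[0.5, 0.5], [float("nan"), 1.0], [1.2, -0.2]]
print(find_invalid_frequency_rows(frq))  # prints [1, 2]
```

The reported indices correspond to alignment positions whose frequencies would need inspecting, for example for columns where the frequency computation produced a NaN.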

Is there any Supplemental section in the gitlab page?

Section 3 of https://ranganathanlab.gitlab.io/pySCA/get_started/ indicates that some equations can be found in the Supplemental section. I searched for "Supplemental" but did not find anything.

NCBI Annotations Returns Errors for some Accession Numbers

We are unable to annotate our custom psi-blast alignment. When running the annotateMSA utility, it returns an error for about 20% of accession numbers even though they are valid and exist. Some examples include MXQ93025.1, GCC37908.1, NXC41758.1, XP_023187272.1, and GCB83467.1.

We tried breaking the file into smaller pieces in case it was a handler overload. We added our NCBI API key in case we were exceeding the request limit. We tried adding the Entrez parameter idtype="acc" to line 359 of scaTools.py in case determining the type of identifier was the problem: handle = Entrez.esummary(db="protein", id=",".join(id_block), idtype="acc"). We re-installed the latest versions of pySCA and Biopython. We tried multiple emails in case it was a user issue. We tried running pySCA on the most recent versions of both Mac OS and Windows.

We got the same error each time, and the exact same sequences fail every time. I did a thorough search of possible Entrez errors, and Entrez does not seem to have problems handling certain accessions, which makes me think it is a pySCA issue. Thank you for the help!

To initiate the annotation we used the command:
annotateMSA -i decipher.an -o align_annotate.an -a 'ncbi' -l Acc_Num

The output error:

❯ annotateMSA -i decipher.an -o align_annotate.an -a 'ncbi' -l Acc_Num
Beginning annotation
Traceback (most recent call last):
  File "/opt/anaconda3/bin/annotateMSA", line 186, in <module>
    sca.AnnotNCBI(options.Input_MSA, options.output, options.idList)
  File "/opt/anaconda3/lib/python3.8/site-packages/pysca/scaTools.py", line 361, in AnnotNCBI
    taxonList = Entrez.read(handle)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/Entrez/__init__.py", line 508, in read
    record = handler.read(handle)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/Entrez/Parser.py", line 304, in read
    self.parser.ParseFile(handle)
  File "/opt/concourse/worker/volumes/live/71f8613d-c53a-40aa-4c7b-351131b1952c/volume/python_1599203882312/work/Modules/pyexpat.c", line 461, in EndElement
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/Entrez/Parser.py", line 666, in endErrorElementHandler
    raise RuntimeError(value)
RuntimeError: Invalid uid MXQ93025.1 at position=29

I included the accession list file as Acc_Num.txt and the alignment file as decipher.txt.
decipher.txt
Acc_Num.txt
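
Because a single rejected identifier aborts the whole run in the traceback above, one way to narrow the problem down is to retry a failing block one identifier at a time and collect the rejected ones. The sketch below illustrates that pattern only and is not pySCA code. `fetch_block` is a hypothetical callable standing in for the summary request plus parse step, and `fake_fetch` merely simulates the reported error. Whether single-identifier requests would succeed for these accessions is not known.

```python
def fetch_with_fallback(ids, fetch_block, block_size=200):
    """Fetch records block by block; on failure, retry that block id by id.

    fetch_block (hypothetical) takes a list of ids and returns a list of
    records, raising RuntimeError when an id is rejected.
    Returns (records, failed_ids).
    """
    records, failed = [], []
    for start in range(0, len(ids), block_size):
        block = ids[start:start + block_size]
        try:
            records.extend(fetch_block(block))
        except RuntimeError:
            # Isolate the offending ids instead of aborting the whole run.
            for single in block:
                try:
                    records.extend(fetch_block([single]))
                except RuntimeError:
                    failed.append(single)
    return records, failed


def fake_fetch(block):
    # Stand-in for a real lookup: rejects one id, echoing the reported error.
    if "MXQ93025.1" in block:
        raise RuntimeError("Invalid uid MXQ93025.1")
    return [{"id": i} for i in block]


records, failed = fetch_with_fallback(["A1", "MXQ93025.1", "B2"], fake_fetch, block_size=2)
print(failed)  # prints ['MXQ93025.1']
```

The list of failed identifiers can then be checked separately, while the remaining sequences are still annotated.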

with open(pfam_seq) OSError: [Errno 22] Invalid argument: '|'

When attempting to run the annotation step:

python ..\bin\annotateMSA -i PF00034_full_length_sequences.fasta -o PF00034_full_length_sequences.an

I get the following output/error:

Beginning annotation
Traceback (most recent call last):
  File "C:\Users\dlamm\pySCA\bin\annotateMSA", line 178, in <module>
    sca.AnnotPfam(
  File "C:\Users\dlamm\AppData\Local\Programs\Python\Python39\lib\site-packages\pysca\scaTools.py", line 180, in AnnotPfam
    with open(pfam_seq) as fp:
OSError: [Errno 22] Invalid argument: '|'

In settings I have:

path2pfamseq = r"C:\Users\dlamm\pfamseq.txt"

Running on:
Windows 10
Python 3.9
pySCA 6.1
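
The traceback shows `open()` receiving `'|'` rather than the path configured in settings, which suggests that the value reaching `AnnotPfam` is not the configured one. The cause is not established here. As a small sketch under that assumption, a check like the following (the function name is hypothetical) makes a wrong value visible before the file is opened.

```python
from pathlib import Path


def check_data_file(path_str):
    """Return a Path if path_str names an existing file, else raise FileNotFoundError."""
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"Expected a data file but got: {path_str!r}")
    return path
```

Calling `check_data_file("|")` raises a FileNotFoundError that names the offending value, which is easier to trace back than the Errno 22 raised from inside `open()`.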
