
pysca's Introduction

pySCA


09.2020

Copyright (C) 2019 Olivier Rivoire, Rama Ranganathan, and Kimberly Reynolds

This program is free software distributed under the BSD 3-clause license; please see the file LICENSE for details.

The current version of the Statistical Coupling Analysis (SCA) is implemented in Python. This directory contains the code needed to run the SCA calculations, as well as examples/tutorials for the dihydrofolate reductase (DHFR) enzyme family, the S1A serine proteases, the small G-protein family, and the beta-lactamase enzyme family. The tutorials are distributed as Jupyter notebooks; for details, please see https://jupyter.org/.

For installation instructions and an introduction to using the toolbox, please refer to the website:

https://ranganathanlab.gitlab.io/pySCA

or look through the RST files included with the pySCA distribution.

Contents of /

bin/ Executables for running SCA analysis functions
data/ Input data (including those needed for the tutorials)
docs/ HTML documentation (generated by Sphinx)
figs/ Figures used for the notebooks and documentation
notebooks/ Example SCA notebooks
output/ Output files (empty at install; run runAllNBCalcs.sh to populate)
pysca/ Python code for SCA
scripts/ Utility scripts used to generate example data

Contents of bin/

annotateMSA Annotates alignments with phylogenetic/taxonomic information
scaProcessMSA Conducts some initial processing of the sequence alignment
scaCore Runs the core SCA calculations
scaSectorID Defines sectors given the results of the calculations in scaCore

Contents of pysca/

scaTools.py The SCA toolbox - functions for the SCA calculations
settings.py Global configuration settings for the analysis

Contents of notebooks/

SCA_DHFR.ipynb Example for DHFR
SCA_G.ipynb Example for the small G proteins
SCA_betalactamase.ipynb Example for the beta-lactamases
SCA_S1A.ipynb Example for the S1A serine proteases

pysca's People

Contributors

jamesmkrieger, olgais93, reynoldsk, sudorook


pysca's Issues

annotateMSA

Hi,

We downloaded a Pfam alignment and ran it directly through annotateMSA, but I believe there may be a problem with how the new annotated file is created.
Everything goes swimmingly until we reach the phylogenetic tree or taxonomic group step of the analysis, which raises an error; it seems the '|' characters in the annotated database are not being read correctly.

An example error:


IndexError Traceback (most recent call last)
in
8 s1 = h.split('__')
9 s2 = s1[0].split('|')
---> 10 hs = s1[1].split('|')
11 tax = []
12 annot[s2[1]] = sca.Annot(s1[0], hs[2], ','.join(hs[3:-2]))

IndexError: list index out of range


We tried this with a couple of different Pfam alignments and got the same problem. Any suggestions?

Cheers,

Mehmet
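
The failing line indexes s1[1], the part of the header after '__', so an IndexError there means at least one header in the annotated file contains no '__' separator at all. A minimal diagnostic sketch follows, assuming the header layout implied by the notebook code above ('|'-separated fields before '__' with the sequence identifier in the second field, and at least three '|'-separated fields after it). parse_headers is a hypothetical helper, not part of pySCA: it extracts the same three fields the notebook passes to sca.Annot, as a plain tuple, and collects unparsable headers instead of crashing.

```python
from typing import Dict, List, Tuple

# (header prefix, species, comma-joined taxonomy), keyed by sequence ID
Annotation = Tuple[str, str, str]


def parse_headers(headers: List[str]) -> Tuple[Dict[str, Annotation], List[str]]:
    """Parse annotated headers like the notebook does, collecting failures.

    Hypothetical layout, inferred from the notebook snippet:
        <f0>|<seq_id>|...__<a>|<b>|<species>|<taxonomy ...>|<c>|<d>
    """
    annot: Dict[str, Annotation] = {}
    bad: List[str] = []
    for h in headers:
        s1 = h.split("__")
        if len(s1) < 2:
            # No '__' separator: the case that makes s1[1] raise IndexError.
            bad.append(h)
            continue
        s2 = s1[0].split("|")
        hs = s1[1].split("|")
        if len(s2) < 2 or len(hs) < 3:
            # Too few '|' fields for s2[1] or hs[2].
            bad.append(h)
            continue
        # Same slices the notebook uses: species is hs[2], taxonomy is hs[3:-2].
        annot[s2[1]] = (s1[0], hs[2], ",".join(hs[3:-2]))
    return annot, bad
```

Printing the returned list of bad headers shows which sequences in the .an file lack the expected annotation fields, which is more informative than the first IndexError alone.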

scaCore.py error "ValueError: pvals < 0, pvals > 1 or pvals contains NaNs"

Hi,

I am running scaCore.py with the command ./scaCore.py XXXX.db, and it reports the following error:
Computing the sequence projections.
Computing the SCA conservation and correlation values.
Computing matrix randomizations...
Traceback (most recent call last):
  File "./scaCore.py", line 91, in <module>
    Vrand, Lrand, Crand = sca.randomize(msa_num, options.Ntrials, seqw, options.lbda)
  File "/path/scaTools.py", line 1231, in randomize
    msa_rand = randAlg(fr01, Mseq)
  File "/path/scaTools.py", line 1190, in randAlg
    Maa = np.random.multinomial(Mseq, frq[i,:])
  File "mtrand.pyx", line 4249, in numpy.random.mtrand.RandomState.multinomial
  File "_common.pyx", line 376, in numpy.random._common.check_array_constraint
  File "_common.pyx", line 362, in numpy.random._common._check_array_cons_bounded_0_1
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
The XXXX.db used here was produced by running scaProcessMSA.py on the MSA XXXX.fa with all default settings. I have four MSAs: one of them is processed successfully by scaCore.py, but the other three report the same error shown above. Could you please help me with this? Thank you in advance for any input.
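
The check that fails is NumPy's validation of the probability vector frq[i,:] passed to np.random.multinomial: per the error message, every entry must lie in [0, 1] and none may be NaN. A common source of NaN is a 0/0 division when frequencies are normalized by a total of zero; whether that is what happens with these three alignments is an assumption the traceback does not confirm. The hypothetical helper find_bad_rows (not a pySCA function) reports which rows of a frequency matrix, indexed row by row as frq is in randAlg, would be rejected; it accepts nested lists or a 2-D NumPy array.

```python
import math
from typing import Iterable, List, Sequence


def find_bad_rows(frq: Iterable[Sequence[float]]) -> List[int]:
    """Return indices of rows that np.random.multinomial would reject as pvals.

    Applies the condition named in the error message: every entry must be a
    number in [0, 1] and not NaN. (NumPy separately checks the sum of the
    probabilities, which fails with a different message.)
    """
    bad: List[int] = []
    for i, row in enumerate(frq):
        for p in row:
            p = float(p)  # works for Python floats and NumPy scalars alike
            if math.isnan(p) or p < 0.0 or p > 1.0:
                bad.append(i)
                break  # one bad entry is enough to flag the row
    return bad
```

Applied to the matrix that randomize passes to randAlg (fr01 in the traceback), an empty result would rule out this particular check; a non-empty result points to the rows worth inspecting.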

NCBI Annotations Returns Errors for some Accession Numbers

We are unable to annotate our custom PSI-BLAST alignment. When we run the annotateMSA utility, it returns an error for about 20% of the accession numbers, even though they are valid and exist. Some examples include MXQ93025.1, GCC37908.1, NXC41758.1, XP_023187272.1, and GCB83467.1.

We tried breaking the file into smaller pieces in case the handler was being overloaded. We added our NCBI API key in case we were exceeding the request limit. We tried adding the Entrez parameter idtype="acc" to line 359 of scaTools.py in case determining the identifier type was the problem: handle = Entrez.esummary(db="protein", id=",".join(id_block), idtype="acc"). We reinstalled the latest versions of pySCA and Biopython, tried multiple email addresses in case it was a user issue, and ran pySCA on the most recent versions of both macOS and Windows. We got the same error every time, and exactly the same sequences fail each time. I searched thoroughly for possible Entrez errors, and Entrez does not seem to have problems handling particular accessions, which makes me think it is a pySCA issue. Thank you for the help!

To initiate the annotation we used the command:
annotateMSA -i decipher.an -o align_annotate.an -a 'ncbi' -l Acc_Num

The output error:

❯ annotateMSA -i decipher.an -o align_annotate.an -a 'ncbi' -l Acc_Num
Beginning annotation
Traceback (most recent call last):
  File "/opt/anaconda3/bin/annotateMSA", line 186, in <module>
    sca.AnnotNCBI(options.Input_MSA, options.output, options.idList)
  File "/opt/anaconda3/lib/python3.8/site-packages/pysca/scaTools.py", line 361, in AnnotNCBI
    taxonList = Entrez.read(handle)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/Entrez/__init__.py", line 508, in read
    record = handler.read(handle)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/Entrez/Parser.py", line 304, in read
    self.parser.ParseFile(handle)
  File "/opt/concourse/worker/volumes/live/71f8613d-c53a-40aa-4c7b-351131b1952c/volume/python_1599203882312/work/Modules/pyexpat.c", line 461, in EndElement
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/Entrez/Parser.py", line 666, in endErrorElementHandler
    raise RuntimeError(value)
RuntimeError: Invalid uid MXQ93025.1 at position=29

I included the accession list file as Acc_Num.txt and the alignment file as decipher.txt.
decipher.txt
Acc_Num.txt
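
Because esummary is called on a whole block of IDs, a single accession that NCBI rejects makes Entrez.read raise for the entire block, as in the traceback above. One way to find exactly which accessions fail is to bisect a failing block. The sketch below is hypothetical (isolate_bad_ids is not part of pySCA or Biopython); fetch stands for any callable that takes a list of IDs and raises RuntimeError when the block contains an invalid one.

```python
from typing import Callable, List, Sequence


def isolate_bad_ids(ids: Sequence[str],
                    fetch: Callable[[List[str]], object]) -> List[str]:
    """Return the IDs that make `fetch` raise RuntimeError, found by bisection.

    Assumes `fetch` succeeds on a block only if every ID in it is valid.
    """
    block = list(ids)
    if not block:
        return []
    try:
        fetch(block)
        return []  # the whole block resolved: nothing to report
    except RuntimeError:
        if len(block) == 1:
            return block  # a single failing ID has been isolated
    # Split the failing block in half and search each half separately.
    mid = len(block) // 2
    return isolate_bad_ids(block[:mid], fetch) + isolate_bad_ids(block[mid:], fetch)
```

With Biopython, fetch could wrap the same call used at line 359 of scaTools.py, Entrez.esummary(db="protein", id=",".join(block)), followed by Entrez.read on the returned handle. Bisection needs only about log2(n) extra requests per bad ID instead of one request per accession.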

with open(pfam_seq) OSError: [Errno 22] Invalid argument: '|'

When attempting to run the annotation step:

python ..\bin\annotateMSA -i PF00034_full_length_sequences.fasta -o PF00034_full_length_sequences.an

I get the following output/error:

Beginning annotation
Traceback (most recent call last):
  File "C:\Users\dlamm\pySCA\bin\annotateMSA", line 178, in <module>
    sca.AnnotPfam(
  File "C:\Users\dlamm\AppData\Local\Programs\Python\Python39\lib\site-packages\pysca\scaTools.py", line 180, in AnnotPfam
    with open(pfam_seq) as fp:
OSError: [Errno 22] Invalid argument: '|'

In settings.py I have:

path2pfamseq = r"C:\Users\dlamm\pfamseq.txt"

Running on:
Windows 10
Python 3.9
pySCA 6.1
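
The OSError names the argument that open() actually received: pfam_seq was the string '|', not the path set in path2pfamseq. One plausible explanation, not confirmed by the traceback, is that a delimiter argument ended up in the pfam_seq position of the AnnotPfam call. A small guard such as the hypothetical check_pfamseq_path below (not part of pySCA) would fail early with a message that shows the offending value.

```python
import os


def check_pfamseq_path(pfam_seq: object) -> str:
    """Hypothetical guard: return pfam_seq if it names an existing file, else fail clearly.

    In the report above, open() received '|' instead of the configured path;
    this check surfaces that value in the error message.
    """
    if not isinstance(pfam_seq, str) or not os.path.isfile(pfam_seq):
        raise FileNotFoundError(
            f"pfamseq file not found: {pfam_seq!r}. Check path2pfamseq in "
            "settings.py and the arguments passed to AnnotPfam."
        )
    return pfam_seq
```

Used as with open(check_pfamseq_path(pfam_seq)) as fp:, it would replace the bare Errno 22 with a message naming the bad value.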
