
pdbtools's Introduction

pdbtools

A set of tools for manipulating and doing calculations on wwPDB macromolecule structure files

PDBtools has recently changed. The original version was a set of Python scripts to be downloaded and used locally on the command line. It has now been reorganized into a Python package, and the scripts are installed globally. Thus, you can run any of the old pdb_tools from the command line from anywhere on your filesystem.

If you'd like to download the old version of pdbtools, download version v0.1.

Introduction

pdbTools is a set of command line python scripts that manipulate wwPDB protein and nucleic acid structure files. There are many programs, both open source and proprietary, that perform similar tasks; however, most of these tools are buried within programs of larger functionality. Thus, relatively simple calculations often involve learning a new program, compiling modules, and installing libraries. To fill a niche (and get the tasks done that I needed done), I started writing my own toolset. This has evolved into the pdbTools suite. The suite of programs is characterized by the following philosophy:

  • Each program should run as a stand-alone application with a standard, GNU/POSIX style command line interface.
  • Each program should be written in such a way to allow it to be used as a library of functions for more complex programs.
  • Programs should require a minimum of external dependencies.

Most of the scripts will run "out of the box" using a python interpreter. The command line parser is designed to be flexible. It will take an arbitrarily long list of pdb files, pdb ids, text files with pdb ids, or some mixture of all three. If the pdb file or id is not in the working directory, scripts will attempt to download the pdb file from RCSB. Depending on the type of operation being done, a program will either write output files in the working directory or will print to stdout. All structure outputs are written in standard pdb format. All data outputs are in fixed-width column format. They were designed to be read by the statistics package R; however, they should be easily parsed by other graphing programs.
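The download fallback described above can be sketched as follows. This is a minimal illustration, not the code pdbtools itself uses: the function names are hypothetical, the URL pattern is assumed to be RCSB's public file-download endpoint, and the sketch is written for Python 3 (`urllib.request`).

```python
import os
import urllib.request

# Assumption: RCSB's public file-download endpoint. pdbtools may use another URL.
RCSB_URL = "https://files.rcsb.org/download/{}.pdb"


def rcsb_url(pdb_id):
    """Return the download URL for a four-character pdb id."""
    pdb_id = pdb_id.strip()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("not a four-character pdb id: %r" % pdb_id)
    return RCSB_URL.format(pdb_id.upper())


def fetch_if_missing(pdb_id, directory="."):
    """Return the path to <id>.pdb, downloading it only if it is absent."""
    path = os.path.join(directory, pdb_id.lower() + ".pdb")
    if not os.path.exists(path):
        # Network access happens only when the file is not already local.
        urllib.request.urlretrieve(rcsb_url(pdb_id), path)
    return path
```

Calling `fetch_if_missing("1stn")` would leave `1stn.pdb` in the working directory, touching the network only when the file is not already there.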

Note: These scripts are only compatible with Python version 2.4-2.7.

Installation

Install the development version by cloning this repo and running pip:

pip install -e .

from inside the package.

Current functions

Miscellaneous

  • download pdb files from the RCSB database: download.py

Structure-based calculations

Geometry

Energy calculation

  • calculate coulomb energy: coulomb.py
  • calculate the dipole moment of the protein: moment.py
  • calculate pKa of ionizable groups using the Solvent-Accessibility-modified Tanford-Kirkwood method: satk.py (requires fortran compiler)

Structure properties

  • extract structure experiment properties: exper.py
  • extract protein sequence from structure: seq.py
  • calculate theoretical pI, MW, fraction titratable residues, charge: param.py

File/structure manipulation

Some of the programs are written as interfaces to other programs, CHARMM and NACCESS (http://www.bioinf.manchester.ac.uk/naccess/), which must be downloaded and installed separately if their functions are desired. To use satk.py, a set of fortran packages must be compiled.

Usage

Commandline usage

Almost all programs in the pdbTools suite have the same command-line usage:

pdb_XXXX pdb_input optional_args > output

pdb_input can be one of the following (in any arbitrary combination):

  • pdb files
  • directories of pdb files
  • four-character pdb ids
  • text files containing whitespace delimited (i.e. space, tab, carriage return) lists of any combination of the other allowed types of arguments. If the list of arguments contains pdb files or ids that do not exist locally, the parser will attempt to download the files from the RCSB database.
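The input rules above can be sketched as a small resolver. This is an illustration under stated assumptions, not the parser shipped with pdbtools: `expand_inputs` is a hypothetical name, pdb files are assumed to be recognizable by a `.pdb` extension, and any other existing file is treated as a whitespace-delimited list.

```python
import os


def expand_inputs(args):
    """Flatten a mixed argument list into (pdb file paths, pdb ids)."""
    files, ids = [], []
    for arg in args:
        if os.path.isdir(arg):
            # Directory: take every .pdb file inside it.
            for name in sorted(os.listdir(arg)):
                if name.lower().endswith(".pdb"):
                    files.append(os.path.join(arg, name))
        elif os.path.isfile(arg):
            if arg.lower().endswith(".pdb"):
                files.append(arg)
            else:
                # Text file: split on any whitespace and resolve each token.
                with open(arg) as handle:
                    tokens = handle.read().split()
                more_files, more_ids = expand_inputs(tokens)
                files.extend(more_files)
                ids.extend(more_ids)
        elif len(arg) == 4 and arg.isalnum():
            # Four-character id with no matching local file.
            ids.append(arg.lower())
        else:
            raise ValueError("unrecognized input: %r" % arg)
    return files, ids
```

The ids returned here are the ones a real parser would then try to download. A list file that names itself would recurse forever in this sketch, so a production version would need to track the files it has already visited.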

optional_args: Although the arguments to each program are identical, the options differ considerably depending on the program's requirements. The best way to learn how to use a particular program is to type pdb_XXXX --help, which prints a list of available options. In most cases, the options are actually optional: the program will use a sane default if none is specified. In some cases (notably mutator.py), options must be specified for the program to run.

output: Most scripts write a pdb file to standard out. This can be captured using the ">" redirect. Some write an output file named after the input pdb file with a descriptive suffix appended (e.g. close-contacts.py 1stn.pdb creates a file called 1stn.pdb.close_contacts).
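When one of these programs is driven from another Python script rather than from a shell, the ">" redirect can be reproduced with the standard library. The helper below is a hypothetical sketch (`run_to_file` is not part of pdbtools), and `pdb_XXXX` in the usage note is the same placeholder used above.

```python
import subprocess


def run_to_file(command, out_path):
    """Run a command and write its standard output to out_path."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, stdout=out, check=True)
    return out_path
```

For example, `run_to_file(["pdb_XXXX", "1stn.pdb"], "output.pdb")` would have the same effect as `pdb_XXXX 1stn.pdb > output.pdb`.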

API

Version 0.2 has moved all pdbtools into a set of modules. These can be used to develop new scripts easily.

Note: You can download the original pdbtools scripts (prior to packaging) here.

Third Party Software

Some scripts require installation of third-party programs. These should be installed according to the instructions given by the third-party, then placed into the $PATH variable. To use the scripts that require CHARMM, the $CHARMM environment variable must be set to the directory containing the charmm binary and the $CHARMM_LIB environment variable to the directory containing the charmm parameter files.
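Before running a CHARMM-dependent script, the two environment variables named above can be checked with a few lines of Python. This is a hypothetical pre-flight check, not something pdbtools provides. It only verifies that each variable is set and points to an existing directory.

```python
import os


def check_charmm_env(env=None):
    """Return a list of problems with the CHARMM environment variables."""
    env = os.environ if env is None else env
    problems = []
    for name in ("CHARMM", "CHARMM_LIB"):
        value = env.get(name)
        if not value:
            problems.append("$%s is not set" % name)
        elif not os.path.isdir(value):
            problems.append("$%s is not a directory: %s" % (name, value))
    return problems


if __name__ == "__main__":
    for problem in check_charmm_env():
        print(problem)
```

An empty list means both variables look usable. The check does not confirm that the charmm binary or the parameter files are actually present in those directories.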

Contributing

If you find a bug or have an idea for a program you'd like in this package, feel free to open an issue. Even better: feel free to make a pull request!

Project Owner

Mike Harms (https://github.com/harmsm, http://harmslab.uoregon.edu)

pdbtools's People

Contributors

deborahharrus, harmsm, zsailer


pdbtools's Issues

pkg_resources.DistributionNotFound

Hello pdbtools author,

I installed pdbtools in anaconda/bin directory in CentOS 7 system. When I run the scripts of pdbtools, I get the "pkg_resources.DistributionNotFound" error. For example:

pdb_residue_renumber input.pdb > output.pdb

I get the error information: "pkg_resources.DistributionNotFound: The 'pdbtools==0.1' distribution was not found and is required by the application"

Could you tell me how to solve the problem?

Renumbering residues from 1

I am trying to renumber the residue numbers starting from 1 (first residue should be 1) for the attached protein https://www.dropbox.com/s/z7xj629fhq42ey3/test.pdb?dl=0

If I use
pdb_residue_renumber test.pdb

it doesn't number the residues properly. Also, when I tried to use the -s START_RES flag with the following command
pdb_residue_renumber test.pdb -s 1 -r False -g True

I got this error:

ERROR! false could not be retrieved!
ERROR! true could not be retrieved!
Usage: pdb_residue_renumber [options] pdb files and directories with pdb files
pdb_residue_renumber: error: pdb could not be found on rcsb!

Any help or pointing towards some solution in this matter will be highly appreciated.

Best regards,
Bhakat
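One possible reading of the error above, not confirmed against the pdbtools source: the usage message has the shape produced by Python's optparse, and if -r and -g are on/off switches, they take no value. The words False and True would then be left over as positional arguments and treated as pdb inputs. The sketch below reproduces that behavior with hypothetical options. Only -s, -r and -g come from the post, and what they mean is an assumption.

```python
from optparse import OptionParser


def parse_example(argv):
    """Parse argv with hypothetical options that mimic the reported command."""
    parser = OptionParser(usage="%prog [options] pdb files")
    # Assumed meanings, for illustration only.
    parser.add_option("-s", "--start", dest="start", type="int", default=1)
    parser.add_option("-r", dest="flag_r", action="store_true", default=False)
    parser.add_option("-g", dest="flag_g", action="store_true", default=False)
    return parser.parse_args(argv)


options, args = parse_example(
    ["test.pdb", "-s", "1", "-r", "False", "-g", "True"]
)
# Both switches are on, and "False"/"True" end up among the positional inputs.
print(options.flag_r, options.flag_g, args)
```

If that reading is right, passing the switches without values (or omitting them) would avoid the failed lookups. Checking pdb_residue_renumber --help is the way to confirm what each option expects.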

Python2 artifacts in Python3 syntax

After the commit that ran 2to3, the base syntax of the scripts supports only Python 3.x; however, some Python 2-only expressions remain.

For example the reference to string.letters in clean.py.

I haven't checked all files so there might be more instances of python2 expressions remaining.

As of now, pdb_clean works on neither Python 2 nor Python 3.
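For the string.letters case specifically, a portable replacement exists: `string.ascii_letters` is available in both Python 2 and Python 3, whereas `string.letters` was removed in Python 3. How clean.py actually uses the constant is not shown here, so the helper below is only a hypothetical illustration.

```python
import string

# string.letters is Python 2 only and locale-dependent;
# string.ascii_letters works on both Python 2 and Python 3.
LETTERS = string.ascii_letters


def is_letter(char):
    """Return True if char is a single ASCII letter."""
    return len(char) == 1 and char in LETTERS
```

Note that `string.ascii_letters` never includes locale-specific characters, which `string.letters` could under some Python 2 locales.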

Module errors when installed within conda environment

I want to install this package under a conda environment but have been encountering syntax errors when I try to run the scripts.

I first attempted to install with the following commands:
conda create -n pdbtools python=2.7
conda activate pdbtools
pip install git+git://github.com/harmslab/pdbtools

However when I then try and run a command from the package I get the following error:
pdb_bfactor

Traceback (most recent call last):
  File "/Users/samuelhaysom/opt/anaconda3/envs/pdbtools/bin/pdb_bfactor", line 19, in <module>
    from pdbtools.helper import cmdline
  File "/Users/samuelhaysom/opt/anaconda3/envs/pdbtools/lib/python2.7/site-packages/pdbtools/__init__.py", line 1, in <module>
    from . import addH, atom_renumber, bfactor, centerasu, centermass, clean, closecontacts, contact, contactplot, coulomb, dist_filter, disulfide, download, exper, iondist, ligand, moment, mutator, neighbors, offset, oligomer, param, residue_renumber, sasa, satk, seq, splitnmr, subset, torsion, watercontact, helper
  File "/Users/samuelhaysom/opt/anaconda3/envs/pdbtools/lib/python2.7/site-packages/pdbtools/addH.py", line 14, in <module>
    from .helper import container, cmdline, geometry
ImportError: No module named helper

I then tried to install as in the instructions for this repo, after first creating and activating a conda environment as follows:
conda create -n pdbtools python=2.7
conda activate pdbtools
git clone https://github.com/harmslab/pdbtools
cd pdbtools
pip install -e .

I then got a different error:
Traceback (most recent call last):
  File "/Users/samuelhaysom/opt/anaconda3/envs/pdbtools/bin/pdb_bfactor", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/Users/samuelhaysom/pdbtools/scripts/pdb_bfactor", line 19, in <module>
    from pdbtools.helper import cmdline
  File "/Users/samuelhaysom/pdbtools/pdbtools/__init__.py", line 1, in <module>
    from . import addH, atom_renumber, bfactor, centerasu, centermass, clean, closecontacts, contact, contactplot, coulomb, dist_filter, disulfide, download, exper, iondist, ligand, moment, mutator, neighbors, offset, oligomer, param, residue_renumber, sasa, satk, seq, splitnmr, subset, torsion, watercontact, helper
  File "/Users/samuelhaysom/pdbtools/pdbtools/addH.py", line 16, in <module>
    from .disulfide import pdbDisulfide
  File "/Users/samuelhaysom/pdbtools/pdbtools/disulfide.py", line 19, in <module>
    from .clean import stripACS
  File "/Users/samuelhaysom/pdbtools/pdbtools/clean.py", line 269
    print(log[-1], end=' ')
                      ^
SyntaxError: invalid syntax

I'm not sure if there is a simple solution here, but may I suggest that at some point the package be made available through either PyPI or Anaconda, to make installation easier and the package more accessible?
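The SyntaxError in the second traceback is what Python 2.7 reports for the Python 3 form `print(..., end=' ')`. Two general ways exist to make such a line run on both interpreters: placing `from __future__ import print_function` as the first statement of the module, or writing to the stream directly. The sketch below shows the second approach. `emit` is a hypothetical helper, not a function from clean.py.

```python
import sys


def emit(text, stream=None):
    """Write text followed by a space instead of a newline."""
    stream = sys.stdout if stream is None else stream
    # Equivalent in effect to print(text, end=' ') on Python 3.
    stream.write(str(text) + " ")
```

This only addresses the single line named in the traceback. Other Python 3-only constructs may remain elsewhere in the package.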

reg pdb_contact.py

Hi, I am not sure about the unit of the distance cutoff. My protein's average width is ~90 A, which is equivalent to a cutoff of ~1000. Is it some arbitrary unit?

Does pdbtools support the free version of CHARMM (version 40)?

Recently, Chemistry at HARvard Macromolecular Mechanics released a free version of CHARMM, called 'charmm'. I tried to use the free version of charmm with pdbtools to clean a pdb file, but it failed.
The script ./pdb_clean.py shows:
CharmmInterfaceError It appears that CHARMM has failed. Input written to: charmm.inp Output writen to: charmm.out
Please tell me how I can solve this problem.

https://www.charmm.org/charmm/showcase/news/free-charmm/
