
rdkit / rdkit


The official sources for the RDKit library

License: BSD 3-Clause "New" or "Revised" License

CMake 1.14% C++ 23.93% Python 5.69% C 2.46% JavaScript 0.20% HTML 64.41% Makefile 0.01% QMake 0.01% Smarty 0.01% LLVM 0.06% Shell 0.01% Java 0.58% C# 0.07% Fortran 0.01% Yacc 0.09% Lex 0.01% SMT 0.01% Jupyter Notebook 0.75% Dockerfile 0.01% SWIG 0.56%
cheminformatics c-plus-plus python rdkit

rdkit's Introduction

RDKit


RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.

  • BSD license - a business friendly license for open source
  • Core data structures and algorithms in C++
  • Python 3.x wrapper generated using Boost.Python
  • Java and C# wrappers generated with SWIG
  • 2D and 3D molecular operations
  • Descriptor and Fingerprint generation for machine learning
  • Molecular database cartridge for PostgreSQL supporting substructure and similarity searches as well as many descriptor calculators
  • Cheminformatics nodes for KNIME
  • Contrib folder with useful community-contributed software harnessing the power of the RDKit

Community

Code

Web presence

Materials from user group meetings

Documentation

Available on the RDKit page and in the Docs folder on GitHub

Installation

Installation instructions are available in Docs/Book/Install.md.

Binary distributions, anaconda, homebrew

  • binaries for conda Python; if you are using the conda-forge stack, the RDKit is also available from conda-forge.
  • RPMs for Red Hat Enterprise Linux, CentOS, and Fedora. Contributed by Gianluca Sforna.
  • debs for Ubuntu and other Debian-derived Linux distros. Contributed by the Debichem team.
  • homebrew formula for building on the Mac. Contributed by Eddie Cao.
  • recipes for building using the excellent conda package manager. Contributed by Riccardo Vianello.
  • APKs for Alpine Linux. Contributed by da Verona.
  • Wheels at PyPI for all major platforms and Python versions. Contributed by Christopher Kuenneth.

Projects using RDKit

  • ROBERT - Automated Machine Learning Protocols
  • AQME - Automated Quantum Mechanical Environment
  • chemprop - message passing neural networks for molecular property prediction
  • RMG - Reaction Mechanism Generator
  • RDMC - Reaction Data and Molecular Conformers - package for dealing with reactions, molecules, conformers, mainly in 3D
  • pychemprojections - python library for visualizing various 2D projections of molecules.
  • pychemovality - python library for estimating the ovality of molecules.
  • ChEMBL Structure Pipeline - ChEMBL protocols used to standardise and salt strip molecules.
  • FPSim2 - Simple package for fast molecular similarity searches.
  • Datamol (docs, repo) - A Python library to intuitively manipulate molecules.
  • Scopy (docs, paper) - an integrated negative design Python library for desirable HTS/VS database design
  • stk (docs, paper) - a Python library for building, manipulating, analyzing and automatic design of molecules.
  • gpusimilarity - A Cuda/Thrust implementation of fingerprint similarity searching
  • Samson Connect - Software for adaptive modeling and simulation of nanosystems
  • mol_frame - Chemical Structure Handling for Dask and Pandas DataFrames
  • RDKit.js - The official JavaScript release of RDKit
  • DeepChem - python library for deep learning for chemistry
  • mmpdb - Matched molecular pair database generation and analysis
  • CheTo (paper) - Chemical topic modeling
  • OCEAN (paper) - Optimized cross reactivity estimation
  • ChEMBL Beaker - standalone web server wrapper for RDKit and OSRA
  • ZINC - Free database of commercially-available compounds for virtual screening
  • sdf_viewer.py - an interactive SDF viewer
  • sdf2ppt - Reads an SDFile and displays molecules as image grid in powerpoint/openoffice presentation.
  • MolGears - A cheminformatics tool for bioactive molecules
  • PYPL - Simple cartridge that lets you call Python scripts from Oracle PL/SQL.
  • shape-it-rdkit - Gaussian molecular overlap code shape-it (from silicos it) ported to RDKit backend
  • WONKA - Tool for analysis and interrogation of protein-ligand crystal structures
  • OOMMPPAA - Tool for directed synthesis and data analysis based on protein-ligand crystal structures
  • OCEAN - web-tool for target-prediction of chemical structures which uses ChEMBL as datasource
  • chemfp - very fast fingerprint searching
  • rdkit_ipynb_tools - RDKit Tools for the IPython Notebook
  • Vernalis KNIME nodes
  • Erlwood KNIME nodes
  • AZOrange

License

Code released under the BSD license.

rdkit's People

Contributors

adalke, alexandersavelyev, apahl, avaucher, bp-kelley, coleb, d-b-w, daenuprobst, davidacosgrove, e-kwsm, gedeck, greglandrum, ichirutake, jasondbiggs, jlvarjo, jones-gareth, k-ujihara, mcs07, mwojcikowski, nadineschneider, ptosco, rachelnwalker, ricrogz, rvianello, samoturk, sriniker, tadhurst-cdd, thegodone, unixjunkie, vfscalfani


rdkit's Issues

Cannot generate coordinates for output from DeleteSubstructs

As the example shows, this came from problems with salt stripping and is not helped by sanitization.

In [15]: m = Chem.MolFromSmiles('[I-].C[n+]1c(\\C=C\\2/C=CC=CN2CC=C)sc3ccccc13') 

In [16]: sr = SaltRemover.SaltRemover()

In [17]: nm =sr(m)

In [18]: AllChem.Compute2DCoords(nm)
[05:47:08] 

****
Pre-condition Violation

Violation occurred on line 656 in file /scratch/RDKit_trunk/Code/GraphMol/Depictor/EmbeddedFrag.cpp
Failed Expression: d_eatoms.find(aid) == d_eatoms.end()
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-fbe6edb91321> in <module>()
----> 1 AllChem.Compute2DCoords(nm)

RuntimeError: Pre-condition Violation

In [19]: Chem.SanitizeMol(nm)
Out[19]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [20]: AllChem.Compute2DCoords(nm)
[05:47:18] 

****
Pre-condition Violation

Violation occurred on line 656 in file /scratch/RDKit_trunk/Code/GraphMol/Depictor/EmbeddedFrag.cpp
Failed Expression: d_eatoms.find(aid) == d_eatoms.end()
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-fbe6edb91321> in <module>()
----> 1 AllChem.Compute2DCoords(nm)

RuntimeError: Pre-condition Violation

In [21]: p = Chem.MolFromSmiles('[I-]')

In [22]: Chem.DeleteSubstructs(m,p)
Out[22]: <rdkit.Chem.rdchem.Mol at 0x3630980>

In [23]: nm2=Chem.DeleteSubstructs(m,p)

In [24]: AllChem.Compute2DCoords(nm2)
[05:47:57] 

****
Pre-condition Violation

Violation occurred on line 656 in file /scratch/RDKit_trunk/Code/GraphMol/Depictor/EmbeddedFrag.cpp
Failed Expression: d_eatoms.find(aid) == d_eatoms.end()
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-c2a107eeb439> in <module>()
----> 1 AllChem.Compute2DCoords(nm2)

RuntimeError: Pre-condition Violation

In [25]: Chem.MolToSmiles(nm2,True)
Out[25]: 'C=CCN1C=CC=C/C1=C\\c1sc2ccccc2[n+]1C'

Double bond stereochemistry not preserved in reactions.

Reported by Sabrina Syeda.
Thread here: http://www.mail-archive.com/[email protected]/msg03080.html

>>rxn = AllChem.ReactionFromSmarts('[CX4:4][CH1:3]=[CH1:2][CX4:5].[Br:1]>>[C:5][C:2]=[C:3][C:4][Br:1]')
>>rxn.Initialize()
>>r = [Chem.MolFromSmiles('CCC\C=C\C(C)C'), Chem.MolFromSmiles('Br')]
>>ps = rxn.RunReactants(tuple(r))
>> for p in ps:
    ...:     for m in p:
    ...:         print Chem.MolToSmiles(m, isomericSmiles= True)
    ...:         
[out] CCC(Br)C=CC(C)C
[out] CCCC=CC(C)(C)Br
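
A side note on the reproduction above: SMILES uses `\` and `/` to mark double-bond stereochemistry, and a backslash inside an ordinary Python string literal can be read as the start of an escape sequence. A minimal sketch, using only the standard library, of two safe spellings of the same SMILES text (the variable names are illustrative):

```python
# Two spellings of the same SMILES; both keep the literal backslashes.
raw_smiles = r'CCC\C=C\C(C)C'        # raw string: backslashes are literal
escaped_smiles = 'CCC\\C=C\\C(C)C'   # doubled backslashes, as in the first issue above

same_text = raw_smiles == escaped_smiles
backslash_count = raw_smiles.count('\\')
print(same_text, backslash_count)
```

Either spelling passes the same characters to the SMILES parser, so the choice is a matter of readability.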

Docs for Descriptors.MolWt are wrong

In [21]: Descriptors.MolWt?
Type:       function
String Form:<function <lambda> at 0x2f70320>
File:       /scratch/RDKit_trunk/rdkit/Chem/Descriptors.py
Definition: Descriptors.MolWt(*x, **y)
Docstring:
The average molecular weight of the molecule ignoring hydrogens

>>> MolWt(Chem.MolFromSmiles('CC'))
30.07...
>>> MolWt(Chem.MolFromSmiles('[NH4+].[Cl-]'))
53.49...
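
The docstring says hydrogens are ignored, yet both doctest values only work out if hydrogens are counted. A quick arithmetic check, as a sketch: the atomic weights below are rounded standard values assumed for this example, and `average_weight` is a hypothetical helper, not an RDKit function.

```python
# Rounded standard atomic weights (assumed values for this sketch).
ATOMIC_WEIGHT = {'H': 1.008, 'C': 12.011, 'N': 14.007, 'Cl': 35.45}

def average_weight(counts, include_hydrogens=True):
    """Sum atomic weights for a dict of element symbol -> atom count."""
    return sum(ATOMIC_WEIGHT[symbol] * n
               for symbol, n in counts.items()
               if include_hydrogens or symbol != 'H')

ethane = {'C': 2, 'H': 6}                      # SMILES 'CC'
ammonium_chloride = {'N': 1, 'H': 4, 'Cl': 1}  # SMILES '[NH4+].[Cl-]'

print(round(average_weight(ethane), 2))                           # with hydrogens
print(round(average_weight(ethane, include_hydrogens=False), 2))  # heavy atoms only
print(round(average_weight(ammonium_chloride), 2))                # with hydrogens
```

With hydrogens counted the sums come to about 30.07 and 53.49, which matches the doctest values; with heavy atoms only, ethane would come to about 24.02.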

MCS code does not support stereochemistry

Thread here: http://www.mail-archive.com/[email protected]/msg02934.html

In [2]: mol1 = Chem.MolFromSmiles("Fc1ccc(cc1)[C@@]3(OCc2cc(C#N)ccc23)CCCN(C)C") 
In [3]: mol2 = Chem.MolFromSmiles("Fc1ccc(cc1)[C@]3(OCc2cc(C#N)ccc23)CCCN(C)C")

In [4]: from rdkit.Chem import MCS

In [6]: MCS.FindMCS((mol1,mol2))
Out[6]: MCSResult(numAtoms=24, numBonds=26, smarts='[F]-[#6]:1:[#6]:[#6]:[#6](-[#6]-2(-[#6]-[#6]-[#6]-[#7](-[#6])-[#6])-[#8]-[#6]-[#6]:3:[#6]:[#6](:[#6]:[#6]:[#6]:3-2)-[#6]#[#7]):[#6]:[#6]:1', completed=1)

SDWriter initialized on a file object can produce an unhandled C++ exception

Not calling the flush method of an SDWriter initialized on a file object may produce an unhandled exception and terminate the interpreter:

 In [7]: with open('xyz.sdf', 'w') as xyz:
    ...:     w = Chem.SDWriter(xyz)
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     
 terminate called after throwing an instance of 'boost::python::error_already_set' 
 Aborted
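
The crash is consistent with the writer still holding buffered output when the `with` block closes the file. The sketch below reproduces that ordering problem with a hypothetical stand-in class, `ToyRecordWriter`, written only for illustration; it is not the SDWriter implementation. In RDKit itself the corresponding workaround would be to call the writer's `flush()` (or `close()`) before leaving the `with` block.

```python
import io

class ToyRecordWriter:
    """Hypothetical stand-in for a writer that buffers records in memory."""
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.pending = []

    def write(self, record):
        self.pending.append(record)

    def flush(self):
        # Raises ValueError if the underlying file is already closed.
        self.fileobj.write(''.join(self.pending))
        self.pending.clear()

# Wrong order: the file is closed before the writer flushes.
with io.StringIO() as out:
    late = ToyRecordWriter(out)
    late.write('c1ccccc1\n')
try:
    late.flush()
    late_error = None
except ValueError as exc:
    late_error = str(exc)

# Right order: flush while the file is still open.
with io.StringIO() as out:
    w = ToyRecordWriter(out)
    w.write('c1ccccc1\n')
    w.flush()
    written = out.getvalue()

print(late_error)
print(repr(written))
```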

Molecules from InChI have incorrect molecular weight.

In [24]: m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')

In [25]: em = Chem.EditableMol(m)

In [26]: em.RemoveBond(8,7)

In [27]: nm = em.GetMol()

In [29]: frags = Chem.GetMolFrags(nm,asMols=True)

In [30]: [Descriptors.MolWt(x) for x in frags]
Out[30]: [5.04, 6.048]

It doesn't always happen though:

In [31]: m = Chem.MolFromSmiles('CO')
In [32]: em = Chem.EditableMol(m)

In [33]: em.RemoveBond(0,1)

In [34]: nm = em.GetMol()

In [35]: frags = Chem.GetMolFrags(nm,asMols=True)

In [36]: [Descriptors.MolWt(x) for x in frags]
Out[36]: [16.043, 18.015]

Compatibility with sdf files served by the PDB

SDFs provided by the PDB (Protein Data Bank) have a slightly different format than what RDKit is expecting.

Example file that would fail with the old code
http://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=XK2&structIdList=1HVR&instanceType=all&excludeUnobserved=false&includeHydrogens=false

The old error was:


Post-condition Violation
Element '' not found
Violation occurred on line 91 in file /home/jandom/workspace/rdkit/Code/GraphMol/PeriodicTable.h
Failed Expression: anum>-1


Hashed topological torsion fingerprints not compatible with old version.

2012_12_1:

In [9]: AllChem.GetHashedTopologicalTorsionFingerprint(Chem.MolFromSmiles('CCCCO'),nBits=4192).GetNonzeroElements()
Out[9]: {544: 1, 1760: 1}

2013_03_1:

In [3]: AllChem.GetHashedTopologicalTorsionFingerprint(Chem.MolFromSmiles('CCCCO'),nBits=4192).GetNonzeroElements()
Out[3]: {1974: 1, 3516: 1}

There's no good reason for this to be the case.

Incorrect atom labels from BRICS

In [4]: m = Chem.MolFromSmiles('CCOC1(C)CCCCC1')

In [5]: Chem.MolToSmiles(BRICS.BreakBRICSBonds(m),True)
Out[5]: '[3*]O[3*].[4*]CC.[4*]C1(C)CCCCC1'

(dupe of sf.net issue 287 to experiment with github issue tracking)

Incorrect InChIs after clearing computed properties

(reported by Francis Atkinson)

from __future__ import print_function

from rdkit import Chem

old_mol=Chem.MolFromMolBlock("""
  Marvin  02211109112D

 13 12  0  0  0  0            999 V2000
   -0.7607  -10.6459    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0457  -10.2343    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6692  -10.6459    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    1.3843  -10.2343    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    2.0993  -10.6459    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    2.8142  -10.2343    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   -1.4740  -10.2352    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6692  -11.4731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.3843   -9.4072    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0993  -11.4731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.5317  -10.6451    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2440  -10.2326    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8132   -9.4072    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  3  4  1  0  0  0  0
  4  9  1  1  0  0  0
  1  2  1  0  0  0  0
  5 10  1  1  0  0  0
  4  5  1  0  0  0  0
  6 11  1  0  0  0  0
 11 12  1  0  0  0  0
  5  6  1  0  0  0  0
  6 13  1  6  0  0  0
  2  3  1  0  0  0  0
  1  7  1  0  0  0  0
  3  8  1  1  0  0  0
M  END
""")



main_mol = Chem.DeleteSubstructs(old_mol,Chem.MolFromSmiles('O=[Sb](=O)O'))
main_mol.ClearComputedProps()
Chem.SanitizeMol(main_mol)
print(Chem.MolToSmiles(old_mol,True))
print(Chem.MolToSmiles(main_mol,True))

old_mol.Debug()
main_mol.Debug()

print(Chem.MolToInchi(old_mol))
print(Chem.MolToInchi(main_mol))


logLevel bug in MolFromInchi

There is a typo in inchi.py: "logLogLevel" is used where it should be "logLevel". Here is a reproducible example:

>>> Chem.MolFromInchi("InChI=1S/CH2/h1H2", logLevel=100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "..../site-packages/rdkit/Chem/inchi.py", line 76, in MolFromInchi
    if logLogLevel not in logLevelToLogFunctionLookup:
NameError: global name 'logLogLevel' is not defined

Reported by Andrew Dalke
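
A minimal sketch of what the corrected check could look like, using the standard `logging` module. The contents of the lookup table and the function `pick_log_function` are assumptions for illustration; they are not copied from inchi.py.

```python
import logging

# Assumed mapping from level to logging function (illustrative only).
logLevelToLogFunctionLookup = {
    logging.INFO: logging.info,
    logging.WARNING: logging.warning,
    logging.ERROR: logging.error,
}

def pick_log_function(logLevel=None):
    """Return the logging function for logLevel, or None when logging is off."""
    if logLevel is None:
        return None
    # The reported bug: this check referenced 'logLogLevel' instead of 'logLevel'.
    if logLevel not in logLevelToLogFunctionLookup:
        raise ValueError("Unsupported log level: %d" % logLevel)
    return logLevelToLogFunctionLookup[logLevel]

try:
    pick_log_function(100)
    outcome = 'accepted'
except ValueError as exc:
    outcome = str(exc)
print(outcome)
```

With the name spelled correctly, an unsupported level such as 100 produces a deliberate error rather than a NameError.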

mol fails to transfer to inchi format

from rdkit import Chem
from rdkit.Chem import BRICS
m1 = Chem.inchi.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
m2 = BRICS.BreakBRICSBonds(m1)
Chem.MolToSmiles(m2,True)

I got

'[14*]c1nnc(C)nc1O.[16*]c1ccccc1'.

But when I try to get inchi format

Chem.inchi.MolToInchi(m2)

I got

[23:56:23] ERROR: Unknown element(s): *
''

By the way,
res = list(BRICS.FindBRICSBonds(m1))
res

I got
[((8, 7), ('14', '16'))]

What are '14' and '16'?

Thanks.

Bad ring query matches for molecules from MolFromSmarts

In [5]: Chem.MolFromSmiles('c:1:c:c:c:c:c1').HasSubstructMatch(Chem.MolFromSmarts('[R2]~[R1]~[R2]'))
Out[5]: False

In [6]: Chem.MolFromSmarts('c:1:c:c:c:c:c1').HasSubstructMatch(Chem.MolFromSmarts('[R2]~[R1]~[R2]'))
Out[6]: True

In [7]: Chem.MolFromSmarts('ccc').HasSubstructMatch(Chem.MolFromSmarts('[R2]~[R1]~[R2]'))
Out[7]: True

Support for Pillow/PIL fork

Pillow is a "friendly" fork of PIL. Effectively it replaces PIL.

It's nearly completely backwards compatible, except that it places things under the "PIL" module. Things like "import Image" need to be "from PIL import Image".

Supporting it takes a couple of lines in rdkit/sping/PIL/pidPIL.py. Change:

import Image, ImageFont, ImageDraw

to:

try:
    from PIL import Image, ImageFont, ImageDraw
except ImportError:
    import Image, ImageFont, ImageDraw
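
The same try/except fallback generalizes to any renamed module. A small sketch using `importlib`: the helper `import_first` is hypothetical, and the demonstration falls back to the standard-library `json` module so that it runs without PIL or Pillow installed.

```python
import importlib

def import_first(*module_names):
    """Import and return the first module in module_names that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %s could be imported" % ", ".join(module_names))

# 'no_such_module_for_demo' stands in for a package that is not installed.
mod = import_first('no_such_module_for_demo', 'json')
print(mod.__name__)
```

For the case discussed here the call would be `import_first('PIL.Image', 'Image')`.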

BaseFeatures_DIP2_NoMicroSpecies.fdef not parseable

In [7]: ffact = ChemicalFeatures.BuildFeatureFactory('./BaseFeatures_DIP2_NoMicrospecies.fdef')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-be7869918589> in <module>()
----> 1 ffact = ChemicalFeatures.BuildFeatureFactory('./BaseFeatures_DIP2_NoMicrospecies.fdef')

ValueError:  pattern->getNumAtoms() != len(feature weight vector)

Initial take on the USR Descriptor (no tests)

Hi Greg,

Here is my take on the USR Descriptor. It's 2x faster than Adrian's implementation, but the code is a lot less clear. It can probably be improved quickly. The numerics agree with Adrian's, but I should probably add a test case.

Jan

aromatic Si written in SMILES, but cannot be read

In [2]: Chem.MolFromSmiles('Cc1cc[si](-c2cccc3ccc4cc5ccccc5cc4c32)[si](C)n1')
[04:48:35] SMILES Parse Error: syntax error for input: Cc1cc[si](-c2cccc3ccc4cc5ccccc5cc4c32)[si](C)n1

In [3]: Chem.MolFromSmiles('Cc1cc[Si](-c2cccc3ccc4cc5ccccc5cc4c32)[Si](C)n1')
Out[3]: <rdkit.Chem.rdchem.Mol at 0x242d440>

In [5]: Chem.CanonSmiles('C1=CC=CC=[Si]1')
Out[5]: 'c1cc[si]cc1'

SDWriter failing with bad boost::any_cast on windows

[reported by Paul C]

In [10]: from rdkit import Chem

In [11]: from rdkit.Chem import Descriptors

In [12]: from rdkit.ML.Descriptors import MoleculeDescriptors

In [13]: m = Chem.MolFromSmiles('CC')

In [15]: nms=[x[0] for x in Descriptors._descList]

In [16]: calc = MoleculeDescriptors.MolecularDescriptorCalculator(nms)

In [17]: ds= calc.CalcDescriptors(m)

In [18]: w=Chem.SDWriter('blah.sdf')

In [19]: w.write(m)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-19-4b04ce05d7ef> in <module>()
----> 1 w.write(m)

RuntimeError: boost::bad_any_cast: failed conversion using boost::any_cast

Stereochemistry lost for reacting atoms that don't change connectivity

Reported by Robert Feinstein in this thread: http://www.mail-archive.com/[email protected]/msg02908.html

# Demo of RDKit reaction transform nuking stereocenters
from rdkit import Chem
from rdkit.Chem import AllChem

# Define simple transform that includes possible stereocenter ([C:2])
rxn = AllChem.ReactionFromSmarts('[C:2][C:1]=O>>[C:2][C:1]=S')

# React achiral mol as test
ps = rxn.RunReactants((Chem.MolFromSmiles('CC=O'),))
Chem.MolToSmiles( ps[0][0], isomericSmiles=True )
# Output is 'CC=S'

# React mol with chiral center far removed
ps = rxn.RunReactants((Chem.MolFromSmiles('[Cl][C@H]([Br])CCCC=O'),))
Chem.MolToSmiles( ps[0][0], isomericSmiles=True )
# Output is 'S=CCCC[C@H](Cl)Br'

# React mol with chiral center included in transform component
ps = rxn.RunReactants((Chem.MolFromSmiles('[Cl][C@H](C=O)'),))
Chem.MolToSmiles( ps[0][0], isomericSmiles=True )
# Output is 'S=CCCl' - chirality has been lost.

improper behavior for empty SDMolSuppliers

This is reasonable:

[14]>>> s = Chem.SDMolSupplier()

[15]>>> s.SetData("")

[16]>>> s.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
/scratch/RDKit_sf/<ipython-input-16-5e5e6532ea26> in <module>()
----> 1 s.next()

StopIteration: End of supplier hit

But this is bad:

[11]>>> s = Chem.SDMolSupplier()

[12]>>> s.SetData("")

[13]>>> len(s)
  [13]: 1

as is this:

[17]>>> s = Chem.SDMolSupplier()

[18]>>> s.SetData("")

[19]>>> s[0]

[20]>>>

and this is incomprehensible:

[17]>>> s = Chem.SDMolSupplier()

[18]>>> s.SetData("")

[19]>>> s[0]

[20]>>> s[1]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/scratch/RDKit_sf/<ipython-input-20-88de191fe097> in <module>()
----> 1 s[1]

IndexError: invalid index

[21]>>> s.SetData("")

[22]>>> s[0]

[23]>>> s.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
/scratch/RDKit_sf/<ipython-input-23-5e5e6532ea26> in <module>()
----> 1 s.next()

StopIteration: End of supplier hit

[24]>>> len(s)
  [24]: 2
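
For comparison, one consistent set of semantics for an empty supplier would be length 0, an `IndexError` on any index, and `StopIteration` on the first `next()`. The class below is a toy, pure-Python stand-in (`ToySupplier` is hypothetical and only splits text on the `$$$$` record separator). It is meant to pin down the expected behavior, not to describe how SDMolSupplier works.

```python
class ToySupplier:
    """Toy record supplier: splits text on the SD record separator '$$$$'."""
    def __init__(self):
        self.records = []
        self.position = 0

    def SetData(self, text):
        # Keep only non-blank records, so empty input yields no records.
        self.records = [r for r in text.split('$$$$') if r.strip()]
        self.position = 0

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        if not 0 <= index < len(self.records):
            raise IndexError('invalid index')
        return self.records[index]

    def __next__(self):
        if self.position >= len(self.records):
            raise StopIteration('End of supplier hit')
        record = self.records[self.position]
        self.position += 1
        return record

s = ToySupplier()
s.SetData('')
empty_len = len(s)
try:
    s[0]
    index_result = 'returned a value'
except IndexError:
    index_result = 'IndexError'
print(empty_len, index_result, len(s))
```

Note that in this sketch a failed index lookup leaves the length unchanged, which is the property the last transcript above violates.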

MolFromInchi doesn't work

I am using Python 2.7.3
from rdkit import Chem
m2 = Chem.inchi.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
I got
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'MolFromInchi'
But if I use MolFromSmiles
from rdkit import Chem
m2 = Chem.MolFromSmiles('C1CCC1')
It works.

Added USR descriptor

  • Ultrafast Shape Descriptor,
  • access via rdkit.Chem.Descriptors.USR,
  • added unit tests (sanity and numeric),
  • also some docs.

RemoveAtoms with chiral centers causes problems in SMILES generation

In [2]: smiles = "CCN1CCN(c2cc3[nH]c(C(=O)[C@@]4(CC)CC[C@](C)(O)CC4)nc3cc2Cl)CC1"

In [3]: mol = Chem.MolFromSmiles(smiles)

In [4]: tmp = Chem.EditableMol(mol)

In [5]: for atom in [29, 28, 27, 26, 25, 24, 8, 7, 6, 5, 4, 3, 2, 1, 0]: tmp.RemoveAtom(atom)

In [6]: mol = tmp.GetMol()

In [8]: Chem.MolToSmiles(mol)
[05:00:49] 

****
Range Error
idx
Violation occurred on line 153 in file /scratch/RDKit_git/Code/GraphMol/ROMol.cpp
Failed Expression: 0 <= 18 <= 14
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-2ae14fafa41a> in <module>()
----> 1 Chem.MolToSmiles(mol)

RuntimeError: Range Error

Remove the chiral spec and things work fine:

In [13]: smiles2 = "CCN1CCN(c2cc3[nH]c(C(=O)[C]4(CC)CC[C](C)(O)CC4)nc3cc2Cl)CC1"

In [14]: mol = Chem.MolFromSmiles(smiles2)

In [15]: tmp = Chem.EditableMol(mol)

In [16]: for atom in [29, 28, 27, 26, 25, 24, 8, 7, 6, 5, 4, 3, 2, 1, 0]: tmp.RemoveAtom(atom)

In [17]: mol = tmp.GetMol()

In [18]: Chem.MolToSmiles(mol)
Out[18]: 'CCC1(C(=O)c(n)[nH])CCC(C)(O)CC1'

reported by Dan Warner
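
Both transcripts remove atoms from the highest index down. A sketch of why that order matters, using a plain Python list as an analogy and assuming that, as with a list, the indices of later atoms shift down after each removal (`remove_indices` is a hypothetical helper):

```python
def remove_indices(items, indices):
    """Return a copy of items with the given positions removed.

    Positions are deleted from highest to lowest so that earlier deletions
    do not shift the positions still waiting to be deleted.
    """
    remaining = list(items)
    for index in sorted(indices, reverse=True):
        del remaining[index]
    return remaining

atoms = ['C0', 'C1', 'N2', 'C3', 'O4']
kept = remove_indices(atoms, [0, 1, 4])

# Deleting the same positions in ascending order hits shifted positions
# and here runs off the end of the list.
naive = list(atoms)
try:
    for index in [0, 1, 4]:
        del naive[index]
except IndexError:
    naive = 'IndexError'
print(kept, naive)
```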

InChI generation code not recognizing stereo

reported by Jan Holst Jensen

> For example: InChI strings generated for spiro.mol (spiro.mol - attached):
>
> IUPAC:
> InChI=1S/2C9H14Cl2/c2*1-7(10)3-9(4-7)5-8(2,11)6-9/h2*3-6H2,1-2H3/t2*7-,8-,9-/m10/s1
> RDKit: InChI=1S/2C9H14Cl2/c2*1-7(10)3-9(4-7)5-8(2,11)6-9/h2*3-6H2,1-2H3

This one still doesn't recognize the stereo. I'll file a bug for it:
In [2]: Chem.MolToInchi(Chem.MolFromMolFile('spiro.mol'))
[09:53:16] WARNING: Omitted undefined stereo
Out[2]: 'InChI=1S/2C9H14Cl2/c2*1-7(10)3-9(4-7)5-8(2,11)6-9/h2*3-6H2,1-2H3'

Here's the file:

spiro.mol
  ACD/Labs0709041010  

 22 24  0  0  0  0  0  0  0  0  1 V2000
    9.2912   -9.4308    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8009   -6.9967    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    7.7063   -7.9840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8986   -9.1405    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.1531   -6.3991    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.5689   -7.8459    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.7408   -6.2812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8789   -9.3129    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.3257   -7.7280    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.1335   -6.5716    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   15.2311   -8.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.6519  -14.7621    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1616  -12.3280    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    8.0670  -13.3153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2593  -14.4717    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.5138  -11.7304    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.9296  -13.1772    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.1015  -11.6125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.2396  -14.6442    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.6864  -13.0593    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.4941  -11.9029    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.5918  -14.0466    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
  3  2  1  0  0  0  0
  3  4  1  1  0  0  0
  1  3  1  0  0  0  0
  5  3  1  0  0  0  0
  6  1  1  0  0  0  0
  5  6  1  0  0  0  0
  7  6  1  0  0  0  0
  6  8  1  1  0  0  0
  9  7  1  0  0  0  0
  8  9  1  0  0  0  0
 10  9  1  0  0  0  0
  9 11  1  1  0  0  0
 14 13  1  0  0  0  0
 14 15  1  1  0  0  0
 12 14  1  0  0  0  0
 16 14  1  0  0  0  0
 17 12  1  0  0  0  0
 16 17  1  0  0  0  0
 18 17  1  0  0  0  0
 17 19  1  1  0  0  0
 20 18  1  0  0  0  0
 19 20  1  0  0  0  0
 21 20  1  0  0  0  0
 20 22  1  1  0  0  0
M  END
>  <NAME>
spiro 

$$$$

MolFragmentToSmiles generating non-canonical results

In [3]: Chem.MolFragmentToSmiles(Chem.MolFromSmiles('c1c(C)cccc1'),(0,1,2))
Out[3]: 'Ccc'

In [4]: Chem.MolFragmentToSmiles(Chem.MolFromSmiles('c1c(C)cccc1'),(1,2,3))
Out[4]: 'ccC'

In [5]: Chem.MolFragmentToSmiles(Chem.MolFromSmiles('c1c(C)cccc1'),(1,3,2))
Out[5]: 'ccC'
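
The three calls describe the same kind of three-atom fragment, so a canonical writer should return one string for all of them. A toy sketch of the idea, for an unbranched chain only: pick the lexicographically smaller of the two reading directions. This illustrates order independence and is not RDKit's canonicalization algorithm; `canonical_chain` is a hypothetical helper.

```python
def canonical_chain(symbols):
    """Canonical text for an unbranched chain of atom symbols.

    A chain can be read from either end; always choosing the smaller of the
    two readings makes the result independent of the starting atom.
    """
    forward = list(symbols)
    backward = forward[::-1]
    return ''.join(min(forward, backward))

first = canonical_chain(['C', 'c', 'c'])
second = canonical_chain(['c', 'c', 'C'])
print(first, second, first == second)
```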

logging in inchi.py module

inchi.MolFromInchi and inchi.MolToInchiAndAuxInfo contain a 'log(log)' statement. If the 'logLevel' argument is not None an error is produced.
