
Comments (3)

rvianello commented on July 24, 2024

Hi @paconius

Great question. I don't remember exactly, but I think that at the time of the first production release of this library there was a bug in the cartridge code affecting the @= operator.

One reason why this issue wasn't considered very critical is that same-structure queries are often more efficient when they simply compare canonical SMILES strings. But if @= can now be confirmed to work reliably, I would be fine with restoring the commented-out version. I'll try to find some time to review the details.

Best,
Riccardo

from django-rdkit.

paconius commented on July 24, 2024

Thanks for the info, Riccardo. Your comment about canonical SMILES being the best (fastest?) way to do exact matching is an interesting one. Is this how you personally perform exact searches?

If this has merit, then perhaps best practice would be to always have an indexed canonical SMILES string field that is populated for every structure, and then use the database string operators to perform exact searches. If so, it might make sense to promote this as an improvement to the cartridge code. Greg has several blog posts on the speed of the cartridge, so I imagine he would be interested in your solution if it is the fastest.


rvianello commented on July 24, 2024

Hi @paconius

I think this is the issue I recalled: rdkit/rdkit#525. It was actually fixed ages ago, so restoring the commented-out version of the exact lookup, as you initially suggested, would definitely make sense.

Interestingly, I think the implementation of the mol comparison operators performs a two-way substructure test before resorting to a canonical SMILES comparison:
https://github.com/rdkit/rdkit/blob/master/Code/PgSQL/rdkit/adapter.cpp#L424

Anyway, yes: when exact lookups are performed on a more than occasional basis, I prefer to use a dedicated indexed column filled with the canonical SMILES. This is not my invention and, to be honest, I have never run a dedicated benchmark; I would just be very surprised if it weren't considerably faster. I am not sure I would automate this pattern on every mol column though, because requirements vary, and I'm not sure it would be a very valuable addition to the cartridge, considering that the necessary functionality is already easily available (but this is just my personal feeling).

Best,
Riccardo
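
For readers who want to see the shape of the pattern discussed above (a dedicated indexed column holding the canonical SMILES, queried with plain string equality), here is a minimal sketch. It is an illustration under assumptions, not code from django-rdkit or the cartridge: it uses the standard-library sqlite3 module instead of PostgreSQL so that it runs anywhere, and the names CompoundStore, toy_canonicalize and the table layout are hypothetical. The canonicalizer is a hard-coded stand-in; with RDKit installed one would presumably pass something like `lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))` instead.

```python
import sqlite3
from typing import Callable, List

# Stand-in canonicalizer for illustration only (hypothetical lookup table).
# A real setup would delegate to a toolkit such as RDKit, e.g.
#   Chem.MolToSmiles(Chem.MolFromSmiles(smiles))
_TOY_CANONICAL = {
    "CCO": "CCO",
    "OCC": "CCO",
    "C(O)C": "CCO",
    "c1ccccc1": "c1ccccc1",
    "C1=CC=CC=C1": "c1ccccc1",
}


def toy_canonicalize(smiles: str) -> str:
    """Map a known input SMILES to one fixed spelling per structure."""
    return _TOY_CANONICAL[smiles]


class CompoundStore:
    """Hypothetical table with an indexed canonical_smiles column."""

    def __init__(self, canonicalize: Callable[[str], str]) -> None:
        self.canonicalize = canonicalize
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE compound ("
            "id INTEGER PRIMARY KEY, "
            "name TEXT NOT NULL, "
            "canonical_smiles TEXT NOT NULL)"
        )
        # The index is what makes the exact lookup a cheap string match.
        self.conn.execute(
            "CREATE INDEX compound_smiles_idx ON compound (canonical_smiles)"
        )

    def add(self, name: str, smiles: str) -> int:
        """Canonicalize once at write time and store the result."""
        cur = self.conn.execute(
            "INSERT INTO compound (name, canonical_smiles) VALUES (?, ?)",
            (name, self.canonicalize(smiles)),
        )
        return cur.lastrowid

    def find_exact(self, smiles: str) -> List[str]:
        """Canonicalize the query and compare strings for equality."""
        rows = self.conn.execute(
            "SELECT name FROM compound WHERE canonical_smiles = ? ORDER BY id",
            (self.canonicalize(smiles),),
        )
        return [row[0] for row in rows]


store = CompoundStore(toy_canonicalize)
store.add("ethanol", "OCC")
store.add("benzene", "C1=CC=CC=C1")
```

Calling `store.find_exact("C(O)C")` then finds the ethanol row even though it was inserted as "OCC", because both spellings are reduced to the same string before the comparison. Whether this beats the @= operator in practice is untested here, as noted in the thread.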
