structure-indexer's Introduction

Structure Indexer

This is a self-contained structure indexer that uses Lucene as the underlying storage and indexing engine. The indexing architecture is based on an inverted indexing technique developed within NCATS. Please consult the wiki (coming soon) for additional details.

Contact: [email protected]

Building

If you're building for the first time, you'll need to add jchem.jar to your local Maven repository. To do this, execute the following command:

mvn install:install-file \
  -Dfile=lib/jchem.jar \
  -DgroupId=chemaxon \
  -DartifactId=jchem \
  -Dversion=3.2.12 \
  -Dpackaging=jar \
  -DgeneratePom=true

Then run mvn package to build the structure-indexer jar file. The bin directory contains the following wrapper scripts:

indexer is the main driver used to build the index. See indexer -h for complete usage. Here is a running example:

indexer index_dir BindingDB2D.sdf

searcher is the client driver that provides a command-line interface for searching and filtering. See searcher -h for complete usage. For example, consider the following command:

searcher  -fsmiles -F_natoms=20:22 -F_molwt=280.:300. -F_source=BindingDB2D -s sim -t.9 idx "N1c2ccccc2NC(=O)c2cccnc12"

This example performs a similarity search against the index idx for the given structure, with a Tanimoto cutoff of 0.9, number of atoms in the range [20, 22], molecular weight in the range [280, 300], restricted to the source BindingDB2D, and outputs the matches in SMILES format.
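To make the cutoff and range filters concrete, here is a minimal sketch of the standard Tanimoto coefficient on bit fingerprints, combined with inclusive range checks. It assumes fingerprints are plain java.util.BitSet values; the class and method names (TanimotoSketch, tanimoto, inRange, matches) are hypothetical and do not reflect how structure-indexer computes similarity internally.

```java
import java.util.BitSet;

public class TanimotoSketch {
    // Tanimoto coefficient: |A AND B| / |A OR B| over the set bits of two fingerprints.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    // Inclusive range check, mirroring filters such as -F_natoms=20:22.
    static boolean inRange(double value, double lo, double hi) {
        return value >= lo && value <= hi;
    }

    // Hypothetical combination of the filters and the similarity cutoff.
    static boolean matches(BitSet query, BitSet candidate,
                           int natoms, double molwt, double cutoff) {
        return inRange(natoms, 20, 22)
            && inRange(molwt, 280.0, 300.0)
            && tanimoto(query, candidate) >= cutoff;
    }

    public static void main(String[] args) {
        BitSet q = new BitSet();
        q.set(1); q.set(3); q.set(5); q.set(7);
        BitSet c = new BitSet();
        c.set(1); c.set(3); c.set(5); c.set(9);
        // 3 shared bits out of 5 distinct bits
        System.out.println(tanimoto(q, c)); // prints 0.6
        System.out.println(matches(q, c, 21, 290.0, 0.9)); // prints false
    }
}
```

With a cutoff of 0.9 only near-identical fingerprints pass, which is why the example candidate above is rejected even though its atom count and weight are in range.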

structure-indexer's People

Contributors

blueswordfish, caodac, chemmitch, dkatzel-ncats, rajarshi, tylerperyea

structure-indexer's Issues

We need an exception check here to fail gracefully

This can throw an exception if a molfile fails to parse. It shouldn't fail catastrophically when that happens:

out.put(new Result (p, similarity,null));

Also, this can only happen because we parse the molfile as part of the payload loading:

The thing is, we only NEED to parse the molfile if we're going to do a substructure search. For similarity searches this is an unnecessary step anyway and should probably be avoided. Perhaps it can be deferred?
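One way the deferral could look is a small wrapper that stores the raw molfile text and parses it only when first asked. This is a sketch under assumptions, not the project's code: LazyMol and its parser argument are hypothetical names, and the parser is a stand-in for whatever molfile parser the payload loader actually uses.

```java
import java.util.function.Function;

// Hypothetical wrapper: keeps the raw molfile text and parses it only on demand.
public class LazyMol<T> {
    private final String molfile;
    private final Function<String, T> parser; // stand-in for the real molfile parser
    private T parsed;
    private boolean done;

    public LazyMol(String molfile, Function<String, T> parser) {
        this.molfile = molfile;
        this.parser = parser;
    }

    // Raw text is always available, even if it would not parse.
    public String raw() {
        return molfile;
    }

    // Parses on first call and caches the result; a parse failure surfaces here,
    // at the point where the structure is actually needed.
    public synchronized T get() {
        if (!done) {
            parsed = parser.apply(molfile);
            done = true;
        }
        return parsed;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        LazyMol<Integer> mol =
            new LazyMol<>("CCO", s -> { calls[0]++; return s.length(); });
        System.out.println(calls[0]); // prints 0: nothing parsed at load time
        mol.get();
        mol.get();
        System.out.println(calls[0]); // prints 1: parsed once, then cached
    }
}
```

With this shape, a similarity search never calls get(), so an unparsable molfile only matters to a substructure search that actually needs the parsed structure.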

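The worst part about this is that throwing an exception ultimately makes the queue wait forever. In the code below, if f.get() throws, control jumps to the catch block and out.put(POISON_RESULT) is never reached: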

for (int i = 0; i < nthreads; ++i)
    in.put(POISON_PAYLOAD);
threadPool.submit(new Runnable () {
        public void run () {
            try {
                int total = 0;
                for (Future<Integer> f : threads) {
                    total += f.get();
                }
                out.put(POISON_RESULT);
            }
            catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    });

Typically we shouldn't see an unparsable molfile in the index, but it does happen from time to time.
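A possible fix is to handle each worker's failure individually and emit the end-of-stream marker from a finally block, so the consumer is always released. The following is a self-contained sketch of that pattern only; PoisonPillSketch, run, and the String results are hypothetical, and only the POISON_RESULT name is borrowed from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillSketch {
    static final String POISON_RESULT = "__POISON__"; // end-of-stream marker

    static List<String> run(List<Callable<Integer>> workers) {
        ExecutorService pool = Executors.newCachedThreadPool();
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        List<Future<Integer>> futures = new ArrayList<>();
        for (Callable<Integer> w : workers) {
            futures.add(pool.submit(w));
        }

        // Collector: one failed worker is reported and skipped, not fatal.
        pool.submit(() -> {
            try {
                for (Future<Integer> f : futures) {
                    try {
                        out.put("count=" + f.get());
                    } catch (ExecutionException ex) {
                        out.put("failed: " + ex.getCause().getMessage());
                    }
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            } finally {
                // Always delivered; offer never blocks on an unbounded queue.
                out.offer(POISON_RESULT);
            }
        });

        // Consumer: drains results until the marker arrives.
        List<String> seen = new ArrayList<>();
        try {
            String r;
            while (!(r = out.take()).equals(POISON_RESULT)) {
                seen.add(r);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Callable<Integer>> workers = new ArrayList<>();
        workers.add(() -> 3);
        workers.add(() -> { throw new IllegalStateException("bad molfile"); });
        workers.add(() -> 4);
        System.out.println(run(workers)); // prints [count=3, failed: bad molfile, count=4]
    }
}
```

The key design choice is that the marker is sent from finally rather than at the end of the try body, so no exception path can leave the consumer blocked on take().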
