
Comments (6)

mfleisch commented on June 13, 2024

Hey Rick,
thanks for this detailed report.

Regarding the first slowdown:
In general the algorithms for molecular formula estimation should not have changed dramatically. So this should not be the reason for the runtime increase. Just to be sure, are both versions using the same ILP solver?

In your benchmark, you start SIRIUS once for every single compound. So it is very likely that you are mostly measuring the difference in startup time between the SIRIUS versions. Startup is slower in SIRIUS 4.4 because of the new Project-Space and the modular sub-tool layout.

SIRIUS is designed to process large datasets and parallelizes efficiently across compounds, across analysis steps, and even within the algorithms. So it is generally much more efficient to run many compounds at once than to run a separate SIRIUS instance for each compound.
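To make the startup argument concrete, here is a rough cost model in Python. It is only a sketch: the function name and all timings are hypothetical, it is not part of SIRIUS, and it ignores the parallelization mentioned above, which would favor batching even more.

```python
def total_runtime(n_compounds, startup_s, per_compound_s, batch_size=1):
    """Estimate wall time when every process launch pays startup_s once.

    All numbers are illustrative assumptions, not measured SIRIUS timings.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Ceiling division: number of separate process launches needed.
    launches = -(-n_compounds // batch_size)
    return launches * startup_s + n_compounds * per_compound_s


# Hypothetical values: 100 compounds, 5 s startup, 2 s of actual work each.
one_by_one = total_runtime(100, startup_s=5.0, per_compound_s=2.0)
all_at_once = total_runtime(100, startup_s=5.0, per_compound_s=2.0, batch_size=100)
```

With these made-up numbers, the per-compound workflow spends 500 of its 700 seconds on startup alone, while a single batched run pays the 5 seconds only once.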

However, we know that this can be annoying in more interactive integrations. We recently started a project to run a local SIRIUS installation as a background service (REST service) that can be queried through a client API. We would like to provide this client API for the scripting languages most used by the community (e.g. R and Python).
I think that could be a great solution for your use case. Let me know if this is interesting for you.

Regarding the second one:
I have no answer yet. ;-) We will investigate that, and get back to you.

from sirius.

rickhelmus commented on June 13, 2024

Hi Markus,

Thanks for the quick reply!

The linear solver is the same in both cases, as no other solvers are installed on the Linux box.
Do you foresee any way to bypass the overhead of loading the new workspace? Otherwise 4.0 will remain an option, at least for formula calculation.

Background: I'm using SIRIUS as a backend for formula/compound annotation in patRoon. At the moment everything is optimized for single-query SIRIUS executions, as this seemed to be a good option for previous versions. I guess in the future I can rewrite it so that multiple queries are combined in every SIRIUS call. The REST API sounds interesting, but is probably not exactly the right tool for large batch calculations in this scenario :-)

Thanks again,
Rick


mfleisch commented on June 13, 2024

Hey Rick,
I am not sure that it is the wrong tool for batch calculations. It would completely avoid the startup times of SIRIUS and the JVM (except for one initial startup). Further, input data can be passed to SIRIUS in memory instead of being written to disk and loaded from disk again. I am not talking about a "web service"; it is all local. REST is just used to provide a language-independent API. Single-query executions would no longer be a problem, because SIRIUS's internal job management could handle them appropriately.

The main difference on your side would be starting SIRIUS once and sending a REST query instead of executing a command-line process. Building the queries is done by the client library; for patRoon you could use the R client, I guess? Running SIRIUS tasks would then just mean calling R functions instead of making command-line calls.
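The general pattern described here (start once, then send local queries) can be sketched with the Python standard library alone. This is not the SIRIUS REST API: the `Engine` class, the `/annotate` path, and the JSON payload are all hypothetical stand-ins, chosen only to show that the expensive startup is paid once while data travels in memory over a local socket.

```python
import json
import threading
import time
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class Engine:
    """Hypothetical stand-in for a tool with an expensive one-time startup."""

    startups = 0  # counts how often the startup cost was paid

    def __init__(self, startup_seconds=0.05):
        time.sleep(startup_seconds)  # simulated startup work
        Engine.startups += 1

    def annotate(self, compound):
        # Placeholder "analysis": just report the number of peaks.
        return {"name": compound["name"], "n_peaks": len(compound["peaks"])}


def make_handler(engine):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            compound = json.loads(self.rfile.read(length))
            body = json.dumps(engine.annotate(compound)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    return Handler


def start_service():
    """Start the engine once and serve it on a free local port."""
    engine = Engine()
    server = HTTPServer(("127.0.0.1", 0), make_handler(engine))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(server, compound):
    """Send one compound to the local service and return the parsed reply."""
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        conn.request("POST", "/annotate", body=json.dumps(compound).encode("utf-8"),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

A caller would run `start_service()` once and then call `query(...)` for each compound. Every query after the first reuses the already initialized engine, which is the property that makes single-query use cheap again.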


mfleisch commented on June 13, 2024

I think there is no way to bypass the project-space because it is the main persistence layer of the CLI. However, I will have a short look at whether we can tune the startup time a bit, but I do not see much potential here.

Nevertheless, starting a JVM and parsing property files for each compound will always be a sub-optimal workflow.

I would suggest either investing in bundling compounds together, or trying the background service. With bundling, you should end up faster than you were with 4.0.1. Alternatively, we can try to get a prototype of the background version ready if you are interested in trying it.
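On the calling side, bundling could look roughly like the following Python sketch. The helper names are made up, and `run_batch` is a hypothetical callback that would wrap whatever single SIRIUS invocation the caller already uses; no actual SIRIUS command line is assumed here.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(items: Sequence[T], size: int,
                   run_batch: Callable[[Sequence[T]], List[R]]) -> List[R]:
    """Call `run_batch` once per bundle and concatenate the results in order.

    `run_batch` is a placeholder for one tool invocation that processes
    several compounds and returns one result per compound.
    """
    results: List[R] = []
    for batch in batched(items, size):
        results.extend(run_batch(batch))
    return results
```

With 250 compounds and a bundle size of 100, the callback runs three times instead of 250, so the startup cost is paid three times.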

Further, SIRIUS development is moving towards using more and more dataset-wide information in the algorithms (e.g. ZODIAC). So processing data on a per-compound basis may prevent you from using these features.

Best
Markus


rickhelmus commented on June 13, 2024

Hi Markus,

Thanks for the extra information! I will definitely try the other approach of calculating multiple formulas in one go. I never realized this was already possible with 4.0 too... it will be interesting to compare for sure.

The REST API now also sounds way more interesting! I think I'll start with the bundling approach, as this will probably require relatively few changes in the code.

Thanks again,
Rick


mfleisch commented on June 13, 2024

FYI: #11 contains some further discussion on wrapping SIRIUS in R and on the local SIRIUS REST API.

