Comments (6)
Hey Rick,
thanks for this detailed report.
Regarding the first slowdown:
In general the algorithms for molecular formula estimation should not have changed dramatically. So this should not be the reason for the runtime increase. Just to be sure, are both versions using the same ILP solver?
In your benchmark, you start SIRIUS once per compound. So it is very likely that you are mostly measuring the difference in startup time between the SIRIUS versions, which is higher for SIRIUS 4.4 due to the new Project-Space and the modular sub-tool layout.
SIRIUS is designed to run large datasets and parallelizes efficiently across compounds, across analysis steps, and even within the algorithms. So it is generally much more efficient to run many compounds at once than to run a separate SIRIUS instance for each compound.
However, we know that this can be annoying in more interactive integrations. We have recently started a project to run a local SIRIUS installation as a background service (REST service) that can be queried through a client API, which we would like to provide for the scripting languages most used by the community (e.g. R and Python).
I think that could be a great solution for your use case. Let me know if this is interesting for you.
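To put the startup argument above in numbers, here is a minimal cost-model sketch. The timings are made-up placeholders, not measurements of any SIRIUS version, and parallelism is ignored on purpose.

```python
def total_runtime(n_compounds, startup_s, per_compound_s, batch_size=1):
    """Estimate sequential wall time in seconds.

    Every process pays the fixed startup cost once, however many
    compounds it handles. All inputs are illustrative assumptions.
    """
    n_processes = -(-n_compounds // batch_size)  # ceiling division
    return n_processes * startup_s + n_compounds * per_compound_s


# Hypothetical numbers: 100 compounds, 5 s startup, 2 s of work per compound.
one_per_process = total_runtime(100, startup_s=5.0, per_compound_s=2.0)
single_batch = total_runtime(100, startup_s=5.0, per_compound_s=2.0, batch_size=100)
print(one_per_process, single_batch)  # 700.0 205.0
```

With these assumed values, the fixed startup cost accounts for most of the per-compound runtime, which is why a version with a slower startup looks much slower in a one-process-per-compound benchmark.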
Regarding the second one:
I have no answer yet. ;-) We will investigate that, and get back to you.
from sirius.
Hi Markus,
Thanks for the quick reply!
The linear solver is kept the same as other solvers are not present on the Linux box.
Do you foresee any way to bypass the overhead from the new workspace loading? Otherwise, 4.0 will remain an option, at least for formula calculation.
Background: I'm using SIRIUS as backend for formula/compound annotation in patRoon. At the moment everything is optimized for single query SIRIUS executions, as this seemed to be a good option for previous versions. I guess in the future I can rewrite it so multiple queries are combined for every SIRIUS call. The REST API sounds interesting, but is probably not exactly the right tool for large batch calculations in this scenario :-)
Thanks again,
Rick
from sirius.
Hey Rick,
I am not sure it is the wrong tool for batch calculations. This would completely avoid any startup times of SIRIUS and the JVM (except for one initial startup). Further, input data can be passed to SIRIUS in memory instead of writing it to disk and loading it from disk again. I am not talking about a "web service"; it is all local. REST is just used to provide a language-independent API. Single-query executions would not be a problem anymore, because the SIRIUS-internal job management could handle them appropriately.
The main difference on your side would be starting SIRIUS once and sending a REST query instead of executing a command line process. Building the queries is done by the client library; for patRoon you could use the R client, I guess. So running SIRIUS tasks would then change to just calling R functions instead of making command line calls.
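The "start once, then send queries" pattern described above can be sketched roughly as follows. This is only an illustration: the stub server stands in for a long-running local service, and the `/jobs` path, the JSON fields, and all function names are hypothetical, not the SIRIUS REST API or its client library.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for a background service (hypothetical, NOT SIRIUS)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"compound": payload["name"], "status": "queued"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_stub_service():
    """Start the stub once on a free local port; this is the only 'startup'."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def submit(base_url, name, peaks):
    """Send one compound from memory to the hypothetical /jobs endpoint."""
    data = json.dumps({"name": name, "peaks": peaks}).encode()
    request = urllib.request.Request(
        base_url + "/jobs",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_stub_service()
    url = f"http://127.0.0.1:{server.server_port}"
    # Many small queries, but no new process and no temporary files per query.
    for compound in ("compound_1", "compound_2"):
        print(submit(url, compound, [[100.0, 1.0]]))
    server.shutdown()
```

The client side reduces to plain function calls, which is the point made above: a wrapper such as patRoon would call a client function per query rather than spawn a command line process.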
from sirius.
I think there is no way to bypass the project-space, because it is the main persistence layer of the CLI. However, I will have a short look at whether we can tune the startup time a bit, but I do not see much potential here.
Nevertheless, starting a JVM and parsing property files for each compound will always be a sub-optimal workflow.
I would suggest either investing in bundling compounds together, which should make you faster than you were with 4.0.1, or we can try to get a prototype of the background version ready if you are interested in trying it.
Further, SIRIUS development is moving towards using more and more dataset-wide information in the algorithms (e.g. ZODIAC). So processing data on a per-compound basis may prevent you from using these features.
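On the caller's side, the bundling approach could look roughly like the sketch below: queries are grouped into batches and one process is launched per batch instead of per compound. The executable name `my-annotation-tool` and the convention of passing input files as positional arguments are placeholders; the real invocation depends on the CLI of the installed SIRIUS version.

```python
import subprocess
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_commands(base_cmd: Sequence[str], inputs: Sequence[str],
                   size: int) -> List[List[str]]:
    """Build one argv list per batch (hypothetical positional inputs)."""
    return [list(base_cmd) + batch for batch in batches(inputs, size)]


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run each batch command in turn; raises if a process exits non-zero."""
    for argv in commands:
        subprocess.run(list(argv), check=True)


# Five hypothetical input files, at most two per process: three launches
# instead of five.
files = [f"compound_{i}.ms" for i in range(1, 6)]
commands = build_commands(["my-annotation-tool"], files, size=2)
print(len(commands))  # 3
```

Choosing the batch size is a trade-off: larger batches amortize the startup cost better, while smaller ones return first results sooner in interactive use.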
Best
Markus
from sirius.
Hi Markus,
Thanks for the extra information! I will definitely try the other approach of calculating multiple formulas in one go. I never realized this was already possible with 4.0 too... it will be interesting to compare for sure.
The REST API now also sounds way more interesting! I think I'll start with the bundling approach first, as this will probably require relatively few changes in the code.
Thanks again,
Rick
from sirius.
FYI: there is some further discussion in #11 regarding wrapping SIRIUS in R and the local SIRIUS REST API.
from sirius.