octopus-platform / octopus
Generic server for collaborative code analysis
License: GNU Lesser General Public License v3.0
It seems that the orientdbimporter is currently CPU-bound, which shouldn't be the case for an import job. We should profile it and possibly replace the CSV library, which is most likely at fault here.
Update: I am taking this as an opportunity to update to OrientDB 2.2.6. This already works in the joern branch orientdb2.2:
https://github.com/octopus-platform/joern/tree/orientdb2.2
This version of OrientDB contains new code for batch insertion:
http://orientdb.com/docs/2.2.x/Graph-Batch-Insert.html
We can modify orientdbimporter to make use of this API and see whether that improves performance. While at it, I would also suggest we make indexing a separate step, as indexing is not always required.
http://orientdb.com/docs/2.2.x/Performance-Tuning.html
Using the batch insertion API, there seems to be no simple way of setting edge labels, see:
https://groups.google.com/forum/?hl=es#!topic/orient-database/hiToJotzPEU
I read the OGraphBatchInsert code and found the following:
The method setEdgeClass is assumed to be called before begin, which then creates the edge class in case it does not exist. The field is then passed as the first argument to the constructor of ODocument upon edge creation in createEdge. This means that by calling setEdgeClass right before createEdge, we can most probably create edges with different types. However, we need to ensure that the edge type already exists in the database, so we have to know all edge types in advance when creating the database schema.
I think it should be OK to ask for all edge types in the second line of edges.csv files. Then we can create the database first, including all edge types and indices.
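To make the proposed file layout concrete, here is a minimal sketch of a reader for such a file. It is not taken from the importer: the function name read_edges, the tab delimiter, and the column names start, end, and type are assumptions made for illustration only. The sketch collects the declared edge types from the second line, so that they could be created in the schema up front, and rejects any edge whose type was not declared.

```python
import csv
import io


def read_edges(stream):
    """Return (edge_types, edges) read from an edges.csv stream.

    Assumed layout (hypothetical, for illustration only):
      line 1:  column header, e.g. start<TAB>end<TAB>type
      line 2:  every edge type used in the file, tab-separated
      line 3+: one edge per line
    """
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    edge_types = next(reader)
    known = set(edge_types)

    edges = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        edge = dict(zip(header, row))
        # An undeclared type would not exist in the schema yet, so fail early.
        if edge["type"] not in known:
            raise ValueError("undeclared edge type: " + edge["type"])
        edges.append(edge)
    return edge_types, edges


sample = (
    "start\tend\ttype\n"
    "IS_AST_PARENT\tFLOWS_TO\n"
    "1\t2\tIS_AST_PARENT\n"
    "2\t3\tFLOWS_TO\n"
)
edge_types, edges = read_edges(io.StringIO(sample))
print(edge_types)
print(len(edges))
```

With the types known before any edge is inserted, schema creation and the batch import can be kept as two separate passes over the same file.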
Here is another big problem if we want to use the new API: while for joern, node ids are indeed just long integers, this is not true for bjoern. However, the API only supports long integers as node ids.
With OrientDB 2.1.5, when trying to upload a file larger than 10kb via the uploadfile command, we now get the following error message:
Error on content size 16850940: the maximum allowed is 1000000 [ONetworkProtocolHttpDb]
The error message is produced in this source file:
https://github.com/orientechnologies/orientdb/blob/4bcaad015d4410eaeceb5d1ad2abb355cb72fd59/server/src/main/java/com/orientechnologies/orient/server/network/protocol/http/ONetworkProtocolHttpAbstract.java
Either new versions of OrientDB enforce this limit on the size of POST requests, or our new way of starting the server from the Java world is causing this problem, e.g., because we do not set the maximum request size to a higher value than 10kb.
@ml86 Can you try to find out if this is a newly introduced problem, and if we can resolve it via server configuration?