octopus's Issues

Optimize orientdbimporter

It seems that the orientdbimporter is currently CPU bound, and that shouldn't be the case. We should perform profiling and possibly replace the CSV library, which is most probably at fault here.

Update: I am taking this as an opportunity to update to OrientDB 2.2.6. This already works in the joern branch orientdb2.2:

https://github.com/octopus-platform/joern/tree/orientdb2.2

This version of OrientDB contains new code for batch insertion:

http://orientdb.com/docs/2.2.x/Graph-Batch-Insert.html

We can modify orientdbimporter to make use of this API and see whether that improves performance. While at it, I would also suggest we make indexing a separate step, as indexing is not always required.

http://orientdb.com/docs/2.2.x/Performance-Tuning.html

Using the batch insertion API, there seems to be no simple way of setting edge labels, see:

https://groups.google.com/forum/?hl=es#!topic/orient-database/hiToJotzPEU

I read the OGraphBatchInsert code and found the following:

The method setEdgeClass is expected to be called before begin, which then creates the edge class if it does not exist. The field is then passed as the first argument to the ODocument constructor when an edge is created in createEdge. So by calling setEdgeClass right before createEdge, we can most probably create edges of different types. However, we need to ensure that each edge type already exists in the database, which means we have to know all edge types in advance when creating the database schema.

I think it should be OK to require that all edge types be listed in the second line of edges.csv files. Then we can create the database first, including all edge types and indices.
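As a rough sketch of that idea, the edge types could be collected from the second line before any edge is imported. The layout assumed here (a tab-separated list of type names on line two) and the names EdgeTypeHeader and parseEdgeTypes are hypothetical; nothing above fixes the actual file format.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeTypeHeader {
    // Returns the distinct edge type names found on the second line of an
    // edges.csv file, in order of first appearance.
    // Assumption: that line is a tab-separated list of type names.
    static List<String> parseEdgeTypes(List<String> lines) {
        List<String> types = new ArrayList<>();
        if (lines.size() < 2) {
            return types; // no second line, so no declared edge types
        }
        for (String field : lines.get(1).split("\t")) {
            String name = field.trim();
            if (!name.isEmpty() && !types.contains(name)) {
                types.add(name);
            }
        }
        return types;
    }
}
```

The resulting list could then drive schema creation, so that every edge class exists before the batch import starts.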

Here is another big problem if we want to use the new API: while for joern, node ids are indeed just long integers, this is not true for bjoern. However, the API only supports long integers as node ids.
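One possible workaround, not something decided in this issue, would be to have the importer assign a long integer to each non-numeric node id and keep the mapping in memory during the import. The sketch below assumes node ids arrive as strings; the class NodeIdMapper and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class NodeIdMapper {
    // Maps each original node id (assumed to be a string) to the long
    // integer that was assigned to it.
    private final Map<String, Long> ids = new HashMap<>();
    private long next = 0;

    // Returns the long id for the given key, assigning the next free
    // number the first time a key is seen.
    long idFor(String key) {
        Long existing = ids.get(key);
        if (existing != null) {
            return existing;
        }
        long assigned = next++;
        ids.put(key, assigned);
        return assigned;
    }

    // Number of distinct node ids seen so far.
    int size() {
        return ids.size();
    }
}
```

The table grows with the number of nodes, so this trades memory for compatibility with a long-only API.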

File size limit for uploads

With OrientDB 2.1.5, when trying to upload a file larger than 10kb via the uploadfile command, we now get the following error message:

Error on content size 16850940: the maximum allowed is 1000000 [ONetworkProtocolHttpDb]

The error message is produced in this source file:
https://github.com/orientechnologies/orientdb/blob/4bcaad015d4410eaeceb5d1ad2abb355cb72fd59/server/src/main/java/com/orientechnologies/orient/server/network/protocol/http/ONetworkProtocolHttpAbstract.java

Either newer versions of OrientDB enforce this limit on the size of POST requests, or our new way of starting the server from Java is causing the problem, e.g., because we do not raise the maximum request size above 10kb.

@ml86 Can you try to find out if this is a newly introduced problem, and if we can resolve it via server configuration?
