redapple / sql2graph
Helper module to export data from a relational database to a graph database (through CSV files)
License: MIT License
Hi there,
I imported the musicbrainz database to Neo4j using the following approach, helped by @jexp:
Define two indexes in batch.properties (an exact index, mbid, for MBIDs, and a fulltext index, mb, for everything else):
batch_import.keep_db=false
batch_import.mapdb_cache.disable=true
batch_import.node_index.mb=fulltext
batch_import.node_index.mbid=exact
batch_import.csv.quotes=false
cache_type=none
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=300M
neostore.relationshipstore.db.mapped_memory=3G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=15M
Then put the indexing instructions directly in the header line of the nodes.csv and rels.csv files, so the separate ...index.csv files are no longer needed (see https://github.com/jexp/batch-import -> automatic indexing). The nodes file then starts like this:
kind:string:mb comment status position name:string:mb area gender format barcode number ended length end_date_year begin_date_year mbid:string:mbid type:string:mb pk
artist Talkshow Boy f e8d94cf5-fafa-48fc-a6fa-aa50cf54d7f3 288762
artist Vibulator f 735bfaad-6eb1-4f9c-b21d-cbaef7c79a92 97944
artist Eat Me f c38a93e8-2ecf-4848-b1d2-364202d9dc0c Group 499198
artist Uffe Andersen f a7f3c871-3ba3-40b1-ba58-d08b40312789 Person 514886
artist Headust f eda60727-7036-437b-b53d-ae472818ee3a 212148
artist Sons Of The Subway f 232d5716-c2b2-47e1-aa0c-264ec69e6a18 100774
artist The Poe Boy Family f 672d599e-6a6c-456e-98ba-dac5a45e3ed8 43132
artist Ralph Gusovius Germany Male f 1950 6ecfcea1-677d-427b-a38b-9c76ce92e313 Person 295052
artist Elastik Band f 46e0639c-1ccf-45f5-b886-4cbf5549a2a1 61467
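The header convention above (a field written as name:type:index is added to that index, a bare name is stored as a plain property) can be sketched in Python as follows. This is only an illustration, not sql2graph's actual exporter: the COLUMNS list is a hypothetical subset of the real header, the helper names are made up, and it assumes the importer reads tab-separated values.

```python
import io

# Hypothetical subset of the columns in the sample above:
# (property name, index name or None when the property is not indexed).
COLUMNS = [
    ("kind", "mb"),
    ("name", "mb"),
    ("mbid", "mbid"),
    ("type", "mb"),
    ("pk", None),
]


def header_field(name, index):
    """Return 'name:string:index' for indexed fields, plain 'name' otherwise."""
    return f"{name}:string:{index}" if index else name


def clean(value):
    """Render a value as text; drop tabs/newlines that would break the row."""
    if value is None:
        return ""
    return str(value).replace("\t", " ").replace("\n", " ")


def write_nodes(rows, out):
    """Write the annotated header line, then one tab-separated line per node."""
    out.write("\t".join(header_field(n, i) for n, i in COLUMNS) + "\n")
    for row in rows:
        out.write("\t".join(clean(row.get(n)) for n, _ in COLUMNS) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_nodes(
        [
            {"kind": "artist", "name": "Eat Me",
             "mbid": "c38a93e8-2ecf-4848-b1d2-364202d9dc0c",
             "type": "Group", "pk": 499198},
            {"kind": "artist", "name": "Vibulator",
             "mbid": "735bfaad-6eb1-4f9c-b21d-cbaef7c79a92",
             "pk": 97944},
        ],
        buf,
    )
    print(buf.getvalue(), end="")
```

Missing values are written as empty fields so every row keeps the same number of columns as the header.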
And then import the two files with something like
java -Xmx10G -server -Dfile.encoding=UTF-8 -jar ~/neo/batch-import/target/batch-import-jar-with-dependencies.jar ./graph.db nodes.csv rels.csv
WDYT? It would make the output a lot simpler, and the import took about 10 minutes on my machine for 160M properties and 75M relationships.
Hi
I have been following the steps to create the CSV files for musicbrainz. So far my script has only printed the following lines:
UPDATE 0
UPDATE 0
NOTICE: table "entity_mapping" does not exist, skipping
DROP TABLE
I can't tell whether the script just takes a very long time to execute or whether it has halted. When I check the processes, the script appears to be sleeping. Any insight, please?