

UniBench - Towards Benchmarking the Multi-Model DBMS

The UniBench project aims to develop a generic benchmark for the holistic evaluation of multi-model database systems (MMDS), which support multiple data models, such as document, graph, and key-value, in a single back-end. UniBench provides a mixed-model dataset that mimics a social commerce application and covers the JSON, XML, key-value, tabular, and graph data models. The UniBench workload consists of a set of complex read-only queries and read-write transactions, each involving at least two data models.

For more details, please see our Big Data 2020 tutorial, DAPD 2019 journal paper, and TPCTC 2018 paper:

Zhang, Chao, and Jiaheng Lu. "Big Data System Benchmarking: State of the Art, Current Practices, and Open Challenges." In IEEE Big Data 2020 Tutorial, 2020.

Zhang, Chao, and Jiaheng Lu. "Holistic Evaluation in Multi-Model Databases Benchmarking." In Distributed and Parallel Databases, 2019.

Zhang, Chao, et al. "UniBench: A benchmark for multi-model database management systems." TPCTC. Springer, Cham, 2018.

Query Implementations

Essentially, UniBench can be implemented for any MMDB, either with or without data transformation. Since there is no standard query language, the query definitions are provided in ArangoDB AQL, OrientDB SQL, AgensGraph SQL/Cypher, and (partially) Spark SQL as follows:

Query                    01  02  03  04  05  06  07  08  09  10
ArangoDB (AQL)           01  02  03  04  05  06  07  08  09  10
OrientDB (SQL)           01  02  03  04  05  06  07  08  09  10
AgensGraph (Cypher/SQL)  01  02  03  04  05  06  07  08  09  10
Spark (SQL)              01  02  03  04  05  06  07  08  09  10

Environment

To run this benchmark, you need the systems under test and a JRE (version 8 or later) installed. The current implementations cover four state-of-the-art MMDBs: ArangoDB (query language: AQL), OrientDB (query language: OrientDB SQL), AgensGraph (query language: SQL/Cypher), and Spark (partial evaluation using Spark SQL). You may also use UniBench to evaluate a new MMDB in three steps: (1) write an importing script or program, (2) extend the MMDB abstract class, and (3) implement the connection and query methods in the corresponding MMDB class.
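
Steps (2) and (3) can be sketched as follows. This is only an illustration of the pattern: the abstract class below is a hypothetical stand-in, and the method names connect, q1, and close as well as the in-memory ToyDB are assumptions, so the real MMDB class in the repository may declare different methods and signatures.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for UniBench's MMDB abstract class (names are assumptions).
abstract class MMDB {
    abstract void connect();                    // open a connection to the system under test
    abstract List<String> q1(String personId);  // one method per benchmark query
    abstract void close();                      // release the connection
}

// Step (2): the new system under test extends the abstract class.
class ToyDB extends MMDB {
    private Map<String, List<String>> ordersByPerson;

    // Step (3): a real implementation would create a driver or client here;
    // this toy version just loads two orders into memory.
    @Override
    void connect() {
        ordersByPerson = new HashMap<>();
        ordersByPerson.put("p1", Arrays.asList("o1", "o2"));
    }

    // Step (3): a real implementation would send the query text to the database.
    @Override
    List<String> q1(String personId) {
        if (ordersByPerson == null) {
            throw new IllegalStateException("connect() must be called first");
        }
        List<String> orders = ordersByPerson.get(personId);
        return orders != null ? orders : Collections.<String>emptyList();
    }

    @Override
    void close() {
        ordersByPerson = null;
    }
}

class MmdbSketch {
    public static void main(String[] args) {
        MMDB db = new ToyDB();
        db.connect();
        System.out.println(db.q1("p1"));
        db.close();
    }
}
```

Keeping the connection logic and the query methods behind one abstract type lets the benchmark driver treat every system under test the same way.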

Running

Note that ArangoDB uses the default _system database with an empty password. OrientDB uses a database named test, which you should create beforehand. Download the latest release with the SF1 dataset at https://github.com/HY-UDBMS/UniBench/releases/tag/0.2, and try out the first multi-model query as follows:

./DataImporting_ArangoDB.sh
java -jar Unibench.jar ArangoDB Q1
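
To run several queries in a row, the command above can be wrapped in a small driver. The sketch below assumes that the remaining queries are selected with the labels Q2 to Q10 in the same way as Q1 (only Q1 is shown above) and that Unibench.jar is in the working directory. The class RunAllQueries is hypothetical and not part of the release. The wall-clock time it prints includes JVM start-up and connection set-up, so it is not a substitute for the benchmark's own measurements.

```java
import java.util.Arrays;
import java.util.List;

class RunAllQueries {
    // Builds the command line shown above for one system and one query number.
    static List<String> command(String system, int query) {
        return Arrays.asList("java", "-jar", "Unibench.jar", system, "Q" + query);
    }

    public static void main(String[] args) throws Exception {
        // The system name is passed through unchanged, e.g. "ArangoDB".
        String system = args.length > 0 ? args[0] : "ArangoDB";
        for (int q = 1; q <= 10; q++) {
            long start = System.nanoTime();
            // inheritIO() forwards the benchmark's output to this console.
            Process p = new ProcessBuilder(command(system, q)).inheritIO().start();
            int exit = p.waitFor();
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Q" + q + " exit=" + exit + " wall-clock=" + ms + " ms");
        }
    }
}
```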

Benchmarking notes

(1) Larger datasets with SF10 and SF30 can be found at https://github.com/HY-UDBMS/UniBench/releases/tag/data.

(2) For the UniBench schema in the DAPD 2019 journal paper, the tag table and the product table use the same data from Product.csv, and the productId maps one-to-one to the tagid. The hasInterest relation is removed since the queries do not involve it.

(3) For data importing, we have released scripts for ArangoDB and OrientDB based on their own importers. Since both systems have gone through several versions, please check whether any parameters need to be changed. For example, ArangoDB 3.7 uses the arangoimport utility in place of arangoimp, and authentication needs to be turned off.

(4) If the benchmark cannot find the parameter files (Brands, PersonIds, ProductIds), please download them from the GitHub repo into the directory ./UniBench/Unibench/

(5) To run the benchmark in OrientDB, you need to create a custom JavaScript function named compareList, taking the parameters array1 and array2, in the Functions panel of OrientDB as follows:

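// For each pair of records with the same 'asin', keep the record from
// array1 when its 'cnt' is larger than the one in array2.
var IDs = [];

for (var i = 0; i < array1.length; i++) {
  for (var j = 0; j < array2.length; j++) {
    if (array1[i].field('asin') == array2[j].field('asin') &&
        array1[i].field('cnt') > array2[j].field('cnt')) {
      IDs.push(array1[i]);
    }
  }
}

return IDs;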
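
To make the behaviour of compareList easier to check outside OrientDB, the same nested-loop logic can be mirrored in plain Java. The sketch below is only an illustration: the Row class is a hypothetical stand-in for an OrientDB record, the sample values are made up, and it assumes that asin is a string and cnt is an integer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompareListSketch {
    // Hypothetical stand-in for an OrientDB record with 'asin' and 'cnt' fields.
    static final class Row {
        final String asin;
        final int cnt;

        Row(String asin, int cnt) {
            this.asin = asin;
            this.cnt = cnt;
        }
    }

    // Mirrors compareList: keep a row of array1 for every row of array2
    // that has the same asin and a strictly smaller cnt.
    static List<Row> compareList(List<Row> array1, List<Row> array2) {
        List<Row> ids = new ArrayList<>();
        for (Row a : array1) {
            for (Row b : array2) {
                if (a.asin.equals(b.asin) && a.cnt > b.cnt) {
                    ids.add(a);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Row> first = Arrays.asList(new Row("B001", 5), new Row("B002", 1));
        List<Row> second = Arrays.asList(new Row("B001", 3), new Row("B002", 4));
        for (Row r : compareList(first, second)) {
            System.out.println(r.asin + " " + r.cnt); // prints "B001 5"
        }
    }
}
```

Only B001 is returned because its count in the first list (5) exceeds its count in the second list (3), while B002 has the smaller count in the first list.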

Import the data into OrientDB with a single command from the DataImporting_OrientDB folder as follows:

./Orientimport.sh

Data provenance

We collect the metadata from three datasets, namely the LDBC dataset, the DBpedia dataset, and the Amazon product dataset, and we use this data for academic research purposes.

