fishstore's Introduction

Introduction

FishStore is a new ingestion and storage layer for flexible- and fixed-schema datasets. It allows you to dynamically register complex predicates over the data, to define interesting subsets of the data. Such predicates are called PSFs (for predicated subset functions).

FishStore performs partial parsing of the ingested data (based on active PSFs) in a fast, parallel, and micro-batched manner, and hash indexes records for subsequent fast PSF-based retrieval. To accomplish its goals, FishStore leverages and extends the FASTER hash key-value store, and uses an unmodified parser interface for fast parsing (we use simdjson in many of our examples).
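To make the PSF idea more concrete, here is a minimal, self-contained C++17 sketch. It is not FishStore's API: ToyStore, Record, Psf, and ProjectField are hypothetical names invented for this illustration, records are assumed to be already-parsed flat maps (so no partial parsing happens here), and the index is a plain in-memory hash map rather than the FASTER-based one. It only shows the shape of the idea: a PSF maps a record to a value or to "no value", and every record is hash indexed under each (PSF, value) pair it produces.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record type: a flat map of field name -> field value.
using Record = std::unordered_map<std::string, std::string>;

// A PSF maps a record to a value, or to std::nullopt when the record does
// not belong to any subset defined by this PSF.
using Psf = std::function<std::optional<std::string>(const Record&)>;

// Example PSF (hypothetical helper): groups records by the value of one field.
Psf ProjectField(std::string field) {
  return [field = std::move(field)](const Record& r) -> std::optional<std::string> {
    auto it = r.find(field);
    if (it == r.end()) return std::nullopt;
    return it->second;
  };
}

// Toy store: an append-only log plus a hash index keyed by (PSF id, value).
class ToyStore {
 public:
  // Registers a PSF and returns its id. Only records ingested after this
  // call are indexed under the new PSF.
  std::size_t RegisterPsf(Psf psf) {
    psfs_.push_back(std::move(psf));
    return psfs_.size() - 1;
  }

  // Appends the record to the log and indexes its address under every
  // registered PSF that produces a value for it. Returns the log address.
  std::size_t Ingest(Record record) {
    const std::size_t address = log_.size();
    for (std::size_t id = 0; id < psfs_.size(); ++id) {
      if (auto value = psfs_[id](record)) {
        index_[MakeKey(id, *value)].push_back(address);
      }
    }
    log_.push_back(std::move(record));
    return address;
  }

  // Returns the log addresses of all indexed records r with psf(r) == value,
  // in ingestion order.
  std::vector<std::size_t> Scan(std::size_t psf_id, const std::string& value) const {
    auto it = index_.find(MakeKey(psf_id, value));
    return it == index_.end() ? std::vector<std::size_t>{} : it->second;
  }

  // Reads a record back from the log by address.
  const Record& Get(std::size_t address) const {
    assert(address < log_.size());
    return log_[address];
  }

 private:
  // Combines PSF id and value into one hash key, e.g. "0:en".
  static std::string MakeKey(std::size_t psf_id, const std::string& value) {
    return std::to_string(psf_id) + ":" + value;
  }

  std::vector<Psf> psfs_;
  std::vector<Record> log_;
  std::unordered_map<std::string, std::vector<std::size_t>> index_;
};
```

In this sketch, registering ProjectField("lang") and then ingesting a few records lets Scan(id, "en") return the log addresses of exactly those records whose "lang" field is "en", without touching the rest of the log. Records that lack the field are stored but never indexed under that PSF.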

FishStore is being open-sourced as a research prototype, by researchers from Microsoft Research and the University of Utah. You can read more about the concepts behind FishStore in the SIGMOD 2019 research paper. Note that the research paper uses Mison as its parser, whereas this open-source release of FishStore provides a generic parser interface model, with simdjson as an out-of-the-box example.

For detailed usage of FishStore, please refer to our tutorial.

Building FishStore

Clone FishStore including submodules:

git clone https://github.com/microsoft/FishStore.git
cd FishStore
git submodule update --init

FishStore uses CMake for builds. To build it, create one or more build directories and use CMake to set up build scripts for your target OS. Once CMake has generated the build scripts, it will try to update them, as needed, during ordinary builds.

Building on Windows

Create a new directory "build" off the root directory. From the new "build" directory, execute:

cmake .. -G "<MSVC compiler> Win64"

To see a list of supported MSVC compiler versions, just run "cmake -G". As of this writing, we're using Visual Studio 2017, so you would execute:

cmake .. -G "Visual Studio 15 2017 Win64"

That will create build scripts inside your new "build" directory, including a "FishStore.sln" file that you can use inside Visual Studio. CMake will add several build profiles to FishStore.sln, including Debug/x64 and Release/x64.

Building on Linux

The Linux build requires several packages (both libraries and header files); see "CMakeLists.txt" in the root directory for the list of libraries being linked to on Linux.

As of this writing, the required libraries are:

  • stdc++fs : for <experimental/filesystem>, used for cross-platform directory creation.
  • uuid : support for GUIDs.
  • tbb : Intel's Threading Building Blocks library, used for concurrent_queue.
  • gcc
  • aio : Kernel Async I/O, used by QueueFile / QueueIoHandler.
  • stdc++
  • pthread : thread library.

On Ubuntu, you may install dependencies as follows:

 sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
 sudo apt update
 sudo apt install -y g++-7 libaio-dev uuid-dev libtbb-dev

Also, on Linux with the gcc compiler, CMake generates build scripts for either a Debug or a Release build, but not both, so you'll have to run CMake twice, in two different directories, to get both Debug and Release build scripts.

Create new directories "build/Debug" and "build/Release" off the root directory. From "build/Debug", run:

cmake -DCMAKE_BUILD_TYPE=Debug ../..

and from "build/Release", run:

cmake -DCMAKE_BUILD_TYPE=Release ../..

Then you can build Debug or Release binaries by running "make" inside the relevant build directory.

Other options

You can try other generators (compilers) supported by CMake. The main CMake build script is the CMakeLists.txt located in the root directory.

Extensions

FishStore is a general storage layer supporting different input data formats and general PSFs. Specifically, users can extend FishStore by implementing their own parser adapters and PSF libraries. For more details, please refer to:

Submodules

This project references the following Git submodules:

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

fishstore's People

Contributors

badrishc, dongx-psu, yinanli
