Chinese Wiki, WeChat discussion group, Slack channel

What is Pegasus?

Pegasus is a distributed key-value storage system developed and maintained by the Xiaomi Cloud Storage Team, with the goals of high availability, high performance, strong consistency and ease of use. The original motivation of this project was to replace Apache HBase for users who only need a simple key-value schema but require low latency and high availability. It is based on a modified rDSN (originally Microsoft/rDSN) framework and uses a modified RocksDB (originally facebook/RocksDB) as its underlying storage engine. The consensus algorithm it uses is PacificA.

Features

  • High performance

    Here are several key aspects that make Pegasus a high-performance storage system:

    • Implemented in C++
    • Staged event-driven architecture, an event-driven design similar in spirit to the one Nginx adopts.
    • A high-performance storage engine based on RocksDB, slightly modified to support fast learning (letting a new or lagging replica catch up quickly).
  • High availability

    Unlike Bigtable/HBase, Pegasus adopts a non-layered replication architecture: persistent data does not depend on an external DFS such as GFS/HDFS, which greatly improves availability. In addition, because Pegasus is written in C++, it avoids the availability problems that Java GC pauses cause in HBase.

  • Strong consistency

    We adopt the PacificA consensus algorithm to make Pegasus a strongly consistent system.

  • Easy scale-out

    A global load balancer dynamically rebalances load onto newly added data nodes.

  • Easy to use

    We provide C++ and Java clients with simple interfaces.
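As a rough illustration of what PacificA-based strong consistency means for a single write, the following Python sketch models one partition with a primary and two secondaries. It assumes an in-memory, failure-free setting: the primary assigns each write a sequence number (called a decree here), asks every replica in the current configuration to prepare it, and commits only after all of them have acknowledged; reads are served by the primary from committed state. The class and method names are hypothetical and this is not Pegasus or rDSN code; leases, reconfiguration and learning are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica of a partition: a prepare list plus the committed state."""
    name: str
    prepared: dict = field(default_factory=dict)   # decree -> (key, value)
    committed: dict = field(default_factory=dict)  # key -> value
    last_committed: int = 0

    def prepare(self, decree, key, value):
        # Record the mutation in the prepare list and acknowledge it.
        self.prepared[decree] = (key, value)
        return True

    def commit_up_to(self, decree):
        # Apply prepared mutations in decree order, up to and including `decree`.
        for d in range(self.last_committed + 1, decree + 1):
            key, value = self.prepared.pop(d)
            self.committed[key] = value
        self.last_committed = max(self.last_committed, decree)


class Primary(Replica):
    """Hypothetical primary that drives the prepare/commit rounds."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_decree = 1

    def write(self, key, value):
        decree = self.next_decree
        self.next_decree += 1
        # PacificA-style rule: commit only after *every* replica in the
        # current configuration (the primary included) has prepared.
        acks = [self.prepare(decree, key, value)]
        acks += [s.prepare(decree, key, value) for s in self.secondaries]
        if not all(acks):
            raise RuntimeError("a replica failed to prepare; reconfiguration needed")
        self.commit_up_to(decree)
        # Simplification: push the commit point to secondaries immediately
        # instead of piggybacking it on later messages.
        for s in self.secondaries:
            s.commit_up_to(decree)
        return decree

    def read(self, key):
        # Reads go to the primary and only see committed data.
        return self.committed.get(key)


if __name__ == "__main__":
    s1, s2 = Replica("s1"), Replica("s2")
    primary = Primary("p", [s1, s2])
    primary.write("user:1", "alice")
    print(primary.read("user:1"), s1.committed)  # prints: alice {'user:1': 'alice'}
```

Requiring every replica to prepare before a commit is what allows any secondary to take over as primary without losing committed writes; the trade-off is that a slow or failed replica stalls writes until the configuration is changed to exclude it.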

Architecture overview

The following diagram shows the architecture of Pegasus:

[Diagram: Pegasus architecture overview, docs/media-img/pegasus-architecture-overview.png]

Here is a brief explanation of the concepts and terms in the diagram:

  • MetaServer: the component that manages the whole cluster, similar to "HMaster" in HBase.
  • ZooKeeper: the external dependency of Pegasus, used to store the cluster's meta state and to provide fault tolerance for the meta-server.
  • ReplicaServer: the component that serves clients' read/write requests; it also hosts the replicas.
  • Partition/replica: the whole key space is split into several partitions, and each partition has several replicas for fault tolerance. See the PacificA algorithm for more details.

For more details about the design and implementation, see the slides under docs/ppt/.
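The request path implied by these components can be sketched as follows: a client fetches the partition configuration from the meta-server, caches it, maps a hashkey to a partition index, and sends the request to the replica-server that currently holds that partition's primary. This is a hypothetical Python model: `MetaServerStub` and `RoutingClient` are illustrative names, the replica-server names are made up, and `zlib.crc32(hashkey) % partition_count` is an assumed stand-in for the hash function the real clients use.

```python
import zlib
from typing import List


class MetaServerStub:
    """Hypothetical stand-in for the meta-server: it only knows which
    replica-server currently holds the primary of each partition."""

    def __init__(self, primaries: List[str]):
        self._primaries = list(primaries)

    def query_config(self) -> List[str]:
        # Index i is partition i; the value is that partition's primary.
        return list(self._primaries)


class RoutingClient:
    def __init__(self, meta: MetaServerStub):
        self._meta = meta
        self._config = meta.query_config()  # cached partition -> primary table

    def partition_of(self, hashkey: bytes) -> int:
        # Assumption: partition index = hash(hashkey) mod partition_count.
        # zlib.crc32 is a stand-in, not necessarily Pegasus's hash function.
        return zlib.crc32(hashkey) % len(self._config)

    def route(self, hashkey: bytes) -> str:
        # Only the hashkey is hashed; the sortkey never affects routing.
        return self._config[self.partition_of(hashkey)]

    def refresh(self) -> None:
        # E.g. after a primary moves: re-fetch the table from the meta-server.
        self._config = self._meta.query_config()


if __name__ == "__main__":
    meta = MetaServerStub(["replica-a", "replica-b", "replica-c", "replica-d"])
    client = RoutingClient(meta)
    print(client.partition_of(b"user_42"), client.route(b"user_42"))
```

Because only the hashkey is hashed, every sortkey under the same hashkey lands in the same partition. When a primary moves to another replica-server, a client needs to refresh its cached table, which `refresh()` mimics here.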

Data model & API overview

The data model in Pegasus is (hashkey + sortkey) -> value, in which:

  • Hashkey is used for partitioning. Values with different hashkeys may be stored in different partitions.
  • Sortkey is used for sorting within a hashkey. Values with the same hashkey but different sortkeys are stored in the same partition and ordered by sortkey. If you use the scan API to scan a single hashkey, you get the values in lexicographical order of their sortkeys.

The following diagram shows the data model of Pegasus:

[Diagram: Pegasus data model, docs/media-img/pegasus-data-model.png]
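To make the ordering rule concrete, here is a toy, single-process Python model of the (hashkey + sortkey) -> value schema. It ignores partitioning, replication and persistence; keys are kept sorted as (hashkey, sortkey) tuples, so a scan over one hashkey returns its sortkeys in lexicographical (byte-wise) order. `ToyTable` and its methods are hypothetical and are not the Pegasus client API.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

Key = Tuple[bytes, bytes]  # (hashkey, sortkey)


class ToyTable:
    """In-memory model of (hashkey + sortkey) -> value. Keys stay sorted by
    (hashkey, sortkey), so all entries of one hashkey are contiguous and
    ordered by sortkey."""

    def __init__(self) -> None:
        self._keys: List[Key] = []
        self._values: Dict[Key, bytes] = {}

    def set(self, hashkey: bytes, sortkey: bytes, value: bytes) -> None:
        key = (hashkey, sortkey)
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep the key list sorted
        self._values[key] = value

    def get(self, hashkey: bytes, sortkey: bytes) -> Optional[bytes]:
        return self._values.get((hashkey, sortkey))

    def scan_hashkey(self, hashkey: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Start at the smallest key for this hashkey (empty sortkey) and
        # stop as soon as a different hashkey is reached.
        i = bisect.bisect_left(self._keys, (hashkey, b""))
        while i < len(self._keys) and self._keys[i][0] == hashkey:
            key = self._keys[i]
            yield key[1], self._values[key]
            i += 1


if __name__ == "__main__":
    t = ToyTable()
    t.set(b"user_1", b"name", b"alice")
    t.set(b"user_1", b"age", b"30")
    t.set(b"user_2", b"name", b"bob")
    print(list(t.scan_hashkey(b"user_1")))  # [(b'age', b'30'), (b'name', b'alice')]
```

For example, after setting sortkeys `name`, `age` and `email` under hashkey `user_1`, `scan_hashkey(b"user_1")` yields them as `age`, `email`, `name`, regardless of insertion order.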

Quick Start

You may want to refer to the installation guide.

Related Projects

Submodules:

Client libs:

Test tools:

Data import/export tools:

How to contribute

We open-sourced this project because we know it is far from mature and needs a lot of improvement, so we look forward to your contributions.

If you have more questions, please join our Slack channel.

License

Copyright 2015-2018 Xiaomi, Inc. Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
