apache / incubator-pegasus
Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
Home Page: https://pegasus.apache.org/
License: Apache License 2.0
>>> use temp
OK
>>> full_scan
partition: all
hash_key_filter_type: no_filter
sort_key_filter_type: no_filter
batch_size: 100
max_count: 2147483647
timout_ms: 5000
detailed: false
no_value: false
"a" : "m_1" => "a"
"a" : "m_2" => "a"
"a" : "m_3" => "a"
"a" : "m_4" => "a"
"a" : "m_5" => "a"
"a" : "n_1" => "b"
"a" : "n_2" => "b"
"a" : "n_3" => "b"
8 key-value pairs got.
>>> full_scan --batch_size 10 -s prefix -y m
partition: all
hash_key_filter_type: no_filter
sort_key_filter_type: prefix
sort_key_filter_pattern: "m"
batch_size: 10
max_count: 2147483647
timout_ms: 5000
detailed: false
no_value: false
"a" : "m_1" => "a"
"a" : "m_2" => "a"
"a" : "m_3" => "a"
"a" : "m_4" => "a"
"a" : "m_5" => "a"
5 key-value pairs got.
>>> full_scan --batch_size 3 -s prefix -y m
partition: all
hash_key_filter_type: no_filter
sort_key_filter_type: prefix
sort_key_filter_pattern: "m"
batch_size: 3
max_count: 2147483647
timout_ms: 5000
detailed: false
no_value: false
"a" : "m_1" => "a"
"a" : "m_2" => "a"
"a" : "m_3" => "a"
"a" : "m_4" => "a"
"a" : "m_5" => "a"
"a" : "n_1" => "b"
"a" : "n_2" => "b"
"a" : "n_3" => "b"
8 key-value pairs got.
>>>
Is the version released on GitHub the same as the version used internally at Xiaomi?
Build environment:
Machine: Linux version 3.2.0-61-generic (buildd@roseapple) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #93-Ubuntu SMP Fri May 2 21:31:50 UTC 2014
GCC: 4.8.4
CMake: 2.8.12.2
Error message:
skip build fmtlib
skip build Poco
-DPOCO_INCLUDE=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/include -DPOCO_LIB=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/lib -DGTEST_INCLUDE=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/include -DGTEST_LIB=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/lib -DCMAKE_POSITION_INDEPENDENT_CODE=ON
-- Configuring done
-- Generating done
-- Build files have been written to: /data/bigdata/pegasus/pegasus/rdsn/thirdparty/build/fds
[ 78%] Built target galaxy-fds-sdk-cpp
[ 84%] Built target sample
Linking CXX executable testrunner
/usr/bin/ld: cannot find -lgtest
/usr/bin/ld: cannot find -lgtest_main
collect2: error: ld returned 1 exit status
make[2]: *** [test/testrunner] Error 1
make[1]: *** [test/CMakeFiles/testrunner.dir/all] Error 2
make: *** [all] Error 2
build fds failed
ERROR: build rdsn failed
Back when Pegasus did not yet support the INCR operator, we still defined RPC_RRDB_RRDB_INCR (refer to https://github.com/XiaoMi/pegasus/blob/v1.9.2/src/server/pegasus_server_impl.cpp#L30) to keep compatibility with v1.4.x.
If we send an RPC_RRDB_RRDB_INCR rpc using Pegasus Java Client 1.9.0 to a Pegasus Server <= 1.9.2, the pegasus server will coredump:
D2018-07-15 21:27:23.725 (1531661243725307352 0295) replica.io-thrd.00661: network.cpp:619:on_server_session_accepted(): server session accepted, remote_client = x.x.x.x:xxxxx, current_count = 5
F2018-07-15 21:27:23.727 (1531661243727366961 02aa) replica.default3.030002940001000e: pegasus_server_impl.cpp:86:handle_request(): assertion expression: false
F2018-07-15 21:27:23.727 (1531661243727404626 02aa) replica.default3.030002940001000e: pegasus_server_impl.cpp:86:handle_request(): recv message with unhandled rpc name RPC_RRDB_RRDB_INCR from x.x.x.x:xxxxx, trace_id = 0000000000000000
That is: if an rpc code is defined but not handled, the server will core. That is not robust enough.
Refer to storage_serverlet.h:82.
Only scan hash keys when doing a full scan.
When compiling with the toolchain on the build machine, I found that if the environment variable
export LIBRARY_PATH="$DSN_THIRDPARTY_ROOT/lib"
is set, then the following statement in CMakeLists.txt does not take effect:
link_directories(${DSN_THIRDPARTY_ROOT}/lib)
As a result, ${DSN_THIRDPARTY_ROOT}/lib is not among the -L paths at link time, and linking fails because the libraries cannot be found, or the wrong libraries are found.
A possibly related link: https://public.kitware.com/Bug/view.php?id=16074
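Assuming the cause is CMake treating directories listed in LIBRARY_PATH as implicit link directories and filtering them from its -L list (as the linked bug report describes), one workaround is to stop relying on link_directories() and link by absolute path instead. The fragment below is a sketch; GTEST_LIB is a hypothetical cache variable name, and testrunner is the target that failed to link above:

```cmake
# Workaround sketch: find_library() returns a full path, so linking no
# longer depends on -L ordering or on link_directories() taking effect.
find_library(GTEST_LIB gtest PATHS ${DSN_THIRDPARTY_ROOT}/lib NO_DEFAULT_PATH)
find_library(GTEST_MAIN_LIB gtest_main PATHS ${DSN_THIRDPARTY_ROOT}/lib NO_DEFAULT_PATH)
target_link_libraries(testrunner ${GTEST_LIB} ${GTEST_MAIN_LIB})
```

Alternatively, unsetting LIBRARY_PATH in the build environment before invoking cmake should also restore the original link_directories() behavior.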
Hi, I'm getting started with Pegasus by reading the PacificA consensus paper.
In the paper, a primary/secondary data node can suspect that its peer has become faulty,
and then report this to the configuration manager.
After the configuration manager removes the faulty node, the replica count of the
replication group is reduced.
In Pegasus's implementation, does the configuration manager automatically pick a new node
and add it to the replication group, or is this done via some tools?
Thanks in advance.
Pegasus travis test fails occasionally:
D2018-06-10 13:30:02.390 (1528637402390718359 3e4e) mimic.io-thrd.15950: client session created, remote_server = 127.0.0.1:34601, current_count = 1
sleep 1 second to wait complete...
D2018-06-10 13:30:02.390 (1528637402390939619 3e59) mimic.io-thrd.15961: client session connected, remote_server = 127.0.0.1:34601, current_count = 1
new app_id = 2
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
(the two lines above repeat about 12 times)
/home/travis/build/XiaoMi/pegasus/src/test/function_test/test_restore.cpp:318: Failure
Value of: restore()
Actual: false
Expected: true
But if you rebuild it, it usually succeeds.
As the title says.
Now we have read/write QPS counters, but that is not enough, because the size of each request is not taken into account.
In the ad-CTR A/B setup, we need to use ./script/pegasus_set_usage_scenario.sh to set a table's usage scenario.
There are now two clusters, c3srv-adb and c4srv-adb, in the c3 and c4 datacenters respectively. When data needs to be loaded, the workflow starts two tasks at the same time to set the usage_scenario of the two clusters.
Since the two tasks run on the same machine and use the shell from the same pegasus tools directory, only one cluster ever gets set successfully.
The reason: when ./run.sh shell --cluster=xxx runs, it generates config-shell.ini in the current directory. If multiple shells run at the same time, they all generate config-shell.ini, and the files overwrite each other.
and update the wiki doc.
to improve write performance.
Just like Hive HBase Integration and Hive Mongo Integration, we can integrate Pegasus into Hive, so that users can run SQL queries on Pegasus, like Using Hive to interact with HBase.
Sounds cool, doesn't it?
We can add random fault injection on pegasus_write_service::impl::db_write
to test the condition where rocksdb fails.
As the title says; the whole process still feels rather complicated. Thanks!
The physical deletion logic for expired tables is currently too complex and depends on too many things; it needs to be simplified.
See "physical deletion of expired table data".
Currently, if we want to test learning in the kill test, we can only kill the replica server process, which is not friendly for memory-leak testing. Perhaps we'd better add a command that kills one partition rather than the whole replica server process.
Building Pegasus now takes more than 30 minutes (the Travis CI timeout is 50 minutes); we need to speed it up.
When we run "cluster_info" in the shell, the "meta_servers" list is a static value that shows the meta servers configured when the cluster was initialized:
>>> cluster_info
meta_servers : 10.112.3.11:30601,10.112.3.10:30601
primary_meta_server : 10.112.3.11:30601
zookeeper_hosts : 10.112.3.11:2181,10.112.3.10:2181,10.112.2.33:2181
zookeeper_root : /pegasus/c3tst-sample
meta_function_level : freezed
When a new meta server is added to the cluster dynamically, the "meta_servers" list won't change.
We should resolve this.
I saw your article on WeChat mentioning that partition scheduling uses a network-flow algorithm. I happen to have worked on a min-cost-flow algorithm for this problem before; its objectives include spreading replicas across racks and minimizing the total amount of migration. Feel free to take a look if you're interested.
This appears many times in unit tests:
W2018-07-18 10:16:05.669 (1531880165669313326 617f) mimic.io-thrd.24959: io_getevents returns -4, you probably want to try on another machine:-(
Currently, the only way to quickly clean the data of a table is to drop the table and then create a new one. Perhaps we can support a "truncate table" command to let users clean a table quickly.
Two packages not mentioned in the docs are needed:
1. zlib-devel
Required at compile time; the build fails without it.
2. nmap-ncat
Without it, the ./run.sh start_onebox step hits:
./scripts/start_zk.sh: line 62: nc: command not found
The build has now completed, and I'm continuing my newcomer experiments.
Thanks to all the contributors.
When a node has been down for quite a long time and then comes back, how is its data synchronized?
now the priority queue's dequeue() is:
T dequeue_impl(/*out*/ long &ct, bool pop = true)
{
    if (_count == 0) {
        ct = 0;
        return nullptr;
    }
    ct = --_count;
    int index = priority_count - 1;
    for (; index >= 0; index--) {
        if (_items[index].size() > 0) {
            break;
        }
    }
    assert(index >= 0); // must find something
    auto c = _items[index].front();
    _items[index].pop();
    return c;
}
If the HIGH priority queue is always non-empty, tasks in the COMMON/LOW queues may be starved.
We can refer to the implementation of nfs_client_impl.
We now use lots of assert() calls in our code, and we do not define the NDEBUG macro even when compiling in release mode.
To improve:
System: Ubuntu 16.04
➜ thirdparty git:(dc3a3ee) ✗ ./build-thirdparty.sh
+++ dirname ./build-thirdparty.sh
++ cd .
++ pwd
-- Thrift version: 0.9.3 (0.9.3)
-- Thrift package version: 0.9.3
-- Build configuration Summary
-- Build Thrift compiler: OFF
-- Build with unit tests: OFF
-- Build examples: OFF
-- Build Thrift libraries: ON
-- Language libraries:
-- Build C++ library: OFF
-- - Boost headers missing
-- Build C (GLib) library: OFF
-- - Disabled by via WITH_C_GLIB=OFF
-- Build Java library: OFF
-- - Disabled by via WITH_JAVA=OFF
-- - Ant missing
-- Build Python library: OFF
-- - Disabled by via WITH_PYTHON=OFF
-- Library features:
-- Build shared libraries: OFF
-- Build static libraries: ON
-- Build with ZLIB support: ON
-- Build with libevent support: OFF
-- Build with Qt4 support: OFF
-- Build with Qt5 support: OFF
-- Build with OpenSSL support: OFF
-- Build with Boost thread support: OFF
-- Build with C++ std::thread support: OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/listar/Code/pegasus/rdsn/thirdparty/build/thrift-0.9.3
Found the cause: the Boost version was too old. Two versions coexisted on the system; one of them, 1.44, was too old. After removing it, everything was OK.
By default, Pegasus supports only ascending order when scanning data.
In some cases, we want to scan the data in reverse (descending) order.
I saw "#pragma once" in header files; it would be better to use "#ifndef XXX #define XXX ... #endif".
As the issue title asks: does Pegasus support distributed transactions, so that I can use it like the following code?
pegasus.beginTransaction();
pegasus.put("key",value);
v = pegasus.get("key");
v++;
pegasus.put("key",v);
pegasus.commit();
// multi_set with value="99": after the set, reading the value back gives a wrong result.
The output seen via run.sh shell is:
multi_get_range test_key sortkey sortkez
hash_key: "test_key"
start_sort_key: "sortkey"
start_inclusive: true
stop_sort_key: "sortkez"
stop_inclusive: false
sort_key_filter_type: no_filter
max_count: -1
no_value: false
reverse: false
"test_key" : "sortkey_0" => "V\x7F"
"test_key" : "sortkey_1" => "V\x7F"
"test_key" : "sortkey_2" => "V\x7F"
"test_key" : "sortkey_3" => "V\x7F"
"test_key" : "sortkey_4" => "V\x7F"
"test_key" : "sortkey_5" => "V\x7F"
"test_key" : "sortkey_6" => "V\x7F"
Below is the multi_set code:
int main(int argc, const char *argv[])
{
    if (!pegasus_client_factory::initialize("config.ini")) {
        fprintf(stderr, "ERROR: init pegasus failed\n");
        return -1;
    }
    if (argc < 3) {
        fprintf(stderr, "USAGE: %s <cluster-name> <app-name>\n", argv[0]);
        return -1;
    }
    int run_key_count = 2;
    if (argc == 4) {
        run_key_count = atoi(argv[3]);
    }
    // set
    pegasus_client *client = pegasus_client_factory::get_client(argv[1], argv[2]);
    std::string hashKey = "test_key";
    std::map<std::string, std::string> kvs;
    for (int j = 0; j < 7; ++j) {
        std::string sortKey = "sortkey_" + std::to_string(j);
        kvs[sortKey] = "99";
        printf("test:key:%s,value:%s\n", sortKey.c_str(), kvs[sortKey].c_str());
    }
    int ret = client->multi_set(hashKey, kvs);
    if (ret != PERR_OK) {
        return -1;
    }
    struct pegasus_client::multi_get_options optA;
    std::map<std::string, std::string> values;
    ret = client->multi_get(hashKey, "sortkey", "sortkez", optA, values);
    if (ret != PERR_OK && ret != PERR_INCOMPLETE) {
        return -1;
    }
    for (std::map<std::string, std::string>::iterator it = values.begin(); it != values.end(); ++it) {
        std::string newValue = "99";
        if (0 != strcmp(newValue.c_str(), it->second.c_str())) {
            fprintf(stdout, "ERROR: multi_get value headKey:%s, sortKey:%s, value:%s != value:%s\n",
                    hashKey.c_str(), it->first.c_str(),
                    it->second.c_str(), newValue.c_str());
            return -1;
        }
        fprintf(stdout, "hashkey:%s, sortkey:%s, value:%s\n", hashKey.c_str(), it->first.c_str(), it->second.c_str());
        // del
        ret = client->del(hashKey, it->first);
        if (ret != PERR_OK) {
            fprintf(stderr, "ERROR: del failed, error=%s\n", client->get_error_string(ret));
            return -1;
        }
    }
    return 0;
}
Environment
CentOS 7.3.1611
kernel 3.10.0-514.el7.x86_64
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
cmake 2.8.12.2
boost 1.53.0 Release 27.el7
Reference:
https://github.com/XiaoMi/pegasus/blob/master/docs/installation.md
1. Install development packages
yum -y install cmake boost-devel libaio-devel snappy-devel bzip2-devel
readline-devel
2. clone
3. build
The log is as follows:
ln: failed to create symbolic link ‘/root/pegasus/DSN_ROOT’: File exists
INFO: start build rdsn...
CLEAR=NO
BUILD_TYPE=debug
SERIALIZE_TYPE=
GIT_SOURCE=github
ONLY_BUILD=YES
RUN_VERBOSE=NO
WARNING_ALL=NO
ENABLE_GCOV=NO
Use system boost
CMAKE_OPTIONS= -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++
-DCMAKE_BUILD_TYPE=Debug -DDSN_GIT_SOURCE=github
MAKE_OPTIONS= -j8
#############################################################################
fatal: Not a git repository: ../.git/modules/rdsn
Currently, several scripts assume the minos client is in the dir "/home/work/pegasus/infra/minos/client", like:
We'd better fix this by making the variable "minos_client_dir" configurable.
Besides, minos 2.0 should also be supported in these scripts.
Though Pegasus already provides the bulk_load usage scenario on tables for faster write speed, it still uses the set or multiSet interface to insert data one by one. That is definitely not fast enough to load a very large amount of data (typically billions of rows) into Pegasus in a short time.
Maybe we can look for a better way, considering:
Then the idea is:
including:
and collectorapp.pegasusapp.stat.storage_count#all
2018/03/15 15:24
Pegasus Server 1.7.0 (9a7a067) Release
CentOS release 6.3 (Final)
work@c3-hadoop-ssd-tst-st04
/home/work/coresave/issue-13
#0 0x000000376e4328a5 in raise () from /lib64/libc.so.6
#1 0x000000376e434085 in abort () from /lib64/libc.so.6
#2 0x000000376e46ffe7 in __libc_message () from /lib64/libc.so.6
#3 0x000000376e475916 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f59df3ebe24 in deallocate (this=<optimized out>, __p=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/ext/new_allocator.h:110
#5 _M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/bits/stl_vector.h:174
#6 ~_Vector_base (this=0x7f5665200abc, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/bits/stl_vector.h:160
#7 ~vector (this=0x7f5665200abc, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/bits/stl_vector.h:416
#8 dsn::aio_task::~aio_task (this=0x7f56652009e4, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:682
#9 0x00007f59df3ebe89 in dsn::aio_task::~aio_task (this=0x7f56652009e4, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:682
#10 0x00007f59df3ed6ea in release_ref (this=0x7f56652009e4) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/include/dsn/utility/autoref_ptr.h:76
#11 dsn::task::exec_internal (this=this@entry=0x7f56652009e4) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:242
#12 0x00007f59df47e3fd in dsn::task_worker::loop (this=0x12926f0) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:323
#13 0x00007f59df47e5c9 in dsn::task_worker::run_internal (this=0x12926f0) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:302
#14 0x00007f59dd528600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#15 0x000000376e807851 in start_thread () from /lib64/libpthread.so.0
#16 0x000000376e4e811d in clone () from /lib64/libc.so.6
std::vector<dsn_file_buffer_t> _unmerged_write_buffers;
<dsn::ref_counter> = {
_vptr.ref_counter = 0x7f59df77b1f0 <vtable for dsn::aio_task+16>,
_magic = 3735928559,
_counter = {
<std::__atomic_base<long>> = {
_M_i = 0
}, <No data fields>}
},
_unmerged_write_buffers = {
<std::_Vector_base<dsn_file_buffer_t, std::allocator<dsn_file_buffer_t> >> = {
_M_impl = {
<std::allocator<dsn_file_buffer_t>> = {
<__gnu_cxx::new_allocator<dsn_file_buffer_t>> = {<No data fields>}, <No data fields>},
members of std::_Vector_base<dsn_file_buffer_t, std::allocator<dsn_file_buffer_t> >::_Vector_impl:
_M_start = 0x7f56ff82fe20,
_M_finish = 0x7f56ff82fe50,
_M_end_of_storage = 0x7f56ff82fe60
}
}, <No data fields>},
(gdb) pvector this._unmerged_write_buffers
elem[0]: $2 = {
buffer = 0x0,
size = 0
}
elem[1]: $3 = {
buffer = 0x0,
size = 0
}
elem[2]: $4 = {
buffer = 0x7f56642154b0,
size = 0
}
Vector size = 3
Vector capacity = 4
Element type = std::_Vector_base<dsn_file_buffer_t, std::allocator<dsn_file_buffer_t> >::pointer
Init pegasus succeed
LevelDB: version 4.0
Date: Thu Nov 9 08:13:07 2017
CPU: 2 * Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
CPUCache: 46080 KB
Keys: 16 bytes each
Values: 100 bytes each (100 bytes after compression)
Entries: 100000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 11.1 MB (estimated)
FileSize: 11.1 MB (estimated)
Writes per second: 0
Compression: NoCompression
Memtablerep: skip_list
Perf Level: 0
WARNING: Optimization is disabled: benchmarks unnecessarily slow
WARNING: Assertions are enabled; benchmarks unnecessarily slow
Thread Count Runtime QPS AvgLat P99Lat
1 10000 11.763 850 1176 2425
2 20000 20.256 987 2018 4953
3 30000 29.534 1015 2938 7817
4 40000 38.809 1030 3765 12794
just like: Benchmark on v1.8.0
Add one item:
(5) Read-only data: 12 clients * 100 threads