apache / incubator-pegasus

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store

Home Page: https://pegasus.apache.org/

License: Apache License 2.0

Shell 2.67% CMake 1.45% Makefile 0.05% Python 1.78% C++ 75.23% C 1.04% Thrift 0.78% Dockerfile 0.17% Go 6.37% Java 8.95% Scala 0.50% JavaScript 0.97% HTML 0.02%
pegasus nosql distributed-database key-value-store

incubator-pegasus's Issues

filter does not take effect when scan crosses different batches

>>> use temp
OK
>>> full_scan
partition: all
hash_key_filter_type: no_filter
sort_key_filter_type: no_filter
batch_size: 100
max_count: 2147483647
timout_ms: 5000
detailed: false
no_value: false

"a" : "m_1" => "a"
"a" : "m_2" => "a"
"a" : "m_3" => "a"
"a" : "m_4" => "a"
"a" : "m_5" => "a"
"a" : "n_1" => "b"
"a" : "n_2" => "b"
"a" : "n_3" => "b"

8 key-value pairs got.
>>> full_scan --batch_size 10 -s prefix -y m
partition: all
hash_key_filter_type: no_filter
sort_key_filter_type: prefix
sort_key_filter_pattern: "m"
batch_size: 10
max_count: 2147483647
timout_ms: 5000
detailed: false
no_value: false

"a" : "m_1" => "a"
"a" : "m_2" => "a"
"a" : "m_3" => "a"
"a" : "m_4" => "a"
"a" : "m_5" => "a"

5 key-value pairs got.
>>> full_scan --batch_size 3 -s prefix -y m
partition: all
hash_key_filter_type: no_filter
sort_key_filter_type: prefix
sort_key_filter_pattern: "m"
batch_size: 3
max_count: 2147483647
timout_ms: 5000
detailed: false
no_value: false

"a" : "m_1" => "a"
"a" : "m_2" => "a"
"a" : "m_3" => "a"
"a" : "m_4" => "a"
"a" : "m_5" => "a"
"a" : "n_1" => "b"
"a" : "n_2" => "b"
"a" : "n_3" => "b"

8 key-value pairs got.
>>> 

Build failure

Build environment:
Machine: Linux version 3.2.0-61-generic (buildd@roseapple) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #93-Ubuntu SMP Fri May 2 21:31:50 UTC 2014
GCC: 4.8.4
CMake: 2.8.12.2
Error message:

skip build fmtlib
skip build Poco
-DPOCO_INCLUDE=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/include -DPOCO_LIB=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/lib -DGTEST_INCLUDE=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/include -DGTEST_LIB=/data/bigdata/pegasus/pegasus/rdsn/thirdparty/output/lib -DCMAKE_POSITION_INDEPENDENT_CODE=ON
-- Configuring done
-- Generating done
-- Build files have been written to: /data/bigdata/pegasus/pegasus/rdsn/thirdparty/build/fds
[ 78%] Built target galaxy-fds-sdk-cpp
[ 84%] Built target sample
Linking CXX executable testrunner
/usr/bin/ld: cannot find -lgtest
/usr/bin/ld: cannot find -lgtest_main
collect2: error: ld returned 1 exit status
make[2]: *** [test/testrunner] Error 1
make[1]: *** [test/CMakeFiles/testrunner.dir/all] Error 2
make: *** [all] Error 2
build fds failed
ERROR: build rdsn failed

coredump when rpc_code is defined but not handled

Although Pegasus did not support the INCR operator yet, we still defined RPC_RRDB_RRDB_INCR (refer to https://github.com/XiaoMi/pegasus/blob/v1.9.2/src/server/pegasus_server_impl.cpp#L30 ) to keep compatibility with v1.4.x.

If we send the RPC_RRDB_RRDB_INCR rpc code using Pegasus Java Client 1.9.0 to a Pegasus Server <= 1.9.2, then the pegasus server will coredump:

D2018-07-15 21:27:23.725 (1531661243725307352 0295) replica.io-thrd.00661: network.cpp:619:on_server_session_accepted(): server session accepted, remote_client = x.x.x.x:xxxxx, current_count = 5
F2018-07-15 21:27:23.727 (1531661243727366961 02aa) replica.default3.030002940001000e: pegasus_server_impl.cpp:86:handle_request(): assertion expression: false
F2018-07-15 21:27:23.727 (1531661243727404626 02aa) replica.default3.030002940001000e: pegasus_server_impl.cpp:86:handle_request(): recv message with unhandled rpc name RPC_RRDB_RRDB_INCR from x.x.x.x:xxxxx, trace_id = 0000000000000000

That is: if some rpc code is defined but not handled, the server will core. This is not robust enough.

refer to storage_serverlet.h:82
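The fix can be sketched as a lookup-based dispatch that replies with an error instead of asserting. This is a minimal sketch with invented type names; the real dispatch lives in storage_serverlet.h, and ERR_HANDLER_NOT_FOUND mirrors rDSN-style error naming but is illustrative here:

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the real rpc-code and message types.
using rpc_code_t = std::string;
struct message { rpc_code_t code; };

enum class err_t { ok, ERR_HANDLER_NOT_FOUND };

class serverlet
{
public:
    void register_handler(const rpc_code_t &code, std::function<void(const message &)> h)
    {
        _handlers[code] = std::move(h);
    }

    // Instead of dassert(false, ...) on an unknown code, log it and answer
    // with an error, so a stale or newer client cannot crash the server.
    err_t handle_request(const message &req)
    {
        auto it = _handlers.find(req.code);
        if (it == _handlers.end()) {
            std::fprintf(stderr,
                         "recv message with unhandled rpc name %s, reply ERR_HANDLER_NOT_FOUND\n",
                         req.code.c_str());
            return err_t::ERR_HANDLER_NOT_FOUND; // reply to client, keep serving
        }
        it->second(req);
        return err_t::ok;
    }

private:
    std::unordered_map<rpc_code_t, std::function<void(const message &)>> _handlers;
};
```

With this shape, an RPC_RRDB_RRDB_INCR from a 1.9.0 client to an older server would get an error response instead of tripping the assertion in handle_request().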

build problem when using toolchain

When compiling with the toolchain on the build machine, I found that if this environment variable is set:

export LIBRARY_PATH="$DSN_THIRDPARTY_ROOT/lib"

then the following statement in CMakeLists.txt does not take effect:

link_directories(${DSN_THIRDPARTY_ROOT}/lib)

As a consequence, ${DSN_THIRDPARTY_ROOT}/lib is not among the -L link paths, and linking fails because the libraries are not found, or the wrong libraries are found.

A possibly related link: https://public.kitware.com/Bug/view.php?id=16074
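A possible workaround, assuming the gcc-driver behavior described in the linked CMake bug (directories in LIBRARY_PATH can take precedence over the -L directories CMake emits):

```shell
# LIBRARY_PATH is read by the gcc driver at link time and can take
# precedence over the -L directories that CMake emits from
# link_directories(). Clearing it before configuring restores the
# expected search order:
unset LIBRARY_PATH

# Alternatively, force the directory onto the link line explicitly,
# e.g. (illustrative):
#   cmake .. -DCMAKE_EXE_LINKER_FLAGS="-L$DSN_THIRDPARTY_ROOT/lib"
```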

Question about reconfiguration mechanism in Pegasus

Hi, I'm getting started with Pegasus by reading the PacificA consensus algorithm.

In the paper, a primary/secondary data node can suspect that its peer has become
faulty, and then report it to the configuration manager.

After the configuration manager removes the faulty one, the replica count in the
replication group is reduced.

In Pegasus's implementation, does the configuration manager pick a new node and
add it to the replication group automatically, or is this done via some tools?

Thanks in advance.

Backup unit test fails occasionally

Pegasus travis test fails occasionally:

D2018-06-10 13:30:02.390 (1528637402390718359 3e4e)  mimic.io-thrd.15950: client session created, remote_server = 127.0.0.1:34601, current_count = 1
sleep 1 second to wait complete...
D2018-06-10 13:30:02.390 (1528637402390939619 3e59)  mimic.io-thrd.15961: client session connected, remote_server = 127.0.0.1:34601, current_count = 1
	new app_id = 2
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
sleep 10s to wait app become healthy...
partition[0] is unhealthy, coz primary is invalid...
/home/travis/build/XiaoMi/pegasus/src/test/function_test/test_restore.cpp:318: Failure
Value of: restore()
  Actual: false
Expected: true

But if you rebuild it, it will probably succeed.

shell config files conflict when executing shells concurrently from the same tool directory

Background

In the AB-test plan for ad CTR, we need to use ./script/pegasus_set_usage_scenario.sh to set the usage scenario of tables.

There are now two clusters, c3srv-adb and c4srv-adb, in the c3 and c4 data centers. When data needs to be loaded, the workflow starts two jobs at the same time to set the usage_scenario of these two clusters respectively.

Since the two jobs run on the same machine and use the shell under the same pegasus tools directory, only one cluster ever ends up being set successfully.

The reason: when ./run.sh shell --cluster=xxx runs, it generates config-shell.ini in the current directory. If multiple shells run at the same time, each of them generates config-shell.ini, so they overwrite each other.
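One possible fix, sketched with a hypothetical helper: derive a unique config file name per invocation instead of always writing config-shell.ini, and have run.sh pass that file to the shell.

```shell
# Sketch: give each concurrent shell invocation its own config file, so two
# runs in the same tools directory cannot clobber each other's config.
# make_shell_config is a hypothetical helper; the real run.sh would then
# point the pegasus shell at this file instead of the fixed name.
make_shell_config() {
    local cluster="$1"
    # unique per process: cluster name + PID + random part from mktemp
    mktemp "config-shell.${cluster}.$$.XXXXXX"
}
```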

Fault injection on write path

We can add random fault injection in pegasus_write_service::impl::db_write to test the condition where rocksdb fails.
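A minimal sketch of what such injection could look like, with invented names (the real hook would sit inside pegasus_write_service::impl::db_write):

```cpp
#include <functional>
#include <random>
#include <utility>

// Sketch: wrap the real write with a gate that fails a configurable
// fraction of calls, emulating a rocksdb write error so that the
// error-handling path gets exercised. All names are illustrative.
class faulty_db_writer
{
public:
    faulty_db_writer(std::function<int()> real_write, double fail_ratio, unsigned seed = 0)
        : _real_write(std::move(real_write)), _fail_ratio(fail_ratio), _rng(seed) {}

    static constexpr int kErrRocksdbWriteFailed = -1; // illustrative error code

    int db_write()
    {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        if (dist(_rng) < _fail_ratio)
            return kErrRocksdbWriteFailed; // injected failure
        return _real_write();              // normal path
    }

private:
    std::function<int()> _real_write;
    double _fail_ratio;
    std::mt19937 _rng;
};
```

Setting fail_ratio to 0 in production builds (or compiling the gate out) keeps the normal path unaffected.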

add kill_partition for kill_test

Currently, if we want to test learning in the kill test, we can only kill the replica server process, which is not friendly for memory leak tests. Perhaps we'd better add a command that kills a single partition rather than the whole replica server process.

Speed up Travis CI build

Building pegasus now takes more than 30 minutes (the Travis CI timeout is 50 minutes); we need to speed it up.

when new meta server is added, show it in shell command of "cluster_info"

When we run "cluster_info" in the shell, the "meta_servers" list is a static value that shows the meta servers configured when the cluster was initialized:

>>> cluster_info
meta_servers        : 10.112.3.11:30601,10.112.3.10:30601
primary_meta_server : 10.112.3.11:30601
zookeeper_hosts     : 10.112.3.11:2181,10.112.3.10:2181,10.112.2.33:2181
zookeeper_root      : /pegasus/c3tst-sample
meta_function_level : freezed

When a new meta server is added to the cluster dynamically, the "meta_servers" list won't change.

We should resolve this.

fix "io_getevents returns -4" problem

This appears many times in unit tests:

W2018-07-18 10:16:05.669 (1531880165669313326 617f)  mimic.io-thrd.24959: io_getevents returns -4, you probably want to try on another machine:-(

support truncate table

Currently the only way to quickly clean the data of a table is to drop it and create a new one. Perhaps we can support a "truncate table" command so that users can clean a table quickly.

A supplement for CentOS 7 not covered by the docs

Two packages not mentioned in the docs are required:
1. zlib-devel
   Needed at compile time; the build fails without it.
2. nmap-ncat
   Without it, the ./run.sh start_onebox step fails with:
   ./scripts/start_zk.sh: line 62: nc: command not found

The build now completes, and I'm continuing my newcomer experiments.

Thanks to all the developers for their contributions.

improve priority_queue to prevent starvation

now the priority queue's dequeue() is:

    T dequeue_impl(/*out*/ long &ct, bool pop = true)
    {
        if (_count == 0) {
            ct = 0;
            return nullptr;
        }

        ct = --_count;

        int index = priority_count - 1;
        for (; index >= 0; index--) {
            if (_items[index].size() > 0) {
                break;
            }
        }

        assert(index >= 0); // "must find something");
        auto c = _items[index].front();
        _items[index].pop();
        return c;
    }

If the HIGH priority queue is never empty, tasks in the COMMON/LOW queues may be starved.

we can refer to the implementation of nfs_client_impl.
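One anti-starvation scheme, sketched with invented names (similar in spirit to, but not copied from, nfs_client_impl): give the higher level a bounded drain budget, after which a lower non-empty level gets a turn.

```cpp
#include <array>
#include <queue>
#include <utility>

// Sketch: a multi-level queue whose dequeue() lets at most `budget`
// consecutive pops come from one level before a lower, non-empty level
// gets a turn. Names are illustrative, not the real rDSN interfaces.
template <typename T, int LEVELS = 3>
class fair_priority_queue
{
public:
    explicit fair_priority_queue(int budget = 4) : _budget(budget) {}

    void enqueue(T v, int level) { _items[level].push(std::move(v)); }

    bool dequeue(T &out)
    {
        // Find the highest non-empty level.
        int hi = -1;
        for (int i = LEVELS - 1; i >= 0; i--)
            if (!_items[i].empty()) { hi = i; break; }
        if (hi < 0)
            return false; // all levels empty

        int pick = hi;
        if (_streak_level == hi && _streak >= _budget) {
            // Budget exhausted: yield to the next non-empty lower level.
            for (int i = hi - 1; i >= 0; i--)
                if (!_items[i].empty()) { pick = i; break; }
        }

        if (pick == _streak_level) _streak++;
        else { _streak_level = pick; _streak = 1; }

        out = std::move(_items[pick].front());
        _items[pick].pop();
        return true;
    }

private:
    std::array<std::queue<T>, LEVELS> _items;
    int _budget;
    int _streak_level = -1;
    int _streak = 0;
};
```

With budget = N, a permanently full HIGH queue still lets one COMMON/LOW task through every N pops, bounding the starvation window.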

add NDEBUG macro when compile pegasus in release mode

Now we use lots of assert() in our code, and we do not define the NDEBUG macro even when compiling in release mode.

to improve:

  • add the NDEBUG macro when compiling pegasus in release mode
  • change necessary assert() calls to dassert()
  • test and compare performance with and without NDEBUG
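For reference, CMake's built-in CMAKE_CXX_FLAGS_RELEASE already contains -DNDEBUG; if the build scripts override those flags, the macro can be restored per build type with a fragment like this (sketch):

```cmake
# Sketch: define NDEBUG only for Release builds, so assert() compiles
# away there but stays active in Debug builds.
if(CMAKE_BUILD_TYPE STREQUAL "Release")
    add_definitions(-DNDEBUG)
endif()
```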

Building thrift fails

System: Ubuntu 16.04

➜ thirdparty git:(dc3a3ee) ✗ ./build-thirdparty.sh
+++ dirname ./build-thirdparty.sh
++ cd .
++ pwd

  • TP_DIR=/home/listar/Code/pegasus/rdsn/thirdparty
  • TP_SRC=/home/listar/Code/pegasus/rdsn/thirdparty/src
  • TP_BUILD=/home/listar/Code/pegasus/rdsn/thirdparty/build
  • TP_OUTPUT=/home/listar/Code/pegasus/rdsn/thirdparty/output
  • export CC=gcc
  • CC=gcc
  • export CXX=g++
  • CXX=g++
  • CLEAR_OLD_BUILD=NO
  • BOOST_ROOT=
  • [[ 0 > 0 ]]
  • '[' NO = YES ']'
  • mkdir -p /home/listar/Code/pegasus/rdsn/thirdparty/output/include
  • mkdir -p /home/listar/Code/pegasus/rdsn/thirdparty/output/lib
  • mkdir -p /home/listar/Code/pegasus/rdsn/thirdparty/output/bin
  • '[' '!' -d /home/listar/Code/pegasus/rdsn/thirdparty/output/include/concurrentqueue ']'
  • echo 'skip build concurrentqueue'
    skip build concurrentqueue
  • '[' '!' -d /home/listar/Code/pegasus/rdsn/thirdparty/output/include/gtest ']'
  • echo 'skip build gtest'
    skip build gtest
  • '[' '!' -d /home/listar/Code/pegasus/rdsn/thirdparty/output/include/rapidjson ']'
  • '[' '!' -d /home/listar/Code/pegasus/rdsn/thirdparty/output/include/thrift ']'
  • mkdir -p /home/listar/Code/pegasus/rdsn/thirdparty/build/thrift-0.9.3
  • cd /home/listar/Code/pegasus/rdsn/thirdparty/build/thrift-0.9.3
  • CMAKE_FLAGS='-DCMAKE_BUILD_TYPE=release -DWITH_JAVA=OFF -DWITH_PYTHON=OFF -DWITH_C_GLIB=OFF -DWITH_CPP=ON -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF -DWITH_QT5=OFF -DWITH_QT4=OFF -DWITH_OPENSSL=OFF -DBUILD_COMPILER=OFF -DBUILD_TUTORIALS=OFF -DWITH_LIBEVENT=OFF -DCMAKE_INSTALL_PREFIX=/home/listar/Code/pegasus/rdsn/thirdparty/output -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DWITH_SHARED_LIB=OFF'
  • '[' x '!=' x ']'
  • echo -DCMAKE_BUILD_TYPE=release -DWITH_JAVA=OFF -DWITH_PYTHON=OFF -DWITH_C_GLIB=OFF -DWITH_CPP=ON -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF -DWITH_QT5=OFF -DWITH_QT4=OFF -DWITH_OPENSSL=OFF -DBUILD_COMPILER=OFF -DBUILD_TUTORIALS=OFF -DWITH_LIBEVENT=OFF -DCMAKE_INSTALL_PREFIX=/home/listar/Code/pegasus/rdsn/thirdparty/output -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DWITH_SHARED_LIB=OFF
    -DCMAKE_BUILD_TYPE=release -DWITH_JAVA=OFF -DWITH_PYTHON=OFF -DWITH_C_GLIB=OFF -DWITH_CPP=ON -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF -DWITH_QT5=OFF -DWITH_QT4=OFF -DWITH_OPENSSL=OFF -DBUILD_COMPILER=OFF -DBUILD_TUTORIALS=OFF -DWITH_LIBEVENT=OFF -DCMAKE_INSTALL_PREFIX=/home/listar/Code/pegasus/rdsn/thirdparty/output -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DWITH_SHARED_LIB=OFF
  • cmake /home/listar/Code/pegasus/rdsn/thirdparty/src/thrift-0.9.3 -DCMAKE_BUILD_TYPE=release -DWITH_JAVA=OFF -DWITH_PYTHON=OFF -DWITH_C_GLIB=OFF -DWITH_CPP=ON -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF -DWITH_QT5=OFF -DWITH_QT4=OFF -DWITH_OPENSSL=OFF -DBUILD_COMPILER=OFF -DBUILD_TUTORIALS=OFF -DWITH_LIBEVENT=OFF -DCMAKE_INSTALL_PREFIX=/home/listar/Code/pegasus/rdsn/thirdparty/output -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DWITH_SHARED_LIB=OFF
    -- Parsed Thrift package version: 0.9.3
    -- Parsed Thrift version: 0.9.3 (0.9.3)
    -- Building without tests

-- Thrift version: 0.9.3 (0.9.3)
-- Thrift package version: 0.9.3
-- Build configuration Summary
-- Build Thrift compiler: OFF
-- Build with unit tests: OFF
-- Build examples: OFF
-- Build Thrift libraries: ON
-- Language libraries:
-- Build C++ library: OFF
-- - Boost headers missing
-- Build C (GLib) library: OFF
-- - Disabled by via WITH_C_GLIB=OFF
-- Build Java library: OFF
-- - Disabled by via WITH_JAVA=OFF
-- - Ant missing
-- Build Python library: OFF
-- - Disabled by via WITH_PYTHON=OFF
-- Library features:
-- Build shared libraries: OFF
-- Build static libraries: ON
-- Build with ZLIB support: ON
-- Build with libevent support: OFF
-- Build with Qt4 support: OFF
-- Build with Qt5 support: OFF
-- Build with OpenSSL support: OFF
-- Build with Boost thread support: OFF
-- Build with C++ std::thread support: OFF


-- Configuring done
-- Generating done
-- Build files have been written to: /home/listar/Code/pegasus/rdsn/thirdparty/build/thrift-0.9.3

  • make -j8
  • make install
    make: *** No rule to make target 'install'. Stop.
  • res=2
  • cd /home/listar/Code/pegasus/rdsn/thirdparty
  • exit_if_fail thrift 2
  • '[' 2 -ne 0 ']'
  • echo 'build thrift failed'
    build thrift failed
  • exit 2
    ➜ thirdparty

Analysis

Found the cause: the boost version was too old. Two versions coexisted on the system, one of them an ancient 1.44; deleting it fixed the problem.

Does pegasus support distributed transactions?

As the issue title says: does it support distributed transactions, so that I can use it like the following code?

pegasus.beginTransaction();
pegasus.put("key",value);
v = pegasus.get("key");
v++;
pegasus.put("key",v);
pegasus.commit();

multi_set numeric-string bug

// multi set. value = "99". After the set, reading it back gives a wrong value.
What "run.sh shell" shows is:

multi_get_range test_key sortkey sortkez
hash_key: "test_key"
start_sort_key: "sortkey"
start_inclusive: true
stop_sort_key: "sortkez"
stop_inclusive: false
sort_key_filter_type: no_filter
max_count: -1
no_value: false
reverse: false
"test_key" : "sortkey_0" => "V\x7F"
"test_key" : "sortkey_1" => "V\x7F"
"test_key" : "sortkey_2" => "V\x7F"
"test_key" : "sortkey_3" => "V\x7F"
"test_key" : "sortkey_4" => "V\x7F"
"test_key" : "sortkey_5" => "V\x7F"
"test_key" : "sortkey_6" => "V\x7F"

Below is the multi_set code:

#include <cstdio>
#include <cstring>
#include <map>
#include <string>

#include <pegasus/client.h>

using namespace pegasus;

int main(int argc, const char *argv[])
{
    if (!pegasus_client_factory::initialize("config.ini")) {
        fprintf(stderr, "ERROR: init pegasus failed\n");
        return -1;
    }

    if (argc < 3) {
        fprintf(stderr, "USAGE: %s <cluster-name> <app-name>\n", argv[0]);
        return -1;
    }

    int  run_key_count = 2;
    if (argc == 4) {
        run_key_count = atoi(argv[3]);
    }

    // set
    pegasus_client *client = pegasus_client_factory::get_client(argv[1], argv[2]);

    std::string hashKey = "test_key";
    std::map<std::string, std::string>  kvs;
    for(int j =0; j < 7; ++j) {
        std::string sortKey = "sortkey_" + std::to_string(j);
        kvs[sortKey ] =  "99";
        printf("test:key:%s,value:%s\n", sortKey.c_str(), kvs[sortKey].c_str());
    }
    int ret = client->multi_set(hashKey, kvs);
    if (ret != PERR_OK) {
        return -1;
    }

    struct pegasus_client::multi_get_options optA;

    std::map<std::string, std::string>  values;
    ret = client->multi_get(hashKey, "sortkey", "sortkez", optA, values);
    if (ret != PERR_OK && ret != PERR_INCOMPLETE ) {
        return -1;
    }

    for ( std::map<std::string, std::string>::iterator it = values.begin(); it != values.end(); ++it ) {
        std::string newValue = "99";
        if (0 != strcmp(newValue.c_str(), it->second.c_str())) {
            fprintf(stdout, "ERROR: multi_get value headKey:%s, sortKey:%s, value:%s != value:%s\n"
                , hashKey.c_str(), it->first.c_str()
                , it->second.c_str(), newValue.c_str());
            return -1;
        }
        fprintf(stdout, "hashkey:%s, sortkey:%s, value:%s\n", hashKey.c_str(), it->first.c_str(), it->second.c_str());
        // del
        ret = client->del(hashKey, it->first);
        if (ret != PERR_OK) {
            fprintf(stderr, "ERROR: del failed, error=%s\n", client->get_error_string(ret));
            return -1;
        }
    }

    return 0;
}

Error "Not a git repository: ../.git/modules/rdsn" during build

Environment:
CentOS 7.3.1611
kernel 3.10.0-514.el7.x86_64
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
cmake 2.8.12.2
boost 1.53.0 Release 27.el7

Reference:
https://github.com/XiaoMi/pegasus/blob/master/docs/installation.md

1. Install the development packages:
   yum -y install cmake boost-devel libaio-devel snappy-devel bzip2-devel readline-devel
2. clone
3. build

The log is as follows:
ln: failed to create symbolic link ‘/root/pegasus/DSN_ROOT’: File exists
INFO: start build rdsn...
CLEAR=NO
BUILD_TYPE=debug
SERIALIZE_TYPE=
GIT_SOURCE=github
ONLY_BUILD=YES
RUN_VERBOSE=NO
WARNING_ALL=NO
ENABLE_GCOV=NO
Use system boost
CMAKE_OPTIONS= -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++
-DCMAKE_BUILD_TYPE=Debug -DDSN_GIT_SOURCE=github
MAKE_OPTIONS= -j8
#############################################################################
fatal: Not a git repository: ../.git/modules/rdsn

fix the "minos_client_dir" path in several scripts and make these scripts support minos2.0

currently, several scripts assume the minos client is in the dir "/home/work/pegasus/infra/minos/client", like:

  • pegasus_rolling_update.sh
  • pegasus_offline_node_list.sh
  • pegasus_migrate_zookeeper.sh
  • pegasus_offline_node.sh

we'd better fix this by making the variable "minos_client_dir" configurable.

besides, minos2.0 should also be supported in these scripts.

Geo Support

support bulk load by creating and ingesting SST files

Though pegasus already provides the bulk_load usage scenario on a table for faster write speed, it still uses the set or multiSet interface to insert data one by one. That is definitely not fast enough to load a large amount of data (typically billions of rows) into Pegasus in a short time.

Maybe we can seek a better way. The idea is:

  • construct RocksDB snapshots for each partition by offline computing (such as MapReduce), and store them on HDFS.
  • recover table from HDFS by cold backup recovery.
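The online half of that idea can be sketched with RocksDB's SstFileWriter and DB::IngestExternalFile (available since RocksDB 5.x). Paths here are illustrative, and Pegasus's own key encoding and the offline MapReduce step are out of scope:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/sst_file_writer.h>

// Sketch: write one sorted SST file offline, then ingest it into a running
// RocksDB instance without going through the write path (memtable/WAL).
int main()
{
    rocksdb::Options options;
    options.create_if_missing = true;

    // 1) Offline step: keys must be added in the comparator's order.
    rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
    rocksdb::Status s = writer.Open("/tmp/partition-0.sst");
    assert(s.ok());
    for (int i = 0; i < 1000; i++) {
        char key[32];
        snprintf(key, sizeof(key), "key-%06d", i);
        s = writer.Put(key, "value");
        assert(s.ok());
    }
    s = writer.Finish();
    assert(s.ok());

    // 2) Online step: the replica ingests the file directly as a new SST.
    rocksdb::DB *db = nullptr;
    s = rocksdb::DB::Open(options, "/tmp/pegasus-ingest-demo", &db);
    assert(s.ok());
    s = db->IngestExternalFile({"/tmp/partition-0.sst"},
                               rocksdb::IngestExternalFileOptions());
    assert(s.ok());
    delete db;
    return 0;
}
```

In the Pegasus setting, the offline job would produce one such file set per partition and store them on HDFS, and each replica would download and ingest its own files.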

core in aio_task::~aio_task()

Time

2018/03/15 15:24

Version

Pegasus Server 1.7.0 (9a7a067) Release

Platform

CentOS release 6.3 (Final)

Core location

work@c3-hadoop-ssd-tst-st04
/home/work/coresave/issue-13

Stack trace

#0  0x000000376e4328a5 in raise () from /lib64/libc.so.6
#1  0x000000376e434085 in abort () from /lib64/libc.so.6
#2  0x000000376e46ffe7 in __libc_message () from /lib64/libc.so.6
#3  0x000000376e475916 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f59df3ebe24 in deallocate (this=<optimized out>, __p=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/ext/new_allocator.h:110
#5  _M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/bits/stl_vector.h:174
#6  ~_Vector_base (this=0x7f5665200abc, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/bits/stl_vector.h:160
#7  ~vector (this=0x7f5665200abc, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/bits/stl_vector.h:416
#8  dsn::aio_task::~aio_task (this=0x7f56652009e4, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:682
#9  0x00007f59df3ebe89 in dsn::aio_task::~aio_task (this=0x7f56652009e4, __in_chrg=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:682
#10 0x00007f59df3ed6ea in release_ref (this=0x7f56652009e4) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/include/dsn/utility/autoref_ptr.h:76
#11 dsn::task::exec_internal (this=this@entry=0x7f56652009e4) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:242
#12 0x00007f59df47e3fd in dsn::task_worker::loop (this=0x12926f0) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:323
#13 0x00007f59df47e5c9 in dsn::task_worker::run_internal (this=0x12926f0) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:302
#14 0x00007f59dd528600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#15 0x000000376e807851 in start_thread () from /lib64/libpthread.so.0
#16 0x000000376e4e811d in clone () from /lib64/libc.so.6

Analysis

  • The core happens in aio_task's destructor, specifically when destructing the _unmerged_write_buffers member:
std::vector<dsn_file_buffer_t> _unmerged_write_buffers;
  • The task code is LPC_WRITE_REPLICATION_LOG_SHARED, the callback task for writing the shared log
  • The aio_task's ref_counter looks normal:
    <dsn::ref_counter> = {
      _vptr.ref_counter = 0x7f59df77b1f0 <vtable for dsn::aio_task+16>, 
      _magic = 3735928559, 
      _counter = {
        <std::__atomic_base<long>> = {
          _M_i = 0
        }, <No data fields>}
    }, 
  • The raw vector fields of _unmerged_write_buffers look normal:
  _unmerged_write_buffers = {
    <std::_Vector_base<dsn_file_buffer_t, std::allocator<dsn_file_buffer_t> >> = {
      _M_impl = {
        <std::allocator<dsn_file_buffer_t>> = {
          <__gnu_cxx::new_allocator<dsn_file_buffer_t>> = {<No data fields>}, <No data fields>}, 
        members of std::_Vector_base<dsn_file_buffer_t, std::allocator<dsn_file_buffer_t> >::_Vector_impl: 
        _M_start = 0x7f56ff82fe20, 
        _M_finish = 0x7f56ff82fe50, 
        _M_end_of_storage = 0x7f56ff82fe60
      }
    }, <No data fields>}, 
  • But the contents of _unmerged_write_buffers look abnormal:
(gdb) pvector this._unmerged_write_buffers
elem[0]: $2 = {
  buffer = 0x0, 
  size = 0
}
elem[1]: $3 = {
  buffer = 0x0, 
  size = 0
}
elem[2]: $4 = {
  buffer = 0x7f56642154b0, 
  size = 0
}
Vector size = 3
Vector capacity = 4
Element type = std::_Vector_base<dsn_file_buffer_t, std::allocator<dsn_file_buffer_t> >::pointer

unexpected low qps on ssd machine, anything wrong?

Init pegasus succeed
LevelDB: version 4.0
Date: Thu Nov 9 08:13:07 2017
CPU: 2 * Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
CPUCache: 46080 KB
Keys: 16 bytes each
Values: 100 bytes each (100 bytes after compression)
Entries: 100000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 11.1 MB (estimated)
FileSize: 11.1 MB (estimated)
Writes per second: 0
Compression: NoCompression
Memtablerep: skip_list
Perf Level: 0
WARNING: Optimization is disabled: benchmarks unnecessarily slow
WARNING: Assertions are enabled; benchmarks unnecessarily slow

Thread  Count  Runtime  QPS   AvgLat  P99Lat
1       10000  11.763    850    1176    2425
2       20000  20.256    987    2018    4953
3       30000  29.534   1015    2938    7817
4       40000  38.809   1030    3765   12794
