tencent / tendbcluster-tspider
TSpider is the proxy node of TenDB Cluster.
License: GNU General Public License v2.0
- sql_big_result will be omitted when using Spider SE.
- spider_max_connection is set to 500.
- Added print_bool_as_literal. When OFF, boolean values in JSON functions are dispatched as 1/0; otherwise they are dispatched as literals. The default value is OFF.
- ha_list, in order to prevent a hang in the committing phase.
- group by and aggregation functions under some circumstances.
- distinct and limit under some circumstances.
- max_connection times the number of storage nodes.
Release date: May. 26, 2021
This is not reproducible in MariaDB, or at least I have not tested all situations.
This has been fixed in version 3.5.2.
CREATE TABLE t1 (
uid INT PRIMARY KEY,
id INT,
name CHAR(20)
)ENGINE=SPIDER COMMENT='wrapper "mysql", table "t1"'
PARTITION BY LIST (uid%2)
(
PARTITION pt0 VALUES IN (0) COMMENT = 'srv "bk0"',
PARTITION pt1 VALUES IN (1) COMMENT = 'srv "bk1"'
);
INSERT INTO t1 VALUES (1, 1, 'a'), (3, 3, 'c'), (5, 3, 'd'), (7, 3, 'f'), (9, 3, 'g'), (11, 3, 'h'), (13, 3, 'i'), (15, 4, 'd');
INSERT INTO t1 VALUES (2, 2, 'b'), (4, 2, 'c');
If we execute select distinct id from t1 where name > 'a' limit 3 on TSpider1,
Spider will send
select distinct `uid`,`id`,`name` from `tspider1`.`t1` where (`name` > 'a') limit 3
to both remotes, which gets the id 3 from the first remote and 2 from the second remote, so the id 4 on the first remote is missed.
MariaDB Spider instead splits the read of the result from the two remotes, like this:
# for remote 1
select distinct `id`,`name` from `test1_0`.`t1` where (`name` > 'a') limit 6
select distinct `id`,`name` from `test1_1`.`t1` where (`name` > 'a') limit 6,6
# for remote 2
select distinct `id`,`name` from `test1_1`.`t1` where (`name` > 'a') limit 6
This is closely related to the following variables:
mysql> show variables like "spider%split%";
+------------------------------+-------+
| Variable_name | Value |
+------------------------------+-------+
| spider_multi_split_read | -1 |
| spider_semi_split_read | -1 |
| spider_semi_split_read_limit | -1 |
| spider_split_read | -1 |
+------------------------------+-------+
4 rows in set (0.01 sec)
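The split-read behaviour above can be pictured as a paging loop: with a split size s (6 in the example, presumably derived from the spider_split_read family of variables and the client LIMIT), successive `LIMIT offset, s` pages are fetched from each remote until the client's LIMIT is satisfied. The helper below is a hypothetical illustration of that page generation, not Spider's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: generate successive split-read pages for one remote.
// Page 0 -> "limit 6", page 1 -> "limit 6,6", and so on, matching the
// statements MariaDB Spider sends in the example above.
std::vector<std::string> split_read_pages(const std::string& base_query,
                                          int split_size, int pages) {
    std::vector<std::string> out;
    for (int p = 0; p < pages; ++p) {
        long offset = static_cast<long>(p) * split_size;
        if (offset == 0)
            out.push_back(base_query + " limit " + std::to_string(split_size));
        else
            out.push_back(base_query + " limit " + std::to_string(offset) +
                          "," + std::to_string(split_size));
    }
    return out;
}
```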
Users can set an array via spider_log_ignore_err_numbers. When Spider prints errors, it looks up whether the current error number is in the array; if it is, the error is not written to the log.
The default value of this variable is "" (empty).
For example:
set global spider_log_ignore_err_numbers = "1062,12701,1067";
When Spider writes its log, it will then ignore error numbers 1062, 12701, and 1067.
The current bitmap is:
static const ulonglong SPD_ERR_DUPLICATE = 1U << 0; // 1062
static const ulonglong SPD_ERR_GONE_AWAWY = 1U << 1; // 12701
static const ulonglong SPD_ERR_TOO_MANY_CONN = 1U << 2; // 12723
static const ulonglong SPD_ERR_NO_REMOTE_EXIST = 1U << 3; // 1477
static const ulonglong SPD_ERR_CONN_REMOTE = 1U << 4; // 1429
static const ulonglong SPD_ERR_INVALID_DEFAULT = 1U << 5; // 1067
static const ulonglong SPD_ERR_INVALID_DATE = 1U << 6; // 1292
static const ulonglong SPD_ERR_TRUNCATE_VALUE = 1U << 7; // 1366
static const ulonglong SPD_ERR_BAD_TYPE_VALUE = 1U << 8; // 1411
static const ulonglong SPD_ERR_NET_TIMEOUT = 1U << 9; // 1159
static const ulonglong SPD_ERR_COM_OUT_OF_SYNC = 1U << 10;// 2014
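A minimal sketch of how such a comma-separated setting could be folded into the bitmap above. The error-number-to-bit mapping mirrors the constants just shown; the parsing helper itself is a hypothetical illustration, not Spider's parser.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Error numbers mapped to the bit positions of the SPD_ERR_* constants above.
static const std::map<int, int> kErrBit = {
    {1062, 0}, {12701, 1}, {12723, 2}, {1477, 3}, {1429, 4},
    {1067, 5}, {1292, 6},  {1366, 7},  {1411, 8}, {1159, 9}, {2014, 10},
};

// Parse a setting such as "1062,12701,1067" into a bitmask.
// Unknown error numbers are silently ignored.
uint64_t parse_ignore_err_numbers(const std::string& setting) {
    uint64_t mask = 0;
    std::stringstream ss(setting);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        auto it = kErrBit.find(std::stoi(tok));
        if (it != kErrBit.end()) mask |= 1ULL << it->second;
    }
    return mask;
}
```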
Added a global variable spider_parallel_limit with the default value FALSE. When spider_parallel_limit is true, Spider dispatches "SELECT ... LIMIT" queries in parallel.
Some SQL would return "Empty query" on TSpider when associated with an AUTO_INCREMENT column.
See fix: e3e5bb7
Please refer to https://github.com/Tencent/TenDBCluster-TSpider/releases/tag/tspider-3.5.3
spider_direct_insert_ignore: if 0, for INSERT IGNORE INTO ... statements TSpider does not push IGNORE down to the backends; in that case a duplicate unique/primary key returns a warning. If 1, TSpider lets the backends handle IGNORE, and no warning is returned on a duplicate unique/primary key. The default value is 0.
join_cache_level values greater than 4 have no effect on TSpider. (This resolves the earlier duplicate key error produced by INSERT INTO m(id) SELECT id FROM tbl WHERE id NOT IN (94933594) AND flag = 1;)
Fixed a crash when executing create server ...
GEOMETRY is not supported when creating tables; is there another way around this?
see at MDEV-26315.
Fix: 01c278b
BUG
We fixed the problem by resolving some multi-threaded conflicts and by preventing information_schema.auto_increment from affecting our auto-increment value. In other words, we no longer maintain that value.
As of 3.6.4, users should not rely on the value of information_schema.auto_increment. (We think maintaining this value in a distributed system wastes efficiency and is useless.)
For example, if the original HASH on UID was mod 64 and is now changed to mod 256, how does TSpider migrate the data? Can it migrate data automatically like MongoDB does?
Reproducible in MariaDB; see MDEV-24020.
This bug has been fixed in TSpider version 3.5.2.
query1:
select * from tb_test where field1 = xxx limit 1;
1 row in set (0.034 sec)
query2:
select * from tb_test where field1 = xxx order by id limit 1;
1 row in set (0.005 sec)
This is a known issue. TSpider tries to optimize query1 by accessing the shards sequentially: because the requested rows may already be satisfied by the first shard, there is no need to ask all backend shards to retrieve rows in parallel.
The bad case is when the only matching row is located in the last shard, so the query may take shard_num * one_response_time to finish.
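The trade-off can be made concrete with a little arithmetic: probing shards one by one costs up to shard_num round trips when the matching row lives in the last shard, while a parallel fan-out costs roughly one round trip but loads every backend. A toy cost model (assuming a uniform per-shard response time, which is an assumption, not a measured Spider figure):

```cpp
// Toy cost model for the sequential-probe optimization described above.
struct ProbeCost {
    double sequential_worst; // matching row found only in the last shard
    double parallel;         // all shards queried at once
};

// rt_ms: assumed response time of a single shard, in milliseconds.
ProbeCost limit1_cost(int shard_num, double rt_ms) {
    return ProbeCost{shard_num * rt_ms, rt_ms};
}
```

With 4 shards and a 5 ms round trip, the sequential worst case is 20 ms versus 5 ms for the parallel plan, which matches the query1/query2 timings observed above in spirit.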
The multi-range read (MRR) execution plan is used both in rr_quick and in join optimization.
In TSpider the multi-range read plan is problematic; see the bug described below. It is also reproducible in MariaDB Spider.
On backends,
CREATE TABLE `tbl` (
`id` int(11) NOT NULL,
`flag` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
)ENGINE=INNODB;
On TSpider,
CREATE TABLE `tbl` (
`id` int(11) NOT NULL,
`flag` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=SPIDER DEFAULT CHARSET=utf8
PARTITION BY LIST (id % 4)
(PARTITION `pt0` VALUES IN (0) COMMENT = 'database "bugtest", table "tbl", server "bk0"' ENGINE = SPIDER,
PARTITION `pt1` VALUES IN (1) COMMENT = 'database "bugtest", table "tbl", server "bk1"' ENGINE = SPIDER,
PARTITION `pt2` VALUES IN (2) COMMENT = 'database "bugtest", table "tbl", server "bk2"' ENGINE = SPIDER,
PARTITION `pt3` VALUES IN (3) COMMENT = 'database "bugtest", table "tbl", server "bk3"' ENGINE = SPIDER);
CREATE TABLE `m` (
`id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tbl` VALUES (101624109, 1);
INSERT INTO `tbl` VALUES (91272871, 1);
INSERT INTO `tbl` VALUES (94933594, 1);
INSERT INTO `tbl` VALUES (98646655, 1);
INSERT INTO `tbl` VALUES (13914947, 0);
INSERT INTO `tbl` VALUES (45051390, 0);
INSERT INTO `tbl` VALUES (30864034, 1);
INSERT INTO `tbl` VALUES (33671239, 0);
INSERT INTO `tbl` VALUES (39109063, 1);
INSERT INTO `tbl` VALUES (91137966, 1);
INSERT INTO `tbl` VALUES (95897689, 0);
INSERT INTO `tbl` VALUES (571307512, 1);
INSERT INTO `tbl` VALUES (35706155, 1);
INSERT INTO `tbl` VALUES (34044708, 0);
set GLOBAL spider_ignore_single_select_index = OFF;
insert into `m`(id) select `id` from `tbl` where `id` not in (94933594) and `flag` = 1;
Our solution is simply to prohibit the use of MRR. We also cap join_cache_level at 4, so the BKA join algorithm never takes effect in TSpider.
In fact, many Server-layer optimizations bring little benefit to Spider but add too much uncertainty. Therefore, in general, we avoid using indices; in other words, we use a full-table scan for most single queries (a single query being one without joins).
With spider_ignore_single_update_index=ON and sql_safe_updates=ON, we cannot run UPDATE and DELETE statements whose WHERE condition relies on indices.
spider_parallel_limit: if ON, TSpider uses multiple threads to process SELECT ... LIMIT ... queries. The default value of this variable is OFF.
Fixed a memory leak when running mysqldump with a lot of tables.
mysql> CREATE TABLE t1 (i INT PRIMARY KEY, jdoc JSON); # create table
Query OK, 0 rows affected (0.11 sec)
mysql> INSERT INTO t1 VALUES(1, '{"Name":"Zhangsan", "Age":18}'); # insert
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO t1 VALUES(2, '{"Name":"Lisi", "Age":35}'); # insert
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM t1 WHERE json_extract(jdoc, '$.Age')=35;
+---+-----------------------------+
| i | jdoc |
+---+-----------------------------+
| 2 | {"Age": 35, "Name": "Lisi"} |
+---+-----------------------------+
1 row in set (0.00 sec)
The query above generates an error.
Reproducible in MariaDB.
If we update shard keys using shard keys in the condition, re-sharding could be involved. TSpider splits the update into the following three steps:
a. A selection (if there are WHERE conditions): select ... from ... where id = ...
b. An insertion of the new record: insert high_priority into ... where id = new_val
c. A deletion of the outdated previous record: delete from ... where id = old_val and a = ... and b = ...
In step c, TSpider adds all the fields to the WHERE condition to find the record before deletion.
If the table happens to contain a float field f, the final delete becomes delete from ... where id = old_val and f = f_old_val.
However, because MySQL FLOAT/DOUBLE is implemented according to IEEE-754, equality comparison on such columns is unreliable, and an error occurs.
CREATE TABLE t (
id INT PRIMARY KEY, -- primary key, also shard key
a INT,
b BLOB,
f FLOAT
)ENGINE=SPIDER PARTITION BY LIST (id MOD 4)
INSERT INTO t VALUES(61,76,'b1',4.86947), (60,33,'b2',0), (24,75,'b3',4.61755), (63,76,'b4',4.86947);
--bug
UPDATE t SET id = 25 WHERE id = 24;
-- tspider will send the following statements to backends
select `id`,`a`,`b`,`f` from `update_test`.`t` where (`id` = 24) for update
insert high_priority into `update_test`.`t`(`id`,`a`,`b`,`f`)values(25,75,'b3',4.61755)
delete from `update_test`.`t` where `id` = 24 and `a` = 75 and `b` = 'b3' and `f` = 4.61755 limit 1
In the delete from `update_test`.`t` where ... statement, we only append the primary key or the unique key information to the WHERE condition (FLOAT/DOUBLE cannot be part of a primary or unique key). The statements sent become:
select `id`,`a`,`b`,`f` from `update_test`.`t` where (`id` = 24) for update
insert high_priority into `update_test`.`t`(`id`,`a`,`b`,`f`)values(25,75,'b3',4.61755)
delete from `update_test`.`t` where `id` = 24 and `a` = 75 and `b` = 'b3' and `f` like 4.61755 limit 1
This behavior is controlled by spider_update_with_primary_key_first, with default value ON. For example:
CREATE TABLE t1(id INT PRIMARY KEY, a INT, b BLOB, c CHAR(20), f FLOAT, v VARCHAR(200))ENGINE=SPIDER...
INSERT INTO t1 VALUES(61,76,'b','c',4.86947,'v'), (60,33,'b','c',0,'v'), (24,75,'b','c',4.61755,'v');
update t1 set id = 25 where id = 24; -- update primary key, requires change of partition
--BUG, TSpider will dispatch `update ... set ... where ... and t1.f = 4.61755` to backend
--since float cannot be compared with equal function, it may return no row matched.
--TO AVOID THIS BUG, it is recommended to `SET GLOBAL spider_update_with_primary_key_first = ON;`
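The IEEE-754 pitfall above is easy to demonstrate outside MySQL: a FLOAT column stores a 24-bit-mantissa approximation, while the SQL literal 4.61755 is parsed as a 53-bit-mantissa double, which is effectively what the generated `f = 4.61755` predicate compares against. Widening the float back to double does not recover the lost bits, so equality fails:

```cpp
// Models the predicate `f = 4.61755` on a FLOAT column: the stored value is
// a float, the literal is a double, and the comparison promotes the float.
bool float_column_equals_literal(float stored, double literal) {
    return static_cast<double>(stored) == literal;
}
```

This is why the equality-based DELETE in step (c) can match zero rows even though the row is present; values exactly representable in both widths (such as 0.5) are the only safe cases.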
Please refer to https://github.com/Tencent/TenDBCluster-TSpider/releases/tag/tspider-3.5.2
| SPIDER0 | 192.168.1.113 | | mysql | mysql | 25000 | | SPIDER | |
| SPIDER1 | 192.168.1.114 | | mysql | mysql | 25000 | | SPIDER | |
When accessing Spider on 114 and querying the linked table, it reports an error:
ERROR 12701 (HY000): Unable to connect to foreign data source: SPT0
The new query may use the freed trx (spider->trx) and cause a crash, because spider_get_trx was not called beforehand.
Case:
insert into t1(c1,c2) select c21,c22 from t2 on duplicate key update c1=values(c1),c2=values(c2);
This was fixed and released in 3.5.1.
When we use INSERT IGNORE in Spider:
If spider_direct_dup_insert is 1, Spider sends the INSERT along with IGNORE to the remote backends, so if there is a duplicate, Spider is not notified. This is usually problematic when dealing with auto-increment columns.
If spider_direct_dup_insert is 0, Spider does not send INSERT IGNORE to the remote backends and handles duplicates in the Server layer.
If we view Spider and the remote backends together as a single MySQL, the value 0 is the more appropriate behavior.
However, this variable affects not only INSERT IGNORE but also REPLACE and ON DUPLICATE KEY UPDATE ..., which is a little messy.
In version 3.5.3 we added a new variable named spider_direct_insert_ignore with the default value 0. If the value is 0, Spider sends the INSERT without the IGNORE to the remote backends and handles duplicates in the Server layer. If the value is 1, Spider sends the INSERT together with the IGNORE to the backends.
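The spider_direct_insert_ignore semantics described above can be summarized in a small decision helper. The struct and function names are illustrative, not Spider's internals:

```cpp
// Hypothetical summary of the spider_direct_insert_ignore behavior above.
struct IgnoreBehavior {
    bool push_ignore_to_backend; // does the backend statement carry IGNORE?
    bool client_gets_warning;    // does TSpider return a duplicate warning?
};

IgnoreBehavior direct_insert_ignore_behavior(int var_value) {
    if (var_value == 1) return {true, false};  // backend swallows duplicates
    return {false, true};  // default 0: Server layer resolves duplicates
}
```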
According to RFC 7159, two JSON objects with the same keys but in different orders should be considered equal.
However:
mysql> INSERT INTO tbl VALUES(1, JSON_OBJECT('id', 87, 'name', 'carrot', 'flag', true));
Query OK, 1 row affected (0.22 sec)
mysql> SELECT * FROM tbl;
+----+--------------------------------------------+
| id | j |
+----+--------------------------------------------+
| 1 | {"id": 87, "flag": true, "name": "carrot"} |
+----+--------------------------------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM tbl WHERE j = JSON_OBJECT('id', 87, 'name', 'carrot', 'flag', true);
Empty set (0.03 sec)
Fix: b3084cf
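The RFC 7159 notion of object equality (same key/value pairs, in any order) can be emulated by comparing key-sorted representations. A minimal sketch, with an ordered std::map standing in for a parsed JSON object:

```cpp
#include <map>
#include <string>

// std::map keeps keys sorted, so two objects built in different insertion
// orders compare equal iff they hold the same key/value pairs -- the
// order-insensitive equality RFC 7159 implies for JSON objects.
using JsonObj = std::map<std::string, std::string>;

bool json_object_equal(const JsonObj& a, const JsonObj& b) {
    return a == b;
}
```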
Some dangling pointers in the previous code resulted in crashes when using create server.
This bug has been fixed in tspider-3.5.3.
For example, consider the following definitions on the current latest version:
CREATE TABLE `aaa` (
  `id1` int(10) NOT NULL,
  `id2` int(10) NOT NULL,
  `id3` int(10) NOT NULL,
  PRIMARY KEY (`id1`,`id2`,`id3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `bbb` (
  `id1` int(10) NOT NULL,
  `id2` int(10) NOT NULL,
  `id3` int(10) NOT NULL,
  PRIMARY KEY (`id1`,`id2`,`id3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='shard_key "id1"';
CREATE TABLE `ccc` (
  `id1` int(10) NOT NULL,
  `id2` int(10) NOT NULL,
  `id3` int(10) NOT NULL,
  PRIMARY KEY (`id1`),
  KEY `id1_2_index` (`id1`,`id2`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='shard_key "id1"';
CREATE TABLE `ddd` (
  `id1` int(10) NOT NULL,
  `id2` int(10) NOT NULL,
  `id3` int(10) NOT NULL,
  PRIMARY KEY (`id1`,`id2`),
  KEY `id2_index` (`id2`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='shard_key "id1"';
CREATE TABLE `eee` (
  `id1` int(10) NOT NULL,
  `id2` int(10) NOT NULL,
  `id3` int(10) NOT NULL,
  PRIMARY KEY (`id1`),
  KEY `id2_index` (`id2`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='shard_key "id1"';
INSERT INTO tbl VALUES (1, '{ "a": 1, "b": [2, 3]}');
UPDATE tbl SET j = JSON_REPLACE(j, '$.a', 10, '$.c', '[true, false]');
INSERT INTO tbl VALUES (1, json_array(1,'abc',NULL,true));
SELECT * FROM tbl WHERE (j = json_array(1,'abc',NULL,true));
We added a new global variable print_bool_as_literal.
If ON:
SET GLOBAL print_bool_as_literal = ON;
SELECT * FROM t1 WHERE (j = json_array(1,'abc',NULL,true));
SELECT * FROM t1 WHERE (j = json_array(1,'abc',NULL,false));
-- TSpider will send to backends:
select `i`,`j` from `auto_test_remote`.`t1` where (`j` = /* <cache> */ (json_array(1,'abc',NULL,true)));
select `i`,`j` from `auto_test_remote`.`t1` where (`j` = /* <cache> */ (json_array(1,'abc',NULL,false)));
Otherwise:
SET GLOBAL print_bool_as_literal = OFF;
SELECT * FROM t1 WHERE (j = json_array(1,'abc',NULL,true));
SELECT * FROM t1 WHERE (j = json_array(1,'abc',NULL,false));
-- TSpider will send to backends:
select `i`,`j` from `auto_test_remote`.`t1` where (`j` = /* <cache> */ (json_array(1,'abc',NULL,1)));
select `i`,`j` from `auto_test_remote`.`t1` where (`j` = /* <cache> */ (json_array(1,'abc',NULL,0)));
Added spider_log_ignore_err_numbers with default value '' (empty string), meaning all errors are logged. The variable specifies the error numbers that the Spider log will skip. For example, to keep the Spider error log from recording errors 1062, 12701, and 2014, run set global spider_log_ignore_err_numbers='1062,12701,2014'. Currently, the following error numbers can be ignored: 1062, 12701, 12723, 1477, 1429, 1067, 1292, 1366, 1411, 1159, 2014.
SHOW TABLE STATUS / SHOW CREATE TABLE / SELECT * FROM information_schema.tables.
Release date: Apr. 22, 2021
Under some circumstances, a crash occurs when using ./mysqladmin shutdown to close the TSpider server.
This seems to be a long-standing MariaDB problem that has never been completely solved.
A temporary mitigation that significantly reduces the chance of the crash was added in tspider-3.5.2.
We noticed that the crash was related to the following two functions:
void mysql_audit_release(THD *thd)
static void *kill_server(void *sig_ptr)
kill_server is a key step of the main thread; mysql_audit_release is the step in which a child thread closes its thd.
As for the details of mysql_audit_release:
void mysql_audit_release(THD *thd)
{
...
for (; plugins < plugins_last; plugins++)
{
st_mysql_audit *data= plugin_data(*plugins, struct st_mysql_audit *); ==> break point 1
/* Check to see if the plugin has a release method */
if (!(data->release_thd)) ==> break point 2
continue;
/* Tell the plugin to release its resources */
data->release_thd(thd);
}
...
}
During execution, two threads are involved in the mysql_audit_release function. According to the order in which they hit the breakpoints, the two threads are named thread A and thread B, respectively. We call the main thread, which runs kill_server, thread M.
We have tried essentially all execution orders of these threads and found that only the order B -> M -> A leads to the bug.
More specifically: B runs first; M proceeds into unireg_end(); freeze M, then thaw A; by then data has been changed, and when A visits it, an illegal memory area is accessed.
M calls kill_server when executing shutdown. In the meantime, A kills itself by calling THD::free_connection and frees some plugin resources. At this point the plugin's status is switched to DYING. However, thread M forcibly shuts down any plugin whose status is not Uninitialized, Freed, or Disabled. In other words, this plugin of thread A will be shut down forcibly by thread M.
So if the child thread executes later than the main thread, it visits resources that have already been freed, and a core dump occurs.
The stack of thread M:
finalize_audit_plugin(...)
plugin_deinitialize(...)
plugin_shutdown(...)
clean_up(...)
unireg_end(...)
kill_server(...)
Our solution is to sleep 2 seconds while thread_count > 0, waiting for the child threads to release the plugin. But our thread_count is not accurate.
This solution only greatly reduces the chance of failure; it does not completely solve the problem.
We use a Read-write lock Hash Map design with multiple Mutex Stacks to re-implement the Spider connection pool.
In our test, it reduces the p99 latency and increases the overall QPS. The performance is significantly better than the original design when the query threads are much more than the connection pool capacity.
h3. Introduction
The Spider SE creates and maintains a connection pool that caches the connection objects (pointers) to multiple remote backends.
Spider can send multiple SQL statements to one backend simultaneously, so there may be numerous connection objects to the same backend in the connection pool. These connection objects are interchangeable as long as they point to the same backend.
When receiving a request with a specified backend target bk, one Spider thread searches for a connection object to bk in the connection pool. If one exists, Spider uses it to connect to the backend and send SQL to it. After the SQL execution completes, Spider puts the connection object back into the connection pool.
Otherwise, Spider either creates a new connection or waits for another thread to put one back. The variable spider_max_connection determines which (create or wait), and the variable spider_conn_wait_timeout specifies the maximum waiting time.
Periodically, Spider recycles connection objects that have not been used for a particular interval.
Spider used a hash map to represent the connection pool, in which the hash key is remote_name#version and the hash value is a connection object pointer.
For example, with this configuration:
mysql> select * from mysql.servers;
+-------------+-----------+----+----------+----------+-------+--------+---------+-------+
| Server_name | Host | Db | Username | Password | Port | Socket | Wrapper | Owner |
+-------------+-----------+----+----------+----------+-------+--------+---------+-------+
| SPT0 | 127.0.0.1 | | mysql | mysql | 20000 | | mysql | |
| SPT1 | 127.0.0.1 | | mysql | mysql | 20001 | | mysql | |
| SPT2 | 127.0.0.1 | | mysql | mysql | 20002 | | mysql | |
| SPT3 | 127.0.0.1 | | mysql | mysql | 20003 | | mysql | |
+-------------+-----------+----+----------+----------+-------+--------+---------+-------+
4 rows in set (0.00 sec)
The name for the remote server 127.0.0.1:20000 is SPT0. By default, the version is 0, so the hash key is 'SPT0#0'. If the 127.0.0.1:20000 is down, we will do an active/standby failover. The remote standby server will inherit the name SPT0, but update the version to 1, so the new hash key will be 'SPT0#1'.
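The versioned key scheme can be sketched with a tiny helper (hypothetical name; Spider builds this key internally). Bumping the version after a failover invalidates every cached connection to the failed primary without renaming the server:

```cpp
#include <string>

// Build the connection-pool hash key described above: remote_name#version.
// After an active/standby failover the name is kept and the version bumped,
// so stale connections under the old key are never handed out again.
std::string make_conn_key(const std::string& server_name, int version) {
    return server_name + "#" + std::to_string(version);
}
```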
To get a connection pointer from the connection pool, Spider searches for the server name in the hash map, then deletes and returns the entry if one exists. Spider puts back a connection pointer by constructing a new hash item packaging the ('servername#version', pointer) pair and inserting it into the hash map.
Therefore, the hash map is a multi-map, meaning one hash key corresponds to multiple values.
Spider uses a mutex lock to guarantee thread safety whenever it accesses the hash map, including searches, insertions, and deletions. This is not efficient, since the whole hash structure is locked on every access. It is especially slow when recycling connections, since we have to iterate over the entire hash map and delete the connection objects that exceed the idle threshold.
We proposed a new way to represent the connection pool, essentially splitting one mutex lock into multiple mutex locks to reduce locking time. In C++ pseudocode:
typedef std::string key_type; /* backend_name#version */
typedef void * conn_ptr; /* connection pointer */
typedef struct {
std::stack<conn_ptr> stk;
pthread_mutex_t mutex;
} mutex_stack; /* a stack with a mutex lock */
typedef struct {
pthread_rwlock_t rwlock;
std::unordered_map<key_type, mutex_stack> hashmap;
} spider_connection_pool; /* a hashmap with a rwlock */
The key of the hash map remains unchanged (remote_name#version), whereas the hash value is now a stack that stores connection pointers. In this design the hash map is not a multi-map but a unique hash map: one key corresponds to exactly one value.
The hash map is guarded by a read-write lock, and each stack is guarded by a mutex lock.
For instance, with four remote backends, the connection pool holds four items; each remote backend corresponds to one mutex stack.
The stack stores connection pointers. To put back a connection pointer, we push it onto the corresponding stack; to get one, we pop it from the corresponding stack. Push and pop are wrapped in mutex lock/unlock.
Finding the 'corresponding' stack works as follows: first acquire the read lock, then search the hash map by key, then release the read lock. If the key does not exist, we initialize a new mutex stack, build a hash item <key, mutex_stack>, acquire the write lock, and try to insert it. Note that many threads may insert items with the same key; since the hash map is unique, the insertion can fail, in which case we re-search the hash map.
We generally define two essential functions named put_conn
(put a connection pointer back to the pool) and get_conn
(get a connection pointer specified by the key).
The write lock is acquired only in put_conn. If we cannot find a corresponding hash entry in get_conn, we can directly return 'not found' and create a new connection instead of taking one from the pool.
Unless remote backends fail, we only need to acquire the write lock n times (assuming n remote backends). Therefore, at run time nearly 100 percent of the operations on the hash map are reads, which are roughly non-blocking. Compared with the original design, we also split one big mutex lock into n mutex locks for the n stacks, removing many conflicts.
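Under the assumptions above, get_conn/put_conn can be sketched in modern C++, with std::shared_mutex playing the role of the pthread rwlock and the connection reduced to a void* as in the pseudocode. This is an illustration of the locking discipline, not TSpider's code:

```cpp
#include <mutex>
#include <shared_mutex>
#include <stack>
#include <string>
#include <unordered_map>

// Sketch of the rwlock-guarded hash map of mutex-guarded stacks.
class SpiderConnectionPool {
public:
    // Pop a cached connection for `key`; nullptr means "not found, create
    // a new connection". get_conn never needs the write lock.
    void* get_conn(const std::string& key) {
        std::shared_lock<std::shared_mutex> rd(rwlock_);
        auto it = pool_.find(key);
        if (it == pool_.end()) return nullptr;
        std::lock_guard<std::mutex> lk(it->second.mutex);
        if (it->second.stk.empty()) return nullptr;
        void* conn = it->second.stk.top();
        it->second.stk.pop();
        return conn;
    }

    // Push a connection back; only the first put for a new key (typically
    // once per backend) takes the write lock to create the stack.
    void put_conn(const std::string& key, void* conn) {
        {
            std::shared_lock<std::shared_mutex> rd(rwlock_);
            auto it = pool_.find(key);
            if (it != pool_.end()) {
                std::lock_guard<std::mutex> lk(it->second.mutex);
                it->second.stk.push(conn);
                return;
            }
        }
        std::unique_lock<std::shared_mutex> wr(rwlock_);
        // Another thread may have inserted the key meanwhile; operator[]
        // then finds the existing stack, and we simply push into it.
        auto& ms = pool_[key];
        std::lock_guard<std::mutex> lk(ms.mutex);
        ms.stk.push(conn);
    }

private:
    struct MutexStack {
        std::stack<void*> stk;
        std::mutex mutex;
    };
    std::shared_mutex rwlock_;
    std::unordered_map<std::string, MutexStack> pool_;
};
```

Because std::unordered_map is node-based, rehashing never moves the MutexStack objects, so a thread holding a stack's mutex stays safe while other threads read the map under the shared lock.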
h3. Performance
We tested point-selection performance on an architecture of 1 Spider with 16 remote backends. The Spider machine uses a 12-core Intel Xeon Cascade Lake 8255C (2.5 GHz) with 15 GB RAM. The other physical components are fast and large enough that they cannot be the bottleneck.
The target table sbtest1 holds 10 GB of data and has the following structure.
CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT 0,
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=SPIDER DEFAULT CHARSET=utf8
And we perform point selection using the Sysbench test bench.
SELECT * FROM `sbtest1` WHERE id = ?;
We used the Perf profiling tool to trace the CPU time occupied by the concurrency-related functions of the connection pool. The result is as follows.
 | pthread_mutex_lock | pthread_getspecific | pthread_rwlock_lock | Total |
---|---|---|---|---|
Before | 0.355996% | 0.060372% | N/A | 0.416368% |
After | 0.138516% | 0.07392% | 0.00003% | 0.212466% |
If the remote backends do not fail, all RWLock operations are reads, so the RWLock is almost non-blocking. Although we spend more time in pthread_getspecific (because there are more mutex locks than before), the considerable reduction in pthread_mutex_lock time cuts the CPU-time proportion to 51.03% of the original, a significant improvement.
We kept the number of Sysbench client threads unchanged at 300 and ran point-selection stress tests with different values of the spider_max_connection variable.
spider_max_connection specifies the maximum number of Spider connections to a single remote backend; in other words, the maximum capacity per remote backend in the connection pool.
QPS tests:
spider_max_connection | Before | After | After / Before |
---|---|---|---|
100 | 124912.36 | 125323.45 | 100.32% |
50 | 124804.67 | 125407.97 | 100.48% |
20 | 124437.48 | 125411.6 | 100.78% |
10 | 98681.96 | 117253.38 | 118.81% |
5 | 83200.37 | 98597.90 | 118.50% |
It can be seen from the above test that when the number of test threads is greater than the total capacity of the connection pool (16 remotes times 10 < 300 client threads), the performance improvement of the optimization solution is significant.
h4. Test 3
We kept spider_max_connection at 100 and ran stress tests with 50, 100, 200, and 300 Sysbench client threads, using the Sysbench summary to record QPS and p99 latency.
p99 latency tests:
Before | After | After / Before | |
---|---|---|---|
50 threads | 1.04 | 1.04 | 100% |
100 threads | 1.96 | 1.96 | 100% |
200 threads | 8.13 | 7.84 | 96.43% |
300 threads | 124644.19 | 125723.51 | 89.76% |
It can be seen from the results that the greater the number of threads, the more pronounced the reduction in p99 latency, indicating that the overall query time is more stable.
QPS tests:
Before | After | After / Before | |
---|---|---|---|
50 threads | 57168.74 | 57670.93 | 100.88% |
100 threads | 99769.4 | 99937.68 | 100.17% |
200 threads | 122404.09 | 123153.73 | 100.61% |
300 threads | 124644.19 | 125723.51 | 100.87% |
The overall QPS is slightly greater than with the original design when the connection pool capacity is large enough.
TSpider runs on thousands of instances every day, so keeping it stable, error-free, and robust is of great importance. However, not every execution plan is completely implemented and tested for every case on TSpider; in other words, some execution optimizations could produce wrong results.
Also, due to TSpider's distributed architecture, many MySQL Server logical and physical optimizations may have little effect, or even add extra overhead, on TSpider.
Therefore, we usually do a full-table scan for all single (SELECT and UPDATE) queries on TSpider to guarantee execution correctness and avoid the extra time of computing an "optimized" execution plan.
When we tested and traced the performance of point selection by the primary key, we found that the performance is below our expectations.
After diving into the code, we found that allowing the index (the const access type in this circumstance) boosts point-selection performance by 7.65%. The const access type is one of the fastest execution plans: once the server has decided to use it, it fetches the row at the join::optimize stage and skips all further execution-plan computation.
We also found that TSpider allocated a 1000-row result buffer for const-index selection. A const index returns 0 or 1 rows, so there is no need to allocate such a big buffer and free the entries one by one.
Therefore, when we detect that a const index is used, we allocate only a 1-row result buffer for the execution.
Together, the two optimizations increase QPS by 20.96%.
If an UPDATE JOIN sets a column to the value it already has, the row is unchanged and should not be counted as changed. However, TSpider returns a changed count equal to the number of rows matched.
Example:
MariaDB [(none)]> select actor_rid, level from aaa.db_actor where actor_rid in
(9223654611343214324,9223654611343244360,9223654611343214270);
+---------------------+-------+
| actor_rid | level |
+---------------------+-------+
| 9223654611343214270 | 1 |
| 9223654611343244360 | 74 |
| 9223654611343214324 | 3 |
+---------------------+-------+
3 rows in set (0.007 sec)
MariaDB [(none)]> select actor_rid, level from test.db_actor_test where actor_rid in
(9223654611343214324,9223654611343244360,9223654611343214270);
+---------------------+-------+
| actor_rid | level |
+---------------------+-------+
| 9223654611343214324 | 2 |
| 9223654611343244360 | 74 |
+---------------------+-------+
2 rows in set (0.000 sec)
Execute the following SQL on TSpider.
MariaDB [(none)]> update aaa.db_actor t1 inner join test.db_actor_test t2 on t1.actor_rid = t2.actor_rid set
t1.level=t2.level where t1.actor_rid in (9223654611343214324,9223654611343244360,9223654611343214270);
Query OK, 2 rows affected (0.006 sec)
Rows matched: 2 Changed: 2 Warnings: 0
MariaDB [(none)]> select actor_rid, level from aaa.db_actor where actor_rid in
(9223654611343214324,9223654611343244360,9223654611343214270);
+---------------------+-------+
| actor_rid | level |
+---------------------+-------+
| 9223654611343214270 | 1 |
| 9223654611343244360 | 74 |
| 9223654611343214324 | 2 |
+---------------------+-------+
3 rows in set (0.002 sec)
Only one row actually changed, but Changed: 2 is returned.
1. Fixed a problem in high-concurrency scenarios where, when one thread hit a duplicate error while generating a specified auto-increment value, other threads in the same period could be affected and also produce duplicate primary key errors. (Keywords: AUTO INCREMENT, DUPLICATE)
2. Fixed a case where TSpider returned an unexpected affected-rows count when executing UPDATE JOIN. (Keywords: UPDATE JOIN, AFFECTED ROWS)
3. Fixed errors in UPDATE / UPDATE JOIN when rows must move to a new shard: data could come back empty, duplicated, or with no matching rows. The cause is related to FLOAT columns and the execution plan. (Keywords: UPDATE, FLOAT, execution plan)
4. Fixed a bug where TSpider returned "empty query" when executing certain SQL. (Keyword: EMPTY QUERY)
spider_get_time.
ha_register. (We did not reproduce the bug; instead we provide a workaround to avoid it.)
Added the patch from https://dev.mysql.com/worklog/task/?id=12571 to TSpider.
Released in 3.5.1.
I read your stress test report; there is no comparison against the official MariaDB Spider.
Also, how much slower is it than connecting to the DB directly?