futurewei-cloud / chogori-platform
License: MIT License
If a txn operation fails (read/write/heartbeat), the txn must be aborted. The 3SI client will ensure it is aborted, but if the user then tries to commit it, the client still returns 200 OK. This could cause bugs in the user application. The fix is to return a 4xx error when the user tries to commit a failed transaction.
This is to support SQL inserts in one round trip and will be implemented by adding a flag to the write request.
SKVRecord should have a serialization cursor that prevents re-serializing fields, which could corrupt the storage payload.
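As a minimal sketch of the idea (the class and method names below are hypothetical, not the real SKVRecord API): an append-only cursor tracks how many fields have been written and refuses to write past the schema's field count, so a field can never be serialized twice.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical serialization cursor: fields may be written exactly once,
// in schema order, so the packed payload cannot be corrupted by
// re-serializing a field. Illustrative only.
class RecordCursor {
public:
    explicit RecordCursor(size_t numFields) : _numFields(numFields) {}

    void serializeNext(const std::string& value) {
        if (_next >= _numFields) {
            throw std::runtime_error("all fields already serialized");
        }
        _payload.push_back(value); // append-only: earlier fields are never rewritten
        ++_next;
    }

    size_t fieldsSerialized() const { return _next; }

private:
    size_t _numFields;
    size_t _next{0};
    std::vector<std::string> _payload;
};
```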
If there are ongoing read or write requests, we should not issue an end request and should throw an exception to indicate a user bug. Same for issuing more requests after an end request.
Implement a test for these cases.
Document reasoning behind the design choice.
Implement integration tests described in https://github.com/futurewei-cloud/chogori-platform/blob/master/docs/RFC/K23SI_testing.md
Each scenario set described in the RFC can be a separate pull request.
SKV client works for static schemas such as with TPC-C but does not work for the dynamic schema case of reading and writing SKVRecords. Fix it and add tests for it.
Add more unit test cases for SKVRecord.
docs/SKV.md (includes overview of SKVRecord)
test/k23si/SKVRecordTest.cpp (existing unit test cases)
src/k2/dto/SKVRecord.h (SKVRecord interface)
src/k2/dto/FieldTypes.cpp (String conversions for creating keys)
Test case | Expected Result |
---|---|
Serialize a record with composite partition and range keys (e.g. partition key is string field + uint32_t field + string field) | Byte sequence of partitionKey string and rangeKey string are as expected with proper encoding and NULL byte separators |
Serialize a record with a composite partition key and one key field NULL | Byte sequence of partitionKey is as expected with NULL field encoding |
Serialize a record with a composite partition key where one key field (designated NullLast) is NULL | Byte sequence of partitionKey is as expected with NULL-last field encoding |
Serialize a record with one value field skipped (i.e. using "skipNext()") | Fields can be deserialized successfully using the deserializeNextOptional function and with the FOR_EACH_RECORD_FIELD macro |
Deserialize fields out of order by name | Fields can be deserialized successfully |
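A hedged sketch of one common way composite string keys are concatenated with order-preserving NULL-byte separators (this mirrors a widely used scheme; it is not necessarily SKV's exact format, which lives in src/k2/dto/FieldTypes.cpp): each embedded 0x00 byte in a field is escaped as 0x00 0xFF, and each field is terminated with a bare 0x00, so bytewise comparison of the packed key matches field-by-field comparison.

```cpp
#include <string>
#include <vector>

// Append one string field to a composite key buffer.
// Embedded 0x00 bytes are escaped as 0x00 0xFF; the field is then
// terminated with a bare 0x00. Because the terminator (0x00) sorts below
// any escaped pair (0x00 0xFF), a shorter field sorts before its
// extensions and memcmp order on the packed key matches per-field order.
void appendKeyField(std::string& key, const std::string& field) {
    for (char ch : field) {
        key.push_back(ch);
        if (ch == '\0') key.push_back('\xFF'); // escape embedded NULL byte
    }
    key.push_back('\0'); // field separator/terminator
}

std::string packKey(const std::vector<std::string>& fields) {
    std::string key;
    for (const auto& f : fields) appendKeyField(key, f);
    return key;
}
```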
Test case | Expected Result |
---|---|
getPartitionKey() is called before all partition key fields are serialized | Exception is thrown |
getRangeKey() is called before all range key fields are serialized | Exception is thrown |
deserializeField(string name) on a name that is not in schema | Exception is thrown |
seekField() with a field index out-of-bounds for the schema | Exception is thrown |
Deserialize a field that has not been serialized for the document | Exception is thrown |
Trace operation support (trace a read or write, including PUSH, plus an accumulated trace for the txn on the server). The idea is to start a txn with a trace flag, which makes the server accumulate an event log for the txn, including all PUSH operations performed against it. We can use this together with inspect to validate expected results (e.g. trace two conflicting txns).
We should be able to trace across cluster components: CPO, persistence, etc. We should also consider integration with the logging and metrics systems.
When running test_k23si_tpcc.sh the test shows a handful of finalization errors during the load phase, which has sync finalize turned on. These need to be investigated and fixed.
[0026:00:27:06.931.008]-nodepool-(0) [ERROR] [/build/src/k2/module/k23si/TxnManager.cpp:414 @operator()]Finalize request did not succeed for {pvid={id=0, rangeV=1, assignmentV=1}, colName=TPCC, mtr={txnid=9221627776308204174, timestamp={tsoId=1, endCount=1605305669540315000(18579:22:14:29.540.315), delta=4608}, priority=medium}, trh={schema=district pkey=, rkey=}, key={schema=orderline pkey=, rkey=
}, action=commit}, status=[408 Request Timeout]: partition deadline exceeded
This would reduce data copies in the code that needs this functionality (e.g. in partial update code in Module.cpp)
K23SI client should return an exceptional future wherever possible instead of throwing an exception directly.
A partial update is when the user only wants to write a subset of the fields of a record. We want to optimize for this case with a new partialUpdate RPC and interface, which will reduce traffic over the network.
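As an illustrative sketch (the struct shape and names below are assumptions, not the actual K23SI dto definitions), a partial update request could carry only the indexes and values of the changed fields rather than a full record, and the server would overlay them onto the stored copy:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wire shape for a partial update: only changed fields travel
// over the network, identified by their schema field index.
struct PartialUpdateRequest {
    std::string key;                      // full record key
    uint32_t schemaVersion{0};
    std::vector<uint32_t> fieldIndexes;   // which fields are present
    std::vector<std::string> fieldValues; // packed values, same order
};

// Server-side sketch: overlay the supplied fields onto the stored record.
void applyPartialUpdate(std::vector<std::string>& storedFields,
                        const PartialUpdateRequest& req) {
    for (size_t i = 0; i < req.fieldIndexes.size(); ++i) {
        storedFields.at(req.fieldIndexes[i]) = req.fieldValues[i];
    }
}
```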
Implement tests from scenario 4 from here: https://github.com/futurewei-cloud/chogori-platform/blob/master/docs/RFC/K23SI_testing.md
You can come up with the test cases and expected results yourself, and add them to the testing documentation linked above.
Also try to fix the transaction layer protocol for autoRDMA so that it provides the correct URL.
e.g. support queries such as:
UPDATE X SET Field1 = Field2 + 1 WHERE pkf1=1 AND pkf2=2;
Implement state transitions necessary to integrate with persistence
If a write with the rejectIfExists flag is retried, the client may see a failure even if the first try placed a write intent. Make this operation idempotent by adding an op sequence id to write requests. This can be reused for future read-modify-write operations too.
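A minimal sketch of the dedup idea, assuming a per-txn monotonically increasing sequence id on write requests (all names here are illustrative): the server records the highest sequence id already applied for each txn, so a retried write is acknowledged as a success instead of tripping rejectIfExists on its own earlier write intent.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Sketch: remember the highest op sequence id applied per txn, so a
// retried write (same or lower seq id) is recognized as a duplicate and
// acknowledged without being treated as a conflicting second write.
struct WriteDeduper {
    std::map<std::string, uint64_t> lastSeqByTxn;

    // Returns true if the write should be applied, false if it is a retry.
    bool shouldApply(const std::string& txnId, uint64_t seqId) {
        auto it = lastSeqByTxn.find(txnId);
        if (it != lastSeqByTxn.end() && seqId <= it->second) {
            return false; // duplicate/retry: already applied
        }
        lastSeqByTxn[txnId] = seqId;
        return true;
    }
};
```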
I implemented a core-to-core communication method for applets running in the same process, in order to speed up RPC within the same machine. To test it, I merged txbench_client.cpp and txbench_server.cpp into a single file, txbench_combine.cpp, so that the server and the client can run in the same process.
However, when I tested it, it always hit a segmentation fault. While trying to fix this, I found that the issue comes from this function: https://github.com/futurewei-cloud/chogori-platform/blob/master/src/k2/transport/RPCDispatcher.cpp#L118-L151
Here is how to reproduce the segmentation fault. Run txbench_combine with args: ./txbench_combine -c 2 --tcp_endpoints 12345 12346 --tcp_remotes tcp+k2rpc://0.0.0.0:12345 --memory 10G --poll-mode --cpuset 9-10
and it will return a segmentation fault. If I change the arg --tcp_remotes tcp+k2rpc://0.0.0.0:12345
to --tcp_remotes tcp+k2rpc://0.0.0.0:12346
then the benchmark works well. The only difference is that the first set of args asks the client to communicate with the server on the same core, which triggers a loop of send/receive requests that are processed by the _handleNewMessage function.
See section 2.8 in the spec: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
If the spec mentions any terminal display or file output, we do not want to implement that, but the data needed for it should still be retrieved from SKV and stored in the transaction context. For example, in the New Order transaction in src/k2/cmd/tpcc/transactions.h, the tax, discount, and total_amount variables are saved as member variables but are not otherwise used in the code.
including tso service/worker and client
If a transaction does more than one write to the same key, a write intent will be created for each one. The client will add the key multiple times to the write set, but only one write intent will be finalized.
Right now the k23si read cache keeps overlapping intervals in memory. It might be better to split and merge overlapping intervals on insertion to save memory.
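A sketch of merge-on-insert over integer endpoints (the real read cache tracks key ranges with observation timestamps; a merged interval would keep the max timestamp, which is conservative but safe): overlapping or adjacent intervals are absorbed into a single entry instead of being stored separately.

```cpp
#include <algorithm>
#include <map>

// Keep disjoint closed intervals [begin, end] keyed by begin. On insert,
// absorb any overlapping or adjacent neighbors into one merged interval
// instead of storing overlapping copies.
struct IntervalSet {
    std::map<int, int> ivals; // begin -> end, pairwise disjoint

    void insert(int b, int e) {
        auto it = ivals.lower_bound(b);
        if (it != ivals.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= b - 1) it = prev; // overlaps/adjacent on the left
        }
        while (it != ivals.end() && it->first <= e + 1) {
            b = std::min(b, it->first);   // grow to cover the neighbor
            e = std::max(e, it->second);
            it = ivals.erase(it);         // remove the absorbed entry
        }
        ivals[b] = e;
    }
};
```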
In the tpcc benchmark for SKV, change full writes into partial writes where possible. The changes need to be made in src/k2/cmd/tpcc/transactions.h. If you need a reference on what the transactions are supposed to do you can look at the specification: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
You can use test/integration/test_k23si_tpcc.sh to test locally whether the changes work. For performance testing, I recently updated the scripts and README in the cluster directory. You will need to modify the configuration to work on 3 machines, with fewer cores and fewer TPC-C warehouses. You don't need to do performance testing for this task, but getting set up for performance testing will be very useful soon.
Logs from chogori-sql:
[0000:19:59:16.077.602]-nodepool-(1) [WARN] [/build/src/k2/module/k23si/TxnManager.cpp:77 @operator()]heartbeat expired on: {txnId={trh={schema=00000001000030008000000000000a30 pkey=^A00000001000030008000000000000a30^@^A^A^@^A^A��f^]^BGK���2#�X��^@^A, rk
ey=}, mtr={txnid=11254935246207970076, timestamp={tsoId=1, endCount=1611080472035753000(18646:18:21:12.035.753), delta=4608}, priority=medium}}, writeKeys=[0], rwExpiry={tsoId=1, endCount=1611080472035753000(18646:18:21:12.035.753), delta=4608}, hbExpiry
=0000:19:55:21.882.900, syncfin=0}
[0000:19:59:32.829.201]-k2_pg-(0) [DEBUG] [/build/src/k2/connector/yb/pggate/k23si_seastar_app.cc:219 @operator()]Write...
[0000:19:59:32.829.222]-k2_pg-(139781049149184) [ERROR] [/build/src/k2/connector/yb/pggate/k2_adapter.cc:441 @operator()]K2 write failed due to hb not allowed for the txn state
[0000:19:59:32.829.234]-k2_pg-(139781049149184) [DEBUG] [/build/src/k2/connector/yb/pggate/k2_adapter.cc:443 @operator()]K2 write status: [405 Method Not Allowed]: hb not allowed for the txn state
Finally, the txn failed:
2021-01-19 18:27:05.272 UTC [129] FATAL: Invalid argument: hb not allowed for the txn state
Add more integration test cases for schema creation.
src/k2/dto/ControlPlaneOracle.h (Schema and schema create request definitions)
test/cpo/CPOTest.cpp (existing schema creation tests)
Test case | Expected Result |
---|---|
Create a new version of an existing schema by renaming a (non-key) field | 2xx success code, schema can be retrieved through GetSchemasRequest from CPO |
Create a new version of an existing schema by adding a new field | 2xx success code, schema can be retrieved through GetSchemasRequest from CPO |
Create a schema which does not have any range key fields set | 2xx success code, schema can be retrieved through GetSchemasRequest from CPO |
Test case | Expected Result |
---|---|
Create a schema with duplicate field names | 400 error code, schema does not exist in result set from GetSchemasRequest |
Create a schema by setting partitionKeyFields manually by index, and an index is out of bounds of the fields | 400 error code, schema does not exist in result set from GetSchemasRequest |
Create a schema where the field at index 0 is not a partition or range key field | 400 error code, schema does not exist in result set from GetSchemasRequest |
Create a new version of an existing schema where a key field is renamed | 409 error code, schema does not exist in result set from GetSchemasRequest |
Create a new version of an existing schema where the type of a key field changes | 409 error code, schema does not exist in result set from GetSchemasRequest |
Create a new version of an existing schema where a key field is removed | 409 error code, schema does not exist in result set from GetSchemasRequest |
See section 2.7 in the spec: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
If the spec mentions any terminal display or file output, we do not want to implement that, but the data needed for it should still be retrieved from SKV and stored in the transaction context. For example, in the New Order transaction in src/k2/cmd/tpcc/transactions.h, the tax, discount, and total_amount variables are saved as member variables but are not otherwise used in the code.
Implement scenario 02 test cases from https://github.com/futurewei-cloud/chogori-platform/blob/master/docs/RFC/K23SI_testing.md
You will need to implement the functionality to delay finalization. You can do this with an option in the txn END request similar to the existing syncFinalize option. The delay itself would go here: https://github.com/futurewei-cloud/chogori-platform/blob/master/src/k2/module/k23si/TxnManager.cpp#L344
You may also need to increase the heartbeat deadline config option for the tests.
Also consider other test cases that could be added to this scenario, especially requests against the keys that have the WI and aborted records.
References:
src/k2/dto/Persistence.h (Dtos for the communication between PlogServer and PlogClient)
src/k2/dto/PartitionGroup.h (Dtos for register/get plog partition groups)
src/k2/persistence/plog/PlogServer.h
src/k2/persistence/plog/PlogServer.cpp
src/k2/persistence/plog/PlogClient.h
src/k2/persistence/plog/PlogClient.cpp
src/k2/cmd/demo/plog_server.cpp (Start the PlogServer instance)
test/plog/*
test/integration/test_plog.sh (Integration test cases for plog service)
We want to remove unsigned integer types as SKV field data types because they add a lot of complication to filtering and are not needed for SQL anyway. Instead, we should add support for using signed integers as key fields, which requires an order-preserving key encoding. Reference: https://www.zanopha.com/docs/elen.pdf
This happened a few times during laptop-deployed testing. It hits this case in Module.cpp:1043:
case dto::TxnRecordState::Deleted:
default:
K2ASSERT(log::skvsvr, false, "Invalid transaction state: {}", incumbent.state);
}
[0002:08:06:43.250.163]-nodepool-(k2::skv_server:0) [ERROR] [/build/src/k2/module/k23si/Module.cpp:1043 @handleTxnPush] Invalid transaction state: Deleted
The state Deleted is used as the in-memory state of a transaction while we're recording that it was deleted in Persistence
It is possible to have a race condition where a push is issued but by the time it is handled the incumbent has been finalized.
decimal64 (providing 16 digits of precision) and decimal128 (providing 34 digits of precision) can be used to support the SQL decimal data type up to those levels of precision. They are included as a GCC standard library extension so implementation work will be reduced.
This task is to add decimal64 and decimal128 as SKV schema types, which can be used as data fields and as part of filter expressions but will not be supported as key fields. Also change SKV's TPC-C benchmark to use the new types where appropriate.
These can be used in the Query and other interfaces
See section 2.6 in the spec: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
If the spec mentions any terminal display or file output, we do not want to implement that, but the data needed for it should still be retrieved from SKV and stored in the transaction context. For example, in the New Order transaction in src/k2/cmd/tpcc/transactions.h, the tax, discount, and total_amount variables are saved as member variables but are not otherwise used in the code.
min, max, sum, count
Instead of taking it from config or a startup parameter, as is currently done.