Resources for the transaction SIG
I noticed that JDK 8 is still needed for building. Is this intentional? Do you plan to support JDK 8 in the java-client? I'm looking at some warnings like:
[WARNING] /home/peter/work/git/tikv-client-java/src/main/java/org/tikv/common/util/MemoryUtil.java:[31,16] sun.misc.Cleaner is internal proprietary API and may be removed in a future release
[WARNING] /home/peter/work/git/tikv-client-java/src/main/java/org/tikv/common/util/MemoryUtil.java:[32,16] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
[WARNING] /home/peter/work/git/tikv-client-java/src/main/java/org/tikv/common/util/MemoryUtil.java:[33,18] sun.nio.ch.DirectBuffer is internal proprietary API and may be removed in a future release
[WARNING] /home/peter/work/git/tikv-client-java/src/main/java/org/tikv/common/util/FastByteComparisons.java:[23,16] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
There are alternatives in JDK 9+ that don't produce such warnings, which have become runtime errors in JDK 16+... How much do you value being able to run on JDK 8? There is an option of creating alternative classes such as MemoryUtil for JDK 8 vs. JDK 9+ and packing them into a multi-release JAR, which would run correctly across all the JDK 8, 9, ..., 16+ releases. I can try to do that and propose a PR if you're interested.
The prerequisite for doing that is to require JDK 9+ (preferably JDK 11) to build the project via Maven. javac would then use the -release 8 option to build most of the code so it would run on JDK 8; only those classes that need the different JDK 9+ APIs would be built with the -release 9 option. At the end, the multi-release JAR would be built, which would use the correct classes for the platform it is running on. Are you OK with that approach?
That would be a really nice idea to support other Java versions! You see, before the client was separated from TiSpark, we required Java 8 so that users did not struggle with incompatible Java versions.
So are you OK with the approach of requiring JDK 9+ (or even JDK 11) for building client-java, while the produced JAR would still work on JDK 8?
I think there is always a way to build with JDK8 along with other Java versions... and it does not break the current environment.
Let's try to fulfill them both while supporting more Java versions:)
The problem is that a build produced with JDK 8 javac will:
Because it uses encapsulated JDK APIs which are deprecated for removal (in JDK 16 they are just disabled, but JDK 17 might remove them altogether).
You see, building with JDK 8 can't compile code that uses the replacement API(s) introduced in JDK 9+.
OTOH, when building with JDK 9, you can use the -release 8 javac option, which does the following:
So building with JDK 9 javac and the -release 8 option is equivalent to building with JDK 8 javac.
If you want to build alternative versions of classes like MemoryUtil, where one version will be used when running on JDK 8 and the other when running on JDK 9+ (using a multi-release JAR), then only JDK 9 javac is suitable for that.
Unless the Maven pom.xml is structured so that building a multi-release (and thus JDK 8+ compatible) JAR is performed only when a particular Maven profile is enabled. By default, building with JDK 8 would not enable the profile, but building with JDK 9+ would enable it. Is this acceptable?
I see, it seems logical to me. We could discuss this change in our meeting with the other client contributors; if that's okay, I can schedule it for next week. Also, this change would have a large enough impact that it may not be merged into release 3.x. Could we consider including it in the next big release, say release-4.0?
I'm going to try to modify the build procedure to use a Maven profile which would be enabled manually. I think this way default build procedure can be left unchanged so the impact is minimal. You can choose to include that in either 3.x or 4.0 if you think the change is suitable for inclusion. Perhaps in 4.0 the profile could be enabled automatically when building with JDK 9+ but in 3.x only manually?
From an offline discussion with @andylokandy yesterday, I learned we created tikv/sig-transaction for collecting material and hosting a repository for design discussion.
However, design documents should have been put under tikv/rfcs, and we have GitHub Discussions for open-ended discussion.
@andylokandy, @sticnarf, and I reached a consensus that we turn on GitHub Discussions on tikv/tikv and move discussion there under a category named transaction. Also, we finalize the design document work under tikv/rfcs.
What do you think?
Download from Cockroach Labs
I would like to focus on their transaction system, which is described primarily in sections 3 and 4. Cockroach Labs also have some excellent blog posts on their transaction systems which make great background reading.
Currently green GC is disabled by default as it still has some corner cases and issues to solve.
As hibernate region is enabled by default only on the master branch, it's urgent to solve the green GC issues. However, as green GC is already a GA released feature, it's better to solve the issues in the near sprints.
It's also necessary to think over how raftstore bypasses lock scanning or collection, as a missed lock may cause data loss, which is critical. Besides, more tests are needed to cover these corner paths and exceptional paths.
Before we start implementation I think we should have a design doc that is a bit more concrete than the docs in the repo. It doesn't need to be very formal since I think initial implementation will be somewhat experimental. It should contain:
Once we have the above in a document, then we should figure out the work items and who will do that work, but let's do that as a second stage to the above doc.
We'll read 'Industrial-Strength OLTP Using Main Memory and Many Cores', by Avni et al. at Huawei Research Centre. It was published in the proceedings of VLDB 2020 and you can download it from them.
Discuss it on Slack: #sig-txn-reading-group-nov-dec-21
See also our blog post.
What are the issues and can they be solved? There is some text already in the repo.
Look for any places where non-unique commit_ts might cause a problem (for example, it means that we cannot order two transactions with the same commit_ts). In the big picture it means commit_ts only gives a partial order of transactions.
When looking at tools such as CDC and binlog, please also check if there might be other issues with the parallel commit design.
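To make the ordering gap concrete, here is a minimal Rust sketch (the `TxnRecord` type and function are ours, purely illustrative, not TiKV code): two transactions sharing a `commit_ts` cannot be ordered by it, which is exactly the partial-order problem for tools that sort events by commit timestamp.

```rust
// Hypothetical record type for illustration only.
#[derive(Debug, PartialEq)]
struct TxnRecord {
    start_ts: u64,
    commit_ts: u64,
}

// Returns Some(ordering) only when the commit_ts values differ; two
// transactions sharing a commit_ts are indistinguishable to any consumer
// (e.g. a CDC sink) that orders events by commit_ts alone.
fn order_by_commit_ts(a: &TxnRecord, b: &TxnRecord) -> Option<std::cmp::Ordering> {
    if a.commit_ts == b.commit_ts {
        None
    } else {
        Some(a.commit_ts.cmp(&b.commit_ts))
    }
}
```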
I'd like to propose that PRs to TiKV transaction code which are only documentation (i.e., only add or change comments and whitespace) only require one LGTM, not two.
Committers, please check your box if you agree, or leave unchecked and make a comment if you do not. As laid out in the governance decision making rules, the decision requires a simple majority, but consensus is preferred.
If we fall back from async commit using solution 2 in #64, we need to amend the primary lock, clearing the async commit mark. Then, there can be two mutations writing the same lock. I am not sure whether CDC can handle this case.
Currently README.md has all the info, but it's not pretty. The SIG should have a landing page on the TiKV website or its own page to welcome new contributors and provide info and links, etc.
More ideas and/or volunteers are welcome!
I feel our unit tests in TiKV are somehow not satisfying.
For example, when reviewing tikv#9514, which changed the rollback collapse logic, I found it broke some test cases in mvcc::reader.
These tests are used to test the behaviour of the reader and the engine. Rollbacks are used only to provide a testing environment, i.e. they are not supposed to really do the "rollback" work here; what we want is just the Modifies caused by the rollback. But a large part of these tests depends on the Modifies caused by unprotected rollbacks, and if we change the rollback collapse logic, the tests break.
Moreover, who can promise that the "rollback" or "cleanup" function we are using here is right? Maybe the tests for "cleanup", but "cleanup"'s correctness depends on the correctness of the reader, which leads us to circular reasoning.
It might be a bad habit to prepare a unit testing environment for lower level components (the scanner in this case) with higher level components (txn and actions, or more specifically, cleanup in this case); after all, lower level components should not even know of higher level components' existence. But a huge amount of tests in our code are written in this way, and many of them are too hard/complex/important to change.
Once the docs quest is well underway, I think finishing off the client would make a great next quest issue.
We should have somebody on duty to do initial triage of issues. These are helpfully posted to Slack once per day (I think). They could also triage new PRs to make sure they get reviewed.
As well as a list of names, we should specify the work the person on duty should do.
From @MyonKeminta :
CDC requires every incoming "commit" operation to have a greater commit_ts than the current resolved_ts. But CDC receives these events by listening to apply operations. So it's possible that:
Transaction T1 starts with start_ts = 90 while max_ts = 100 and gets min_commit_ts = 101, then it starts writing;
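A hedged sketch of the invariant involved (function names are ours, not CDC's): with the numbers above, T1's min_commit_ts is derived from max_ts, and CDC's correctness requires every observed commit_ts to be strictly greater than the current resolved_ts.

```rust
// Async commit picks a commit timestamp no less than both start_ts + 1 and
// max_ts + 1 (simplified formula matching the numbers in the scenario above).
fn min_commit_ts(start_ts: u64, max_ts: u64) -> u64 {
    std::cmp::max(start_ts + 1, max_ts + 1)
}

// CDC's invariant: an incoming commit whose ts is <= resolved_ts breaks it.
fn violates_cdc_invariant(commit_ts: u64, resolved_ts: u64) -> bool {
    commit_ts <= resolved_ts
}
```

So if resolved_ts has already advanced to 101 before T1's commit is applied, the invariant is violated.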
We have done much work on updating and making use of max_read_ts in this PR: tikv/tikv#8363. But there is still much work to do to make it perfectly correct.
We need to:

- Update max_read_ts by CDC correctly (#42)
- Maintain max_read_ts for every single region.
- Rollback-like commands (rollback, cleanup, check_txn_status, check_secondary_locks, etc) need to update max_read_ts with their start_ts. Perhaps the name max_read_ts is not perfectly accurate. This is required to guarantee the correctness of the current way of handling non-globally-unique commit_ts.
- Maybe rename it to read_ts. Discussed here: #21

Beyond simple unit tests, we should have a plan to test parallel commit thoroughly. Some ideas:
More ideas (and elaborating on the above) welcome!
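Relatedly, the max_read_ts updates discussed above must stay monotonic under concurrent commands; a minimal sketch of such a watermark (the type and method names are ours, not TiKV's):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A shared, monotonically increasing watermark. Rollback-like commands would
// bump it with their start_ts so later prewrites compute a larger min_commit_ts.
struct MaxReadTs(AtomicU64);

impl MaxReadTs {
    fn new() -> Self {
        MaxReadTs(AtomicU64::new(0))
    }

    // fetch_max keeps the watermark monotonic even under concurrent updates.
    fn update(&self, ts: u64) {
        self.0.fetch_max(ts, Ordering::SeqCst);
    }

    fn get(&self) -> u64 {
        self.0.load(Ordering::SeqCst)
    }
}
```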
Previously we found that schema version checking might be a problem. Later we thought it could be solved if we define a transaction to be committed iff all its keys are prewritten and the schema version doesn't change between its start_ts and commit_ts. However, this still doesn't solve the problem for 1PC, which has no chance to check the schema version at all. We need to find a solution for this if we want to implement 1PC.
From @sticnarf: Maybe we can have something like max_commit_ts. It's also something like a lease. We send it in prewrite. If the calculated min_commit_ts > max_commit_ts, the prewrite will fail (and can fall back). When doing DDL, we invalidate max_commit_ts to disable async commit (or 1PC), but ensure changes before max_commit_ts are valid with the previous schema.
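A sketch of how this proposal could look on the prewrite path (all names and the exact min_commit_ts formula are our assumptions for illustration, not TiKV's actual code):

```rust
#[derive(Debug, PartialEq)]
enum PrewriteOutcome {
    // Async commit / 1PC may proceed with this commit timestamp.
    AsyncCommit { min_commit_ts: u64 },
    // The lease was exceeded: the client should fall back to normal 2PC.
    FallbackCommitTsTooLarge,
}

fn prewrite_with_limit(start_ts: u64, max_ts: u64, max_commit_ts: u64) -> PrewriteOutcome {
    // Simplified: the commit ts must exceed both the txn's start_ts and the
    // store's max_ts.
    let min_commit_ts = std::cmp::max(start_ts, max_ts) + 1;
    if min_commit_ts > max_commit_ts {
        // DDL invalidated max_commit_ts (or load pushed max_ts past it).
        PrewriteOutcome::FallbackCommitTsTooLarge
    } else {
        PrewriteOutcome::AsyncCommit { min_commit_ts }
    }
}
```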
We should host this information here, but in the meantime, here is a link: https://docs.google.com/document/d/1nkBa1ThWBQAR6z3ST6Ef90LOjvgqn3RPi8ycSEEy-cw/edit#
Now the min_commit_ts of async commit transactions cannot be advanced like that of legacy transactions. So if our read timestamp is larger than the min_commit_ts of an async-commit lock, we can only wait until the lock expires and resolve the lock.
Actually, we can use the CheckSecondaryLocks API to achieve something like advancing min_commit_ts.
We can add caller_start_ts (like the one in the CheckTxnStatus API) to the request. TiKV uses this timestamp to update its max_ts.
message CheckSecondaryLocksRequest {
...
// The start timestamp of the transaction which this request is part of.
uint64 caller_start_ts = 4;
}
After the change, we don't need to wait until the TTL expires before checking secondary locks. If the TTL is not expired, we set caller_start_ts. Otherwise, we keep caller_start_ts zero.
If caller_start_ts is zero, CheckSecondaryLocks writes rollbacks if the lock does not exist, which is the current behavior. If caller_start_ts exists, CheckSecondaryLocks needn't write anything if the lock does not exist.
After calling CheckSecondaryLocks with caller_start_ts, we can skip the locks of this transaction, just as if we had advanced the min_commit_ts of the transaction: if a following prewrite hits the same TiKV, its min_commit_ts must be greater than the caller_start_ts due to the updated max_ts; if it is sent to a different TiKV, the leader must have changed, so the max_ts should have been updated to a more recent timestamp from PD, and thus the commit_ts should also be greater than the caller_start_ts.
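Putting the proposed behavior together, here is an illustrative sketch (types and function names are ours, not the actual TiKV handler):

```rust
#[derive(Debug, PartialEq)]
enum SecondaryCheckAction {
    WriteRollback,
    NoWrite,
}

// Proposed handling when the secondary lock does not exist: a zero
// caller_start_ts keeps today's behavior (write a rollback); a non-zero one
// skips the write because max_ts was bumped instead.
fn check_secondary_missing_lock(caller_start_ts: u64) -> SecondaryCheckAction {
    if caller_start_ts == 0 {
        SecondaryCheckAction::WriteRollback // current behavior: TTL expired
    } else {
        SecondaryCheckAction::NoWrite // TTL not expired
    }
}

// TiKV would also bump its max_ts with the caller's timestamp, so a later
// prewrite on the same store computes a larger min_commit_ts.
fn bump_max_ts(max_ts: u64, caller_start_ts: u64) -> u64 {
    std::cmp::max(max_ts, caller_start_ts)
}
```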
However, we still need to weigh the benefit and the cost. It is heavy to check all secondaries, so maybe we should not do this so eagerly (maybe after several backoffs?). And the benefit is not so big. Async commit transactions are typically small. It is unusual that prewriting them takes a long time. So personally I don't think it's a task of high priority.
@MyonKeminta believes there will be problems if and only if the computed ts is > PD's latest ts + 1. I'm not clear on the concrete problems this causes or if/how they can be mitigated.
Some ideas:
I'd like a 'docs quest' to be one of the first activities for the SIG.
We keep changing and improving our design during the development of async commit. However, our design document is now out of date and doesn't match the real implementation well. We need to update the document so that people new to async commit can get started more easily.
As a solution to the schema version check issue (#51), we added a max_commit_ts limit to async commit's prewrite requests. When the calculated min_commit_ts exceeds max_commit_ts, a CommitTsTooLarge error will be thrown. We need to find a proper way to handle the CommitTsTooLarge error; otherwise, when the load is high, the failure rate of async commit might be significant.
When TiDB receives CommitTsTooLarge error, check the schema version again.
In solution 1, if the load is high enough, it's still likely to fail after a retry. Another choice is to fall back to a non-async-commit transaction when the CommitTsTooLarge error occurs. This might be more complicated to implement than solution 1. If we always rewrite the primary lock to a non-async-commit lock first when falling back, the implementation might be easier. We should confirm the correctness before adopting this approach.
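The two solutions could be combined into a simple retry-then-fallback policy; the sketch below is our own illustration of that idea, not an agreed design:

```rust
#[derive(Debug, PartialEq)]
enum Recovery {
    // Solution 1: re-check the schema version and retry async commit.
    RetryWithSchemaCheck,
    // Solution 2: rewrite the primary lock and fall back to plain 2PC.
    FallbackToTwoPhaseCommit,
}

// Retry a bounded number of times; under sustained load retries keep
// failing, so eventually give up and fall back.
fn on_commit_ts_too_large(retries_so_far: u32, max_retries: u32) -> Recovery {
    if retries_so_far < max_retries {
        Recovery::RetryWithSchemaCheck
    } else {
        Recovery::FallbackToTwoPhaseCommit
    }
}
```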
To implement async commit, we need to support using a timestamp that's not a globally-unique TSO as a commit_ts. The hardest problem we meet is key collision in the write CF. This problem was solved in this PR. What it does is:

- has_overlapped_rollback flag to it if this is a protected rollback, or do nothing if not protected.

The problem is, I think, that we need to prove the second point above in a more strict way (like TLA+). For reference, here follows how we came to the conclusion that commit operations don't need to check overlapping rollbacks.
Consider we are performing a commit operation for transaction T2 on a key, and another transaction T1 also affects this key but rolled back.
First of all, we can push the max_read_ts before performing rollbacks, so prewrites after the rollback will surely have a min_commit_ts greater than T1.start_ts. So we just need to consider what happens if, on one of the keys affected by both transactions, T1's rollback operation happens between the prewriting and committing of T2.

- If T1 is an optimistic transaction, then its rollback record can be safely overwritten by T2's commit, since it can still cause a WriteConflict for T1's later-coming prewrite requests. [1]
- If T1 is a pessimistic transaction, then it needs to write rollback records only when it has entered the prewrite phase, which means it should have already successfully acquired all pessimistic locks. There are two cases, according to whether T1 needs to acquire a pessimistic lock on the current key:
  - If T1 needs to acquire a pessimistic lock on the key, then the rollback must happen after acquiring the pessimistic lock on the key. The pessimistic lock cannot be acquired after T2's prewriting, since T2's prewriting writes a lock on the key. If the pessimistic lock is acquired before T2's prewriting, then T2's prewriting cannot succeed before T1 rolls back. So in this case it's impossible for T1's rollback to happen between T2's prewrite and commit.
  - If T1 doesn't need the pessimistic lock, then it must be an index key, so T1 and T2 must have another key in common that needs a pessimistic lock. Since the index key should not be T1's primary key, if T1 is a non-async-commit transaction we don't need to care whether its rollback record on secondaries is overwritten. However, if T1 is an async-commit transaction, it may need to write a protected rollback record on secondaries when someone calls check_secondary_locks on it. (WIP here...) [2]

[2]: I just found this case seems to be incorrect when I was drafting this issue. We need to confirm it and find some way to fix it if necessary.
For announcements and so forth.
TiKV has recently been building a new feature in the transaction model named async commit. It's an optimization that reduces commit latency relative to plain 2PC. However, this feature has not been reflected in the TLA+ specs.
This issue tracks updating the spec and also running TLC model checker on the new model to gain more confidence on the async commit design.
TiKV needs better documentation. The transaction SIG will do its part by documenting the concepts and code behind the TiKV/TiDB transaction system.
If the quest is successful, it should be easier for
See the leader board for a ranking of documentation questers.
If you want advice, help, or review, the following people are available and will prioritize quest issues:
Existing documentation and other resources are listed in the doc directory.
To claim a task:
Tasks may or may not have a related issue. Tasks with names are being worked on by that user and should have an open issue for discussion. Boxes get ticked when the documentation task is complete.
Describe the underlying concepts of transactions (mostly be linking to other material, but we should organize and document that material and ensure it covers everything, filling in gaps where necessary). Describe at an algorithmic level the way transactions are implemented in TiKV.
Describe the modules of the TiKV implementation. This should cover how the code works, why that design was chosen, and how it fits in with the larger transaction system. We should have documentation per concept; a concept should roughly correspond to a Rust module, but is unlikely to match exactly.
MongoDB has lots of great examples of module-level docs, e.g., storage.
Not every type and function needs a comment, but it would be great to document the more important and more complex code.
TiKV:

- Engine (kv) - tikv/tikv#8546 - @palash25
- Snapshot (kv)
- Iterator (kv)
- MvccTxn (txn)
- Store (store)
- Scanner (store)
- sched_txn_command (storage)

Rust Client:

TiDB:
TODO more tasks
Async commit and 1PC are fundamental changes to the transaction protocol in TiDB. There will be new kinds of locks that need to be handled in new ways. Therefore, it is very hard for TiDB < 5.0 to handle these locks well, for example:
The BatchGet API adds a new response-level error for returning the memory lock (see tikv/tikv#9077). TiDB 5.0 will be able to read the memory lock so it can handle the case correctly. However, TiDB < 5.0 cannot see this newly added field, and it requires all KV pairs to be returned in the pairs field, which is what we cannot accomplish in TiKV 5.0 with async commit.
When there is one 5.0 TiDB instance while other instances are 4.0, during a rolling update, the 4.0 TiDB may read an async-commit lock prewritten by the 5.0 TiDB. However, TiDB 4.0 cannot handle these async-commit locks; it will just retry but never succeed. Then, this TiDB connection will hang for a long time and prevent a graceful shutdown during the rolling update.
What would you like to read in January 2021?
Nominate papers in the comments. Any research paper or white paper which covers distributed transactions or related work is suitable, both recent work and classics.
Constraints:
To post on the TiKV and my personal blog (and anywhere else we like). Announce the group, say what we do, how to join, etc. Initial activities.
Any other points we should be sure to add?
We should have (or check) at least:
Anything else?
It's been found that CDC and TiFlash are still not fully compatible with async commit, in the case where the commit record is rewritten because of an overlapping rollback.
TiFlash and TiSpark may need to resolve locks, and the logic for resolving async commit transactions is different from that for normal 2PC transactions. We need to check whether TiFlash and TiSpark are affected by this, and adapt them if necessary. Clients in other languages need to be adapted too.
cc @sticnarf @youjiali1995
We are going to finish async commit in 5.0, and there's going to be another important feature in 5.0: local/global transactions in cross-DC deployment.
In that feature, there might be multiple PDs in a cluster allocating timestamps simultaneously. There are two kinds of timestamps: local timestamps and global timestamps. A local timestamp is guaranteed to be globally unique (although this is not yet implemented), but is not guaranteed to be globally monotonic. A global timestamp is guaranteed to be greater than all previously allocated (global or local) timestamps and less than all (global or local) timestamps that will be allocated. A transaction can be a local transaction (that uses a local timestamp) or a global transaction (that uses a global timestamp).
The problem is, if multi-DC is used with async commit/1pc enabled, the commit_ts calculation becomes complicated. We can't record only one max_ts any more. For example, in this case:
Maybe we need to maintain a map (DC -> max_ts) instead of a single max_ts in TiKV. TiDB's requests to TiKV need to mark which DC they belong to, or whether they belong to a global transaction. TiKV updates (or gets) the max_ts corresponding to that DC, or updates all max_ts-es (or gets the max one) if it's a global transaction. When a leader transfer, region merge, or anything else happens that needs updating the max_ts, get a global ts from PD and update all max_ts-es.
Then there might still be many corner cases. For example, the number of DCs may change dynamically, which introduces more complexity in maintaining max_ts. One way is to record both a local max_ts and a global max_ts for ts calculation, like this:
struct MaxTs {
global_max_ts: AtomicU64,
local_max_ts: [AtomicU64; MAX_DC_COUNT],
}
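Extending the struct above, the update and read logic it implies might look like this (the method names and the MAX_DC_COUNT value are illustrative assumptions):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const MAX_DC_COUNT: usize = 4; // illustrative bound

struct MaxTs {
    global_max_ts: AtomicU64,
    local_max_ts: [AtomicU64; MAX_DC_COUNT],
}

impl MaxTs {
    fn new() -> Self {
        MaxTs {
            global_max_ts: AtomicU64::new(0),
            local_max_ts: [(); MAX_DC_COUNT].map(|_| AtomicU64::new(0)),
        }
    }

    // A local transaction only bumps the slot for its own DC.
    fn update_local(&self, dc: usize, ts: u64) {
        self.local_max_ts[dc].fetch_max(ts, Ordering::SeqCst);
    }

    // A global timestamp dominates everything, so it bumps every slot.
    fn update_global(&self, ts: u64) {
        self.global_max_ts.fetch_max(ts, Ordering::SeqCst);
        for local in &self.local_max_ts {
            local.fetch_max(ts, Ordering::SeqCst);
        }
    }

    // A local transaction's commit ts must clear both its DC's watermark
    // and the global one.
    fn max_for_dc(&self, dc: usize) -> u64 {
        self.local_max_ts[dc]
            .load(Ordering::SeqCst)
            .max(self.global_max_ts.load(Ordering::SeqCst))
    }
}
```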
Design docs, etc. are in this repo in design/parallel-commit
max_read_ts #45
Open PRs:
Work in progress:
The goal is to demonstrate async commit working with roughly the expected performance profile, though it need not be performant. Some corner cases may not be correct. Demonstrate that the work is feasible and beneficial.
Currently in implementation.
Outstanding issues:
Address all outstanding issues. Handle corner cases and tools. Test, benchmark, and optimise.
Currently in implementation.
Outstanding issues:
Not blocking 5.0 release.
These things are not code-specific and span TiKV and its client. They would be best documented in a separate docs area, rather than inline in the code, and/or in blog posts.
Some ideas:
Due to the special writing procedure of 1PC transactions, 1PC is not compatible with CDC. It might not be compatible with other components either. We need to solve the problem.
Some ideas can be found here.