Comments (19)
It is int64. Probably no danger?
You are right. If we are writing at a speed of 50k IOPS, it would take 5,849,424 years to overflow. None of us will see that day :)))
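(Rough math, assuming a constant 50,000 increments per second: the int64 max is about 9.22 × 10^18, and 50,000 × 31,536,000 seconds/year ≈ 1.58 × 10^12 increments per year, which gives roughly 5.85 million years before overflow.)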
from longhorn.
Update: a tiny improvement to the revision counter logic
Current implementation
We are converting the int64 to a string and then to a slice of bytes: https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/pkg/replica/revision_counter.go#L45-L46
The strconv.FormatInt function seems to eat noticeable CPU time in profiling.
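For reference, the current conversion is roughly equivalent to this one-liner (illustrative only, not the exact lines from the linked file):

```go
// Illustrative only: the current write path formats the counter as a decimal
// string and copies its bytes into the buffer that is written to disk.
// The repeated string formatting is where strconv.FormatInt shows up in profiles.
copy(revisionCounterBuf, []byte(strconv.FormatInt(counter, 10)))
```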
Modification
Use binary.LittleEndian to encode and decode the int64:
```go
func (r *Replica) writeRevisionCounter(counter int64) error {
	if r.revisionFile == nil {
		return fmt.Errorf("BUG: revision file wasn't initialized")
	}

	// revisionCounterBuf is the pre-allocated buffer that gets persisted to
	// the revision counter file (defined elsewhere in the package).
	copy(revisionCounterBuf, int64ToBytes(counter))
	if _, err := r.revisionFile.WriteAt(revisionCounterBuf, 0); err != nil {
		return errors.Wrap(err, "failed to write to revision counter file")
	}
	return nil
}

// int64ToBytes encodes the counter as 8 little-endian bytes.
func int64ToBytes(n int64) []byte {
	bytes := make([]byte, 8)
	binary.LittleEndian.PutUint64(bytes, uint64(n))
	return bytes
}

// bytesToInt64 decodes 8 little-endian bytes back into an int64.
func bytesToInt64(bytes []byte) int64 {
	return int64(binary.LittleEndian.Uint64(bytes))
}
```
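For completeness, a hypothetical read-side counterpart might look like the sketch below (the method name and the assumption that revisionFile supports ReadAt are illustrative, not the actual longhorn-engine code):

```go
// Hypothetical sketch of the matching read path, assuming the same
// revisionFile field and the fixed 8-byte little-endian encoding above.
func (r *Replica) readRevisionCounterLE() (int64, error) {
	if r.revisionFile == nil {
		return 0, fmt.Errorf("BUG: revision file wasn't initialized")
	}
	buf := make([]byte, 8)
	if _, err := r.revisionFile.ReadAt(buf, 0); err != nil {
		return 0, errors.Wrap(err, "failed to read from revision counter file")
	}
	return bytesToInt64(buf), nil
}
```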
Result
A small improvement in write performance, maybe 1-2%.
However, the implementation cost is big: we would need to migrate the on-disk counter from the old string encoding to the new binary encoding.
I think this is not a good idea.
from longhorn.
This is potentially the 3rd bottleneck:
- For some reason, we always have to do an alignment for each read request, which means going through this code path https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/vendor/github.com/longhorn/sparse-tools/sparse/file.go#L120-L123
- This doesn't happen for write requests, i.e., write requests don't have to go through this code path https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/vendor/github.com/longhorn/sparse-tools/sparse/file.go#L133-L136
cc @shuo-wu @ejweber @derekbit
from longhorn.
This is potentially the 3rd bottleneck:
- For some reason, we always have to do an alignment for each read request, which means going through this code path https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/vendor/github.com/longhorn/sparse-tools/sparse/file.go#L120-L123
- This doesn't happen for write requests, i.e., write requests don't have to go through this code path https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/vendor/github.com/longhorn/sparse-tools/sparse/file.go#L133-L136
We already know the input data []byte needs to be aligned for direct IO. Can we always allocate an aligned buffer for data on the caller end to prevent the extra data copy in the two functions?
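A minimal sketch of what caller-side alignment could look like, assuming a power-of-two block size such as 4096 for direct I/O (the helper name alignedBuffer is hypothetical, not existing longhorn/sparse-tools code):

```go
import "unsafe"

// alignedBuffer returns a slice of length size whose backing array starts on
// an align-byte boundary (align must be a power of two, e.g. 4096 for direct
// I/O), so the read/write helpers would not need to copy data into their own
// aligned buffers.
func alignedBuffer(size, align int) []byte {
	buf := make([]byte, size+align)
	shift := int(uintptr(unsafe.Pointer(&buf[0])) & uintptr(align-1))
	if shift != 0 {
		shift = align - shift
	}
	return buf[shift : shift+size]
}
```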
from longhorn.
We already know the input data []byte needs to be aligned for direct IO. Can we always allocate an aligned buffer for data on the caller end to prevent the extra data copy in the two functions?
Agree with this idea. I will give it a try to see if it can improve the read speed.
@derekbit Looks like the performance gets worse. So we will not do this:
IOPS (Read/Write)
Random: 45,395 / 19,553
Sequential: 62,778 / 33,106
Bandwidth in KiB/sec (Read/Write)
Random: 714,421 / 348,190
Sequential: 976,918 / 353,571
Latency in ns (Read/Write)
Random: 821,898 / 455,709
Sequential: 814,332 / 456,455
from longhorn.
Update:
There is potentially another bottleneck in the read flow in the replica. For every read, we have to make a system call, os.(*File).Stat, to get the size of the snapshot file here. This system call is eating noticeable CPU. Do you think we can cache the snapshot size value so we don't have to issue a system call for every read? @shuo-wu @derekbit @ejweber
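A minimal sketch of the caching idea, with hypothetical names (cachedSizeFile is not the actual replica type; it just illustrates stat-once-and-cache):

```go
import "os"

// cachedSizeFile wraps a snapshot file and remembers its size so the hot read
// path does not need to call Stat (an fstat syscall) for every request.
type cachedSizeFile struct {
	f    *os.File
	size int64
}

func newCachedSizeFile(f *os.File) (*cachedSizeFile, error) {
	info, err := f.Stat() // one Stat up front instead of one per read
	if err != nil {
		return nil, err
	}
	return &cachedSizeFile{f: f, size: info.Size()}, nil
}

// Size returns the cached value; it is only valid while the snapshot file
// stays immutable (which the ctime/checksum mechanism is expected to guard).
func (c *cachedSizeFile) Size() int64 { return c.size }
```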
from longhorn.
We already know the input data []byte needs to be aligned for direct IO.
Can we always allocate an aligned buffer for data on the caller end to prevent the extra data copy in the two functions?
Agree with this idea. I will give it a try to see if it can improve the read speed
from longhorn.
A bit of a side track: do you think there is a danger of integer overflow when we keep increasing the counter over here https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/pkg/replica/revision_counter.go#L143 ?
If we keep writing long enough, it might become negative.
It is int64. Probably no danger?
from longhorn.
@shuo-wu This is the CPU usage of strconv.FormatInt that we were talking about in the sync-up. It is small but noticeable.
from longhorn.
For every read, we have to make a system call, os.(*File).Stat, to get the size of the snapshot file here.
Good catch! I am fine with using the cached size. The snapshot immutability should be guaranteed by the ctime/checksum mechanism
from longhorn.
Do you think we should backport this to v1.6.x and v1.5.x? @shuo-wu @ejweber @derekbit
from longhorn.
Do you think we should backport this to v1.6.x and v1.5.x? @shuo-wu @ejweber @derekbit
I don't think there is much danger here, especially around compatibility, as the changes are all self-contained within the engine. On the other hand, the connection count will increase (and possibly CPU utilization as well) as a tradeoff for the performance gain. Maybe that's surprising in a patch update?
I lean slightly towards backporting, but I'm also fine with the gains being associated with a new minor version of Longhorn. I'll defer to whatever you decide.
from longhorn.
A bit of a side track: do you think there is a danger of integer overflow when we keep increasing the counter over here https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/pkg/replica/revision_counter.go#L143 ?
If we keep writing long enough, it might become negative.
from longhorn.
As discussed with @shuo-wu in the sync-up, we reran some of the Broadcom tests. The setup is the same as in our initial Broadcom report in the SUSE data center.
Case 1: Under CPU/RAM constraints
- Random read IOPS test
  - IOPS: similar
  - CPU usage: similar
- Sequential read IOPS test
  - IOPS: similar
  - CPU usage: similar
- Random write IOPS test
  - IOPS: increased from 2,988 to 3,150
  - CPU usage: similar
- Sequential write IOPS test
  - IOPS: similar
  - CPU usage: similar
Case 2: No CPU/RAM constraints and workloads NOT rate-limited
With master-head engine:
IOPS (Read/Write)
Random: 26,603 / 28,501
Sequential: 49,915 / 43,092
Bandwidth in KiB/sec (Read/Write)
Random: 1,587,307 / 505,325
Sequential: 1,827,467 / 535,291
Latency in ns (Read/Write)
Random: 272,445 / 260,084
Sequential: 273,663 / 257,785
With the PR:
IOPS (Read/Write)
Random: 26,946 / 33,542
Sequential: 50,484 / 51,998
Bandwidth in KiB/sec (Read/Write)
Random: 1,595,310 / 541,211
Sequential: 1,611,390 / 528,907
Latency in ns (Read/Write)
Random: 274,136 / 262,201
Sequential: 277,235 / 252,450
Note: in this case the read performance somehow doesn't increase. I believe it is something weird with tgt on Photon RT OS. Even with tgt + a local file backend, it still gives the same read performance as tgt + Longhorn => the bottleneck is at tgt.
Conclusion
- Does the increased number of connections between the engine and replicas hurt a resource-constrained environment?
  -> Looks like no
from longhorn.
Btw, this article is very helpful in understanding a common mistake in Go concurrency: https://eli.thegreenplace.net/2019/go-internals-capturing-loop-variables-in-closures/
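As a quick illustration of the mistake the article covers (loop-variable capture in closures, which behaves this way in Go versions before 1.22):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	// Buggy (pre-Go 1.22): every goroutine captures the same loop variable i,
	// so most of them print whatever value i holds when they finally run.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println("buggy:", i)
		}()
	}
	wg.Wait()

	// Fix: shadow the loop variable (or pass it as an argument) so each
	// goroutine captures its own copy.
	for i := 0; i < 3; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println("fixed:", i)
		}()
	}
	wg.Wait()
}
```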
from longhorn.
After testing, the snapshot size caching idea doesn't seem to improve performance in practice, even though it sounds good in theory. CPU usage is also about the same. Let's abandon it. Sorry for the noise.
from longhorn.
Pre Ready-For-Testing Checklist
- Where is the reproduce steps/test steps documented?
  The reproduce steps/test steps are at: #8436 (comment)
- Is there a workaround for the issue? If so, where is it documented?
  The workaround is at:
- Does the PR include the explanation for the fix or the feature?
- Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
  The PR for the YAML change is at:
  The PR for the chart change is at:
- Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
  The PR is at: longhorn/longhorn-engine#1085
- Which areas/issues this PR might have potential impacts on?
  Area: Volume Read/Write
  Issues:
- If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
  The LEP PR is at:
- If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
  The UI issue/PR is at:
- If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
  The documentation issue/PR is at:
- If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)?
  The automation skeleton PR is at:
  The automation test case PR is at:
  The issue of automation test case implementation is at (please create by the template):
- If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
  The engine automation PR is at:
- If labeled: require/manual-test-plan Has the manual test plan been documented?
  The updated manual test plan is at:
- If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
  The compatibility issue is filed at:
from longhorn.
Test Plan:
- Create a cluster of 3 worker nodes with a similarly big spec, e.g. Equinix Metal m3.small.x86, Ubuntu 22.04, kernel 5.15.0-101-generic, 20 Gbps network
- Deploy Longhorn 1.6.1
- Run kbench
- Upgrade Longhorn to master-head and upgrade the engine to master-head
- Run kbench again
- Verify that you see a better IOPS result
- See the issue description for an example
from longhorn.
Verified on master-head 20240508
- longhorn master-head 8026d1a
- longhorn-engine master-head https://github.com/longhorn/longhorn-engine/commit/
The test steps: #8436 (comment)
Result: Passed
- In the AWS EC2 t2.xlarge environment, I did not observe significant differences in IOPS. Therefore, I decided to build a cluster on a local virtual machine to conduct this test.
The test result is as follows.
v1.6.1
=========================
FIO Benchmark Summary
For: test_device
CPU Idleness Profiling: disabled
Size: 30G
Quick Mode: disabled
=========================
IOPS (Read/Write)
Random: 10,329 / 4,684
Sequential: 17,677 / 10,204
Bandwidth in KiB/sec (Read/Write)
Random: 413,252 / 141,958
Sequential: 461,723 / 170,358
Latency in ns (Read/Write)
Random: 431,773 / 553,568
Sequential: 357,682 / 547,949
master-head
=========================
FIO Benchmark Summary
For: test_device
CPU Idleness Profiling: disabled
Size: 30G
Quick Mode: disabled
=========================
IOPS (Read/Write)
Random: 11,024 / 5,082
Sequential: 18,149 / 10,672
Bandwidth in KiB/sec (Read/Write)
Random: 415,402 / 137,600
Sequential: 445,611 / 167,325
Latency in ns (Read/Write)
Random: 409,800 / 525,464
Sequential: 342,326 / 525,112
Here is the summary table:

| | IOPS (Random Read/Write) | IOPS (Sequential Read/Write) |
|---|---|---|
| v1.6.1-1st | 10,329 / 4,684 | 17,677 / 10,204 |
| master-head-1st | 11,024 / 5,082 | 18,149 / 10,672 |

| | Bandwidth KiB/sec (Random Read/Write) | Bandwidth KiB/sec (Sequential Read/Write) |
|---|---|---|
| v1.6.1-1st | 413,252 / 141,958 | 461,723 / 170,358 |
| master-head-1st | 415,402 / 137,600 | 445,611 / 167,325 |

| | Latency ns (Random Read/Write) | Latency ns (Sequential Read/Write) |
|---|---|---|
| v1.6.1-1st | 431,773 / 553,568 | 357,682 / 547,949 |
| master-head-1st | 409,800 / 525,464 | 342,326 / 525,112 |
Hi @PhanLe1010
I think we can close this ticket. If there are any concerns regarding the test results, please let me know.
from longhorn.