nviennot / core-to-core-latency
Measures the latency between CPU cores
License: MIT License
Num cores: 32
Num iterations per samples: 1000
Num samples: 300
1) CAS latency on a single shared cache line
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
1 21±0
2 20±0 19±0
3 19±0 17±0 18±0
4 20±0 19±0 19±0 18±0
5 19±0 18±0 18±0 17±0 18±0
6 19±0 18±0 18±0 17±0 18±0 17±0
7 18±0 17±0 16±0 15±0 17±0 16±0 15±0
8 84±0 82±0 83±0 82±0 82±0 82±0 81±0 81±0
9 83±0 82±0 81±0 81±0 81±0 81±0 81±0 80±0 22±0
10 83±0 82±0 83±0 82±0 82±0 81±0 81±0 80±0 20±0 19±0
11 81±0 81±0 82±0 80±0 80±0 79±0 80±0 79±0 19±0 18±0 18±0
12 83±0 81±0 81±0 80±0 81±0 80±0 80±0 79±0 21±0 20±0 19±0 19±0
13 81±0 80±0 80±0 79±0 80±0 80±0 79±0 78±0 19±0 19±0 18±0 17±0 18±0
14 82±0 81±0 81±0 80±0 81±0 80±0 80±0 79±0 19±0 18±0 18±0 17±0 18±0 17±0
15 81±0 79±0 80±0 79±0 79±0 79±0 78±0 78±0 18±0 17±0 17±0 16±0 17±0 16±0 16±0
16 8±0 21±0 20±0 19±0 22±0 19±0 19±0 18±0 84±0 83±0 83±0 82±0 82±0 83±1 82±0 81±0
17 21±0 7±0 19±0 18±0 19±0 19±0 18±0 16±0 82±0 82±0 82±0 81±0 81±0 80±0 81±0 79±0 21±0
18 19±0 19±0 7±0 18±0 19±0 18±0 18±0 17±0 83±0 81±0 82±0 81±0 81±0 80±0 81±0 80±0 20±0 19±0
19 19±0 18±0 18±0 7±0 18±0 17±0 17±0 16±0 82±0 81±0 81±0 80±0 81±0 80±0 80±0 79±0 19±0 18±0 18±0
20 21±0 19±0 19±0 18±0 7±0 17±0 18±0 17±0 83±0 82±0 82±0 80±0 81±0 80±0 81±0 80±0 20±0 19±0 19±0 18±0
21 19±0 18±0 18±0 17±0 18±0 7±0 17±0 16±0 82±0 81±0 81±0 79±0 80±0 79±0 80±0 79±0 19±0 19±0 18±0 17±0 18±0
22 19±0 18±0 18±0 17±0 18±0 17±0 7±0 16±0 82±0 81±0 81±0 80±0 80±0 79±0 80±0 79±0 19±0 18±0 18±0 16±0 18±0 17±0
23 18±0 16±0 17±0 15±0 16±0 15±0 16±0 7±0 81±0 80±0 80±0 79±0 79±0 79±0 79±0 78±0 18±0 17±0 16±0 16±0 17±0 15±0 16±0
24 83±0 83±0 83±0 82±0 83±0 82±0 82±0 81±0 7±0 21±0 20±0 19±0 22±0 19±0 19±0 18±0 84±0 83±0 83±0 82±0 83±0 82±0 82±0 81±0
25 83±0 82±0 81±0 81±0 82±0 81±0 81±0 80±0 22±0 7±0 19±0 18±0 19±0 19±0 18±0 17±0 83±0 82±0 82±0 81±0 82±0 81±0 81±0 80±0 21±0
26 84±0 82±0 82±0 82±0 82±0 81±0 81±0 80±0 20±0 19±0 7±0 18±0 19±0 18±0 18±0 17±0 83±0 82±0 82±0 81±0 82±0 81±0 81±0 80±0 20±0 19±0
27 82±0 81±0 81±0 80±0 80±0 80±1 80±0 79±0 19±0 18±0 18±0 7±0 19±0 17±0 17±0 16±0 82±0 81±0 81±0 80±0 81±0 79±0 80±0 79±0 19±0 18±0 18±0
28 82±0 81±0 81±0 80±0 81±0 80±0 81±0 79±0 21±0 20±0 19±0 19±0 7±0 18±0 18±0 17±0 82±0 82±0 81±0 81±0 81±0 80±0 80±0 79±0 21±0 20±0 19±0 19±0
29 81±0 80±0 80±0 79±0 80±0 79±0 80±0 79±0 19±0 19±0 18±0 17±0 18±0 7±0 17±0 16±0 81±0 80±0 81±0 79±0 80±0 79±0 79±0 78±0 19±0 19±0 18±0 17±0 18±0
30 82±0 81±0 81±0 80±0 81±0 80±0 80±0 79±0 19±0 18±0 18±0 17±0 18±0 17±0 7±0 16±0 82±0 81±0 81±0 80±0 81±0 80±0 80±0 80±0 19±0 18±0 18±0 17±0 18±0 17±0
31 81±0 80±0 79±0 79±0 79±0 79±0 79±0 78±0 18±0 17±0 17±0 16±0 17±0 16±0 16±0 7±0 81±0 80±0 80±0 80±0 79±0 79±0 78±0 78±0 18±0 17±0 17±0 16±0 17±0 16±0 16±0
Min latency: 7.1ns ±0.0 cores: (20,4)
Max latency: 84.4ns ±0.3 cores: (24,16)
Mean latency: 50.1ns
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 5950X 16-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
BogoMIPS: 6786.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 512 KiB (16 instances)
L1i: 512 KiB (16 instances)
L2: 8 MiB (16 instances)
L3: 64 MiB (2 instances)
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Vulnerable
Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Srbds: Not affected
Tsx async abort: Not affected
OC: No (stock)
Num cores: 8
Num iterations per samples: 5000
Num samples: 300
1) CAS latency on a single shared cache line
0 1 2 3 4 5 6 7
0
1 47±1
2 320±2 277±1
3 322±3 222±1 42±0
4 332±2 326±3 331±2 297±2
5 264±2 239±1 318±2 224±1 42±0
6 323±3 340±2 335±2 337±3 339±2 271±1
7 248±1 233±1 256±1 251±1 262±1 223±1 42±0
Min latency: 41.9ns ±0.3 cores: (3,2)
Max latency: 339.6ns ±2.5 cores: (6,1)
Mean latency: 252.2ns
Num cores: 8
Num iterations per samples: 30000
Num samples: 1000
1) CAS latency on a single shared cache line
0 1 2 3 4 5 6 7
0
1 41±0
2 276±1 249±1
3 307±1 237±0 42±0
4 260±1 248±1 241±1 269±1
5 333±1 221±0 320±1 246±0 40±0
6 289±1 350±1 255±0 274±1 255±0 273±1
7 327±1 250±0 267±1 222±0 328±1 236±0 41±0
Min latency: 40.4ns ±0.1 cores: (5,4)
Max latency: 350.1ns ±1.1 cores: (6,1)
Mean latency: 239.2ns
Here's the result on an i5-8600T:
Min latency: 32.9ns ±0.0 cores: (5,2)
Max latency: 35.8ns ±0.0 cores: (1,0)
Mean latency: 34.2ns
,,,,,
35.829233900000006,,,,,
34.96898663333334,34.5728319,,,,
35.10521058333333,35.07162216666667,33.275935483333335,,,
34.02019273333333,33.00294551666667,34.16384906666668,33.78305628333333,,
33.94283296666667,35.17520166666666,32.87415983333332,34.19390786666667,33.41371766666667,
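For anyone who wants to check the summary lines against the CSV, here is a minimal Python sketch. It assumes the `--csv` output is a lower-triangular matrix where row i, column j holds the latency in ns between cores i and j, and that the mean is taken over all measured pairs equally. `parse_lower_triangular` is a hypothetical helper name, not something shipped with core-to-core-latency.

```python
# CSV copied from the i5-8600T result above.
CSV_TEXT = """\
,,,,,
35.829233900000006,,,,,
34.96898663333334,34.5728319,,,,
35.10521058333333,35.07162216666667,33.275935483333335,,,
34.02019273333333,33.00294551666667,34.16384906666668,33.78305628333333,,
33.94283296666667,35.17520166666666,32.87415983333332,34.19390786666667,33.41371766666667,
"""


def parse_lower_triangular(text):
    """Return {(row, col): latency_ns} for every non-empty cell."""
    pairs = {}
    for i, line in enumerate(text.strip().splitlines()):
        for j, cell in enumerate(line.split(",")):
            if cell.strip():
                pairs[(i, j)] = float(cell)
    return pairs


pairs = parse_lower_triangular(CSV_TEXT)
lo = min(pairs, key=pairs.get)   # pair with the smallest latency
hi = max(pairs, key=pairs.get)   # pair with the largest latency
mean = sum(pairs.values()) / len(pairs)

print(f"Min latency: {pairs[lo]:.1f}ns cores: {lo}")
print(f"Max latency: {pairs[hi]:.1f}ns cores: {hi}")
print(f"Mean latency: {mean:.1f}ns")
```

Under those assumptions the numbers agree with the Min/Max/Mean lines reported for this CPU.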
Here's some data from Intel Xeon CPU E5-2695
Command:
core-to-core-latency --csv 30000 1000 > result.csv
Min latency: 35.2ns ±0.0 cores: (35,31)
Max latency: 161.3ns ±2.3 cores: (20,13)
Mean latency: 89.7ns
If a core is masked through affinity, the process correctly skips it, and stdout correctly identifies it as skipped. However, the CSV output does not retain any CPU IDs. So on a 4-core system with one CPU masked out, the IDs reported in the CSV and in the notebook are wrong. One solution could be to make the first number in the CSV the core ID.
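To make the proposal concrete, here is one possible shape for a CSV that keeps the real core IDs, sketched in Python. This is only an illustration of the idea, not the tool's current or planned format: `labeled_csv`, the header row, and the latency values are all made up for the example (a 4-core system with core 1 masked out).

```python
import csv
import io


def labeled_csv(core_ids, latency):
    """Write a lower-triangular matrix whose first column is the real core id.

    latency[(i, j)] is the latency in ns between core ids i and j, where i
    comes after j in core_ids.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["core"] + list(core_ids))        # header: real ids
    for pos, i in enumerate(core_ids):
        cells = [latency[(i, j)] for j in core_ids[:pos]]
        padding = [""] * (len(core_ids) - pos)        # keep rows rectangular
        writer.writerow([i] + cells + padding)
    return buf.getvalue()


# Illustrative values only: cores 0, 2 and 3 were measured, core 1 was masked.
ids = [0, 2, 3]
lat = {(2, 0): 20.5, (3, 0): 21.5, (3, 2): 21.0}
text = labeled_csv(ids, lat)
print(text)
```

A reader such as the notebook could then take the labels from the first column instead of assuming rows are numbered 0..N-1.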
AMD Ryzen 5 5600X 6-Core Processor.csv
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 33
model name : AMD Ryzen 5 5600X 6-Core Processor
stepping : 2
microcode : 0xa201205
cpu MHz : 2200.000
cache size : 512 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7400.61
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
Intel Celeron 1005M 2-Core Processor.csv
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Celeron(R) CPU 1005M @ 1.90GHz
stepping : 9
microcode : 0x21
cpu MHz : 1485.327
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave lahf_lm cpuid_fault epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 3791.30
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Hey, leaving another sample from a system with the same CPU (SMT enabled) as in #15, run with 5000 iterations per sample as specified in the README. The results deviate more and are less uniform. Some cores in the worse-performing group also behave like the ones in the better-performing group. output.csv.
Also, on one run (screenshot attached; I sadly overwrote the results) the "better" performing cores were in the 1-12 group instead of the 13-24 group, so the CPU or OS seems to be changing this.
Here's some data for early 2010-era RISC machines that I have access to.
I think those processors aren't really relevant anymore but it's still interesting to see how the numbers have improved in the last ten years or so.
Architecture: ppc64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Model name: POWER7 (architected), altivec supported
Model: 2.1 (pvr 003f 0201)
Thread(s) per core: 4
Core(s) per socket: 8
Socket(s): 2
Virtualization features:
Hypervisor vendor: pHyp
Virtualization type: para
Caches (sum of all):
L1d: 512 KiB (16 instances)
L1i: 512 KiB (16 instances)
L2: 4 MiB (16 instances)
L3: 64 MiB (16 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Mitigation; RFI Flush
Mds: Not affected
Meltdown: Mitigation; RFI Flush
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Mitigation; Kernel entry/exit barrier (fallback)
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Vulnerable
Srbds: Not affected
Tsx async abort: Not affected
Result for inter-core latencies:
,,,,,,,,,,,,,,,
163.33049233333332,,,,,,,,,,,,,,,
186.66341966666667,189.99669466666668,,,,,,,,,,,,,,
169.99704300000002,173.33031800000003,176.6635936666667,,,,,,,,,,,,,
159.9972166666667,156.66394133333336,186.66341933333334,173.33031833333334,,,,,,,,,,,,
166.66376733333334,179.9968686666667,169.99704300000002,176.66359333333335,176.66359333333338,,,,,,,,,,,
169.9970426666667,176.66359333333335,173.33031800000003,156.66394133333336,176.66359333333335,169.99704233333333,,,,,,,,,,
169.9970426666667,176.66359333333338,169.99704266666672,156.66394133333336,176.6635936666667,173.330318,156.66394133333338,,,,,,,,,
439.99234599999994,433.32579533333325,446.6588963333332,433.32579533333325,439.9923459999999,436.6590706666666,433.3257949999999,433.32579533333325,,,,,,,,
443.3256213333332,443.3256213333332,436.6590703333333,446.65889666666646,443.3256213333332,439.9923459999999,443.325621,443.32562133333323,173.33031833333337,,,,,,,
446.6588963333332,443.328348,433.32931233333335,443.3292196666667,443.32922,446.6625220000001,443.32922,443.3292196666667,186.66493466666668,169.9984226666667,,,,,,
443.32922,443.3292196666667,436.662615,439.9959176666667,443.3292196666667,439.9959176666667,439.99591733333335,439.99591733333335,186.66493466666668,169.9984226666667,166.66512033333333,,,,,
443.3292196666667,443.32921966666675,433.3293126666667,439.99591733333335,443.32922000000013,433.32931233333335,439.99591733333335,433.32931266666674,163.33181766666667,173.331725,183.33163233333335,169.99842266666667,,,,
446.6625223333333,443.32921966666675,436.66261533333335,446.66252199999997,446.6625223333333,439.99591733333335,446.6625223333334,449.99582466666664,173.331725,159.99851533333333,163.3318176666667,173.331725,176.66502733333328,,,
433.3293126666666,433.3293126666667,446.6625223333333,429.99601,429.9960103333333,439.99591733333335,439.99591733333335,439.9959176666667,166.66512033333333,186.66493433333335,173.331725,179.9983296666667,183.33163233333332,183.33163233333335,,
439.99591733333335,443.32955133333326,439.9969656666666,443.3302759999999,443.3302759999999,443.33027633333325,443.3302759999999,446.6635863333332,183.332069,169.99882766666664,159.99889666666667,169.99882766666664,186.66537933333328,169.99882766666667,173.332138,
Cores 0-7 are in the first socket, while cores 8-15 are in the second socket.
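To put a number on the socket effect, the pairs can be split into same-socket and cross-socket groups. The Python sketch below does this for a small subset of the table above (cores 0, 1, 8 and 9, values rounded to two decimals). `split_by_group` and `socket_of` are hypothetical names, and the socket mapping simply encodes the statement that cores 0-7 and 8-15 sit in different sockets.

```python
from statistics import mean


def split_by_group(pairs, group_of):
    """Return (mean same-group latency, mean cross-group latency) in ns."""
    same, cross = [], []
    for (i, j), ns in pairs.items():
        (same if group_of(i) == group_of(j) else cross).append(ns)
    return mean(same), mean(cross)


def socket_of(core):
    # Assumption taken from the post: cores 0-7 -> socket 0, 8-15 -> socket 1.
    return 0 if core < 8 else 1


# Rounded values for cores 0, 1, 8, 9 from the inter-core table above.
pairs = {
    (1, 0): 163.33,
    (8, 0): 439.99, (8, 1): 433.33,
    (9, 0): 443.33, (9, 1): 443.33, (9, 8): 173.33,
}

intra, inter = split_by_group(pairs, socket_of)
print(f"same socket:  {intra:.1f} ns")
print(f"cross socket: {inter:.1f} ns")
```

Feeding in the full 16x16 matrix instead of the four-core subset works the same way; only the `pairs` dictionary changes.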
Result for intra-core SMT/hyperthread latencies:
,,,
63.33223166666668,,,
69.99878233333334,73.33205766666667,,
69.99878233333334,69.99878266666667,89.99843433333332,
This processor has SMT4 cores (i.e., each physical core can run four threads concurrently).
Architecture: sparc64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Model name: UltraSparc T4 (Niagara4)
Thread(s) per core: 8
Core(s) per socket: 8
Socket(s): 2
Flags: sun4v
Caches (sum of all):
L1d: 2 MiB (128 instances)
L1i: 2 MiB (128 instances)
L2: 16 MiB (128 instances)
Result for inter-core latencies:
,,,,,,,,,,,,,,,
98.33333133333333,,,,,,,,,,,,,,,
98.333331,99.33333133333332,,,,,,,,,,,,,,
98.33333133333333,99.66666433333332,98.33333133333333,,,,,,,,,,,,,
97.66666466666666,98.99999766666666,98.999998,98.66666466666666,,,,,,,,,,,,
97.66666466666666,98.99999799999999,98.66666466666666,98.33333133333333,98.333331,,,,,,,,,,,
97.33333133333333,98.999998,98.99999766666666,98.99999799999999,97.999998,97.999998,,,,,,,,,,
97.66666433333333,98.999998,98.999998,98.99999766666666,97.999998,97.999998,98.33333133333333,,,,,,,,,
355.9999926666667,355.99999233333335,355.666659,355.9999926666667,355.9999926666667,355.99999233333335,355.99999233333335,355.9999926666667,,,,,,,,
355.66665933333337,355.99999233333335,355.99999233333335,355.66665933333337,355.99999233333335,355.99999233333335,355.99999233333335,355.9999926666666,97.99999766666666,,,,,,,
355.9999926666667,355.99999233333335,355.66665900000004,355.9999923333333,355.9999926666667,355.99999233333335,355.66665900000004,355.99999233333335,99.33333133333333,97.999998,,,,,,
355.99999233333335,355.9999923333333,355.6666593333333,355.6666593333333,355.99999233333335,355.9999923333333,355.66665933333337,355.6666593333333,99.66666433333333,97.999998,99.33333133333332,,,,,
356.33332566666667,355.66665900000004,355.9999926666667,355.99999233333335,355.66665900000004,356.33332566666667,355.66665933333337,355.99999233333335,98.66666466666666,98.99999766666666,98.999998,98.99999799999999,,,,
355.99999233333335,355.99999233333335,355.66665900000004,355.9999926666667,355.99999233333335,355.99999233333335,355.66665900000004,355.9999926666667,98.99999766666666,98.999998,98.99999799999999,98.99999766666666,99.66666466666666,,,
355.9999923333333,355.9999926666667,355.66665933333337,355.99999233333335,355.666659,355.9999926666667,355.9999926666667,355.99999233333335,98.99999799999999,98.99999766666666,98.999998,98.99999799999999,97.999998,97.999998,,
355.99999233333335,355.6666593333333,355.99999233333335,355.66665900000004,355.9999926666667,355.9999926666667,355.99999233333335,355.99999233333335,98.999998,98.999998,98.99999766666666,98.99999799999999,97.999998,97.99999766666666,99.66666466666666,
As before, cores 0-7 are in the first socket, while cores 8-15 are in the second socket.
Result for intra-core SMT/hyperthread latencies:
,,,,,,,
23.99999933333333,,,,,,,
23.666666333333332,23.666666,,,,,,
23.666666333333332,23.666666,23.666666333333332,,,,,
23.666666,23.666666333333332,23.666666,23.666666333333332,,,,
23.666666,23.666666333333332,23.666666,23.666666333333332,23.99999933333333,,,
23.999999666666668,23.99999933333333,23.99999933333333,23.666666333333332,23.666666,23.666666333333332,,
23.666666,23.666666333333332,23.666666,23.999999666666668,23.99999933333333,23.999999666666668,23.99999933333333,
This processor has SMT8 cores (i.e., each physical core can run eight threads concurrently).
My result
output.csv
Num cores: 4
Num iterations per samples: 5000
Num samples: 300
1) CAS latency on a single shared cache line
0 1 2 3
0
1 90±0
2 89±0 90±0
3 90±0 89±0 90±0
Min latency: 89.0ns ±0.0 cores: (2,0)
Max latency: 90.0ns ±0.3 cores: (1,0)
Mean latency: 89.6ns
Ref.: #26
Apparently macOS sometimes does not respect affinity settings, so I don't know how accurate this is. An Asahi Linux run would probably be more accurate.
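A quick way to see whether a pinning request is at least accepted by the OS, sketched in Python. It assumes the interpreter exposes `os.sched_setaffinity` and `os.sched_getaffinity`, which is the case on Linux but generally not on macOS; there the sketch returns None instead of guessing. `pin_and_verify` is a hypothetical helper and has nothing to do with how core-to-core-latency itself pins threads.

```python
import os


def pin_and_verify(cpu=None):
    """Pin the current process to one CPU and read the mask back.

    Returns True if the OS reports exactly the requested CPU, False if it
    reports something else, and None if the platform has no affinity API.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                      # e.g. macOS: nothing to verify with
    allowed = os.sched_getaffinity(0)    # CPUs we are currently allowed on
    target = min(allowed) if cpu is None else cpu
    os.sched_setaffinity(0, {target})
    ok = os.sched_getaffinity(0) == {target}
    os.sched_setaffinity(0, allowed)     # restore the original mask
    return ok


result = pin_and_verify()
print("affinity honored:", result)
```

This only checks that the mask was applied; it says nothing about measurement accuracy on platforms where the call is missing.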
output.csv
See attached.
pi@raspberrypi:~ $ cat /proc/cpuinfo | grep Model
Model : Raspberry Pi 3 Model B Rev 1.2
Not sure if this processor is still relevant, but just for fun here's my data:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
CPU family: 6
Model: 60
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 3
CPU(s) scaling MHz: 54%
CPU max MHz: 3700.0000
CPU min MHz: 800.0000
BogoMIPS: 6584.98
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1 MiB (4 instances)
L3: 6 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Mds: Mitigation; Clear CPU buffers; SMT disabled
Meltdown: Mitigation; PTI
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Srbds: Mitigation; Microcode
Tsx async abort: Not affected
Generated with core-to-core-latency 5000 --csv > output.csv:
,,,
21.46404933333333,,,
20.476732,20.671329333333333,,
21.57587133333333,20.916239666666666,21.389124666666664,
Output:
Num cores: 16
Using RDTSC to measure time: false
Num round trips per samples: 1000
Num samples: 300
Showing latency=round-trip-time/2 in nanoseconds:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1 67±6
2 260±6 253±6
3 260±6 253±6 60±6
4 260±6 253±6 260±6 260±6
5 260±6 253±6 260±6 267±6 60±6
6 253±6 267±6 260±6 253±6 253±6 267±6
7 247±6 253±6 260±6 260±6 253±6 260±6 60±6
8 260±6 247±6 247±6 247±6 253±6 260±6 253±6 253±6
9 260±6 247±6 240±6 260±6 253±6 253±6 253±6 253±6 60±6
10 260±6 247±6 260±6 260±6 260±6 273±6 260±6 267±6 260±6 253±6
11 260±6 253±6 267±6 260±6 253±6 260±6 267±6 267±6 260±6 253±6 67±6
12 253±6 253±6 273±6 247±6 273±6 247±6 260±6 260±6 253±6 247±6 267±6 260±6
13 253±6 253±6 267±6 253±6 253±6 260±6 267±6 260±6 253±6 247±6 260±6 267±6 60±6
14 273±6 260±6 260±6 260±6 267±6 253±6 247±6 247±6 260±6 253±6 273±6 247±6 253±6 253±6
15 260±6 247±6 267±6 253±6 253±6 253±6 267±6 260±6 267±6 247±6 260±6 260±6 260±6 260±6 60±6
Min latency: 60.0ns ±19.7 cores: (15,14)
Max latency: 273.3ns ±39.7 cores: (10,5)
Mean latency: 244.4ns
,,,,,,,,,,,,,,,
66.6688,,,,,,,,,,,,,,,
260.00832333333335,253.34144166666664,,,,,,,,,,,,,,
260.00831,253.34143,60.00191833333333,,,,,,,,,,,,,
260.00831,253.34143,260.00831,260.00831,,,,,,,,,,,,
260.0082983333333,253.34141833333334,260.0082966666667,266.6751766666667,60.001915,,,,,,,,,,,
253.34141666666667,266.6751766666667,260.0082883333333,253.34140499999998,253.34140499999998,266.6751633333333,,,,,,,,,,
246.674525,253.34140499999995,260.00828,260.00827,253.34139333333331,260.00827166666664,60.001908333333326,,,,,,,,,
260.00827166666664,246.6745133333333,246.67451333333335,246.67450166666666,253.34137999999993,260.0082583333333,253.34138000000002,253.34137999999996,,,,,,,,
260.00825833333334,246.67449333333323,240.00761166666663,260.00824500000004,253.34136833333324,253.3413666666667,253.3413666666667,253.34136833333332,60.0019,,,,,,,
260.0082333333333,246.6744766666667,260.0082316666667,260.00823333333335,260.00823333333335,273.3419866666666,260.0082216666667,266.6750983333334,260.00821833333333,253.3413433333333,,,,,,
260.00822000000005,253.34134166666666,266.67509,260.00820666666664,253.34132999999997,260.00820666666664,266.675085,266.6750833333333,260.00820500000003,253.3413183333334,66.66876666666667,,,,,
253.34131833333333,253.34131666666667,273.3419483333334,246.67444000000012,273.3419466666667,246.6744300000001,260.0081816666667,260.00818,253.3413050000001,246.6744283333334,266.67505833333337,260.008175,,,,
253.3412916666667,253.34129166666665,266.675045,253.3412916666667,253.34129333333337,260.00816833333334,266.6750316666666,260.00815500000004,253.34127999999998,246.674405,260.00815500000004,266.67503166666665,60.00188166666667,,,
273.3418966666667,260.00814333333335,260.00814333333335,260.00814333333335,266.67501833333336,253.34126666666668,246.674385,246.67438000000004,260.00813000000005,253.34125500000005,273.34188000000006,246.67438000000004,253.34125333333336,253.3412433333333,,
260.0081183333333,246.67436666666666,266.6749933333333,253.3412433333333,253.34124166666666,253.34123666666667,266.67497999999995,260.008105,266.6749783333334,246.67435666666668,260.00810499999994,260.00810333333334,260.00809166666664,260.0080933333333,60.00186666666666,
Output:
Num cores: 16
Using RDTSC to measure time: false
Num round trips per samples: 5000
Num samples: 300
Showing latency=round-trip-time/2 in nanoseconds:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1 61±6
2 265±6 255±6
3 241±6 265±6 61±6
4 256±6 251±6 253±6 253±6
5 256±6 253±6 264±6 253±6 61±6
6 245±6 245±6 261±6 255±6 241±6 261±6
7 257±6 245±6 261±6 256±6 240±6 265±6 61±6
8 264±6 236±6 249±6 264±6 253±6 253±6 247±6 245±6
9 251±6 263±6 248±6 264±6 252±6 252±6 256±6 249±6 61±6
10 253±6 261±6 261±6 257±6 240±6 265±6 244±6 267±6 252±6 252±6
11 255±6 261±6 256±6 263±6 268±6 240±6 243±6 260±6 252±6 251±6 61±6
12 249±6 263±6 252±6 263±6 261±6 261±6 256±6 248±6 248±6 257±6 260±6 260±6
13 263±6 260±6 251±6 252±6 245±6 264±6 260±6 249±6 259±6 247±6 259±6 259±6 61±6
14 260±6 265±6 260±6 255±6 259±6 243±6 247±6 251±6 249±6 256±6 257±6 261±6 255±6 252±6
15 245±6 244±6 256±6 255±6 243±6 264±6 252±6 260±6 245±6 244±6 257±6 247±6 255±6 252±6 61±6
Min latency: 61.3ns ±8.3 cores: (1,0)
Max latency: 268.0ns ±10.9 cores: (11,4)
Mean latency: 241.4ns
,,,,,,,,,,,,,,,
61.33257533333333,,,,,,,,,,,,,,,
265.33006000000006,254.66353166666667,,,,,,,,,,,,,,
241.33036766666666,265.33007833333335,61.33258233333334,,,,,,,,,,,,,
255.99686733333337,250.6636056666667,253.33024666666674,253.3302516666667,,,,,,,,,,,,
255.99689166666673,253.33026366666672,263.99680800000004,253.33027633333336,61.332594,,,,,,,,,,,
245.33037966666674,245.3303843333334,261.33019900000005,254.66361933333337,241.33045033333337,261.33021699999995,,,,,,,,,,
257.3302706666666,245.3304206666666,261.330236,255.99697133333325,239.9971653333333,265.3302076666666,61.332611,,,,,,,,,
263.9968979999999,235.99723199999988,249.3304133333332,263.9969143333333,253.33037966666654,253.3303853333333,246.66380133333325,245.3304883333333,,,,,,,,
250.66376666666665,262.66363366666656,247.99714166666664,263.99696266666655,251.99710699999991,251.99711333333326,255.99707299999994,249.33048766666664,61.33263466666666,,,,,,,
253.33044866666663,261.33036433333325,261.3303713333333,257.33042200000006,239.9972893333334,265.3303423333334,243.99725633333344,266.66367366666674,251.99717666666675,251.99718166666673,,,,,,
254.66382566666672,261.3304236666667,255.99715500000005,262.6637533333334,267.9970336666667,239.99735033333334,242.66399200000004,259.997139,251.9972326666667,250.66392066666668,61.33266166666667,,,,,
249.33060866666668,262.66380133333337,251.9972563333334,262.6638123333334,261.330501,261.33050633333335,255.9972356666667,247.99732700000004,247.99733300000008,257.330572,259.99721500000004,259.99722066666675,,,,
262.66386400000005,259.99723300000005,250.66400400000003,251.99732800000004,245.33073666666672,263.99721300000004,259.99726066666665,249.33071133333337,258.66395133333333,246.66408333333337,258.66396366666663,258.6639686666666,61.33269433333332,,,
259.9972946666667,265.3305783333333,259.9973056666666,254.66403433333332,258.66399799999994,242.66416733333327,246.66413099999997,250.66409633333325,249.33078166666658,255.99738466666662,257.3307093333333,261.33067533333326,254.6640819999999,251.99744699999997,,
245.33085199999996,243.99753833333332,255.99742266666664,254.66410766666664,242.66423199999994,263.997358,251.99748399999996,259.9974086666667,245.33089266666667,243.99757766666667,257.33078466666666,246.6642283333334,254.6641536666667,251.99751833333343,61.33273066666666,
Adds results for a baseline M1 Pro running Asahi Linux
core-to-core-latency 5000 --csv > output.csv
Num cores: 8
Using RDTSC to measure time: false
Num round trips per samples: 5000
Num samples: 300
Showing latency=round-trip-time/2 in nanoseconds:
0 1 2 3 4 5 6 7
0
1 57±3
2 146±3 147±3
3 151±3 145±3 48±3
4 147±3 148±3 46±3 45±3
5 156±3 161±3 167±3 148±3 157±3
6 136±3 138±3 166±3 155±3 158±3 47±3
7 138±3 152±3 158±3 154±3 152±3 43±3 41±3
Min latency: 41.0ns ±2.8 cores: (7,6)
Max latency: 166.7ns ±2.9 cores: (5,2)
Mean latency: 125.2ns
core-to-core-latency 5000 --csv > output.csv 21.150 user 0.006 system 199% cpu (10.601 wasted time).
,,,,,,,
57.33422566666668,,,,,,,
145.66893533333334,147.0022993333333,,,,,,
151.002362,144.6689403333333,48.33409300000001,,,,,
146.66897233333333,147.66899899999999,46.334065,45.33404966666667,,,,
156.0024693333333,161.002555,166.66931599999998,148.335698,157.33584400000007,,,
136.00217733333335,137.6688706666667,165.66932900000003,154.6691533333333,158.00254866666666,46.66742000000001,,
137.66888899999998,152.00246233333328,157.66922166666663,154.3358413333333,151.66913366666665,43.334038333333346,41.000668333333344,
uname -a
Linux epk-asahi 5.19.0-asahi-5-1-ARCH #1 SMP PREEMPT_DYNAMIC Sat, 20 Aug 2022 09:23:11 +0000 aarch64 GNU/Linux
lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: Apple
Model name: -
Model: 0
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
Stepping: 0x2
CPU(s) scaling MHz: 100%
CPU max MHz: 2064.0000
CPU min MHz: 600.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
Model name: -
Model: 0
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 2
Stepping: 0x2
CPU(s) scaling MHz: 50%
CPU max MHz: 3036.0000
CPU min MHz: 600.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
I tested the latency on M2 Macbook Air.
When I tested under Asahi Linux, the results did not look very reasonable.
,,,,,,,
67.14945294999993,,,,,,,
67.34945848333324,67.49946433333325,,,,,,
67.41613864999997,67.23281358333334,67.31615300000004,,,,,
178.53200283333322,176.4820619666668,176.41543756666658,174.71548903333348,,,,
177.74884061666643,177.46554656666677,174.28226821666678,174.03230284999995,38.24977780000004,,,
173.86567576666653,173.11570996666654,172.48240873333353,176.34908294999994,39.799796883333315,39.91646433333331,,
176.5824546166666,179.0991352166667,174.7825140666666,175.28253513333323,40.1998201833333,40.199821316666636,39.266493283333304,
Lenovo P14s (second gen), AMD 5850U 8-core, base clock 1.9 GHz, boost 4.5 GHz (despite AMD saying 4.4).
Debian bookworm, kernel 5.18.16-1 (Debian).
Num cores: 16
Using RDTSC to measure time: true
Num round trips per samples: 5000
Num samples: 1000
Showing latency=round-trip-time/2 in nanoseconds:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1 8±0
2 21±0 17±0
3 20±0 19±0 8±0
4 18±0 20±0 20±0 19±0
5 20±0 17±0 19±0 19±0 8±0
6 20±0 17±0 19±0 22±0 20±0 19±0
7 21±0 21±0 19±0 17±0 20±0 20±0 8±0
8 17±0 17±0 20±0 18±0 18±0 22±0 19±0 19±0
9 17±0 17±0 19±0 20±0 18±0 18±0 19±0 20±0 8±0
10 19±0 20±0 21±0 19±0 19±0 19±0 20±0 21±0 19±0 20±0
11 20±0 20±0 19±0 19±0 19±0 19±0 17±0 20±0 19±0 20±0 8±0
12 19±0 19±0 20±0 20±0 19±0 21±0 20±0 19±0 22±0 20±0 17±0 19±0
13 18±0 18±0 19±0 19±0 18±0 19±0 20±0 19±0 22±0 17±0 21±0 19±0 8±0
14 21±0 21±0 22±0 21±0 20±0 21±0 19±0 22±0 19±0 22±0 22±0 23±0 21±0 21±0
15 19±0 19±0 18±0 20±0 18±0 20±0 20±0 20±0 20±0 21±0 19±0 18±0 20±0 17±0 8±0
Min latency: 7.8ns ±0.0 cores: (7,6)
Max latency: 22.8ns ±0.0 cores: (14,11)
Mean latency: 18.7ns
Results: 5850u.csv
Stock clocks: Performance cores max 4.9 GHz, Efficient cores max 3.6 GHz
output-stock-hz.csv
AI Overclock (ASUS MB): Performance cores max 5.4 GHz for cores 0,1,2 and max 5.1 GHz for cores 3,4,5; Efficient cores max 4.1 GHz for cores 6,7,8,9 (maybe slightly higher)
output-overclock.csv
Thanks for putting together this project!
Qualcomm Snapdragon 855+ @ Wikichip
Output
Num cores: 8
Num iterations per samples: 5000
Num samples: 300
1) CAS latency on a single shared cache line
0 1 2 3 4 5 6 7
0
1 365±6
2 159±6 178±6
3 165±6 181±6 110±6
4 86±6 66±2 64±0 66±1
5 65±1 64±1 68±1 64±0 65±1
6 68±1 61±0 70±3 67±2 61±1 64±1
7 70±3 62±1 61±0 64±2 65±1 63±1 61±0
Min latency: 60.8ns ±0.3 cores: (7,2)
Max latency: 364.8ns ±79.5 cores: (1,0)
Mean latency: 93.0ns
% lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: Qualcomm
Model name: -
Model: 14
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 0xd
CPU(s) scaling MHz: 100%
CPU max MHz: 1785.6000
CPU min MHz: 300.0000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Model name: -
Model: 14
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
Stepping: 0xd
CPU(s) scaling MHz: 79%
CPU max MHz: 2956.8000
CPU min MHz: 710.4000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Branch predictor hardening
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
File: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz.csv
OS: Pop!_OS 22.04 LTS x86_64
Interesting to see the difference with the i7-1165G7
❯ core-to-core-latency 5000
CPU: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
Num cores: 8
Num iterations per samples: 5000
Num samples: 300
1) CAS latency on a single shared cache line
0 1 2 3 4 5 6 7
0
1 30±1
2 29±0 28±0
3 27±0 26±0 26±0
4 7±0 29±0 29±0 27±0
5 29±0 7±0 28±0 26±0 29±0
6 29±0 28±0 7±0 26±0 28±0 28±0
7 28±0 27±0 28±0 7±0 27±0 26±0 27±0
Min latency: 6.8ns ±0.0 cores: (5,1)
Max latency: 30.3ns ±0.6 cores: (1,0)
Mean latency: 24.8ns
❯ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
CPU family: 6
Model: 140
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
CPU max MHz: 4200,0000
CPU min MHz: 400,0000
BogoMIPS: 4838.40
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 192 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 5 MiB (4 instances)
L3: 8 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Srbds: Not affected
Tsx async abort: Not affected
Testing on my work laptop: Dell Latitude 7410, 2020 model. The CPU is an Intel Core i5-10310U (10th Gen), 4 cores, 8 threads.
This is on stock clocks (no overclocking), with the CPU configured in the BIOS to a 15W TDP, base clock 1.7GHz and boost 2.2GHz.
Using NUM_ROUND_TRIPS=2000 and NUM_SAMPLES=1000
Note: Hyperthreading is turned on, so the 4-core CPU appears as 8 cores in the table.
This also skews the results, because latency between threads on the same core is very low.
core-to-core-latency utility compiled with rustc stable v1.63.0
Linux OS, Ubuntu 22.04 with Xanmod kernel v5.17
Min latency: 7.2ns ±0.1 cores: (7,3)
Max latency: 21.6ns ±0.2 cores: (7,5)
Mean latency: 18.8ns
,,,,,,,
20.322991249999998,,,,,,,
20.035588249999996,20.2018165,,,,,,
21.16958825,21.369923000000004,21.335075750000005,,,,,
7.300013000000001,20.12066025,19.628703499999997,21.3523555,,,,
21.486786999999996,7.345331250000001,20.029967250000002,21.1629685,19.6607805,,,
19.914945249999995,20.491801999999996,7.382247000000001,21.119740499999995,19.79455025,20.268390750000002,,
20.49829675,21.15840975,21.356683000000004,7.20622025,20.7075525,21.636530999999994,20.944371750000002,
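The Min/Max/Mean lines above can be recomputed from a lower-triangular CSV like this one. The following is a minimal Python sketch, assuming the layout shown (row i, column j holds the latency between cores i and j for j < i, and all other cells are empty) and assuming the mean is a plain average over all measured pairs. The function name `summarize` and the rounded sample values are illustrative only, not part of the tool.

```python
import csv
import io


def summarize(csv_text):
    """Return (min, max, mean) of a lower-triangular latency CSV.

    min and max are (latency_ns, (row_core, col_core)) tuples.
    """
    cells = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for j, field in enumerate(row):
            if field.strip():  # skip empty diagonal / upper-triangle cells
                cells.append((float(field), (i, j)))
    if not cells:
        raise ValueError("no latency values found")
    mean = sum(value for value, _ in cells) / len(cells)
    return min(cells), max(cells), mean


# Illustrative 5x5 excerpt with rounded values, not a full result file.
SAMPLE = """,,,,
20.32,,,,
20.04,20.20,,,
21.17,21.37,21.34,,
7.30,20.12,19.63,21.35,
"""

lo, hi, mean = summarize(SAMPLE)
print(f"Min latency: {lo[0]:.1f}ns cores: {lo[1]}")
print(f"Max latency: {hi[0]:.1f}ns cores: {hi[1]}")
print(f"Mean latency: {mean:.1f}ns")
```

With Hyperthreading enabled, the minimum found this way is typically a same-core thread pair, which is why it sits far below the rest of the matrix.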
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i7-12700K
CPU family: 6
Model: 151
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 16%
CPU max MHz: 5100.0000
CPU min MHz: 800.0000
BogoMIPS: 7219.20
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 512 KiB (12 instances)
L1i: 512 KiB (12 instances)
L2: 12 MiB (9 instances)
L3: 25 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Srbds: Not affected
Tsx async abort: Not affected
Generated with core-to-core-latency 5000 --csv > output.csv:
,,,,,,,,,,,,,,,,,,,
4.448893666666666,,,,,,,,,,,,,,,,,,,
33.860952000000005,32.43357566666667,,,,,,,,,,,,,,,,,,
32.87937533333333,32.879766999999994,4.438448000000001,,,,,,,,,,,,,,,,,
32.170397333333334,32.54281433333334,32.32563866666667,32.272031999999996,,,,,,,,,,,,,,,,
32.46923433333334,32.11467833333334,31.848267000000003,33.93661266666667,4.482093333333334,,,,,,,,,,,,,,,
33.456196999999996,32.22874366666667,31.7009,32.032530333333334,31.756795333333333,31.201316333333338,,,,,,,,,,,,,,
32.35605400000001,32.05808100000001,31.943181666666668,33.69383699999999,31.27857033333334,31.645629666666665,4.442795,,,,,,,,,,,,,
33.24668533333333,31.664736666666666,33.31742899999999,31.506772999999995,31.179102000000007,30.87865333333334,31.13465433333333,32.164469000000004,,,,,,,,,,,,
31.92064866666666,33.02878833333333,31.667552333333326,31.035089666666664,31.710445,31.191064666666662,30.997930000000004,31.036128333333327,4.364471999999999,,,,,,,,,,,
31.434928000000006,31.763595333333335,32.83059766666667,31.290861,30.833653,30.83297766666667,30.696909333333338,32.39623400000001,30.223562999999988,30.13578066666666,,,,,,,,,,
31.97571466666667,31.90774,33.306677,31.291537,31.119677333333335,31.879037999999994,31.13053133333333,31.883985666666668,30.648340666666662,30.808182666666653,4.403745000000002,,,,,,,,,
32.46575366666666,31.506656000000003,31.823121333333336,31.71178533333333,31.376972666666667,30.49211733333334,30.346120333333328,30.60045533333334,31.758410333333334,29.855509333333334,29.739554666666663,29.650747333333335,,,,,,,,
31.07530033333332,32.47009933333334,30.78935233333334,30.799008333333337,30.422065666666672,31.03479033333333,30.357547333333333,32.060738666666666,29.882707666666672,29.95993166666667,29.622224333333335,29.630103666666667,4.371233,,,,,,,
32.67154366666668,30.938582333333336,32.640933,31.03519099999999,30.750640333333333,32.216891333333336,30.55828266666666,30.889234666666663,29.889495333333336,30.420549999999995,29.957848333333338,31.04800033333335,29.14639733333334,29.954005,,,,,,
31.38270599999999,31.55658266666667,32.45051433333333,30.969073666666663,30.80595166666666,30.38428900000001,30.67182433333334,32.64246466666667,29.770204999999994,30.35542233333334,29.634460666666673,30.14862866666667,29.39097033333333,30.73453766666666,4.375556,,,,,
40.52135566666667,40.520891666666664,39.835361999999996,39.846249333333326,39.471687333333335,39.485864333333325,39.43216366666666,39.46887966666666,38.757887333333336,38.716461,38.18351466666667,38.22605066666668,37.67568699999999,37.66062766666668,37.534257,37.535288999999985,,,,
40.549529333333325,40.54266166666667,39.868224666666656,39.891868,39.48151633333334,39.47038166666667,39.42565033333332,39.425579000000006,38.76239500000001,38.748255666666665,38.16941866666666,38.177531,37.66541833333334,37.67394266666667,37.53232566666667,37.56635133333334,49.740120000000005,,,
40.59156066666667,40.58448599999999,39.906694333333334,39.89060466666667,39.597365999999994,39.53175166666668,39.46161433333334,39.46497733333332,38.74496533333333,38.74070933333334,38.463348,38.21367066666667,37.697702,37.702216666666665,37.559437,37.53439366666666,49.773393,49.74892533333333,,
40.54896733333333,40.544551,39.805924999999995,39.86888466666666,39.48403866666666,39.479945666666666,39.44833233333334,39.437237333333336,38.765966666666664,38.70971033333334,38.167391999999985,38.171963666666656,37.71786266666667,37.69896766666666,37.542722000000005,37.54445766666667,49.69533866666666,49.76201966666667,49.74855166666667,
Qualcomm Snapdragon 850 @ Wikichip
OS: Windows 11 ARM64
Num cores: 8
Num iterations per samples: 5000
Num samples: 300
1) CAS latency on a single shared cache line
0 1 2 3 4 5 6 7
0
1 73±0
2 73±0 73±2
3 73±0 70±0 70±0
4 66±0 65±1 67±3 64±0
5 69±2 64±0 64±0 64±0 62±4
6 68±1 64±0 66±1 64±0 60±2 58±0
7 67±1 64±0 64±0 64±0 58±1 58±0 58±0
Min latency: 57.6ns ±0.2 cores: (7,5)
Max latency: 73.3ns ±0.4 cores: (1,0)
Mean latency: 65.4ns
Hi
Thanks for sharing this cool project!
This is my output.csv
My processor: AMD Ryzen Threadripper 3960X, 3.80GHz, 24 Cores, Zen 2, 3rd Gen, 2019-Q4
Cheers
I happen to have access to one of these, so here are the results.
There are some BIOS options for enabling NUMA support, so I tested in both monolithic mode and 2-node mode. Not all of the memory channels are populated in this system, so that might be affecting things too. I also attempted to run this with little other CPU activity and with the CPUs locked to the maximum 3 GHz. I'm happy to re-run with any suggestions to get cleaner data.
Num cores: 16
Using RDTSC to measure time: false
Num round trips per samples: 5000
Num samples: 300
Showing latency=round-trip-time/2 in nanoseconds:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
1 5±5
2 21±6 21±6
3 21±6 21±6 5±5
4 26±6 21±6 21±6 16±6
5 21±6 21±6 21±6 21±6 10±6
6 21±6 16±6 21±6 21±6 21±6 21±6
7 21±6 26±6 21±6 21±6 21±6 21±6 11±6
8 21±6 21±6 21±6 21±6 16±6 26±6 21±6 21±6
9 26±6 16±6 21±6 21±6 21±6 21±6 21±6 21±6 5±5
10 21±6 21±6 21±6 21±6 21±6 21±6 21±6 16±6 21±6 21±6
11 21±6 21±6 16±6 26±6 16±6 21±6 26±6 21±6 21±6 21±6 5±5
12 21±6 21±6 21±6 21±6 21±6 16±6 21±6 16±6 16±6 21±6 16±6 26±6
13 21±6 21±6 21±6 21±6 21±6 21±6 21±6 21±6 21±6 21±6 21±6 21±6 5±5
14 21±6 21±6 21±6 21±6 21±6 21±6 15±6 26±6 21±6 21±6 21±6 16±6 26±6 16±6
15 26±6 21±6 21±6 21±6 21±6 16±6 15±6 21±6 26±6 16±6 26±6 21±6 21±6 21±6 10±6
Min latency: 5.0ns ±5.0 cores: (1,0)
Max latency: 26.3ns ±11.7 cores: (14,7)
Mean latency: 19.7ns
Awesome tool!
Here are some results from my local 5950X gaming machine. Note that this is in a "Gaming" motherboard running an above-spec profile for XMP RAM, along with ASUS's own interpretation of what the turbo limits should be and so on. That is probably the more common configuration for a consumer 5950X anyway.
Generated with cargo run --release 5000 --csv > output.csv against 5667d39
Edit: System wasn't fully idle, see updated results below.
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
20.746637333333332,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
17.95516,19.150424333333326,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
19.205262,20.200952999999995,18.05196666666667,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
18.81552233333333,18.895528000000002,16.830887,18.033071,,,,,,,,,,,,,,,,,,,,,,,,,,,,
19.417191,20.752288666666665,18.177500666666663,19.031043333333333,17.58916333333333,,,,,,,,,,,,,,,,,,,,,,,,,,,
16.571023333333336,17.833010000000005,16.013229999999997,17.00602,15.885443333333333,16.906523333333332,,,,,,,,,,,,,,,,,,,,,,,,,,
17.81345133333333,18.95349433333333,16.785600000000002,18.01863,17.45023333333333,18.41574,15.903416666666665,,,,,,,,,,,,,,,,,,,,,,,,,
88.629274,89.46750700000001,88.47289666666666,88.52971733333334,87.29821033333333,88.23525366666667,86.21302333333333,87.82864166666667,,,,,,,,,,,,,,,,,,,,,,,,
90.236141,90.604718,89.88050966666668,90.64481566666667,89.092327,89.84939266666667,87.79799733333336,88.88045499999998,21.255498999999997,,,,,,,,,,,,,,,,,,,,,,,
88.37895866666668,89.124369,87.497338,88.41031799999998,86.95864233333333,87.93970433333332,85.89954800000001,87.34923233333333,18.626683333333336,19.74302,,,,,,,,,,,,,,,,,,,,,,
89.82500699999999,90.92843799999999,89.51941733333334,90.31601666666666,88.74470800000002,89.52782566666667,87.52727433333334,88.871197,19.92714666666667,21.202158,18.986091,,,,,,,,,,,,,,,,,,,,,
87.56495400000001,87.94922733333333,86.80136333333334,87.57190166666666,86.23370733333333,86.66839300000001,85.052591,86.14248133333334,19.04361,19.486016,17.382287666666663,18.60184333333333,,,,,,,,,,,,,,,,,,,,
88.75318966666667,89.60360399999999,88.12901233333335,88.849098,87.31144933333334,88.22800866666665,86.31126166666668,87.82770533333333,20.28952333333333,21.994366333333335,18.901655000000005,19.879733333333338,18.304774,,,,,,,,,,,,,,,,,,,
88.25886666666666,88.75392666666666,87.44311766666667,87.98804466666668,86.38540666666665,87.10225166666666,85.53575066666666,87.25439033333332,17.215409666666666,18.748592333333338,17.086455,17.98148633333333,16.386489333333333,17.592460666666664,,,,,,,,,,,,,,,,,,
88.34333699999999,88.61499200000002,87.35472766666666,88.07306933333332,86.56914833333332,87.269176,85.45093000000001,86.73774433333334,18.413057000000002,19.803380999999998,17.311056666666666,18.838845,17.651304333333336,18.680102333333334,16.56342533333334,,,,,,,,,,,,,,,,,
8.135623333333333,20.914972999999996,17.996019999999998,19.61398366666667,18.628149999999998,19.64426666666667,16.58682,17.89741666666667,88.50308133333333,90.30354200000002,88.40881533333334,89.82690999999998,87.18189733333332,88.80151933333336,87.55691199999998,87.61868000000001,,,,,,,,,,,,,,,,
20.66380533333334,7.680173666666663,19.124143666666672,20.110519333333336,18.943326666666668,21.565603333333335,17.901589999999995,19.347223333333336,89.58727999999998,91.07968499999998,89.40243333333335,90.70070133333333,87.70071633333333,89.611324,88.25380833333335,88.34462133333334,20.603775333333335,,,,,,,,,,,,,,,
18.080306666666665,19.071713333333335,7.518696666666665,18.148323333333334,16.997403333333335,18.67397666666667,16.391066666666667,17.26971666666666,88.58310666666665,89.52712166666666,87.24168333333333,89.39893233333333,86.76557366666668,87.78318566666668,86.905513,86.95409999999998,18.054013333333334,19.33020666666667,,,,,,,,,,,,,,
19.378613333333337,20.214341333333333,18.349913333333333,7.669963333333335,18.082156666666666,19.073073333333333,17.13835,18.255914,88.59917733333333,90.43937166666666,88.52071366666665,90.37383366666667,87.58015033333335,88.60860333333333,87.861319,88.028025,19.22852733333334,20.572364999999998,18.213466666666665,,,,,,,,,,,,,
18.683343333333333,18.891360000000002,16.91178333333333,17.80600666666666,7.606443333333332,17.718283333333336,15.708236666666663,16.871040000000004,87.72090633333333,88.87007966666664,86.84735066666669,88.50880266666665,86.010351,87.31457700000001,86.50998833333333,86.38685533333332,18.422086666666672,19.018553333333337,16.750983333333334,17.84255666666667,,,,,,,,,,,,
19.478264666666664,20.991729,18.34968033333334,18.873189999999997,17.518586666666668,7.636756666666667,16.586679999999998,18.054416666666665,88.15052100000001,89.61023133333335,88.05208966666665,89.73411366666667,86.72028,88.35143699999998,87.09868300000001,87.47095766666666,19.476316666666666,21.055428,18.354740000000003,18.984423333333336,17.577963333333333,,,,,,,,,,,
16.616408333333336,18.002557000000003,16.148813333333337,16.885530000000003,15.78010333333333,16.849166666666665,7.689210000000001,15.868503333333331,86.08539733333333,87.69200466666666,85.91518433333333,87.68852600000001,84.96186233333334,86.27230533333334,85.20097366666667,85.332503,16.624296666666666,17.84425666666667,16.06343,16.93364333333334,15.863116666666665,16.76944666666667,,,,,,,,,,
17.792013333333337,19.099897333333335,16.799450000000004,18.023526666666665,16.911659999999998,17.986553333333333,15.902343333333334,7.683473333333333,87.41472133333333,88.99018833333334,87.33130533333336,88.96923799999999,86.15381066666667,87.65580333333335,86.590412,86.51487466666666,17.783086666666662,19.082946666666672,16.67944666666667,18.101937333333336,16.938609999999997,18.01944333333333,15.90172,,,,,,,,,
88.626124,89.75480766666668,88.36001466666667,88.55076700000001,87.61623466666668,88.42361133333335,86.1415056666667,87.53084100000001,7.865243666666668,21.577453666666663,18.652896666666667,20.004842,18.998436666666667,20.21155733333333,17.26804166666667,18.54585666666667,88.58275300000001,89.901078,89.15442466666664,90.21796633333332,88.76242500000001,88.84928266666667,86.72973300000001,88.03223033333335,,,,,,,,
89.75610099999999,90.182664,89.22831299999999,89.85464433333333,88.20781633333333,89.08983266666667,87.13099033333333,88.41566466666669,21.189794666666668,7.869080666666669,19.481530000000003,20.91038233333333,19.432104999999996,22.20602233333333,18.477409333333334,19.726836333333335,89.62603766666666,90.102721,88.955223,89.63583366666666,89.528678,91.548339,89.46222366666669,90.28096566666666,21.06603266666667,,,,,,,
89.90119466666668,89.276217,88.0799933333333,88.65868966666667,86.90821466666668,88.12455366666666,85.93880700000001,87.329734,18.654750333333336,19.949284666666667,7.895976666666666,19.25956366666667,17.44996,18.911446666666663,17.088046333333338,18.372567333333333,90.32958866666664,91.08044500000001,89.952824,90.13576400000001,87.073022,88.11534866666666,86.03755766666667,87.31764466666665,18.66860333333333,19.521799999999995,,,,,,
90.289882,91.26479766666667,89.85162133333336,90.64178066666665,88.79289799999998,89.93604333333333,87.89802666666668,89.252704,19.976176666666667,21.184666,19.060394,8.00042333333333,18.75364733333333,20.034585666666665,18.086728333333337,18.941735333333337,90.337897,91.17626133333334,89.85484533333333,90.55216766666668,89.00551833333334,89.97233299999999,87.91033800000001,89.21034066666667,19.992605666666666,20.920538,19.03683,,,,,
88.38918033333336,88.85554200000003,87.74929933333333,88.24700733333334,86.87659466666668,87.418892,85.72141633333335,86.90449166666667,19.11988566666666,19.616951999999998,17.403740000000006,18.762012666666667,7.918761666666665,18.24400833333333,16.272954666666664,17.488160666666662,87.98203799999999,88.43779433333333,87.33116766666666,88.28132766666667,86.99877299999999,87.56369666666664,85.82381466666665,86.87025633333333,19.087439333333336,19.485605999999994,17.37635,18.991784000000003,,,,
88.73916799999999,89.61924366666665,88.10670433333333,88.76513,87.289439,88.29084033333334,86.38348433333334,87.71372899999999,19.901970000000002,21.347285666666668,18.811267,19.797989,18.368139333333335,7.89797,17.299362000000002,18.596110333333332,88.76585266666667,89.59339366666666,87.87292599999999,88.94917133333334,87.27295866666667,88.27695500000002,86.29725566666666,87.62550733333335,19.973000333333335,21.264557333333332,18.786296666666672,19.833867,18.34920933333333,,,
86.96347433333335,87.50521766666668,86.26018499999999,86.99244666666667,85.48531,86.11396500000001,84.42193366666666,85.78931200000001,17.012946666666664,18.36992166666667,16.440426666666667,17.64922,16.221463666666665,17.176186666666666,8.013721666666669,16.399783333333332,86.77817633333332,87.28931200000002,85.96490933333335,87.11027433333334,85.69534666666665,86.06156466666665,84.23088533333335,85.73340233333333,17.021169999999998,18.272506666666665,16.486816666666662,17.62337,16.333246666666668,17.156689999999994,,
102.83695966666664,102.948447,103.00740833333332,105.284744,101.24415266666665,103.91533466666665,101.23376766666667,102.02340333333335,21.394265666666666,25.420931666666668,20.182791333333334,21.768440666666667,23.02316766666667,22.268760666666665,19.98306766666667,11.484821333333333,102.27970699999999,103.29158200000003,102.51779333333332,103.76887466666666,100.78950133333333,99.98749333333336,98.40147966666666,99.353674,22.787499333333333,21.839080999999997,19.345207666666667,21.226416000000004,21.840715,21.198277,18.865716999999997,
Hi @nviennot and the community!
I have a request for the community: while I don't have access to one, I am particularly interested in the Apple M1 Ultra. Apple made a pretty big fuss about the transparency of the die-to-die interconnect in the M1 Ultra, so it would be very interesting to see what happens with it, both on macOS and on Linux (say, Asahi).
@nviennot Awesome project mate!
Best,
S.
Hello, I'm having errors when trying to build it on my machine, using Rust 1.62.1 on Gentoo.
error[E0658]: use of unstable library feature 'scoped_threads'
--> src/main.rs:36:5
|
36 | std::thread::scope(|s| {
| ^^^^^^^^^^^^^^^^^^
|
= note: see issue #93203 <https://github.com/rust-lang/rust/issues/93203> for more information
error[E0658]: use of unstable library feature 'scoped_threads'
--> src/main.rs:38:11
|
38 | s.spawn(|| {
| ^^^^^
|
= note: see issue #93203 <https://github.com/rust-lang/rust/issues/93203> for more information
error[E0658]: use of unstable library feature 'scoped_threads'
--> src/main.rs:50:11
|
50 | s.spawn(|| {
| ^^^^^
|
= note: see issue #93203 <https://github.com/rust-lang/rust/issues/93203> for more information
error[E0658]: use of unstable library feature 'scoped_threads'
--> src/main.rs:68:10
|
68 | .join()
| ^^^^
|
= note: see issue #93203 <https://github.com/rust-lang/rust/issues/93203> for more information
For more information about this error, try `rustc --explain E0658`.
error: could not compile `core-to-core-latency` due to 4 previous errors
It seems to use a feature (scoped threads, std::thread::scope) that was only stabilized in Rust 1.63, so I suppose the minimum Rust version could be stated explicitly in the readme file?
The utility needs a way to detect whether a CoreId is on the same physical core as another CoreId (which indicates that some form of hyperthreading or SMT is in use), and to filter those pairs out of the benchmark.
You could also build up a table of pairs and remove all the duplicates, so that on an 8-core CPU with hyperthreading, only 8 physical cores are included in the benchmark.
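The filtering described above could work roughly as follows. This is a Python sketch of the logic only (the utility itself is written in Rust), and it assumes Linux, where `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` lists the logical CPUs that share a physical core in forms such as `0,4` or `0-1`. All helper names are hypothetical.

```python
from itertools import combinations


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,4' or '0-1,8-9' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_siblings(cpu):
    """Read the sibling set of one logical CPU from sysfs (Linux only)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def one_cpu_per_core(siblings):
    """Keep the lowest-numbered logical CPU of each physical core.

    `siblings` maps a logical CPU id to the set of CPUs on the same core.
    """
    return sorted({min(group) for group in siblings.values()})


def cross_core_pairs(siblings):
    """All pairs of logical CPUs that do not share a physical core."""
    return [(a, b) for a, b in combinations(sorted(siblings), 2)
            if b not in siblings[a]]


# Hypothetical 4-core/8-thread layout where CPU i and CPU i + 4 share a core.
example = {cpu: parse_cpu_list(f"{cpu % 4},{cpu % 4 + 4}") for cpu in range(8)}
print(one_cpu_per_core(example))       # one representative per physical core
print(len(cross_core_pairs(example)))  # number of pairs left after filtering
```

Either result could drive the benchmark: `one_cpu_per_core` shrinks the table to physical cores, while `cross_core_pairs` keeps every logical CPU but skips the same-core pairs.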
Should I use another number for round trips, or is this okay?
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
CPU family: 6
Model: 140
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
CPU(s) scaling MHz: 33%
CPU max MHz: 4700.0000
CPU min MHz: 400.0000
BogoMIPS: 5608.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 192 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 5 MiB (4 instances)
L3: 12 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
core-to-core-latency 5000 --csv
,,,,,,,
29.155970666666665,,,,,,,
28.404578333333333,27.833243000000003,,,,,,
27.027639666666666,26.381477666666665,26.02073733333334,,,,,
5.981889,29.43731433333333,28.25665566666667,26.858948666666667,,,,
28.969351333333336,6.135179000000002,28.145669666666667,26.589022333333332,28.75294266666667,,,
28.399731666666668,27.951117,5.73669,26.609733666666664,29.34837233333333,29.074402666666664,,
27.505726000000003,26.453945333333333,25.88870033333333,5.732708666666666,26.649091333333338,26.39043066666666,25.832449,
I have installed JupyterLab on Windows 11 and tried to run results.ipynb with:
jupyter-lab results.ipynb
and then add a cell:
cpu = "Intel Core i7-11700K @ 8 Cores (Rocket Lake, 11th gen)"
fname = "D:/backup/benchmark/core_to_core_latency/Intel Core i7-11700K.csv"
m = load_data(fname)
n1=8
m = load_data(fname)
n = 8
show_heapmap(m[::2,::2], title=cpu)
show_heapmap(np.diag(m[::2,1::2]).reshape((1,n)), yticks=False, figsize=(3.5, 4),
title=cpu, subtitle="Hyper-thread same-core latency")
When I run the cell above, I get this error message:
`---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1083, in pandas._libs.parsers.TextReader._convert_tokens()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1233, in pandas._libs.parsers.TextReader._convert_with_dtype()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1246, in pandas._libs.parsers.TextReader._string_convert()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1444, in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
During handling of the above exception, another exception occurred:
UnicodeDecodeError Traceback (most recent call last)
Cell In [12], line 3
1 cpu = "Intel Core i7-11700K @ 8 Cores (Rocket Lake, 11th gen)"
2 fname = "D:/backup/benchmark/core_to_core_latency/Intel Core i7-11700K.csv"
----> 3 m = load_data(fname)
5 n1=8
7 m = load_data(fname)
Cell In [5], line 2, in load_data(filename)
1 def load_data(filename):
----> 2 m = np.array(pd.read_csv(filename, header=None))
3 return np.tril(m) + np.tril(m).transpose()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util_decorators.py:211, in deprecate_kwarg.._deprecate_kwarg..wrapper(*args, **kwargs)
209 else:
210 kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util_decorators.py:317, in deprecate_nonkeyword_arguments..decorate..wrapper(*args, **kwargs)
311 if len(args) > num_allow_args:
312 warnings.warn(
313 msg.format(arguments=arguments),
314 FutureWarning,
315 stacklevel=find_stack_level(inspect.currentframe()),
316 )
--> 317 return func(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
935 kwds_defaults = _refine_defaults_read(
936 dialect,
937 delimiter,
(...)
946 defaults={"delimiter": ","},
947 )
948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:611, in _read(filepath_or_buffer, kwds)
608 return parser
610 with parser:
--> 611 return parser.read(nrows)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:1772, in TextFileReader.read(self, nrows)
1765 nrows = validate_integer("nrows", nrows)
1766 try:
1767 # error: "ParserBase" has no attribute "read"
1768 (
1769 index,
1770 columns,
1771 col_dict,
-> 1772 ) = self._engine.read( # type: ignore[attr-defined]
1773 nrows
1774 )
1775 except Exception:
1776 self.close()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:243, in CParserWrapper.read(self, nrows)
241 try:
242 if self.low_memory:
--> 243 chunks = self._reader.read_low_memory(nrows)
244 # destructive to chunks
245 data = _concatenate_chunks(chunks)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:808, in pandas._libs.parsers.TextReader.read_low_memory()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:890, in pandas._libs.parsers.TextReader._read_rows()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1037, in pandas._libs.parsers.TextReader._convert_column_data()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1090, in pandas._libs.parsers.TextReader._convert_tokens()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1233, in pandas._libs.parsers.TextReader._convert_with_dtype()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1246, in pandas._libs.parsers.TextReader._string_convert()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas_libs\parsers.pyx:1444, in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte`
How can I fix this problem?
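A byte 0xff at position 0 is consistent with a UTF-16 little-endian byte order mark (0xFF 0xFE), which can appear when output is redirected to a file with `>` in Windows PowerShell. That is a guess about how this CSV was produced, not something the traceback proves. A minimal sketch that sniffs the byte order mark so the result can be passed to `pd.read_csv` through its `encoding` parameter follows; `detect_encoding` is a hypothetical helper, not part of the notebook.

```python
import codecs


def detect_encoding(path, default="utf-8"):
    """Guess a text encoding from the file's byte order mark, if any."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default


# In the notebook, load_data could then be adapted along these lines
# (assumes numpy as np and pandas as pd are already imported there):
#
#   def load_data(filename):
#       enc = detect_encoding(filename)
#       m = np.array(pd.read_csv(filename, header=None, encoding=enc))
#       return np.tril(m) + np.tril(m).transpose()
```

If the file is known to be UTF-16, passing `encoding="utf-16"` directly to `pd.read_csv` may be enough. Re-saving the CSV as UTF-8 is another option.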
Three different tests on my Ryzen 5700X (Zen 3).
This is on stock clocks (no overclocking), with base clock 3.4GHz, boost 4.6GHz, and the Infinity Fabric clock at its stock speed of 1600MHz.
Using NUM_ROUND_TRIPS=2000 and NUM_SAMPLES=1000
Note: Hyperthreading (SMT) is turned on, so the 8-core CPU appears as 16 cores in the table.
This also skews the results, because latency between threads on the same core is very low.
core-to-core-latency utility compiled with rustc stable v1.63.0
Linux OS, Manjaro v21.3.0 with Xanmod kernel v5.18.11
Min latency: 16.5ns ±0.0 cores: (8,0)
Max latency: 43.3ns ±0.0 cores: (4,2)
Mean latency: 36.5ns
,,,,,,,,,,,,,,,
38.94162999999999,,,,,,,,,,,,,,,
41.445865000000005,38.75842,,,,,,,,,,,,,,
37.654669999999996,36.285954999999994,39.00216499999999,,,,,,,,,,,,,
40.60070999999999,38.476715,43.28821000000001,38.5983625,,,,,,,,,,,,
35.90454999999999,34.57694750000001,36.755255000000005,33.419462499999995,35.9783375,,,,,,,,,,,
39.921434999999995,38.485744999999994,41.11958,37.1455825,40.81456,35.4262125,,,,,,,,,,
37.8734525,36.426179999999995,39.09775,35.9705825,38.7956125,33.491725,37.12957000000001,,,,,,,,,
16.54401,38.0780725,41.1588875,37.803545,41.124052500000005,35.585494999999995,40.110302499999996,37.912705,,,,,,,,
38.2953475,16.561055,38.81158,36.51018500000001,39.515815,34.37252750000001,38.33106999999999,36.546352500000005,38.6430975,,,,,,,
41.452059999999996,38.534932500000004,16.548502499999998,39.108507499999995,43.07939,36.490660000000005,40.9334775,39.0969925,41.51021749999999,38.52144,,,,,,
37.68707499999999,36.3865925,39.01529500000001,16.551325,38.74605499999999,33.394994999999994,37.100899999999996,35.9515225,37.904044999999996,36.465835,39.0266975,,,,,
40.35999,39.3040125,43.2097,38.626375,16.548612500000004,35.810015,40.625955000000005,38.716497499999996,40.7629775,39.1218275,43.24042,38.6535425,,,,
35.83899749999999,34.62856250000001,36.576840000000004,33.668982500000006,36.157515,16.638612500000004,35.7362625,33.82701,35.97954500000001,34.640100000000004,36.6114775,33.479617499999996,36.09784,,,
40.0429525,38.4250625,41.089865,37.228919999999995,40.8252675,35.440090000000005,16.5945025,37.38737,40.2356475,38.6216675,41.0849,37.128989999999995,40.92366750000001,35.48053,,
37.71517,36.37655,39.1379,35.8036725,38.76224499999999,33.47431250000002,37.087977499999994,16.619805000000003,37.758815000000006,36.538152499999995,39.19572249999999,35.97026749999999,38.68845999999999,33.425552499999995,37.09912250000001,
Min latency: 7.8ns ±0.0 cores: (9,1)
Max latency: 20.5ns ±0.0 cores: (8,1)
Mean latency: 17.4ns
,,,,,,,,,,,,,,,
20.459379999999996,,,,,,,,,,,,,,,
20.151615000000003,19.43073,,,,,,,,,,,,,,
19.591427500000005,18.833927499999998,18.788295,,,,,,,,,,,,,
19.8105175,18.573265000000003,19.281905000000005,18.138165,,,,,,,,,,,,
18.637205,17.710862500000005,18.0125625,17.11686,17.600120000000004,,,,,,,,,,,
19.327934999999997,18.0839125,18.464625,17.656652500000003,18.0617075,17.007017499999996,,,,,,,,,,
17.151672499999997,16.54403,17.033324999999998,16.394375,16.866457500000003,15.778995,16.21925,,,,,,,,,
7.874110000000001,20.513199999999998,20.107579999999995,19.501192500000005,19.6839275,18.51729,19.245302499999998,17.151429999999994,,,,,,,,
20.479067499999992,7.81701,19.432927500000005,18.8541175,18.542575000000003,17.640897499999998,17.9491725,16.5008575,20.470280000000002,,,,,,,
20.1350425,19.4263,7.827612500000001,18.7435875,19.207349999999998,18.051260000000003,18.34011,16.900432500000008,20.159455,19.3894525,,,,,,
19.546545,18.807790000000004,18.743455,7.822079999999999,18.0795,17.2237525,17.51807,16.3006825,19.551370000000002,18.7584325,18.744190000000003,,,,,
19.67629,18.629735,19.3296175,18.16746,7.829224999999999,17.5601875,17.9947575,16.80911,19.7056325,18.565645,19.30156,18.156792499999998,,,,
18.648445,17.69705,18.000754999999998,17.2928975,17.6389775,7.854035,16.991190000000003,15.915975,18.649770000000004,17.781602499999998,18.06547,17.29243,17.6584175,,,
19.325512500000002,18.094185000000007,18.4232275,17.6899525,18.06692,17.0911575,7.8533100000000005,16.290292499999996,19.366527499999997,18.149015,18.450377500000002,17.720377499999998,18.098385,17.084025,,
17.145695,16.462225,17.0127925,16.385122499999998,16.855559999999997,15.7612775,16.211647499999998,7.8460175,17.062982500000004,16.448357499999997,16.965549999999997,16.344607500000002,16.8657125,15.827535000000003,16.2285225,
gamemoderun ./core-to-core-latency 2000 1000 csv
Min latency: 7.8ns ±0.0 cores: (10,2)
Max latency: 20.8ns ±0.0 cores: (15,13)
Mean latency: 17.3ns
,,,,,,,,,,,,,,,
17.600893499999998,,,,,,,,,,,,,,,
15.85733925,16.816972250000003,,,,,,,,,,,,,,
17.809067249999995,18.99050725,16.8144585,,,,,,,,,,,,,
16.97738375,17.590404000000003,15.78064575,17.803901250000003,,,,,,,,,,,,
18.4700395,19.20257775,17.200833249999995,20.24673525,18.46440675,,,,,,,,,,,
17.192964000000003,18.16497175,16.309734,18.205039250000002,17.22132675,19.65153225,,,,,,,,,,
18.32183675,18.70486675,16.988667500000002,19.252841249999996,18.310112,20.765589,18.689666,,,,,,,,,
7.8816394999999995,17.553100999999998,15.76541525,17.8215125,16.941161000000005,18.591475749999997,17.166933750000002,18.393159999999998,,,,,,,,
17.63147825,7.830509749999998,16.676295749999998,19.030214250000004,17.6043965,19.220339750000004,18.1512765,18.690402249999998,17.560081,,,,,,,
15.814207999999999,16.8057155,7.809531999999999,16.93249975,15.788254999999998,17.164388250000005,16.31296625,17.005254249999997,15.795847749999998,16.76492425,,,,,,
17.781331999999995,18.97124075,16.828719250000002,7.820662750000001,17.801371,20.1839775,18.3034065,19.105602249999997,17.803947500000003,19.026802749999998,16.793632249999995,,,,,
16.919735999999997,17.57033475,15.772727000000003,17.827738749999998,7.827417500000001,18.638468999999997,17.158292000000003,18.331349999999997,16.908394249999997,17.6113075,15.84654075,17.906657749999997,,,,
18.544481500000003,19.199010750000003,17.195166500000006,20.284174,18.5135285,7.861416499999999,19.738359499999998,20.815093,18.5340225,19.242063749999996,17.258336000000003,20.3310165,18.58019725,,,
17.234350749999997,18.16456475,16.262634000000002,18.01444675,17.20777725,19.66889975,7.8356882500000005,18.7000925,17.321829749999996,18.157744750000003,16.2521525,18.01282875,17.239127749999998,19.71260175,,
18.274914499999998,18.769644999999997,16.918240750000006,19.159466499999997,18.249401749999997,20.684715749999995,18.667917000000003,7.8618025,18.309467749999996,18.7012245,16.87986125,19.151705999999997,18.307926499999997,20.819127,18.673902249999998,
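The Min/Max/Mean lines and the skew from same-core threads mentioned in the note can be recomputed from the CSV output. This is a minimal sketch, assuming the lower-triangular layout shown above and assuming logical CPUs i and i+8 share a physical core on this machine (the ~16.5 ns and ~7.8 ns cells at (8,0), (9,1), ... suggest this, but it is not confirmed). The function names are hypothetical. The sample is a rounded 4-CPU subset of the first table, relabeled so that siblings sit at offset 2.

```python
import csv
import io
from statistics import mean

# Rounded subset of the first table: CPUs 0, 1, 8, 9 relabeled as 0, 1, 2, 3.
SAMPLE = """\
,,,
38.9,,,
16.5,38.1,,
38.3,16.6,38.6,
"""


def parse_lower_triangle(text):
    """Return {(row, col): latency_ns} for every non-empty CSV cell."""
    cells = {}
    for i, row in enumerate(csv.reader(io.StringIO(text))):
        for j, field in enumerate(row):
            if field.strip():
                cells[(i, j)] = float(field)
    return cells


def summarize(cells):
    """Minimum and maximum (with their core pair) plus the mean latency."""
    lo = min(cells, key=cells.get)
    hi = max(cells, key=cells.get)
    return {"min": (cells[lo], lo), "max": (cells[hi], hi),
            "mean": mean(cells.values())}


def split_smt(cells, offset):
    """Separate assumed sibling pairs (row - col == offset) from the rest."""
    siblings = {k: v for k, v in cells.items() if k[0] - k[1] == offset}
    others = {k: v for k, v in cells.items() if k[0] - k[1] != offset}
    return siblings, others


cells = parse_lower_triangle(SAMPLE)
siblings, others = split_smt(cells, offset=2)  # use offset=8 for the full table
print(summarize(cells))
print("same-core mean:", mean(siblings.values()))
print("cross-core mean:", mean(others.values()))
```

Reporting the cross-core mean separately keeps the very low same-core figures from pulling the overall mean down.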
Here is the output:
$ core-to-core-latency 5000 --csv > core-to-core-latency.csv
Num cores: 8
Using RDTSC to measure time: true
Num round trips per samples: 5000
Num samples: 300
Showing latency=round-trip-time/2 in nanoseconds:
0 1 2 3 4 5 6 7
0
1 30±1
2 28±0 24±0
3 26±0 25±0 26±0
4 9±0 26±0 27±0 26±0
5 26±0 9±0 25±0 25±0 25±0
6 28±0 26±0 9±0 26±0 26±0 27±0
7 27±0 25±0 24±0 9±0 25±0 27±0 26±0
Min latency: 8.6ns ±0.1 cores: (7,3)
Max latency: 30.1ns ±0.9 cores: (1,0)
Mean latency: 23.6ns
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 8
Model name: AMD Ryzen 7 2700X Eight-Core Processor
Stepping: 2
Frequency boost: enabled
CPU MHz: 2195.137
CPU max MHz: 3700.0000
CPU min MHz: 2200.0000
BogoMIPS: 7384.93
Virtualization: AMD-V
L1d cache: 256 KiB
L1i cache: 512 KiB
L2 cache: 4 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-15
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall sev_es fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
cargo r --release -- --csv 30000 1000
,,,,,,,,,,,,,,,
9.695149833333334,,,,,,,,,,,,,,,
23.5928803,23.522562833333335,,,,,,,,,,,,,,
24.05091153333333,23.41355246666667,9.635870583333332,,,,,,,,,,,,,
23.04669911666667,23.11287251666667,22.83039543333334,22.687446533333336,,,,,,,,,,,,
23.118875133333336,22.91969858333334,23.291542083333336,23.44123448333333,9.963011783333332,,,,,,,,,,,
25.293146316666668,25.01142891666666,25.27851326666666,25.578214900000003,25.791719716666666,23.188006216666665,,,,,,,,,,
24.118877516666668,23.827831250000003,23.766165516666664,23.481585849999995,22.738973916666662,22.810733250000002,9.770995033333335,,,,,,,,,
90.76126398333331,90.90688371666667,91.36600886666668,90.99760793333333,91.35973206666667,91.14332741666664,90.08372029999998,90.3882733,,,,,,,,
90.81394998333334,90.30204446666667,91.57619961666666,91.26440443333334,91.59661055000001,91.33303526666666,91.47008086666669,90.69783071666666,9.596331099999999,,,,,,,
91.58755506666665,91.07018534999999,92.31877234999999,91.9414395,92.16446978333335,92.16232756666665,92.04761993333332,91.84244329999999,23.66573896666666,23.626538566666667,,,,,,
91.65613635,91.41319086666667,91.73456071666669,92.38920034999998,92.07018608333334,91.9995428,92.06397171666667,92.00890575,23.526067833333336,23.491458083333335,9.680927033333333,,,,,
91.90827115,93.55635768333332,94.01794648333333,93.30121599999998,93.96605776666668,93.16514586666666,92.72822296666666,92.57134078333333,23.817887433333336,24.07579078333334,23.059072666666665,23.28516146666667,,,,
93.21118691666666,93.39425756666667,94.26788344999999,93.63034661666667,93.17328893333335,92.74321846666668,93.04481295000001,92.10396916666664,23.057428416666667,23.056802699999995,22.940711333333333,22.942693000000006,9.672013783333332,,,
91.90377184999998,91.82206086666666,92.21777709999999,92.10603530000003,92.54114394999999,94.01709023333333,93.94803913333331,93.36719343333337,24.838541133333333,25.00632791666667,23.857060783333328,23.5309576,22.95523396666667,23.13108455,,
91.99524846666667,91.73730216666667,92.42141029999999,92.35337893333333,93.06561018333333,92.75056995000001,93.21789880000003,93.37495071666666,24.127901299999998,24.433506933333334,23.65281048333333,23.45899496666667,23.226585366666672,23.02333983333333,9.742951099999999,