Disabling THP makes the issue go away:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
I can reproduce it much more reliably by setting
/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to 10.
Adding TLBI commands to the SMMU command queue (cmdq) also makes the issue go away.
Kernel:
https://github.com/Linaro/linux-kernel-uadk/tree/5.11-pthread
Linaro/linux-kernel-uadk@4a9f36f
echo Y > /sys/kernel/debug/sva_force_inval
Jean:
And when forcing TLB invalidations on the command queue (in addition to
DVM, by setting sva_force_inval to Y with the attached patch), the problem
disappears. So I think either the command queue TLBI adds enough
overhead to mask whatever race causes the issue, or the hardware
doesn't handle the TLBI from khugepaged properly (it should be a TLBI
ASIDE1IS, since we go through __flush_tlb_range() with a huge page, which
goes to flush_tlb_mm()).
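Why a huge page ends up as TLBI ASIDE1IS can be checked with a little arithmetic. Below is a simplified C model of the size check in arm64's __flush_tlb_range(), assuming 4 KiB pages, MAX_TLBI_OPS equal to PTRS_PER_PTE (512) as in arm64 kernels around v5.11, and that the khugepaged collapse path calls flush_tlb_range() with a PAGE_SIZE stride. Under those assumptions a 2 MiB range spans exactly 512 strides, so the per-page loop is skipped in favour of flush_tlb_mm(), i.e. a single ASID-wide TLBI. The function and macro names are hypothetical; only the size comparison is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model (assumption: arm64, 4 KiB pages, ~v5.11) of the check
 * in __flush_tlb_range() that chooses between per-page TLBIs and a full
 * flush_tlb_mm() (TLBI ASIDE1IS). Names are hypothetical.
 */
#define MODEL_PAGE_SIZE    4096UL
#define MODEL_PTRS_PER_PTE 512UL
#define MODEL_MAX_TLBI_OPS MODEL_PTRS_PER_PTE /* MAX_TLBI_OPS == PTRS_PER_PTE */

/* Alignment helpers; 'a' must be a power of two. */
static unsigned long round_down_pow2(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

static unsigned long round_up_pow2(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Returns true if flushing [start, end) would become a whole-ASID flush. */
bool range_flush_becomes_asid_flush(unsigned long start, unsigned long end,
				    unsigned long stride)
{
	assert(stride != 0 && (stride & (stride - 1)) == 0);
	start = round_down_pow2(start, stride);
	end = round_up_pow2(end, stride);
	return (end - start) >= MODEL_MAX_TLBI_OPS * stride;
}
```

With a PAGE_SIZE stride, a 2 MiB range hits the threshold exactly, while a 64 KiB range stays on the per-page path.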
Another thing: when building a kernel in parallel with the openssl command,
I see a lot of "internal compiler error" failures in the build, which looks like
memory corruption. This seems to confirm the stale TLB hypothesis: because
the SMMU doesn't invalidate its TLB properly, the device DMA-writes to old pages
that have been reallocated for the build.
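To make the stale TLB hypothesis concrete, here is a toy C model of a device-side translation cache. It is not how the SMMU is implemented: the structure, the slot hashing and all function names are hypothetical. It only shows why a lost invalidation lets DMA keep targeting a physical page that the kernel has already reused for something else.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy device IOTLB (hypothetical, illustration only). It caches VA -> PA
 * translations and only forgets one when an invalidation reaches it.
 */
#define TOY_ENTRIES 8

struct toy_iotlb_entry {
	bool valid;
	unsigned long va;
	unsigned long pa;
};

struct toy_iotlb {
	struct toy_iotlb_entry e[TOY_ENTRIES];
};

/* pt_pa is what a fresh page-table walk would return right now. */
unsigned long toy_translate(struct toy_iotlb *tlb, unsigned long va,
			    unsigned long pt_pa)
{
	struct toy_iotlb_entry *ent = &tlb->e[(va >> 12) % TOY_ENTRIES];

	if (ent->valid && ent->va == va)
		return ent->pa;   /* hit: may be stale if an invalidation was lost */
	ent->valid = true;        /* miss: walk the page table and fill the entry */
	ent->va = va;
	ent->pa = pt_pa;
	return pt_pa;
}

/* Models a TLBI that actually reaches the device. */
void toy_invalidate(struct toy_iotlb *tlb, unsigned long va)
{
	struct toy_iotlb_entry *ent = &tlb->e[(va >> 12) % TOY_ENTRIES];

	if (ent->valid && ent->va == va)
		ent->valid = false;
}
```

For example, after toy_translate(&tlb, va, 0x1000) fills an entry, a later call whose page-table result is 0x2000 (the page was collapsed or migrated) still returns 0x1000 until toy_invalidate(&tlb, va) is called, so the device would write into whatever now lives at 0x1000.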
Test running uadk and a kernel build concurrently:
Kernel build:
make clean; make -j4
uadk:
for ((i=0; i<100; i++))
do
    test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2
    echo $i
done
Kernel build output:
./arch/arm64/include/asm/rwonce.h:72:0: internal compiler error: Segmentation fault
./arch/arm64/include/asm/rwonce.h:72:0: internal compiler error: Aborted
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
gcc: internal compiler error: Segmentation fault (program as)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
scripts/Makefile.build:279: recipe for target 'kernel/rcu/update.o' failed
make[2]: *** [kernel/rcu/update.o] Error 4
make[2]: *** Deleting file 'kernel/rcu/update.o'
make[2]: *** Waiting for unfinished jobs....
CC kernel/irq/generic-chip.o
kernel/irq/devres.c:283:1: internal compiler error: Segmentation fault
uadk output:
performance test did not verify output!
*** Error in `test_hisi_hpre': double free or corruption (!prev): 0x0000ffff7c021fd0 ***
./uadk.sh: line 16: 12579 Aborted (core dumped) test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2
Shameer reported the same issue on a board without DVM on Jul 25, 2020.
Currently the issue happens on a board with DVM, but it requires a multi-threaded test.
The symptoms are the same, and the same workaround applies.
Copied from Shameer's earlier email:
Issue:
While running test_sva_perf on a D06 board, the zip device reports
random "Hardware Error"s, which result in the app/zip hanging.
Kernel: https://github.com/Linaro/linux-kernel-warpdrive.git uacce-devel-5.8
warpdrive: master
Test Script:
#!/bin/sh
a=0
evt=0x80
while [ "$a" -lt 14 ]
do
    echo $evt
    ./perf stat -e smmuv3_pmcg_140020/event=$evt/ ./test_sva_perf -s 2048000 -l 75000 -c 50 -v
    a=$(($a+1))
    evt=$(($evt+0x1))
done
[ 273.447475] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 273.464868] {2}[Hardware Error]: event severity: recoverable
[ 273.476773] {2}[Hardware Error]: Error 0, type: recoverable
[ 273.488670] {2}[Hardware Error]: section_type: PCIe error
[ 273.500387] {2}[Hardware Error]: version: 4.0
[ 273.509915] {2}[Hardware Error]: command: 0x0006, status: 0x0010
[ 273.522907] {2}[Hardware Error]: device_id: 0000:75:00.0
[ 273.534437] {2}[Hardware Error]: slot: 0
[ 273.543046] {2}[Hardware Error]: secondary_bus: 0x00
[ 273.553845] {2}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa250
[ 273.567751] {2}[Hardware Error]: class_code: 120000
[ 252.733975] hisi_zip 0000:75:00.0: AER: aer_status: 0x00000000, aer_mask: 0x00000000
[ 252.753282] hisi_zip 0000:75:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 252.777791] hisi_zip 0000:75:00.0: AER: aer_uncor_severity: 0x00000000
[ 252.825195] hisi_zip 0000:75:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found
[ 252.842205] hisi_zip 0000:75:00.0: zip_pre_in_data_err [error status=0x80] found
[ 252.857767] hisi_zip 0000:75:00.0: zip_com_inf_err [error status=0x100] found
[ 252.857769] hisi_zip 0000:75:00.0: zip_enc_inf_err [error status=0x200] found
[ 252.887741] hisi_zip 0000:75:00.0: zip_pre_out_err [error status=0x400] found
Debugging shows that this behaviour correlates with a large number of I/O page faults (IOPFs).
Normally the above test reports IOPFs in the hundreds, but when this error
happens the count goes up to millions.
Also, this was never reproduced on another D06 board running a BIOS that
enables DVM (Distributed Virtual Memory). This suggested that the
issue is probably related to SMMU TLB invalidations.
Further debugging and code review revealed that the current SMMUv3 SVA code
supports SVA only if the SMMUv3 has the BTM (Broadcast TLB Maintenance)
feature. The assumption appears to be that BTM support implies that DVM is
also enabled (this assumption still needs to be verified). But on our D06
board, even though the SMMU reports BTM support, DVM is only enabled with a
special BIOS.
Based on the above criterion (i.e., BTM means DVM is enabled for SVA), the mm
notifier invalidate_range() code path currently performs only ATC invalidations
and no SMMU TLB invalidations. This breaks on non-DVM platforms, where an
explicit TLBI is required here.
With the quick fix below, I no longer see any Hardware Error on my setup with
the above test script (around 100 iterations of test_sva_perf completed).
@@ -3697,6 +3699,10 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
 {
 	struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);
 
+	arm_smmu_tlb_inv_range(start, end - start + 1,
+			       PAGE_SIZE, false, smmu_mn->domain);
 	arm_smmu_atc_inv_domain(smmu_mn->domain, mm->pasid, start,
 				end - start + 1);
 	trace_smmu_mm_invalidate(mm->pasid, start, end);
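The reasoning behind both this quick fix and the sva_force_inval workaround can be summarized as a small decision model. This is a hypothetical sketch, not the SMMUv3 driver: struct inv_model, its fields and model_invalidate_range() are invented names. Note that dvm_enabled is exactly what the driver cannot observe: as this thread shows, the SMMU may report BTM while the firmware leaves DVM disabled, so relying on the BTM bit alone skips a TLBI that a non-DVM platform needs.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the invalidate_range() decision (not kernel code).
 * smmu_has_btm: the SMMU advertises BTM.
 * dvm_enabled:  firmware actually enabled DVM for this SMMU (not visible
 *               to the driver in the scenario described in this thread).
 * force_inval:  a debug knob in the spirit of sva_force_inval.
 */
struct inv_model {
	bool smmu_has_btm;
	bool dvm_enabled;
	bool force_inval;
	int tlbi_cmds;  /* explicit TLBI commands queued on the cmdq */
	int atc_cmds;   /* ATC invalidations issued */
};

/* CPU broadcast TLBIs only reach the SMMU if BTM and DVM are both in effect. */
static bool broadcast_reaches_smmu(const struct inv_model *m)
{
	return m->smmu_has_btm && m->dvm_enabled;
}

void model_invalidate_range(struct inv_model *m, unsigned long start,
			    unsigned long end)
{
	(void)start; /* the range is irrelevant to the decision modelled here */
	(void)end;

	if (m->force_inval || !broadcast_reaches_smmu(m))
		m->tlbi_cmds++;  /* like the quick fix: arm_smmu_tlb_inv_range() */
	m->atc_cmds++;           /* always: arm_smmu_atc_inv_domain() */
}
```

The quick fix above corresponds to always taking the TLBI branch, which is safe everywhere but adds command-queue traffic on boards where DVM already broadcasts the invalidation.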
Testing with a lightweight job: no I/O page faults, but the data is not correct.
- Make khugepaged scan more frequently:
echo 10 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
linaro@ubuntu:~$ sudo openssl speed -engine uadk -seconds 1 rsa2048
[sudo] password for linaro:
engine "uadk" set.
hisi sec init Kunpeng920!
Doing 2048 bits private rsa's for 1s: 3191 2048 bits private RSA's in 0.50s
Doing 2048 bits public rsa's for 1s: RSA verify failure
281473395044352:error:0407008A:rsa routines:RSA_padding_check_PKCS1_type_1:invalid padding:crypto/rsa/rsa_pk1.c:67:
-1 2048 bits public RSA's in 0.29s
OpenSSL 1.1.1a 20 Nov 2018
built on: Fri Mar 5 06:08:16 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000157s -0.290000s 6382.0 -3.4
- With the workaround enabled, the data is correct:
root@ubuntu:/sys/kernel/debug# echo 1 > sva_force_inval
linaro@ubuntu:~$ sudo openssl speed -engine uadk -seconds 1 rsa2048
engine "uadk" set.
hisi sec init Kunpeng920!
Doing 2048 bits private rsa's for 1s: 3186 2048 bits private RSA's in 0.49s
Doing 2048 bits public rsa's for 1s: 45560 2048 bits public RSA's in 0.91s
OpenSSL 1.1.1a 20 Nov 2018
built on: Fri Mar 5 06:08:16 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000154s 0.000020s 6502.0 50065.9
Found the reason: HPRE is connected to another SMMU, whose DVM is
not enabled by the BIOS :(
sudo busybox devmem 0x2001c0030 32
0x1 // error
0x9 // correct
With the updated BIOS, the stress test passes.
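A hypothetical helper for checking the value read back with busybox devmem before running stress tests. It relies only on the two values observed here, 0x1 (broken) and 0x9 (working). Treating bit 3 (0x8, the only bit that differs) as the deciding bit is an assumption inferred from those two samples, not documented behaviour of this register.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Classify the 32-bit value read with: busybox devmem 0x2001c0030 32
 * Assumption: bit 3, the only bit differing between the observed "error"
 * value 0x1 and the observed "correct" value 0x9, is the relevant one.
 * The real meaning of the bit is not documented in this thread.
 */
#define OBSERVED_BAD  0x1u
#define OBSERVED_GOOD 0x9u
#define SUSPECT_BIT   (OBSERVED_BAD ^ OBSERVED_GOOD) /* == 0x8 */

bool smmu_reg_looks_good(uint32_t val)
{
	return (val & SUSPECT_BIT) != 0;
}
```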