
Comments (5)

gaozhangfei commented on June 24, 2024
  1. disable thp, no such issue
    echo never > /sys/kernel/mm/transparent_hugepage/enabled

  2. I'm able to reproduce much more reliably by setting
    /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to 10

  3. add tlbi to cmdq, no such issue
    kernel:
    https://github.com/Linaro/linux-kernel-uadk/tree/5.11-pthread
    Linaro/linux-kernel-uadk@4a9f36f

"echo Y > /sys/kernel/debug/sva_force_inval"
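The two THP knobs from steps 1 and 2 can be wrapped in small helpers so they are easy to flip between runs. This is only a sketch: `SYSFS_ROOT` is a hypothetical override added here so the functions can be dry-run against a scratch directory, and the function names are not from the thread. On a real board the writes need root.

```shell
#!/bin/sh
# Sketch only: helpers for the THP settings mentioned above.
# SYSFS_ROOT is a hypothetical override for dry runs; leave it unset
# on a real system so the paths resolve under /sys.

thp_dir() {
    echo "${SYSFS_ROOT:-/sys}/kernel/mm/transparent_hugepage"
}

# Step 1: disable THP entirely (the issue was not seen in this mode).
thp_disable() {
    echo never > "$(thp_dir)/enabled"
}

# Step 2: make khugepaged scan more often so the issue reproduces sooner.
# $1: scan interval in milliseconds (defaults to 10, the value used above).
thp_scan_sleep() {
    echo "${1:-10}" > "$(thp_dir)/khugepaged/scan_sleep_millisecs"
}
```

For example, `thp_scan_sleep 10` before a test run, and `thp_disable` to confirm that the failure goes away.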

Jean:
And when forcing TLB invalidations on the command queue (in addition to
DVM, by setting sva_force_inval to Y with the attached patch) the problem
disappears. So I think either the command queue TLBI adds so much
overhead that it masks whatever race causes the issue, or the hardware
doesn't handle the TLBI from khugepaged properly (it should be a TLBI
ASIDE1IS, since we go through __flush_tlb_range() with a huge page, which
goes to flush_tlb_mm()).

Another thing, when building a kernel in parallel to the openssl command,
I see a lot of "internal compiler" failures in the build, looks like
memory corruption. This seems to confirm the stale TLB hypothesis: because
the SMMU doesn't invalidate the TLB properly, DMA writes to old pages that
have been reallocated for the build.


gaozhangfei commented on June 24, 2024

test with uadk & build kernel
kernel:
make clean; make -j4

uadk:
for((i=0; i<100; i++))
do
test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2
echo $i
done
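The loop above keeps going after a failed iteration. A variant that stops at the first failure and reports the iteration number can make it easier to match the failure against the kernel build log. This is a sketch; `repeat_until_fail` is a hypothetical helper name, not part of uadk.

```shell
#!/bin/sh
# Sketch only: run a command up to N times, stop at the first failure.
# $1: maximum number of iterations; remaining arguments: the command to run.
repeat_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        if ! "$@"; then
            echo "iteration $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max iterations passed"
}
```

Usage with the command from the loop above: `repeat_until_fail 100 test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2`.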

build kernel:
./arch/arm64/include/asm/rwonce.h:72:0: internal compiler error: Segmentation fault
./arch/arm64/include/asm/rwonce.h:72:0: internal compiler error: Aborted
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
gcc: internal compiler error: Segmentation fault (program as)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
scripts/Makefile.build:279: recipe for target 'kernel/rcu/update.o' failed
make[2]: *** [kernel/rcu/update.o] Error 4
make[2]: *** Deleting file 'kernel/rcu/update.o'
make[2]: *** Waiting for unfinished jobs....
CC kernel/irq/generic-chip.o
kernel/irq/devres.c:283:1: internal compiler error: Segmentation fault

uadk:
performance test did not verify output!
*** Error in `test_hisi_hpre': double free or corruption (!prev): 0x0000ffff7c021fd0 ***
./uadk.sh: line 16: 12579 Aborted (core dumped) test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2


gaozhangfei commented on June 24, 2024

Shameer reported the same issue on a board without DVM on Jul 25, 2020.
Currently the issue happens on a board with DVM, but it requires a multi-threaded test.
The phenomenon is the same and the same workaround can be used.

Copied from Shameer's earlier email:
Issue:

While running test_sva_perf on a D06 board, zip dev reports
random "Hardware Error" and results in app/zip hang.

Kernel: https://github.com/Linaro/linux-kernel-warpdrive.git uacce-devel-5.8
warpdrive: master

Test Script:

#!/bin/sh
a=0
evt=0x80
while [ "$a" -lt 14 ]
do
echo $evt
./perf stat -e smmuv3_pmcg_140020/event=$evt/ ./test_sva_perf -s 2048000 -l 75000 -c 50 -v
a=$(($a+1))
evt=$(($evt+0x1))
done

[ 273.447475] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 273.464868] {2}[Hardware Error]: event severity: recoverable
[ 273.476773] {2}[Hardware Error]: Error 0, type: recoverable
[ 273.488670] {2}[Hardware Error]: section_type: PCIe error
[ 273.500387] {2}[Hardware Error]: version: 4.0
[ 273.509915] {2}[Hardware Error]: command: 0x0006, status: 0x0010
[ 273.522907] {2}[Hardware Error]: device_id: 0000:75:00.0
[ 273.534437] {2}[Hardware Error]: slot: 0
[ 273.543046] {2}[Hardware Error]: secondary_bus: 0x00
[ 273.553845] {2}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa250
[ 273.567751] {2}[Hardware Error]: class_code: 120000
[ 252.733975] hisi_zip 0000:75:00.0: AER: aer_status: 0x00000000, aer_mask: 0x00000000
[ 252.753282] hisi_zip 0000:75:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 252.777791] hisi_zip 0000:75:00.0: AER: aer_uncor_severity: 0x00000000
[ 252.825195] hisi_zip 0000:75:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found
[ 252.842205] hisi_zip 0000:75:00.0: zip_pre_in_data_err [error status=0x80] found
[ 252.857767] hisi_zip 0000:75:00.0: zip_com_inf_err [error status=0x100] found
[ 252.857769] hisi_zip 0000:75:00.0: zip_enc_inf_err [error status=0x200] found
[ 252.887741] hisi_zip 0000:75:00.0: zip_pre_out_err [error status=0x400] found

Debugging shows that this behaviour correlates with a large number of IO page faults.
Normally the above test reports IOPFs in the hundreds, but when this error
happens the count goes up to millions.

Also, this was never reproduced on another D06 board which runs a BIOS that
enables DVM (Distributed Virtual Memory). This suggested that the
issue is probably related to SMMU TLB invalidations.

Further debugging/code review revealed that the current SMMUv3 SVA code
supports the SVA feature only if the SMMUv3 has the BTM
(Broadcast TLB Maintenance) feature. It looks like the assumption is
that BTM support means DVM is also enabled (need to verify that this assumption
is always true). But on our D06 board, even though the SMMU reports BTM support,
DVM is only enabled with a special BIOS.

Based on the above assumption (i.e., BTM means DVM is enabled for SVA), at present
the mm notifier --> invalidate_range() code path does only ATC invalidations
and no SMMU TLB invalidations. This will break on non-DVM
platforms, as we need an explicit TLBI here.

With the below quick fix, I am not seeing any Hardware Error now
(completed around 100 iterations of test_sva_perf runs) on my setup with
the above test script.

@@ -3697,6 +3699,10 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
 {
 	struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);

+	arm_smmu_tlb_inv_range(start, end - start + 1,
+			       PAGE_SIZE, false, smmu_mn->domain);
+
 	arm_smmu_atc_inv_domain(smmu_mn->domain, mm->pasid, start,
 				end - start + 1);
 	trace_smmu_mm_invalidate(mm->pasid, start, end);


gaozhangfei commented on June 24, 2024

Test with a lightweight job: no IO page faults, but the data is not correct

  1. thp scan more frequently
    echo 10 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs

linaro@ubuntu:~$ sudo openssl speed -engine uadk -seconds 1 rsa2048
[sudo] password for linaro:
engine "uadk" set.
hisi sec init Kunpeng920!
Doing 2048 bits private rsa's for 1s: 3191 2048 bits private RSA's in 0.50s
Doing 2048 bits public rsa's for 1s: RSA verify failure
281473395044352:error:0407008A:rsa routines:RSA_padding_check_PKCS1_type_1:invalid padding:crypto/rsa/rsa_pk1.c:67:
-1 2048 bits public RSA's in 0.29s
OpenSSL 1.1.1a 20 Nov 2018
built on: Fri Mar 5 06:08:16 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000157s -0.290000s 6382.0 -3.4

  2. add workaround, data is correct
    root@ubuntu:/sys/kernel/debug# echo 1 > sva_force_inval

linaro@ubuntu:~$ sudo openssl speed -engine uadk -seconds 1 rsa2048
engine "uadk" set.
hisi sec init Kunpeng920!
Doing 2048 bits private rsa's for 1s: 3186 2048 bits private RSA's in 0.49s
Doing 2048 bits public rsa's for 1s: 45560 2048 bits public RSA's in 0.91s
OpenSSL 1.1.1a 20 Nov 2018
built on: Fri Mar 5 06:08:16 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000154s 0.000020s 6502.0 50065.9
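The two runs above differ in two visible ways: the failing run prints "RSA verify failure" and reports a negative verify rate in the summary line. A small filter can flag either symptom when the benchmark is run in a loop. This is a sketch that assumes the summary line keeps the seven-field layout shown above; `check_rsa_speed` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: read `openssl speed ... rsa2048` output on stdin.
# Returns 0 if the output looks sane, 1 if a verify failure is reported,
# a rate in the summary line is not positive, or no summary line is found.
check_rsa_speed() {
    awk '
        /RSA verify failure/ { bad = 1 }
        # Summary line layout: rsa 2048 bits <sign>s <verify>s <sign/s> <verify/s>
        $1 == "rsa" && $3 == "bits" && NF == 7 {
            seen = 1
            if ($6 + 0 <= 0 || $7 + 0 <= 0) bad = 1
        }
        END {
            if (bad || !seen) exit 1
            exit 0
        }
    '
}
```

For example: `sudo openssl speed -engine uadk -seconds 1 rsa2048 2>&1 | check_rsa_speed || echo "corrupted result"`.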


gaozhangfei commented on June 24, 2024

Found the reason: hpre is connected to another SMMU, whose DVM is
not enabled by the BIOS :(.

sudo busybox devmem 0x2001c0030 32
0x1 // is error
0x9 // is correct
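The register check can be scripted so that a board with the old BIOS is caught before a long stress run. This is a sketch: the address and the two values are taken from the lines above and are specific to this board, the thread does not say what the individual bits mean, and `smmu_reg_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: classify a value read back from the register quoted above.
# 0x9 (correct) and 0x1 (error) are the only two values reported in the
# thread; the meaning of the individual bits is not stated there.
SMMU_REG_ADDR=0x2001c0030   # board-specific address from the thread
SMMU_REG_GOOD=0x9

# $1: register value as printed by devmem, e.g. 0x9 or 0x00000009
smmu_reg_ok() {
    [ "$(( $1 ))" -eq "$(( $SMMU_REG_GOOD ))" ]
}
```

Usage: `smmu_reg_ok "$(sudo busybox devmem $SMMU_REG_ADDR 32)" || echo "DVM looks disabled on this SMMU"`.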

With the updated BIOS, the stress test passes.

