Code Monkey home page Code Monkey logo

open-cas-linux's People

Contributors

adamseyda avatar anatol avatar anikielx avatar arutk avatar deixx avatar docentszachista avatar felipe-barajas avatar kamillepek avatar karolinavelkaja avatar katlapinka avatar kgierszx avatar klaudiaj avatar kmajzero avatar lausaa avatar lhongtong avatar liuhongtong avatar live4thee avatar mdziegie avatar michalwy avatar micrakow avatar mjasadi72 avatar mlyulko avatar mmichal10 avatar mmkaypl avatar oshchirs avatar ostrokrzew avatar pdebski21 avatar rafalste avatar robertbaldyga avatar soniapszczolinska avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

open-cas-linux's Issues

cas_io_cache1_1 Tainted in kernel case kernel panic

@mmichal10 @robertbaldyga we find opencas service case kernel panic in our product env,detail info as fellows,is exists some bugs in opencas?

  • os env
3.10.0-693.el7.x86_64
  • opencas version
[[email protected]:/root]
# casadm --version
+=========================+===================+
| Name                    |      Version      |
+=========================+===================+
| CAS Cache Kernel Module | 19.09.00.00001210 |
| CAS Disk Kernel Module  | 19.09.00.00001210 |
| CAS CLI Utility         | 19.09.00.00001210 |
+=========================+===================+

[[email protected]:/root]
# casadm -L
type    id   disk       status    write policy   device
cache   1    /dev/sdc   Running   wt             -
+core   1    /dev/sdb   Active    -              /dev/cas1-1

  • kernel logs
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405271] ------------[ cut here ]------------
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405297] WARNING: CPU: 10 PID: 39797 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405299] list_del corruption, ffff885fac756ff0->next is LIST_POISON1 (dead000000000100)
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405300] Modules linked in: fuse binfmt_misc cas_cache(OE) cas_disk(OE) tcp_diag inet_diag dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg pcspkr hpilo hpwdt ioatdma shpchp i2c_i801 lpc_ich dca ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq wmi acpi_power_meter ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper bnx2x syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel tg3 serio_raw hpsa mdio i2c_core ptp pps_core scsi_transport_sas libcrc32c
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405335] CPU: 10 PID: 39797 Comm: cas_io_cache1_1 Tainted: G        W  OE  ------------   3.10.0-693.el7.x86_64 #1
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405337] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 02/17/2017
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405338]  ffff882fbda7bc88 000000004fed9b41 ffff882fbda7bc38 ffffffff816a3d91
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405340]  ffff882fbda7bc78 ffffffff810879c8 00000035bda7bce0 ffff885fac756ff0
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405341]  ffff885fac756000 ffff885fb9c8b2e0 0000000000000282 ffff885fbbf12ab8
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405344] Call Trace:
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405352]  [<ffffffff816a3d91>] dump_stack+0x19/0x1b
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405357]  [<ffffffff810879c8>] __warn+0xd8/0x100
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405360]  [<ffffffff81087a4f>] warn_slowpath_fmt+0x5f/0x80
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405370]  [<ffffffffc05ba3de>] ? cas_rpool_try_get+0x5e/0x80 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405371]  [<ffffffff8133d963>] __list_del_entry+0x63/0xd0
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405373]  [<ffffffff8133d9dd>] list_del+0xd/0x30
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405377]  [<ffffffffc05ba3cf>] cas_rpool_try_get+0x4f/0x80 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405382]  [<ffffffffc05bf08d>] __cas_ctx_data_alloc+0xed/0x300 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405387]  [<ffffffffc05bf2b0>] cas_ctx_data_alloc+0x10/0x20 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405393]  [<ffffffffc05c8163>] _ocf_read_generic_do+0x1a3/0x2b0 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405397]  [<ffffffffc05c8348>] ocf_read_generic+0xd8/0x100 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405405]  [<ffffffffc05e0813>] ocf_io_handle+0x23/0x40 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405413]  [<ffffffffc05e0875>] ocf_queue_run_single+0x45/0x50 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405419]  [<ffffffffc05e08a8>] ocf_queue_run+0x28/0x50 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405424]  [<ffffffffc05b7d9a>] _cas_io_queue_thread+0xfa/0x150 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405429]  [<ffffffff810b1910>] ? wake_up_atomic_t+0x30/0x30
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405434]  [<ffffffffc05b7ca0>] ? cas_service_ioctl_ctrl+0x1bd0/0x1bd0 [cas_cache]
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405436]  [<ffffffff810b098f>] kthread+0xcf/0xe0
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405438]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405442]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405444]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Feb 28 09:46:14 10-193-69-43 kernel: [6872898.405446] ---[ end trace 808e8663e223caae ]---
Feb 28 09:46:14 10-193-69-43 k

Setting sequential cut off policy/threshold returns 0 status even for non existing cores/caches

For example for the following cache configuration:

type    id   disk             status    write policy   device
cache   1    /dev/nvme0n1p1   Running   wt             -

the following commands return exit code 0

casadm -X -n seq-cutoff -i 1 -j 50 -t 28
casadm -X -n seq-cutoff -i 1 -j 50 -p full
casadm -X -n seq-cutoff -i 5555 -j 50 -p full
casadm -X -n seq-cutoff -i 5555 -p full

When I add core this error still occurs (for different core ids than the one I added). As stated this behaviour occurs when setting per cache or per core.

Hung tasks during "casctl stop --flush" on v19.6

Setup:
RHEL 7.5
[root@gklab-108-180 mnt]# casadm -L
type id disk status write policy device
cache 1 /dev/nvme5n1 Running wb -
└core 1 /dev/md125 Flushing (85.9 %) - /dev/cas1-1

/dev/md125 is a IMSM raid5 array created upon four drives.
I created xfs fisystem on cas device and wrote around 500GB of data.

Flushing is in progress, but I observe ugly call traces:

[ 1080.224573] casadm D ffff9a08b5c8eeb0 0 4601 4596 0x00000080
[ 1080.231764] Call Trace:
[ 1080.234258] [] ? default_wake_function+0x12/0x20
[ 1080.240605] [] ? autoremove_wake_function+0x2b/0x40
[ 1080.247218] [] schedule+0x29/0x70
[ 1080.255648] [] schedule_timeout+0x239/0x2c0
[ 1080.264948] [] ? __wake_up+0x44/0x50
[ 1080.273635] [] ? cas_kick_queue_thread+0x23/0x30 [cas_cache]
[ 1080.284434] [] ? _cas_queue_kick+0xe/0x10 [cas_cache]
[ 1080.294586] [] wait_for_completion+0xfd/0x140
[ 1080.304014] [] ? wake_up_state+0x20/0x20
[ 1080.313025] [] _cache_mngt_core_flush_sync.isra.0+0x7f/0xb0 [cas_cache]
[ 1080.324791] [] _cache_mngt_remove_core_prepare+0x5d/0xc0 [cas_cache]
[ 1080.336989] [] cache_mngt_remove_core_from_cache+0xba/0x220 [cas_cache]
[ 1080.348797] [] ? vmalloc+0x52/0x60
[ 1080.357411] [] cas_service_ioctl_ctrl+0x1d7/0x1930 [cas_cache]
[ 1080.368563] [] ? enqueue_task_fair+0x208/0x6c0
[ 1080.378327] [] ? sched_clock_cpu+0x85/0xc0
[ 1080.387737] [] ? enqueue_task+0x57/0x80
[ 1080.396887] [] do_vfs_ioctl+0x350/0x560
[ 1080.405955] [] ? wake_up_new_task+0x122/0x190
[ 1080.415499] [] ? do_fork+0x107/0x320
[ 1080.424188] [] SyS_ioctl+0xa1/0xc0
[ 1080.432673] [] system_call_fastpath+0x1c/0x21
[ 1200.160372] INFO: task casadm:4601 blocked for more than 120 seconds.
[ 1200.170266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Clean core in inactive state could not be removed

Steps to reproduce:

  1. Start cache in WB mode and add core
  2. Write some data to CAS device.
  3. Flush cache.
  4. Stop cache.
  5. Unplug core device from OS.
  6. Load cache and check that core is in inactive state.
  7. Remove core without force option.

Actual status:
Inactive core without dirty data could not be removed without "force" option

Expected:
It should be possible to remove clean inactive core without "force" option

Unknown error when removing non-existing core

Reproduction:

  1. start cache
  2. add core
  3. try removing core with different id than added

Expected:
Info that core does not exist

Result:
Error while removing core device X from cache instance 1
Unknown error 1000015

Core block stats names inconsistency

Core block stats names differ slightly depending on using or not the '--ioclass-id' parameter:

 # casadm -P -i 1 -j 1 -f blk -d      # casadm -P -i 1 -j 1 -f blk
 Reads from core(s)                   Reads from core
 Writes to core(s)                    Writes to core
 Total to/from core(s)                Total to/from core
 Reads from exported object(s)        Reads from exported object
 Writes to exported object(s)         Writes to exported object
 Total to/from exported object(s)     Total to/from exported object

CAS 19.9 insert throughput low on kernel 4.13.9-300.fc27.x86_64

4K 100% read miss IOPS (with cache occupancy close to 0%) is order of magnitude lower on kernel 4.13 vs 5.3 :

CAS 19.6 + kernel 4.19 = ~50 KIOPS [expected]
CAS 19.9 + kernel 4.19 = ~27 KIOPS [too low]
CAS 19.9 + kernel 5.3 = ~450 KIOPS [expected]

Major increase in 19.9 vs 19.6 is expected as a result of locking optimizations. However on kernel 4.19 there is actually a decrease. Looks like all CAS threads are wasting time trying to acquire spin lock inside submit_bio (cache write in backfill engine). This looks like cache device queue saturation, but actual queue depth on the cache device as reported by iostat is only ~4 reqs.

Cache device queue settings:
[root@localhost open-cas-linux]# cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline
[root@localhost open-cas-linux]# cat /sys/block/nvme0n1/queue/nr_requests
1023

Reproduction:

# 500G cache, 100G core
casadm -S -i 1 -d /dev/nvme0n1 -c wb --force
modprobe null_blk queue_mode=2 gb=100 nr_devices=2 irqmode=0 submit_queues=16
casadm -A -i 1 -d /dev/nullb0
/root/fio/fio -ioengine=libaio -direct=1 -verify=0 -group_reporting=1 -time_based=1 -ramp_time=0 -rw=randread -bs=4k -norandommap=1 -numjobs=16 -iodepth=16 --percentile_list=50:90:99:99.9:99.99:99.999 -cpus_allowed=0-19 -cpus_allowed_policy=split -filename=/dev/cas1-1 -name=aa -runtime=30

Atached are flame graphs for CAS 19.9 + kernel 4.19.

image

image

Setting cleaning-acp params does not work

Trying to set one of the following (for any valid value)
casadm -X -n cleaning-acp -w 50 -i 1
casadm -X -n cleaning-acp -b 100 -i 1
gives the following output:

Error during options handling
Try `casadm --help | -H' for more information.

Kernel warning during nhit promotion policy operations

Kernel warning during nhit promotion policy operations:

[21563.552417] WARNING: CPU: 15 PID: 0 at block/blk-core.c:3255 blk_end_bidi_request+0x68/0x70
[21563.561406] Modules linked in: cas_cache(O) cas_disk(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink bpfilter nls_utf8 rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs lockd grace fscache sunrpc intel_rapl iosf_mbi iTCO_wdt ioatdma mxm_wmi iTCO_vendor_support sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm_intel i2c_i801 lpc_ich intel_pch_thermal ipmi_si ipmi_devintf ipmi_msghandler wmi pcc_cpufreq acpi_pad ixgbe igb mdio i2c_algo_bit ptp crc32c_intel nvme i2c_core pps_core nvme_core dca xhci_pci xhci_hcd
[21563.629400] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G O 4.19.56-t2.el7.twitter.x86_64 #1
[21563.639501] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1c 10/03/2016
[21563.648390] RIP: 0010:blk_end_bidi_request+0x68/0x70
[21563.653675] Code: 43 00 89 de 48 89 ef 49 89 c5 e8 63 fe ff ff 49 8b bc 24 c0 01 00 00 4c 89 ee e8 33 38 43 00 31 d2 5b 89 d0 5d 41 5c 41 5d c3 <0f> 0b eb b0 0f 1f 40 00 0f 1f 44 00 00 48 8b 07 48 83 b8 00 01 00
[21563.673317] RSP: 0018:ffff9c95e7bc3d68 EFLAGS: 00010286
[21563.678859] RAX: 0000000000000000 RBX: ffff9c95e44892d8 RCX: 0000000000000000
[21563.686341] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff9c95a1b417c0
[21563.693795] RBP: ffff9c95a1b417c0 R08: ffff9c95e1316a00 R09: 0000000000001000
[21563.701293] R10: ffff9c95e051ff98 R11: ffff9c95e1316450 R12: ffff9c95e3935658
[21563.708745] R13: ffff9c95e13163f8 R14: 0000000000000000 R15: 0000000000000000
[21563.716217] FS: 0000000000000000(0000) GS:ffff9c95e7bc0000(0000) knlGS:0000000000000000
[21563.724885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21563.730946] CR2: 00007fc2c8894630 CR3: 000000015a20a004 CR4: 00000000003606e0
[21563.738426] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21563.745874] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21563.753359] Call Trace:
[21563.756135]
[21563.758498] blk_end_request_all+0x1f/0x40
[21563.762919] block_dev_complete_rq+0x6f/0x90 [cas_cache]
[21563.768592] ? ocf_req_complete+0xeb/0x150 [cas_cache]
[21563.774070] ? _ocf_read_generic_miss_complete+0x75/0xd0 [cas_cache]
[21563.780744] ? ocf_submit_volume_req_cmpl+0x16/0x20 [cas_cache]
[21563.787000] ? cas_bd_io_end+0x37/0x50 [cas_cache]
[21563.792128] ? cas_bd_io_end_callback+0x5e/0xb0 [cas_cache]
[21563.798037] ? blk_update_request+0x78/0x2c0
[21563.802626] ? scsi_end_request+0x2c/0x240
[21563.807054] ? scsi_io_completion+0x88/0x610
[21563.811640] ? blk_done_softirq+0x84/0xa0
[21563.815980] ? __do_softirq+0xe2/0x2dd
[21563.820064] ? irq_exit+0xc5/0xd0
[21563.823691] ? do_IRQ+0x85/0xd0
[21563.827169] ? common_interrupt+0xf/0xf
[21563.831350]
[21563.833764] ? cpuidle_enter_state+0xa2/0x310
[21563.838468] ? cpuidle_enter_state+0x93/0x310
[21563.843163] ? do_idle+0x1b6/0x210
[21563.846885] ? cpu_startup_entry+0x5f/0x70
[21563.851339] ? start_secondary+0x182/0x1c0
[21563.855751] ? secondary_startup_64+0xa4/0xb0
[21563.860455] ---[ end trace 650fe3afc3dad17c ]---
[21563.865452] WARNING: CPU: 15 PID: 0 at block/blk-core.c:3200 blk_finish_request+0x11c/0x150
[21563.874422] Modules linked in: cas_cache(O) cas_disk(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink bpfilter nls_utf8 rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs lockd grace fscache sunrpc intel_rapl iosf_mbi iTCO_wdt ioatdma mxm_wmi iTCO_vendor_support sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm_intel i2c_i801 lpc_ich intel_pch_thermal ipmi_si ipmi_devintf ipmi_msghandler wmi pcc_cpufreq acpi_pad ixgbe igb mdio i2c_algo_bit ptp crc32c_intel nvme i2c_core pps_core nvme_core dca xhci_pci xhci_hcd
[21563.942349] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G W O 4.19.56-t2.el7.twitter.x86_64 #1
[21563.952448] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1c 10/03/2016
[21563.961327] RIP: 0010:blk_finish_request+0x11c/0x150
[21563.966608] Code: 6a 1a 00 00 48 8b 43 40 48 8d 53 40 48 39 c2 0f 84 4a ff ff ff 0f 0b 4c 89 ee 48 89 df e8 7c eb 00 00 8b 43 18 e9 1c ff ff ff <0f> 0b e9 07 ff ff ff 0f b6 43 14 83 e8 20 83 f8 03 0f 86 2c ff ff
[21563.986225] RSP: 0018:ffff9c95e7bc3d40 EFLAGS: 00010086
[21563.991768] RAX: 0000139c70051421 RBX: ffff9c95a1b417c0 RCX: 0000000000000018
[21563.999235] RDX: 000008a5cc704076 RSI: 002327a315fb43d7 RDI: ffffffff85234040
[21564.006687] RBP: ffff9c95e3935658 R08: 0000000000000001 R09: 0000000000000000
[21564.014157] R10: ffff9c95e051ea00 R11: 0000000000006500 R12: 0000000000000000
[21564.021610] R13: 0000139c70051421 R14: 0000000000000000 R15: 0000000000000000
[21564.029081] FS: 0000000000000000(0000) GS:ffff9c95e7bc0000(0000) knlGS:0000000000000000
[21564.037776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21564.043837] CR2: 00007fc2c8894630 CR3: 000000015a20a004 CR4: 00000000003606e0
[21564.051319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21564.058773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21564.066244] Call Trace:
[21564.069042]
[21564.071388] blk_end_bidi_request+0x4d/0x70
[21564.075884] blk_end_request_all+0x1f/0x40
[21564.080336] block_dev_complete_rq+0x6f/0x90 [cas_cache]
[21564.085990] ? ocf_req_complete+0xeb/0x150 [cas_cache]
[21564.091464] ? _ocf_read_generic_miss_complete+0x75/0xd0 [cas_cache]
[21564.098159] ? ocf_submit_volume_req_cmpl+0x16/0x20 [cas_cache]
[21564.104427] ? cas_bd_io_end+0x37/0x50 [cas_cache]
[21564.109573] ? cas_bd_io_end_callback+0x5e/0xb0 [cas_cache]
[21564.115491] ? blk_update_request+0x78/0x2c0
[21564.120092] ? scsi_end_request+0x2c/0x240
[21564.124535] ? scsi_io_completion+0x88/0x610
[21564.129142] ? blk_done_softirq+0x84/0xa0
[21564.133479] ? __do_softirq+0xe2/0x2dd
[21564.137560] ? irq_exit+0xc5/0xd0
[21564.141212] ? do_IRQ+0x85/0xd0
[21564.144673] ? common_interrupt+0xf/0xf
[21564.148818]
[21564.151271] ? cpuidle_enter_state+0xa2/0x310
[21564.155980] ? cpuidle_enter_state+0x93/0x310
[21564.160654] ? do_idle+0x1b6/0x210
[21564.164400] ? cpu_startup_entry+0x5f/0x70
[21564.168816] ? start_secondary+0x182/0x1c0
[21564.173244] ? secondary_startup_64+0xa4/0xb0
[21564.177915] ---[ end trace 650fe3afc3dad17d ]---

open-cas does not support dedicated cache

In Openstack environment, dozens of VMs shares one single cache. Some VMs may occupy too much cache and other VMs does not. It is not fair for the tenant of the VMs.

It would be great if open-cas can allocate a fixed size cache for a specific VM, the cache belongs only to this VM, cannot be shared.

ps: A VM may contains one or more core device.

Misleading error when running ./configure script without kernel-devel installed

When having kernel-headers package installed, but missing kernel-devel package Open CAS configure script shows misleading error:

# ./configure
Error: missing kernel headers

It would be expected for configure script to either shows error like 'missing kernel headers and/or kernel devel' or even better to distinguish between this two cases.

OS: RHEL 8.1
Kernel: 4.18.0-147.el8.x86_64

Kernel modules don't load

After following the admin guide installation instructions exactly, I'm finding that the kernel modules are not loading on reboot.

open-cas does not support Ubuntu 19.04 (kernel 5.0.0)

stack@ceph01liang: ~ / open-cas-linux$ sudo make install
cd modules && make install
make[1]: Entering directory '/opt/stack/open-cas-linux/modules'
Installing Open-CAS modules
modprobe: ERROR: could not insert 'cas_cache': Unknown symbol in module, or unknown parameter (see dmesg)
make[1]: *** [Makefile:58: install] Error 1
make[1]: Leaving directory '/opt/stack/open-cas-linux/modules'
make: *** [Makefile:15: modules] Error 2

stack@ceph01liang: ~ / open-cas-linux$ uname -r
5.0.0-29-generic


stack@ceph01liang:~/open-cas-linux$ dmesg | grep cas_disk
[91506.368989] cas_disk: loading out-of-tree module taints kernel.
[91506.369108] cas_disk: module verification failed: signature and/or required key missing - tainting kernel

Kernel warning during filesystem creation on exported device

No reproduction so far

[ 311.242921] ------------[ cut here ]------------
[ 311.248403] WARNING: CPU: 17 PID: 43868 at mm/page_alloc.c:3189 __alloc_pages_slowpath+0x6f/0x724
[ 311.258838] Modules linked in: cas_cache(OE) cas_disk(OE) brd xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc dm_mirror dm_region_hash dm_log dm_mod sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul iTCO_wdt glue_helper ablk_helper iTCO_vendor_support mei_me cryptd pcspkr sg joydev lpc_ich i2c_i801 mei ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crct10dif_common fb_sys_fops ttm nvme crc32c_intel isci igb ahci drm libsas scsi_transport_sas libahci ptp nvme_core libata pps_core drm_panel_orientation_quirks dca i2c_algo_bit [last unloaded: cas_disk]
[ 311.378189] CPU: 17 PID: 43868 Comm: mkfs.ext4 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
[ 311.391341] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 311.404378] Call Trace:
[ 311.407394] [] dump_stack+0x19/0x1b
[ 311.413423] [] __warn+0xd8/0x100
[ 311.419164] [] warn_slowpath_null+0x1d/0x20
[ 311.425977] [] __alloc_pages_slowpath+0x6f/0x724
[ 311.433289] [] __alloc_pages_nodemask+0x405/0x420
[ 311.440692] [] alloc_pages_current+0x98/0x110
[ 311.447702] [] __get_free_pages+0xe/0x40
[ 311.454238] [] kmalloc_order_trace+0x2e/0xa0
[ 311.461160] [] ? cas_rpool_try_get+0x5e/0x80 [cas_cache]
[ 311.469242] [] __kmalloc+0x211/0x230
[ 311.475374] [] ocf_req_alloc_map+0x3e/0x60 [cas_cache]
[ 311.483251] [] ocf_req_alloc_map_discard+0x57/0x60 [cas_cache]
[ 311.492118] [] ocf_core_volume_submit_discard+0xdd/0x270 [cas_cache]
[ 311.501563] [] ? ocf_core_io_allocator_new+0x1c/0x20 [cas_cache]
[ 311.510619] [] ? ocf_io_new+0x78/0xf0 [cas_cache]
[ 311.518037] [] ocf_volume_submit_discard+0x33/0x80 [cas_cache]
[ 311.526925] +0x1a/0x20 [cas_cache]
[ 311.535186] [] _blockdev_make_request_fast+0x23c/0x370 [cas_cache]
[ 311.544445] [] ? generic_make_request_checks+0x27f/0x390
[ 311.552525] [] _casdsk_exp_obj_make_rq_fn+0xa3/0x1d0 [cas_disk]
[ 311.561478] [] generic_make_request+0x147/0x380
[ 311.568687] [] submit_bio+0x70/0x150
[ 311.574828] [] ? bio_alloc_bioset+0x115/0x310
[ 311.581849] [] blkdev_issue_discard+0x1fa/0x2d0
[ 311.589071] [] blk_ioctl_discard+0xe5/0x130
[ 311.595896] [] blkdev_ioctl+0x632/0xa20
[ 311.602313] [] block_ioctl+0x41/0x50
[ 311.608442] [] do_vfs_ioctl+0x3a0/0x5a0
[ 311.614860] [] ? vfs_write+0x168/0x1f0
[ 311.621179] [] SyS_ioctl+0xa1/0xc0
[ 311.627121] [] ? system_call_after_swapgs+0xa2/0x146
[ 311.634802] [] system_call_fastpath+0x22/0x27
[ 311.641809] [] ? system_call_after_swapgs+0xae/0x146
[ 311.649509] ---[ end trace 946d2bba3d539123 ]---

Kernel bug when starting cache with insufficient amount of RAM

open-cas-linux commit: 3812015

The following is printed to dmesg when attempting to start cache with not enough RAM to store cache metadata:

[ 48.756416] 'OCF Core' volume operations registered
[ 48.782582] 'Block Device' volume operations registered
[ 48.784876] 'Atomic Writes NVMe' volume operations registered
[ 48.790116] [Open-CAS] Not found configuration for upgrade. Standard module initialization.
[ 48.794881] Open Cache Acceleration Software Linux Version 19.06.00.00000827 (3.10.0-693.el7.x86_64)::Module loaded successfully
[ 51.415805] Inserting cache cache1
[ 51.418145] Thread ocf_metadata_updater_cache1 started
[ 51.418149] cache1: Metadata initialized
[ 51.418323] cache1: Successfully added
[ 51.420240] cache1: Cache mode : wt
[ 51.422473] Thread cas_io_cache1_0 started
[ 51.422688] Thread cas_io_cache1_1 started
[ 51.422890] Thread cas_io_cache1_2 started
[ 51.423122] Thread cas_io_cache1_3 started
[ 51.423235] Thread cas_io_cache1_4 started
[ 51.423386] Thread cas_io_cache1_5 started
[ 51.423500] Thread cas_io_cache1_6 started
[ 51.423508] Thread cas_io_cache1_7 started
[ 51.423566] Thread cas_io_cache1_8 started
[ 51.423719] Thread cas_io_cache1_9 started
[ 51.423875] Thread cas_io_cache1_10 started
[ 51.423971] Thread cas_io_cache1_11 started
[ 51.424225] Thread cas_io_cache1_12 started
[ 51.424353] Thread cas_io_cache1_13 started
[ 51.424501] Thread cas_io_cache1_14 started
[ 51.424612] Thread cas_io_cache1_15 started
[ 51.424612] Thread cas_io_cache1_-1 started
[ 51.424612] [Open-CAS] [Classifier] Initialized IO classifier
[ 51.427412] cache1: Not enough free RAM for cache metadata to start cache
[ 51.430615] cache1: Available RAM: 32356962304 B
[ 51.432839] cache1: Needed RAM: 63837936025 B
[ 51.434943] cache1: Attaching cache device failed
[ 51.437311] BUG: unable to handle kernel paging request at ffffc90003ad9000
[ 51.438272] IP: [] ocf_volume_init+0x2b/0x110 [cas_cache]
[ 51.438272] PGD 17fc62067 PUD 17fc63067 PMD 174697067 PTE 0
[ 51.438272] Oops: 0002 [#1] SMP
[ 51.438272] Modules linked in: cas_cache(OE) null_blk ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter xfs libcrc32c cas_disk(OE) sg ppdev joydev pcspkr parport_pc parport i2c_piix4 ip_tables btrfs xor raid6_pq sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_common ata_generic virtio_blk pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata e1000 virtio_pci floppy
[ 51.438272] virtio_ring i2c_core serio_raw virtio [last unloaded: cas_cache]
[ 51.438272] CPU: 7 PID: 6027 Comm: casadm Tainted: G OE ------------ 3.10.0-693.el7.x86_64 #1
[ 51.438272] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 51.438272] task: ffff8807e92e3f40 ti: ffff8800bb614000 task.ti: ffff8800bb614000
[ 51.438272] RIP: 0010:[] [] ocf_volume_init+0x2b/0x110 [cas_cache]
[ 51.438272] RSP: 0018:ffff8800bb617c70 EFLAGS: 00010286
[ 51.438272] RAX: ffffffffc05c3200 RBX: ffffc90003ad9000 RCX: 0000000000000000
[ 51.438272] RDX: ffff8800bb617d70 RSI: ffff8807f3b5ba00 RDI: 0000000000000040
[ 51.438272] RBP: ffff8800bb617c90 R08: ffff8800bb617c90 R09: 0000000000000000
[ 51.438272] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003cfa000
[ 51.438272] R13: ffffc90003ad9000 R14: 00007fff7c09f710 R15: ffffc90003a99020
[ 51.438272] FS: 00007f39dd745740(0000) GS:ffff88081f3c0000(0000) knlGS:0000000000000000
[ 51.438272] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 51.438272] CR2: ffffc90003ad9000 CR3: 0000000035ec4000 CR4: 00000000000006e0
[ 51.438272] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 51.438272] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 51.438272] Stack:
[ 51.438272] ffff8800bb617d70 ffffc90003cfa000 ffffc90003ad9000 00007fff7c09f710
[ 51.438272] ffff8800bb617cc8 ffffffffc05afd5f ffffc90003a98000 ffff8808147c8600
[ 51.438272] ffff8800bb617d70 00007fff7c09f710 0000000000000001 ffff8800bb617d48
[ 51.438272] Call Trace:
[ 51.438272] [] ocf_mngt_get_ram_needed+0x5f/0x100 [cas_cache]
[ 51.438272] [] cache_mngt_init_instance+0x22c/0x430 [cas_cache]
[ 51.438272] [] cas_service_ioctl_ctrl+0x16d0/0x1ad0 [cas_cache]
[ 51.438272] [] ? x2apic_send_IPI_mask+0x13/0x20
[ 51.438272] [] ? native_smp_send_reschedule+0x4c/0x70
[ 51.438272] [] ? resched_curr+0xa8/0xc0
[ 51.438272] [] do_vfs_ioctl+0x33d/0x540
[ 51.438272] [] ? file_has_perm+0x9f/0xb0
[ 51.438272] [] SyS_ioctl+0xa1/0xc0
[ 51.438272] [] ? SyS_clone+0x16/0x20
[ 51.438272] [] system_call_fastpath+0x16/0x1b
[ 51.438272] Code: 0f 1f 44 00 00 55 48 85 f6 48 89 e5 41 56 41 55 41 54 53 48 8b 06 48 89 fb 8b 78 0c 0f 84 96 00 00 00 48 85 db 0f 84 8d 00 00 00 <48> 89 33 c6 43 18 00 be d0 80 00 00 49 89 d4 41 89 cd e8 5e 1e
[ 51.438272] RIP [] ocf_volume_init+0x2b/0x110 [cas_cache]
[ 51.438272] RSP
[ 51.438272] CR2: ffffc90003ad9000

Flush operation cannot be interrupted

Can't stop the core flushing process with SIGINT. When running in background also can't kill the process with SIGKILL or SIGTERM although the message that the process is terminated or killed is displayed. Situation with cache flush looks similar.

seq-cutoff: 4K seq read performance is much worse than core device's when all the read requests are 100% cache hit and cache device is 80% dirty

Test Config:

  • OS: RHEL7.5
  • CAS: v20.01 release
  • Cache: NVMe/5GB
  • Core: HDD/50GB
  • Write back
  • nop cleaning policy

Reproduce Steps:

  1. 80% fill up cache device with 4K seq write.
    fio --filename=/dev/cas1-1 --direct=1 --thread --rw=write --bs=4k --size=4G --iodepth=128 --ioengine=libaio --numjobs=1 --group_reporting --name=4k_seq_write

  2. Enable seq-cutoff, threshold=4KB
    casadm -X -n seq-cutoff -p always -i 1 -j 1 -t 4

  3. Reset CAS statistic counters
    casadm -Z -i 1

  4. 4K seq read
    fio --filename=/dev/cas1-1 --direct=1 --thread --rw=read --bs=4k --size=4G --iodepth=128 --ioengine=libaio --numjobs=1 --group_reporting --name=4k_seq_read

** Expected Result **
Similar 4K seq read performance as cache device's, because 4K seq read requests should be 100% cache hit. Even it is not 100% cache hit, it should equal to core device's performance.

Observed symptoms:

  1. 4K seq read performance is bad. cas1-1(BW) = 15MB/s, HDD(BW) = 217MB/s.
  2. sequential 4K read requests are not cut off. According to iostat and below statistics, there are R/W requests on core device.
  3. According to CAS statistics, all the dirty cache lines are changed to clean, which means CAS flushes all those dirty cache lines when read hit and performing pass-thru requests (seq-cutoff).

** CAS IO statistics **
casadm -P -i 1
Cache Id 1
Cache Size 1290176 [4KiB Blocks] / 4.92 [GiB]
Cache Device /dev/nvme2n1p1
Core Devices 1
Inactive Core Devices 0
Write Policy wb
Eviction Policy lru
Cleaning Policy nop
Promotion Policy always
Cache line size 4 [KiB]
Metadata Memory Footprint 84.4 [MiB]
Dirty for 13744 [s] / 3 [h] 49 [m] 4 [s]
Metadata Mode normal
Status Running

╔══════════════════╤═════════╤══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪═════════╪══════╪═════════════╣
║ Occupancy │ 1048705 │ 81.3 │ 4KiB Blocks ║
║ Free │ 241471 │ 18.7 │ 4KiB Blocks ║
║ Clean │ 1048704 │ 81.3 │ 4KiB Blocks ║
║ Dirty │ 1 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧═════════╧══════╧═════════════╝

╔══════════════════════╤═════════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═════════╪═══════╪══════════╣
║ Read hits │ 1 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 1 │ 0.0 │ Requests ║
╟──────────────────────┼─────────┼───────┼──────────╢
║ Write hits │ 0 │ 0.0 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 0 │ 0.0 │ Requests ║
║ Write total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼─────────┼───────┼──────────╢
║ Pass-Through reads │ 1048575 │ 100.0 │ Requests ║
║ Pass-Through writes │ 0 │ 0.0 │ Requests ║
║ Serviced requests │ 1 │ 0.0 │ Requests ║
╟──────────────────────┼─────────┼───────┼──────────╢
║ Total requests │ 1048576 │ 100.0 │ Requests ║
╚══════════════════════╧═════════╧═══════╧══════════╝

╔══════════════════════════════════╤═════════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═════════╪═══════╪═════════════╣
║ Reads from core(s) │ 1048575 │ 50.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 1048575 │ 50.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 2097150 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼─────────┼───────┼─────────────╢
║ Reads from cache │ 1048576 │ 100.0 │ 4KiB Blocks ║
║ Writes to cache │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from cache │ 1048576 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼─────────┼───────┼─────────────╢
║ Reads from exported object(s) │ 1048576 │ 100.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 1048576 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═════════╧═══════╧═════════════╝

╔════════════════════╤═══════╤═════╤══════════╗
║ Error statistics │ Count │ % │ Units ║
╠════════════════════╪═══════╪═════╪══════════╣
║ Cache read errors │ 0 │ 0.0 │ Requests ║
║ Cache write errors │ 0 │ 0.0 │ Requests ║
║ Cache total errors │ 0 │ 0.0 │ Requests ║
╟────────────────────┼───────┼─────┼──────────╢
║ Core read errors │ 0 │ 0.0 │ Requests ║
║ Core write errors │ 0 │ 0.0 │ Requests ║
║ Core total errors │ 0 │ 0.0 │ Requests ║
╟────────────────────┼───────┼─────┼──────────╢
║ Total errors │ 0 │ 0.0 │ Requests ║
╚════════════════════╧═══════╧═════╧══════════╝

Kernel bug occured during cache stop

Kernel bug occured during cache stop:

[ 4799.934072] Cache cache1 successfully stopped
[ 4799.939197] [Open-CAS] [Classifier] Deinitialized IO classifier
[ 4799.946112] Thread cas_io_cache1_-1 stopped
[ 4799.951269] Thread ocf_metadata_updater_cache1 stopped
[ 4799.957307] ------------[ cut here ]------------
[ 4799.962688] kernel BUG at /root/open-cas-linux/modules/cas_cache/src/ocf/utils/utils_refcnt.c:19!
[ 4799.973045] invalid opcode: 0000 [#1] SMP
[ 4799.977873] Modules linked in: cas_cache(OE) cas_disk(OE) brd nls_utf8 isofs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache loop sunrpc dm_mirror dm_region_hash dm_log dm_mod sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt cryptd iTCO_vendor_support mei_me joydev sg pcspkr mei lpc_ich wmi i2c_i801 ioatdma ipmi_si ipmi_devintf ipmi_msghandler ip_tables ext4 mbcache jbd2 sd_mod
[ 4800.060336] crc_t10dif crct10dif_generic drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul crct10dif_common ttm isci crc32c_intel igb ahci libsas nvme drm scsi_transport_sas libahci ptp nvme_core libata pps_core drm_panel_orientation_quirks dca i2c_algo_bit [last unloaded: cas_disk]
[ 4800.095851] CPU: 0 PID: 103074 Comm: casadm Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
[ 4800.108634] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 4800.121629] task: ffff9608cffa1040 ti: ffff960760804000 task.ti: ffff960760804000
[ 4800.130406] RIP: 0010:[] [] ocf_refcnt_dec+0x45/0x50 [cas_cache]
[ 4800.141184] RSP: 0018:ffff960760807c18 EFLAGS: 00010297
[ 4800.147350] RAX: 00000000ffffffff RBX: ffffa4acc49dc000 RCX: dead000000000200
[ 4800.155552] RDX: ffff960760807ca0 RSI: 00000000fff0bdbb RDI: ffffa4acc49dc5d0
[ 4800.163748] RBP: ffff960760807c30 R08: ffffa4accc420000 R09: 00000001802a0016
[ 4800.171958] R10: 00000000f6158001 R11: ffffe732c0d85600 R12: 00000000fff0bdbb
[ 4800.180156] R13: ffff960760807ca0 R14: ffffa4accc420000 R15: ffffa4acc4ae4b18
[ 4800.188374] FS: 00007f3a2e465740(0000) GS:ffff960aee000000(0000) knlGS:0000000000000000
[ 4800.197843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4800.204475] CR2: 00007f37133dc0a0 CR3: 00000000ae0c6000 CR4: 00000000000607f0
[ 4800.212663] Call Trace:
[ 4800.215618] [] ? ocf_mngt_cache_put+0x20/0x60 [cas_cache]
[ 4800.223716] [] _ocf_mngt_cache_lock_complete+0x85/0x90 [cas_cache]
[ 4800.232887] [] _ocf_async_lock_run_waiters+0x4b/0x70 [cas_cache]
[ 4800.241865] [] ocf_async_lock_deinit+0x8d/0xb0 [cas_cache]
[ 4800.250060] [] ocf_mngt_cache_put+0x3b/0x60 [cas_cache]
[ 4800.257974] [] ocf_queue_put+0x49/0x60 [cas_cache]
[ 4800.265398] [] cache_mngt_exit_instance+0x177/0x260 [cas_cache]
[ 4800.274274] [] cas_service_ioctl_ctrl+0x1039/0x1e50 [cas_cache]
[ 4800.283166] [] ? sched_slice.isra.58+0x66/0xc0
[ 4800.290205] [] ? __enqueue_entity+0x78/0x80
[ 4800.296945] [] ? enqueue_entity+0x2ef/0xbe0
[ 4800.303690] [] ? cpumask_next_and+0x35/0x50
[ 4800.310423] [] ? enqueue_task_fair+0x208/0x6c0
[ 4800.317455] [] ? sched_clock_cpu+0x85/0xc0
[ 4800.324106] [] do_vfs_ioctl+0x3a0/0x5a0
[ 4800.330455] [] ? wake_up_new_task+0x131/0x1a0
[ 4800.337389] [] ? do_fork+0x107/0x320
[ 4800.343463] [] SyS_ioctl+0xa1/0xc0
[ 4800.349348] [] ? system_call_after_swapgs+0xa2/0x146
[ 4800.356957] [] system_call_fastpath+0x22/0x27
[ 4800.363882] [] ? system_call_after_swapgs+0xae/0x146
[ 4800.371487] Code: 75 0e b8 01 00 00 00 f0 0f b1 5f 08 85 c0 75 06 89 d8 5b 5d c3 90 48 8b 57 18 48 8b 47 10 48 89 d7 e8 d0 50 6e ca 89 d8 5b 5d c3 <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 53
[ 4800.394090] RIP [] ocf_refcnt_dec+0x45/0x50 [cas_cache]
[ 4800.402005] RSP 

Kernel WARNING during IO

OS: Fedora 31
Commit: 9c7b485
LOG:
Invalid sequential cutoff policy!
Dec 18 07:44:38 localhost.localdomain kernel: WARNING: CPU: 3 PID: 2699 at /root/open-cas-linux/modules/cas_cache/src/ocf/engine/cache_engine.c:194 ocf_seq_cutoff_check+0x35/0x70 [cas_cache]
Dec 18 07:44:38 localhost.localdomain kernel: Modules linked in: cas_cache(OE) cas_disk(OE) xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm sb_edac snd_timer snd crct10dif_pclmul crc32_pclmul soundcore ghash_clmulni_intel joydev i2c_piix4 vboxguest intel_rapl_perf vmwgfx drm_kms_helper crc32c_intel ttm drm e1000 serio_raw ata_generic pata_acpi video [last unloaded: cas_disk]
Dec 18 07:44:38 localhost.localdomain kernel: CPU: 3 PID: 2699 Comm: systemd-udevd Tainted: G W OE 5.2.16-200.fc30.x86_64 #1
Dec 18 07:44:38 localhost.localdomain kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Dec 18 07:44:38 localhost.localdomain kernel: RIP: 0010:ocf_seq_cutoff_check+0x35/0x70 [cas_cache]
Dec 18 07:44:38 localhost.localdomain kernel: Code: 55 49 89 cd 41 54 49 89 d4 55 48 89 fd 53 89 f3 e8 b0 75 00 00 85 c0 74 1f 83 f8 02 74 0e 48 c7 c7 f8 cc 97 c0 e8 9a bb 78 db <0f> 0b 5b 44 89 f0 5d 41 5c 41 5d 41 5e c3 39 9d f0 00 00 00 75 ec
Dec 18 07:44:38 localhost.localdomain kernel: RSP: 0018:ffffb36549f4b868 EFLAGS: 00010286
Dec 18 07:44:38 localhost.localdomain kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Dec 18 07:44:38 localhost.localdomain kernel: RDX: 0000000000000001 RSI: ffffffff9dbd9641 RDI: 0000000000000246
Dec 18 07:44:38 localhost.localdomain kernel: RBP: ffffb3654a1c9c50 R08: 000002620e5cf242 R09: 0000000000000021
Dec 18 07:44:38 localhost.localdomain kernel: R10: ffffb36549f4b6b8 R11: ffffb36549f4b6bd R12: 0000004fffff9000
Dec 18 07:44:38 localhost.localdomain kernel: R13: 0000000000005000 R14: 0000000000000000 R15: 0000000000000040
Dec 18 07:44:38 localhost.localdomain kernel: FS: 00007fe68beb8940(0000) GS:ffff8fc40f780000(0000) knlGS:0000000000000000
Dec 18 07:44:38 localhost.localdomain kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 07:44:38 localhost.localdomain kernel: CR2: 00005576758ec000 CR3: 00000003fc024001 CR4: 00000000000606e0
Dec 18 07:44:38 localhost.localdomain kernel: Call Trace:
Dec 18 07:44:38 localhost.localdomain kernel: ? ocf_get_effective_cache_mode+0x81/0xd0 [cas_cache]
Dec 18 07:44:38 localhost.localdomain kernel: ? ocf_core_submit_io_mode+0x453/0x490 [cas_cache]
Dec 18 07:44:38 localhost.localdomain kernel: ? cas_rpool_try_get+0x94/0xa0 [cas_cache]
Dec 18 07:44:38 localhost.localdomain kernel: ? env_allocator_new+0x49/0xa0 [cas_cache]
Dec 18 07:44:38 localhost.localdomain kernel: ? _block_dev_queue_request+0x836/0xad0 [cas_cache]
Dec 18 07:44:38 localhost.localdomain kernel: ? wbt_track+0x21/0x30
Dec 18 07:44:38 localhost.localdomain kernel: ? blk_add_rq_to_plug+0x27/0x70
Dec 18 07:44:38 localhost.localdomain kernel: ? _casdsk_exp_obj_queue_rq+0x59/0x90 [cas_disk]
Dec 18 07:44:38 localhost.localdomain kernel: ? __blk_mq_try_issue_directly+0x121/0x1c0
Dec 18 07:44:38 localhost.localdomain kernel: ? blk_mq_request_issue_directly+0x48/0xe0
Dec 18 07:44:38 localhost.localdomain kernel: ? generic_make_request+0xcf/0x320
Dec 18 07:44:38 localhost.localdomain kernel: ? blk_mq_try_issue_list_directly+0x51/0xc0
Dec 18 07:44:38 localhost.localdomain kernel: ? blk_mq_sched_insert_requests+0xa7/0xf0
Dec 18 07:44:38 localhost.localdomain kernel: ? blk_mq_flush_plug_list+0x214/0x2b0
Dec 18 07:44:38 localhost.localdomain kernel: ? blk_flush_plug_list+0xec/0x110
Dec 18 07:44:38 localhost.localdomain kernel: ? blk_finish_plug+0x21/0x2e
Dec 18 07:44:38 localhost.localdomain kernel: ? read_pages+0x80/0x1b0
Dec 18 07:44:38 localhost.localdomain kernel: ? 0xffffffff9c000000
Dec 18 07:44:38 localhost.localdomain kernel: ? __do_page_cache_readahead+0x168/0x1d0
Dec 18 07:44:38 localhost.localdomain kernel: ? force_page_cache_readahead+0x93/0x100
Dec 18 07:44:38 localhost.localdomain kernel: ? generic_file_read_iter+0x85d/0xe60
Dec 18 07:44:38 localhost.localdomain kernel: ? file_fdatawait_range+0x20/0x20
Dec 18 07:44:38 localhost.localdomain kernel: ? new_sync_read+0x12a/0x1c0
Dec 18 07:44:38 localhost.localdomain kernel: ? vfs_read+0x9d/0x150
Dec 18 07:44:38 localhost.localdomain kernel: ? ksys_read+0x5f/0xe0
Dec 18 07:44:38 localhost.localdomain kernel: ? do_syscall_64+0x5f/0x1a0
Dec 18 07:44:38 localhost.localdomain kernel: ? page_fault+0x8/0x30
Dec 18 07:44:38 localhost.localdomain kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 18 07:44:38 localhost.localdomain kernel: ---[ end trace 0fb21fbcffa4e526 ]---

Using ALRU with big flush-max-buffers param slows down management commands

Description

When using ALRU cleaning policy and setting big values for flush-max-buffers param completing even the simplest management commands takes very long time.

MicrosoftTeams-image

Reproduction steps

  1. Setup cache+core pair, with HDD as a core (the slower the core device, the worse its gets)
  2. Set WB mode for cache and max flush-max-buffers param
  3. Fill the cache with dirty data
  4. Wait for cleaning to start
  5. Issue any management command e.g. -P, -L multiple times
  6. Observe completion time for commands

Expected results

Commands complete instantaneously.

Observed results

Commands can take very long time to complete (depending on when in relation to cleaning iteration starts you issue them)

Discarding whole cache device failed

When starting cache following error shows up in dmesg:
[ 264.001894] cache1: Discarding whole cache device failed
and the cache device is not discarded (not even attempted to discard), which has some performance impact.

It looks that error is thrown in the first iteration of discard loop within ocf_submit_volume_discard() function. The reason is that ocf_volume_new_io() return NULL since the 'bytes' parameter is not sector size (512 bytes) aligned. This is due to setting max_length to (uint32_t)~0, so all the 32 bits are set and the number is not divisible by 512.

How to uninstall open cas ?

I was trying opencas on rhel 7.6. I would like to build an rpm with the list of cas related binaries, configuration files. Unfortunately , I was not able to find the list of files to put them all together.Below are the list of files I was able to find:
$) ll /sbin/cas*
-rwxr-xr-x 1 root root 192112 Sep 16 14:19 /sbin/casadm
lrwxrwxrwx 1 root root 19 Sep 16 14:19 /sbin/casctl -> /lib/opencas/casctl
$) ll /lib/opencas/
total 40
-rwxr-xr-x 1 root root 4651 Sep 16 14:19 casctl
-rwxr-xr-x 1 root root 1736 Sep 16 14:19 open-cas-loader
-rwxr-xr-x 1 root root 815 Sep 16 14:19 open-cas-mount-utility
-rw-r--r-- 1 root root 23408 Sep 16 14:19 opencas.py

Kernel oops on modprobe cas_cache

I'm getting the following kernel oops when loading cas_cache:

[ 76.666811] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 76.676546] PGD 0 P4D 0
[ 76.679676] Oops: 0002 [#1] PREEMPT SMP [ 76.706648] Workqueue: events _cas_rpool_pre_alloc_do [cas_cache]
[ 76.714251] RIP: 0010:_cas_rpool_pre_alloc_do+0x54/0xa0 [cas_cache]
[ 76.722043] Code: 48 c1 e3 04 49 03 5 GS:ffff88a6e4300000(0000) knlGS:0000000000000000
[ 76.806438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 76.813597] CR2: 0000000000000000 CR3: 000000002a012002 CR4: 00000000003606e0
[ 76.822470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 76.831355] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 76.840252] Call Trace:
[ 76.843316] process_one_work+0x1d5/0x316
[ 76.848329] ? create_worker+0x1dc/0x1dc
[ 76.853214] worker_thread+0x1cd/0x2bf
[ 76.857848] ? create_worker+0x1dc/0x1dc
[ 76.862698] kthread+0x14f/0x157

I'm using open-cas-linux on v4.19.80 kernel and configured the cache device to be an NVMe optane drive.

My open-cas-linux sha1 is 06b3e69

Kernel BUG after flushing cache

Cache configuration:
WB | LINESIZE16KB | ACP | METADATA_LAYOUT=SEQUENTIAL | MAKE_REQUEST_FN

dmesg:
[1750126.837810] cache1: Flushing cache
[1750126.890931] cache1: Wrong number of dirty blocks (390656!=390657)
[1750216.042064] cache1: Flushing cache completed
[1750216.046847] ------------[ cut here ]------------
[1750216.051805] kernel BUG at /root/open-cas-linux/modules/cas_cache/src/ocf/mngt/ocf_mngt_flush.c:605!
[1750216.061268] invalid opcode: 0000 [#1] SMP
[1750216.065620] Modules linked in: cas_cache(OE) cas_disk(OE) xfs iotrace(OE) brd null_blk xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink nls_utf8 isofs loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc dm_mirror dm_region_hash dm_log dm_mod btusb btrtl btbcm btintel intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel ath3k aesni_intel lrw gf128mul glue_helper bluetooth joydev ablk_helper cryptd snd_hda_codec_hdmi eeepc_wmi snd_hda_codec_realtek asus_wmi snd_hda_codec_generic sparse_keymap rfkill snd_hda_intel pcspkr snd_hda_codec snd_seq iTCO_wdt iTCO_vendor_support snd_hda_core snd_hwdep sg snd_seq_device snd_pcm snd_timer snd mei_me shpchp soundcore mei lpc_ich i2c_i801 ip_tables ext4 mbcache jbd2 sr_mod sd_mod cdrom crc_t10dif crct10dif_generic nouveau video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci drm e1000e crct10dif_pclmul crct10dif_common nvme crc32c_intel mxm_wmi libata ptp nvme_core serio_raw i2c_core pps_core wmi [last unloaded: cas_disk]
[1750216.185920] CPU: 11 PID: 14971 Comm: cas_io_cache1_- Kdump: loaded Tainted: G OE ------------ 3.10.0-862.el7.x86_64 #1
[1750216.198563] Hardware name: System manufacturer System Product Name/RAMPAGE IV EXTREME, BIOS 4701 11/18/2013
[1750216.208772] task: ffff9c23b6012f70 ti: ffff9c2345ee0000 task.ti: ffff9c2345ee0000
[1750216.216686] RIP: 0010:[] [] _ocf_mngt_flush_finish+0xc0/0xe0 [cas_cache]
[1750216.227071] RSP: 0018:ffff9c2345ee3dd0 EFLAGS: 00010202
[1750216.232785] RAX: 0000000000000001 RBX: ffffad9fcce60028 RCX: ffffad9ffaac8058
[1750216.240268] RDX: 0000000000000001 RSI: ffffad9fcce60028 RDI: ffffad9ffa99ebf0
[1750216.247747] RBP: ffff9c2345ee3df8 R08: ffffffffc0c4a330 R09: 0000000000000000
[1750216.255326] R10: ffff9c287fbd4400 R11: ffffdce70d114400 R12: ffffad9ffa99e000
[1750216.262892] R13: ffffad9fcce60000 R14: 0000000000000000 R15: 0000000000000000
[1750216.270376] FS: 0000000000000000(0000) GS:ffff9c28674c0000(0000) knlGS:0000000000000000
[1750216.278830] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750216.284989] CR2: 00007fd7b21ea718 CR3: 0000000164672000 CR4: 00000000000607e0
[1750216.292475] Call Trace:
[1750216.295289] [] _ocf_pipeline_run_step+0x47/0xf0 [cas_cache]
[1750216.302959] [] ocf_io_handle+0x23/0x40 [cas_cache]
[1750216.309833] [] ocf_queue_run_single+0x45/0x50 [cas_cache]
[1750216.317325] [] ocf_queue_run+0x28/0x50 [cas_cache]
[1750216.324202] [] _cas_io_queue_thread+0xfa/0x150 [cas_cache]
[1750216.331777] [] ? wake_up_atomic_t+0x30/0x30
[1750216.337943] [] ? atomic_dev_deinit+0x40/0x40 [cas_cache]
[1750216.345342] [] kthread+0xd1/0xe0
[1750216.350622] [] ? insert_kthread_work+0x40/0x40
[1750216.357137] [] ret_from_fork_nospec_begin+0x21/0x21
[1750216.364095] [] ? insert_kthread_work+0x40/0x40
[1750216.370607] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 8b 40 04 85 c0 74 8d 0f 0b 4c 89 e7 e8 e8 fe ff ff 84 c0 0f 84 7b ff ff ff <0f> 0b 66 0f 1f 44 00 00 48 8b 73 28 48 8b 43 20 44 89 fa 4c 89
[1750216.390818] RIP [] _ocf_mngt_flush_finish+0xc0/0xe0 [cas_cache]
[1750216.398850] RSP

Kernel panic when enabling OCF_METADATA_HASH_DEBUG

When define OCF_METADATA_HASH_DEBUG is set to 1 in src/metadata/metadata_hash.c file kernel panic occurs during cache start:

[  496.322666] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[  496.330491] PGD 0 P4D 0
[  496.333029] Oops: 0000 [#1] SMP PTI
[  496.336522] CPU: 57 PID: 7506 Comm: casadm Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-147.el8.x86_64 #1
[  496.347642] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0009.092820190230 09/28/2019
[  496.358079] RIP: 0010:ocf_log_raw+0x2c/0x80 [cas_cache]
[  496.363300] Code: 44 00 00 55 48 89 e5 48 83 ec 50 48 89 4c 24 38 4c 89 44 24 40 4c 89 4c 24 48 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 <48> 8b 07 48 83 78 10 00 74 38 48 8d 4d 10 c7 04 24 18 00 00 00 48
[  496.382045] RSP: 0018:ffffbf8983b2bb78 EFLAGS: 00010246
[  496.387273] RAX: 0000000000000000 RBX: ffffbf8987695000 RCX: 0000000000000010
[  496.394405] RDX: ffffffffc0e8c209 RSI: 0000000000000006 RDI: 0000000000000010
[  496.401537] RBP: ffffbf8983b2bbc8 R08: ffffffffc0e88fb0 R09: 8000000000000063
[  496.408668] R10: 0000000000000000 R11: 0000000000200000 R12: 0000000000000010
[  496.415801] R13: ffffbf8983b2bcd8 R14: 00007ffd24874690 R15: ffff9a6b7691c068
[  496.422933] FS:  00007fb9af5b9740(0000) GS:ffff9a6b7f940000(0000) knlGS:0000000000000000
[  496.431019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  496.436767] CR2: 0000000000000010 CR3: 000000083e58e004 CR4: 00000000007606e0
[  496.443897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  496.451029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  496.458154] PKRU: 55555554
[  496.460869] Call Trace:
[  496.463329]  ocf_metadata_hash_init+0x46/0x200 [cas_cache]
[  496.468819]  ocf_metadata_init+0x40/0x80 [cas_cache]
[  496.473786]  ocf_mngt_cache_start+0x2aa/0x760 [cas_cache]
[  496.479182]  ? cas_blk_identify_type_by_bdev+0xdc/0x170 [cas_cache]
[  496.485449]  cache_mngt_init_instance+0xc1/0x460 [cas_cache]
[  496.491109]  ? cache_mngt_prepare_cache_cfg+0x324/0x360 [cas_cache]
[  496.497372]  ? map_vm_area+0x32/0x40
[  496.500954]  cas_service_ioctl_ctrl+0x1695/0x18f0 [cas_cache]
[  496.506698]  ? select_task_rq_fair+0xa77/0xd00
[  496.511141]  ? selinux_file_ioctl+0x161/0x200
[  496.515504]  do_vfs_ioctl+0xa4/0x630
[  496.519081]  ksys_ioctl+0x60/0x90
[  496.522398]  __x64_sys_ioctl+0x16/0x20
[  496.526151]  do_syscall_64+0x5b/0x1b0
[  496.529817]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  496.534869] RIP: 0033:0x7fb9aeeb8b2b
[  496.538449] Code: 0f 1e fa 48 8b 05 5d a3 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d a3 2c 00 f7 d8 64 89 01 48
[  496.557195] RSP: 002b:00007ffd24874508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  496.564761] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fb9aeeb8b2b
[  496.571893] RDX: 00007ffd24874690 RSI: ffffffffd040ba15 RDI: 0000000000000005
[  496.579026] RBP: 00007ffd24874580 R08: 0000000000000000 R09: 00007fb9aedc5700
[  496.586155] R10: 00007fb9aedc59d0 R11: 0000000000000246 R12: ffffffffd040ba15
[  496.593279] R13: 00007ffd24874690 R14: 0000000000000005 R15: 00007ffd24874510

Looks like the reason is that owner field (which contains logger instance) within ocf_cache_t struct is not yet set when ocf_metadata_hash_init() is called.

Kernel BUG - IO after mounting XFS - NULL pointer dereference

Stack trace:

[164203.133681] XFS (cas1-1): Mounting V5 Filesystem
[164203.141577] XFS (cas1-1): Ending clean mount
[164205.121591] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[164205.130820] IP: [<ffffffffc081d9d2>] remove_lru_list+0x62/0x380 [cas_cache]
[164205.138906] PGD 80000001035ab067 PUD 3f0139067 PMD 0 
[164205.144868] Oops: 0000 [#1] SMP 
[164205.148771] Modules linked in: cas_cache(OE) cas_disk(OE) nls_utf8 isofs xfs brd xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink rpcsec_gss_krb5 auth_rpcgss nfsv4 loop dns_resolver nfs lockd grace fscache sunrpc dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support gpio_ich intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_si joydev pcspkr ipmi_devintf i2c_i801 lpc_ich ioatdma ipmi_msghandler i7core_edac sg acpi_cpufreq ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_common drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel ahci drm_panel_orientation_quirks igb libahci nvme ptp libata pps_core nvme_core dca i2c_algo_bit [last unloaded: cas_disk]
[164205.246769] CPU: 4 PID: 15396 Comm: cas_io_cache1_4 Kdump: loaded Tainted: G        W IOE  ------------   3.10.0-957.el7.x86_64 #1
[164205.272173] task: ffff8bb0112c6180 ti: ffff8bac045b0000 task.ti: ffff8bac045b0000
[164205.281047] RIP: 0010:[<ffffffffc081d9d2>]  [<ffffffffc081d9d2>] remove_lru_list+0x62/0x380 [cas_cache]
[164205.292083] RSP: 0018:ffff8bac045b3c98  EFLAGS: 00010246
[164205.298301] RAX: 0000000000000000 RBX: ffffa1f7c1d8d000 RCX: 0000000000000000
[164205.306732] RDX: 0000000000000000 RSI: ffffa1f7c2329dd8 RDI: ffff8bac045b3ca8
[164205.315231] RBP: ffff8bac045b3ce8 R08: ffffffffc0833ee0 R09: 00000000000000ff
[164205.323665] R10: 000000000000003f R11: 00000000000167ba R12: 00000000000167ba
[164205.332131] R13: 000000000002fc30 R14: ffffa1f7c1d8e848 R15: 0000000000000000
[164205.340562] FS:  0000000000000000(0000) GS:ffff8bb01f280000(0000) knlGS:0000000000000000
[164205.350092] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[164205.356794] CR2: 0000000000000018 CR3: 0000000219866000 CR4: 00000000000007e0
[164205.365252] Call Trace:
[164205.368280]  [<ffffffffc081e069>] evp_lru_rm_cline+0x89/0xb0 [cas_cache]
[164205.376092]  [<ffffffffc0820fb4>] __set_cache_line_invalid+0x134/0x160 [cas_cache]
[164205.385044]  [<ffffffffc0821089>] set_cache_line_invalid+0xa9/0xf0 [cas_cache]
[164205.393587]  [<ffffffffc082fdf0>] ocf_write_wi_update_and_flush_metadata+0xd0/0x190 [cas_cache]
[164205.403785]  [<ffffffffc083dd3c>] ocf_io_handle+0x3c/0x50 [cas_cache]
[164205.411280]  [<ffffffffc083dd95>] ocf_queue_run_single+0x45/0x50 [cas_cache]
[164205.419443]  [<ffffffffc083ddc8>] ocf_queue_run+0x28/0x50 [cas_cache]
[164205.426928]  [<ffffffffc08080fa>] _cas_io_queue_thread+0xfa/0x150 [cas_cache]
[164205.435399]  [<ffffffff8d6c2d00>] ? wake_up_atomic_t+0x30/0x30
[164205.442207]  [<ffffffffc0808000>] ? 0xffffffffc0807fff
[164205.448217]  [<ffffffff8d6c1c31>] kthread+0xd1/0xe0
[164205.453951]  [<ffffffff8d6c1b60>] ? insert_kthread_work+0x40/0x40
[164205.461089]  [<ffffffff8dd74c37>] ret_from_fork_nospec_begin+0x21/0x21
[164205.468669]  [<ffffffff8d6c1b60>] ? insert_kthread_work+0x40/0x40
[164205.475766] Code: 8d 04 76 4c 8d 74 c7 60 41 39 d5 0f 86 1e 03 00 00 48 8b 87 38 04 00 00 41 89 cf 48 8d 55 b8 44 89 e6 e8 a2 8d 16 cd 49 8b 46 08 <44> 3b 60 18 0f 94 c1 44 3b 60 10 40 0f 94 c6 44 3b 60 1c 0f 94 
[164205.498396] RIP  [<ffffffffc081d9d2>] remove_lru_list+0x62/0x380 [cas_cache]
[164205.506577]  RSP <ffff8bac045b3c98>
[164205.510749] CR2: 0000000000000018

Preceeded by multiple failures:
[164201.849619] mkfs.xfs: page allocation failure: order:10, mode:0x10c010
[164201.857240] CPU: 10 PID: 27884 Comm: mkfs.xfs Kdump: loaded Tainted: G        W IOE  ------------   3.10.0-957.el7.x86_64 #1
[164201.882061] Call Trace:
[164201.885117]  [<ffffffff8dd61dc1>] dump_stack+0x19/0x1b
[164201.891180]  [<ffffffff8d7bc830>] warn_alloc_failed+0x110/0x180
[164201.898116]  [<ffffffff8dd5d3ce>] __alloc_pages_slowpath+0x6b6/0x724
[164201.905541]  [<ffffffff8d7c0e95>] __alloc_pages_nodemask+0x405/0x420
[164201.912956]  [<ffffffff8d80dcf8>] alloc_pages_current+0x98/0x110
[164201.919976]  [<ffffffff8d7bb5de>] __get_free_pages+0xe/0x40
[164201.926528]  [<ffffffff8d81938e>] kmalloc_order_trace+0x2e/0xa0
[164201.933457]  [<ffffffffc0811e1e>] ? cas_rpool_try_get+0x5e/0x80 [cas_cache]
[164201.941564]  [<ffffffff8d81d3d1>] __kmalloc+0x211/0x230
[164201.947733]  [<ffffffffc081937e>] ocf_req_alloc_map+0x3e/0x60 [cas_cache]
[164201.955654]  [<ffffffffc08193f7>] ocf_req_alloc_map_discard+0x57/0x60 [cas_cache]
[164201.964553]  [<ffffffffc082c6bd>] ocf_core_volume_submit_discard+0xdd/0x270 [cas_cache]
[164201.974005]  [<ffffffffc082c1bc>] ? ocf_core_io_allocator_new+0x1c/0x20 [cas_cache]
[164201.983074]  [<ffffffffc082ba78>] ? ocf_io_new+0x78/0xf0 [cas_cache]
[164201.990506]  [<ffffffffc083cb53>] ocf_volume_submit_discard+0x33/0x80 [cas_cache]
[164201.999448]  [<ffffffffc083ca2a>] ? ocf_volume_new_io+0x1a/0x20 [cas_cache]
[164202.007558]  [<ffffffffc080defc>] _blockdev_make_request_fast+0x23c/0x370 [cas_cache]
[164202.016813]  [<ffffffff8d942eaf>] ? generic_make_request_checks+0x27f/0x390
[164202.024920]  [<ffffffffc0648ee3>] _casdsk_exp_obj_make_rq_fn+0xa3/0x1d0 [cas_disk]
[164202.033871]  [<ffffffff8d944c87>] generic_make_request+0x147/0x380
[164202.041097]  [<ffffffff8d944f30>] submit_bio+0x70/0x150
[164202.047327]  [<ffffffff8d87e2e5>] ? bio_alloc_bioset+0x115/0x310
[164202.054432]  [<ffffffff8d94d52a>] blkdev_issue_discard+0x1fa/0x2d0
[164202.061717]  [<ffffffff8d956ac5>] blk_ioctl_discard+0xe5/0x130
[164202.068584]  [<ffffffff8d957612>] blkdev_ioctl+0x632/0xa20
[164202.075033]  [<ffffffff8d881401>] ? __blkdev_put+0x141/0x1a0
[164202.081673]  [<ffffffff8d880761>] block_ioctl+0x41/0x50
[164202.087831]  [<ffffffff8d856210>] do_vfs_ioctl+0x3a0/0x5a0
[164202.094377]  [<ffffffff8d8640a4>] ? mntput+0x24/0x40
[164202.100321]  [<ffffffff8d843476>] ? __fput+0x186/0x260
[164202.106424]  [<ffffffff8d8564b1>] SyS_ioctl+0xa1/0xc0
[164202.112475]  [<ffffffff8dd74d15>] ? system_call_after_swapgs+0xa2/0x146
[164202.120169]  [<ffffffff8dd74ddb>] system_call_fastpath+0x22/0x27
[164202.127208]  [<ffffffff8dd74d21>] ? system_call_after_swapgs+0xae/0x146

Check if cache device is not too big

Cache line id is 32 bit long, so with small cache line size and big cache there may be more cache lines on the cache device than we can address.

Kernel bug after changing cache mode from PT to WB while IO is running in background

  1. Start cache in WB with 4 cores.
  2. In 2s time interval load io class configuration file and change to randomly selected cache mode.

After changing cache mode from PT to WB kernel bug appears:

[75237.190008] cache1: Changing cache mode from 'Pass Through' to 'Write Back' successful
[75237.832680] BUG: unable to handle kernel paging request at ffffa273a25d5168
[75237.840949] PGD 107d2d067 P4D 107d2d067 PUD 447c01067 PMD 42aed0067 PTE 0
[75237.848997] Oops: 0000 [#1] SMP PTI
[75237.853359] CPU: 35 PID: 31128 Comm: ocf_metadata_up Tainted: G O 4.19.56-t2.el7.twitter.x86_64 #1
[75237.865661] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[75237.878011] RIP: 0010:ocf_engine_push_req_front+0xa4/0x110 [cas_cache]
[75237.885766] Code: 8b 4c 24 08 4c 89 61 08 48 89 8d 00 01 00 00 4c 89 b5 08 01 00 00 4c 89 63 10 f0 ff 43 08 48 8b 34 24 4c 89 ff e8 7c a6 0b eb <80> bd 20 01 00 00 00 78 17 48 8b 3d ec ec a8 eb e8 d7 ef 9d ea 48
[75237.907987] RSP: 0018:ffffa273845a7e20 EFLAGS: 00010282
[75237.914281] RAX: 0000000000000001 RBX: ffff8b8e4b809a20 RCX: ffff8b8e4b809a30
[75237.922712] RDX: ffff8b8e4b809a30 RSI: 0000000000000282 RDI: ffff8b8e4b809a40
[75237.931145] RBP: ffffa273a25d5048 R08: ffff8b8e4b809a30 R09: ffff8b8b85850098
[75237.939580] R10: ffffa273845a7ca8 R11: 0000000000000000 R12: ffffa273a25d5148
[75237.948013] R13: 0000000000000001 R14: ffff8b8e4b809a30 R15: ffff8b8e4b809a40
[75237.956465] FS: 0000000000000000(0000) GS:ffff8b8e739c0000(0000) knlGS:0000000000000000
[75237.966346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75237.973223] CR2: ffffa273a25d5168 CR3: 000000015c20a003 CR4: 00000000001606e0
[75237.981662] Call Trace:
[75237.984854] ? ocf_metadata_updater_run+0x163/0x1b0 [cas_cache]
[75237.991929] ? _cas_cleaner_complete+0x20/0x20 [cas_cache]
[75237.998520] ? _cas_metadata_updater_thread+0x51/0xd0 [cas_cache]
[75238.005796] ? finish_wait+0x80/0x80
[75238.010246] ? kthread+0x113/0x130
[75238.014498] ? kthread_create_worker_on_cpu+0x50/0x50
[75238.020601] ? ret_from_fork+0x35/0x40
[75238.025240] Modules linked in: cas_cache(O) cas_disk(O) nls_utf8 xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables bpfilter devlink rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs lockd grace fscache sunrpc intel_rapl iosf_mbi iTCO_wdt iTCO_vendor_support sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm_intel ioatdma ipmi_si ipmi_devintf ipmi_msghandler i2c_i801 lpc_ich wmi pcc_cpufreq isci igb libsas ptp crc32c_intel scsi_transport_sas pps_core i2c_algo_bit i2c_core dca nvme nvme_core [last unloaded: cas_disk]
[75238.101405] CR2: ffffa273a25d5168
[75238.105565] ---[ end trace 05612fd206589865 ]---
[75238.413095] RIP: 0010:ocf_engine_push_req_front+0xa4/0x110 [cas_cache]
[75238.420864] Code: 8b 4c 24 08 4c 89 61 08 48 89 8d 00 01 00 00 4c 89 b5 08 01 00 00 4c 89 63 10 f0 ff 43 08 48 8b 34 24 4c 89 ff e8 7c a6 0b eb <80> bd 20 01 00 00 00 78 17 48 8b 3d ec ec a8 eb e8 d7 ef 9d ea 48
[75238.443090] RSP: 0018:ffffa273845a7e20 EFLAGS: 00010282
[75238.449389] RAX: 0000000000000001 RBX: ffff8b8e4b809a20 RCX: ffff8b8e4b809a30
[75238.457831] RDX: ffff8b8e4b809a30 RSI: 0000000000000282 RDI: ffff8b8e4b809a40
[75238.466270] RBP: ffffa273a25d5048 R08: ffff8b8e4b809a30 R09: ffff8b8b85850098
[75238.474709] R10: ffffa273845a7ca8 R11: 0000000000000000 R12: ffffa273a25d5148
[75238.483154] R13: 0000000000000001 R14: ffff8b8e4b809a30 R15: ffff8b8e4b809a40
[75238.491589] FS: 0000000000000000(0000) GS:ffff8b8e739c0000(0000) knlGS:0000000000000000
[75238.501481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75238.508363] CR2: ffffa273a25d5168 CR3: 000000015c20a003 CR4: 00000000001606e0

Kernel oops while unloading kernel module after unsuccessful cache start

Steps to reproduce:

  1. modprobe cas_cache metadata_layout=1 unaligned_io=0 use_io_scheduler=0

  2. casadm --start-cache --cache-device <dev_id> --cache-mode wb --cache-line-size 64
    (Starting cache failed due to existing previous cache's metadata on cache device)
    Dmesg after unsuccessful cache start:
    [ 248.269482] cache1: Attaching cache device failed
    [ 248.274921] Thread cas_io_cache1_25 stopped
    [ 248.274932] Thread cas_io_cache1_19 stopped
    [ 248.274938] Thread cas_io_cache1_16 stopped
    [ 248.274950] Thread cas_io_cache1_9 stopped
    [ 248.274956] Thread cas_io_cache1_14 stopped
    [ 248.274958] Thread cas_io_cache1_30 stopped
    [ 248.274961] Thread cas_io_cache1_15 stopped
    [ 248.274963] Thread cas_io_cache1_31 stopped
    [ 248.274965] Thread cas_io_cache1_0 stopped
    [ 248.274965] Thread cas_io_cache1_3 stopped
    [ 248.274979] Thread cas_io_cache1_24 stopped
    [ 248.274981] Thread cas_io_cache1_8 stopped
    [ 248.274995] Thread cas_io_cache1_26 stopped
    [ 248.274996] Thread cas_io_cache1_10 stopped
    [ 248.274997] Thread cas_io_cache1_11 stopped
    [ 248.274998] Thread cas_io_cache1_27 stopped
    [ 248.274999] Thread cas_io_cache1_12 stopped
    [ 248.275000] Thread cas_io_cache1_28 stopped
    [ 248.275001] Thread cas_io_cache1_29 stopped
    [ 248.275002] Thread cas_io_cache1_13 stopped
    [ 248.275003] cache1: Cache cache1 successfully stopped
    [ 248.275008] [Open-CAS] [Classifier] Deinitialized IO classifier
    [ 248.275010] Thread cas_io_cache1_1 stopped
    [ 248.275010] Thread cas_io_cache1_17 stopped
    [ 248.275012] Thread cas_io_cache1_18 stopped
    [ 248.275013] Thread cas_io_cache1_2 stopped
    [ 248.275014] Thread cas_io_cache1_4 stopped
    [ 248.275015] Thread cas_io_cache1_20 stopped
    [ 248.275016] Thread cas_io_cache1_21 stopped
    [ 248.275017] Thread cas_io_cache1_5 stopped
    [ 248.275018] Thread cas_io_cache1_22 stopped
    [ 248.275019] Thread cas_io_cache1_6 stopped
    [ 248.275020] Thread cas_io_cache1_23 stopped
    [ 248.275021] Thread cas_io_cache1_7 stopped
    [ 248.386529] ata2.00: Enabling discard_zeroes_data

  3. modprobe -r cas_cache

Kernel oops detested on serial console:
[ 277.459743] BUG: unable to handle kernel paging request at ffffffffc05ba0c2
[ 277.467699] PGD 64220c067 P4D 64220c067 PUD 64220e067 PMD 827fac067 PTE 0
[ 277.475441] Oops: 0010 [#1] SMP PTI
[ 277.479486] CPU: 3 PID: 23121 Comm: cas_io_cache1_- Tainted: G O 4.19.56-t2.el7.twitter.x86_64 #1
[ 277.491109] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 277.503885] RIP: 0010:0xffffffffc05ba0c2
[ 277.508415] Code: Bad RIP value.
[ 277.512165] RSP: 0018:ffffbdc284c7fec0 EFLAGS: 00010202
[ 277.518152] RAX: 0000000000000001 RBX: ffff9b19d2aa6120 RCX: 0000000000000000
[ 277.526283] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9b19d2aa6120
[ 277.534415] RBP: 0000000000000000 R08: ffff9b15ef8e2038 R09: 00000000000002a3
[ 277.542546] R10: ffffbdc28397bda8 R11: fefefefefefefeff R12: ffff9b12c599f8c0
[ 277.550688] R13: ffff9b12c599f910 R14: ffff9b12c599f8c8 R15: ffffffffc05ba000
[ 277.558823] FS: 0000000000000000(0000) GS:ffff9b15ef8c0000(0000) knlGS:0000000000000000
[ 277.568094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 277.574665] CR2: ffffffffc05ba098 CR3: 000000064220a005 CR4: 00000000000606e0
[ 277.582786] Call Trace:
[ 277.585677] ? finish_wait+0x80/0x80
[ 277.589820] ? kthread+0x113/0x130
[ 277.593779] ? kthread_create_worker_on_cpu+0x50/0x50
[ 277.599577] ? ret_from_fork+0x35/0x40
[ 277.603912] Modules linked in: xfs nls_utf8 xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink bpfilter rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs lockd grace fscache sunrpc intel_rapl iosf_mbi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm_intel iTCO_wdt iTCO_vendor_support mpt3sas ipmi_si ipmi_devintf ipmi_msghandler ioatdma raid_class i2c_i801 lpc_ich wmi pcc_cpufreq igb isci ptp crc32c_intel libsas pps_core scsi_transport_sas i2c_algo_bit nvme i2c_core nvme_core dca [last unloaded: cas_disk]
[ 277.676855] CR2: ffffffffc05ba0c2
[ 277.680705] ---[ end trace 843c9a3cb25b1d23 ]---
[ 277.710148] RIP: 0010:0xffffffffc05ba0c2
[ 277.714701] Code: Bad RIP value.
[ 277.718472] RSP: 0018:ffffbdc284c7fec0 EFLAGS: 00010202
[ 277.724466] RAX: 0000000000000001 RBX: ffff9b19d2aa6120 RCX: 0000000000000000
[ 277.732602] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9b19d2aa6120
[ 277.740732] RBP: 0000000000000000 R08: ffff9b15ef8e2038 R09: 00000000000002a3
[ 277.748860] R10: ffffbdc28397bda8 R11: fefefefefefefeff R12: ffff9b12c599f8c0
[ 277.756987] R13: ffff9b12c599f910 R14: ffff9b12c599f8c8 R15: ffffffffc05ba000
[ 277.765113] FS: 0000000000000000(0000) GS:ffff9b15ef8c0000(0000) knlGS:0000000000000000
[ 277.774387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 277.780960] CR2: ffffffffc05ba098 CR3: 000000064220a005 CR4: 00000000000606e0
[ 278.886770] [CAS Disk] CAS Disk Version 20.01.00.00000220 (4.19.56-t2.el7.twitter.x86_64)::Module loaded successfully
[ 279.196346] 'OCF Core' volume operations registered
[ 279.245800] 'Block_Device' volume operations registered
[ 279.251820] [Open-CAS] Not found configuration for upgrade. Standard module initialization.
[ 279.261495] Open Cache Acceleration Software Linux Version 20.01.00.00000220_

CAS thread names are uninformative

Since thread names are trimmed to 16 characters, the result is missing the vital part. For example typically management thread name is "cas_io_cache1_-".

Core active after unplugging.

After unplugging core from cache, core status is still active.

Steps to reproduce:

  • start cache and add core:
    casadm -S -d /dev/nvme1n1p1 && casadm -A -i 1 -d /dev/sdc
  • unplug core device:
    echo 1 > /sys/block/sdc/device/delete
  • check cache stats:
    casadm -P -i 1

Expected:

Inactive Core Devices: 1

Actual status:

Inactive Core Devices: 0

casctl init error

When using casctl init to initialize the cache, there will be a kernel error.

The error code is as follows: kernel: NMI watchdog: bug: soft lock - CPU#0 stuck for 22s

CAS version: 19.9
System version: CentOS 7.7
Kernel version: 3.10.0-1062.9.1.el7.x86_64
Nvme SSD: Samsung 963 1.92t
Data disk: Seagate 900g * 24

How to solve this problem?

Python3 should be required for installation.

Some of the scripts which are used by open cas linux require python3 and python3 is not available everywhere by default. This should be required in order to install open cas linux.

This would fix things such as #119

No error message when trying to display statistics for non-configured IO class

Steps:

  1. Start cache
    casadm --start-cache --cache-device <cache_dev> --cache-id 1
  2. Add core
    casadm --add-core --cache-id 1 --core-device <core_dev>
  3. List configured IO classes (by default only IO class 0 is configured):
    casadm --io-class --list --cache-id 1
  4. Try listing statistics for non-configured IO class:
    casadm --stats --cache-id 1 --io-class-id

exit_code is 1 but no message is displayed

CAS Cache and CAS Disk Kernel Modules not loaded after reboot.

Tested on clean CAS installation.
Kernel: 3.10.0-957.el7.x86_64
OS: CENTOS76

Outputs after reboot:

casadm -L output:

Device /dev/cas_ctrl not found
Is the kernel module loaded?
Error getting caches list

casadm -V output:

Name Version
CAS Cache Kernel Module Not Loaded
CAS Disk Kernel Module Not Loaded
CAS CLI Utility 19.09.00.00000924

Bad regex in casctl

pattern used: /dev/cas(\d)-(\d)
includes only caches and cores with one-digit ids

Kernel WARNING when modprobing cas_cache module

When modprobing cas_cache module on Fedora 31. Please find kernel log bellow.
commit: 9c7b485

Dec 18 07:33:21 localhost.localdomain kernel: [CAS Disk] CAS Disk Version (5.2.16-200.fc30.x86_64)::Module loaded successfully
Dec 18 07:33:21 localhost.localdomain kernel: 'OCF Core' volume operations registered
Dec 18 07:33:21 localhost.localdomain kernel: ------------[ cut here ]------------
Dec 18 07:33:21 localhost.localdomain kernel: WARNING: CPU: 7 PID: 2582 at mm/slab_common.c:93 kmem_cache_create_usercopy.cold+0x46/0xae
Dec 18 07:33:21 localhost.localdomain kernel: Modules linked in: cas_cache(OE+) cas_disk(OE) xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm sb_edac snd_timer snd crct10dif_pclmul crc32_pclmul soundcore ghash_clmulni_intel joydev i2c_piix4 vboxguest intel_rapl_perf vmwgfx drm_kms_helper crc32c_intel ttm drm e1000 serio_raw ata_generic pata_acpi video [last unloaded: cas_disk]
Dec 18 07:33:21 localhost.localdomain kernel: CPU: 7 PID: 2582 Comm: modprobe Tainted: G W OE 5.2.16-200.fc30.x86_64 #1
Dec 18 07:33:21 localhost.localdomain kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Dec 18 07:33:21 localhost.localdomain kernel: RIP: 0010:kmem_cache_create_usercopy.cold+0x46/0xae
Dec 18 07:33:21 localhost.localdomain kernel: Code: ed ff e8 f8 b3 73 00 e9 bd f6 ff ff 44 89 f2 4c 89 e6 48 c7 c7 10 37 14 9d e8 a9 6a e7 ff 48 c7 c7 70 c0 10 9d e8 b3 d9 ed ff <0f> 0b e9 1c f6 ff ff 4c 89 e6 48 c7 c7 e0 36 14 9d 41 be ea ff ff
Dec 18 07:33:21 localhost.localdomain kernel: RSP: 0018:ffffb3654293fb50 EFLAGS: 00010246
Dec 18 07:33:21 localhost.localdomain kernel: RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
Dec 18 07:33:21 localhost.localdomain kernel: RDX: 0000000000000000 RSI: ffff8fc40f997908 RDI: ffff8fc40f997908
Dec 18 07:33:21 localhost.localdomain kernel: RBP: 00000000000000b8 R08: ffff8fc40f997908 R09: 00000000000117dd
Dec 18 07:33:21 localhost.localdomain kernel: R10: ffffb3654293fa08 R11: ffffb3654293fa0d R12: ffff8fc4004b8280
Dec 18 07:33:21 localhost.localdomain kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Dec 18 07:33:21 localhost.localdomain kernel: FS: 00007fc48e103740(0000) GS:ffff8fc40f980000(0000) knlGS:0000000000000000
Dec 18 07:33:21 localhost.localdomain kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 07:33:21 localhost.localdomain kernel: CR2: 00007fe22bb206d0 CR3: 00000003ee654006 CR4: 00000000000606e0
Dec 18 07:33:21 localhost.localdomain kernel: Call Trace:
Dec 18 07:33:21 localhost.localdomain kernel: kmem_cache_create+0x12/0x20
Dec 18 07:33:21 localhost.localdomain kernel: env_allocator_create+0x79/0xd0 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? ocf_io_allocator_default_init+0x14/0x30 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? ocf_volume_type_init+0x9e/0xf0 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? _cas_ctx_data_munlock+0x10/0x10 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? ocf_ctx_register_volume_type_extended+0x11a/0x1a0 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? 0xffffffffc068f000
Dec 18 07:33:21 localhost.localdomain kernel: ? block_dev_init+0x1d/0x30 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? cas_initialize_context+0x91/0xc0 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? cas_init_module+0xf7/0x1b0 [cas_cache]
Dec 18 07:33:21 localhost.localdomain kernel: ? do_one_initcall+0x46/0x1f4
Dec 18 07:33:21 localhost.localdomain kernel: ? _cond_resched+0x15/0x30
Dec 18 07:33:21 localhost.localdomain kernel: ? kmem_cache_alloc_trace+0x154/0x1c0
Dec 18 07:33:21 localhost.localdomain kernel: ? do_init_module+0x23/0x230
Dec 18 07:33:21 localhost.localdomain kernel: ? do_init_module+0x5c/0x230
Dec 18 07:33:21 localhost.localdomain kernel: ? load_module+0x2775/0x2950
Dec 18 07:33:21 localhost.localdomain kernel: ? __do_sys_finit_module+0xaa/0x110
Dec 18 07:33:21 localhost.localdomain kernel: ? __do_sys_finit_module+0xaa/0x110
Dec 18 07:33:21 localhost.localdomain kernel: ? do_syscall_64+0x5f/0x1a0
Dec 18 07:33:21 localhost.localdomain kernel: ? page_fault+0x8/0x30
Dec 18 07:33:21 localhost.localdomain kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 18 07:33:21 localhost.localdomain kernel: ---[ end trace 0fb21fbcffa4e456 ]---
Dec 18 07:33:21 localhost.localdomain kernel: 'Block Device' volume operations registered
Dec 18 07:33:21 localhost.localdomain kernel: [Open-CAS] Not found configuration for upgrade. Standard module initialization.
Dec 18 07:33:21 localhost.localdomain kernel: Open Cache Acceleration Software Linux Version (5.2.16-200.fc30.x86_64)::Module loaded successfully

OpenCAS causes system crash and database bad block

Issue 1

Description

  • The compute node Fio pressures the storage node to output the Crash through the NVMe over Fabrics protocol.

Environment

  • OS Version: RedHat-7.5(3.10.0-862.el7.x86_64)
  • CAS Version: OpenCAS-19.6
  • CAS Configuration: WB/acp/cache_line_size=8/seq-cutoff=never

Fio Configuration

[global]
ioengine=libaio
iodepth=16
size=200G
direct=1
runtime=300
time_based
filename=/dev/cas2-1
filename=/dev/cas1-1
filename=/dev/cas3-1
filename=/dev/cas4-1

numjobs=4
group_reporting=1

#[seq-read]
#rw=read
#bs=1M

#[rand-read]
#rw=randread
#bs=8k

[seq-write]
rw=write
bs=1M

#[rand-write]
#rw=randwrite
#bs=8k

Screenshot

image


Issue 2

Description

  • The storage node outputs the CAS disk to the compute node through the NVMe over Fabrics protocol. The CAS disk is used as the data disk of the Oracle database, and the database is restarted to cause bad blocks in the database.

Configuration

  • same as Issue 1

Screenshot

image

Scheduling while atomic in ocf_async_unlock

open-cas-linux commit: 3812015

The following warning occurs after running FIO in WB mode:

[ 969.561114] BUG: scheduling while atomic: swapper/7/0/0x10010000
[ 969.562015] Modules linked in: cas_cache(OE) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set
nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6tabl
e_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security i
ptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter xfs libcrc32c cas_disk(OE) ppdev sg pcspkr joydev parport
_pc parport i2c_piix4 ip_tables btrfs xor raid6_pq sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_common ata_generic virti
o_blk pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata e1000 virtio_pci virtio_ri
ng floppy i2c_core virtio serio_raw [last unloaded: cas_cache]
[ 969.562015] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE ------------ 3.10.0-693.el7.x86_64 #1
[ 969.562015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.or
g 04/01/2014
[ 969.562015] ffff88017a11c000 b91d71f1fff90dd8 ffff880424fc3ac0 ffffffff816a3d91
[ 969.562015] ffff880424fc3ad0 ffffffff8169e202 ffff880424fc3b30 ffffffff816a93f7
[ 969.562015] ffff88017a120000 ffff88017a11ffd8 ffff88017a11ffd8 ffff88017a11ffd8
[ 969.562015] Call Trace:
[ 969.562015] [] dump_stack+0x19/0x1b
[ 969.562015] [] __schedule_bug+0x64/0x72
[ 969.562015] [] __schedule+0x807/0x8b0
[ 969.562015] [] __cond_resched+0x26/0x30
[ 969.562015] [] _cond_resched+0x3a/0x50
[ 969.562015] [] mutex_lock+0x12/0x2f
[ 969.562015] [] ocf_async_unlock+0x3c/0xa0 [cas_cache]
[ 969.562015] [] ocf_mngt_cache_unlock+0x1e/0x30 [cas_cache]
[ 969.562015] [] ocf_cleaner_run_complete+0x23/0x40 [cas_cache]
[ 969.562015] [] alru_clean_complete+0x4b/0x50 [cas_cache]
[ 969.562015] [] _ocf_cleaner_complete_req+0x4f/0x60 [cas_cache]
[ 969.562015] [] _ocf_cleaner_finish_req+0x23/0x30 [cas_cache]
[ 969.562015] [] _ocf_cleaner_flush_cache_io_end+0x39/0x50 [cas_cache]
[ 969.562015] [] cas_bd_io_end+0x40/0x50 [cas_cache]
[ 969.562015] [] cas_bd_io_end_callback+0x6e/0xe0 [cas_cache]
[ 969.562015] [] bio_endio+0x64/0xa0
[ 969.562015] [] blk_update_request+0x90/0x360
[ 969.562015] [] blk_mq_end_request+0x1a/0x70
[ 969.562015] [] blk_flush_complete_seq+0xd6/0x2c0
[ 969.562015] [] flush_end_io+0x160/0x250
[ 969.562015] [] blk_mq_end_request+0x3a/0x70
[ 969.562015] [] virtblk_request_done+0x3f/0x70 [virtio_blk]
[ 969.562015] [] __blk_mq_complete_request+0x7d/0x100
[ 969.562015] [] blk_mq_complete_request+0x27/0x30
[ 969.562015] [] virtblk_done+0x7c/0x110 [virtio_blk]
[ 969.562015] [] vring_interrupt+0x38/0x90 [virtio_ring]
[ 969.562015] [] __handle_irq_event_percpu+0x3e/0x1c0
[ 969.562015] [] handle_irq_event_percpu+0x32/0x80
[ 969.562015] [] handle_irq_event+0x3c/0x60
[ 969.562015] [] handle_edge_irq+0x77/0x130
[ 969.562015] [] handle_irq+0xb8/0x150
[ 969.562015] [] ? tick_check_idle+0x8c/0xd0
[ 969.562015] [] ? atomic_notifier_call_chain+0x1a/0x20
[ 969.562015] [] do_IRQ+0x4d/0xe0
[ 969.562015] [] common_interrupt+0x6d/0x6d
[ 969.562015] [] ? native_safe_halt+0x6/0x10
[ 969.562015] [] default_idle+0x1e/0xc0
[ 969.562015] [] arch_cpu_idle+0x26/0x30
[ 969.562015] [] cpu_startup_entry+0x14a/0x1c0
[ 969.562015] [] start_secondary+0x1b6/0x230

Support for clang compiler

Are there any plans to support building with clang instead of gcc?

This would be great for OSes that build packages from source with clang by default.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.