vdsm's Introduction

Virtual Desktop Server Manager

Welcome to the Vdsm source repository.

The Vdsm service exposes an API for managing virtualization hosts running the KVM hypervisor technology. Vdsm manages and monitors the host's storage, memory and networks as well as virtual machine creation, other host administration tasks, statistics gathering, and log collection.

How to contribute

Contributing

To contribute please read the development documentation.

Submitting patches

Please use GitHub pull requests.

Found a bug or documentation issue?

To submit a bug or suggest an enhancement for Vdsm please use GitHub issues.

If you find a documentation issue on the oVirt website please navigate and click "Report an issue on GitHub" in the page footer.

Code review history

Vdsm moved to GitHub on Jan 9, 2022. To look up code reviews before this date, please check the Gerrit vdsm project.

Manual installation

Add ovirt repositories to your repositories list.

For CentOS Stream 8 use:

sudo dnf copr enable -y ovirt/ovirt-master-snapshot centos-stream-8
sudo dnf install -y ovirt-release-master

For more info, see the Copr master-snapshot repositories.

Install Vdsm:

sudo dnf install vdsm vdsm-client

Configure Vdsm:

sudo vdsm-tool configure --force

The --force flag overrides old configuration files with Vdsm defaults and restarts the services it configured (if they were already running).

Enable and start Vdsm service:

sudo systemctl enable --now vdsmd

To inspect Vdsm service status:

sudo systemctl status vdsmd

Vdsm logs can be found at /var/log/vdsm/*.log (refer to README.logging for further information).

Getting Help

There are two mailing lists for discussions.

The developers also hang out on IRC in the #vdsm channel hosted on freenode.net.

The latest upstream code can be obtained from Git:

git clone https://github.com/oVirt/vdsm

Licensing

Vdsm is provided under the terms of the GNU General Public License, version 2 or later. Please see the COPYING file for complete GPLv2+ license terms.

In addition, as a special exception, Red Hat, Inc. and its affiliates give you permission to distribute this program, or a work based on it, linked or combined with the OpenSSL project's OpenSSL library (or a modified version of that library) to the extent that the library, or modified version, is covered by the terms of the OpenSSL or SSLeay licenses. Corresponding source code for the object code form of such a combination shall include source code for the parts of OpenSSL contained in the combination.

If you modify this program, you may extend this exception to your version, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.

vdsm's People

Contributors

aabawer, aesteve-rh, aglitke, alhino, almusil, barpavel, bell-lev, bronhaim, celebdor, dankenigsberg, dougsland, eddev, edwardbadboy, ffromani, ficoos, ibarkan-redhat, igoihman, mpolednik, mz-pdm, nirs, nyoxi, phoracek, pkliczewski, rollandf, shenitzky, simon3z, tinez, vinzenz, vjuranek, ykaplan

vdsm's Issues

Reduce unnecessary udev reloading

We call udevadm to reload the rules each time an individual udev rule is added or removed. This can cause unnecessary delays in VMs with many devices that require adding or removing udev rules. It would be nice if it were possible to reload all the udev rules just once when preparing or discarding VM devices, after all the related rules have been modified.

See https://bugzilla.redhat.com/2055905 for an example of a problem caused by the current udev handling.
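A minimal sketch of the batching idea, assuming a hypothetical helper that collects rule changes; the only real command used is udevadm control --reload-rules, and this is not vdsm's existing API:

import subprocess
from contextlib import contextmanager

@contextmanager
def batched_udev_rules():
    # Collect rule changes and trigger a single reload when the block exits.
    changed = []
    try:
        yield changed
    finally:
        if changed:
            # One reload for all accumulated rule changes instead of one per rule.
            subprocess.run(["udevadm", "control", "--reload-rules"], check=True)

Usage sketch (write_rule_file and rules_to_add are hypothetical):

with batched_udev_rules() as changed:
    for rule in rules_to_add:
        write_rule_file(rule)
        changed.append(rule)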

vdsm - libvirt integration in el9

vdsm doesn't work on el9stream due to several changes:

  • libvirt has been split into several independent daemons that likely need further configuration (virtqemud, virtsecretd, maybe others?)
  • vdsm configures sasl with mech_list: scram-sha-1. That is no longer supported; it should use scram-sha-256
  • the same problem exists for VNC in /etc/sasl2/qemu.conf

Original bug: https://bugzilla.redhat.com/2032980

[feature] Add support for InfiniBand NICs and the RDMA subsystem

Let's try:

1. Add IB to the host setup interface so that a static IP can be assigned
2. Add IB networks to the cluster for the "Migration network" type selection
3. Add RDMA support for:

  • Migrations (simply use rdma://)
  • NVMe over Fabrics, NVMe target, NVMe-FC (a simple modprobe and a few commands). Add a block device to the NVMe target and every host gets access with lower latency than SAS!
  • NFS over RDMA (a simple flag/mount parameter)
  • iSER (iSCSI over RDMA) (a simple modprobe)
  1. Add an option to emulate RDMA using rdma_rxe (software RDMA; simply bind it to any Ethernet interface, that's all)
  2. Add support for the VMA (accelerates TCP/UDP connections over the RDMA layer) and SPDK (many devices, drivers, algorithms and more) accelerators
  3. Add qemu vhost RDMA/ring support for fast VM-internal fabrics. OVN also supports an RDMA backplane switch (VXLAN and more)
  4. Add support for RDMA paravirtualized devices (assign an RDMA device to the host/ring so that every VM can emulate an RDMA NIC)
  5. Look at RDMA multicast for locks, events, quorum and similar tasks
  6. Look at PCI device assignment to any VM on ANY host!

A few days of work on half of this list would bring a great payoff!

P.S.

VDSM's current job/check handling is the worst architecture I have seen (OSCAR prize!). Seriously: synchronous tasks, high latency between queue and tasks, error paths from the start about 80% of the time, host resets in about 80% of cases caused by code bugs or minor failures. In most cases, if vdsm cannot find the hosted engine, it remounts storages, restarts services, and restarts hosts even while other VMs are up. Any attempt to migrate/pause/save/snapshot the VMs? Any attempt to detect errors from events or logs? Look at the log! 90% of the work in vdsm is overhead and long queue latency!

Change the VDSM architecture to:

  • An ensemble of small actions on events (async real-time microservices) plus condition semaphores
  • A very small global state machine (key-value clustered DB/memcached) plus small base state actions (init, reset, set), with long-term persistence and async DB reads for all members
  • Event collectors plus event bus subscriptions for actions (async real-time bus)
  • Long-term persistence services: logs and system state in a persistent DB for cold start
    All items generate events to the bus. Anything can subscribe to the events it needs. At any time a service can check the DB for state (a cold cache for restarted members or missed real-time events), but items should calculate state from the event bus (a hot real-time cache). States are used for async barriers/mutexes/maintenance/off/on logic. Simple, fast, reactive, and correct for large upscaling; you only need to be careful about event storms.

Please think about latency, bottlenecks, performance and TIME!

Thanks for watching and hacking!
Best regards, with love and peace!

Minimize wait after receiving block threshold event

In #82 we improved the defaults to avoid pauses during volume extension, but
we have room for more improvements.

The extend flow contains several steps:

  1. The libvirt event thread gets a block threshold event and marks the drive for
    extension
  2. The periodic watermark monitor checks VMs every 2 seconds. When it finds
    that a drive needs extension, it sends a request to the SPM by writing
    to the storage mailbox
  3. The SPM checks storage every 2 seconds. When it finds the request, it runs
    the extend using the SPM mailbox thread pool
  4. The host polls its mailbox for replies every 2 seconds. When it
    detects the reply, it completes the extend on the host side and resumes the VM if needed.

When we look at logs, we see:

2022-02-22 10:35:39,269+0200 INFO  (mailbox-hsm/3) [virt.vm] 
(vmId='d0730833-98aa-4138-94d1-8497763807c7') Extend volume 
61ea7614-c93e-48ca-a00f-b53cf96f94e9 completed 
<Clock(total=4.18, wait=1.93, extend-volume=2.09, refresh-volume=0.15)>
(thinp:567)

We waited 1.93 seconds from the time the event was received until we
handled it. A guest using fast storage can write 1.5 GiB during this wait
and pause with ENOSPC. This was 46% of the total extend time.

The wait is caused by the monitor interval - 2 seconds. If the periodic
executor is blocked on slow libvirt calls, it can take much more time.

When the executor runs the periodic watermark monitor, it checks all the VMs,
and it can get blocked on another unresponsive VM, even if we are lucky and
it runs quickly after we receive the event.

What we want to do is handle the block threshold event immediately, avoiding
the 0-2 second delay (or more in bad cases). But we don't want to do this on
the libvirt event threads, since handling a block threshold event accesses libvirt
and the storage layer and may block for a long time in bad cases, delaying other
libvirt events.

I think the best way to handle it is to dispatch a call to VM.monitor_volumes()
on the periodic executor when we receive an event.

Changes needed:

  • add periodic.dispatch() API, for running operations immediately outside of
    the normal monitoring schedule.

  • use periodic.dispatch() in VolumeMonitor.on_block_threshold() to schedule a
    call to VM.monitor_volumes() soon.

  • Add locking in VolumeMonitor.monitor_volumes() to ensure that we don't have
    multiple threads monitoring at the same time.

Once we have this, we can increase the monitoring interval, since it is needed
only as a backup if an extend request did not finish.
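A minimal sketch of this idea, assuming a plain thread pool stands in for the periodic executor; periodic.dispatch() does not exist yet, and all names below are illustrative rather than vdsm's actual classes:

import threading
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)  # stand-in for the periodic executor

def dispatch(func):
    # Run func on the executor immediately, outside the normal monitoring schedule.
    _executor.submit(func)

class VolumeMonitor:

    def __init__(self, vm):
        self._vm = vm
        self._lock = threading.Lock()

    def on_block_threshold(self, drive):
        # Called from the libvirt event thread; return quickly and let the
        # executor do the libvirt/storage work.
        dispatch(self.monitor_volumes)

    def monitor_volumes(self):
        # Serialize monitoring so a dispatched run and the periodic run
        # cannot extend the same drive twice.
        with self._lock:
            self._vm.extend_drives_if_needed()  # hypothetical VM helper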

Consolidate is_destination and writable endpoint properties

is_destination and writable are semantically the same, as both are true only for the destination:

return CopyDataDivEndpoint(
    params,
    host_id,
    writable,
    is_destination=is_destination,
    lock_image=lock_image)

writable is only set to true for the destination:

self._dest = _create_endpoint(
    destination, host_id, writable=True, job_id=job_id,

Limit the number of pivot retries

In bug 1857347 we tried to fix a case where a libvirt block commit job failed
with an unrecoverable error. Unfortunately the fix was not correct, making the
situation even worse (bug 1945675). The fix was reverted, and now vdsm
retries the pivot after unexpected errors.

Retrying proved very useful for mitigating temporary errors, for example
bug 1945635, when the libvirt block job is flipping between the "ready" and
"standby" states. Testing shows that in all cases the pivot was successful on
the second retry.

However, if the libvirt error is not temporary, retrying will not help and
the operation will never complete. In this case vdsm needs to abort the
current libvirt block job and fail the merge operation.

We don't have a way to detect an unrecoverable error in libvirt, since the error
is typically caused by a bug in qemu or libvirt, so libvirt reports an internal
error for all unexpected cases.

The only way to tell if the error is recoverable is to retry the operation,
and fail after several retries.

I think the best way to fix this is:

  • Keep the cleanup method in the job (e.g. "pivot", "abort")
  • Keep the number of pivot attempts in the job (like extend attempts)
  • When pivot fails, increase the pivot attempt counter.
  • When starting cleanup, if the pivot attempt counter exceeds the maximum value,
    change the job cleanup method to "abort". From this point, the job
    will try to abort the libvirt block job without the pivot flag.
  • There is no limit to the number of abort attempts; we must not
    leave the libvirt block job running.

Expected flow, starting at the point we start the cleanup, assuming
maximum 3 pivot attempts (the actual number of retries may need to be
larger):

00:00 try to pivot, fail: wait for next update
00:15 try to pivot, fail: wait for next update
00:30 try to pivot, fail: switch job to cleanup="abort"
00:45 try to abort, fail: wait for next update
01:00 try to abort, fail: wait for next update
01:15 try to abort, success: untrack job
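A small sketch of the proposed counter, with illustrative names and a hypothetical maximum of 3 attempts (as noted above, the actual number may need to be larger):

MAX_PIVOT_ATTEMPTS = 3

class MergeJob:

    def __init__(self):
        self.cleanup = "pivot"      # cleanup method kept in the job
        self.pivot_attempts = 0     # like the existing extend attempts counter

    def on_pivot_failure(self):
        self.pivot_attempts += 1
        if self.pivot_attempts >= MAX_PIVOT_ATTEMPTS:
            # Stop trying to pivot; from now on only abort the block job.
            # Abort attempts are not limited, since we must not leave the
            # libvirt block job running.
            self.cleanup = "abort"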

More information:
It would be better if we limited the number of retries, but the only case
where it can help is a libvirt bug, and in that case trying to stop the
pivot attempts and abort the merge may also fail.

Original bug: https://bugzilla.redhat.com/1949470

Improve test_write_file_direct_true_aligned

Description

outofprocess_test uses temporary files created by a pytest fixture. However, when using these files, test_write_file_direct_true_aligned does not raise on file systems that do not enforce a minimum block size for direct I/O (e.g., btrfs).

Traceback

$ tox -e storage -- -vv storage/outofprocess_test.py::test_write_file_direct_true_unaligned
...
oop_cleanup = None, tmpdir = local('/var/tmp/vdsm/test_write_file_direct_true_un0')   
    def test_write_file_direct_true_unaligned(oop_cleanup, tmpdir): 
        iop = oop.getProcessPool("test")
        path = str(tmpdir.join("file"))
>       with pytest.raises(OSError) as e:
E       Failed: DID NOT RAISE <class 'OSError'> 

storage/outofprocess_test.py:1010: Failed  

while:

$ df -hT /var/tmp
Filesystem                                            Type   Size  Used Avail Use% Mounted on
/dev/mapper/luks-08653fbb-0d5f-4a76-be60-30be4377cf5d btrfs  476G  119G  357G  25% /

Proposed solution

To avoid such issues, the tests should use user storage instead of random storage from the host running the tests.
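The following sketch only illustrates the filesystem behaviour behind the failure (it is not the proposed fix): on btrfs an unaligned O_DIRECT write silently falls back to buffered I/O, so probing for this would let the test skip instead of failing. The helper name is hypothetical.

import os

def enforces_direct_io_alignment(dirname):
    # Return True if unaligned O_DIRECT writes fail on this filesystem
    # (e.g. xfs, ext4); btrfs accepts them, so the test cannot raise there.
    path = os.path.join(dirname, "probe")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    try:
        os.write(fd, b"x")  # 1 byte, not a multiple of the block size
    except OSError:
        return True
    finally:
        os.close(fd)
        os.unlink(path)
    return False

A test could call pytest.skip("filesystem does not enforce direct I/O alignment") when this returns False.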

Run the tests as unprivileged user

Most vdsm code runs as vdsm:kvm, but the vdsm tests run as root by default in the CI.
This causes several problems:

  • Tests behave differently when run locally on a developer machine and in the CI
  • Some tests cannot succeed when run as root and are skipped in the CI. The tests and the code are likely to break because they are tested only locally.
  • Errors in code that would fail at runtime running as vdsm:kvm are not detected by tests running as root:root.
  • We need special configuration to run the containers with privileges - this should be done only for tests that must run as root.

Fix:

  • Separate the tests that need to run as root from the tests that need to run as vdsm:kvm
  • Run the tests as vdsm:kvm or another unprivileged user
  • Run tests that need root in a privileged container

In general, only code that runs via supervdsm needs to be tested as root. Unfortunately, the code running as root is mixed with other code in vdsm.
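A hedged sketch of one way to express the split with standard pytest markers; the marker names are illustrative, not vdsm's existing helpers:

import os
import pytest

requires_root = pytest.mark.skipif(
    os.geteuid() != 0, reason="must run as root (supervdsm code path)")

requires_unprivileged = pytest.mark.skipif(
    os.geteuid() == 0, reason="must run as vdsm:kvm or another regular user")

@requires_root
def test_something_via_supervdsm():
    ...

@requires_unprivileged
def test_something_as_vdsm_user():
    ...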

Refactor qemuguestagent.py

The file contains a significant amount of duplication, and adding new agent commands requires too much effort. We should refactor it to make it cleaner and simpler.

Minimize storage mailbox latency

Sending messages using the storage mailbox has a very high latency
of 2-4 seconds. This can cause VMs to pause while they wait for
an extension request to complete.

Why do we have 2-4 seconds of latency?

On the SPM host, the SPM mailbox monitor reads the SPM inbox every
2 seconds. Every host has a 4k mailbox in the SPM inbox. We support
up to 250 hosts in a DC, so the SPM has to read the mailbox for all
hosts - 250 * 4k -> 1000 KiB. So the SPM is reading 500 KiB/s from
the inbox.

On the other hosts, every host has an HSM mailbox monitor. This monitor
reads the host inbox every 2 seconds while the host is waiting for a
reply from the SPM. Since the mailbox size is 4k, the host reads 2k/s
during an extension request.

Extension request lifetime

  1. The host sends an extension request, writing 4k to the host outbox
  2. The SPM discovers the host request 0-2 seconds later
  3. The SPM extends the disk and writes a reply to the host inbox
  4. The host discovers the SPM reply 0-2 seconds later
  5. The host refreshes the extended disk

From this we can see that in the worst case, both the SPM and
the host will wait 2 seconds, and the entire roundtrip will take
at least 4 seconds, regardless of the time to extend the volume
(typically 0.2 seconds).

Ways to decrease latency

Reduce the check interval on the SPM and other hosts

Change the check interval to 0.5-1.0 seconds. This will increase the I/O,
especially on the SPM host, which reads almost 1 MiB on every read.

Keeping the current code, I think we can decrease the check interval to
1 second on the SPM, and 0.1 seconds on other hosts since they read
only 4k.

Introduce an event block

Instead of reading the entire inbox (100k), the SPM can read one
block (4k) used only for signaling changes. We can use the mailbox at
offset 0 for this, since host id 0 is reserved and not used by any
host (enforced by sanlock).

The SPM will monitor the event block, and when it detects that the event
block has changed, it will read the entire inbox to find new messages
from all hosts in the DC.

Since the SPM reads one block instead of the entire inbox, we can use a much
shorter check interval, for example 0.1 seconds. This can decrease the
latency to 0.1-0.2 seconds.

Detailed design

After a host writes a message to the SPM inbox, it will write an
event to the event block in the SPM inbox. The first byte of the
event must be 0, to be compatible with older hosts - they will skip
this message in the mailbox.

if newMail[msgStart:msgStart + 1] == b"\0":

Since many hosts can write to the event block at the same time, the SPM
will see only the last write. This is correct, since the purpose of
the event is to signal the SPM host to read the entire mailbox.

To make this work, the event must have unique content. An easy way
to implement this is to include a random UUID in the event.

The SPM will monitor the event block with a smaller interval. For example
it can read the event block 10 times per second. This will issue 10
4k reads per second, decreasing the latency to 0.1 seconds, and the amount
of data read from 500k/s to 40k/s.

On the hosts, the HSM mail monitor will also read the host inbox using a
smaller interval. If we use a 0.1 second interval, the latency will decrease
to 0.1 seconds, but the amount of data read will increase from 2k/s to 40k/s.

Latency vs I/O

interval (s)    read (k/s)
0.1             40
0.2             20
0.5             8
1.0             4

Event block format

range      content
0-4        b"\0evt" (first byte must be zero)
4-20       UUID bytes in big endian order
20-4096    padding

Example - formatting an event:

>>> import uuid
>>> buf = bytearray(4096)
>>> buf[0:4] = b"\0evt"
>>> buf[4:20] = uuid.uuid4().bytes
>>> buf[:20]
bytearray(b'\x00evt\x12\x87\x98\t\xd6\xbaF\xc7\xbdie\x0b\xa8\xe0\xd0\x9a')

Example parsing an event:

>>> uuid.UUID(bytes=bytes(buf[4:20]))
UUID('12879809-d6ba-46c7-bd69-650ba8e0d09a')

Parsing the event can be useful for debugging. We can log the event UUID when we send
a message to the SPM and when we detect an event on the SPM.

Host capabilities

Hosts supporting mailbox events will include

"mailbox_events": True

in the host capabilities response.

Upgrade scenarios

When working with both old and new hosts in the same DC, old hosts
will not send events, so the SPM needs to continue reading the entire
mailbox every 2 seconds.

Old hosts will read their inbox using the larger interval, so extend requests
from old hosts will take 2-4 seconds.

New hosts will write events and monitor their inbox using the shorter interval,
so they can have faster extends.

We can change the engine to tell the SPM whether the DC includes old hosts. The SPM
will read the entire mailbox every 2 seconds only in this case. Otherwise
the SPM can read the entire mailbox only when an event is received, or maybe
every 60 seconds as a fallback in case something went wrong with events.
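A rough sketch of the proposed SPM loop, with the reads injected as callables since the dd-based helpers are not shown here; intervals follow the numbers discussed above:

import time

EVENT_CHECK_INTERVAL = 0.1   # read one 4k event block 10 times per second
FULL_SCAN_INTERVAL = 60.0    # fallback full read in case an event is lost

def spm_monitor_loop(read_event_block, read_inbox, process_inbox,
                     have_old_hosts, stop_event):
    last_event = None
    last_full_scan = 0.0
    while not stop_event.is_set():
        now = time.monotonic()
        event = read_event_block()  # one 4k read
        # With old hosts in the DC we must still scan the whole inbox
        # every 2 seconds; otherwise only rarely, as a safety net.
        full_scan_due = now - last_full_scan >= (
            2.0 if have_old_hosts else FULL_SCAN_INTERVAL)
        if event != last_event or full_scan_due:
            process_inbox(read_inbox())  # one ~1000 KiB read
            last_event = event
            last_full_scan = now
        time.sleep(EVENT_CHECK_INTERVAL)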

Improve TestExecCmd in misc_test for output handling

Description

In misc_test, the commands executed in TestExecCmd do not consider the return code or all the outputs to determine whether the test should fail. Considering them will not only increase the test coverage but also provide useful output for debugging.

Also, the checkSudo function is not working as expected, resulting in runs where the test should be skipped because the user is not root, but it is instead executed and fails as a result.

Traceback

$ tox -e storage -- -vv storage/misc_test.py::TestExecCmd::testSudo
...
self = <storage.misc_test.TestExecCmd testMethod=testSudo> 
    def testSudo(self):
        """
        Tests that when running with sudo the user really is root (or other
        desired user). 
        """ 
        cmd = [EXT_WHOAMI]
        checkSudo(cmd)
        ret, stdout, stderr = commands.execCmd(cmd, sudo=True) 
>       self.assertEqual(stdout[0].decode("ascii"), SUDO_USER) 
E       IndexError: list index out of range
storage/misc_test.py:600: IndexError

Investigation

The problem seems to be related to the -l option of the sudo command in the checkSudo function: https://www.sudo.ws/docs/man/1.8.13/sudo.man/

If a command is specified and is permitted by the security policy, the fully-qualified path to the 
command is displayed along with any command line arguments. If command is specified but 
not allowed, sudo will exit with a status value of 1.

So doing:

$ rpm -q sudo   
sudo-1.9.7p2-2.fc35.x86_64

$ sudo -l -n whoami
/usr/bin/whoami

This just prints the full path with no error, so checkSudo fails to recognize the need for sudo, which ends up causing the execution of the command to fail.

Proposed Solution

  • For the additional return code checks: when rc != 0 the test should fail and print the error output for a more descriptive log.
  • For checkSudo, replace the function with the @requires_root decorator. Eventually it might be a good idea to remove the checkSudo code completely.
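A tiny sketch of the first point, with a hypothetical helper rather than the actual test code:

def assert_success(rc, stdout, stderr):
    # Fail the test loudly, including the command's output in the report.
    assert rc == 0, "command failed: rc=%d stdout=%r stderr=%r" % (
        rc, stdout, stderr)

# after: rc, stdout, stderr = commands.execCmd(cmd, sudo=True)
# assert_success(rc, stdout, stderr)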

Remove legacy automation

  • Rename automation to ci
  • Remove legacy ovirt ci files

Issues:

  • check-patch.packages is used in README.md for development setup. Create a script for setting up the development environment?

Mailbox logs are too noisy, no way to silence them

The mailbox logs were always too noisy, but now with mailbox events they are even noisier.

Here are mailbox logs from 10 seconds on the SPM host:

2022-04-14 18:52:00,083+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:00,116+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00871619 s, 470 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:00,512+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:234)
2022-04-14 18:52:00,550+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB, 1000 KiB) copied, 0.0116884 s, 87.6 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:01,060+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:01,075+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00743977 s, 551 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:01,576+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:01,603+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.007399 s, 554 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:02,108+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:02,126+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00759692 s, 539 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:02,560+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:234)
2022-04-14 18:52:02,591+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB, 1000 KiB) copied, 0.00508083 s, 202 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:03,101+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:03,136+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00810533 s, 505 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:03,637+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:03,670+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00832939 s, 492 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:04,172+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:04,208+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00851643 s, 481 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:04,601+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:234)
2022-04-14 18:52:04,628+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB, 1000 KiB) copied, 0.00884631 s, 116 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:05,133+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:05,146+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000658489 s, 6.2 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:05,646+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:05,679+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00853014 s, 480 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:06,180+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:06,197+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0072806 s, 563 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:06,633+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:234)
2022-04-14 18:52:06,671+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB, 1000 KiB) copied, 0.0120905 s, 84.7 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:07,184+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:07,219+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00786979 s, 520 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:07,720+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:07,739+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00746872 s, 548 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:08,240+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:08,274+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00803948 s, 509 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:08,683+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:234)
2022-04-14 18:52:08,722+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB, 1000 KiB) copied, 0.0121663 s, 84.2 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:09,234+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:09,250+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0074819 s, 547 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:52:09,752+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:52:09,784+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0080576 s, 508 kB/s\n'; <rc> = 0 (commands:247)

Here is one second of logs during an extend:

2022-04-14 18:55:43,296+0300 DEBUG (periodic/1) [storage.spm.messages.extend] new extend msg created: domain: bafd0f16-9aba-4f9f-ba90-46d3b8a29157, volume: cdb268ff-9455-4692-997d-ad5f35ee33e1 (mailbox:135)
2022-04-14 18:55:43,343+0300 DEBUG (mailbox-hsm) [storage.mailbox.hsmmailmonitor] HSM_MailMonitor - start: 0, end: 64, len: 4096, message(1/63): b'1xtndW\x91\xa2\xb8\xd3F\x90\xba\x9fO\xba\x9a\x16\x0f\xfd\xba\xe13\xee5_\xad}\x99\x92FU\x94\xffh\xb2\xcd0000000000001e0000000000000' (mailbox:432)
2022-04-14 18:55:43,344+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/outbox iflag=direct,fullblock bs=4096 count=1 skip=1 (cwd None) (commands:234)
2022-04-14 18:55:43,365+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00046536 s, 8.8 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:55:43,366+0300 INFO  (mailbox-hsm) [storage.mailbox.hsmmailmonitor] HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', 'seek=1'] (mailbox:393)
2022-04-14 18:55:43,366+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd of=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=4096 count=1 seek=1 (cwd None) (commands:234)
2022-04-14 18:55:43,394+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000316367 s, 12.9 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:55:43,394+0300 INFO  (mailbox-hsm) [storage.mailbox.hsmmailmonitor] HSM_MailMonitor sending event ba208024-1ca8-44af-a6bb-0cbc7c355970 to SPM (mailbox:537)
2022-04-14 18:55:43,394+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd of=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=4096 count=1 (cwd None) (commands:234)
2022-04-14 18:55:43,418+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00527916 s, 776 kB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:55:43,644+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct count=1 bs=4096 (cwd None) (commands:234)
2022-04-14 18:55:43,661+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000518707 s, 7.9 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:55:43,662+0300 INFO  (mailbox-spm) [storage.MailBox.SpmMailMonitor] Received event: ba208024-1ca8-44af-a6bb-0cbc7c355970 (mailbox:866)
2022-04-14 18:55:43,662+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:234)
2022-04-14 18:55:43,695+0300 DEBUG (mailbox-spm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB, 1000 KiB) copied, 0.0108624 s, 94.3 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:55:43,695+0300 DEBUG (mailbox-spm) [storage.MailBox.SpmMailMonitor] SPM_MailMonitor: Mailbox 1 validated, checking mail (mailbox:703)
2022-04-14 18:55:43,695+0300 DEBUG (mailbox-spm) [storage.MailBox.SpmMailMonitor] SPM_MailMonitor: processing request: b'1xtndW\x91\xa2\xb8\xd3F\x90\xba\x9fO\xba\x9a\x16\x0f\xfd\xba\xe13\xee5_\xad}\x99\x92FU\x94\xffh\xb2\xcd0000000000001e0000000000000' (mailbox:740)
2022-04-14 18:55:43,750+0300 DEBUG (mailbox-spm/0) [storage.threadpool] Number of running tasks: 1 (threadPool:85)
2022-04-14 18:55:43,750+0300 INFO  (mailbox-spm/0) [storage.threadpool.workerthread] START task 4deeed70-f946-4a0b-b94b-d8b63fa7c56a (cmd=<function runTask at 0x7f5129b4dea0>, args=(functools.partial(<bound method SPM_Extend_Message.processRequest of <class 'vdsm.storage.mailbox.SPM_Extend_Message'>>, <vdsm.storage.sp.StoragePool object at 0x7f50f0561940>), 64, b'1xtndW\x91\xa2\xb8\xd3F\x90\xba\x9fO\xba\x9a\x16\x0f\xfd\xba\xe13\xee5_\xad}\x99\x92FU\x94\xffh\xb2\xcd0000000000001e0000000000000')) (threadPool:146)
2022-04-14 18:55:43,750+0300 DEBUG (mailbox-spm/0) [storage.taskmanager.task] (Task='98841b95-8693-4556-9868-15cf8d29d25f') moving from state preparing -> state preparing (task:624)
2022-04-14 18:55:43,750+0300 DEBUG (mailbox-spm/0) [storage.spm.messages.extend] processRequest, payload:b'1xtndW\x91\xa2\xb8\xd3F\x90\xba\x9fO\xba\x9a\x16\x0f\xfd\xba\xe13\xee5_\xad}\x99\x92FU\x94\xffh\xb2\xcd0000000000001e0000000000000' (mailbox:158)
2022-04-14 18:55:43,750+0300 INFO  (mailbox-spm/0) [storage.spm.messages.extend] processRequest: extending volume cdb268ff-9455-4692-997d-ad5f35ee33e1 in domain bafd0f16-9aba-4f9f-ba90-46d3b8a29157 (pool d8f140d8-0de0-45dd-8c5a-b27aafba7683) to size 7680 (mailbox:173)
2022-04-14 18:55:43,750+0300 DEBUG (mailbox-spm/0) [storage.storagedomain] Extending thinly-provisioned LV for volume cdb268ff-9455-4692-997d-ad5f35ee33e1 to 7680 MB (blockSD:1597)
2022-04-14 18:55:43,751+0300 DEBUG (mailbox-spm/0) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm vgs --devices /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03,/dev/mapper/36001405351b21217d814266b5354d710,/dev/mapper/3600140536effe53a1e242a991f0d44d0,/dev/mapper/360014054f26c2d5c5b74814a0c0cbabb,/dev/mapper/3600140563a17d65f8dd44ad9b84bff97,/dev/mapper/3600140577ea494a9b544084bdf67f5e1,/dev/mapper/36001405789e8ab39498483ea5858c779,/dev/mapper/360014057c4191f0d8f547a6a69180ef1,/dev/mapper/36001405c75ad03669294f4ab344b0028,/dev/mapper/36001405ef11fb3ca9824a5dbc12f0931,/dev/mapper/36001405fa8e8698bae84578baae9c182 --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3    hints="none"  obtain_device_list_from_udev=0 } global {  prioritise_write_locks=1  wait_for_locks=1  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bafd0f16-9aba-4f9f-ba90-46d3b8a29157 (cwd None) (commands:154)
2022-04-14 18:55:43,838+0300 DEBUG (mailbox-spm/0) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:186)
2022-04-14 18:55:43,839+0300 DEBUG (mailbox-spm/0) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm lvs --devices /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03 --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3    hints="none"  obtain_device_list_from_udev=0 } global {  prioritise_write_locks=1  wait_for_locks=1  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags bafd0f16-9aba-4f9f-ba90-46d3b8a29157 (cwd None) (commands:154)
2022-04-14 18:55:43,938+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] /usr/bin/taskset --cpu-list 0-1 /usr/bin/dd if=/rhev/data-center/d8f140d8-0de0-45dd-8c5a-b27aafba7683/mastersd/dom_md/outbox iflag=direct,fullblock bs=4096 count=1 skip=1 (cwd None) (commands:234)
2022-04-14 18:55:43,959+0300 DEBUG (mailbox-hsm) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000279828 s, 14.6 MB/s\n'; <rc> = 0 (commands:247)
2022-04-14 18:55:43,973+0300 DEBUG (mailbox-spm/0) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:186)
2022-04-14 18:55:43,973+0300 DEBUG (mailbox-spm/0) [storage.lvm] lvs reloaded (lvm:718)
2022-04-14 18:55:43,973+0300 INFO  (mailbox-spm/0) [storage.lvm] Extending LV bafd0f16-9aba-4f9f-ba90-46d3b8a29157/cdb268ff-9455-4692-997d-ad5f35ee33e1 to 7680 megabytes (lvm:1616)
2022-04-14 18:55:43,974+0300 DEBUG (mailbox-spm/0) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm lvextend --devices /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03 --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3    hints="none"  obtain_device_list_from_udev=0 } global {  prioritise_write_locks=1  wait_for_locks=1  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }' --autobackup n --driverloaded n --size 7680m bafd0f16-9aba-4f9f-ba90-46d3b8a29157/cdb268ff-9455-4692-997d-ad5f35ee33e1 (cwd None) (commands:154)

Issues:

  • Command arguments are logged at INFO level, while the actual command is already logged at DEBUG level
  • Event send/receive is logged at INFO level - this is an implementation detail, needed only for debugging
  • Mailbox commands are logged using storage.misc.execcmd instead of storage.mailbox, so it is not possible to silence only the mailbox commands, which run every 0.5 seconds.
  • There is no easy way to silence mailbox logs: if you enable DEBUG level logs for "storage", you also get DEBUG logs for "storage.mailbox"
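A sketch of what the last two points would enable, assuming the mailbox commands move under a dedicated "storage.mailbox" logger (standard library logging only, not vdsm's logger.conf):

import logging

# Verbose storage logs, but silence the 0.5-second mailbox polling commands.
logging.getLogger("storage").setLevel(logging.DEBUG)
logging.getLogger("storage.mailbox").setLevel(logging.INFO)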

Ceph mounts are deemed failed (even when actually successful)

When mounting a ceph file system we get an exception:

2021-08-18 22:13:56,927+0100 INFO (jsonrpc/4) [storage.StorageServer.MountConnection] Creating directory '/rhev/data-center/mnt/monitors.storage-01.example.com:' (storageServer:167)
2021-08-18 22:13:56,927+0100 INFO (jsonrpc/4) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/monitors.storage-01.example.com:
mode: None (fileUtils:201)
2021-08-18 22:13:56,927+0100 INFO (jsonrpc/4) [storage.Mount] mounting monitors.storage-01.example.com:/ at /rhev/data-center/mnt/monitors.storage-01.example.com:_ (mount:207)
2021-08-18 22:13:57,006+0100 ERROR (jsonrpc/4) [storage.HSM] Could not connect to storageServer (hsm:2374)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2371, in connectStorageServer
conObj.connect()
File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 184, in connect
self.getMountObj().getRecord().fs_file)
File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line 256, in getRecord
(self.fs_spec, self.fs_file))
FileNotFoundError: [Errno 2] Mount of monitors.storage-01.example.com:/ at /rhev/data-center/mnt/monitors.storage-01.example.com:_ does not exist

The storage domain is configured with:
Path: monitors.storage-01.example.com:/
VFS Type: ceph

  • various secrets etc

The strange part: the actual mount command DID succeed. I can see the file system mounted on the server, examine the contents, and see it in "df -h" etc.

df -h

Filesystem Size Used Avail Use% Mounted on
192.168.185.1,192.168.185.5,192.168.185.6:/ 15T 3.3T 12T 23% /rhev/data-center/mnt/monitors.storage-01.example.com:_

As far as I can see from the code, it expects to find "monitors.storage-01.example.com:/" in /proc/mounts - and not "192.168.185.1,192.168.185.5,192.168.185.6:/" - something, somewhere has translated the DNS name "monitors.storage-01.example.com" into the correct IP addresses.

And this translation causes the exception to be raised...

I managed to hand-fix it by modifying /usr/lib/python3.6/site-packages/vdsm/storage/mount.py:

--- mount.py 2021-08-19 14:24:01.400661984 +0100
+++ mount.py 2021-08-18 22:49:56.998536572 +0100
@@ -248,7 +248,7 @@
         fs_specs = self.fs_spec, None
 
         for record in _iterMountRecords():
-            if self.fs_file == record.fs_file and record.fs_spec in fs_specs:
+            if self.fs_file == record.fs_file:
                 return record
 
         raise OSError(errno.ENOENT,

But this is obviously not ideal - future upgrades will overwrite it etc.

The really strange part: We have 6 other hypervisors in this cluster, and none of the other 6 have this problem! They all have the DNS name appearing in /proc/mounts. Except this one server. I do not know why this one is different...

Original bug: https://bugzilla.redhat.com/1995610

Unknown SD when looking up NFS storage domain in startSpm

Description of problem

This bug was split from downstream bug ID 2048545.

When looking up a storage domain, vdsm uses the domain find method:

150     def _findDomain(self, sdUUID):
151         try:
152             findMethod = self.knownSDs[sdUUID]
153         except KeyError:
154             findMethod = self._findUnfetchedDomain
155 
156         return findMethod(sdUUID)

But if the domain is unknown, vdsm falls back to searching for the storage
domain in all connected servers:

158     def _findUnfetchedDomain(self, sdUUID):
...
171         self.log.info("Looking up domain %s", sdUUID)
172         with utils.stopwatch(
173                 "Looking up domain {}".format(sdUUID),
174                 level=logging.INFO,
175                 log=self.log):
176             for mod in (blockSD, glusterSD, localFsSD, nfsSD):
177                 try:
178                     return mod.findDomain(sdUUID)
179                 except se.StorageDomainDoesNotExist:
180                     pass
181                 except Exception:
182                     self.log.error(
183                         "Error while looking for domain `%s`",
184                         sdUUID, exc_info=True)

In this case, when looking up the storage domain in blockSD, we run lvm vgs (and fail).

Actions

_findUnfetchedDomain looks for the storage domain in all storage domain types, and it could not find it.
Why is the storage domain unknown?

Domain find methods are added to the knownSDs dict when connecting
to the storage server. They should not be deleted while the domain is in use.
Maybe there is a bug in this mechanism?

We should investigate this, fix the issue, and verify it.

Fix checkSudo code

Description

Failure to run a command using "sudo -n" or "sudo -l -n" does not prove that sudo is not configured for that command; the command can fail because of a real failure running it.

We have only a few tests using this check:

$ git grep checkSudo
tests/mkimage_test.py:from testValidation import checkSudo, ValidateRunningAsRoot
tests/mkimage_test.py:        checkSudo(["mount", "-o", "loop", "somefile", "target"])
tests/mkimage_test.py:        checkSudo(["umount", "target"])
tests/storage/misc_test.py:from testValidation import checkSudo
tests/storage/misc_test.py:        checkSudo(cmd)
tests/testValidation.py:def checkSudo(cmd)

The way we use this check is wrong:

checkSudo(["mount", "-o", "loop", "somefile", "target"])
checkSudo(["umount", "target"])

These can fail to mount for reasons unrelated to sudo.

Even when the call works, it works because we run as root. Only certain commands are configured to work with sudo (see static/etc/sudoers.d/50_vdsm.in), and using sudo works only if you run the tests as vdsm and the machine is configured for vdsm.

This is never true on a development machine or a container.

Proposal

We should remove this code and skip the sudo tests unless we run as user "vdsm" and can run a read-only "lvm" command that will never fail for another reason, like:

$ lvm vgs; echo $?
WARNING: Running as a non-root user. Functionality may be unavailable.
/run/lock/lvm/P_global:aux: open failed: Permission denied
5
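A hedged sketch of that check (illustrative names, full command paths omitted): skip the sudo tests unless we run as user "vdsm" and the read-only lvm command works via sudo.

import getpass
import subprocess
import pytest

def can_use_vdsm_sudo():
    if getpass.getuser() != "vdsm":
        return False
    # A read-only command allowed by the vdsm sudoers configuration.
    res = subprocess.run(
        ["sudo", "-n", "lvm", "vgs"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return res.returncode == 0

requires_vdsm_sudo = pytest.mark.skipif(
    not can_use_vdsm_sudo(), reason="sudo is not configured for user vdsm")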

POSIX compliant FS storage doesn't work unless you manually chcon -R -t nfs_t mountpoint

Background details:
https://lists.ovirt.org/archives/list/[email protected]/thread/TBRERXNEEUYEFW2FAZUWVLF5TFODFGKM/

I was not able to create POSIX storage through the UI. It
fails with a strange error:

Error while executing action AddPosixFsStorageDomain: Unexpected exception

Eventually I tracked down the real error to sanlock:

2022-02-04 13:15:27 16723 [826]: open error -13 EACCES: no permission to open
/rhev/data-center/mnt/_dev_sdb1/13a731d2-e1d2-4998-9b02-ac46899e3159/dom_md/ids
2022-02-04 13:15:27 16723 [826]: check that daemon user sanlock 179 group
sanlock 179 has access to disk or file.

This was resolved by doing:

sudo chcon -R -t nfs_t /mountpoint

Version-Release number of selected component (if applicable):

ovirt 4.4.10.6-1.el8 running on RHEL 8.5

How reproducible:

100%

Steps to Reproduce:

See this email for detailed steps:
https://lists.ovirt.org/archives/list/[email protected]/message/CSKCHQVWWD7EW3A4HPA37RJTKACFATO6/

Original bug: https://bugzilla.redhat.com/2050721

Isolate tests running as root locally

Some tests, like the storage tests (lvm, blockSD), require root. It is possible to run them
like this:

sudo ~/.venv/vdsm/bin/tox -e storage tests/storage/lvm_test.py

But after that, some files in .tox, tests, and /var/tmp/vdsm are owned by root, so
the next test run as a regular user will break with permission errors.

Developers can fix this using:

sudo chown -R $USER:$USER .tox tests/htmlcov* /var/tmp/vdsm

But this is ugly. We need an easy way to run tests as root that does not require any
cleanup after running the tests.

One way to fix this is to use separate directories for:

  • Pytest temporary directory (pytest has a solution for this)
  • .tox directory
  • tests/htmlcov* directories

Another way is to run the tests in a container, but some lvm tests do not work in a container.

Live merge over-extends the active volume - fix requires downtime

When extending the base volume before live merge, we use
a dumb calculation:

new_size = base_size + top_size + chunk_size

This calculation is correct only in the most extreme and practically
impossible case, when all clusters in top do not exist in base, and base
and top have no free space.

Practically, in a typical live merge there is only a small amount of data
in top, some of the clusters in top are already in base, and base has a
lot of free space, so base does not need to be extended, or needs only a
small extension.

After the live merge is completed, the base becomes the active image,
so we cannot reduce it to the optimal size. Because top size is at least
one chunk, this extends base by at least 2 chunks for every live merge.

Base size after live merge:

#   before   after
1   2.5g     7.5g
2   7.5g     12.5g
3   12.5g    17.5g
4   17.5g    22.5g
5   22.5g    27.5g
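The table follows directly from the calculation above, assuming a 2.5g chunk size and a top volume of exactly one chunk:

chunk = top = 2.5
base = 2.5
for merge in range(1, 6):
    base = base + top + chunk   # new_size = base_size + top_size + chunk_size
    print(merge, base)          # 7.5, 12.5, 17.5, 22.5, 27.5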

The sad result is that the active volume grows on each live merge, until
it reaches the virtual size of the disk.

To reduce the active volume to the optimal size, the user needs to shut down
the VM and invoke the reduce disk action in the API, or create a snapshot
and perform a cold merge, since after a cold merge we reduce the volume to
the optimal size.

For an internal volume, the user can invoke the reduce volume API, but this is
not easy to do, and the user has no indication that there is
a problem.

This is not a new issue - the problem has existed since oVirt 3.5, which added
live snapshot support. But now this issue is much more important, because we
use active layer merge during VM backup, and we increased the chunk size
to 2.5g.

How to reproduce:

  1. Add 30 GiB empty disk to a running vm
  2. Repeat 7 times: create snapshot and delete it

Actual result:
Active volume size grows to maximum size (~33 GiB)

Expected result:
Active volume size does not change (~2.62 GiB)

Add destination parameter to CopyDataExternalEndpoint

Description

In copy_data, CopyDataExternalEndpoint does not receive the writable or dest parameters as CopyDataDivEndpoint does. A destination parameter helps distinguish whether a volume is a source or a destination, which opens the possibility of adding some sanity checks (e.g., whether source and destination have the same id).

Also, CopyDataExternalEndpoint has no coverage at all in sdm_copy_data_test or any other test.

Actions

  • Add dest parameter and some sanity checks as in CopyDataDivEndpoint, so that both classes have a similar, consistent API
  • Add tests to improve coverage for CopyDataExternalEndpoint, similar to those for CopyDataDivEndpoint
  • (Optional?) Add an (abstract?) base class for both CopyDataDivEndpoint and CopyDataExternalEndpoint to integrate common functionality, avoid code repetition and enforce a similar API
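A hypothetical sketch of the suggested shape (not vdsm's actual classes): both endpoint types expose the same destination flag, so a shared sanity check becomes possible.

class Endpoint:
    # Minimal stand-in for CopyDataDivEndpoint / CopyDataExternalEndpoint.
    def __init__(self, vol_id, is_destination=False):
        self.vol_id = vol_id
        self.is_destination = is_destination

def check_endpoints(source, dest):
    if source.is_destination or not dest.is_destination:
        raise ValueError("source/destination roles are mixed up")
    if source.vol_id == dest.vol_id:
        raise ValueError("source and destination are the same volume")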

Optimize Volume.refresh flow

When finishing the extend flow, we refresh the volume using the storage API Volume.refresh.

Refreshing takes 25% of the total extend time and creates lots of I/O, since lvm
must read the entire VG metadata twice.

Volume.refresh does:

  • produce a storage domain, possibly blocking for a long time if the storage domain cache is in the middle of a rescan, possibly because of an unrelated storage domain.
  • produce a volume, validating that the volume exists by calling lvm.getLV()
  • call volume.refreshVolume()
  • call lvm.refreshLVs(), running "lvchange --refresh vg-name/lv-name"

In the logs we see that for every call we run "lvs" - reading the entire VG metadata
from storage. The lvs command always succeeds with a warning:

WARNING: Cannot find matching striped segment for bafd0f16-9aba-4f9f-ba90-46d3b8a29157/b91914e5-8446-4ff8-a143-a3f37cf23188.

The warning is likely expected and can be suppressed, but we really should not run
lvs for every call.

All of this flow is pointless, since we call it from a running VM context. We know
that the logical volume exists and is active, so we should really call only:

lvm.refreshLVs(self.sdUUID, [self.volUUID])

Volume.refresh() does nothing for file volumes, so callers should not call this API
with file volumes.

Example log:

2022-05-10 00:43:32,953+0300 INFO  (mailbox-hsm/2) [vdsm.api] START refreshVolume(sdUUID='bafd0f16-9aba-4f9f-ba90-46d3b8a29157', spUUID='fa20ea36-c317-49a6-af6d-afaa5b1561df', imgUUID='a710d32b-eb03-4f19-8049-9803be251f85', volUUID='b91914e5-8446-4ff8-a143-a3f37cf23188') from=internal, task_id=62cbdc9d-9ecb-4607-b620-5e7c564465aa (api:48)

2022-05-10 00:43:33,034+0300 WARN  (mailbox-hsm/2) [storage.lvm] Command ['/usr/sbin/lvm', 'lvs', '--devices', '/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3    hints="none"  obtain_device_list_from_udev=0 } global {  prioritise_write_locks=1  wait_for_locks=1  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,vg_name,attr,size,seg_start_pe,devices,tags', 'bafd0f16-9aba-4f9f-ba90-46d3b8a29157'] succeeded with warnings: ['  WARNING: Cannot find matching striped segment for bafd0f16-9aba-4f9f-ba90-46d3b8a29157/b91914e5-8446-4ff8-a143-a3f37cf23188.'] (lvm:358)

2022-05-10 00:43:33,035+0300 INFO  (mailbox-hsm/2) [storage.lvm] Refreshing LVs (vg=bafd0f16-9aba-4f9f-ba90-46d3b8a29157, lvs=('b91914e5-8446-4ff8-a143-a3f37cf23188',)) (lvm:1735)

2022-05-10 00:43:33,136+0300 INFO  (mailbox-hsm/2) [vdsm.api] FINISH refreshVolume return=None from=internal, task_id=62cbdc9d-9ecb-4607-b620-5e7c564465aa (api:54)

Fix:

  • Call lvm directly bypassing storage layer

Replace vdsm version of userstorage with pip's userstorage

Description

VDSM uses an outdated version of userstorage stored locally in the project (i.e., tests/storage/userstorage.py).

Proposal

Change VDSM to use userstorage as imageio does: make userstorage a requirement, installed with pip in the venv, so it is updateable and ready to use directly from the command line.

Bogus LVM warning in reduce volume after live merge

When reducing a volume after live merge, lvm logs warnings about possible data
corruption. These warnings are not interesting, since we measure the image offset
before reducing, and the volume is internal.

2022-05-23 12:29:04,160+0300 WARN  (tasks/9) [storage.lvm] Command ['/usr/sbin/lvm', 'lvreduce', '--devices', '/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3    hints="none"  obtain_device_list_from_udev=0 } global {  prioritise_write_locks=1  wait_for_locks=1  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }', '--autobackup', 'n', '--force', '--size', '128m', 'bafd0f16-9aba-4f9f-ba90-46d3b8a29157/cd981010-ac48-49e1-b28b-cbde06c085bf'] succeeded with warnings: ['  WARNING: Reducing active and open logical volume to 128.00 MiB.', '  THIS MAY DESTROY YOUR DATA (filesystem etc.)'] (lvm:358)

Fix:

  • Suppress the warning using our LVMRunner suppression list (see the sketch below)

  • Note: this seems to be 2 warnings:

    '  WARNING: Reducing active and open logical volume to 128.00 MiB.'
    '  THIS MAY DESTROY YOUR DATA (filesystem etc.)'
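
A minimal sketch of matching these two warnings for suppression; the actual filtering hook inside LVMRunner is an assumption here and may store its patterns differently:

    import re

    # Hypothetical pattern list covering the two warnings quoted above.
    SUPPRESS_WARNINGS = re.compile("|".join([
        r"WARNING: Reducing active and open logical volume",
        r"THIS MAY DESTROY YOUR DATA \(filesystem etc\.\)",
    ]))

    def important_warnings(lines):
        # Keep only warnings that are not known to be harmless in this flow.
        return [line for line in lines if not SUPPRESS_WARNINGS.search(line)]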
    

Remove vdsm.storage.compat module

This module was added to deal with missing sanlock and ioprocess
in python 3, and with older versions of sanlock that used
async= instead of wait=.

These issues have not been relevant for a while, so the module can be removed.

Blocks #55

Unnecessary logs in getVolumeInfo

When getting volume info during merge, we have too many unnecessary logs repeating
info already logged by the public API.

2022-05-23 12:28:13,472+0300 INFO  (jsonrpc/7) [vdsm.api] START getVolumeInfo(sdUUID='bafd0f16-9aba-4f9f-ba90-46d3b8a29157', spUUID='fa20ea36-c317-49a6-af6d-afaa5b1561df', imgUUID='4b72aba2-0668-463d-b856-7f0f7406b6a2', volUUID='cd981010-ac48-49e1-b28b-c
bde06c085bf') from=::ffff:192.168.122.10,36058, flow_id=5f388eb2-38fd-4ca9-9b97-fc78ce75fe16, task_id=9d6b6d02-7270-4794-8bc0-18f7ed06570d (api:48)

2022-05-23 12:28:13,552+0300 INFO  (jsonrpc/7) [storage.volumemanifest] Info request: sdUUID=bafd0f16-9aba-4f9f-ba90-46d3b8a29157 imgUUID=4b72aba2-0668-463d-b856-7f0f7406b6a2 volUUID = cd981010-ac48-49e1-b28b-cbde06c085bf  (volume:243)

2022-05-23 12:28:13,568+0300 INFO  (jsonrpc/7) [storage.volumemanifest] bafd0f16-9aba-4f9f-ba90-46d3b8a29157/4b72aba2-0668-463d-b856-7f0f7406b6a2/cd981010-ac48-49e1-b28b-cbde06c085bf info is {'uuid': 'cd981010-ac48-49e1-b28b-cbde06c085bf', 'type': 'SPARS
E', 'format': 'COW', 'disktype': 'DATA', 'voltype': 'INTERNAL', 'capacity': '10737418240', 'parent': '00000000-0000-0000-0000-000000000000', 'description': '{"DiskAlias":"f35_Disk1","DiskDescription":""}', 'pool': '', 'domain': 'bafd0f16-9aba-4f9f-ba90-4
6d3b8a29157', 'image': '4b72aba2-0668-463d-b856-7f0f7406b6a2', 'ctime': '1653297576', 'mtime': '0', 'legality': 'LEGAL', 'generation': 0, 'sequence': 1, 'apparentsize': '2684354560', 'truesize': '2684354560', 'status': 'OK', 'lease': {'path': '/dev/bafd0
f16-9aba-4f9f-ba90-46d3b8a29157/leases', 'offset': 112197632, 'owners': [], 'version': None}, 'children': []} (volume:282)

2022-05-23 12:28:13,568+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'uuid': 'cd981010-ac48-49e1-b28b-cbde06c085bf', 'type': 'SPARSE', 'format': 'COW', 'disktype': 'DATA', 'voltype': 'INTERNAL', 'capacity': '10737418240', 'pare
nt': '00000000-0000-0000-0000-000000000000', 'description': '{"DiskAlias":"f35_Disk1","DiskDescription":""}', 'pool': '', 'domain': 'bafd0f16-9aba-4f9f-ba90-46d3b8a29157', 'image': '4b72aba2-0668-463d-b856-7f0f7406b6a2', 'ctime': '1653297576', 'mtime': '
0', 'legality': 'LEGAL', 'generation': 0, 'sequence': 1, 'apparentsize': '2684354560', 'truesize': '2684354560', 'status': 'OK', 'lease': {'path': '/dev/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/leases', 'offset': 112197632, 'owners': [], 'version': None}, 'c
hildren': []}} from=::ffff:192.168.122.10,36058, flow_id=5f388eb2-38fd-4ca9-9b97-fc78ce75fe16, task_id=9d6b6d02-7270-4794-8bc0-18f7ed06570d (api:54)

Fix:

  • Remove the storage.volumemanifest logs:
    • 2022-05-23 12:28:13,552+0300 INFO  (jsonrpc/7) [storage.volumemanifest] Info request: ...
    • 2022-05-23 12:28:13,568+0300 INFO  (jsonrpc/7) [storage.volumemanifest] bafd0f16-9aba-4f9f-ba90-46d3b8a29157/4b72aba2-0668-463d-b856-7f0f7406b6a2/cd981010-ac48-49e1-b28b-cbde06c085bf info is ...

Mailbox test: random error in the CIs for a timeout check

Mailbox test test_roundtrip_events_enabled failed in the pipelines (https://github.com/oVirt/vdsm/runs/6570673750?check_suite_focus=true) on an unrelated changeset. There is some flakiness involved, as suggested in the code comment, so maybe this requires a larger factor for timeouts or, in any case, further investigation.

=================================== FAILURES ===================================
_____________ TestCommunicate.test_roundtrip_events_enabled[16-0] ______________

self = <storage.mailbox_test.TestCommunicate object at 0x7fd87a687390>
mboxfiles = MboxFiles(inbox='/var/tmp/vdsm/test_roundtrip_events_enabled_8/inbox', outbox='/var/tmp/vdsm/test_roundtrip_events_enabled_8/outbox')
delay = 0, messages = 16

    @pytest.mark.parametrize("delay", [0, 0.05])
    @pytest.mark.parametrize("messages", [
        1, 2, 4, 8, 16, 32, sm.MESSAGES_PER_MAILBOX])
    def test_roundtrip_events_enabled(self, mboxfiles, delay, messages):
        """
        Test roundtrip latency.
    
        This test is best run like this:
    
            $ tox -e storage tests/storage/mailbox_test.py -- \
                -k test_roundtrip \
                --log-cli-level=info \
                | grep stats
    
        Example output (trimmed):
    
            stats: messages=1 delay=0.000 best=0.215 average=0.215 worst=0.215
            stats: messages=1 delay=0.050 best=0.234 average=0.234 worst=0.234
            stats: messages=2 delay=0.000 best=0.231 average=0.233 worst=0.236
            stats: messages=2 delay=0.050 best=0.252 average=0.285 worst=0.319
            stats: messages=4 delay=0.000 best=0.257 average=0.258 worst=0.259
            stats: messages=4 delay=0.050 best=0.267 average=0.311 worst=0.354
            stats: messages=8 delay=0.000 best=0.231 average=0.344 worst=0.365
            stats: messages=8 delay=0.050 best=0.246 average=0.314 worst=0.391
            stats: messages=16 delay=0.000 best=0.244 average=0.379 worst=0.395
            stats: messages=16 delay=0.050 best=0.148 average=0.312 worst=0.393
            stats: messages=32 delay=0.000 best=0.260 average=0.378 worst=0.395
            stats: messages=32 delay=0.050 best=0.146 average=0.267 worst=0.402
            stats: messages=63 delay=0.000 best=0.269 average=0.429 worst=0.505
            stats: messages=63 delay=0.050 best=0.160 average=0.272 worst=0.423
    
        """
        times = self.roundtrip(mboxfiles, delay, messages)
    
        best = times[0]
        worst = times[-1]
        average = sum(times) / len(times)
    
        log.info(
            "stats: messages=%d delay=%.3f best=%.3f average=%.3f worst=%.3f",
            messages, delay, best, average, worst)
    
        # This is the slowest run when running locally:
        # stats: messages=63 delay=0.000 best=0.269 average=0.429 worst=0.505
        # In github CI this can be about twice slower. We use larger timeouts
        # to avoid flakeyness on slow CI.
    
>       assert best < 5 * EVENT_INTERVAL
E       assert 0.6363051019999943 < (5 * 0.1)

storage/mailbox_test.py:348: AssertionError
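
If we go with a larger timeout factor, the change is a one-liner in the assertion; the factor below is a guess to tolerate slow CI runners, not a measured bound:

    # Hypothetical relaxation of the flaky assertion in test_roundtrip_events_enabled.
    assert best < 10 * EVENT_INTERVAL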

Use only mailbox events

If all hosts support mailbox events, the SPM does not need to read the mailbox
every 2 seconds, minimizing I/O and CPU usage on the SPM host.

Hosts report new capability:

"mailbox_events": true

If all hosts report this capability, engine can configure the SPM to use only mailbox events.

Requires a change in vdsm and engine.
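
A rough sketch of the two sides of the change; the names below are assumptions and do not reflect the current mailbox.py code:

    import threading

    def host_capabilities(caps):
        # Host side: advertise support for mailbox events.
        caps["mailbox_events"] = True
        return caps

    def wait_for_mail(event: threading.Event, events_only: bool,
                      interval: float = 2.0):
        # SPM side: in events-only mode, block until an event arrives instead
        # of waking up every `interval` seconds to reread the mailbox.
        if events_only:
            event.wait()
        else:
            event.wait(interval)
        event.clear()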

Plan:

  • Disable periodic checking of the mailbox
  • Measure I/O and CPU usage with and without mailbox checking on real server
  • Prioritize this issue based on the results.

ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, not clearing mailbox, clearing new mail

I see a lot of errors in vdsm.log like:

2022-02-14 08:42:52,086+0100 ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox
65 checksum failed, not clearing mailbox, clearing new mail (data=b'\xff\xff\xff\xff
\x00\x00', checksum=<function checksum at 0x7f2454712b70>,
expected=b'\xbfG\x00\x00') (mailbox:602)
2022-02-14 08:42:52,087+0100 ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox
66 checksum failed, not clearing mailbox, clearing new mail (data=b'\x00\x00\x00\x00
\xff\xff', checksum=<function checksum at 0x7f2454712b70>,
expected=b'\x04\xf0\x0b\x00') (mailbox:602)

We have 3 hosts and 8 iSCSI domains.

Version-Release number of selected component (if applicable):
We are running latest ovirt engine and hosts:
Hosts: ovirt-node-ng-installer-4.4.10-2022020214.el8.iso
engine: ovirt-engine-4.4.10.6-1.el8.noarch

Due to attachment file size limit I am sharing them this way:
https://www.oslavany.net/userdata/publicdoc/ovirt/server1-vdsm-logs.tar
https://www.oslavany.net/userdata/publicdoc/ovirt/server2-vdsm-logs.tar
https://www.oslavany.net/userdata/publicdoc/ovirt/server4-vdsm-logs.tar

Current SPM is "server4".

Actual results:
checksum failed

Expected results:
0 checksum failed

Additional info:
I observed the issue before we upgraded to v4.4 (I hoped the upgrade would fix it, but it
did not)


The fixes for bug 1426762 mention that we don't have a way to prevent the race between
the hosts writing messages to the mailbox and the spm reading them. The checksum
is our way to tell that what we read is not consistent and we need to read it again.

Maybe we need to improve the way this is handled - instead of logging warnings and
dropping the message, read the relevant messages again.
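
A minimal sketch of the retry idea, using hypothetical read/verify helpers (this is not the current SpmMailMonitor code):

    # Retry an inconsistent mailbox read instead of dropping the new mail.
    def read_consistent_mailbox(read_mailbox, verify_checksum, retries=3):
        for _ in range(retries):
            data = read_mailbox()
            if verify_checksum(data):
                return data
        # Still inconsistent after several reads - now it is worth a warning.
        raise RuntimeError("mailbox checksum failed after %d reads" % retries)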

The attached logs should be enough to start investigating this issue. When we do this we may request more logs.

Original bug: https://bugzilla.redhat.com/2054209

Check volume type before reducing from StorageDomain

Problem

There is no safety check before reducing a volume in the StorageDomain class. This could lead to wrong reductions in case of, e.g., raw or preallocated volumes.

StoragePool.reduceVolume() uses StorageDomain.reduceVolume(), which takes the metadata_lock and then calls the Volume.reduce method with the volume's optimal_size.

Plan

  • Use vol.can_reduce() method (which takes into account the type and the format of the volume) before reducing the volume in the StorageDomain.

This way, we avoid a potentially wrong reduction and save the time of obtaining optimal_size() in those cases.
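
A sketch of the proposed guard, reusing the names mentioned above (exact signatures and the lock attribute are assumptions):

    # Hypothetical StorageDomain.reduceVolume() with the safety check.
    def reduceVolume(self, imgUUID, volUUID, allowActive=False):
        vol = self.produceVolume(imgUUID, volUUID)
        if not vol.can_reduce():
            # Raw or preallocated volumes must not be reduced; returning early
            # also skips the cost of computing optimal_size() for them.
            return
        with self._manifest.metadata_lock():
            vol.reduce(vol.optimal_size(), allowActive=allowActive)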

Bogus lvm warnings when refreshing volume after extend

When refreshing volume after extend, lvm succeeds with a warning about:

Cannot find matching striped segment for vg/lv...
2022-05-23 12:52:34,799+0300 WARN  (mailbox-hsm/2) [storage.lvm] Command ['/usr/sbin/lvm', 'lvs', '--devices', '/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-04,/dev/mapper/36001405351b21217d814266b5354d710,/dev/mapper/3600140536effe53a1e242a991f0d44d0,/dev/mapper/360014054f26c2d5c5b74814a0c0cbabb,/dev/mapper/3600140563a17d65f8dd44ad9b84bff97,/dev/mapper/3600140577ea494a9b544084bdf67f5e1,/dev/mapper/36001405789e8ab39498483ea5858c779,/dev/mapper/360014057c4191f0d8f547a6a69180ef1,/dev/mapper/36001405c75ad03669294f4ab344b0028,/dev/mapper/36001405ef11fb3ca9824a5dbc12f0931,/dev/mapper/36001405fa8e8698bae84578baae9c182', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3    hints="none"  obtain_device_list_from_udev=0 } global {  prioritise_write_locks=1  wait_for_locks=1  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,vg_name,attr,size,seg_start_pe,devices,tags', 'bafd0f16-9aba-4f9f-ba90-46d3b8a29157'] succeeded with warnings: ['  WARNING: Cannot find matching striped segment for bafd0f16-9aba-4f9f-ba90-46d3b8a29157/cd981010-ac48-49e1-b28b-cbde06c085bf.'] (lvm:358)

Fix:

  • Check with LVM folks what this warning means
  • Suppress it using LVMRunner suppression mechanism

Merge reduces base volume too much when finalizing

When a cold merge is finalized, we reduce the base volume to its optimal size. However,
the optimal size calculation is wrong for an active merge, since the top volume is
removed after finalizing the merge. During finalize, the top volume still exists,
so the base volume is not a leaf. But after the top volume is deleted, the base
volume becomes the leaf, so it needs more space.

        if subchain.base_vol.chunked():
            # optimal_size must be called when the volume is prepared
            optimal_size = subchain.base_vol.optimal_size()
            actual_size = subchain.base_vol.getVolumeSize()

    if subchain.base_vol.chunked() and optimal_size < actual_size:
        _shrink_base_volume(subchain, optimal_size)

This happens only in an active merge. For an internal merge the base volume does not
become the leaf after the merge, so the current code is correct.

When starting the VM after the cold merge, the disk does not have enough free space,
and the guest is more likely to pause when writing to the disk. Usually vdsm
detects that the disk needs extension when the VM is started and extends the disk
before the guest can write any data. Here is an example log:

2022-05-15 22:36:57,091+0300 INFO  (vm/6e3be0cf) [virt.vm] (vmId='6e3be0cf-2d12-4bb6-9409-f6b8e1bf49e3') CPU running: domain i
nitialization (vm:6019)
...
2022-05-15 22:36:57,526+0300 INFO  (periodic/1) [virt.vm] (vmId='6e3be0cf-2d12-4bb6-9409-f6b8e1bf49e3') drive 'sdb' needs exte
nsion (storage:979)
2022-05-15 22:36:57,526+0300 INFO  (periodic/1) [virt.vm] (vmId='6e3be0cf-2d12-4bb6-9409-f6b8e1bf49e3') Requesting an extensio
n for volume e2115e7b-0d88-47df-a8ee-1acb7db6f1d9 on domain bafd0f16-9aba-4f9f-ba90-46d3b8a29157 block_info BlockInfo(index=1,
 name='sdb', path='/rhev/data-center/mnt/blockSD/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/images/e6d71bf6-80cf-407c-b7d0-2083a40e7
584/e2115e7b-0d88-47df-a8ee-1acb7db6f1d9', allocation=0, capacity=10737418240, physical=134217728, threshold=0) (thinp:388)

How to reproduce

  1. Create vm with 10g empty thin disk (creates 2.5g empty volume)

  2. Create a snapshot including only the thin disk (creates a new 2.5g snapshot)

  3. Delete the snapshot

  4. When snapshot is deleted, check volume size using lvs:

    lvs --devicesfile= -o vg_name,lv_name,tags,size | grep disk-id
    

Actual result:
Volume size after the operation is 1.0 GiB
(Minimum disk size for cold merge in oVirt < 4.5.1)

Expected result:
Volume size after the operation is 2.62 GiB

Calculating the expected size

The expected size is the actual size + 2.5 GiB free space. Since the image is empty,
its actual size is 262144 bytes. You can check using:

# qemu-img check /dev/vg-name/lv-name  
No errors were found on the image.
Image end offset: 262144

2.5 GiB + 262144 bytes is 2684616704 bytes. The value needs to be aligned
to the VG extent size (128 MiB):

>>> 2684616704 / (128 * 1024**2)
20.001953125

So we need 21 extents:

>>> 21 * 128 * 1024**2
2818572288

Which is 2.62 GiB.
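
The same calculation as a small script:

    # Reproduces the expected-size calculation above.
    GiB = 1024**3
    MiB = 1024**2

    chunk = int(2.5 * GiB)         # free space wanted for the new leaf
    image_end_offset = 262144      # from qemu-img check
    extent = 128 * MiB             # VG extent size

    required = chunk + image_end_offset     # 2684616704 bytes
    extents = -(-required // extent)        # ceil division -> 21 extents
    print(extents * extent)                 # 2818572288 bytes (~2.62 GiB)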

Unnecessary logs when starting/ending storage tasks

When starting and ending tasks, we emit logs that add no useful information:

2022-05-23 12:29:03,998+0300 INFO  (tasks/9) [storage.threadpool.workerthread] START task 539688e9-a899-4d15-9a30-acd0563e5bf0 (cmd=<bound method Task.commit of <vdsm.storage.task.Task object at 0x7f2cb064c7f0>>, args=None) (threadPool:146)

2022-05-23 12:29:04,069+0300 INFO  (tasks/9) [storage.volume] Request to reduce LV cd981010-ac48-49e1-b28b-cbde06c085bf of image 4b72aba2-0668-463d-b856-7f0f7406b6a2 in VG bafd0f16-9aba-4f9f-ba90-46d3b8a29157 with size = 134217728 allowActive = True (blockVolume:692)

2022-05-23 12:29:04,069+0300 INFO  (tasks/9) [storage.lvm] Reducing LV bafd0f16-9aba-4f9f-ba90-46d3b8a29157/cd981010-ac48-49e1-b28b-cbde06c085bf to 128 megabytes (force=True) (lvm:1704)

2022-05-23 12:29:04,160+0300 WARN  (tasks/9) [storage.lvm] Command ['/usr/sbin/lvm', 'lvreduce', '--devices', '/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02,/dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3    hints="none"  obtain_device_list_from_udev=0 } global {  prioritise_write_locks=1  wait_for_locks=1  use_lvmpolld=1 } backup {  retain_min=50  retain_days=0 }', '--autobackup', 'n', '--force', '--size', '128m', 'bafd0f16-9aba-4f9f-ba90-46d3b8a29157/cd981010-ac48-49e1-b28b-cbde06c085bf'] succeeded with warnings: ['  WARNING: Reducing active and open logical volume to 128.00 MiB.', '  THIS MAY DESTROY YOUR DATA (filesystem etc.)'] (lvm:358)

2022-05-23 12:29:04,192+0300 INFO  (tasks/9) [storage.threadpool.workerthread] FINISH task 539688e9-a899-4d15-9a30-acd0563e5bf0 (threadPool:148)

In this case the actual operation was logged by the task:

2022-05-23 12:29:04,069+0300 INFO  (tasks/9) [storage.volume] Request to reduce LV ...

And in lvm:

2022-05-23 12:29:04,069+0300 INFO  (tasks/9) [storage.lvm] Reducing LV ...

The task START/FINISH logs do not add interesting info, except the task id, which
is the only way to locate a task after it is started by the API.

Fix:

  • Include the task id in all task logs, so we can follow a task using its id and we don't need the start/finish logs (see the sketch below).
  • Change the START/FINISH logs to DEBUG level
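
A minimal sketch of binding the task id to every log record with a standard LoggerAdapter (this is not the existing threadPool code; the log format would need to include %(task_id)s):

    import logging

    # Carry the task id on every record so START/FINISH lines are not needed
    # to correlate the logs of a task.
    def task_logger(task_id):
        return logging.LoggerAdapter(
            logging.getLogger("storage.tasks"), {"task_id": task_id})

    log = task_logger("539688e9-a899-4d15-9a30-acd0563e5bf0")
    log.info("Reducing LV %s/%s to %d megabytes", "vg-name", "lv-name", 128)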

Bogus warnings in deleteImage flow

If a volume was never activated on the SPM, deleting the image logs bogus warnings about
missing files and directories:

2022-05-18 22:30:04,981+0300 INFO  (jsonrpc/1) [vdsm.api] START deleteImage(sdUUID='bafd0f16-9aba-4f9f-ba90-46d3b8a29157', spUUID='fa20ea36-c317-49a6-af6d-afaa5b1561df', imgUUID='e6d71bf6-80cf-407c-b7d0-2083a40e7584', postZero='false', force='false', dis
card=True) from=::ffff:192.168.122.10,53338, flow_id=e1f21682-ae34-4848-940c-bc917543118f, task_id=d3121fae-451a-443e-916d-b3803ce4bba2 (api:48)
2022-05-18 22:30:05,162+0300 INFO  (jsonrpc/1) [storage.lvm] Change LVs tags (vg=bafd0f16-9aba-4f9f-ba90-46d3b8a29157, lvs={'e2115e7b-0d88-47df-a8ee-1acb7db6f1d9': ImgsPar(imgs=['e6d71bf6-80cf-407c-b7d0-2083a40e7584'], parent='00000000-0000-0000-0000-000
000000000')}, delTags=('IU_e6d71bf6-80cf-407c-b7d0-2083a40e7584',), addTags=('IU__remove_me_e6d71bf6-80cf-407c-b7d0-2083a40e7584',)) (lvm:1805)
2022-05-18 22:30:05,224+0300 INFO  (jsonrpc/1) [IOProcessClient] (Global) Starting client (__init__:340)
2022-05-18 22:30:05,235+0300 INFO  (ioprocess/134441) [IOProcess] (Global) Starting ioprocess (__init__:465)
2022-05-18 22:30:05,288+0300 INFO  (jsonrpc/1) [vdsm.api] FINISH deleteImage return=None from=::ffff:192.168.122.10,53338, flow_id=e1f21682-ae34-4848-940c-bc917543118f, task_id=d3121fae-451a-443e-916d-b3803ce4bba2 (api:54)
2022-05-18 22:30:05,340+0300 INFO  (tasks/6) [storage.threadpool.workerthread] START task d3121fae-451a-443e-916d-b3803ce4bba2 (cmd=<bound method Task.commit of <vdsm.storage.task.Task object at 0x7fd45122f128>>, args=None) (threadPool:146)
2022-05-18 22:30:05,369+0300 INFO  (tmap-274/0) [storage.lvm] Activating lvs: vg=bafd0f16-9aba-4f9f-ba90-46d3b8a29157 lvs=['e2115e7b-0d88-47df-a8ee-1acb7db6f1d9'] (lvm:1758)
2022-05-18 22:30:05,456+0300 INFO  (tmap-274/0) [storage.blockdev] Discarding device /dev/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/e2115e7b-0d88-47df-a8ee-1acb7db6f1d9 (blockdev:100)
2022-05-18 22:30:05,472+0300 INFO  (tmap-274/0) [storage.blockdev] Discarded device /dev/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/e2115e7b-0d88-47df-a8ee-1acb7db6f1d9: 0.02 seconds (utils:390)
2022-05-18 22:30:05,472+0300 INFO  (tmap-274/0) [storage.lvm] Deactivating lvs: vg=bafd0f16-9aba-4f9f-ba90-46d3b8a29157 lvs=['e2115e7b-0d88-47df-a8ee-1acb7db6f1d9'] (lvm:1769)
2022-05-18 22:30:05,542+0300 INFO  (tmap-274/0) [storage.lvm] Removing LVs (vg=bafd0f16-9aba-4f9f-ba90-46d3b8a29157, lvs=('e2115e7b-0d88-47df-a8ee-1acb7db6f1d9',)) (lvm:1616)
2022-05-18 22:30:05,621+0300 INFO  (tasks/6) [storage.storagedomainmanifest] Unlink volume symlink '/rhev/data-center/mnt/blockSD/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/images/e6d71bf6-80cf-407c-b7d0-2083a40e7584/e2115e7b-0d88-47df-a8ee-1acb7db6f1d9' (bloc
kSD:690)
2022-05-18 22:30:05,621+0300 WARN  (tasks/6) [storage.storagedomainmanifest] Can't unlink /rhev/data-center/mnt/blockSD/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/images/e6d71bf6-80cf-407c-b7d0-2083a40e7584/e2115e7b-0d88-47df-a8ee-1acb7db6f1d9. [Errno 2] No su
ch file or directory: '/rhev/data-center/mnt/blockSD/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/images/e6d71bf6-80cf-407c-b7d0-2083a40e7584/e2115e7b-0d88-47df-a8ee-1acb7db6f1d9' (blockSD:694)
2022-05-18 22:30:05,621+0300 INFO  (tasks/6) [storage.storagedomainmanifest] removed: [] (blockSD:697)
2022-05-18 22:30:05,621+0300 INFO  (tasks/6) [storage.storagedomainmanifest] Removing image directory '/rhev/data-center/mnt/blockSD/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/images/e6d71bf6-80cf-407c-b7d0-2083a40e7584' (blockSD:703)
2022-05-18 22:30:05,621+0300 WARN  (tasks/6) [storage.storagedomainmanifest] Can't rmdir /rhev/data-center/mnt/blockSD/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/images/e6d71bf6-80cf-407c-b7d0-2083a40e7584 (blockSD:707)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 705, in rmDCImgDir
    os.rmdir(imgPath)
FileNotFoundError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/images/e6d71bf6-80cf-407c-b7d0-2083a40e7584'
2022-05-18 22:30:05,663+0300 INFO  (tasks/6) [storage.threadpool.workerthread] FINISH task d3121fae-451a-443e-916d-b3803ce4bba2 (threadPool:148)

The issue is likely in the bad infrastructure that warns about missing files and directories
when removing them.

How to reproduce

  1. Create disk on block storage
  2. Delete the disk
  3. Check vdsm log on the SPM host

Actual results:
Bogus warnings

Expected results:
No warning about missing links and directories in the flow.

Fix

  • Change the code to use fixed infrastructure that does not log useless warnings for expected conditions (see the sketch below)
  • Or fix the bad infrastructure to avoid the warnings and tracebacks.
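
A minimal sketch of infrastructure that treats an already-missing path as success (hypothetical helpers, not existing vdsm code):

    import os

    # Removing something that is already gone is the desired end state,
    # so it is not worth a warning or a traceback.
    def remove_file_if_exists(path):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass

    def remove_dir_if_exists(path):
        try:
            os.rmdir(path)
        except FileNotFoundError:
            pass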

Copying master file system can fail or miss data updated during the copy

When copying the master file system from the old master domain during
StoragePool.switchMaster or StorageDomain.deactivate, vdsm uses tar to copy
the contents of the master file system. If another task changes state during
the copy, the copy can fail or be incorrect.

We have 2 issues:

  1. tar fails with "/usr/bin/tar: ./path: file changed as we read it"

2021-09-14 00:33:44,369+0300 ERROR (tasks/1) [storage.taskmanager.task] (Task='928b287c-fdd0-487f-bec5-f98d63b63ba2') Unexpect
ed error (task:877)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
return fn(*args, **kargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 989, in switchMaster
self.masterMigrate(oldMasterUUID, newMasterUUID, masterVersion)
File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 914, in masterMigrate
exclude=('./lost+found',))
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 101, in tarCopy
raise se.TarCommandError(errors)
vdsm.storage.exception.TarCommandError: Tar command failed: ({'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+
found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'], 'rc': 1, 'err': '/u
sr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76: file changed as we read it\n/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde
-9e07-f720569ccd76.temp: File removed before we read it\n/usr/bin/tar: ./tasks: file changed as we read it\n'}},)

This will fail the copy and the API call. The user can try the operation
again.

  2. Task status is persisted after its file was already copied with tar.

The tasks on the new master storage domain will not include the correct status
of the task. This can cause failure when trying to roll back the tasks on
the new master storage domain.

This is a silent failure; we don't know if it happens, and we don't have a way to
detect the issue in the logs.

Version-Release number of selected component (if applicable):

  • Exists since first vdsm version.
  • More important since 4.4.4, when StoragePool.switchMaster was introduced.

How reproducible:
Very hard to reproduce. I could reproduce once when running 200 switch master
operations in a loop, while moving disks between other storage domains.

Steps to Reproduce:

  1. Run change_mater.py example from the sdk in a loop
  2. At the same time, move disks between storage domains in a loop

I reproduced the issue only on NFS storage, but it should affect any storage type.

Previous attempt to fix that: https://gerrit.ovirt.org/116413

Original bug: https://bugzilla.redhat.com/2012843

Rewrite doc/thin-provisioning.md

doc/thin-provisioning.md is severely outdated and needs a rewrite.

Integrate check program for monitoring storage domains

Description of problem:

Current storage domain monitoring is horribly inefficient, running

dd if=/path/to/monitor bs=4096 count=1 iflag=direct of=/dev/null

every 10 seconds, for every active storage domain. The program's stderr
is parsed to extract the timing info for reporting the read delay.
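
For reference, the current check boils down to something like this simplified sketch (the real implementation differs and parses dd's stderr more carefully):

    import re
    import subprocess

    def read_delay(path):
        # One 4k direct read; the timing is taken from dd's stderr, e.g.
        # "... copied, 0.000691 s, 5.9 MB/s".
        cmd = ["dd", "if=" + path, "bs=4096", "count=1",
               "iflag=direct", "of=/dev/null"]
        res = subprocess.run(cmd, capture_output=True, text=True, check=True)
        m = re.search(r"copied, ([\d.eE+-]+) s", res.stderr)
        return float(m.group(1)) if m else None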

To manage the dd processes, we have an event loop monitoring the dd
child processes and submitting results back to storage monitoring.
This code is based on asyncio event loop from python 3.4, modified to
work with asyncore.

asyncore was deprecated in python 3.6[1] - the documentation clearly says:

Deprecated since version 3.6: Please use asyncio instead.

We are lucky that the python developers kept asyncore in python 3.9. However,
it was removed in python 3.11[2].

It is possible to port the existing code to asyncio, but this code is complex
and will be inefficient after porting.

The right way to fix this is to use the check helper program:

https://github.com/nirs/check

This is a helper process similar to ioprocess, but much simpler, written in
Go. To integrate it, we need to write a python client, similar to the ioprocess
python client, that will communicate with the helper child process.

Unlike ioprocess we need to run only one check helper, not one helper per
storage domain.

Since the check program is much more efficient, we can start monitoring additional
paths, and improve monitoring to also do writes, not only reads.

[1] https://docs.python.org/3.6/library/asyncore.html?highlight=asyncore
[2] https://twitter.com/VictorStinner/status/1460391333265346564

Originally filed in bugzilla: https://bugzilla.redhat.com/2024119

Remove vdsm.common.compat module

This module was added to handle differences between python versions:

  • simplejson on RHEL 6.x
  • subprocess32 on python 2
  • missing glob.escape on python 2
  • using inspect.getargspec on python 2
  • handling missing modules on python 3

None of these hacks have been needed for a while, so the module can be removed.

Dependencies:

  • vdsm.storage.compat depending on vdsm.common.compat.MissingModule.

Depends: #56

Support standard NBD URL syntax

Description

The NBD protocol recently introduced a new URL syntax:
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md

NBD URI                         Export name
nbd+unix:///disk?socket=sock    disk
nbd+unix:///?socket=sock        (empty string)
nbd://example.com/disk          disk
nbd://example.com/              (empty string)

Vdsm generates qemu's non-standard syntax:

QEMU NBD URI                        Export name
nbd:unix:/socket:exportname=disk    disk
nbd:example.com/disk                disk

We want to switch to the standard NBD URL syntax for better compatibility with
virt-v2v, and possibly allowing backup using NBD socket.

Vdsm generates NBD URL in:

  • NBD.start_server
  • VM.start_backup

In both cases we need to switch to the new syntax.
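
A sketch of the URL construction for both syntaxes, following the examples in the tables above (helper names are illustrative):

    # Standard NBD URL syntax (what we want to generate):
    def nbd_url_unix(sock, export_name=""):
        return "nbd+unix:///{}?socket={}".format(export_name, sock)

    def nbd_url_tcp(host, export_name=""):
        return "nbd://{}/{}".format(host, export_name)

    # Legacy qemu syntax currently generated by vdsm, for comparison:
    def qemu_nbd_url_unix(sock, export_name):
        return "nbd:unix:{}:exportname={}".format(sock, export_name)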

This change depends on oVirt/ovirt-imageio#68:
we cannot generate URLs that imageio does not support, since imageio
consumes these URLs when downloading disks and backups.

We also need to check that qemu-nbd and qemu support the new syntax; we have never
tested it.

How can this work in a mixed environment?

Old hosts will generate the old non-standard NBD URL, and new hosts the standard NBD URL.
Uploads on a new host use an imageio that supports the new NBD URL. Uploads on old hosts
use an imageio that may not support the new standard NBD URL.

Plan

  • Require imageio version supporting standard NBD URL
  • Generate standard NBD URL

More info

Original bug: https://bugzilla.redhat.com/1849097

Merge over extends the base volume during the merge

Before the merge, the base volume is extended too much, using a dumb formula:

new_size = top_size + base_size + chunk_size

This is the worst possible case, when both top and base have no free space, and no cluster in top is already allocated in base. But in the common case, some clusters in top are already in base, so the
needed allocation is much smaller.
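
The current estimate as code (names are illustrative); a tighter estimate would have to measure how much of top is already allocated in base, for example with qemu-img measure:

    # Worst-case extension used today before the merge starts.
    def merge_extend_size(top_size, base_size, chunk_size):
        # Assumes no free space in either volume and that no cluster in top
        # is already allocated in base - almost always a large overestimate.
        return top_size + base_size + chunk_size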

After the merge, the base volume is shrunk to the optimal size, releasing the unneeded space to the system.

Because too much space was allocated, we can have these issues during the merge:

  • The storage domain can become full - VMs will pause when they don't have enough space to extend thin disks
  • Other storage operations can fail, because there is not enough free space
  • Preparing the merge can fail, because there is not enough space to extend the base volume

How to reproduce

  1. Create VM with 30g thin disk

  2. In the guest, write the first 10g

    # dd if=/dev/zero bs=8M count=1280 of=/dev/sdb oflag=direct
    
  3. Create a snapshot including the 30g disk. This creates a new image with actual size of 2.5g.

  4. In the guest, write again the first 10g

    # dd if=/dev/zero bs=8M count=1280 | tr "\0" "\1" > /dev/sdb; sync
    

    This extends the top image to 12.5g

  5. Shut down the vm

  6. Delete snapshot in engine UI

On the SPM, the base volume is extended to 27.5g:

2022-04-13 18:49:49,004+0300 INFO  (tasks/7) [storage.lvm] Extending LV bafd0f16-9aba-4f9f-ba90-46d3b8a29157/9106b4de-bb36-4d3f-86ea-76bedbab842f to 28160 megabytes (lvm:1616)

After the merge completes, the image is reduced to optimal size: 10243 MiB (10.002 GiB)

2022-04-13 18:50:20,767+0300 INFO  (tasks/6) [storage.lvm] Reducing LV bafd0f16-9aba-4f9f-ba90-46d3b8a29157/9106b4de-bb36-4d3f-86ea-76bedbab842f to 10243 megabytes (force=False) (lvm:1654)

This request is aligned to lvm extent size (128 MiB):

# lvs --devicesfile= -o vg_name,lv_name,tags,size | grep f1fd2ae0-3217-4ef7-b870-fe5469dd3fa2

bafd0f16-9aba-4f9f-ba90-46d3b8a29157 9106b4de-bb36-4d3f-86ea-76bedbab842f IU_f1fd2ae0-3217-4ef7-b870-fe5469dd3fa2,MD_7,PU_00000000-0000-0000-0000-000000000000 10.12g

Summary:

  1. The size required for merging was 10.12g
  2. The LV was resized to 27.5g during the merge.

Host cannot connect to storage domains

After upgrading from 4.4 to 4.5 the host cannot be activated because it cannot connect to the data domain.
I have a data domain on NFS (master) and one on GlusterFS. It complains about the Gluster domain:
The error message for connection node1-teste.acloud.pt:/data1 returned by VDSM was: XML error

# rpm -qa|grep glusterfs*
glusterfs-10.1-1.el8s.x86_64
glusterfs-selinux-2.0.1-1.el8s.noarch
glusterfs-client-xlators-10.1-1.el8s.x86_64
glusterfs-events-10.1-1.el8s.x86_64
libglusterfs0-10.1-1.el8s.x86_64
glusterfs-fuse-10.1-1.el8s.x86_64
glusterfs-server-10.1-1.el8s.x86_64
glusterfs-cli-10.1-1.el8s.x86_64
glusterfs-geo-replication-10.1-1.el8s.x86_64

engine log:

2022-04-27 13:35:16,118+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [e
be79c6] EVENT_ID: VDS_STORAGES_CONNECTION_FAILED(188), Failed to connect Host NODE1 to the Storage Domains DATA1.
2022-04-27 13:35:16,169+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [e
be79c6] EVENT_ID: STORAGE_DOMAIN_ERROR(996), The error message for connection node1-teste.acloud.pt:/data1 returned by VDSM was: XML error
2022-04-27 13:35:16,170+01 ERROR [org.ovirt.engine.core.bll.storage.connection.FileStorageHelper] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [ebe79c6
] The connection with details 'node1-teste.acloud.pt:/data1' failed because of error code '4106' and error message is: xml error

vdsm log:

2022-04-27 13:40:07,125+0100 ERROR (jsonrpc/4) [storage.storageServer] Could not connect to storage server (storageServer:92)
Traceback (most recent call last):
 File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 90, in connect_all
   con.connect()
 File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 233, in connect
   self.validate()
 File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 365, in validate
   if not self.volinfo:
 File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 352, in volinfo
   self._volinfo = self._get_gluster_volinfo()
 File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 405, in _get_gluster_volinfo
   self._volfileserver)
 File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
   return callMethod()
 File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
   **kwargs)
 File "<string>", line 2, in glusterVolumeInfo
 File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
   raise convert_to_error(kind, result)
vdsm.gluster.exception.GlusterXmlErrorException: XML error: rc=0 out=() err=[b'<cliOutput>\n  <opRet>0</opRet>\n  <opErrno>0</opErrno>\n  <opErrstr />\n  <volInfo>\n    <volumes>\
n      <volume>\n        <name>data1</name>\n        <id>d7eb2c38-2707-4774-9873-a7303d024669</id>\n        <status>1</status>\n        <statusStr>Started</statusStr>\n        <sn
apshotCount>0</snapshotCount>\n        <brickCount>2</brickCount>\n        <distCount>2</distCount>\n        <replicaCount>1</replicaCount>\n        <arbiterCount>0</arbiterCount>
\n        <disperseCount>0</disperseCount>\n        <redundancyCount>0</redundancyCount>\n        <type>0</type>\n        <typeStr>Distribute</typeStr>\n        <transport>0</tran
sport>\n        <bricks>\n          <brick uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/home/brick1<name>node1-teste.acloud.pt:/home/brick1</name><hostUuid>0
8c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b</hostUuid><isArbiter>0</isArbiter></brick>\n          <brick uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/brick2<name>nod
e1-teste.acloud.pt:/brick2</name><hostUuid>08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b</hostUuid><isArbiter>0</isArbiter></brick>\n        </bricks>\n        <optCount>23</optCount>\n   
    <options>\n          <option>\n            <name>nfs.disable</name>\n            <value>on</value>\n          </option>\n          <option>\n            <name>transport.addre
ss-family</name>\n            <value>inet</value>\n          </option>\n          <option>\n            <name>storage.fips-mode-rchecksum</name>\n            <value>on</value>\n  
       </option>\n          <option>\n            <name>storage.owner-uid</name>\n            <value>36</value>\n          </option>\n          <option>\n            <name>storag
e.owner-gid</name>\n            <value>36</value>\n          </option>\n          <option>\n            <name>cluster.min-free-disk</name>\n            <value>5%</value>\n        
 </option>\n          <option>\n            <name>performance.quick-read</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>perfor
mance.read-ahead</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>performance.io-cache</name>\n            <value>off</value>\n  
       </option>\n          <option>\n            <name>performance.low-prio-threads</name>\n            <value>32</value>\n          </option>\n          <option>\n            <
name>network.remote-dio</name>\n            <value>enable</value>\n          </option>\n          <option>\n            <name>cluster.eager-lock</name>\n            <value>enable<
/value>\n          </option>\n          <option>\n            <name>cluster.quorum-type</name>\n            <value>auto</value>\n          </option>\n          <option>\n         
  <name>cluster.server-quorum-type</name>\n            <value>server</value>\n          </option>\n          <option>\n            <name>cluster.data-self-heal-algorithm</name>\n
           <value>full</value>\n          </option>\n          <option>\n            <name>cluster.locking-scheme</name>\n            <value>granular</value>\n          </option>
\n          <option>\n            <name>cluster.shd-wait-qlength</name>\n            <value>10000</value>\n          </option>\n          <option>\n            <name>features.shar
d</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>user.cifs</name>\n            <value>off</value>\n          </option>\n       
  <option>\n            <name>cluster.choose-local</name>\n            <value>off</value>\n          </option>\n          <option>\n            <name>client.event-threads</name>\
n            <value>4</value>\n          </option>\n          <option>\n            <name>server.event-threads</name>\n            <value>4</value>\n          </option>\n         
<option>\n            <name>performance.client-io-threads</name>\n            <value>on</value>\n          </option>\n        </options>\n      </volume>\n      <count>1</count>\
n    </volumes>\n  </volInfo>\n</cliOutput>']
2022-04-27 13:40:07,125+0100 INFO  (jsonrpc/4) [storage.storagedomaincache] Invalidating storage domain cache (sdc:74)
2022-04-27 13:40:07,125+0100 INFO  (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': 'dede3145-651a-4b01-b8d2-82bff8670696', 'status': 4106}]} from=
::ffff:192.168.5.165,42132, flow_id=4c170005, task_id=cec6f36f-46a4-462c-9d0a-feb8d814b465 (api:54)
2022-04-27 13:40:07,410+0100 INFO  (jsonrpc/5) [api.host] START getAllVmStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:07,411+0100 INFO  (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.5.1
65,42132 (api:54)
2022-04-27 13:40:07,785+0100 INFO  (jsonrpc/7) [api.host] START getStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.5.165,42132, task_id=4fa4e8c4-7c65-499a-827e-8ae153aa875e (api:48)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.5.165,42132, task_id=4fa4e8c4-7c65-499a-827e-8ae153aa875e (api:54)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] START multipath_health() from=::ffff:192.168.5.165,42132, task_id=c6390f2a-845b-420b-a833-475605a24078 (api:48)
2022-04-27 13:40:07,797+0100 INFO  (jsonrpc/7) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.5.165,42132, task_id=c6390f2a-845b-420b-a833-475605a24078 (api:54)
2022-04-27 13:40:07,802+0100 INFO  (jsonrpc/7) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.5.165,42132 (
api:54)
2022-04-27 13:40:11,980+0100 INFO  (jsonrpc/6) [api.host] START getAllVmStats() from=::1,37040 (api:48)
2022-04-27 13:40:11,980+0100 INFO  (jsonrpc/6) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::1,37040 (api:54)
2022-04-27 13:40:12,365+0100 INFO  (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=f5084096-e5c5-4ca8-9c47-a92fa5790484 (api:48)
2022-04-27 13:40:12,365+0100 INFO  (periodic/2) [vdsm.api] FINISH repoStats return={} from=internal, task_id=f5084096-e5c5-4ca8-9c47-a92fa5790484 (api:54)
2022-04-27 13:40:22,417+0100 INFO  (jsonrpc/0) [api.host] START getAllVmStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:22,417+0100 INFO  (jsonrpc/0) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.5.1
65,42132 (api:54)
2022-04-27 13:40:22,805+0100 INFO  (jsonrpc/1) [api.host] START getStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.5.165,42132, task_id=a9fb939c-ea1a-4116-a22f-d14a99e6eada (api:48)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.5.165,42132, task_id=a9fb939c-ea1a-4116-a22f-d14a99e6eada (api:54)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] START multipath_health() from=::ffff:192.168.5.165,42132, task_id=5eee2f63-2631-446a-98dd-4947f9499f8f (api:48)
2022-04-27 13:40:22,816+0100 INFO  (jsonrpc/1) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.5.165,42132, task_id=5eee2f63-2631-446a-98dd-4947f9499f8f (api:54)
2022-04-27 13:40:22,822+0100 INFO  (jsonrpc/1) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.5.165,42132 (
api:54)

lvm_tests: tests should not assume system configuration

Description

LVM tests try to fake a config using lvm filter, but this cannot work for testing real lvm commands. The test must not assume anything about the system configuration.

This command is wrong:

    # Simulate removal of the second PV on another host, leaving stale PV in
    # the cache.
    commands.run([
        "vgreduce",
        "--config", tmp_storage.lvm_config(),
        vg_name,
        stale_pv_name,
    ])

The test (and the code) also must not change the system configuration; it should adapt to it and use --devices on this system.
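
A hedged sketch of the direction: pass the test devices explicitly with --devices instead of faking a filter (the tmp_storage.devices attribute is an assumption):

    # Run the command against the test devices only, without touching or
    # assuming anything about the system LVM configuration.
    commands.run([
        "vgreduce",
        "--devices", ",".join(tmp_storage.devices),
        vg_name,
        stale_pv_name,
    ])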

Reproduce

Run a single test that employs the stale_pv fixture as root on a CentOS stream 8 or other supported system, e.g.:

  • sudo ~/.venv/vdsm/bin/tox -e storage tests/storage/lvm_test.py -- -k test_pv_stale_reload_one_stale

Test will fail with:

E           vdsm.common.cmdutils.Error: Command ['vgreduce', '--config', 'global { use_lvmetad=0 } devices { write_cache_state=0 filter=["a|^/dev/loop10$|^/dev/loop11$|", "r|.*|"] }', '5f6c5d91-b734-4785-99a8-13aa137f61c0', '/dev/loop11'] failed with rc=5 out=b'' err=b'  Cannot use /dev/loop11: device is not in devices file\n  Volume group "5f6c5d91-b734-4785-99a8-13aa137f61c0" not found\n  Cannot process volume group 5f6c5d91-b734-4785-99a8-13aa137f61c0\n'

Actions to fix it

We can fix this in two steps:

  1. Check how we build the command in the real code, and replace the manual code
  2. Ideally, we should really run these tests in a new VM started for this purpose, to allow quick feedback

fsFreeze timeout not working

It seems like the fsFreeze timeout is not doing its job correctly.

Today I had a Windows VM that was snapshotted, but the snapshot did not complete because fsFreeze failed (the VM was having memory issues).
I had to power off the VM to make the snapshot fail:

2022-05-03 10:19:01,667+0200 WARN  (virt/5612d87b) [virt.vm] (vmId='808827db-1b8b-4563-b085-7b34fec4dde1') Unable to freeze guest filesystems: Guest agent is not responding: Guest agent disappeared while executing command (vm:4312)
2022-05-03 10:19:01,667+0200 ERROR (virt/5612d87b) [virt.vm] (vmId='808827db-1b8b-4563-b085-7b34fec4dde1') Non-memory snapshot timeout 480 passed after 27905.794062350877 seconds (snapshot:483)

The timeout was added in commit f6a1cf8

The freeze() call is blocking (https://github.com/oVirt/vdsm/blob/master/lib/vdsm/virt/jobs/snapshot.py#L476), so it never reaches the timeout part?
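
One possible direction is to wait for the blocking call in a worker thread with a bounded timeout; a rough sketch, not the current snapshot.py code:

    from concurrent.futures import ThreadPoolExecutor
    from concurrent.futures import TimeoutError as FutureTimeout

    def freeze_with_timeout(freeze, timeout):
        # Stop waiting for a blocked freeze() after `timeout` seconds. The
        # underlying libvirt call keeps running in the worker thread; we only
        # let the snapshot flow fail fast instead of hanging for hours.
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(freeze).result(timeout=timeout)
        except FutureTimeout:
            return None
        finally:
            pool.shutdown(wait=False)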

@liranr23

This caused some other issues as well, see https://bugzilla.redhat.com/show_bug.cgi?id=2081294

Run supervdsmd in the tests

Code that needs to run as root uses the supervdsmd daemon to run functions from vdsm code
as root.

Some code has a smart wrapper that calls the supervdsmd API when not running as root, and runs
the actual code when running as root. This allows testing the code by running the tests as
root, without a running supervdsm daemon.

This is bad, since code that passes the tests (running as root) may fail at runtime (running as vdsm). We must test code that uses supervdsm with a running supervdsm daemon.

Some tests, like tests/storage/nbd_test.py, show how to run the test with a supervdsmd daemon serving the current user:

To run this test you must run the tests as root, or have writable /run/vdsm and
running supervdsm serving the user running the tests.

To setup the environment for unprivileged user:

    $ sudo mkdir /run/vdsm

    $ sudo chown $USER:$USER /run/vdsm

    $ sudo env PYTHONPATH=lib static/libexec/vdsm/supervdsmd \
          --data-center /var/tmp/vdsm/data-center \
          --transient-disks /var/tmp/vdsm/transient-disks \
          --sockfile /run/vdsm/svdsm.sock \
          --user=$USER \
          --group=$USER \
          --logger-conf tests/conf/svdsm.logger.conf \
          --disable-gluster \
          --disable-network

This works, but experience shows that developers fail to run the nbd tests even with these instructions.

Fix:

  • Run supervdsmd serving the current user running the tests. This should be done as a preparation step for the tests, similar to "make storage".
  • Make sure tests that require root and use supervdsm are enabled when running as an unprivileged user with supervdsm running and serving the current user (see the sketch below).
  • Make it easy to set up the supervdsmd daemon locally, like "make storage".
  • Make it easy to restart supervdsmd - required when changing vdsm code used by supervdsmd.
  • Consider removing the wrapper that allows testing by running as root instead of via supervdsmd.
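
A minimal sketch of the enable/skip logic for such tests (the marker name and the check are illustrative):

    import os
    import pytest

    SOCKFILE = "/run/vdsm/svdsm.sock"

    # Skip unless running as root, or a supervdsm socket usable by the current
    # user exists.
    requires_supervdsm = pytest.mark.skipif(
        not (os.geteuid() == 0 or os.access(SOCKFILE, os.R_OK | os.W_OK)),
        reason="requires root or a supervdsmd serving the current user",
    )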
