
Comments (14)

dupondje commented on July 17, 2024

@nirs : I think this is your area, if you could have a look at the questions it would be awesome!

nirs commented on July 17, 2024

The 1.1 factor is bad because the actual value is less than 1%.

You may have leaked clusters or a huge number of dirty bitmaps that were not deleted by a backup application.

Please share the output of

qemu-img info /image
qemu-img check /image

Notes:

  • If the disk is on block storage you will have to activate the LV while you inspect the image.
  • If the VM is up, you need to stop it for inspection.

nirs commented on July 17, 2024
  • Is there a way to calculate/estimate how big a qcow2 image can grow?

Yes, ‘qemu-img measure’ can tell you the required size for an image, plus the required size for the bitmaps.

Vdsm uses it to measure the required size before copying disks or merging snapshots. ovirt-img uses it before uploading an image to allocate enough space for the disk.

The fully-allocated value tells the worst case, when all clusters are allocated.
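
If you want to script this, qemu-img measure can also emit JSON. Below is a minimal sketch (not vdsm's code) of computing the worst-case size from it; the JSON key names ("required", "fully-allocated", "bitmaps") are assumed from recent qemu-img versions, so verify them on yours.

import json
import subprocess

def worst_case_size(path):
    # Illustrative helper only: run qemu-img measure with JSON output and
    # add the fully-allocated size to the bitmaps size, as described above.
    out = subprocess.run(
        ["qemu-img", "measure", "--output=json", "-O", "qcow2", path],
        check=True, capture_output=True, text=True,
    ).stdout
    info = json.loads(out)
    # Worst case: every cluster allocated, plus space for persistent bitmaps.
    return info["fully-allocated"] + info.get("bitmaps", 0)

print(worst_case_size("qcow2.img"))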

dupondje commented on July 17, 2024

Thanks for the answer @nirs !

$ qemu-img info qcow2.img 
image: qcow2.img
file format: qcow2
virtual size: 15 GiB (16106127360 bytes)
disk size: 16.5 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    bitmaps:
        [0]:
            flags:
                [0]: in-use
                [1]: auto
            name: 428fae80-3892-4083-9107-51fb76a7f06b
            granularity: 65536
        [1]:
            flags:
                [0]: in-use
                [1]: auto
            name: 51ccd1fc-08a4-485d-8c04-0eb750665e05
            granularity: 65536
        [2]:
            flags:
                [0]: in-use
                [1]: auto
            name: 19796bed-56a5-44c1-a7f2-dae633e65c87
            granularity: 65536
        [3]:
            flags:
                [0]: in-use
                [1]: auto
            name: 13056186-e65e-448e-a3c3-019ab25d3a27
            granularity: 65536
    refcount bits: 16
    corrupt: false
    extended l2: false
$ qemu-img check qcow2.img 
No errors were found on the image.
199703/245760 = 81.26% allocated, 16.27% fragmented, 0.00% compressed clusters
Image end offset: 17716150272
qemu-img measure -O qcow2 qcow2.img 
required size: 13090422784
fully allocated size: 16108814336
bitmaps size: 589824

But it still wanted to grow, causing the VM to end up in a paused state.
On top of that, after I migrated the VM to another storage domain, the storage usage was normal again.

PS: the qcow2.img is just a dd'ed dump of the VM LV.

nirs commented on July 17, 2024

$ qemu-img info qcow2.img
virtual size: 15 GiB (16106127360 bytes)

You mentioned a 70G disk, but this disk is only 15G?

bitmaps:

Only 4 bitmaps, so this is not the issue.

$ qemu-img check qcow2.img
No errors were found on the image.
199703/245760 = 81.26% allocated, 16.27% fragmented, 0.00% compressed clusters
Image end offset: 17716150272

No leaked clusters, so this is not the issue.

But - the image end offset does not make sense:

>>> 17716150272/1024**3
16.49945068359375

This should not be possible for a 16G image.

qemu-img measure -O qcow2 qcow2.img 
required size: 13090422784
fully allocated size: 16108814336
bitmaps size: 589824

Maximum possible size for this disk is 16108814336 + 589824,
which is 16109404160 bytes (15.0030517578125 GiB).

(If you add more bitmaps you will need more space).

But still it did want to grow ... And causes the VM to end up in a pause state. Next to that, if I migrated the VM to another storage domain, the storage usage was normal again.

It sounds like the qcow2 image is corrupted in some way, and migrating to
another storage domain fixed the issue by copying the data into a fresh new
qcow2 image.

PS: the qcow2.img is just a dd'ed dump of the VM LV.

This should be fine, but if the VM is running while you dd, you may get an
inconsistent qcow2 image.

To complete the picture, please share the volume metadata. The easiest way is to
dump the storage domain and paste the JSON for the relevant volume here.

The command is something like (I haven't worked on oVirt for a while):

vdsm-client StorageDomain dump sd_id=storage-domain-id

And grep the relevant volume id in the output.

Regardless of the volume metadata, I think we have a corrupted qcow2 image
that a qemu developer would like to inspect. Please make the image available
somewhere if possible, and file a qemu bug about this.

As a workaround for such images, migrating to another storage domain seems
like the best way to fix them.

dupondje commented on July 17, 2024

You mentioned a 70G disk, but this disk is only 15G?

Correct, I had it on a 70GB disk also, but I only made the copy once it happened on a 15GB disk.
Same issue, just a different disk size.

But - the image end offset does not make sense:

>>> 17716150272/1024**3
16.49945068359375

This should not be possible for a 16G image.

I think qemu-img info just reports the file size as the image end offset? And as this is a dd copy of the LV, it has the size of the LV (aka 15GB*1.1).
qemu-img info reports 0 for disk size when the disk is on an LV/blockdev.

It sounds like the qcow2 image is corrupted in some way, and migrating to another storage domain fixed the issue by copying the data into a fresh new qcow2 image.

It happened on multiple VMs already, which is quite strange then.

This should be fine, but if the VM is running while you dd, you may get an inconsistent qcow2 image.

The VM was in a paused state, so it should be safe :)

To complete the picture, please share the volume metadata. The easiest way is to dump the storage domain and paste the JSON for the relevant volume here.

The command is something like (I haven't worked on oVirt for a while):

vdsm-client StorageDomain dump sd_id=storage-domain-id

And grep the relevant volume id in the output.

You still know the commands by heart :) The volume was resized to 20GB after the issue, so I doubt it's still relevant.

        "91a454a2-6139-4794-8d70-b18403323ebf": {
            "apparentsize": 14629732352,
            "capacity": 21474836480,
            "ctime": 1674805919,
            "description": "{\"DiskAlias\":\"\",\"DiskDescription\":\"\"}",
            "disktype": "DATA",
            "format": "COW",
            "generation": 1,
            "image": "6ccc3ee1-02e5-4fad-b1b7-9f2d6c187416",
            "legality": "LEGAL",
            "mdslot": 33,
            "parent": "00000000-0000-0000-0000-000000000000",
            "sequence": 0,
            "status": "OK",
            "truesize": 14629732352,
            "type": "SPARSE",
            "voltype": "LEAF"
        },

Regardless of the volume metadata, I think we have a corrupted qcow2 image that a qemu developer would like to inspect. Please make the image available somewhere if possible, and file a qemu bug about this.

It might contain sensitive data, so it's hard to share it somewhere online :)

As a workaround for such images, migrating to another storage domain seems like the best way to fix them.

The thing is, the image was migrated to another storage domain, everything went fine for 2 months, and then it occurred again.
Which makes me think it's not corruption but something else.

nirs commented on July 17, 2024

You mentioned a 70G disk, but this disk is only 15G?

Correct, I had it on a 70GB disk also, but I only made the copy once it happened on a 15GB disk. Same issue, just a different disk size.

So it happened on at least 2 disks and multiple VMs, based on the next comment.
This may be good, since your system is more likely to reproduce the issue.

But - the image end offset does not make sense:

>>> 17716150272/1024**3
16.49945068359375

This should not be possible for a 16G image.

I think qemu-img info just reports the file size as the image end offset?

"image end offset" is reported by qemu-img check. This is the highest
offset used by the image. If you truncate the file to this size, or
reduce a logical volume to this size, you will not corrupt the image.

And as this is a dd copy of the LV, it has the size of the LV (aka 15GB*1.1).

Right, your image may have some unused space at the end of the LV. But the image end
offset is computed from the qcow2 metadata.

It sounds like the qcow2 image is corrupted in some way, and migrating to another storage domain fixed the issue by copying the data into a fresh new qcow2 image.

It happened on multiple VMs already, which is quite strange then.

To complete the picture, please share the volume metadata. The easiest way is to dump the storage domain and paste the JSON for the relevant volume here.
The command is something like (I haven't worked on oVirt for a while):

vdsm-client StorageDomain dump sd_id=storage-domain-id

And grep the relevant volume id in the output.

You still know the commands by heart :) The volume was resized to 20GB after the issue, so I doubt it's still relevant.

You edited the disk size in the engine, adding 5G?

        "91a454a2-6139-4794-8d70-b18403323ebf": {
            "apparentsize": 14629732352,
            "capacity": 21474836480,
            ...

Is this the same 15G disk that was copied to another storage domain?

Regardless of the volume metadata, I think we have a corrupted qcow2 image that a qemu developer would like to inspect. Please make the image available somewhere if possible, and file a qemu bug about this.

It might contain sensitive data, so it's hard to share it somewhere online :)

Maybe qemu folks have a tool for anonymizing the data in the image?

It is possible to do this using ovirt-imageio nbd client:

  1. open the image using ovirt_imageio._internal.qemu_nbd.open()
  2. iterate over image extents
  3. write some pattern into all data extents

This will destroy the data in the image, but it will keep the qcow2 metadata as is.
The image will compress very well since it is full of the same pattern.

You can check the tests for example usage of the module:
https://github.com/oVirt/ovirt-imageio/blob/master/test/qemu_nbd_test.py
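
A very rough, untested sketch of that idea follows. The qemu_nbd/nbd client API used below (open(), export_size, extents(), write(), flush(), and the extent's .length/.zero attributes) is assumed from memory; take the real signatures from the linked tests rather than from this snippet.

from ovirt_imageio._internal import qemu_nbd

PATTERN = b"\xaa" * 64 * 1024  # 64 KiB of a fixed pattern
STEP = 128 * 1024 * 1024       # query extents in 128 MiB chunks

with qemu_nbd.open("qcow2.img", "qcow2") as client:
    size = client.export_size
    offset = 0
    while offset < size:
        length = min(size - offset, STEP)
        for extent in client.extents(offset, length)["base:allocation"]:
            if not extent.zero:
                # Data extent: overwrite the guest-visible data with the
                # pattern, leaving the qcow2 mapping metadata untouched.
                done = 0
                while done < extent.length:
                    n = min(extent.length - done, len(PATTERN))
                    client.write(offset + done, PATTERN[:n])
                    done += n
            offset += extent.length
    client.flush()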

As a workaround for such images, migrating to another storage domain seems like the best way to fix them.

The thing is, the image was migrated to another storage domain, everything went fine for 2 months, and then it occurred again. Which makes me think it's not corruption but something else.

I see, we never had such a report. This may be a bug in vdsm, or a bug
in qemu, since:

  1. image has impossible image end offset considering the virtual size
  2. image fixed by copying to another image using qemu-img convert
  3. fixed image corrupted again after some time
  4. the VM pauses trying to write data at offset way above the virtual size

It may help to collect vdsm logs covering a few minutes before the VM was paused. They may
show invalid values reported by qemu/libvirt.

Also please add here the output of

rpm -qa | egrep 'qemu|libvirt|vdsm'  

nirs commented on July 17, 2024

@aesteve-rh want to take a look at this?

dupondje commented on July 17, 2024

So it happened on at least 2 disks and multiple VMs, based on the next comment. This may be good, since your system is more likely to reproduce the issue.

It has occurred 3 times now, I think.

"image end offset" is reported by qemu-img check. This is the highest offset used by the image. If you truncate the file to this size, or reduce a logical volume to this size, you will not corrupt the image.

But the end is still within the size of the image:
Image end offset: 17716150272

$ stat qcow2.img
File: qcow2.img
Size: 17716740096

If that matters :)

You edited the disk size in the engine, adding 5G?

Yes, it was extended via the engine by 5GB after it was migrated to another storage domain.

        "91a454a2-6139-4794-8d70-b18403323ebf": {
            "apparentsize": 14629732352,
            "capacity": 21474836480,
            ...

Is this the same 15G disk that was copied to another storage domain?

Yes, after storage migration the used size drops a lot!

You can check the tests for example usage of the module: https://github.com/oVirt/ovirt-imageio/blob/master/test/qemu_nbd_test.py

I'll check this out

I see, we never had such a report. This may be a bug in vdsm, or a bug in qemu, since:

1. image has impossible image end offset considering the virtual size

2. image fixed by copying to another image using qemu-img convert

3. fixed image corrupted again after some time

4. the VM pauses trying to write data at offset way above the virtual size

We have had oVirt running for some years now, and this only occurred recently (though on a somewhat different workload).

It may help to collect vdsm logs covering a few minutes before the VM was paused. They may show invalid values reported by qemu/libvirt.

The logs rotated already, unfortunately. I'll surely check this the next time it occurs!

Also please add here the output of

rpm -qa | egrep 'qemu|libvirt|vdsm'  
# rpm -qa | egrep 'qemu|libvirt|vdsm'  
libvirt-daemon-driver-storage-mpath-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-core-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-client-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-config-nwfilter-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-docs-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-guest-agent-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
vdsm-jsonrpc-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-storage-iscsi-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-block-iscsi-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-lock-sanlock-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-nwfilter-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-secret-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-python-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-nodedev-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-iscsi-direct-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-scsi-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-common-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-rbd-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-daemon-driver-network-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-qemu-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-ui-spice-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-gluster-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-img-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
vdsm-api-4.50.3.4-1.el8.noarch
libvirt-daemon-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-yajsonrpc-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-interface-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-gluster-4.50.3.4-1.el8.x86_64
vdsm-client-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-storage-core-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-logical-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-ui-opengl-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-ssh-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-daemon-config-network-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-gluster-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
vdsm-http-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-storage-disk-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-block-curl-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-libs-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-common-4.50.3.4-1.el8.noarch
libvirt-daemon-kvm-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-network-4.50.3.4-1.el8.x86_64
libvirt-daemon-driver-storage-rbd-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
ipxe-roms-qemu-20181214-11.git133f4c47.el8.noarch
qemu-kvm-hw-usbredir-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
python3-libvirt-8.0.0-2.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-4.50.3.4-1.el8.x86_64

nirs commented on July 17, 2024

"image end offset" is reported by qemu-img check. This is the highest offset used by the image. If you truncate the file to this size, or reduce a logical volume to this size, you will not corrupt the image.

But the end is still within the size of the image: Image end offset: 17716150272

$ stat qcow2.img File: qcow2.img Size: 17716740096

If that matters :)

The fully allocated size + bitmaps is 15.000... GiB. The image end
offset cannot be more than this value.

The file size is just the size of the LV, which can be up to 3 GiB
bigger than the image end offset (Vdsm extends volumes by 2.5 GiB when
they have less than 0.5 GiB free).
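
For the numbers quoted above (file size from stat, end offset from qemu-img check), that slack is in fact tiny, in the same interactive-Python style used earlier:

>>> 17716740096 - 17716150272
589824
>>> 589824 / 1024
576.0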

Also please add here the output of

rpm -qa | egrep 'qemu|libvirt|vdsm'  
# rpm -qa | egrep 'qemu|libvirt|vdsm'  
qemu-kvm-core-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
...

So this may be an issue in qemu 6.2.

XanClic commented on July 17, 2024

I think we’ve had a similar issue before, and the first thing to note is that qemu’s qcow2 driver gives no guarantee on the length of the qcow2 files it produces/touches.

That being said, naturally, it will try to avoid producing holes/gaps. To help allocating clusters quickly, it internally keeps a free_cluster_index, which is reset when freeing clusters (to the minimum of it and the freed cluster), and incremented when allocating clusters. So far, so good. There are situations where this algorithm absolutely can produce holes, though; the one I know of is when we need to do multi-cluster allocations (e.g. L1 table, refcount table), because then we’ll jump over gaps that are only a single cluster wide, and those will then remain there (free_cluster_index is incremented beyond them).

Note that this specific “peculiarity” shouldn’t be a concern here, though, because the image is small enough that L1 table and refcount table will fit into a single cluster each. Still, it’s entirely possible that there are unknown bugs around free_cluster_index, where it should have been decreased at some point, but wasn’t, so the image keeps growing.

So even if the qcow2 driver gives no guarantees, it is still unexpected that a file length would exceed the required disk space by so much. Especially with such a small image, where the L1 table and refcount table comfortably fit within a single cluster, there shouldn’t be any multi-cluster allocations; and I expect the free_cluster_index algorithm to be fairly reliable when there are only single-cluster allocations.

(There are cases where it’s expected, e.g. when the guest discards a large number of clusters, because then the file length is not truncated. But that doesn’t apply here, as far as I can see, because we’re not comparing the file length with the disk space actually used as-is, but with the disk space used by the qcow2 image if it were fully allocated.)
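
To make the hole-skipping behavior described above concrete, here is a small, simplified model of a forward-only cluster allocator. This is only an illustration of the idea, not qemu's actual code: a multi-cluster request skips over a single-cluster gap, and because the index only moves forward on allocation, that gap stays behind as a hole.

allocated = set()
free_cluster_index = 0

def alloc(n):
    """Allocate n contiguous clusters, scanning forward from the index."""
    global free_cluster_index
    i = free_cluster_index
    while any((i + k) in allocated for k in range(n)):
        i += 1
    allocated.update(range(i, i + n))
    free_cluster_index = i + n  # gaps that were jumped over stay behind
    return i

def free(start, n):
    """Free n clusters and reset the index to the lowest freed cluster."""
    global free_cluster_index
    for c in range(start, start + n):
        allocated.discard(c)
    free_cluster_index = min(free_cluster_index, start)

alloc(1); alloc(1); alloc(1)  # clusters 0, 1, 2 in use
free(1, 1)                    # cluster 1 becomes a single free cluster
print(alloc(2))               # 3 - the two-cluster request skips the gap at 1
print(alloc(1))               # 5 - the index is already past 1, so the hole remains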


Two things come to mind when seeing a qcow2 file that uses more space than it should: internal snapshots and VM state. Can we rule out that either was used with this image at some point?

(To be honest, off the top of my head, I can’t imagine a situation where either would cause the described situation, but it’s still nice if we can rule them out so we don’t have to think about them.)


It’s a bit unfortunate that we don’t get accurate disk space information from qemu-img info, because the data was just dd-ed off of the LVM volume, so it’s the same as its file length. What is interesting is the result of qemu-img check, which tells us that 199703 clusters (12.2 GB) are allocated, so we’re far from 17 GB on that front. Even if the guest has just discarded clusters recently and the image was indeed fully allocated at some point, discarding clusters should decrease free_cluster_index to point into those holes and thus prevent the whole issue.

What might be interesting (while not presenting sensitive data) is a qemu-img map --output=json dump. Can you provide that, so we can see which clusters are actually allocated where? And perhaps the first 64 kB of the image file, i.e. the image header? Just to be sure, I’d like to see that the L1 table and refcount table both actually are just a single cluster in size (and actually placed near the image’s beginning).
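
For reference, the map can be produced with qemu-img map --output=json qcow2.img. A short sketch of how one might scan that output for gaps in host (file) offsets follows; the key names (start, length, offset, data) are the ones qemu-img prints, but double-check them against your qemu version.

import json

# Parse the saved output of: qemu-img map --output=json qcow2.img > qcow2.map.txt
with open("qcow2.map.txt") as f:
    entries = json.load(f)

# Allocated data entries carry a host "offset" into the image file.
host_ranges = sorted(
    (e["offset"], e["offset"] + e["length"])
    for e in entries
    if e.get("data") and "offset" in e
)

# Report holes between consecutive allocated ranges in the image file.
prev_end = None
for start, end in host_ranges:
    if prev_end is not None and start > prev_end:
        print(f"hole of {start - prev_end} bytes at host offset {prev_end}")
    prev_end = end if prev_end is None else max(prev_end, end)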


Besides this being caused by something around free_cluster_index, the other thing that comes to mind is qemu commit a8c07ec287554dcefd33733f0e5888a281ddc95e, which fixed qemu-img check -r all on LVM volumes (BZ https://bugzilla.redhat.com/show_bug.cgi?id=1519071). That’s a case where we used to intentionally allocate clusters beyond the image’s end, but that would only happen during qemu-img check, never during normal operation, so it can’t be the cause of this issue here. But I still want to note it, because it was fixed in 7.0, while the bug was still present in 6.2.

EDIT: I noticed just now that this last-mentioned bug was fixed in 8.7 (that is what the BZ is about…), so this 6.2 build here is in fact already fixed. :)

dupondje commented on July 17, 2024

qcow2.map.txt

As requested, the output of qemu-img map.

Also the mentioned bug IS fixed in 6.2.0-13:

  • kvm-qcow2-Improve-refcount-structure-rebuilding.patch [bz#1519071]

So that should not be the issue either.

XanClic commented on July 17, 2024

Unfortunately, the mapping doesn’t give me much of a hint; the unallocated areas (in host cluster space, i.e. in the image file) are sprinkled throughout the image, and they’re often several clusters long, so it doesn’t seem like single clusters that have been skipped at some point when looking for a longer cluster range.

I’ve scoured the code for something, but didn’t find anything yet. I don’t think I have much of a choice but to just test with guests myself and see whether I can observe the behavior.

Just to be clear (I had this point in my last post until my browser crashed and deleted the whole thing, and then I forgot it when rewriting): I absolutely understand and agree that it’s unreasonable to allow qcow2 images unlimited growth in their file length. I don’t think we can guarantee staying within qemu-img measure’s reported size[1], but with a bit of leeway, we absolutely should stay within bounds. 10 % is definitely a generous bound (for plain images without VM state or internal snapshots). So this does need investigation on the qemu side.

[1] With actively used images, there are things to consider like having to reallocate L1/refcount tables at runtime. The old one is freed afterwards, but for a brief period you still need to have both allocated, and even if you plan to write no more than a single cluster afterwards, you’ll likely go over the limit qemu-img measure reported, because it doesn’t consider the fact that the new table can’t use the old table’s space. If qemu-img measure is supposed to work for the file length of actively used images as well, I think it’ll need much more scrutiny than was applied when it was introduced (both its formula, and qemu’s qcow2 code).

dupondje commented on July 17, 2024

As this doesn't seem to be a vdsm bug, but rather a qemu bug, I'm closing it here.
I opened a new report at https://gitlab.com/qemu-project/qemu/-/issues/1621

I added some gdb traces there, which might be useful for finding the root cause, @XanClic.
