Comments (14)
@nirs : I think this is your area, if you could have a look at the questions it would be awesome!
from vdsm.
The 1.1 factor is bad because the actual value is less than 1%.
You may have leaked clusters, or a huge number of dirty bitmaps that were not deleted by a backup application.
Please share the output of
qemu-img info /image
qemu-img check /image
Notes:
- If the disk is on block storage you will have to activate the LV while you inspect the image.
- If the VM is up, you need to stop it for inspection.
- Is there a way to calculate/estimate how big a qcow2 image can grow?
Yes, qemu-img measure can tell you the required size for an image, plus the required size for the bitmaps.
Vdsm uses it to measure the required size before copying disks or merging snapshots. ovirt-img uses it before uploading an image to allocate enough space for the disk.
The fully-allocated value tells the worst case, when all clusters are allocated.
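For intuition, the fully allocated figure can be approximated from the qcow2 geometry alone. The sketch below is a simplification (it ignores bitmaps, internal snapshots, and the self-reference of refcount blocks); qemu-img measure remains the authoritative tool:

```python
# Rough worst-case (fully allocated) size of a qcow2 image: virtual size
# plus metadata (header, L1/L2 tables, refcount table and blocks).
# Simplified sketch; use `qemu-img measure` for real numbers.
import math

def qcow2_fully_allocated(virtual_size, cluster_size=65536, refcount_bits=16):
    clusters = math.ceil(virtual_size / cluster_size)

    # Each L2 table cluster maps cluster_size/8 guest clusters.
    l2_entries_per_cluster = cluster_size // 8
    l2_clusters = math.ceil(clusters / l2_entries_per_cluster)

    # One 8-byte L1 entry per L2 table, rounded up to whole clusters.
    l1_clusters = max(1, math.ceil(l2_clusters * 8 / cluster_size))

    # Refcount blocks must cover data plus metadata clusters.
    refcounts_per_cluster = cluster_size * 8 // refcount_bits
    total = clusters + l2_clusters + l1_clusters + 2  # + header + reftable
    refblock_clusters = math.ceil(total / refcounts_per_cluster)

    # header + refcount table + L1 + L2 + refcount blocks + data
    overhead = 2 + l1_clusters + l2_clusters + refblock_clusters
    return (clusters + overhead) * cluster_size

print(qcow2_fully_allocated(15 * 1024**3))  # 16108814336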
Thanks for the answer @nirs !
$ qemu-img info qcow2.img
image: qcow2.img
file format: qcow2
virtual size: 15 GiB (16106127360 bytes)
disk size: 16.5 GiB
cluster_size: 65536
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
bitmaps:
[0]:
flags:
[0]: in-use
[1]: auto
name: 428fae80-3892-4083-9107-51fb76a7f06b
granularity: 65536
[1]:
flags:
[0]: in-use
[1]: auto
name: 51ccd1fc-08a4-485d-8c04-0eb750665e05
granularity: 65536
[2]:
flags:
[0]: in-use
[1]: auto
name: 19796bed-56a5-44c1-a7f2-dae633e65c87
granularity: 65536
[3]:
flags:
[0]: in-use
[1]: auto
name: 13056186-e65e-448e-a3c3-019ab25d3a27
granularity: 65536
refcount bits: 16
corrupt: false
extended l2: false
$ qemu-img check qcow2.img
No errors were found on the image.
199703/245760 = 81.26% allocated, 16.27% fragmented, 0.00% compressed clusters
Image end offset: 17716150272
qemu-img measure -O qcow2 qcow2.img
required size: 13090422784
fully allocated size: 16108814336
bitmaps size: 589824
But still it wanted to grow ... and caused the VM to end up in a paused state.
Next to that, if I migrated the VM to another storage domain, the storage usage was normal again.
PS: the qcow2.img is just a dd'ed dump of the VM LV.
$ qemu-img info qcow2.img
virtual size: 15 GiB (16106127360 bytes)
You mentioned 70g disk, but this disk is only 15?
bitmaps:
Only 4 bitmaps so this is not the issue.
$ qemu-img check qcow2.img
No errors were found on the image.
199703/245760 = 81.26% allocated, 16.27% fragmented, 0.00% compressed clusters
Image end offset: 17716150272
No leaked clusters, so this is not the issue.
But - the image end offset does not make sense:
>>> 17716150272/1024**3
16.49945068359375
This should not be possible for 16g image.
qemu-img measure -O qcow2 qcow2.img
required size: 13090422784
fully allocated size: 16108814336
bitmaps size: 589824
Maximum possible size for this disk is 16108814336 + 589824,
which is 16109404160 (15.0030517578125)
(If you add more bitmaps you will need more space).
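The arithmetic above can be turned into a small consistency check; the numbers below are taken from the outputs pasted earlier in this thread:

```python
# Sanity check: the image end offset reported by `qemu-img check` can
# never exceed the fully allocated size plus the bitmaps size reported
# by `qemu-img measure`. Numbers taken from this thread.
fully_allocated = 16108814336
bitmaps = 589824
end_offset = 17716150272

max_possible = fully_allocated + bitmaps  # 16109404160, ~15.003 GiB
print(max_possible, max_possible / 1024**3)

if end_offset > max_possible:
    print("inconsistent: end offset exceeds the worst case by",
          end_offset - max_possible, "bytes")
```

For this image the check fires: the end offset is about 1.5 GiB beyond the theoretical maximum.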
But still it wanted to grow ... and caused the VM to end up in a paused state. Next to that, if I migrated the VM to another storage domain, the storage usage was normal again.
It sounds like the qcow2 image is corrupted in some way, and copying
the data to another storage domain fixed the issue by copying the data into
a fresh new qcow2 image.
PS: the qcow2.img is just a dd'ed dump of the VM LV.
This should be fine, but if the VM is running while you dd, you may get an inconsistent
qcow2 image.
To complete the picture, please share the volume metadata. The easiest way is to
dump the storage domain and paste the JSON for the relevant volume here.
The command is something like (I haven't worked on oVirt for a while):
vdsm-client StorageDomain dump sd_id=storage-domain-id
And grep the relevant volume id in the output.
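Instead of grep, the dump can also be filtered programmatically. This is a sketch; the schema (a "volumes" mapping keyed by volume UUID) is an assumption based on the metadata snippet pasted later in this thread:

```python
# Sketch: pick one volume's metadata out of a
# `vdsm-client StorageDomain dump` JSON document.
# The "volumes" key layout is assumed, not verified against vdsm docs.
import json

def find_volume(dump_json, vol_id):
    dump = json.loads(dump_json)
    return dump.get("volumes", {}).get(vol_id)

sample = json.dumps({
    "volumes": {
        "91a454a2-6139-4794-8d70-b18403323ebf": {"format": "COW"},
    }
})
print(find_volume(sample, "91a454a2-6139-4794-8d70-b18403323ebf"))
```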
Regardless of the volume metadata, I think we have a corrupted qcow2 image
that a qemu developer would like to inspect. Please make the image available
somewhere if possible, and file a qemu bug about this.
As a workaround to fix such images, migrating to another storage domain
seems like the best way.
You mentioned 70g disk, but this disk is only 15?
Correct, I had it on a 70GB disk also. But I only copied it once I had it on a 15GB disk.
Same issue, just another disk (size).
But - the image end offset does not make sense:
>>> 17716150272/1024**3
16.49945068359375
This should not be possible for 16g image.
I think qemu-img info just reports the file size as the image end offset? And as this is a dd
copy of the LV, it has the size of the LV (aka 15GB*1.1).
qemu-img info reports 0 for disk size when the disk is on an LV/blockdev.
It sounds like the qcow2 image is corrupted in some way, and copying the data to another storage domain fixed the issue by copying the data into a fresh new qcow2 image.
Happened on multiple VMs already, which is quite strange then.
This should be fine, but if the VM is running while you dd, you may get an inconsistent qcow2 image.
The VM was in a paused state, so it should be safe :)
To complete the picture, please share the volume metadata. The easier way is to dump the storage domain and paste here the json for the relevant volume.
The command is something like (I haven't worked on oVirt for a while):
vdsm-client StorageDomain dump sd_id=storage-domain-id
And grep the relevant volume id in the output.
You still know the commands by heart :) The volume was resized to 20GB after the issue, so I doubt it's still relevant.
"91a454a2-6139-4794-8d70-b18403323ebf": {
"apparentsize": 14629732352,
"capacity": 21474836480,
"ctime": 1674805919,
"description": "{\"DiskAlias\":\"\",\"DiskDescription\":\"\"}",
"disktype": "DATA",
"format": "COW",
"generation": 1,
"image": "6ccc3ee1-02e5-4fad-b1b7-9f2d6c187416",
"legality": "LEGAL",
"mdslot": 33,
"parent": "00000000-0000-0000-0000-000000000000",
"sequence": 0,
"status": "OK",
"truesize": 14629732352,
"type": "SPARSE",
"voltype": "LEAF"
},
Regardless of the volume metadata, I think we have a corrupted qcow2 image, that qemu developer would like to inspect. Please make the image available if possible somewhere, and file a qemu bug about this.
It might contain sensitive data, so it's hard to share it somewhere online :)
As a workaround to fix such images, migrating to another storage domain seems like the best way.
Thing is, the image was migrated to another storage domain, everything went fine for 2 months, and then it occurred again.
Which makes me think it's not corruption but something else.
You mentioned 70g disk, but this disk is only 15?
Correct, I had it on a 70GB disk also. But I only copied it once I had it on a 15GB disk. Same issue, just another disk (size).
So it happened on at least 2 disks, and on multiple VMs based on the next comment.
This may be good, since your system is more likely to reproduce the issue.
But - the image end offset does not make sense:
>>> 17716150272/1024**3 16.49945068359375
This should not be possible for 16g image.
I think qemu-img info just reports the file size as the image end offset?
"image end offset" is reported by qemu-img check. This is the highest
offset used by the image. If you truncate the file to this size, or
reduce a logical volume to this size, you will not corrupt the image.
And as this is a dd copy of the LV, it has the size of the LV (aka 15GB*1.1).
Right, your image may have some unused space at the end of the LV. But image end
offset is computed based on qcow2 metadata.
It sounds like the qcow2 image is corrupted in some way, and copying the data to another storage domain fixed this issue by coping the data into a new fresh qcow2 image.
Happened on multiple VMs already, which is quite strange then.
To complete the picture, please share the volume metadata. The easier way is to dump the storage domain and paste here the json for the relevant volume.
The command is something like (I haven't worked on oVirt for a while):
vdsm-client StorageDomain dump sd_id=storage-domain-id
And grep the relevant volume id in the output.
You still know the commands by heart :) The volume was resized to 20GB after the issue, so I doubt it's still relevant.
You edited the disk size on engine, adding 5g?
"91a454a2-6139-4794-8d70-b18403323ebf": { "apparentsize": 14629732352, "capacity": 21474836480, ...
Is this the same 15g that was copied to another storage?
Regardless of the volume metadata, I think we have a corrupted qcow2 image that a qemu developer would like to inspect. Please make the image available somewhere if possible, and file a qemu bug about this.
It might contain sensitive data, so it's hard to share it somewhere online :)
Maybe qemu folks have a tool for anonymizing the data in the image?
It is possible to do this using ovirt-imageio nbd client:
- open the image using ovirt_imageio._internal.qemu_nbd.open()
- iterate over image extents
- write some pattern into all data extents
This will destroy the image, but it will keep the qcow2 metadata as is.
The image will compress very well since it is full of the same pattern.
You can check the tests for example usage of the module:
https://github.com/oVirt/ovirt-imageio/blob/master/test/qemu_nbd_test.py
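The three steps above can be sketched generically. The real version would get the extent list from the NBD client in ovirt_imageio._internal.qemu_nbd; here the extents are passed in directly so the idea is self-contained (a hypothetical simplification, not the actual ovirt-imageio API):

```python
# Generic sketch of the anonymization idea: overwrite every data extent
# with a fixed pattern, leaving holes/unallocated ranges untouched.
# Extents are (offset, length, is_data) tuples; in the real tool they
# would come from the NBD client's extent iteration.
import os
import tempfile

def wipe_data_extents(path, extents, pattern=b"\xAA"):
    with open(path, "r+b") as f:
        for offset, length, is_data in extents:
            if is_data:
                f.seek(offset)
                f.write(pattern * length)

# Demonstration on a throwaway file: first 16 bytes are "data",
# the next 16 bytes represent an untouched (non-data) extent.
fd, path = tempfile.mkstemp()
os.write(fd, b"secret data 1234" + b"\x00" * 16)
os.close(fd)
wipe_data_extents(path, [(0, 16, True), (16, 16, False)])
with open(path, "rb") as f:
    wiped = f.read(32)
os.remove(path)
print(wiped[:16])  # pattern bytes, original data gone
```

Because the whole image ends up full of the same pattern, it also compresses very well for uploading, as noted above.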
As a workaround to fix such images, migrating to another storage domain seems like the best way.
Thing is, the image was migrated to another storage domain, everything went fine for 2 months, and then it occurred again. Which makes me think it's not corruption but something else.
I see, we never had such a report. This may be a bug in vdsm. This may be a bug
in qemu since:
- the image has an impossible image end offset considering the virtual size
- the image was fixed by copying to another image using qemu-img convert
- the fixed image was corrupted again after some time
- the VM pauses trying to write data at an offset way above the virtual size
It may help to collect vdsm logs showing a few minutes before the VM was paused. They may
show invalid values reported by qemu/libvirt.
Also please add here output of
rpm -qa | egrep 'qemu|libvirt|vdsm'
@aesteve-rh want to take a look at this?
So it happened on at least 2 disks, and on multiple VMs based on the next comment. This may be good, since your system is more likely to reproduce the issue.
Occurred 3 times now, I think.
"image end offset" is reported by qemu-img check. This is the highest offset used by the image. If you truncate the file to this size, or reduce a logical volume to this size, you will not corrupt the image.
But the end is still within the size of the image:
Image end offset: 17716150272
$ stat qcow2.img
File: qcow2.img
Size: 17716740096
If that matters :)
You edited the disk size on engine, adding 5g?
Yes, it was extended via the engine with 5GB after it was migrated to another storage domain.
"91a454a2-6139-4794-8d70-b18403323ebf": { "apparentsize": 14629732352, "capacity": 21474836480, ...
Is this the same 15g that was copied to another storage?
Yes, after storage migration the used size drops a lot!
You can check the tests for example usage of the module: https://github.com/oVirt/ovirt-imageio/blob/master/test/qemu_nbd_test.py
I'll check this out
I see, we never had such a report. This may be a bug in vdsm. This may be a bug in qemu since:
1. image has an impossible image end offset considering the virtual size
2. image fixed by copying to another image using qemu-img convert
3. fixed image corrupted again after some time
4. the VM pauses trying to write data at an offset way above the virtual size
We have had oVirt running for some years now, and this only occurred recently (but on some other workload).
It may help to collect vdsm logs showing a few minutes before the VM was paused. They may show invalid values reported by qemu/libvirt.
Logs have rotated already, unfortunately. I'll surely check this the next time it occurs!
Also please add here output of
rpm -qa | egrep 'qemu|libvirt|vdsm'
# rpm -qa | egrep 'qemu|libvirt|vdsm'
libvirt-daemon-driver-storage-mpath-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-core-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-client-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-config-nwfilter-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-docs-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-guest-agent-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
vdsm-jsonrpc-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-storage-iscsi-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-block-iscsi-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-lock-sanlock-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-nwfilter-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-secret-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-python-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-nodedev-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-iscsi-direct-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-scsi-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-common-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-rbd-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-daemon-driver-network-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-qemu-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-ui-spice-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-gluster-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-img-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
vdsm-api-4.50.3.4-1.el8.noarch
libvirt-daemon-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-yajsonrpc-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-interface-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-gluster-4.50.3.4-1.el8.x86_64
vdsm-client-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-storage-core-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-logical-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-ui-opengl-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-ssh-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-daemon-config-network-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-daemon-driver-storage-gluster-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
vdsm-http-4.50.3.4-1.el8.noarch
libvirt-daemon-driver-storage-disk-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-block-curl-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-libs-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-common-4.50.3.4-1.el8.noarch
libvirt-daemon-kvm-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-network-4.50.3.4-1.el8.x86_64
libvirt-daemon-driver-storage-rbd-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
ipxe-roms-qemu-20181214-11.git133f4c47.el8.noarch
qemu-kvm-hw-usbredir-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
python3-libvirt-8.0.0-2.module_el8.7.0+1218+f626c2ff.x86_64
libvirt-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
vdsm-4.50.3.4-1.el8.x86_64
"image end offset" is reported by qemu-img check. This is the highest offset used by the image. If you truncate the file to this size, or reduce a logical volume to this size, you will not corrupt the image.
But the end is still within the size of the image:
Image end offset: 17716150272
$ stat qcow2.img
File: qcow2.img
Size: 17716740096
If that matters :)
The fully allocated size + bitmaps is 15.000... GiB. The image end
offset cannot be more than this value.
The file size is just the size of the LV, which can be up to 3 GiB
bigger than the image end offset (Vdsm extends volumes by 2.5 GiB when
they have less than 0.5 GiB free).
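That extension policy can be sketched as a small function; the 2.5 GiB chunk and 0.5 GiB threshold are taken from the comment above, while the function name and shape are hypothetical:

```python
# Sketch of vdsm's thin-provisioning extension rule described above:
# when free space in the LV (LV size minus image end offset) drops
# below 0.5 GiB, grow the LV by 2.5 GiB. Not vdsm's actual code.
GiB = 1024**3

def next_lv_size(lv_size, end_offset,
                 free_threshold=GiB // 2, chunk=5 * GiB // 2):
    if lv_size - end_offset < free_threshold:
        return lv_size + chunk
    return lv_size

# A 15 GiB LV whose image end offset is within 0.5 GiB of the end
# gets extended:
new_size = next_lv_size(15 * GiB, 15 * GiB - 100 * 1024**2)
print(new_size / GiB)  # 17.5
```

This is also why an LV up to ~3 GiB larger than the image end offset is perfectly normal.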
Also please add here output of
rpm -qa | egrep 'qemu|libvirt|vdsm'
# rpm -qa | egrep 'qemu|libvirt|vdsm' qemu-kvm-core-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64 ...
So this may be an issue in qemu 6.2.
I think we’ve had a similar issue before, and the first thing to note is that qemu’s qcow2 driver gives no guarantee on the length of the qcow2 files it produces/touches.
That being said, naturally, it will try to avoid producing holes/gaps. To help allocate clusters quickly, it internally keeps a free_cluster_index, which is reset when freeing clusters (to the minimum of it and the freed cluster), and incremented when allocating clusters. So far, so good. There are situations where this algorithm absolutely can produce holes, though; the one I know of is when we need to do multi-cluster allocations (e.g. L1 table, refcount table), because then we’ll jump over gaps that are only a single cluster wide, and those will then remain there (free_cluster_index is incremented beyond them).
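The behavior described above can be illustrated with a toy model (a hypothetical simplification, not qemu's actual qcow2 code):

```python
# Toy model of the free_cluster_index behavior: allocations scan
# forward from the index; a multi-cluster allocation skips a
# single-cluster hole, and since the index is bumped past it,
# the hole is never reused until a lower cluster is freed again.
class ToyAllocator:
    def __init__(self):
        self.allocated = set()
        self.free_cluster_index = 0

    def alloc(self, n):
        i = self.free_cluster_index
        # Find n contiguous free clusters, scanning forward.
        while any(i + k in self.allocated for k in range(n)):
            i += 1
        self.free_cluster_index = i + n
        for k in range(n):
            self.allocated.add(i + k)
        return i

    def free(self, i):
        self.allocated.discard(i)
        self.free_cluster_index = min(self.free_cluster_index, i)

a = ToyAllocator()
for _ in range(4):
    a.alloc(1)           # clusters 0..3
a.free(1)                # single-cluster hole at 1
start = a.alloc(2)       # needs 2 contiguous clusters, skips the hole
print(start)             # 4: allocated at the end; the hole at 1 remains
```

After the two-cluster allocation, the index points past the hole, so subsequent single-cluster allocations keep growing the file instead of filling cluster 1.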
Note that this specific “peculiarity” shouldn’t be a concern here, though, because the image is small enough that the L1 table and refcount table will each fit into a single cluster. Still, it’s entirely possible that there are unknown bugs around free_cluster_index, where it should have been decreased at some point but wasn’t, so the image keeps growing.
So even if the qcow2 driver gives no guarantees, it is still unexpected that a file length would exceed the required disk space by so much. Especially with such a small image, where the L1 table and refcount table comfortably fit within a single cluster, so there shouldn’t be any multi-cluster allocations; I expect the free_cluster_index algorithm to be fairly reliable for single-cluster allocations only.
(There are cases where it’s expected, e.g. when the guest discards a large number of clusters, because then the file length is not truncated. But that doesn’t apply here, as far as I can see, because we’re not comparing the file length with the disk space actually used as-is, but with the disk space used by the qcow2 image if it were fully allocated.)
Two things come to mind when seeing a qcow2 file that uses more space than it should: Internal snapshots and VM state. Can we rule out that either were used with this image at some point?
(To be honest, off the top of my head, I can’t imagine a situation where either would cause the described situation, but it’s still nice if we can rule them out so we don’t have to think about them.)
It’s a bit unfortunate that we don’t get accurate disk space information from qemu-img info, because the data was just dd-ed off of the LVM volume, so it’s the same as its file length. What is interesting is the result of qemu-img check, which tells us that 199703 clusters (12.2 GB) are allocated, so we’re far from 17 GB on that front. Even if the guest has just discarded clusters recently and the image was indeed fully allocated at some point, discarding clusters should decrease free_cluster_index to point into those holes and thus prevent the whole issue.
What might be interesting (while not presenting sensitive data) is a qemu-img map --output=json dump. Can you provide that, so we can see which clusters are actually allocated where? And perhaps the first 64 kB of the image file, i.e. the image header? Just to be sure, I’d like to see that the L1 table and refcount table both actually are just a single cluster in size (and actually placed near the image’s beginning).
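Once that map dump is available, the highest host-file offset it references can be compared against the image end offset. A sketch, using a made-up sample in qemu-img's JSON map format (the "start"/"length"/"data"/"offset" fields are the ones qemu-img emits; the sample values are invented):

```python
# Sketch: given `qemu-img map --output=json` output, find the highest
# host-file offset actually referenced, to compare against the image
# end offset reported by `qemu-img check`.
import json

def max_host_offset(map_json):
    entries = json.loads(map_json)
    ends = [e["offset"] + e["length"]
            for e in entries if e.get("data") and "offset" in e]
    return max(ends) if ends else 0

sample = json.dumps([
    {"start": 0, "length": 65536, "data": True, "offset": 327680},
    {"start": 65536, "length": 131072, "data": False},
    {"start": 196608, "length": 65536, "data": True, "offset": 1441792},
])
print(max_host_offset(sample))  # 1507328
```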
Besides this being caused by something around free_cluster_index, the other thing that comes to mind is qemu commit a8c07ec287554dcefd33733f0e5888a281ddc95e, which fixed qemu-img check -r all on LVM volumes (BZ https://bugzilla.redhat.com/show_bug.cgi?id=1519071). That’s a case where we used to intentionally allocate clusters beyond the image’s end, but that would only happen during qemu-img check, never during normal operation, so it can’t be the cause of this issue here. But I still want to note it, because it was fixed in 7.0, but the bug was still present in 6.2.
EDIT: I noticed now that this last-mentioned bug was fixed in 8.7 (that is what the BZ is about…), so this 6.2 here in fact already has the fix. :)
As requested, the output of the qemu-img map.
Also, the mentioned bug IS fixed in 6.2.0-13:
- kvm-qcow2-Improve-refcount-structure-rebuilding.patch [bz#1519071]
So that should not be the issue either.
Unfortunately, the mapping doesn’t give me much of a hint; the unallocated areas (in host cluster space, i.e. in the image file) are sprinkled throughout the image, and they’re often several clusters long, so it doesn’t seem like single clusters that have been skipped at some point when looking for a longer cluster range.
I’ve scoured the code for something, but didn’t find anything yet. I don’t think I have much of a choice but to just test with guests myself and see whether I can reproduce the behavior.
Just to be clear (I had this point in my last post until my browser crashed and deleted the whole thing, and then I forgot it when rewriting): I absolutely understand and agree that it’s unreasonable to allow qcow2 images unlimited growth in their file length. I don’t think we can guarantee holding qemu-img measure’s reported size[1], but with a bit of leeway, we absolutely should stay within bounds. 10 % is definitely a generous bound (for plain images without VM state or internal snapshots). So this does need investigation on the qemu side.
[1] With actively used images, there are things to consider, like having to reallocate L1/refcount tables at runtime. The old one is freed afterwards, but for a brief period you still need to have both allocated, and if you plan to write no more than a single cluster afterwards, you’ll likely go over the limit qemu-img measure reported, because it doesn’t consider the fact that the new table can’t use the old table’s space. If qemu-img measure is supposed to work for the file length of actively used images as well, I think it’ll need much more scrutiny than has been applied when it was introduced (both its formula and qemu’s qcow2 code).
As this doesn't seem to be a vdsm bug but rather a qemu bug, I'm closing it here.
Opened a new report at https://gitlab.com/qemu-project/qemu/-/issues/1621
Added some gdb traces, which might be useful to find the root cause @XanClic