Comments (18)

iphutch commented on July 24, 2024

@bwarden could you take a look to see if this is a Clear issue vs a hyper-v bug.

from distribution.

bwarden commented on July 24, 2024

From what I can see, everything should be in place. Please make sure you've installed the os-cloudguest-azure bundle for the userspace utilities, and follow this guide to verify that the kernel modules are installed properly:
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/manage/manage-hyper-v-integration-services#start-and-stop-an-integration-service-from-a-linux-guest

Please also provide your client kernel version (uname -r) so I can make sure I'm looking at the right kernel build.

Andy2244 commented on July 24, 2024

@bwarden oki i did not have the "os-cloudguest-azure bundle" installed, is this bundle needed to get basic integration services working? (shutdown/time)

After installing the bundle and rebooting the vm i get this:

 uname -r
4.14.21-123.hyperv

lsmod | grep hv_utils = nothing

lsmod | grep hv_
hv_netvsc              49152  0
compgen -c hv_
hv_fcopy_daemon
hv_kvp_daemon
hv_vss_daemon
ps -ef | grep hv
root        42     2  0 11:08 ?        00:00:00 [hv_vmbus_con]
root        95     2  0 11:08 ?        00:00:00 [hv_balloon]
root      1867     1  0 11:10 ?        00:00:00 hv_kvp_daemon
root      2429  1711  0 11:16 pts/0    00:00:00 grep hv

On the windows side i get this:

Get-Service -Name vm*

Status   Name               DisplayName
------   ----               -----------
Running  vmcompute          Hyper-V Host Compute Service
Stopped  vmicguestinterface Hyper-V Guest Service Interface
Stopped  vmicheartbeat      Hyper-V Heartbeat Service
Stopped  vmickvpexchange    Hyper-V Data Exchange Service
Stopped  vmicrdv            Hyper-V Remote Desktop Virtualizati...
Stopped  vmicshutdown       Hyper-V Guest Shutdown Service
Stopped  vmictimesync       Hyper-V Time Synchronization Service
Stopped  vmicvmsession      Hyper-V PowerShell Direct Service
Stopped  vmicvss            Hyper-V Volume Shadow Copy Requestor
Running  vmms               Hyper-V Virtual Machine Management
Get-VMIntegrationService -VMName "clear linux"

VMName      Name                    Enabled PrimaryStatusDescription SecondaryStatusDescription
------      ----                    ------- ------------------------ --------------------------
clear linux Guest Service Interface True    OK
clear linux Heartbeat               True    OK
clear linux Key-Value Pair Exchange True    OK                       The protocol version of the component installed in the virtual machine does not match the version expec...
clear linux Shutdown                True    OK
clear linux Time Synchronization    True    OK
clear linux VSS                     False   OK

So it seems the "hv_kvp_daemon" is running, but the integration services are not supported/started for the VM? Manually trying to start the services fails, with the info that it has to be supported by both the host and the VM, or that the service is not needed and is therefore started/stopped automatically.

bwarden commented on July 24, 2024

The lsmod output is OK -- we actually build hv_utils into the kernel statically. The os-cloudguest-azure bundle provides the user-space tools (hv_fcopy_daemon, etc).

Looks like the services are running in the Clear Linux guest. The Get-Service step above applies to Windows guest VMs. Did you follow this section (from the Hyper-V manager) to ensure they're enabled on the host? Probably equivalent to Get-VMIntegrationService, but I'm not very familiar with Hyper-V.

https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/manage/manage-hyper-v-integration-services#turn-an-integration-service-on-or-off-using-hyper-v-manager

With the 22010 image, and those enabled on my system, I see time update automatically whenever I resume the VM.

bwarden commented on July 24, 2024

You can also try "sudo journalctl | grep hv_" to see the status of the daemons.

Andy2244 commented on July 24, 2024

@bwarden So the "Stopped vmictimesync" is only used for windows guests? Yet in the documentation they don't actually provide which service/daemon is responsible for time/shutdown on the linux side?

Here is the journalctl and the date still lists the last shutdown time (1:14), while i typed this command at 10:18 CEST. So i only get a time update via NTP once it kicks in at its normal update interval, without it i wont get any valid time at all.

Wed Apr 25 01:14:24 CEST 2018
root@clear-vm ~ # sudo journalctl | grep hv_
Apr 24 10:54:30 clear-vm kernel: hv_utils: Shutdown request received - graceful shutdown initiated
Apr 24 10:54:54 clear-vm kernel: calling  netvsc_drv_init+0x0/0x1000 [hv_netvsc] @ 163
Apr 24 10:54:54 clear-vm kernel: hv_vmbus: registering driver hv_netvsc
Apr 24 10:54:54 clear-vm kernel: initcall netvsc_drv_init+0x0/0x1000 [hv_netvsc] returned 0 after 65 usecs
Apr 24 10:54:55 clear-vm hv_vss_daemon[210]: Hyper-V VSS: VSS starting; pid is:210
Apr 24 10:54:55 clear-vm hv_vss_daemon[210]: Hyper-V VSS: open /dev/vmbus/hv_vss failed; error: 2 No such file or directory
Apr 24 10:54:55 clear-vm kernel: hv_utils: KVP IC version 4.0
Apr 24 10:55:39 clear-vm kernel: hv_balloon: Max. dynamic memory size: 2560 MB
Apr 24 10:58:50 clear-vm kernel: hv_utils: Shutdown IC version 3.0
Apr 24 10:58:52 clear-vm kernel: hv_utils: TimeSync IC version 4.0
Apr 24 10:58:50 clear-vm kernel: hv_utils: Heartbeat IC version 3.0
Apr 24 10:58:50 clear-vm kernel: hv_utils: FCopy IC version 1.1
Apr 24 11:05:28 clear-vm kernel: hv_utils: Shutdown request received - graceful shutdown initiated
Apr 24 11:05:54 clear-vm kernel: calling  netvsc_drv_init+0x0/0x1000 [hv_netvsc] @ 151
Apr 24 11:05:54 clear-vm kernel: hv_vmbus: registering driver hv_netvsc
Apr 24 11:05:54 clear-vm kernel: initcall netvsc_drv_init+0x0/0x1000 [hv_netvsc] returned 0 after 382 usecs
Apr 24 11:05:55 clear-vm hv_vss_daemon[214]: Hyper-V VSS: VSS starting; pid is:214
Apr 24 11:05:55 clear-vm hv_vss_daemon[214]: Hyper-V VSS: open /dev/vmbus/hv_vss failed; error: 2 No such file or directory
Apr 24 11:05:55 clear-vm kernel: hv_utils: KVP IC version 4.0
Apr 24 11:06:39 clear-vm kernel: hv_balloon: Max. dynamic memory size: 2560 MB
Apr 24 11:08:25 clear-vm kernel: hv_utils: Shutdown request received - graceful shutdown initiated
Apr 24 11:08:48 clear-vm kernel: calling  netvsc_drv_init+0x0/0x1000 [hv_netvsc] @ 157
Apr 24 11:08:48 clear-vm kernel: hv_vmbus: registering driver hv_netvsc
Apr 24 11:08:48 clear-vm kernel: initcall netvsc_drv_init+0x0/0x1000 [hv_netvsc] returned 0 after 42 usecs
Apr 24 11:08:49 clear-vm kernel: hv_utils: KVP IC version 4.0
Apr 24 11:08:49 clear-vm hv_vss_daemon[214]: Hyper-V VSS: VSS starting; pid is:214
Apr 24 11:08:49 clear-vm hv_vss_daemon[214]: Hyper-V VSS: open /dev/vmbus/hv_vss failed; error: 2 No such file or directory
Apr 24 11:09:33 clear-vm kernel: hv_balloon: Max. dynamic memory size: 2560 MB

The "clear-vm kernel: hv_utils: TimeSync IC version 4.0" line would indicate that it has support for it, yet i get no update. "Get-VMIntegrationService" shows that its offered/enabled to the VM from the host, so what else can i try? Is there some specific systemd service that is responsible for hyperv time/shutdown handling or is this some kernel only thing?

PS: I will try to compare this with a ubuntu VM on the same host, so i can at least figure out if its a clearlinux or hyperv host problem.

Andy2244 commented on July 24, 2024

I get the same behavior for ubuntu 17.10 on a different host, so this seems to be a general problem with using the LIS time service under linux.

I dug a little deeper, and going by this pdf for the latest LIS, the hyperv Time-Service needs to be properly configured to be used as a time source.

Going by the docs the timesource is installed and working via kernel.

 ls /sys/class/ptp
ptp0
cat /sys/class/ptp/ptp0/clock_name 
hyperv

The simplest option suggested is to disable ntp and switch to the timesync source via:
echo Y > /sys/module/hv_utils/parameters/timesync_mode

Yet this does not work on clear or ubuntu, since the "parameters" is not present under ubuntu (/sys/module/hv_utils/parameters), while in clear "/sys/module/hv_utils" is not present at all?
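
The PTP check above can be scripted so it does not depend on the device being ptp0. This is only a minimal sketch, assuming PTP clocks appear under /sys/class/ptp with a clock_name file as in the output shown; the function name find_hyperv_ptp and its optional directory argument are hypothetical, added so the lookup can be tried against a test directory.

```shell
#!/bin/sh
# Print the /dev path of the first PTP clock whose clock_name is "hyperv".
# $1 (optional): sysfs class directory to scan; defaults to /sys/class/ptp.
# Returns non-zero if no such clock is found.
find_hyperv_ptp() {
    ptp_root="${1:-/sys/class/ptp}"
    for dev in "$ptp_root"/ptp*; do
        # Skip an unexpanded glob and entries without a readable clock_name.
        [ -r "$dev/clock_name" ] || continue
        # Read the first line of clock_name (tolerate a missing newline).
        read -r name < "$dev/clock_name" || true
        if [ "$name" = "hyperv" ]; then
            echo "/dev/${dev##*/}"
            return 0
        fi
    done
    return 1
}
```

On the guest shown above, where ptp0 reports the clock name hyperv, calling find_hyperv_ptp with no argument would print /dev/ptp0, which is the path the chrony refclock line needs.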

The more complex option suggested is to use chronyd instead of ntpd, since the latter does not support the ptp source.

/etc/chrony.conf: 
refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0

I tried this under ubuntu and clear, enabling and restarting the service. Yet under clear i'm still unsure how simple config changes are handled. Do i copy /usr/share/defaults/chrony/chrony.conf to /etc/chrony.conf and add the extra parameter, or do i just create /etc/chrony.conf and it is somehow merged with the default?

I tried both and on ubuntu/clear i get:

chronyc sources
210 Number of sources = 5
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#~ PHC0                          0   3     7     9  -145.5s[-145.5s] +/-  349ns

Yet on both systems i get no time updates after a sleep/resume cycle. I would prefer the "echo Y > /sys/module/hv_utils/parameters/timesync_mode" option, yet i have no clue how to enable this. At this point i also have no idea why the chrony approach does not work; i would have to test all of this under the officially supported CentOS with the latest LIS as well and cross-reference both solutions.

Btw what LIS version is clear using atm? (LIS 4.2.4-1 seems to be the latest version)

Maybe someone with a better understanding of this can take a look, since this all would suggest that time synchronization is "broken" for all hyperv images regarding sleep/resume or save/restore states.

PS: Maybe as a work around systemd needs to update via ntp immediately after detecting a resume state, in addition to the default ntp poll rate. I guess this could at least work for a resume state, not sure about a hyperv restore/save operation, i guess those are transparent to systemd and only LIS understands those?

bwarden commented on July 24, 2024

Let me rephrase. In addition to checking on the host via "Get-VMIntegrationService", could you please try from the Hyper-V Manager GUI, under Settings for the VM, verifying that Time synchronization is checked under Integration Services? I know it should be the same, but something's not right.

I've tried replicating this on my own system, and I left date running in a loop, then saved the VM yesterday afternoon. When I resumed it this morning, the time rolled over exactly as expected. Given that you're having trouble with multiple client VMs, it's more likely that something's not quite right on your host.

As I mentioned, hv_utils is built into the kernel, not as a standalone module. This is why you don't see it in /sys/module. Time sync should work out of the box, without having to pass an extra parameter or configure anything in userspace.

I do have one other idea. By default, we include a user-space SNTP client, systemd-timesyncd. In my environment, it can't reach any NTP servers, so it does nothing. If it could reach a server, maybe it could interfere with LIS. You can check its status with timedatectl, and if it shows Network Time or NTP synchronized as yes, we could try some additional actions. Also, if you have manually configured any other time services, please make sure they're disabled. Having multiple time services trying to set the clock (with their own heuristics for slewing vs. stepping) can cause a lot of problems.
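
The timedatectl check can be reduced to a yes/no test. This is a rough sketch, not an official interface: the function name clock_reported_synced is hypothetical, and it assumes the human-readable output contains either "NTP synchronized: yes" (older systemd) or "System clock synchronized: yes" (newer systemd).

```shell
#!/bin/sh
# Read `timedatectl` output on stdin and succeed (exit 0) if it reports a
# synchronized clock. Older systemd prints "NTP synchronized: yes"; newer
# releases print "System clock synchronized: yes". Both are accepted here.
clock_reported_synced() {
    grep -Eq '^[[:space:]]*(NTP|System clock) synchronized:[[:space:]]*yes[[:space:]]*$'
}
```

It would be used as `timedatectl | clock_reported_synced && echo "an NTP client is setting the clock"`. A positive result only says that some network time client has synchronized the clock, not which one.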

Andy2244 commented on July 24, 2024

@bwarden Yes the service is set enabled (checked) from the management gui. I also have a working NTP aka:

timedatectl:
Network time on: yes
NTP synchronized: yes
RTC in local TZ: no

I just noticed something:

"then saved the VM yesterday afternoon. When I resumed it this morning, the time rolled over exactly as expected."

I'm not manually saving the VM; i let the host windows system go into its normal sleep/suspend/hibernate state while the VM is running, and then just resume the host. Maybe i was just too naive assuming such a scenario is covered by hyperv + LIS?
I assumed that LIS would detect the difference between the resumed host system clock and the restored linux VM and force an update.

bwarden commented on July 24, 2024

Ah, fantastic data point. I've been running this:
while true; do date; dmesg -c | grep timesync; sleep 1; done
...on the Clear Linux VM to show time discontinuities and the logs from hv_utils indicating receipt of messages from the TimeSync service.

When I save/suspend the VM, I can see the time jump on resume. When I suspend the host, I reproduce your problem -- the time continues from where it left off, and most importantly, there are no messages from TimeSync. At this point it looks like this might be a problem with Hyper-V itself not sending the messages, since we know the guest VM can receive them. I'll look into whether this is a known issue with Hyper-V or LIS.
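
A quieter variant of the loop above prints only when the wall clock moves by more than the sampling interval. This is a sketch under stated assumptions: the function names is_time_jump and watch_clock and the default tolerance of 2 seconds are hypothetical, and it relies on `date +%s` being available in the guest.

```shell
#!/bin/sh
# Succeed if the step from $1 to $2 (epoch seconds) differs from the expected
# interval $3 by more than the tolerance $4 (seconds), in either direction.
is_time_jump() {
    delta=$(( $2 - $1 - $3 ))
    [ "$delta" -lt 0 ] && delta=$(( 0 - delta ))
    [ "$delta" -gt "$4" ]
}

# Sample the wall clock once per interval and report discontinuities.
# $1: sampling interval in seconds (default 1); $2: tolerance (default 2).
watch_clock() {
    interval="${1:-1}"
    tolerance="${2:-2}"
    prev=$(date +%s)
    while sleep "$interval"; do
        now=$(date +%s)
        if is_time_jump "$prev" "$now" "$interval" "$tolerance"; then
            echo "clock jumped by $(( now - prev - interval ))s at $(date)"
        fi
        prev=$now
    done
}
```

Running `watch_clock` in the guest reports a jump only when something actually steps the clock. In the host-suspend case described above nothing would be printed until a time source corrects the guest, and a gradual slew would not show up as a jump at all.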

bwarden commented on July 24, 2024

As an interesting data point, I set up chrony to track the PTP clock device exposed by LIS. This is merely a workaround, but it's less clunky than running an NTP client in a guest VM.

I put this in /etc/chrony.conf (which completely overrides the system defaults from /usr/share/defaults/chrony/chrony.conf):
refclock PHC /dev/ptp0

I disabled systemd-timesyncd (the SNTP client):
systemctl mask --now systemd-timesyncd

I enabled chrony:
systemctl enable --now chronyd

With chrony running, I suspended, waited, and resumed my host. While the time initially didn't match, chrony noticed the disparity and gracefully accelerated the clock to make up lost time within a couple of minutes, as verified in the logs, via:
journalctl -u chronyd

which reported:
System clock wrong by 16.715141 seconds, adjustment started
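
To pull just the offsets out of the journal after a resume, a small filter can be used. This is an illustrative sketch: the function name chrony_reported_offsets is hypothetical, and it assumes chronyd logs the message in exactly the "System clock wrong by N seconds" form shown above.

```shell
#!/bin/sh
# Read journal lines on stdin and print the offset (in seconds) from each
# "System clock wrong by N seconds" message, one value per line.
# Lines that do not contain that message are ignored.
chrony_reported_offsets() {
    sed -n 's/.*System clock wrong by \(-\{0,1\}[0-9][0-9.]*\) seconds.*/\1/p'
}
```

It would be used as `journalctl -u chronyd | chrony_reported_offsets`, which gives one number per correction and makes it easier to compare resume cycles.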

bwarden commented on July 24, 2024

You could also add this to chrony.conf to make it step immediately on errors larger than one second:
makestep 1 -1

I found anecdotes like this from people with similar experiences, but I haven't found any official documentation. I would guess that it's just assumed you would suspend a guest VM properly before suspending or shutting down a host.

Andy2244 commented on July 24, 2024

@bwarden Thanks for looking into this issue, i could reproduce your fix and was apparently just missing the "makestep 1 -1" option to allow the large changes.
I now get:

System clock wrong by 31939.769192 seconds, adjustment started
System clock was stepped by 31939.769192 seconds

I guess i will close this issue, since it's a LIS/Hyperv specific "oddity".
I still think its strange that the shutdown LIS service transparently handles host restart/shutdowns, without any manual interactions and can even spin-up the VM again if configured on a restart, yet seems to wrongly handle sleep/suspend. It seems to me that having LIS and a direct communication channel to the host, should be enough to handle also those scenarios, but what do i know 😄

Thanks again for the time diagnosing this and the fix.

bwarden commented on July 24, 2024

I left my host hibernated overnight, and with an error of 54409 seconds, chrony didn't believe the reference clock was accurate, so it didn't update. I'd recommend adding "trust" to the refclock statement so that chrony always believes this clock.

Andy2244 commented on July 24, 2024

@bwarden Just a quick follow-up: i had to add ntp servers as well, since after 3+ days of having the vm suspended the clock would not move forward again. Chrony seemed unable to use/find the refclock device for whatever reason.

Here is my current config, which seems to work, for anyone having this issue.
/etc/chrony.conf

refclock PHC /dev/ptp0 trust poll 2
makestep 1 -1
maxdistance 16.0
pool pool.ntp.org iburst
driftfile /var/lib/chrony/drift

bwarden commented on July 24, 2024

Is there any useful information in dmesg or the journal? I wonder if there's an issue with the virtual device.

Andy2244 commented on July 24, 2024

@bwarden Forgot to check those sorry.

DonJianguo commented on July 24, 2024

Hello. If only the RTC/PHC is taken as the source in chrony.conf, what is the influence on the time cycle in the VM / HyperV?
