Hi,
I am currently testing "what can go wrong" scenarios with OpenNebula and the new LINSTOR add-on. So far I have been able to create, live-migrate, and delete VMs.
My current test: I switched off the passive storage node (resource state = Unused) and switched it back on to see whether replication resumes.
See the detailed history below.
SETUP:
srv485 -> LINSTOR controller & OpenNebula frontend
srv484 & srv483 -> LINSTOR satellite & storage node & OpenNebula KVM host
datastore policy -> LINSTOR_AUTO_PLACE = 2
OS: CentOS 7.5 (same error behaviour on Ubuntu 18.04, btw)
Is there something I am conceptually missing here, or is this unexpected behavior?
oneadmin@srv485:~$ linstor resource list
╭────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ State ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ OpenNebula-Image-0 ┊ srv483 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ OpenNebula-Image-0 ┊ srv484 ┊ 7000 ┊ Unused ┊ UpToDate ┊
╰────────────────────────────────────────────────────────╯
#CREATE VM in OpenNebula
oneadmin@srv485:~$ onetemplate instantiate 0
VM ID: 0
oneadmin@srv485:~$ linstor resource list
╭──────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ State ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ OpenNebula-Image-0 ┊ srv483 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ OpenNebula-Image-0 ┊ srv484 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ OpenNebula-Image-0-vm0-disk0 ┊ srv483 ┊ 7001 ┊ Unused ┊ UpToDate ┊
┊ OpenNebula-Image-0-vm0-disk0 ┊ srv484 ┊ 7001 ┊ InUse ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────╯
#SWITCHING OFF NODE srv483 at this point
oneadmin@srv485:~$ linstor resource list
╭──────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ State ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ OpenNebula-Image-0 ┊ srv483 ┊ 7000 ┊ ┊ Unknown ┊
┊ OpenNebula-Image-0 ┊ srv484 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ OpenNebula-Image-0-vm0-disk0 ┊ srv483 ┊ 7001 ┊ ┊ Unknown ┊
┊ OpenNebula-Image-0-vm0-disk0 ┊ srv484 ┊ 7001 ┊ InUse ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────╯
oneadmin@srv485:~$ linstor node list
╭─────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ srv483 ┊ SATELLITE ┊ 100.80.0.15,100.80.0.15:3366 (PLAIN) ┊ OFFLINE ┊
┊ srv484 ┊ SATELLITE ┊ 100.80.0.16,100.80.0.16:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────────────────╯
#SWITCHING ON node srv483 again and waiting until the node comes back online
oneadmin@srv485:~$ linstor node list
╭────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ srv483 ┊ SATELLITE ┊ 100.80.0.15,100.80.0.15:3366 (PLAIN) ┊ Online ┊
┊ srv484 ┊ SATELLITE ┊ 100.80.0.16,100.80.0.16:3366 (PLAIN) ┊ Online ┊
╰────────────────────────────────────────────────────────────────────╯
oneadmin@srv485:~$ linstor resource list
╭──────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ State ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ OpenNebula-Image-0 ┊ srv483 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ OpenNebula-Image-0 ┊ srv484 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ OpenNebula-Image-0-vm0-disk0 ┊ srv483 ┊ 7001 ┊ ┊ Unknown ┊
┊ OpenNebula-Image-0-vm0-disk0 ┊ srv484 ┊ 7001 ┊ InUse ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────╯
oneadmin@srv485:~$ linstor error-reports list
╭────────────────────────────────────────────────────────────╮
┊ Nr. ┊ Id ┊ Datetime ┊ Node ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ 1 ┊ 5BFE9E8A-90340-000000 ┊ 2018-11-28 14:56:47 ┊ srv483 ┊
╰────────────────────────────────────────────────────────────╯
oneadmin@srv485:~$ linstor error-reports show 5BFE9E8A-90340-000000
ERROR REPORT 5BFE9E8A-90340-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 0.7.3
Build ID: 6e47dd2
Build time: 2018-11-22T10:55:51+00:00
Error time: 2018-11-28 14:56:47
Node: srv483
============================================================
Reported error:
Description:
Operations on resource 'OpenNebula-Image-0-vm0-disk0' were aborted
Cause:
Meta data creation for volume 0 failed because the execution of an external command failed
Correction:
- Check whether the required software is installed
- Check whether the application's search path includes the location
of the external software
- Check whether the application has execute permission for the external command
Category: LinStorException
Class name: ResourceException
Class canonical name: com.linbit.linstor.core.DrbdDeviceHandler.ResourceException
Generated at: Method 'createResourceMetaData', Source file 'DrbdDeviceHandler.java', Line #1524
Error message: Meta data creation for resource 'OpenNebula-Image-0-vm0-disk0' volume 0 failed
Error context:
Meta data creation for resource 'OpenNebula-Image-0-vm0-disk0' volume 0 failed
Call backtrace:
Method Native Class:Line number
createResourceMetaData N com.linbit.linstor.core.DrbdDeviceHandler:1524
createResource N com.linbit.linstor.core.DrbdDeviceHandler:1138
dispatchResource N com.linbit.linstor.core.DrbdDeviceHandler:364
run N com.linbit.linstor.core.DeviceManagerImpl$DeviceHandlerInvocation:1225
run N com.linbit.WorkerPool$WorkerThread:179
Caused by:
Description:
Execution of the external command 'drbdadm' failed.
Cause:
The external command exited with error code 20.
Correction:
- Check whether the external program is operating properly.
- Check whether the command line is correct.
Contact a system administrator or a developer if the command line is no longer valid
for the installed version of the external program.
Additional information:
The full command line executed was:
drbdadm -vvv --max-peers 7 -- --force create-md OpenNebula-Image-0-vm0-disk0/0
The external command sent the following output data:
drbdmeta 1001 v09 /dev/vg-drbdpool/OpenNebula-Image-0-vm0-disk0_00000 internal create-md 7 --force
The external command sent the follwing error information:
open(/dev/vg-drbdpool/OpenNebula-Image-0-vm0-disk0_00000) failed: No such file or directory
open(/dev/vg-drbdpool/OpenNebula-Image-0-vm0-disk0_00000) failed: No such file or directory
Command 'drbdmeta 1001 v09 /dev/vg-drbdpool/OpenNebula-Image-0-vm0-disk0_00000 internal create-md 7 --force' terminated with exit code 20
Category: LinStorException
Class name: ExtCmdFailedException
Class canonical name: com.linbit.extproc.ExtCmdFailedException
Generated at: Method 'execute', Source file 'DrbdAdm.java', Line #437
Error message: The external command 'drbdadm' exited with error code 20
Call backtrace:
Method Native Class:Line number
execute N com.linbit.drbd.DrbdAdm:437
simpleAdmCommand N com.linbit.drbd.DrbdAdm:388
createMd N com.linbit.drbd.DrbdAdm:217
createVolumeMetaData N com.linbit.linstor.core.DrbdDeviceHandler:1060
createResourceMetaData N com.linbit.linstor.core.DrbdDeviceHandler:1489
createResource N com.linbit.linstor.core.DrbdDeviceHandler:1138
dispatchResource N com.linbit.linstor.core.DrbdDeviceHandler:364
run N com.linbit.linstor.core.DeviceManagerImpl$DeviceHandlerInvocation:1225
run N com.linbit.WorkerPool$WorkerThread:179
END OF ERROR REPORT.
Some additional information from the storage node that was switched off and on again:
BEFORE:
root@srv483:~# ls /dev/vg-drbdpool/ -la
total 0
drwxr-xr-x 2 root root 80 Nov 29 09:20 .
drwxr-xr-x 21 root root 6460 Nov 29 09:20 ..
lrwxrwxrwx 1 root root 7 Nov 29 09:20 OpenNebula-Image-0_00000 -> ../dm-4
lrwxrwxrwx 1 root root 7 Nov 29 09:20 OpenNebula-Image-0-vm1-disk0_00000 -> ../dm-5
AFTER:
root@srv483:~# ls /dev/vg-drbdpool/ -la
total 0
drwxr-xr-x 2 root root 60 Nov 29 09:25 .
drwxr-xr-x 21 root root 6420 Nov 29 09:25 ..
lrwxrwxrwx 1 root root 7 Nov 29 09:25 OpenNebula-Image-0_00000 -> ../dm-4
For some reason, /dev/dm-5 vanished after the reboot...
Thanks
Uli
P.S.: I also opened an issue in the OpenNebula add-on repository and was advised to come here. For reference: OpenNebula/addon-linstor#2