
fence-agents's Introduction

Fence agents

Fence agents were developed as device "drivers" that prevent computers from destroying data on shared storage. Their aim is to isolate a corrupted computer, using one of three methods:

  • Power - A computer that is switched off cannot corrupt data. Do not rely on a "soft reboot", because there is no way to verify that it actually took place. This method also works for virtual machines when the fence device is a hypervisor.
  • Network - Switches can prevent routing to a given computer, so even if a computer is powered on it won't be able to harm the data.
  • Configuration - Fibre-channel switches or SCSI devices allow us to limit who can write to managed disks.

Fence agents do not use configuration files, as configuration management is outside of their scope. All of the configuration has to be specified either as command-line arguments or as lines on standard input (see the complete list for more info).
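For illustration, a hypothetical sketch of the stdin protocol described above (the agent name, host, and credential values are invented for this example):

```python
# The same options a fence agent accepts as command-line flags can also be
# fed to it on standard input, one option=value pair per line.
# "fence_apc" and all values below are placeholders, not real credentials.
options = {
    "ip": "apc.example.com",
    "username": "admin",
    "password": "secret",
    "plug": "7",
    "action": "off",
}
payload = "".join("%s=%s\n" % (k, v) for k, v in options.items())
print(payload, end="")
# In practice this payload would be piped to the agent, e.g.:
#   printf '%s' "$payload" | fence_apc
```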

Because many fence agents are quite similar to each other, a fencing library (in Python) was developed; please use it for further development. Creating a new fence agent, or modifying an existing one, should be quite simple with this library.

Where can I find more information?

fence-agents's People

Contributors

andyprice, astralbob, bdperkin, beekhof, bmarzins, chrissie-c, fabbione, feist, helen-fornazier, jbrassow, jfriesse, jnpkrn, johnruemker, kergon, lhh, marxsk, megelatim, mssedusch, nrwahl2, oalbrigt, ondrejhome, ondrejmular, rhn-support-wcheng, rohara, ryan-mccabe, scattym, swhiteho, teigland, vuntz, watologo1


fence-agents's Issues

fence_vmware_* - Cannot manage hosts on different ESX hosts through the vcenter host

If I use fence_vmware_soap or fence_vmware_rest with -a <vcenter_host> -o list, for some reason it only shows me the hosts managed by a single ESX host, rather than all the hosts across all the ESX hosts managed by that vCenter instance. Using the same credentials, I can log into vSphere and see all the hosts I should be able to manage, across all ESX hosts. For now I have found that I have to use -a <esx_host> instead.

incorrect help for fence_rhevm -n (--plug) option

fence_rhevm --help :

-n, --plug=[id] Physical plug number on device, UUID or
identification of machine

This is not correct; the parameter must map to the VM name as known to RHV.

Should be

-n, --plug=[name] The VM name in RHV

Can't use "fence_pve" in pacemaker

I don't know if this is the right place to ask, but I'll try anyway.

The "fence_pve" agent works when I run it in a terminal with arguments, but it does not work when I try to add it as a stonith resource in pacemaker. It says some parameters like "passwd" or "login" do not exist (they are listed here [1]).
Some more info:

# stonith -L |grep pve

external/fence_pve
# stonith -t external/fence_pve -n
Failed: You have to enter fence address

Please use '-h' for usage

CRIT: external_get_confignames: 'fence_pve getconfignames' failed with rc 1

I'm using "fence_pve" from Debian testing, which is identical to "fence_pve" in this repo.

[1] https://www.mankier.com/8/fence_pve#Stdin_Parameters

Clarify license for this repository (license conflict with APLv2)

  • The licenses in doc/README.licence, doc/COPYING.applications and doc/COPYING.libraries state that this is released under LGPLv2/GPLv2
  • doc/README.licence licenses "The Red Hat Cluster" and not fence-agents
  • In agents/amt_ws/fence_amt_ws.py there is a license header for APLv2 (which is only compatible with GPLv3/LGPLv3 and not GPLv2/LGPLv2)
  • The files make/gitlog-to-changelog and make/git-version-gen state those files are released under GPLv3
  • agents/ovh/fence_ovh.py is licensed under CC BY-SA 3.0 (not compatible with any non-CC licenses: https://creativecommons.org/share-your-work/licensing-considerations/compatible-licenses/)
  • The file COPYRIGHT doesn't list all contributors
  • m4/ac_python_module.m4 sets license AllPermissive (GPL is strongly protective)

As a result it is not clear under which license this repository can be used (especially when delivered as a package), and it is not even compliant with the licenses used inside the repository itself.

Necessary steps:

  • Clean up the licenses in doc and create a LICENSE file in the root of the project, as is common practice on GitHub and in other projects
  • In the best case, relicense the whole project under GPLv3 (LGPL for libraries), or at least remove agents/amt_ws/fence_amt_ws.py from the repository, as APLv2 is not compatible with GPLv2
  • Following the recommendation of the GPLv3 license text, make sure that every source file includes the license header. See "How to Apply These Terms to Your New Programs" at http://www.gnu.org/copyleft/gpl.html
  • Update the COPYRIGHT file in /doc

The following files include a license header:

  • agents/amt_ws/fence_amt_ws.py (APLv2 - not compatible with GPLv2/LGPLv2)
  • agents/kdump/fence_kdump.c (GPLv2)
  • agents/kdump/fence_kdump_send.c (GPLv2)
  • agents/kdump/message.h (GPLv2)
  • agents/kdump/options.h (GPLv2)
  • agents/kdump/version.h (GPLv2)
  • agents/ovh/fence_ovh.py (CC BY-SA 3.0 - not compatible with any non-CC license)
  • agents/xenapi/fence_xenapi.py (GPLv2)
  • agents/zvm/fence_zvm.c (LGPLv2 - library?)
  • agents/zvm/fence_zvm.h (LGPLv2 - library?)
  • agents/zvm/fence_zvmip.c (LGPLv2 - library?)
  • lib/XenAPI.py.py (LGPLv2)

fence_scsi: not Python3 safe

Using fence_scsi with python3 as the interpreter leads to errors like this:

2018-05-22 14:44:19,118 DEBUG: 0 totem.cluster_name (str) = zfs
Traceback (most recent call last):
  File "/usr/sbin/fence_scsi", line 512, in <module>
    main()
  File "/usr/sbin/fence_scsi", line 489, in main
    options["--key"] = generate_key(options)
  File "/usr/sbin/fence_scsi", line 199, in generate_key
    return "%.4s%.4d" % (get_cluster_id(options), int(get_node_id(options)))
  File "/usr/sbin/fence_scsi", line 184, in get_cluster_id
    return hashlib.md5(match.group(1)).hexdigest()
TypeError: Unicode-objects must be encoded before hashing
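A minimal sketch of the Python 3 fix, assuming the cluster name shown in the debug line above: hashlib.md5() only accepts bytes on Python 3, so the matched string must be encoded before hashing.

```python
import hashlib

# Stands in for match.group(1) read from the corosync configuration
# (taken from the "totem.cluster_name (str) = zfs" debug line above):
cluster_name = "zfs"

# Python 2 allowed a str here; on Python 3 it must be encoded first:
digest = hashlib.md5(cluster_name.encode("utf-8")).hexdigest()
print(digest)
```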

RHEL6 branches mismatch with git.fedorahosted.org/fence-agents.git (follow-up of #64)

Unfortunately, it seems that the GitHub migration broke the repository's integrity.

$ git clone https://github.com/ClusterLabs/fence-agents
$ cd fence-agents
$ git remote add fh git://git.fedorahosted.org/fence-agents.git
$ git fetch origin
$ git fetch fh
$ C=$(git merge-base origin/RHEL6 fh/RHEL6)
$ git log -n1 --decorate=full --simplify-by-decoration "${C}"
commit 5a518f3cb382c2711bd027916164c7ac45ab3e3e (tag: v3.1.5)
Author: Marek 'marx' Grac <[email protected]>
Date:   Fri Jul 8 09:54:42 2011 +0200

    fence_drac5: Incorrect output of 'list' operation on Drac 5

    Previously (due to other bug) fence agents operation 'list' was not
    working properly. After fixing that issue, drac5 returned empty line
    instead of N/A as an output for 'list'. This happends because fencing
    library does not have support for situations when for one fence agents
    there are fence devices that supports / not supports 'list' option.

    Resolves: rhbz#718194 (regression introduced by previous patch)

Thus, the newest commit these two variants of RHEL6 branch have in common
is almost 5 years ago (long before migration to GitHub, IIUIC) when
fence-agents-3.1.5 was released.

The two remotes would be expected to match on all branches up to the point
fedorahosted.org was abandoned; the current unhealthy state amounts to
"all hosting sites are equal (up to the point of abandonment) but some
are more equal".

fence_scsi action list ?

Hi,

Does fence_scsi implement the list action? When I tried to use it for multi-node fencing, it failed.

Thanks for your responses.

Regards.

fence_vmware_soap does not work on RHEL6.7.

Hi,

Python 2.6.6 on RHEL6.7 uses a different parameter name for logging.StreamHandler.

[root@rh67-01 rpm]# fence_vmware_soap -o list -a "xxx.xxx.xx.xx" -l "xxxx" -p "xxx" -z --ssl-insecure
Traceback (most recent call last):
  File "/usr/sbin/fence_vmware_soap", line 257, in <module>
    main()
  File "/usr/sbin/fence_vmware_soap", line 223, in main
    options_global = check_input(device_opt, process_input(device_opt))
  File "/usr/share/fence/fencing.py", line 633, in check_input
    logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stderr))
TypeError: __init__() got an unexpected keyword argument 'stream'

I think that the following correction is necessary.

fence/agents/lib/fencing.py
(snip)
632     ## add logging to stderr
633     if sys.version_info >= (2, 7, 0):
634         logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stderr))
635     else:
636         logging.getLogger().addHandler(logging.StreamHandler(strm=sys.stderr))
(snip)

Best Regards,
Hideo Yamauchi.

fence_vmware_rest: monitor issue with more than 1000 VMs

This relates to the known bug https://access.redhat.com/solutions/3718421,
caused by this code:

res = send_command(conn, "vcenter/vm")

Proposed solution:
It makes more sense to supply the filter parameters as described here
http://vmware.github.io/vsphere-automation-sdk-rest/6.7.1/operations/com/vmware/vcenter/vm.list-operation.html

The list of VM names needs to be converted into "vcenter/vm?filter.names.1=vm1&filter.names.2=vm2&...".
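A hedged sketch of building such a filtered request (the VM names are placeholders; real names would additionally need URL-encoding, e.g. via urllib.parse.quote):

```python
# Build a vcenter/vm request filtered by VM name, following the filter
# syntax from the vSphere Automation REST documentation linked above,
# so results are not truncated at the 1000-VM limit.
vm_names = ["vm1", "vm2", "vm3"]  # placeholder VM names

query = "&".join(
    "filter.names.%d=%s" % (i, name)
    for i, name in enumerate(vm_names, start=1)
)
command = "vcenter/vm?" + query
print(command)
# -> vcenter/vm?filter.names.1=vm1&filter.names.2=vm2&filter.names.3=vm3
```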

[Enhancement] The fence_scsi watchdog script can cause mount error.

Hi All,

We configured the cluster using fence_scsi in the pacemaker cluster.
As recommended by RHEL, it also uses the watchdog service and fence_scsi_check_hardreboot.

An error may occur when a partition managed by fence_scsi is mounted by the Filesystem RA during start.
We received the conclusion from RHEL that this is because the sg_persist command, executed by the script run from the watchdog service, makes the mount target invisible for a moment.

This situation can be easily reproduced by executing the following two scripts simultaneously.
(In the following example, the device name is /dev/msa_b.)

If you run the script at the same time, mount sometimes fails.

  1. Script 1:

#!/bin/sh
for i in {1..1000}; do
    echo "### ${i} ###"
    mount /dev/msa_b1 /dbfp/pgdata
    if [ $? -ne 0 ]; then
        break
    fi
    sleep 2
    umount /dev/msa_b1
    sleep 2
done

  2. Script 2:

#!/bin/sh
for i in {1..10000}; do
    echo "### ${i} ###"
    sg_persist -n -i -k -d /dev/msa_b
    sleep 1
done

Is it possible to change the sg_persist command executed from fence_scsi (scsi_check) to something that avoids this mount error?
Or is that difficult to change?

(I tried to confirm the same with fence_mpath (mpathpersist), but mpathpersist did not seem to have the problem.)

Best Regards,
Hideo Yamauchi.

How to run agent during development?

So I'm reading this:

https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md

It's all making sense; however, if I try to run my agent (or, say, the AWS agent) directly via Python, I get:

~/fence-agents# python agents/aws/fence_aws.py
Traceback (most recent call last):
  File "agents/aws/fence_aws.py", line 7, in <module>
    from fencing import *
ImportError: No module named fencing

I unblocked myself by adding another sys.path.append call pointing to my local fence-agents/lib directory. First I tried setting an environment variable FENCEAGENTSLIBDIR, but that didn't work (so I have no clue what sys.path.append("@FENCEAGENTSLIBDIR@") is doing; I rarely use Python).

It would be great if there were a brief overview of the agent development workflow, particularly how to get around the above issue and when a build is needed. I think I'm unblocked, but at the very least I'm opening this for the next person in my shoes.
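For reference, a hedged sketch of the workaround described above: in an unbuilt checkout, the "@FENCEAGENTSLIBDIR@" placeholder has not yet been substituted by the build system, so the fencing module cannot be found, and the checkout's lib/ directory can be prepended to the module search path manually (the checkout location is a placeholder):

```python
import os
import sys

# Placeholder path to a fence-agents source checkout; in an installed
# build, configure substitutes @FENCEAGENTSLIBDIR@ with the real library
# directory, which is why the unbuilt source cannot find "fencing".
repo_root = os.path.expanduser("~/fence-agents")
sys.path.insert(0, os.path.join(repo_root, "lib"))
print(sys.path[0])
# After this, "from fencing import *" would resolve in a real checkout.
```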

[RFE] fence-agents could use some porting attention (FreeBSD)

Already told e-ddie on IRC, but just for the record, fence-agents fails to build on FreeBSD.

./autogen.sh
./configure PYTHON=python3.6 MAKE=gmake

the first issue appears to be in configure itself:

15:43:08 checking for strtol... yes
15:43:08 find: -printf: unknown primary or operator
15:43:08 checking for python version... 3.6

the find implementation on FreeBSD does not support -printf.

There are several errors during the build, probably caused by generated bashisms. bash is not the default shell on many systems, and we should probably not assume it's even installed.

Question: fabrics and network fencing

(please, let me know if this isn't the right place to post questions, I am new to the community)
Hi,
The README.md defines power, network and configuration fencing devices, and I also see the option "fabric_fencing" in some parts of the code, but the defined actions are on, off, reboot, status, list, and monitor.
It therefore seems that only power fencing is supported; is this correct? Or maybe I am not understanding the right way to write a fabric or network fencing agent within the project? I would appreciate any pointers.

Thanks

./configure: line 3004: syntax error near unexpected token `2.2.6'

Should I run ./autogen.sh with any kind of argument?

koike@fence-debian9-1:~/fence-agents$ ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal -I make -I m4
configure.ac:70: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2601: _AC_COMPILE_IFELSE is expanded from...
../../lib/autoconf/general.m4:2617: AC_COMPILE_IFELSE is expanded from...
configure.ac:70: the top level
configure.ac:70: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2601: _AC_COMPILE_IFELSE is expanded from...
../../lib/autoconf/general.m4:2617: AC_COMPILE_IFELSE is expanded from...
configure.ac:70: the top level
autoreconf: configure.ac: tracing
configure.ac:70: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2601: _AC_COMPILE_IFELSE is expanded from...
../../lib/autoconf/general.m4:2617: AC_COMPILE_IFELSE is expanded from...
configure.ac:70: the top level
autoreconf: configure.ac: not using Libtool
autoreconf: running: /usr/bin/autoconf --include=make
configure.ac:70: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2601: _AC_COMPILE_IFELSE is expanded from...
../../lib/autoconf/general.m4:2617: AC_COMPILE_IFELSE is expanded from...
configure.ac:70: the top level
autoreconf: running: /usr/bin/autoheader --include=make
configure.ac:70: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2601: _AC_COMPILE_IFELSE is expanded from...
../../lib/autoconf/general.m4:2617: AC_COMPILE_IFELSE is expanded from...
configure.ac:70: the top level
autoreconf: running: automake --add-missing --copy --no-force
configure.ac:70: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2601: _AC_COMPILE_IFELSE is expanded from...
../../lib/autoconf/general.m4:2617: AC_COMPILE_IFELSE is expanded from...
configure.ac:70: the top level
configure.ac:54: installing './compile'
configure.ac:18: installing './config.guess'
configure.ac:18: installing './config.sub'
configure.ac:9: installing './install-sh'
configure.ac:9: installing './missing'
fence/agents/Makefile.am: installing './depcomp'
autoreconf: Leaving directory `.'
Now run ./configure and make
koike@fence-debian9-1:~/fence-agents$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
./configure: line 3004: syntax error near unexpected token `2.2.6'
./configure: line 3004: `LT_PREREQ(2.2.6)'

fence_kdump: monitor action does not work correctly.

The fence_kdump monitor action checks the local node only and does not check the target node.
(A commit log describes it as: "monitor action checks if LOCAL node can enter kdump".)

This makes no sense, because fence_kdump has to check the target node's configuration.
And it is difficult to check the target node without ssh or another remote shell command.

I have no idea how to resolve this issue. Does anyone have ideas?

TypeError: Client() takes at least 1 argument (0 given)

Hi!

I'm trying to use fence_openstack agent and it requires auth.

When I run command like this:

fence_openstack --uuid=xxx -l xxx -p xxx --auth-url=xxx

I get an error:

Traceback (most recent call last):
  File "/usr/sbin/fence_openstack", line 123, in <module>
    main()
  File "/usr/sbin/fence_openstack", line 119, in main
    result = fence_action(None, options, set_power_status, get_power_status,None)
  File "/usr/share/fence/fencing.py", line 884, in fence_action
    status = get_multi_power_fn(connection, options, get_power_fn)
  File "/usr/share/fence/fencing.py", line 752, in get_multi_power_fn
    plug_status = get_power_fn(connection, options)
  File "/usr/sbin/fence_openstack", line 21, in get_power_status
    output = nova_run_command(options, "status")
  File "/usr/sbin/fence_openstack", line 45, in nova_run_command
    novaclient=nova_login(username,password,projectname,auth_url,user_domain_name,project_domain_name)
  File "/usr/sbin/fence_openstack", line 35, in nova_login
    nova = novaclient.Client(session=session)
TypeError: Client() takes at least 1 argument (0 given)

The problem is here:

def nova_login(username, password, projectname, auth_url, user_domain_name, project_domain_name):
    auth = v3.Password(username=username, password=password, project_name=projectname, user_domain_name=user_domain_name, project_domain_name=project_domain_name, auth_url=auth_url)
    session = ksc_session.Session(auth=auth)
    keystone = ksclient.Client(session=session)
    nova = novaclient.Client(session=session)
    return nova

novaclient.Client() requires at least one positional argument, the API version, as seen here:

https://github.com/openstack/python-novaclient/blob/9b184080fcbd3f67ba3bf626328609259ead58e8/novaclient/client.py#L270-L291
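A small stand-in demonstrating the signature mismatch. The presumed fix would be to pass a version, e.g. nova = novaclient.Client("2", session=session), where "2" is an assumed API version, not something confirmed by the agent's code:

```python
# Stand-in mimicking python-novaclient's Client() signature, which takes a
# required positional API version. This is a local sketch, not the real
# novaclient module.
def Client(version, **kwargs):
    return {"version": version, "kwargs": kwargs}

# Mirrors the agent's current call, which omits the version:
try:
    Client(session=None)
except TypeError:
    print("TypeError, as in the traceback above")

# Supplying the version (assumed "2" here) makes the call succeed:
nova = Client("2", session=None)
print(nova["version"])
```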

fence_vmware_soap doesn't work with >= Python 2.7.9

It appears that on systems with Python version 2.7.9 and beyond the vmware_fence_soap script fails with the following:

root@server:~# fence_vmware_soap -a 172.16.30.232 -l vmware_fence_user -p secure_pass --ssl-insecure -z -o list
Unable to connect/login to fencing device

This is seen on Ubuntu 16.04 running Python 2.7.12 out of the box, whereas CentOS 7 running Python 2.7.5 doesn't have this issue.

Hacky changes can be made to fence_vmware_soap to fix it by adding the following near the top:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

This still yields InsecureRequestWarning alerts, which can be suppressed using:

import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

This appears to be related to HTTPS Hostname verification being added in Python 2.7.9 (https://docs.python.org/2/library/httplib.html#httplib.HTTPSConnection)

using fence_ipmilan from RHCS with pacemaker on SLES

Hi,
I'm trying to use fence_ipmilan from the RHCS in conjunction with pacemaker 1.1.12 on a SLES 11 SP4 node. I found an RPM for SLES 11, fence-agents-3.1.11-7.17. I also installed the necessary Python packages.
First, I'd like to know whether it is basically possible to use RHCS fence agents with pacemaker.
I found some postings claiming that it's possible:
http://lists.clusterlabs.org/pipermail/users/2016-February/002287.html
I followed the guide:

mkdir /usr/lib{,64}/stonith/plugins/rhcs

cd /usr/lib{,64}/stonith/plugins/rhcs

ln -s /usr/sbin/fence_pve

but with fence_ipmilan.
I can stonith the other node using /usr/sbin/fence_ipmilan:
fence_ipmilan -a 146.107.235.163 -p ********* -P -l root -o reboot -v

But using the fence agent with stonith does not succeed:

ha-idg-1:/usr/sbin # stonith -dt rhcs/ipmilan ipaddr=146.107.235.163 login=root passwd=*** -S
** (process:10896): DEBUG: NewPILPluginUniv(0x6062b0)
** (process:10896): DEBUG: PILS: Plugin path = /usr/lib64/stonith/plugins:/usr/lib64/heartbeat/plugins
** (process:10896): DEBUG: NewPILInterfaceUniv(0x606890)
** (process:10896): DEBUG: NewPILPlugintype(0x606350)
** (process:10896): DEBUG: NewPILPlugin(0x6071d0)
** (process:10896): DEBUG: NewPILInterface(0x607220)
** (process:10896): DEBUG: NewPILInterface(0x607220:InterfaceMgr/InterfaceMgr)*** user_data: 0x(nil) *******
** (process:10896): DEBUG: InterfaceManager_plugin_init(0x607220/InterfaceMgr)
** (process:10896): DEBUG: Registering Implementation manager for Interface type 'InterfaceMgr'
** (process:10896): DEBUG: PILS: Looking for InterfaceMgr/generic => [/usr/lib64/stonith/plugins/InterfaceMgr/generic.so]
** (process:10896): DEBUG: Plugin file /usr/lib64/stonith/plugins/InterfaceMgr/generic.so does not exist
** (process:10896): DEBUG: PILS: Looking for InterfaceMgr/generic => [/usr/lib64/heartbeat/plugins/InterfaceMgr/generic.so]
** (process:10896): DEBUG: Plugin path for InterfaceMgr/generic => [/usr/lib64/heartbeat/plugins/InterfaceMgr/generic.so]
** (process:10896): DEBUG: PluginType InterfaceMgr already present
** (process:10896): DEBUG: Plugin InterfaceMgr/generic init function: InterfaceMgr_LTX_generic_pil_plugin_init
** (process:10896): DEBUG: NewPILPlugin(0x606be0)
** (process:10896): DEBUG: Plugin InterfaceMgr/generic loaded and constructed.
** (process:10896): DEBUG: Calling init function in plugin InterfaceMgr/generic.
** (process:10896): DEBUG: NewPILInterface(0x607b90)
** (process:10896): DEBUG: NewPILInterface(0x607b90:InterfaceMgr/stonith2)*** user_data: 0x0x6072d0 *******
** (process:10896): DEBUG: Registering Implementation manager for Interface type 'stonith2'
** (process:10896): DEBUG: IfIncrRefCount(1 + 1 )
** (process:10896): DEBUG: PluginIncrRefCount(0 + 1 )
** (process:10896): DEBUG: IfIncrRefCount(1 + 100 )
** (process:10896): DEBUG: PILS: Looking for stonith2/rhcs => [/usr/lib64/stonith/plugins/stonith2/rhcs.so]
** (process:10896): DEBUG: Plugin path for stonith2/rhcs => [/usr/lib64/stonith/plugins/stonith2/rhcs.so]
** (process:10896): DEBUG: Creating PluginType for stonith2
** (process:10896): DEBUG: NewPILPlugintype(0x607ee0)
** (process:10896): DEBUG: Plugin stonith2/rhcs init function: stonith2_LTX_rhcs_pil_plugin_init
** (process:10896): DEBUG: NewPILPlugin(0x6089c0)
** (process:10896): DEBUG: Plugin stonith2/rhcs loaded and constructed.
** (process:10896): DEBUG: Calling init function in plugin stonith2/rhcs.
** (process:10896): DEBUG: NewPILInterface(0x608010)
** (process:10896): DEBUG: NewPILInterface(0x608010:stonith2/rhcs)*** user_data: 0x0x7fbc764172b0 *******
** (process:10896): DEBUG: IfIncrRefCount(101 + 1 )
** (process:10896): DEBUG: PluginIncrRefCount(0 + 1 )
debug: rhcs_set_config: called.
debug: rhcs_status: called.
debug: rhcs_run_cmd: Calling '/usr/lib64/stonith/plugins/rhcs/fence_ipmilan'
debug: set rhcs plugin param 'agent=ipmilan'
debug: set rhcs plugin param 'action=monitor'
debug: set rhcs plugin param 'login=root'
debug: set rhcs plugin param 'passwd=***'
debug: set rhcs plugin param 'ipaddr=146.107.235.163'
CRIT: rhcs_run_cmd: fence agent exit code: 1
CRIT: rhcs_status: 'ipmilan monitor' failed with rc -1
ERROR: rhcs/ipmilan device not accessible.
debug: rhcs_destroy: called.
** (process:10896): DEBUG: IfIncrRefCount(1 + -1 )
** (process:10896): DEBUG: RemoveAPILInterface(0x608010/rhcs)
** (process:10896): DEBUG: RmAPILInterface(0x608010/rhcs)
** (process:10896): DEBUG: PILunregister_interface(stonith2/rhcs)
** (process:10896): DEBUG: Calling InterfaceClose on stonith2/rhcs
** (process:10896): DEBUG: IfIncrRefCount(102 + -1 )
** (process:10896): DEBUG: PluginIncrRefCount(1 + -1 )
** (process:10896): DEBUG: RemoveAPILPlugin(stonith2/rhcs)
** (process:10896): DEBUG: RmAPILPlugin(stonith2/rhcs)
** (process:10896): DEBUG: Closing dlhandle for (stonith2/rhcs)
** (process:10896): DEBUG: RmAPILPluginType(stonith2)
** (process:10896): DEBUG: DelPILPluginType(stonith2)
** (process:10896): DEBUG: DelPILInterface(0x608010/rhcs)

I can configure the agent in pacemaker:
primitive prim_stonith_ipmilan_ha-idg-2 stonith:rhcs/ipmilan
params ipaddr=146.107.235.163 login=root passwd=***
params power_wait=4 lanplus action=off delay=20 timeout=120
op monitor interval=3600s timeout=120s
meta target-role=Started

But starting fails:
...
May 18 18:23:10 ha-idg-1 attrd[8395]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-prim_stonith_ipmilan_ha-idg-2 ()
May 18 18:23:10 ha-idg-1 attrd[8395]: notice: attrd_perform_update: Sent delete 6815: node=ha-idg-1, attr=fail-count-prim_stonith_ipmilan_ha-idg-2, id=<n/a>, set=(null), section=status
May 18 18:23:10 ha-idg-1 crmd[8397]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
May 18 18:23:11 ha-idg-1 pengine[8396]: notice: unpack_config: On loss of CCM Quorum: Ignore
May 18 18:23:11 ha-idg-1 pengine[8396]: warning: unpack_rsc_op_failure: Processing failed op migrate_to for prim_vm_mausdb on ha-idg-1: unknown error (1)
May 18 18:23:11 ha-idg-1 pengine[8396]: notice: LogActions: Start prim_stonith_ipmilan_ha-idg-2 (ha-idg-1)
May 18 18:23:11 ha-idg-1 pengine[8396]: notice: process_pe_message: Calculated Transition 39: /var/lib/pacemaker/pengine/pe-input-140.bz2
May 18 18:23:11 ha-idg-1 crmd[8397]: notice: do_te_invoke: Processing graph 39 (ref=pe_calc-dc-1495124591-264) derived from /var/lib/pacemaker/pengine/pe-input-140.bz2
May 18 18:23:11 ha-idg-1 crmd[8397]: notice: te_rsc_command: Initiating action 10: monitor prim_stonith_ipmilan_ha-idg-2_monitor_0 on ha-idg-1 (local)
May 18 18:23:11 ha-idg-1 crmd[8397]: notice: process_lrm_event: Operation prim_stonith_ipmilan_ha-idg-2_monitor_0: not running (node=ha-idg-1, call=1905, rc=7, cib-update=2029, confirmed=true)
May 18 18:23:11 ha-idg-1 crmd[8397]: notice: te_rsc_command: Initiating action 9: probe_complete probe_complete-ha-idg-1 on ha-idg-1 (local) - no waiting
May 18 18:23:11 ha-idg-1 crmd[8397]: notice: te_rsc_command: Initiating action 45: start prim_stonith_ipmilan_ha-idg-2_start_0 on ha-idg-1 (local)
May 18 18:23:11 ha-idg-1 stonith: rhcs_run_cmd: fence agent exit code: 1
May 18 18:23:11 ha-idg-1 stonith: rhcs_status: 'ipmilan monitor' failed with rc -1
May 18 18:23:11 ha-idg-1 stonith: rhcs/ipmilan device not accessible.
May 18 18:23:12 ha-idg-1 stonith: rhcs_run_cmd: fence agent exit code: 1
May 18 18:23:12 ha-idg-1 stonith: rhcs_status: 'ipmilan monitor' failed with rc -1
May 18 18:23:12 ha-idg-1 stonith: rhcs/ipmilan device not accessible.
May 18 18:23:12 ha-idg-1 stonith-ng[8393]: notice: log_operation: Operation 'monitor' [12102] for device 'prim_stonith_ipmilan_ha-idg-2' returned: -201 (Generic Pacemaker error)
May 18 18:23:12 ha-idg-1 stonith-ng[8393]: warning: log_operation: prim_stonith_ipmilan_ha-idg-2:12102 [ Performing: stonith -t rhcs/ipmilan -S ]
May 18 18:23:12 ha-idg-1 stonith-ng[8393]: warning: log_operation: prim_stonith_ipmilan_ha-idg-2:12102 [ failed: 255 ]
May 18 18:23:12 ha-idg-1 crmd[8397]: warning: stonith_plugin: rhcs plugins don't really support getinfo-devid
May 18 18:23:12 ha-idg-1 crmd[8397]: error: process_lrm_event: Operation prim_stonith_ipmilan_ha-idg-2_start_0 (node=ha-idg-1, call=1906, status=4, cib-update=2030, confirmed=true) Error
May 18 18:23:12 ha-idg-1 crmd[8397]: warning: status_from_rc: Action 45 (prim_stonith_ipmilan_ha-idg-2_start_0) on ha-idg-1 failed (target: 0 vs. rc: 1): Error
May 18 18:23:12 ha-idg-1 crmd[8397]: warning: update_failcount: Updating failcount for prim_stonith_ipmilan_ha-idg-2 on ha-idg-1 after failed start: rc=1 (update=INFINITY, time=1495124592)
May 18 18:23:12 ha-idg-1 crmd[8397]: notice: abort_transition_graph: Transition aborted by prim_stonith_ipmilan_ha-idg-2_start_0 'modify' on ha-idg-1: Event failed (magic=4:1;45:39:0:be494227-3368-4ea
...

Is it a fundamental problem, or did I just make an error?

Thanks.

Bernd

"fence_pve" cannot import name run_delay

Hi,

I'm trying to test stonith on Proxmox with the "fence_pve" agent; unfortunately, when I run it I see

ImportError: cannot import name run_delay

I've checked /usr/share/fence/fencing.py and can't find it there either.

If I remove "run_delay" from the imported modules in "fence_pve", I run into the next issue:

fence_pve --action=status --ip=192.168.122.2 --username=root@pam --password=secret --plug=30000
Parse error: option --action not recognized
Please use '-h' for usage

Don't know if it's related; did I miss something?

OS: Debian Wheezy amd64

Remove obsolete/deprecated things

For the next major version we can remove obsolete/deprecated things. This issue should track the pieces we should look at and remove:

  • Support for --plug [switch]:[plug] notation that was used before (e.g. in fence_apc)
  • Support for enable/disable instead of standard on/off (in fencing.py)

Currently, there is not enough such code to remove :)

A change of the path is necessary.

In the next version of Docker, the API path name is different.

  • Docker 1.7.1

In Docker 1.7.1, it becomes v1.12.

(snip)
def send_cmd(options, cmd, post=False):
    url = "http%s://%s:%s/v1.11/%s" % ("s" if "--ssl" in options else "", options["--ip"], options["--ipport"], cmd)
(snip)

The path in fence_docker seems to need to become a parameter in order to support new versions of Docker.

Best Regards,
Hideo Yamauch.
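A hedged sketch of how the API version could become a parameter. The "--api-version" option name is hypothetical (fence_docker does not currently define it), and the IP, port, and command path below are placeholders:

```python
# Sketch only: "--api-version" is a hypothetical new option; the current
# code hard-codes "v1.11" in the URL, which is what we fall back to here.
options = {"--ip": "192.168.0.10", "--ipport": "2376", "--ssl": ""}
cmd = "containers/json"  # placeholder Docker Remote API command path

api_version = options.get("--api-version", "v1.11")
url = "http%s://%s:%s/%s/%s" % (
    "s" if "--ssl" in options else "",
    options["--ip"],
    options["--ipport"],
    api_version,
    cmd,
)
print(url)
# -> https://192.168.0.10:2376/v1.11/containers/json
```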

fence_openstack - bad options "username" and "password"

According to pcs, fence_openstack requires options that are not allowed:

[cloudadm@my-host ~]$ sudo pcs stonith create cloud-stonith-my-host fence_openstack \
    username=xxx \
    password=xxx \
    auth-url=http://xxx:5000/v3 \
    project-name="xxx" \
    user-domain-name=Default \
    uuid=8473a546-22c8-4cb6-8776-ba1628ff594f
Error: invalid stonith options: 'password', 'username', allowed options are: \
   action, auth-url, debug, delay, login, login_timeout, passwd, passwd_script, pcmk_action_limit,
   pcmk_delay_base, pcmk_delay_max, pcmk_host_argument, pcmk_host_check, pcmk_host_list,
   pcmk_host_map, pcmk_list_action, pcmk_list_retries, pcmk_list_timeout, pcmk_monitor_action, 
   pcmk_monitor_retries, pcmk_monitor_timeout, pcmk_off_action, pcmk_off_retries, pcmk_off_timeout, 
   pcmk_on_action, pcmk_on_retries, pcmk_on_timeout, pcmk_reboot_action, pcmk_reboot_retries,
   pcmk_reboot_timeout, pcmk_status_action, pcmk_status_retries, pcmk_status_timeout, power_timeout,
   power_wait, priority, project-domain-name, project-name, retry_on, shell_timeout, user-domain-name,
   uuid, verbose, use --force to override

The fencing agent should use the already existing options such as "login", "passwd" and "passwd_script". The agent's current options are:

[cloudadm@my-host ~]$ /usr/sbin/fence_openstack --help
Usage:
        fence_openstack [options]
Options:
   -l, --username=[name]          Login name
   -p, --password=[password]      Login password or passphrase
   --auth-url=[authurl]            Keystone Auth URL
   --project-name=[project]      Tenant Or Project Name
   --user-domain-name=[user-domain]      Keystone User Domain Name
   --project-domain-name=[project-domain]      Keystone Project Domain Name
   --uuid=[uuid]      UUID of the nova instance
   -S, --password-script=[script] Script to run to retrieve password
   -o, --action=[action]          Action: status, reboot (default), off or on
   -v, --verbose                  Verbose mode
   -D, --debug-file=[debugfile]   Debugging to output file
   -V, --version                  Output version information and exit
   -h, --help                     Display this help and exit
   --power-timeout=[seconds]      Test X seconds for status change after ON/OFF
   --shell-timeout=[seconds]      Wait X seconds for cmd prompt after issuing command
   --login-timeout=[seconds]      Wait X seconds for cmd prompt after login
   --power-wait=[seconds]         Wait X seconds after issuing ON/OFF
   --delay=[seconds]              Wait X seconds before fencing is started
   --retry-on=[attempts]          Count of attempts to retry power on

Redfish based fence agent

Hi,
are there any plans to do a redfish based fence agent? This could help to consolidate a lot of the new remote management cards.

Greetings
Klaas

HAVE_PYMOD_OPENSTACKCLIENT variable is not defined in configure.ac

Hi!

I'm trying to build the project from source and I want to build the fence_openstack agent, but I can't, because the HAVE_PYMOD_OPENSTACKCLIENT variable is never defined and I always get the warning message "Not building fence_openstack".
The problem is here:

fence-agents/configure.ac

Lines 245 to 252 in 3ae5b58

if echo "$AGENTS_LIST" | grep -q openstack; then
AC_PYTHON_MODULE(novaclient)
AC_PYTHON_MODULE(keystoneclient)
if test "x${HAVE_PYMOD_OPENSTACKCLIENT}" != xyes; then
AGENTS_LIST=$(echo "$AGENTS_LIST" | sed -E "s#openstack/fence_openstack.py( |$)##")
AC_MSG_WARN("Not building fence_openstack")
fi
fi

Maybe this variable should be defined somewhere else, but in my case I needed to add the AC_PYTHON_MODULE(openstackclient) check here:

if echo "$AGENTS_LIST" | grep -q openstack; then
        AC_PYTHON_MODULE(novaclient)
        AC_PYTHON_MODULE(keystoneclient)
        AC_PYTHON_MODULE(openstackclient)
        if test "x${HAVE_PYMOD_OPENSTACKCLIENT}" != xyes; then
                AGENTS_LIST=$(echo "$AGENTS_LIST" | sed -E "s#openstack/fence_openstack.py( |$)##")
                AC_MSG_WARN("Not building fence_openstack")
        fi
fi

fence_rhevm is working only for admin

The fence_rhevm script sends start/stop REST API calls (vms//).
One of the parameters that fence_rhevm takes is the plug parameter, which is actually the VM name as it is known by RHEV.
In order to construct a REST API call, fence_rhevm issues a search call via the API to find the VM UUID given its name.
But if the user is not an admin, this call returns an empty result, since search can be used only by an admin user.
So what we get here is a situation in which a user that owns the VM cannot access it via the fence_rhevm script.

A possible solution to this problem may be getting rid of the search request, instead sending a GET REST API call to fetch all of the user's VMs and then analyzing the result to get the requested VM UUID.
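The proposed workaround could be sketched like this; the list-of-dicts shape is an assumption, as the real agent would have to parse the XML returned by a GET on the vms collection:

```python
def find_vm_uuid(vms, name):
    # Sketch of the proposed fix: list all VMs visible to the current
    # user (no admin-only search call) and match the requested plug/VM
    # name client-side. `vms` is assumed to already be parsed into a
    # list of {"name": ..., "id": ...} entries.
    for vm in vms:
        if vm.get("name") == name:
            return vm.get("id")
    return None
```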

fence_vbox line 34 log_expect() missing "options" for first argument.

amy and pat are two of my five VirtualBox demo VMs.
yttrium is my workstation, hosting them.

Issue this command:
[root@amy ~]# ./fence_vbox --plug=pat --action=status \
> --ip=yttrium --ssh --username=mdiehn \
> --identity-file=/root/.ssh/id_rsa-to-mdiehn_yttrium

Expected result:
Status: ON

Actual result:
Traceback (most recent call last):
  File "./fence_vbox", line 113, in <module>
    main()
  File "./fence_vbox", line 108, in main
    result = fence_action(conn, options, set_power_status, get_power_status, get_outlets_status)
  File "/usr/share/fence/fencing.py", line 964, in fence_action
    status = get_multi_power_fn(tn, options, get_power_fn)
  File "/usr/share/fence/fencing.py", line 871, in get_multi_power_fn
    plug_status = get_power_fn(tn, options)
  File "./fence_vbox", line 59, in get_power_status
    _invoke(conn, options, "list", "runningvms")
  File "./fence_vbox", line 34, in _invoke
    conn.log_expect(options["--command-prompt"], int(options["--shell-timeout"]))
TypeError: log_expect() takes exactly 4 arguments (3 given)

Cause:

Line 34 of fence_vbox is missing the first argument, which I infer should be "options". It's an easy mistake to make, seeing that the next two arguments are "options[blah, blah]" and "options[nag, nag]".

It is like this:
conn.log_expect(options["--command-prompt"], int(options["--shell-timeout"]))

It should be like this:
conn.log_expect(options, options["--command-prompt"], int(options["--shell-timeout"]))

fence_pve fails with stacktrace

Ubuntu 14.04

root@dev-nfs-archive-1001:/usr/sbin# fence_pve --ip=<ADDRESS> --username=root@pam --password=<PASSWORD> --action=status --plug=187

Traceback (most recent call last):
  File "/usr/sbin/fence_pve", line 184, in <module>
    main()
  File "/usr/sbin/fence_pve", line 175, in main
    options["auth"] = get_ticket(options)
  File "/usr/sbin/fence_pve", line 82, in get_ticket
    result = send_cmd(options, "access/ticket", post=post)
  File "/usr/sbin/fence_pve", line 110, in send_cmd
    if opt.has_key("--ssl") or opt.has_key("--ssl-secure"):
NameError: global name 'opt' is not defined
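The traceback points at a simple typo: send_cmd() references an undefined name `opt` where it should use its `options` argument (and dict.has_key() is Python-2-only anyway; the `in` operator works on both). A minimal sketch of the corrected condition:

```python
def uses_ssl(options):
    # Corrected check from fence_pve's send_cmd(): use the `options`
    # dict (the original referenced an undefined name `opt`), and use
    # `in` instead of the Python-2-only dict.has_key().
    return "--ssl" in options or "--ssl-secure" in options
```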

fence_apc_snmp.py error on python3

Hello, when I use fence_apc_snmp.py on Fedora 26 with Python 3, I get this error:

    Traceback (most recent call last):
      File "./fence_apc_snmp.py", line 224, in <module>
        main()
      File "./fence_apc_snmp.py", line 220, in main
        result = fence_action(FencingSnmp(options), options, set_power_status, get_power_status, get_outlets_status)
      File "/usr/share/fence/fencing.py", line 780, in fence_action
        outlets = get_outlet_list(connection, options)
      File "./fence_apc_snmp.py", line 172, in get_outlets_status
        apc_set_device(conn)
      File "./fence_apc_snmp.py", line 114, in apc_set_device
        apc_type = conn.walk(OID_SYS_OBJECT_ID)
      File "/usr/share/fence/fencing_snmp.py", line 132, in walk
        output = self.run_command(cmd, additional_timemout).splitlines()
      File "/usr/share/fence/fencing_snmp.py", line 100, in run_command
        if (res_code != 0) or (re.search("^Error ", res_output, re.MULTILINE) != None):
      File "/usr/lib64/python3.6/re.py", line 182, in search
        return _compile(pattern, flags).search(string)
    TypeError: cannot use a string pattern on a bytes-like object

I also tried the latest version from this repo, with the same error.
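The failure is a Python 3 bytes-vs-str mismatch: the subprocess output reaches the regex as bytes. A minimal sketch of a fix, assuming UTF-8 output (check_snmp_output is a hypothetical helper mirroring the check in fencing_snmp.run_command()):

```python
import re

def check_snmp_output(res_output):
    # On Python 3, subprocess pipes yield bytes; decode before applying
    # a str regex, otherwise re.search() raises the TypeError seen in
    # the report above.
    if isinstance(res_output, bytes):
        res_output = res_output.decode("utf-8", errors="replace")
    return re.search("^Error ", res_output, re.MULTILINE) is not None
```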

Fence vmware Soap Problem

Environment : 2 VM linux CentOS 7.5.1804
pcs-0.9.162-5.el7.centos.1.x86_64
fence-agents-common-4.0.11-86.el7_5.3.x86_64
fence-agents-vmware-soap-4.0.11-86.el7_5.3.x86_64

pcs status logs this error:

WARNING: following stonith devices have the 'action' option set, it is recommended to set 'pcmk_off_action'

And the log shows this error:

Oct 9 12:54:08 XXX stonith-ng[2302]: warning: fence_vmware_soap[1563] stderr: [ Failed: Unable to obtain correct plug status or plug is not available ]

[Feature] fence_openstack - use an openrc file

Currently, fence_openstack requires some options to run correctly, for example --auth-url or --project-domain-name. However, these parameters are usually written into an openrc file that is used to export OS_* variables.

We should be able to use an option called --openrc-file to get the pieces of information we need from those environment variables.
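A minimal sketch of what a --openrc-file option could do, assuming the usual `export OS_*=value` line format (load_openrc is hypothetical; this option does not exist yet):

```python
import shlex

def load_openrc(path):
    # Parse `export OS_*=value` lines from an openrc file (sketch;
    # --openrc-file is a proposed option, not something the agent has
    # today). Comments and non-OS_* lines are ignored.
    env = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("export "):
                line = line[len("export "):]
            if line.startswith("OS_") and "=" in line:
                key, _, value = line.partition("=")
                # shlex handles quoted values like OS_PROJECT_NAME="my project"
                env[key] = shlex.split(value)[0] if value else ""
    return env
```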

using a maintained soap library instead of deprecated python-suds

From Debian bug #788079: (http://bugs.debian.org/788079)

Date: Mon, 08 Jun 2015 14:03:07 +0200
From: Mathias Behrle <[email protected]>
Reply-To: Mathias Behrle <[email protected]>, [email protected]
To: [email protected]
Subject: [Debian-ha-maintainers] Bug#788079: fence-agents: Please use a maintained
        soap library instead of deprecated python-suds.

Package: fence-agents
Severity: important
User: [email protected]
Usertags: migrate-suds

Dear maintainer of fence-agents,

your package is listed as a Reverse Depend of the python-suds
package, which is now deprecated due to long time missing upstream
maintenance as well as missing compatibility for Python3 (#783029,
#774948, #782970). It is planned to remove python-suds before the
release of stretch.

Please consider to migrate your package to use a maintained soap
library (like pysimplesoap, at the time of writing in NEW).

Thanks for your work!

Best, Mathias

[Question] About multiple device specification of fence_scsi.

Hi All,

We are checking control using fence_scsi.

With fence_scsi, multiple devices can be specified as parameters.

# pcs stonith create prmStonithScsi fence_scsi pcmk_host_list="rh80-01 rh80-02" pcmk_reboot_action="off" devices="/dev/sdb,/dev/sdc,/dev/sdd" meta provides="unfencing" --force

However, when multiple devices are actually specified as parameters, an inconsistency of reservation keys across the devices may occur after a split brain.

The key of each node is as follows
 nodeA - 0x5e2a0000
 nodeB - 0x5e2a0001

After a split brain occurs, each device is reserved as follows.

[root@rh80-01 ~]# sg_persist -i -n -k -d /dev/sdb
  PR generation=0x6a1, 4 registered reservation keys follow:
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
[root@rh80-01 ~]# sg_persist -i -n -k -d /dev/sdc
  PR generation=0x54, 4 registered reservation keys follow:
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
    0x5e2a0001
[root@rh80-01 ~]# sg_persist -i -n -k -d /dev/sdd
  PR generation=0x54, 4 registered reservation keys follow:
    0x5e2a0000
    0x5e2a0000
    0x5e2a0000
    0x5e2a0000

As a result, detection and restart by the watchdog service fail (fence_scsi_check_hardreboot), causing a service outage.

When multiple devices are specified, what cluster configuration is assumed?
In order to avoid this problem, is it necessary to set the delay parameter of fence_scsi to ensure that one of the nodes takes all of the reservations?

Best Regards,
Hideo Yamauchi.

Please tell me how to use fence_mpath

Hi,
I would like to verify the fencing function using a shared disk.
We assume 2 nodes.

When setting up fence_mpath in the cluster, what kind of configuration should be done?
Please show me example pcs commands.

[Enhancement] fence_scsi - Correspondence to new node name specification method.

Hi All,

The following setup command line is available from RHEL8.

 pcs cluster setup my_cluster rh80-01 addr=192.168.106.183 addr=192.168.107.183  rh80-2 addr=192.168.106.184 addr=192.168.107.184

In this case, the corosync.conf nodelist is generated as follows.

(snip)
nodelist {
    node {
        ring0_addr: 192.168.106.183
        ring1_addr: 192.168.107.183
        name: rh80-01
        nodeid: 1
    }

    node {
        ring0_addr: 192.168.106.184
        ring1_addr: 192.168.107.184
        name: rh80-02
        nodeid: 2
    }
}
(snip)

At this time, the nodelist returned by corosync-cmapctl is as follows.

[root@rh80-01 ~]# corosync-cmapctl nodelist
nodelist.local_node_pos (u32) = 0
nodelist.node.0.name (str) = rh80-01
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.ring0_addr (str) = 192.168.106.183
nodelist.node.0.ring1_addr (str) = 192.168.107.183
nodelist.node.1.name (str) = rh80-02
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.ring0_addr (str) = 192.168.106.184
nodelist.node.1.ring1_addr (str) = 192.168.107.184

If you set up the fence_scsi resource with the following command, get_node_id() will fail.

pcs stonith create prmStonithScsi fence_scsi pcmk_host_list="rh80-01 rh80-02" pcmk_reboot_action="off" devices="/dev/sdb" meta provides="unfencing" --force

This is because fence_scsi does not handle node names specified this way at cluster setup time with the new pcs.

def get_node_id(options):
        cmd = options["--corosync-cmap-path"] + " nodelist"

        match = re.search(r".(\d+).ring._addr \(str\) = " + options["--plug"] + "\n", run_cmd(options, cmd)["out"])
        return match.group(1) if match else fail_usage("Failed: unable to parse output of corosync-cmapctl or node does not exist")

If matching against nodelist.node.X.ringX_addr fails, it is necessary to fall back to matching against nodelist.node.X.name:

re.search(r".(\d+).name \(str\) = " + options["--plug"] + "\n", run_cmd(options, cmd)["out"])

This needs to be fixed: the setup of fence_scsi fails when using the new pcs cluster specification in a RHEL8 environment.
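The proposed fallback could be sketched like this, using the corosync-cmapctl output format shown above (the function signature is simplified for illustration; the real get_node_id() runs the command via run_cmd()):

```python
import re

def get_node_id(plug, cmap_output):
    # Try the existing ringX_addr match first, then fall back to
    # matching nodelist.node.X.name for clusters set up with the new
    # pcs addr= syntax, where --plug is the node name, not an address.
    match = re.search(r"\.(\d+)\.ring\d+_addr \(str\) = " + re.escape(plug) + "\n",
                      cmap_output)
    if match is None:
        match = re.search(r"\.(\d+)\.name \(str\) = " + re.escape(plug) + "\n",
                          cmap_output)
    return match.group(1) if match else None
```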

Best Regards,
Hideo Yamauchi.
