gluster / storhaug Goto Github PK

View Code? Open in Web Editor NEW

12.0 9.0 9.0 41 KB

High Availability (HA) setup utility for NFS-Ganesha

License: GNU General Public License v2.0

Shell 90.31% Roff 9.69%

storhaug's People

Contributors

Stargazers

Watchers

Forkers

jfarcher amritasyaputra ryno83 epol thotz drgr33n chenhui0228 runejuhl akarasulu

storhaug's Issues

need to send SM_NOTIFY to client to recover locks after failover

[13:37:56] kkeithley, looks like there may not be anything in HA scripts to send SM_NOTIFY?
[13:38:13] what is SM_NOTIFY?
[13:40:14] it is NSM rpc call that must be made to NLM clients after failover to trigger them to reclaim locks
[13:42:31] and what would sending an SM_NOTIFY look like? A dbus msg to kick the ganesha.nfsd to do it?
[13:42:41] I'm guessing not.
[13:46:58] kkeithley, there's src/Protocols/NLM/sm_notify.c which generates sm_notify.ganesha

storhaug setup is throwing ERROR: num_servers != num_addrs

[01:08:56] hello
[01:09:15] storhaug setup is throwing
[01:09:17] an error

[root@gluster-1683 ~]# storhaug setup
Setting up
ERROR: num_servers != num_addrs
[root@gluster-1683 ~]# cat /etc/ctdb/nodes
192.168.0.1
192.168.0.2
192.168.0.3
[root@gluster-1683 ~]# cat /etc/ctdb/public_addresses
192.168.0.254 ens224
192.168.0.253 ens224
192.168.0.252 ens224

Dependencies not install on Debian

The dependencies e.g. ctdb, nfs-ganesha-gluster, glusterfs are not automatically install on Debian.

Storhaug unable to send dbus commands to ganesha when using default service file

Error given is below. Ganesha's default service file is registering via commend in a service file, and not through a conf file.

Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files
WARNING: Command failed on 192.168.1.123: dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/run/gluster/shared_storage/nfs-ganesha/exports/export.gv0.conf string:EXPORT(Path=/gv0)

Is root the safest option to run storhaug?

Hey, this is more of a question rather than a bug report. I'm not a fan of running anything as root but the storhaug configuration takes it to a whole new level and require you to give root ssh access. Wouldn't it be better to use a dedicated user? I've setup my test lab with a user and it seems to work fine. Would this be something you guys are interested in as I don't mind forking and submitting a pull request if that would help out?

Enhance setup options to setup standealone server, pnfs cluster etc

The setup provide option to set up ganesha cluster using ctdb. Provide options in setup command which enables use cases "running ganesha without HA", "setup pnfs cluster" etc.

Suggested format : storhaug setup [HA|without HA | pNFS] .

Or above scenarios can be handled as different command altogether

adding option for subdir export

Currently volume can be export via following command storhuag export

Enhance current option take subdir as option as well.

So option will look like
storhaug export [subdir path]
It should be optional, by default it will take root as paremeter.

For the ease of management the it is better to store subdir configuration as different file than in same export conf file. So the way of storing export conf also need to be changed.

Following is one of the suggestion exports of each volume should belong to new directory, say if we have volume vol we need to export three subdirs subdir1, subdir2,subdir3 and "/" as well. The directory hierarchy will be
/exports/vol
|---- root.conf
|---- export.subdir1.conf
|-----export.subdir2.conf
|-----export.subdir3.conf

Debian targets use "smbd" and "nmbd" instead of "smb" and "nmb" for service names

When attempting to use storhaug on Debian there are potential problems with the naming of the services because Debian targets use "smbd" and "nmbd" instead of "smb" and "nmb"

needs a man page

prevent multiple definitions of same export

Logrotate for ganesha-gfapi.log

@kalebskeithley Do you think storhaug should create the logrotate config for ganesha-gfapi.log when invoking storhaug setup ? And remove it when teardown ?

CTDB MoveIP

(Be Kind - Totally new to this whole thing)

My understanding - lets say I want to take a node down, and the VIP resides on that node.. I would do a moveip to another node for a better transition.

Except I always get the following:

[root@gluster01 data]# ctdb ip
Public IPs on node 0
192.168.2.100 1
[root@gluster01 data]# ctdb moveip 192.168.2.100 0
Control TAKEOVER_IP failed, ret=1
Failed to takeover IP on node 0
[root@gluster01 data]# ctdb ip
Public IPs on node 0
192.168.2.100 0
[root@gluster01 data]#

Even with the failed error - it looks like the command moved the IP over.

please update the wiki for the following /etc/ctdb/ctdb.conf content

CTDB_RECOVERY_LOCK=/run/gluster/shared_storage/.ctdb/reclock

List of nodes in the cluster. Default is below.

CTDB_NODES=/etc/ctdb/nodes

List of public addresses for providing NAS services. No default.

CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses

What services should CTDB manage? Default is none.

CTDB_MANAGES_SAMBA=yes

CTDB_MANAGES_WINBIND=yes

CTDB_MANAGES_NFS=yes

CTDB_NFS_CALLOUT=/etc/ctdb/nfs-ganesha-callout
CTDB_NFS_STATE_FS_TYPE=glusterfs
CTDB_NFS_STATE_MNT=/run/gluster/shared_storage
CTDB_NFS_SKIP_SHARE_CHECK=yes
NFS_HOSTNAME=localhost

MY_IPs uses hostname -I instead of hostname -i

as the title says, pulling the list of IP addresses uses -i instead of -I. the command options under inetutils also supports -I has been removed.

Project missing README file

APT sources not found

Hi,
At this place : https://download.gluster.org/pub/gluster/storhaug/1.0/Debian/ the source path is wrong. It should be

https://download.gluster.org/pub/gluster/storhaug/1.0/Debian/stretch/amd64/apt (without LATEST)

[question] Expected downtime during failover

Is there any expected downtime during failover?

If a client is mounting NFS from a public IP address and the machine hosting that address suddenly becomes unavailable then that IP is moved to another available server, after that the new clients are able to mount NFS from the same IP, but old clients (that already mounted the NFS) resume working only after some kind of timeout. In our experience that timeout is about 30 seconds for NFSv3 and about 90 seconds for NFSv4.

Are this values expected or there may be any problem in our configuration? Is there any way to lower this timeouts?

different IP?

we have 3 nodes glusterfs and each node has 2 IP 172.16.16.x (management) and 192.168.0.x(CTDB,NFS,GLUSTERFS)

When I used the storhaug setup, it would automatically create entry based on my management IPs

[root@gluster-1683 nfs-ganesha]# pwd
/run/gluster/shared_storage/nfs-ganesha

[root@gluster-1683 nfs-ganesha]# ls -al
total 29
drwxr-xr-x 7 root root 4096 Aug 5 15:53 .
drwxr-xr-x 4 root root 4096 Aug 5 15:39 ..
drwxr-xr-x 4 root root 4096 Aug 5 15:39 172.16.16.130
drwxr-xr-x 4 root root 4096 Aug 5 15:44 172.16.16.131
drwxr-xr-x 4 root root 4096 Aug 5 15:44 172.16.16.132
drwxr-xr-x 2 root root 4096 Aug 5 15:47 exports
-rw-r--r-- 1 root root 75 Aug 5 15:52 ganesha.conf
drwxr-xr-x 2 root root 4096 Aug 5 15:59 .noderefs
[root@gluster-1683 nfs-ganesha]#

[root@gluster-1683 nfs-ganesha]# cd 172.16.16.130/
[root@gluster-1683 172.16.16.130]# ls -al
total 16
drwxr-xr-x 4 root root 4096 Aug 5 15:39 .
drwxr-xr-x 7 root root 4096 Aug 5 15:53 ..
drwxr-xr-x 4 root root 4096 Aug 5 15:46 ganesha
drwxr-xr-x 4 root root 4096 Aug 5 15:46 statd
-rw-r--r-- 1 root root 0 Aug 5 15:47 state

[root@gluster-1683 172.16.16.130]# cd ganesha/
[root@gluster-1683 ganesha]# ls -al
total 16
drwxr-xr-x 4 root root 4096 Aug 5 15:46 .
drwxr-xr-x 4 root root 4096 Aug 5 15:39 ..
lrwxrwxrwx 1 root root 71 Aug 5 15:46 172.16.16.131 -> /run/gluster/shared_storage/nfs-ganesha/.noderefs/172.16.16.131/ganesha
lrwxrwxrwx 1 root root 71 Aug 5 15:46 172.16.16.132 -> /run/gluster/shared_storage/nfs-ganesha/.noderefs/172.16.16.132/ganesha
lrwxrwxrwx 1 root root 71 Aug 5 15:46 192.168.0.252 -> /run/gluster/shared_storage/nfs-ganesha/.noderefs/192.168.0.252/ganesha
lrwxrwxrwx 1 root root 71 Aug 5 15:46 192.168.0.253 -> /run/gluster/shared_storage/nfs-ganesha/.noderefs/192.168.0.253/ganesha
lrwxrwxrwx 1 root root 71 Aug 5 15:46 192.168.0.254 -> /run/gluster/shared_storage/nfs-ganesha/.noderefs/192.168.0.254/ganesha
drwxr-xr-x 3 root root 4096 Aug 5 15:39 v4old
drwxr-xr-x 3 root root 4096 Aug 5 15:39 v4recov

anyone know why it created based on 172.16.16.x IPs? how do i remove it?

discussion about switch from pcs to ctdb

What was the reason to switch from pcs to ctdb? I guess when ctdb manages vips then pcs cannot be used productively.

error message not descriptive

After just running storhaug setup as per the wiki, i was given the followin output:
Setting up
ERROR: num_servers != num_addrs

I can take a guess as to what it means, but not sure which area it is referring to.

storhaug support for multiple IPs/interfaces, choose which one to use

version: storhaug-nfs-1.0-1.el7.noarch

Currently storhaug ctdb nfs-ganesha-callout script hardcodes the following:

eval hostaddrs=( $(hostname -I) ) hostaddr="${hostaddrs[0]}"

This only works if there's only one IP/interface on the server. When there are multiple IPs/interfaces, one might want to use other than the first IP for ctdb/clustering/nfs-ganesha.

ctdb itself properly supports multiple IPs/interfaces, it's only storhaug nfs-ganesha-callout script which doesn't.

Maybe make the [0] index configurable with a parameter, so the user can specify which IP address to use?

CTDB_PUBLIC_ADDRESSES in ctdbd.conf

I had to add the following line to make VIPs working in ctdbd.conf
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses

Would you like to update the Wiki ?

clean shutdown of ganesha can lead to reclaim race windows

Suppose we have two parallel ganesha heads. One ganesha is then shut down, by initiating a shutdown via dbus or by sending it a SIGTERM. Eventually, ctdb notices this and migrates the IP addr and puts the other nodes into grace. This is racy.

In either case, ganesha will tear down all of the state held during shutdown. That includes any file locks held. By the time that we notice that a ganesha has gone down, any state held by that ganesha will have already been released. This opens a window where clients of other nodes in the cluster can race in and grab that state.

Question: why the requirement num_servers == num_addrs?

When running storhaug setup it's checked that the number of server in the pool is equal to the number of public addresses available.

Why this limitation? What would be the problems of having less (for example just 1) or more IPs than server?

I can think of this use cases:

just 1 public IP: just the one used by clients to connect to the cluster without any form of round robin
in a 3 nodes cluster 6 IPs: in this way ctdb assures that the IPs are equally split in case of failure (3 nodes -> 2 IPs each, 2 nodes -> 3 IPs each, 1 node -> 6 IPs on it)
scaling the number of nodes without changing the number of public IPs

ctdb service can't start

ctdb: ctdb-4.7.1-9.el7_5.x86_64
os: CentOS Linux release 7.5.1804 (Core)
kernel : Linux nfs-ganesha-01 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
node: three nodes

I have with the cephfs + ctdb + nfs-ganesha to appear to HA nfs cluster , but when i configure ctdbd.conf ,and disabled selinux , then i start ctdb service ,

but the ctdb service can't start
err logs:

2018/10/17 12:41:38.789592 ctdbd[53761]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
2018/10/17 22:15:25.018211 ctdbd[257441]: Starting CTDBD (Version 4.7.1) as PID: 257441
2018/10/17 22:15:25.018396 ctdbd[257441]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)

and , i have try echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us

then start ctdb , and appear the same problem.

i hope get the reason and solved method.

look forward to your reply .

thanks

/etc/ctdb/nfs-ganesha-callout wrong path for storhaug

installing storhaug via yum the path is /usr/sbin/storhaug

but in the /etc/ctdb/nfs-ganesha-callout script it is pointing to /usr/bin/storhaug