Hello,
Looking at the code, I believe you mean: https://github.com/dalibo/PAF/blob/v1.0.0/script/pgsqlms#L940
Given what the OCF developer guide[1] says about the stop action, you are right.
But we have to study this case in detail. At the very least, if we do not catch this failing status during the stop process, it MUST be caught somewhere else at some point. Depending on your Pacemaker setup, the CRM might try to recover your failing master (a demote->stop->start->promote transition), or this failing instance might be started back as a slave later. This can lead to unpredictable situations.
If I remember correctly, we decided to raise a failure in order to set this question aside for this first version, and to make sure a resource in an inconsistent state is kicked out of the cluster until a human fixes it. Maybe we are over-protective?
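To make the trade-off concrete, here is a minimal Python sketch of the two stop-action policies discussed above. It is hypothetical and not PAF's code (pgsqlms is written in Perl): the `stop_action` function, its `strict` flag and the idea of passing in the last monitor status are assumptions for illustration. Only the numeric values are taken from the standard OCF exit codes.

```python
from enum import IntEnum


class OCF(IntEnum):
    # Subset of the standard OCF resource agent exit codes.
    SUCCESS = 0
    ERR_GENERIC = 1
    NOT_RUNNING = 7
    RUNNING_MASTER = 8
    FAILED_MASTER = 9


def stop_action(pre_stop_status: OCF, stopped_now: bool, strict: bool) -> OCF:
    """Hypothetical stop-action policy (illustration only, not pgsqlms).

    pre_stop_status: what a monitor reported just before the stop.
    stopped_now: whether the instance is confirmed stopped afterwards.
    strict: True models the "over-protective" choice of reporting an
            earlier failure; False models the OCF guide's rule that a
            resource which ends up stopped means the stop succeeded.
    """
    if not stopped_now:
        # The instance is still running: the stop genuinely failed.
        return OCF.ERR_GENERIC
    if strict and pre_stop_status in (OCF.ERR_GENERIC, OCF.FAILED_MASTER):
        # Surface the earlier failure so the resource stays out of the
        # cluster until an administrator fixes it.
        return OCF.ERR_GENERIC
    return OCF.SUCCESS
```

With `strict=True`, a master that had already failed before the stop yields `ERR_GENERIC` even though it is now stopped; with `strict=False` the same situation returns `SUCCESS`, which is what the OCF guide expects, and the failure has to be caught elsewhere.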
Note that if you have set up fencing, the failing node is supposed to be fenced (after the recovery attempt fails) and a slave is supposed to be promoted.
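For reference, Pacemaker's documented default reaction to a failed stop operation is to fence the node when STONITH is enabled, and to block the resource otherwise. The following Python sketch only models that decision; `stop_failure_outcome` is a hypothetical helper, and any `on-fail` value other than `fence` and `block` is deliberately left out.

```python
from typing import Optional


def stop_failure_outcome(stonith_enabled: bool, on_fail: Optional[str] = None) -> str:
    """Model how the cluster reacts when a resource fails to stop.

    If no explicit on-fail is configured for the stop operation, the
    default is "fence" with STONITH enabled and "block" without it.
    """
    policy = on_fail or ("fence" if stonith_enabled else "block")
    if policy == "fence":
        # The node is fenced, so the resource is known to be down and a
        # slave can be promoted elsewhere.
        return "fence node, then promote a slave elsewhere"
    if policy == "block":
        # Nothing more happens to this resource until a human steps in.
        return "block resource until an administrator intervenes"
    raise ValueError(f"on-fail value not modelled in this sketch: {policy}")
```

This is why fencing matters here: without it, a stop failure leaves the resource blocked instead of letting the transition continue.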
Unless I am missing something really important here, I prefer to focus on documentation and v1.1 for now, but we will definitely keep this issue open until we can give it more time and testing.
But feel free to experiment and post feedback; contributions and new contributors are very welcome, obviously! I will make time to follow up on updates to this issue.
[1] http://www.linux-ha.org/doc/dev-guides/_literal_stop_literal_action.html
from paf.
Hello,
I just pushed a commit that might help if the CRM tries to recover the resource on the same node. See 8eb2bab.
We still have to investigate other situations, though, where the CRM really wants to move the resource to another node and the resource is no longer able to start correctly.
Regards,
Hi,
Considering this (and partly #9), I think using a watchdog (e.g. softdog, which is available in the Linux kernel) might help you in this scenario. This is not remote fencing, but at least the node can commit suicide and the transition can keep going.
However, I'm not sure yet if this is supported under Pacemaker/RHEL 6.
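A watchdog is essentially a countdown that must be reset regularly; if the node hangs and stops resetting it, the machine is rebooted. The sketch below simulates that behaviour in Python with an explicit clock instead of opening `/dev/watchdog` (opening the real device arms it and can reboot the host). `WatchdogModel` and `simulate` are hypothetical names for illustration, not part of softdog or Pacemaker.

```python
from typing import Optional


class WatchdogModel:
    """Simulated watchdog: fires if not pinged for `timeout` seconds."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        self.last_ping = now

    def ping(self, now: float) -> None:
        # On the real device, any write resets the countdown.
        self.last_ping = now

    def expired(self, now: float) -> bool:
        # True once the node would have been reset.
        return now - self.last_ping >= self.timeout


def simulate(healthy_until: float, timeout: float, interval: float,
             horizon: float) -> Optional[float]:
    """Return the simulated time at which the watchdog fires, or None.

    The node pings every `interval` seconds while it is healthy
    (t < healthy_until) and stops pinging once it is not.
    """
    wd = WatchdogModel(timeout)
    t = 0.0
    while t <= horizon:
        if wd.expired(t):
            return t
        if t < healthy_until:
            wd.ping(t)
        t += interval
    return None
```

For example, with a 5 s timeout and a ping every second, a node that becomes unhealthy at t=10 s is reset at t=14 s, since its last ping was at t=9 s. A node that stays healthy is never reset.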
Hello,
I believe the solution is to set up a watchdog in your cluster so that the node where your resource cannot stop can fence itself.
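Self-fencing only works if the surviving nodes wait long enough to be sure the failing node has actually been reset. Here is a hedged Python sketch of that timing rule. The `Node` type, `self_fence_deadline` and the default safety factor of 2 are assumptions; the factor mirrors the frequently given advice to set Pacemaker's `stonith-watchdog-timeout` to roughly twice the watchdog timeout, so check the documentation for your Pacemaker and sbd versions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    watchdog_timeout: float  # seconds the local watchdog tolerates without a ping


def self_fence_deadline(node: Node, failed_at: float,
                        safety_factor: float = 2.0) -> float:
    """Earliest time peers may treat `node` as self-fenced (hypothetical).

    The node whose resource cannot stop deliberately stops pinging its
    watchdog at `failed_at`, so it is reset no later than
    failed_at + watchdog_timeout. Peers add a margin on top of that
    (the default factor of 2 is an assumption, see the lead-in).
    """
    if safety_factor < 1.0:
        raise ValueError("peers must wait at least one full watchdog timeout")
    return failed_at + safety_factor * node.watchdog_timeout
```

For instance, a node with a 5 s watchdog timeout whose stop fails at t=100 s would be considered gone at t=110 s, after which a slave can be promoted.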
I wrote some documentation about how to set up a watchdog under CentOS 7. It should not be hard to adapt to EL6 if needed. See: http://dalibo.github.io/PAF/CentOS-7-admin-cookbook.html#setting-up-a-watchdog
Regards,