Comments (2)
Hi,
Please help me to understand the situation.
In your environment, the compute nodes include node-12.local and node-13.local (the reserve host). One or more masakari-hostmonitors in the cluster then sent a false notification saying that node-12.local was down, although node-12.local was not down and nova-compute was still running.
When masakari-controller receives a node-failure notification, it first disables the compute node (node-12.local) and then tries to evacuate the VMs on the failed node (node-12.local) to the reserve host (node-13.local). As your log shows, masakari-controller worked as expected. However, nova refused to evacuate, because it expects the nova-compute state on the failed node (node-12.local) to be down, but it was up. As a result, the recovery process terminated abnormally.
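To make the sequence above concrete, here is a minimal sketch of the precondition in plain Python. It is only a model of the behaviour described in this comment, not masakari or nova code: `ComputeService`, `can_evacuate`, `handle_host_failure`, and the host names `failed-host` and `reserve-host` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComputeService:
    """Hypothetical stand-in for a nova-compute service record."""
    host: str
    status: str  # admin status: "enabled" or "disabled"
    state: str   # liveness as seen by nova: "up" or "down"


def can_evacuate(service: ComputeService) -> bool:
    # Assumption: evacuation is only accepted when the source
    # nova-compute service is reported as down.
    return service.state == "down"


def handle_host_failure(service: ComputeService, reserve_host: str) -> str:
    # Step 1: the controller disables the compute node that was reported as failed.
    service.status = "disabled"
    # Step 2: it attempts the evacuation, which is refused if the service is still up.
    if not can_evacuate(service):
        return "aborted: nova-compute on %s is still up" % service.host
    return "evacuating instances from %s to %s" % (service.host, reserve_host)


# A false "host down" notification: the service is in fact still up.
false_alarm = ComputeService(host="failed-host", status="enabled", state="up")
print(handle_host_failure(false_alarm, "reserve-host"))
print(false_alarm.status)
```

Note that in the false-alarm case the node ends up disabled even though nothing was evacuated, which matches the state the operator has to clean up by hand.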
If the above understanding is correct, the next question is who sent the false host-down notification, and why.
[1] Can you please find out the full db record of uuid=a0752992-61f9-447e-b9bf-5d4099d09be9?
It is in your [db host ip or hostname].vmha.notification_list table.
[2] Can you please check masakari-hostmonitor.log on the other nodes in the cluster for the above notification?
Please let me know if you need more information on how to get this from your environment.
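For [1], the lookup could be sketched roughly as below. This is an illustration under assumptions, not a masakari tool: the column name `notification_id` is assumed to be the one holding the uuid, `fetch_notification` is a hypothetical helper, and an in-memory SQLite table with a reduced set of columns and a placeholder host name stands in for the real vmha database so that the snippet runs on its own. Against MySQL you would connect with your usual driver and use its placeholder style (typically `%s` instead of `?`).

```python
import sqlite3

NOTIFICATION_ID = "a0752992-61f9-447e-b9bf-5d4099d09be9"


def fetch_notification(conn, notification_id):
    """Return all rows matching the notification id as a list of dicts."""
    cur = conn.cursor()
    # Parameterized query; sqlite3 uses "?" placeholders.
    cur.execute(
        "SELECT * FROM notification_list WHERE notification_id = ?",
        (notification_id,),
    )
    columns = [desc[0] for desc in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Stand-in database: the schema and the row are placeholders for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notification_list ("
    "id INTEGER, notification_id TEXT, notification_hostname TEXT)"
)
conn.execute(
    "INSERT INTO notification_list VALUES (?, ?, ?)",
    (1, NOTIFICATION_ID, "example-host"),
)

rows = fetch_notification(conn, NOTIFICATION_ID)
for row in rows:
    for column, value in row.items():
        print("%s: %s" % (column, value))
```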
On abnormal termination, the current masakari does not return the reserve host to its original state, because that would cause problems if the recovery failed after some VMs had already been evacuated to the reserve host. It also does not re-enable the failed node (node-12.local). In this case, the operator has to check the situation, and may then re-enable the failed node (node-12.local) through the nova API and re-add the reserve host (node-13.local).
from masakari.
Hello,
Sorry for the late response. We haven't encountered this issue since the initial deployment, so I'm not sure this is really relevant anymore, but I wanted to provide you with the details you requested. Your assessment is mostly correct, with the exception that node-12 is the reserve host and node-13 was the failed host.
Here's the DB entry:
*************************** 3. row ***************************
id: 7
create_at: 2016-11-04 07:45:54
update_at: 2016-11-04 07:49:54
delete_at: 2016-11-04 07:49:54
deleted: 0
notification_id: a0752992-61f9-447e-b9bf-5d4099d09be9
notification_type: rscGroup
notification_regionID: RegionOne
notification_hostname: node-13.local
notification_uuid:
notification_time: 2016-11-04 07:45:53
notification_eventID: 1
notification_eventType: 2
notification_detail: 2
notification_startTime: 2016-11-04 07:45:53
notification_endTime: NULL
notification_tzname: 'UTC', 'UTC'
notification_daylight: 0
notification_cluster_port: 226.94.1.1:5405
progress: 2
recover_by: 0
iscsi_ip: NULL
controle_ip: 172.17.1.20
recover_to: node-12.local
3 rows in set (0.00 sec)
Unfortunately I no longer have the logs from this time as they've been rotated.
If we experience this issue again I'll provide the logs.