masakari's People

Contributors

aspiers, hirose31, muroi, natsumetakashi, sampathp, takedakn, tkentaro, toshikazu0314

masakari's Issues

issue in recovery instance on another compute node

Hi Masakari team,
Thank you for this project and your support.
We have an issue operating this project.
Some information:
OpenStack version: Newton
Environment:
1 controller node
2 compute nodes

We cloned this project, built it, and successfully ran masakari-controller on the controller node and masakari-instancemonitor on the compute nodes.
Some failed test scenarios are listed below:

  1. When we power off an instance on compute1, masakari-controller ignores all notifications from instancemonitor and does not try to recover and power on the instance.
  2. When we manually destroy an instance on compute1 and the instance goes into the error state, masakari-controller does not recover it on compute2; it only updates the vm_list table of the vm_ha database (in MySQL).

Additionally, we installed masakari-hostmonitor on the controller. When we destroy compute1, masakari-hostmonitor logs that compute1 is offline, but masakari-controller is not informed of this event, and in the end none of the instances are recovered on the other compute node.

Thank you for your attention.
Vahid

sqlalchemy_utils is not installed as a requirement

While creating the database like this:
openstack@openstack-VirtualBox:~/masakari/masakari_controller/db$ sudo ./create_database.sh
/home/openstack/masakari/masakari_controller/db
/home/openstack/masakari
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/openstack/masakari/masakari_controller/db/create_tables.py", line 22, in <module>
    from sqlalchemy_utils.functions import database_exists, create_database
ImportError: No module named sqlalchemy_utils.functions
we get an import error for the sqlalchemy_utils module. It should be listed in masakari-controller/requirements.txt and installed as a dependency.
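
Until the requirements file is fixed, a pre-flight check can turn the traceback into a readable message. The following is only a sketch, assuming Python 3 (the traceback above is from Python 2.7) and a hypothetical module list; REQUIRED_MODULES and find_missing_modules are illustrative names, not part of Masakari. On PyPI the missing module is distributed as SQLAlchemy-Utils.

```python
import importlib.util

# Hypothetical list for illustration; the authoritative set is whatever
# create_tables.py actually imports.
REQUIRED_MODULES = ["sqlalchemy", "sqlalchemy_utils"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be imported."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running such a check at the top of create_database.sh would report every missing dependency at once instead of failing on the first import.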

Package installation fails if user 'openstack' does not exist

This is quite obvious and may not be considered a bug as such. But I would like to point out here that requiring a specific 'user name' to be present for package installation to be successful doesn't seem like a good idea.

IMO following two approaches look better:

  1. Create the required user as part of the installation process. Many software packages do this, e.g. PostgreSQL and MySQL.
  2. Let the software be runnable/usable by any user who has the required privileges/permissions.

In our case the 2nd approach seems more appropriate. For example, I am running OpenStack as a user named 'ubuntu', so naturally I should be able to run masakari as the user 'ubuntu'.

EDIT: I think what we want here is to have the same user running OpenStack and masakari. Hence, while installing masakari, we should be able to 'configure' the user. As a result, someone running OpenStack as user 'abc' will be able to configure/install/use masakari as user 'abc'.
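
One way to picture the configurable-user idea is a small helper that the installer or service wrapper could call. This is a sketch under assumptions: resolve_service_user and its configured_user argument are hypothetical names, and where the value would come from (a config file, an installer prompt) is left open. It only uses the standard pwd module, so it applies to Unix-like systems.

```python
import os
import pwd


def resolve_service_user(configured_user=None):
    """Return (name, uid, gid) of the account the services should run as.

    Falls back to the invoking user when no user is configured, instead of
    assuming a fixed account such as 'openstack'.
    """
    try:
        if configured_user:
            entry = pwd.getpwnam(configured_user)
        else:
            entry = pwd.getpwuid(os.getuid())
    except KeyError:
        raise SystemExit(
            "Service user %r does not exist; configure an existing user."
            % (configured_user or os.getuid())
        )
    return entry.pw_name, entry.pw_uid, entry.pw_gid
```

Failing early with a clear message, rather than partway through package installation, is the main point of the check.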

can't run masakari-hostmonitor

Hello, I am trying to run masakari-hostmonitor on CentOS 7 and I get some errors. I read the shell file:
image
I don't know how to configure the CIB.

No logs being generated in case of a masakari service failure

While setting up and running Masakari for the first time, the instance monitor failed to start without generating any log files. Running the service's Python files directly from the source prints the errors to the console. There should be a way to report such errors in the instancemonitor.log file even when the service fails to start.

service masakari-instancemonitor start
Starting masakari-instancemonitor: *

service masakari-instancemonitor status
* masakari-instancemonitor is not running

python /opt/masakari/instancemonitor/masakari_instancemonitor.py

Traceback (most recent call last):
  File "/opt/masakari/instancemonitor/masakari_instancemonitor.py", line 29, in <module>
    import libvirt_eventfilter as evf
  File "/opt/masakari/instancemonitor/libvirt_eventfilter.py", line 29, in <module>
    from libvirt_callback import *
  File "/opt/masakari/instancemonitor/libvirt_callback.py", line 223, in <module>
    from httplib2 import Http
ImportError: No module named httplib2

pip install httplib2
service masakari-instancemonitor start

Result: SUCCESS

* masakari-instancemonitor is running
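
A possible shape for the requested behaviour is to run the service's startup, including its imports, inside a wrapper that writes any exception to the log file. This is a minimal sketch, not the project's code: run_with_startup_logging, the logger name, and the start callable are all hypothetical.

```python
import logging


def run_with_startup_logging(start, log_path):
    """Call start(); on failure, write the traceback to log_path and return 1."""
    logger = logging.getLogger("instancemonitor.startup")
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        start()
        return 0
    except Exception:
        # logger.exception appends the full traceback to the message.
        logger.exception("Service failed to start")
        return 1
    finally:
        logger.removeHandler(handler)
        handler.close()
```

For this to catch a missing dependency such as httplib2, the start callable has to perform the service's imports itself; imports at the top of the entry script run before any wrapper can log them.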

masakari-controller is not a python package

The character "-" is invalid in the name of a Python package. We get an import error while creating the database:

openstack@openstack_3:~/masakari/masakari-controller/db$ ./create_database.sh
/home/openstack/masakari/masakari-controller/db
/home/openstack/masakari
/usr/bin/python: No module named masakari-controller.db

Also, masakari-controller does not contain an __init__.py, so it is not recognized as a package.

IMO directory structure should be like:

masakari-controller
`---controller
      `---__init__.py and other py files
masakari-instancemonitor
`---instance_monitor
     `---__init__.py and other py files

and so on.
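
The naming rule behind the import error can be checked directly: each dotted component of a module path must be a valid Python identifier, and a hyphen is not allowed in one. A short illustration, assuming Python 3 (is_importable_package_name is a helper written for this example):

```python
def is_importable_package_name(dotted_name):
    """True if every dotted component is a valid Python identifier."""
    return all(part.isidentifier() for part in dotted_name.split("."))


print(is_importable_package_name("masakari-controller.db"))  # False
print(is_importable_package_name("masakari_controller.db"))  # True
```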

Enable hostmonitor to check cluster status with Pacemaker-Remote

hostmonitor hits corosync's scaling limits since it relies on the hosts listed under "Online" in the crm_mon command's output to check whether each compute host is working or not. Pacemaker-remote, which doesn't rely on corosync's connection, has been available since pacemaker-1.1.11. If hostmonitor also checks each host's status with Pacemaker-remote, it will be freed from corosync's limits.
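
As a rough illustration of what the check could look like, the sketch below reads both cluster-node and remote-node status lines from crm_mon-style text. The sample text is made up for this example, and the exact line format is an assumption that varies between Pacemaker versions; parse_node_states is a hypothetical helper, not hostmonitor code.

```python
import re

# Illustrative sample only; real crm_mon output differs between versions.
SAMPLE_CRM_MON = """\
Online: [ controller1 controller2 ]
RemoteOnline: [ compute1 compute2 ]
RemoteOFFLINE: [ compute3 ]
"""

# Accepts an optional leading "*" bullet, which some versions print.
_NODE_LINE = re.compile(
    r"^\s*(?:\*\s*)?(Online|OFFLINE|RemoteOnline|RemoteOFFLINE):"
    r"\s*\[(.*?)\]\s*$",
    re.MULTILINE,
)


def parse_node_states(text):
    """Map each host name to True (online) or False (offline)."""
    states = {}
    for label, names in _NODE_LINE.findall(text):
        online = label.endswith("Online")
        for host in names.split():
            states[host] = online
    return states


print(parse_node_states(SAMPLE_CRM_MON))
```

Treating the remote lines the same way as the "Online" line would let compute hosts run as Pacemaker-remote nodes while the controller-side logic stays unchanged.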

Bug Fixing

(1) Prevent re-processing of VMHA after the recovery controller is restarted.
(2) Implement a DB lock to avoid race conditions.
(3) Fix the Nova API response handling, since the API response differs between Juno and Kilo.

how to install masakari

Hi, I have read your document and would like to use Masakari, but I'm having trouble finding a step-by-step guide or other documentation to get started with. Which parts should be installed on the controller, which on the compute nodes, and what are the prerequisites for installing Masakari? I have installed corosync and pacemaker; what else do I need to do?

Use novaclient Instead of curl Command

The controller should use the Python client, since the client can follow Nova API changes and can be configured for any Keystone domain and project model. Currently, the controller uses curl commands to call the OpenStack API. That does not generalize across the Keystone project (tenant) models and Nova API versions that users of Masakari would deploy.
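
For reference, the usual way to obtain a Nova client through a Keystone session looks roughly like the sketch below. It assumes python-novaclient and keystoneauth1 are installed; the helper names, the default domain values, and the API version are placeholders chosen for this example, not Masakari's configuration.

```python
def build_auth_options(auth_url, username, password, project_name,
                       user_domain_name="Default",
                       project_domain_name="Default"):
    """Collect Keystone v3 password-auth options in one place."""
    return {
        "auth_url": auth_url,
        "username": username,
        "password": password,
        "project_name": project_name,
        "user_domain_name": user_domain_name,
        "project_domain_name": project_domain_name,
    }


def make_nova_client(auth_options, api_version="2.1"):
    """Build a novaclient Client from the options above."""
    # Imported lazily so this sketch loads without the OpenStack libraries.
    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader("password")
    auth = loader.load_from_options(**auth_options)
    sess = session.Session(auth=auth)
    return client.Client(api_version, session=sess)


def list_servers_on_host(nova, host):
    """List servers from all projects that run on the given compute host."""
    return nova.servers.list(search_opts={"host": host, "all_tenants": 1})
```

Domain and project settings live in the auth options and the API version is a single argument, which is the flexibility the curl-based calls lack.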

incorrect masakari work on centos9

Hello Masakari team! Can you help me? I installed Masakari on CentOS 9 with the OpenStack Zed release. When I run the "openstack segment list" command, I get a 'segment' when I should get an empty list. I previously deployed Masakari on Ubuntu 22 with the OpenStack Yoga release and it worked. Can you tell me what the problem is and how to fix it?

Different content in proc.list in CentOS and Ubuntu

On CentOS and Ubuntu, the process path of the instance monitor is different.
The rpm package installs the script as /usr/bin/masakari-instancemonitor, and the 'ps' output shows something similar.
We have to make this consistent between the rpm packaging and the deb packaging.
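
Until the packaging is aligned, one workaround is to match processes by script name rather than by full path. The sketch below is an assumption about how such matching could work, not how proc.list is actually evaluated; command_runs_script and SCRIPT_NAMES are hypothetical, and the two script names are taken from the paths mentioned in these issues.

```python
import os
import shlex

# Script names seen in the rpm and deb layouts mentioned in these issues.
SCRIPT_NAMES = {"masakari-instancemonitor", "masakari_instancemonitor.py"}


def command_runs_script(cmdline, script_names):
    """True if any argument of a ps command line is one of the scripts."""
    try:
        args = shlex.split(cmdline)
    except ValueError:
        # Fall back to plain splitting on unbalanced quotes.
        args = cmdline.split()
    return any(os.path.basename(arg) in script_names for arg in args)


print(command_runs_script(
    "/usr/bin/python /usr/bin/masakari-instancemonitor", SCRIPT_NAMES))
print(command_runs_script(
    "python /opt/masakari/instancemonitor/masakari_instancemonitor.py",
    SCRIPT_NAMES))
```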

Python Installers

Add a Python installer (setup.py) to make it easy to deploy on any OS. The deb package is only supported on some Linux distributions, such as Debian and Ubuntu. Python is multi-platform, which would let users deploy this easily.

False Notifications

I've just recently installed Masakari into our OpenStack environment and have started to notice false notifications from masakari-hostmonitor. When masakari-controller receives the false notification, it disables nova-compute on the node and attempts a migration. The migration eventually fails with:

Compute service of node-13.local is still in use

This leaves me with two problems:

  • nova-compute is disabled on the node so instances are no longer scheduled there
  • the reserve host is removed from the database and therefore no longer available in the event that a node actually goes down

I know that corosync/pacemaker are not reporting the node as down because fencing never takes place. There is no sign of any attempt to fence the node and no failed resources are shown when running crm status.

Here's the masakari-controller log after receiving a false notification:

2016-11-04 07:49:54.436 30958 INFO controller.masakari_util [Thread:notification_list(a0752992-61f9-447e-b9bf-5d4099d09be9)] Do update_notification_list_dict.
2016-11-04 07:49:54.438 30958 INFO controller.masakari_util [Thread:notification_list(a0752992-61f9-447e-b9bf-5d4099d09be9)] Succeeded in update_notification_list_dict.
2016-11-04 07:49:54.729 30958 INFO controller.masakari_worker [Thread:vm_list(52)] Do get_vm_list_by_id.
2016-11-04 07:49:54.732 30958 INFO controller.masakari_worker [Thread:vm_list(52)] Succeeded in get_vm_list_by_id. Return_value = (0L, 'node-12.local')
2016-11-04 07:49:54.732 30958 INFO controller.masakari_util [Thread:vm_list(52)] Call Evacuate API with c332baca-ca06-43d1-afb2-543b2d483feb to node-12.local
2016-11-04 07:49:54.739 30958 INFO controller.masakari_worker [Thread:vm_list(55)] Do get_vm_list_by_id.
2016-11-04 07:49:54.743 30958 INFO controller.masakari_worker [Thread:vm_list(55)] Succeeded in get_vm_list_by_id. Return_value = (0L, 'node-12.local')
2016-11-04 07:49:54.743 30958 INFO controller.masakari_util [Thread:vm_list(55)] Call Evacuate API with 18d95395-3a83-41ab-a5f2-2c49ef59a4d5 to node-12.local
2016-11-04 07:49:54.759 30958 INFO controller.masakari_worker [Thread:vm_list(46)] Do get_vm_list_by_id.
2016-11-04 07:49:54.761 30958 INFO controller.masakari_worker [Thread:vm_list(46)] Succeeded in get_vm_list_by_id. Return_value = (0L, 'node-12.local')
2016-11-04 07:49:54.761 30958 INFO controller.masakari_util [Thread:vm_list(46)] Call Evacuate API with 89419892-1ead-4150-8dc4-2906e86cc1d9 to node-12.local
2016-11-04 07:49:54.782 30958 INFO controller.masakari_worker [Thread:vm_list(58)] Do get_vm_list_by_id.
2016-11-04 07:49:54.783 30958 INFO controller.masakari_worker [Thread:vm_list(58)] Succeeded in get_vm_list_by_id. Return_value = (0L, 'node-12.local')
2016-11-04 07:49:54.783 30958 INFO controller.masakari_util [Thread:vm_list(58)] Call Evacuate API with 143345a1-7d68-4f1b-8ebb-c21b1b6df046 to node-12.local
2016-11-04 07:49:54.804 30958 ERROR controller.masakari_util [Thread:vm_list(52)] Fails to call Instance Evacuate API onto node-12.local: Compute service of node-13.local is still in use. (HTTP 400) (Request-ID: req-1e41c332-c3f1-4798-b3db-881dab855f23)
2016-11-04 07:49:54.804 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] <class 'novaclient.exceptions.BadRequest'>
2016-11-04 07:49:54.804 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] Compute service of node-13.local is still in use. (HTTP 400) (Request-ID: req-1e41c332-c3f1-4798-b3db-881dab855f23)
2016-11-04 07:49:54.804 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] File "/opt/masakari/controller/masakari_worker.py", line 209, in _do_node_accident_vm_recovery
self.rc_util_api.do_instance_evacuate(uuid, evacuate_node)
2016-11-04 07:49:54.805 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] File "/opt/masakari/controller/masakari_util.py", line 744, in do_instance_evacuate
on_shared_storage=True)
2016-11-04 07:49:54.805 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] File "/opt/masakari/masakari_ve/local/lib/python2.7/site-packages/novaclient/api_versions.py", line 402, in substitution
return methods[-1].func(obj, *args, **kwargs)
2016-11-04 07:49:54.805 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] File "/opt/masakari/masakari_ve/local/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1744, in evacuate
body)
2016-11-04 07:49:54.805 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] File "/opt/masakari/masakari_ve/local/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1856, in _action_return_resp_and_body
return self.api.client.post(url, body=body)
2016-11-04 07:49:54.805 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] File "/opt/masakari/masakari_ve/local/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 222, in post
return self.request(url, 'POST', **kwargs)
2016-11-04 07:49:54.806 30958 ERROR controller.masakari_worker [Thread:vm_list(52)] File "/opt/masakari/masakari_ve/local/lib/python2.7/site-packages/novaclient/client.py", line 117, in request
raise exceptions.from_response(resp, body, url, method)
2016-11-04 07:49:54.806 30958 INFO controller.masakari_util [Thread:vm_list(52)] Do update_vm_list_by_id_dict.
2016-11-04 07:49:54.810 30958 INFO controller.masakari_util [Thread:vm_list(52)] Succeeded in update_vm_list_by_id_dict.
2016-11-04 07:49:54.810 30958 INFO controller.masakari_worker [Thread:vm_list(52)] Recovery process has been terminated abnormally. <

Replace MySQLdb with SQLAlchemy

Masakari currently uses MySQLdb to access its DB. Some environments and distributions use not only MySQL but also PostgreSQL or other databases. To enable Masakari to work with other databases, we need to replace MySQLdb with SQLAlchemy.
