
airflow-scheduler-failover-controller's Issues

It doesn't work for apache-airflow==1.9.0 and airflow-scheduler-failover-controller==1.0.5

I found that it only ensures that the scheduler on the current node comes back up after being killed. Here is what happened: I first started the scheduler on both machines, then started scheduler_failover_controller. It killed the scheduler on the second machine, which is expected, since only one scheduler should be responsible for scheduling. But when I then tested scheduler_failover_controller by killing the running scheduler, it should have started the scheduler on the other machine; instead, it restarted the killed scheduler on the local machine.

Processes on the two machines:
slave2 processes:
[root@slave2 airflow]# ps -aux|grep scheduler
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 20677 0.0 0.0 103324 852 pts/1 S+ 13:47 0:00 grep scheduler
root 32506 0.1 0.6 365468 47164 pts/2 S 11:42 0:08 /usr/local/bin/python /usr/local/bin/scheduler_failover_controller start
scheduler node:
[root@scheduler airflow]# ps -aux|grep scheduler
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 2362 0.0 1.1 721204 43376 ? Sl Dec04 1:16 /usr/local/bin/python /usr/local/bin/flower -b pyamqp://airflow:[email protected]:5672/airflow --address=scheduler --port=5555
root 19335 0.0 0.0 103328 856 pts/0 S+ 14:14 0:00 grep scheduler
root 21835 0.0 1.2 365372 48880 pts/1 S 11:42 0:03 /usr/local/bin/python /usr/local/bin/scheduler_failover_controller start
root 26588 1.4 1.6 402848 63220 ? S 12:06 1:50 /usr/local/bin/python /usr/local/bin/airflow scheduler

Kill the scheduler process:
[root@scheduler airflow]# kill -9 26588
[root@scheduler airflow]# ps -aux|grep scheduler
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 2362 0.0 1.1 721204 43376 ? Sl Dec04 1:16 /usr/local/bin/python /usr/local/bin/flower -b pyamqp://airflow:[email protected]:5672/airflow --address=scheduler --port=5555
root 19714 0.0 0.0 103328 856 pts/0 S+ 14:16 0:00 grep scheduler
root 21835 0.0 1.2 365372 48880 pts/1 S 11:42 0:03 /usr/local/bin/python /usr/local/bin/scheduler_failover_controller start

Soon afterwards it restarted the local scheduler process on its own:
[root@scheduler airflow]# ps -aux|grep scheduler
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 2362 0.0 1.1 721204 43376 ? Sl Dec04 1:16 /usr/local/bin/python /usr/local/bin/flower -b pyamqp://airflow:[email protected]:5672/airflow --address=scheduler --port=5555
root 19737 9.2 1.5 401316 61660 ? S 14:16 0:00 /usr/local/bin/python /usr/local/bin/airflow scheduler
root 19768 0.0 0.0 103328 852 pts/0 S+ 14:16 0:00 grep scheduler
root 21835 0.0 1.2 365372 48880 pts/1 S 11:42 0:03 /usr/local/bin/python /usr/local/bin/scheduler_failover_controller start

However, it has not achieved HA:
[root@slave2 airflow]# ps -aux|grep scheduler
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 25731 0.0 0.0 103324 856 pts/1 S+ 14:18 0:00 grep scheduler
root 32506 0.1 0.6 365468 47164 pts/2 S 11:42 0:10 /usr/local/bin/python /usr/local/bin/scheduler_failover_controller start
because no scheduler was started on slave2.
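
For what it's worth, the behavior described above is consistent with a control loop that restarts the scheduler in place while the active host is still reachable, and promotes a standby only when the host itself goes down. A minimal sketch of such a loop (illustrative only; the Node class and all names here are hypothetical, not the project's code):

from dataclasses import dataclass

@dataclass
class Node:
    # Hypothetical stand-in for a scheduler host.
    name: str
    host_alive: bool = True
    scheduler_running: bool = True

    def start_scheduler(self):
        print("starting scheduler on %s" % self.name)
        self.scheduler_running = True

def poll_once(active, standbys):
    if active.host_alive:
        if not active.scheduler_running:
            # Scheduler died but the host is up: restart it in place.
            # This would explain why killing it did not move it to slave2.
            active.start_scheduler()
        return active
    # Host itself is unreachable: promote a standby (the real failover case).
    new_active = standbys.pop(0)
    new_active.start_scheduler()
    return new_active

# Killing the scheduler on a live host leads to a local restart:
active = poll_once(Node("scheduler", scheduler_running=False), [Node("slave2")])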

Airflow 2.3.2 is incompatible

When "scheduler_failover_controller start " is executed in terminal, the following exception is thrown:
Traceback (most recent call last):
  File "/home/bigai/miniconda2/envs/airflow_env/bin/scheduler_failover_controller", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/bigai/airflow_ha/airflow-scheduler-failover-controller-master/scheduler_failover_controller/bin/scheduler_failover_controller", line 7, in <module>
    args.func(args)
  File "/home/bigai/airflow_ha/airflow-scheduler-failover-controller-master/scheduler_failover_controller/bin/cli.py", line 92, in start
    scheduler_nodes_in_cluster, poll_frequency, metadata_service, emailer, failover_controller = get_all_scheduler_failover_controller_objects()
  File "/home/bigai/airflow_ha/airflow-scheduler-failover-controller-master/scheduler_failover_controller/bin/cli.py", line 26, in get_all_scheduler_failover_controller_objects
    metadata_service = build_metadata_service(configuration, logger)
  File "/home/bigai/airflow_ha/airflow-scheduler-failover-controller-master/scheduler_failover_controller/app.py", line 12, in build_metadata_service
    logger
  File "/home/bigai/airflow_ha/airflow-scheduler-failover-controller-master/scheduler_failover_controller/metadata/sql_metadata_service.py", line 28, in __init__
    self.engine = create_engine(sql_alchemy_conn, **engine_args)
  File "<string>", line 2, in create_engine
  File "/home/bigai/miniconda2/envs/airflow_env/lib/python3.7/site-packages/sqlalchemy/util/deprecations.py", line 298, in warned
    return fn(*args, **kwargs)
  File "/home/bigai/miniconda2/envs/airflow_env/lib/python3.7/site-packages/sqlalchemy/engine/create.py", line 520, in create_engine
    u, plugins, kwargs = u._instantiate_plugins(kwargs)
AttributeError: 'NoneType' object has no attribute '_instantiate_plugins'

Here's my solution: vi airflow-scheduler-failover-controller-master/scheduler_failover_controller/configuration.py and modify this code:

def get_sql_alchemy_conn(self):
    # In Airflow 2.3, the SQL_ALCHEMY_CONN option lives in the [database] section
    return self.get_config("database", "SQL_ALCHEMY_CONN")
    # return self.get_config("core", "SQL_ALCHEMY_CONN")

This is because the airflow.cfg format changed! Can you retrofit this piece of code?
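
For reference, a more version-tolerant variant could try the new section first and fall back to the old one. This is a sketch, assuming (as the surrounding code suggests) that self.get_config returns None for a missing option; it is not the maintainer's actual fix:

def get_sql_alchemy_conn(self):
    # Airflow >= 2.3 stores the option in [database]; older versions use [core].
    conn = self.get_config("database", "SQL_ALCHEMY_CONN")
    if conn is None:
        conn = self.get_config("core", "SQL_ALCHEMY_CONN")
    return conn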

Convert to Python 3

Hi, I've converted it to Python 3, and it worked like a charm. Basically, I just used the 2to3 conversion tool and fixed a type-conversion bug.
I was wondering whether you would like to maintain both Python 2 and Python 3 versions; if so, I will send a pull request. Thanks.

Startup error with metadata_service_type = SQLMetadataService

scheduler_failover_controller start
/opt/anaconda3/envs/airflow_all/bin/scheduler_failover_controller:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('scheduler-failover-controller==1.0.8')
Traceback (most recent call last):
  File "/opt/anaconda3/envs/airflow_all/bin/scheduler_failover_controller", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/data/APP/airflow_all/airflow-scheduler-failover-controller-master/scheduler_failover_controller/bin/scheduler_failover_controller", line 7, in <module>
    args.func(args)
  File "/data/APP/airflow_all/airflow-scheduler-failover-controller-master/scheduler_failover_controller/bin/cli.py", line 92, in start
    scheduler_nodes_in_cluster, poll_frequency, metadata_service, emailer, failover_controller = get_all_scheduler_failover_controller_objects()
                                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/APP/airflow_all/airflow-scheduler-failover-controller-master/scheduler_failover_controller/bin/cli.py", line 26, in get_all_scheduler_failover_controller_objects
    metadata_service = build_metadata_service(configuration, logger)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/APP/airflow_all/airflow-scheduler-failover-controller-master/scheduler_failover_controller/app.py", line 10, in build_metadata_service
    return SQLMetadataService(
           ^^^^^^^^^^^^^^^^^^^
  File "/data/APP/airflow_all/airflow-scheduler-failover-controller-master/scheduler_failover_controller/metadata/sql_metadata_service.py", line 27, in __init__
    self.engine = create_engine(sql_alchemy_conn, **engine_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 2, in create_engine
  File "/opt/anaconda3/envs/airflow_all/lib/python3.11/site-packages/sqlalchemy/util/deprecations.py", line 375, in warned
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/airflow_all/lib/python3.11/site-packages/sqlalchemy/engine/create.py", line 516, in create_engine
    u, plugins, kwargs = u._instantiate_plugins(kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute '_instantiate_plugins'

In my setup, airflow celery worker, airflow scheduler, and airflow webserver all start normally. scheduler_failover_controller init runs with no problem, and so does scheduler_failover_controller test_connection; only scheduler_failover_controller start reports an error.

I may not be very familiar with Airflow; does anyone else have the same problem?
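
Since the error means the controller read sql_alchemy_conn as None, one quick check is to confirm which section of your airflow.cfg actually defines it. A standalone sketch using only the standard library (the path below is an example):

# Standard-library check: report which section of airflow.cfg
# defines sql_alchemy_conn. Adjust the path to your AIRFLOW_HOME.
import configparser

cfg = configparser.ConfigParser(interpolation=None)  # values may contain '%'
cfg.read("/home/airflow/airflow.cfg")

for section in ("database", "core"):
    if cfg.has_option(section, "sql_alchemy_conn"):
        print(section, "->", cfg.get(section, "sql_alchemy_conn"))
        break
else:
    print("sql_alchemy_conn not found in [database] or [core]")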

Can you provide some information on the configuration of the Zookeeper?

I found the metadata_service_zookeeper_nodes setting in the configuration file, but did not find any relevant help for it. Can you explain how this setting can be used and how it works? (See also the sketch after the questions below.)
Also, I have a few questions.

  1. Is the metadata_service_zookeeper_nodes setting required?
  2. If metadata_service_zookeeper_nodes is set, do I still need to configure scheduler_nodes_in_cluster?
  3. If systemd is used to manage the airflow scheduler process, will the Restart=always and RestartSec=60 settings interfere with failover performed by scheduler_failover_controller?

Thanks for all the plugins you have contributed around Airflow!
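
For context, here is a minimal sketch of how a Zookeeper-backed metadata store typically works, using the kazoo client library; the hosts and znode paths are illustrative, not necessarily what this project uses:

# Illustrative only: how a failover controller might record the active
# scheduler node in Zookeeper. Hosts and paths below are examples.
from kazoo.client import KazooClient

# The host list is what metadata_service_zookeeper_nodes would hold.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

zk.ensure_path("/asfc")
if zk.exists("/asfc/active_failover_node"):
    zk.set("/asfc/active_failover_node", b"scheduler-host-1")
else:
    zk.create("/asfc/active_failover_node", b"scheduler-host-1")

value, _stat = zk.get("/asfc/active_failover_node")
print(value.decode("utf-8"))
zk.stop()

As far as I can tell, a Zookeeper metadata service only replaces where the controller stores its state (such as which node is active); scheduler_nodes_in_cluster still describes which hosts can run the scheduler, so the two settings serve different purposes.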

AttributeError: 'NoneType' object has no attribute 'replace' when starting the ASFC

hi there,

I just hit an issue when I use systemd to start the ASFC; logs as follows:

Mar 30 12:24:49 ip-xxx scheduler_failover_controller: File "/usr/local/bin/scheduler_failover_controller", line 7, in <module>
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: args.func(args)
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: File "/usr/local/lib/python3.7/site-packages/scheduler_failover_controller/bin/cli.py", line 92, in start
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: scheduler_nodes_in_cluster, poll_frequency, metadata_service, emailer, failover_controller = get_all_scheduler_failover_controller_objects()
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: File "/usr/local/lib/python3.7/site-packages/scheduler_failover_controller/bin/cli.py", line 33, in get_all_scheduler_failover_controller_objects
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: logger=logger
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: File "/usr/local/lib/python3.7/site-packages/scheduler_failover_controller/failover/failover_controller.py", line 21, in __init__
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: self.airflow_scheduler_stop_command = configuration.get_airflow_scheduler_stop_command()
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: File "/usr/local/lib/python3.7/site-packages/scheduler_failover_controller/configuration.py", line 142, in get_airflow_scheduler_stop_command
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: return self.get_scheduler_failover_config("AIRFLOW_SCHEDULER_STOP_COMMAND").replace("\;", ";")
Mar 30 12:24:49 ip-xxx scheduler_failover_controller: AttributeError: 'NoneType' object has no attribute 'replace'
Mar 30 12:24:49 ip-xxx systemd: scheduler_failover_controller.service: main process exited, code=exited, status=1/FAILURE

I checked the cfg file as follows, and it seems OK:
# Command to use when trying to stop a Scheduler instance on a node
# airflow_scheduler_stop_command = for pid in `ps -ef | grep "airflow scheduler" | awk '{print $2}'` ; do kill -9 $pid ; done
airflow_scheduler_stop_command = sudo systemctl stop airflow-scheduler

So could you help take a look at this problem? Thanks in advance.
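
The crash happens because get_scheduler_failover_config returned None before .replace() was called. A defensive variant of the method (a sketch against the project's code, not the actual fix) would turn that into a clear error:

def get_airflow_scheduler_stop_command(self):
    # Sketch: fail loudly if the option is missing instead of raising
    # AttributeError on None.replace(...).
    command = self.get_scheduler_failover_config("AIRFLOW_SCHEDULER_STOP_COMMAND")
    if command is None:
        raise ValueError(
            "airflow_scheduler_stop_command is not set in the "
            "[scheduler_failover] section of airflow.cfg"
        )
    return command.replace("\\;", ";")

Since the option is clearly present in the file shown above, a likely cause is that the systemd unit runs with a different AIRFLOW_HOME, so the controller reads a different airflow.cfg than the one that was edited.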

Failover Doesn't Occur if Scheduler Host is Dead

While testing the failover feature, I noticed that it only succeeds if the scheduler host is alive but the scheduler daemon is dead. This is quite strange, as test_connection confirms that the host is dead, but scheduler_failover_controller says the scheduler daemon is running.

Scheduler_failover_controller.log

[2017-09-08 00:48:06,709] {failover_controller.py:143} - INFO - Starting to Check if Scheduler on host 'airflow_sub' is running...
[2017-09-08 00:48:06,721] {failover_controller.py:166} - INFO - Finished Checking if Scheduler on host 'airflow_sub' is running. is_running: True

Test_connection

root@airflow:/# scheduler_failover_controller test_connection
[2017-09-08 00:48:59,737] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2017-09-08 00:48:59,756] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2017-09-08 00:49:00,074] {__init__.py:36} INFO - Using executor CeleryExecutor
Testing Connection for host 'airflow'
(True, ['Connection Succeeded', ''])
Testing Connection for host 'airflow_sub'
(True, ['ssh: Could not resolve hostname airflow_sub: Name or service not known\r\n'])

Also, I think scheduler_failover_controller should monitor the other hosts when the failover controller is running on another host.
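
Judging from the test_connection output above, the check appears to return True whenever the ssh command itself runs, regardless of whether the connection actually succeeded. A stricter check would rely on ssh's exit status; a sketch (illustrative, not the project's code):

import subprocess

def host_is_reachable(host, timeout=10):
    # Return True only if "ssh host true" exits with status 0; a dead or
    # unresolvable host makes ssh exit non-zero or time out.
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0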

Need to Run on kubernetes

Hi,

I need to run airflow-scheduler-failover-controller on my Kubernetes cluster as a pod, but there is no image on Docker Hub. Can you please suggest an approach?

Add tests

At the moment it's just a lot of code with no tests.

Software without tests is alpha. Alpha is not meant to be used in production.

Thus, adding tests (and Travis integration) would be very handy.
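
As a starting point, even a small pytest module around configuration parsing would have caught several of the issues reported above. A sketch (the helper under test is a hypothetical stand-in for the project's configuration class):

# Illustrative pytest sketch; get_sql_alchemy_conn is a hypothetical
# stand-in for the project's real configuration helper.
import configparser

def get_sql_alchemy_conn(cfg):
    # Prefer the Airflow >= 2.3 [database] section, fall back to [core].
    for section in ("database", "core"):
        if cfg.has_option(section, "sql_alchemy_conn"):
            return cfg.get(section, "sql_alchemy_conn")
    return None

def test_prefers_database_section():
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(
        "[database]\nsql_alchemy_conn = sqlite:///new.db\n"
        "[core]\nsql_alchemy_conn = sqlite:///old.db\n"
    )
    assert get_sql_alchemy_conn(cfg) == "sqlite:///new.db"

def test_falls_back_to_core():
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string("[core]\nsql_alchemy_conn = sqlite:///old.db\n")
    assert get_sql_alchemy_conn(cfg) == "sqlite:///old.db"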

a bytes-like object is required, not 'str' (1.0.4)

I tried to use this with apache-airflow 1.10 on 4 machines. After installing the zip package on one machine, I ran scheduler_failover_controller init, then set scheduler_nodes_in_cluster = slave2,slave3 in airflow.cfg and started the controller on the machine where it was installed. It then failed with: a bytes-like object is required, not 'str'.
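
This is a classic Python 2-to-3 leftover: under Python 3, subprocess output is bytes, and mixing it with str raises exactly this TypeError. The usual fix is to decode first; a sketch of the pattern (not the project's exact code):

# Under Python 3, check_output returns bytes; decode before comparing
# the result with str values such as the configured node names.
import subprocess

output = subprocess.check_output(["hostname"])
hostname = output.decode("utf-8").strip()   # bytes -> str

if hostname in ("slave2", "slave3"):        # str-to-str comparison now works
    print("this node (%s) is listed in scheduler_nodes_in_cluster" % hostname)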

Will it work with apache-airflow 1.10.0?

I initially adopted this library with Airflow 1.7, and it worked as expected, but we are upgrading Airflow to version 1.10. I am not able to install the package, as it asks for the Airflow 1.7 dependency. Do you plan to upgrade it to support Airflow 1.10, or are there any workarounds? What would be the cleanest way to use it with apache-airflow 1.10?

Multiple Email Receivers

Hi,

First of all, thank you so much for this program.
I was wondering whether it has the capability to notify multiple email addresses upon failure?

Thanks

Is there any way to keep the Airflow scheduler HA?

Since Airflow 2.x, Airflow can run more than one scheduler at the same time. Is there any way to start all of those nodes and keep several of them working, rather than checking all the nodes and keeping only one node running each time?
