Comments (8)
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
from airflow.
I saw that in version 1.9.0, there is already a different type of exception for this case.
I believe that the following change could be made to fix this behavior:
-        except (AirflowFailException, AirflowSensorTimeout) as e:
+        except (AirflowFailException, AirflowSensorTimeout, AirflowTaskTerminated) as e:
             # If AirflowFailException is raised, task should not retry.
             # If a sensor in reschedule mode reaches timeout, task should not retry.
             self.handle_failure(e, test_mode, context, force_fail=True, session=session)
             session.commit()
             raise
-        except (AirflowTaskTimeout, AirflowException, AirflowTaskTerminated) as e:
+        except (AirflowTaskTimeout, AirflowException) as e:
             if not test_mode:
                 self.refresh_from_db(lock_for_update=True, session=session)
             # for case when task is marked as success/failed externally
             # or dagrun timed out and task is marked as skipped
             # current behavior doesn't hit the callbacks
             if self.state in State.finished:
                 self.clear_next_method_args()
                 session.merge(self)
                 session.commit()
                 return None
             else:
                 self.handle_failure(e, test_mode, context, session=session)
                 session.commit()
                 raise
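To make the effect of the proposed change easier to see, here is a minimal toy model of the two `except` clauses. It does not import Airflow: the `Toy*` exception classes, the `route` function, and the returned labels are all hypothetical stand-ins, and the sketch only shows which branch a "terminated" exception lands in before and after the change, not what `handle_failure` actually does.

```python
# Toy stand-ins for the exceptions in the diff above (hypothetical, not Airflow's classes).
class ToyAirflowException(Exception):
    pass


class ToyFailException(ToyAirflowException):
    pass


class ToyTaskTerminated(Exception):
    pass


def route(exc, externally_finished, proposed):
    """Return a label for the branch that would handle `exc`.

    `proposed=False` mirrors the current clause layout from the diff,
    `proposed=True` mirrors the suggested one.
    """
    if proposed:
        force_fail_types = (ToyFailException, ToyTaskTerminated)
        regular_types = (ToyAirflowException,)
    else:
        force_fail_types = (ToyFailException,)
        regular_types = (ToyAirflowException, ToyTaskTerminated)

    try:
        raise exc
    except force_fail_types:
        # First clause: failure is handled unconditionally, with force_fail=True.
        return "handle_failure(force_fail=True)"
    except regular_types:
        # Second clause: an externally finished task returns early.
        if externally_finished:
            return "early return, handle_failure not called"
        return "handle_failure()"


print(route(ToyTaskTerminated(), externally_finished=True, proposed=False))
print(route(ToyTaskTerminated(), externally_finished=True, proposed=True))
```

Under these assumptions, the first call takes the early-return branch and the second reaches the force-fail branch, which is the difference the change is aiming for.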
from airflow.
This behaviour is documented in the code: airflow/airflow/models/taskinstance.py, lines 2594 to 2597 in 5f6f4a5.
So it's not a bug.
from airflow.
Yes, I saw that. So there is no intention to change this behavior?
To me this doesn't make sense, marking a task as failed should trigger the failure callback.
If that's the case, is there any workaround to do this? I'm trying to use the on_kill method to do this, does it make sense to you?
from airflow.
> Yes, I saw that. So there is no intention to change this behavior? To me this doesn't make sense, marking a task as failed should trigger the failure callback. If that's the case, is there any workaround to do this? I'm trying to use the on_kill method to do this, does it make sense to you?
There are no plans to change the behaviour at the moment. Also, I don't know why you would want the callback to run when you intentionally failed a task
from airflow.
> There are no plans to change the behaviour at the moment. Also, I don't know why you would want the callback to run when you intentionally failed a task
In my case, I use Airflow as an orchestrator to run Spark applications (spark-sql and spark-submit). When a task has failed or is marked as failed, in addition to notifying that the task has failed, I need to make sure that the corresponding step has been canceled in Spark. Most of the time, when a task actually fails, the cancel step operation is unnecessary, but this is not the case when a task is marked as failed.
I don't think it's such a rare scenario, plus similar issues have been opened before. I believe this behavior change would be very welcome in future Airflow releases.
from airflow.
Anyway, I managed to solve my problem by overriding the on_kill method in my Spark operator.
The only downside is that the on_kill method doesn't have access to the task context. To solve this, I store the context information I need as instance attributes in the execute method.
def on_kill(self):
    # on_kill receives no context, so it relies on attributes set in execute()
    self.log.info('starting on_kill')
    cancel_step(self.project, self.dag_name, self.run_id, self.task_id)

def execute(self, context):
    self.log.info("starting SparkToDataLake.execute")
    # Save what on_kill will need, since the context is not available there
    ti = context.get('task_instance')
    self.dag_name = ti.dag_id
    self.run_id = context.get('run_id')
    self.task_id = ti.task_id + str(ti.map_index)
PS: The on_kill method is also called on the AirflowTaskTimeout exception.
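For anyone who wants to try the same pattern, here is a slightly fuller sketch that runs without Airflow installed. `StandInOperator` is a hypothetical stand-in for the operator base class, `cancel_step` is a hypothetical helper that only records the call instead of talking to Spark, and `task_key` is a name made up here so the operator's own `task_id` is not overwritten.

```python
import logging


class StandInOperator:
    """Hypothetical stand-in for an operator base class (not Airflow's)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.log = logging.getLogger(type(self).__name__)


# Records cancel requests so the sketch can be checked without a Spark cluster.
cancelled_steps = []


def cancel_step(project, dag_name, run_id, task_key):
    """Hypothetical helper; a real one would cancel the step on the Spark side."""
    cancelled_steps.append((project, dag_name, run_id, task_key))


class SparkToDataLake(StandInOperator):
    def __init__(self, task_id, project):
        super().__init__(task_id)
        self.project = project
        # Filled in by execute(); on_kill() has no access to the context.
        self.dag_name = None
        self.run_id = None
        self.task_key = None

    def execute(self, context):
        self.log.info("starting SparkToDataLake.execute")
        ti = context["task_instance"]
        self.dag_name = ti.dag_id
        self.run_id = context["run_id"]
        self.task_key = f"{ti.task_id}{ti.map_index}"

    def on_kill(self):
        self.log.info("starting on_kill")
        if self.run_id is None:
            # execute() never ran, so there is nothing to cancel.
            return
        cancel_step(self.project, self.dag_name, self.run_id, self.task_key)
```

The guard in `on_kill` is an addition of this sketch: it avoids sending a cancel request with empty identifiers if the task is killed before `execute` has stored them.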
from airflow.
I think it might be worth creating a feature request that explains the use cases and why this is needed. As I see it, it's not required: the user is performing a manual action in the UI, so why run the callback when the same logic can be a task instead?
Closing since it's not a bug.
from airflow.