Comments (33)
I'm hesitant with letting jobs just fail silently. It seems it's more of an operational concern for the queue to decide what to do with a failed job. They could retry it only X times, then discard it, or whatever. But there would be a paper trail. Or at least there's a way for a paper trail to be created.
I don't think AJ should take it upon itself to deal with this. At least that's currently my position.
from activejob.
We could call #report_failure
so we can hook an action to it. By default, it will do nothing, but sometime it useful to know why a job failed and write a log entry.
from activejob.
I'm hesitant with letting jobs just fail silently.
Yeah, I definitely agree it's a bit scary. Which is why I thought I'd create discussion about it first, instead of just sending a PR.
It seems it's more of an operational concern for the queue to decide what to do with a failed job. They could retry it only X times, then discard it, or whatever. But there would be a paper trail. Or at least there's a way for a paper trail to be created.
Right, but in this case we know beforehand that retrying is pointless as the model will never "revive". Things will just work(TM) with the current failing behaviour but it would make for useless load on the workers and an "failed jobs" count in your dashboard that can no longer be trusted.
from activejob.
I think this is creeping into the operational responsibility of the queues themselves. Where they store things and how they deal with failure is imo a queue concern. Not a job concern. AJ should preferably stay just a thin wrapper to provide a uniform interface.
On May 19, 2014, at 4:03 PM, Abdelkader Boudih [email protected] wrote:
We could call #report_failure so we can hook an action to it. By default, it will do nothing, but sometime it useful to know why a job failed and write a log entry.
—
Reply to this email directly or view it on GitHub.
from activejob.
We know that it’s pointless to retry, but we don’t know what needs to be done. Maybe it actually is a critical error that needs to be investigated. The failure is cheap in any case, so creating the paper trail on the queue is imo the preferred option.
I do appreciate the discussion, though. Nothing is set in stone. It’s great to have a debate on where to draw the boundaries around what AJ should do and what the queues should do.
On May 19, 2014, at 4:06 PM, Douwe Maan [email protected] wrote:
I'm hesitant with letting jobs just fail silently.
Yeah, I definitely agree it's a bit scary. Which is why I thought I'd create discussion about it first, instead of just sending a PR.
It seems it's more of an operational concern for the queue to decide what to do with a failed job. They could retry it only X times, then discard it, or whatever. But there would be a paper trail. Or at least there's a way for a paper trail to be created.
Right, but in this case we know beforehand that retrying is pointless as the model will never "revive". Things will just work(TM) with the current failing behaviour but it would make for useless load on the workers and an "failed jobs" count in your dashboard that can no longer be trusted.
—
Reply to this email directly or view it on GitHub.
from activejob.
Well, a difference is that in this case it's AJ calling Model.find
and raising the error, with the user (the job implementor) having no say over how to handle the fact that the model no longer exists. When you write the lookup yourself, as with all other potentially error-raising actions in your worker, you can rescue from errors however you see fit. In the case of AJ, you don't get a chance to decide to handle it "cleanly" or simply let the error go through.
Failure is cheap, but having a skyrocketing "failed jobs" count is annoying, and sometimes you might want to handle the error in a special way instead of just retrying endlessly and never getting a chance to have your code resolve, rescue from or cleanup after the error.
Job error handling is definitely a responsibility of the underlying queue, as you say in your response to @seuros, but only after the user has been given a chance to handle the error themselves and has decided to let it go through. AJ positions itself between the user and the queue, but doesn't allow the user say over errors it causes, while these errors could be seen as errors from "inside the job" that should be put past the user first.
The best option I've heard so far is @seuros's suggestion of adding an extra endpoint to allow the user to override handling of the model-lookup error, but it seems to bit silly to add such functionality "just" for the automatic (de)serialization of model objects.
from activejob.
How about this:
class MyJob < ActiveJob::Base
rescue_from ActiveRecord::NotFound do |e|
Rails.logger.error "We lost the account before MyJob executed"
throw :finished
end
def perform(account)
# work
end
end
The rescue_from block would catch exceptions from both the serialization and #perform methods. It could throw a variety of things depending on what you want to do. throw :finished
makes sense if you want to just swallow the error, and then delete the job, without running it. throw :continue
could be used to swallow the error but still carry on with the rest of #perform.
Thoughts?
from activejob.
Oooh, I like that. Even outside of this model location special case where begin/rescue/end
inside #perform
isn't an option, this is definitely cleaner and it's in line with AC, which is of course a good thing if this is going to be in Rails.
How about just making throw :finished
the default instead of explicit, like in AC's rescue_from
and regular rescue
and renaming throw :continue
to throw :retry
in line with Ruby's retry
keyword?
from activejob.
Whoops, looks like I mixed up your options for throw
. :finished
would be default, raising the error and passing it on to the queueing backend would be :retry
and continuing on with #perform
with model
as nil
would be :continue
. I'm not sure how throw :continue
would work with other errors than this specific serialization one, though.
from activejob.
Wow, looks like I mixed it up again. Either way, I propose swallowing and not running #perform
by default, :retry
to raise the error and let the backend handle it and :continue
to swallow and carry on with #perform
(as you proposed).
from activejob.
I considered adding this GlobalID auto-lookup feature for a while. Unfortunately there is a fatal flaw: the instance might be missing because the database transaction hasn't committed yet. You can't tell the difference and know what to do automatically. Treating it as an error is the only solution.
from activejob.
@mperham Hmm, that's a good point. If we're going to add rescue_from
we shouldn't recommend using it with NotFound
in transactional databases. It might still be nice for other errors (although it takes away from the clear flow of #perform
a bit), and it's still useful for Mongoid::DocumentNotFound
, for example, as it doesn't have transactions anyway. But yeah, transactions make this far less clear-cut than I thought.
from activejob.
Not sure I follow the point about transaction issue? Just don't enqueue a job before you're out of the transaction?
I think it still makes plenty of sense to catch NotFound for the common case of something being queued, but another job or something else has destroyed the record before the job is run.
Other examples include catching MySQL connection errors and the like, and retrying the job later.
I like throw :retry | :continue (basically swallow) | :finished (job is done).
from activejob.
@ddh Lots of people/existing gems use after_save :enqueue_stuff
, which could(/will always?) fire before the transaction is committed, so if the queue immediately performs, the .find
query will fail one or more times until the transaction is committed.
from activejob.
Sounds like we need to fix that problem in AR. Add a callback like :after_transaction
. Then people can switch to that if they want to create jobs as part of the callback cycle.
from activejob.
Sounds like a good idea.
Existing gems using after_save
rather than after_transaction
aren't using ActiveJob yet anyway, so they won't run into any trouble if they're just letting the error through currently. We'd just need to make it very clear that for future enqueueing using AJ, one would need to use after_transaction
rather than after_save
.
from activejob.
Sounds good. Want to make an AR PR for this? Then we can reference that in the docs here.
AJ will come out with Rails 4.2 so we can just make sure that the AR PR has been incorporated by then.
On May 20, 2014, at 6:21 PM, Douwe Maan [email protected] wrote:
Sounds like a good idea.
Existing gems using after_save rather than after_transaction aren't using ActiveJob yet anyway, so they won't run into any trouble if they're just letting the error through currently. We'd just need to make it very clear that for future enqueueing using AJ, one would need to use after_transaction rather than after_save.
—
Reply to this email directly or view it on GitHub.
from activejob.
We already have after_commit
for this.
from activejob.
Ha. Well, we’ve apparently thought of everything :D
Makes it even easier. Just need to document that this is the way to go, then.
On May 20, 2014, at 6:34 PM, Rafael Mendonça França [email protected] wrote:
We already have after_commit for this.
—
Reply to this email directly or view it on GitHub.
from activejob.
@rafaelfranca Do we now? In that case the transaction issue really isn't an issue and people should just be taught to use that.
from activejob.
@mperham Were there any other reasons why you never implemented this, or was it because of people using after_save
where they should after_commit
and backward compatibility?
from activejob.
We have since Rails 3.0 (but maybe even before)
from activejob.
Keep in mind sometimes you want the entire transaction to fail if you can't enqueue the job for some reason:
# checkout! uses an implicit transaction so if anything is raises, everything rolls back
cart.checkout! do
CartMailer.delay.send_thank_you(cart.id)
end
Obviously the database-based queues don't have this problem but it's a concern for Redis, beanstalkd, etc.
from activejob.
People will just have to pick their queues accordingly if that’s a concern. GlobalID is a monumental win for simplicity and clarity in other regards, so imo well worth that drawback.
On May 20, 2014, at 6:42 PM, Mike Perham [email protected] wrote:
Keep in mind sometimes you want the entire transaction to fail if you can't enqueue the job for some reason:
checkout! uses an implicit transaction so if anything is raises, everything rolls back
cart.checkout! do
CartMailer.delay.send_thank_you(cart.id)
end
Obviously the database-based queues don't have this problem but it's a concern for Redis, beanstalkd, etc.—
Reply to this email directly or view it on GitHub.
from activejob.
I think it's a valid concern, but note that people that require that can just use throw :retry
in their rescue_from ActiveRecord::NotFound
handler.
from activejob.
👍
On May 20, 2014, at 6:45 PM, Douwe Maan [email protected] wrote:
Also note that people that require that can just use throw :retry in their rescue_from ActiveRecord::NotFound handler.
—
Reply to this email directly or view it on GitHub.
from activejob.
So let's move on to some practical questions before this can be implemented:
-
What should
throw :retry
do with backends that don't automatically retry jobs that raise an error?I know Sidekiq does this, so we'd just say
raise exception
in ourrescue
block, but others (See Features on the wiki) apparently only have "global retries", whatever those are, and some don't support it at all so we'd need to explicitly re-queue the job? This could easily flood all of the workers. -
What should
throw :continue
do with errors occurring duringjob.new.perform
rather thanParameters.deserialize
?In the latter case we can just perform the job with params where the missing model(s) are nil, but if the error is raised during
#perform
, we obviously can't just go on with the method. Wouldthrow :continue
just do the same asthrow :finished
in this case, that is silently having the job fail? This would be very confusing, asthrow :continue
would only work forActiveRecord::NotFound
, and even then only when that error occurs during parameter deserialization, which the user can't "see". -
throw :finished
is straightforward in the sense that it does nothing at all. It simply lets therescue
block finish and doesn't re-raise the error, thus letting the queue regard the job as successful. Because this is the same behaviour as nothrow
at all, it could be left out if you asked me.
from activejob.
On retry, we could just handle that in-house. That is, 1) rescue, 2) swallow (with logging), 3) then enqueue another job with exactly the same parameters.
Not sure :continue actually makes sense. Let's just start with throw :finished and :retry for now.
from activejob.
Agreed with skipping :continue
. Anything special the user would want to do if model == nil
inside #perform
, they can do from the rescue handler.
As for in-house retrying, I suggest we implement exponential backoff like Sidekiq does when the adapter supports #enqueue_at
: https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry. For backends that don't, we'll have no choice but to re-queue immediately and pray we don't flood the workers.
You're sure you want an explicit throw :finished
? There's no explicit finished
to balance retry
in regular Ruby rescue
either.
from activejob.
Sounds good on the enqueue_at delay. I think the control flow is clearer when all rescue_from’s end with a throw.
On May 21, 2014, at 3:29 PM, Douwe Maan [email protected] wrote:
Agreed with skipping :continue. Anything special the user would want to do if model == nil inside #perform, they can do from the rescue handler.
As for in-house retrying, I suggest we implement exponential backoff like Sidekiq does when the adapter supports #enqueue_at: https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry. For backends that don't, we'll have no choice but to re-queue immediately and pray we don't flood the workers.
You're sure you want an explicit throw :finished? There's no explicit finished to balance retry in regular Ruby rescue either.
—
Reply to this email directly or view it on GitHub.
from activejob.
All right. throw :finished
would be the default like with ActionController rescue_from
, but I agree adding it explicitly makes the control flow clearer, especially with multiple rescue_from
blocks or with multiple throw
s in one block. Makes for easy early returning as well.
I'm up for implementing this and will probably make a start today.
from activejob.
Throw turned out to actually be more hassle than it was worth. I went with an explicit retry* implementation instead: ef4aff0
from activejob.
Alright, looks nice enough to me!
from activejob.
Related Issues (20)
- 4.2.0.rc2 NoMethodError Exception: undefined method `deliver_now' for #<Mail::Message:0xfbcfef8> HOT 1
- Can I stack resque gem with plugin resque-history and ActiveJob? HOT 1
- ActiveJob with multiple databases HOT 4
- adapter to Amazon Simple Workflow HOT 1
- Convert resque-statsd to activejob-statsd HOT 9
- Add a proper usage guide HOT 3
- [Feature request] Cancel jobs HOT 40
- Set the job_id attribute on ActiveJob to the value from the adapter HOT 1
- Add deep serialization HOT 1
- amqp, SQS, stomp, other messaging/queuing support? HOT 5
- Support multiple adapters, map adapters to queues HOT 25
- Add new inline-like built-in adapter to process it in a thread HOT 28
- MyJob.enqueue Syntax HOT 6
- Provide more solid default adapter HOT 40
- Are multiple queues even possible as written? HOT 3
- Girl_friday adapter? HOT 4
- Should we add a delayed mixin so users can delay any method? HOT 32
- delayed job adapter doesn't work properly with float timestamp
- resque job serialization fails under Rails
- Setting resque queue adapter fails HOT 12
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from activejob.