Code Monkey home page Code Monkey logo

Comments (33)

dhh avatar dhh commented on July 28, 2024

I'm hesitant with letting jobs just fail silently. It seems it's more of an operational concern for the queue to decide what to do with a failed job. They could retry it only X times, then discard it, or whatever. But there would be a paper trail. Or at least there's a way for a paper trail to be created.

I don't think AJ should take it upon itself to deal with this. At least that's currently my position.

from activejob.

seuros avatar seuros commented on July 28, 2024

We could call #report_failure so we can hook an action to it. By default, it will do nothing, but sometime it useful to know why a job failed and write a log entry.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

@dhh

I'm hesitant with letting jobs just fail silently.

Yeah, I definitely agree it's a bit scary. Which is why I thought I'd create discussion about it first, instead of just sending a PR.

It seems it's more of an operational concern for the queue to decide what to do with a failed job. They could retry it only X times, then discard it, or whatever. But there would be a paper trail. Or at least there's a way for a paper trail to be created.

Right, but in this case we know beforehand that retrying is pointless as the model will never "revive". Things will just work(TM) with the current failing behaviour but it would make for useless load on the workers and an "failed jobs" count in your dashboard that can no longer be trusted.

from activejob.

dhh avatar dhh commented on July 28, 2024

I think this is creeping into the operational responsibility of the queues themselves. Where they store things and how they deal with failure is imo a queue concern. Not a job concern. AJ should preferably stay just a thin wrapper to provide a uniform interface.

On May 19, 2014, at 4:03 PM, Abdelkader Boudih [email protected] wrote:

We could call #report_failure so we can hook an action to it. By default, it will do nothing, but sometime it useful to know why a job failed and write a log entry.


Reply to this email directly or view it on GitHub.

from activejob.

dhh avatar dhh commented on July 28, 2024

We know that it’s pointless to retry, but we don’t know what needs to be done. Maybe it actually is a critical error that needs to be investigated. The failure is cheap in any case, so creating the paper trail on the queue is imo the preferred option.

I do appreciate the discussion, though. Nothing is set in stone. It’s great to have a debate on where to draw the boundaries around what AJ should do and what the queues should do.

On May 19, 2014, at 4:06 PM, Douwe Maan [email protected] wrote:

@dhh

I'm hesitant with letting jobs just fail silently.

Yeah, I definitely agree it's a bit scary. Which is why I thought I'd create discussion about it first, instead of just sending a PR.

It seems it's more of an operational concern for the queue to decide what to do with a failed job. They could retry it only X times, then discard it, or whatever. But there would be a paper trail. Or at least there's a way for a paper trail to be created.

Right, but in this case we know beforehand that retrying is pointless as the model will never "revive". Things will just work(TM) with the current failing behaviour but it would make for useless load on the workers and an "failed jobs" count in your dashboard that can no longer be trusted.


Reply to this email directly or view it on GitHub.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

@dhh

Well, a difference is that in this case it's AJ calling Model.find and raising the error, with the user (the job implementor) having no say over how to handle the fact that the model no longer exists. When you write the lookup yourself, as with all other potentially error-raising actions in your worker, you can rescue from errors however you see fit. In the case of AJ, you don't get a chance to decide to handle it "cleanly" or simply let the error go through.

Failure is cheap, but having a skyrocketing "failed jobs" count is annoying, and sometimes you might want to handle the error in a special way instead of just retrying endlessly and never getting a chance to have your code resolve, rescue from or cleanup after the error.

Job error handling is definitely a responsibility of the underlying queue, as you say in your response to @seuros, but only after the user has been given a chance to handle the error themselves and has decided to let it go through. AJ positions itself between the user and the queue, but doesn't allow the user say over errors it causes, while these errors could be seen as errors from "inside the job" that should be put past the user first.

The best option I've heard so far is @seuros's suggestion of adding an extra endpoint to allow the user to override handling of the model-lookup error, but it seems to bit silly to add such functionality "just" for the automatic (de)serialization of model objects.

from activejob.

dhh avatar dhh commented on July 28, 2024

How about this:

class MyJob < ActiveJob::Base
  rescue_from ActiveRecord::NotFound do |e|
    Rails.logger.error "We lost the account before MyJob executed"
    throw :finished
  end

  def perform(account)
    # work
  end
end

The rescue_from block would catch exceptions from both the serialization and #perform methods. It could throw a variety of things depending on what you want to do. throw :finished makes sense if you want to just swallow the error, and then delete the job, without running it. throw :continue could be used to swallow the error but still carry on with the rest of #perform.

Thoughts?

from activejob.

DouweM avatar DouweM commented on July 28, 2024

Oooh, I like that. Even outside of this model location special case where begin/rescue/end inside #perform isn't an option, this is definitely cleaner and it's in line with AC, which is of course a good thing if this is going to be in Rails.

How about just making throw :finished the default instead of explicit, like in AC's rescue_from and regular rescue and renaming throw :continue to throw :retry in line with Ruby's retry keyword?

from activejob.

DouweM avatar DouweM commented on July 28, 2024

Whoops, looks like I mixed up your options for throw. :finished would be default, raising the error and passing it on to the queueing backend would be :retry and continuing on with #perform with model as nil would be :continue. I'm not sure how throw :continue would work with other errors than this specific serialization one, though.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

Wow, looks like I mixed it up again. Either way, I propose swallowing and not running #perform by default, :retry to raise the error and let the backend handle it and :continue to swallow and carry on with #perform (as you proposed).

from activejob.

mperham avatar mperham commented on July 28, 2024

I considered adding this GlobalID auto-lookup feature for a while. Unfortunately there is a fatal flaw: the instance might be missing because the database transaction hasn't committed yet. You can't tell the difference and know what to do automatically. Treating it as an error is the only solution.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

@mperham Hmm, that's a good point. If we're going to add rescue_from we shouldn't recommend using it with NotFound in transactional databases. It might still be nice for other errors (although it takes away from the clear flow of #perform a bit), and it's still useful for Mongoid::DocumentNotFound, for example, as it doesn't have transactions anyway. But yeah, transactions make this far less clear-cut than I thought.

from activejob.

dhh avatar dhh commented on July 28, 2024

Not sure I follow the point about transaction issue? Just don't enqueue a job before you're out of the transaction?

I think it still makes plenty of sense to catch NotFound for the common case of something being queued, but another job or something else has destroyed the record before the job is run.

Other examples include catching MySQL connection errors and the like, and retrying the job later.

I like throw :retry | :continue (basically swallow) | :finished (job is done).

from activejob.

DouweM avatar DouweM commented on July 28, 2024

@ddh Lots of people/existing gems use after_save :enqueue_stuff, which could(/will always?) fire before the transaction is committed, so if the queue immediately performs, the .find query will fail one or more times until the transaction is committed.

from activejob.

dhh avatar dhh commented on July 28, 2024

Sounds like we need to fix that problem in AR. Add a callback like :after_transaction. Then people can switch to that if they want to create jobs as part of the callback cycle.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

Sounds like a good idea.

Existing gems using after_save rather than after_transaction aren't using ActiveJob yet anyway, so they won't run into any trouble if they're just letting the error through currently. We'd just need to make it very clear that for future enqueueing using AJ, one would need to use after_transaction rather than after_save.

from activejob.

dhh avatar dhh commented on July 28, 2024

Sounds good. Want to make an AR PR for this? Then we can reference that in the docs here.

AJ will come out with Rails 4.2 so we can just make sure that the AR PR has been incorporated by then.

On May 20, 2014, at 6:21 PM, Douwe Maan [email protected] wrote:

Sounds like a good idea.

Existing gems using after_save rather than after_transaction aren't using ActiveJob yet anyway, so they won't run into any trouble if they're just letting the error through currently. We'd just need to make it very clear that for future enqueueing using AJ, one would need to use after_transaction rather than after_save.


Reply to this email directly or view it on GitHub.

from activejob.

rafaelfranca avatar rafaelfranca commented on July 28, 2024

We already have after_commit for this.

from activejob.

dhh avatar dhh commented on July 28, 2024

Ha. Well, we’ve apparently thought of everything :D

Makes it even easier. Just need to document that this is the way to go, then.

On May 20, 2014, at 6:34 PM, Rafael Mendonça França [email protected] wrote:

We already have after_commit for this.


Reply to this email directly or view it on GitHub.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

@rafaelfranca Do we now? In that case the transaction issue really isn't an issue and people should just be taught to use that.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

@mperham Were there any other reasons why you never implemented this, or was it because of people using after_save where they should after_commit and backward compatibility?

from activejob.

rafaelfranca avatar rafaelfranca commented on July 28, 2024

We have since Rails 3.0 (but maybe even before)

from activejob.

mperham avatar mperham commented on July 28, 2024

Keep in mind sometimes you want the entire transaction to fail if you can't enqueue the job for some reason:

# checkout! uses an implicit transaction so if anything is raises, everything rolls back
cart.checkout! do
  CartMailer.delay.send_thank_you(cart.id)
end

Obviously the database-based queues don't have this problem but it's a concern for Redis, beanstalkd, etc.

from activejob.

dhh avatar dhh commented on July 28, 2024

People will just have to pick their queues accordingly if that’s a concern. GlobalID is a monumental win for simplicity and clarity in other regards, so imo well worth that drawback.

On May 20, 2014, at 6:42 PM, Mike Perham [email protected] wrote:

Keep in mind sometimes you want the entire transaction to fail if you can't enqueue the job for some reason:

checkout! uses an implicit transaction so if anything is raises, everything rolls back

cart.checkout! do
CartMailer.delay.send_thank_you(cart.id)
end
Obviously the database-based queues don't have this problem but it's a concern for Redis, beanstalkd, etc.


Reply to this email directly or view it on GitHub.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

I think it's a valid concern, but note that people that require that can just use throw :retry in their rescue_from ActiveRecord::NotFound handler.

from activejob.

dhh avatar dhh commented on July 28, 2024

👍

On May 20, 2014, at 6:45 PM, Douwe Maan [email protected] wrote:

Also note that people that require that can just use throw :retry in their rescue_from ActiveRecord::NotFound handler.


Reply to this email directly or view it on GitHub.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

So let's move on to some practical questions before this can be implemented:

  • What should throw :retry do with backends that don't automatically retry jobs that raise an error?

    I know Sidekiq does this, so we'd just say raise exception in our rescue block, but others (See Features on the wiki) apparently only have "global retries", whatever those are, and some don't support it at all so we'd need to explicitly re-queue the job? This could easily flood all of the workers.

  • What should throw :continue do with errors occurring during job.new.perform rather than Parameters.deserialize?

    In the latter case we can just perform the job with params where the missing model(s) are nil, but if the error is raised during #perform, we obviously can't just go on with the method. Would throw :continue just do the same as throw :finished in this case, that is silently having the job fail? This would be very confusing, as throw :continue would only work for ActiveRecord::NotFound, and even then only when that error occurs during parameter deserialization, which the user can't "see".

  • throw :finished is straightforward in the sense that it does nothing at all. It simply lets the rescue block finish and doesn't re-raise the error, thus letting the queue regard the job as successful. Because this is the same behaviour as no throw at all, it could be left out if you asked me.

from activejob.

dhh avatar dhh commented on July 28, 2024

On retry, we could just handle that in-house. That is, 1) rescue, 2) swallow (with logging), 3) then enqueue another job with exactly the same parameters.

Not sure :continue actually makes sense. Let's just start with throw :finished and :retry for now.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

Agreed with skipping :continue. Anything special the user would want to do if model == nil inside #perform, they can do from the rescue handler.

As for in-house retrying, I suggest we implement exponential backoff like Sidekiq does when the adapter supports #enqueue_at: https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry. For backends that don't, we'll have no choice but to re-queue immediately and pray we don't flood the workers.

You're sure you want an explicit throw :finished? There's no explicit finished to balance retry in regular Ruby rescue either.

from activejob.

dhh avatar dhh commented on July 28, 2024

Sounds good on the enqueue_at delay. I think the control flow is clearer when all rescue_from’s end with a throw.

On May 21, 2014, at 3:29 PM, Douwe Maan [email protected] wrote:

Agreed with skipping :continue. Anything special the user would want to do if model == nil inside #perform, they can do from the rescue handler.

As for in-house retrying, I suggest we implement exponential backoff like Sidekiq does when the adapter supports #enqueue_at: https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry. For backends that don't, we'll have no choice but to re-queue immediately and pray we don't flood the workers.

You're sure you want an explicit throw :finished? There's no explicit finished to balance retry in regular Ruby rescue either.


Reply to this email directly or view it on GitHub.

from activejob.

DouweM avatar DouweM commented on July 28, 2024

All right. throw :finished would be the default like with ActionController rescue_from, but I agree adding it explicitly makes the control flow clearer, especially with multiple rescue_from blocks or with multiple throws in one block. Makes for easy early returning as well.

I'm up for implementing this and will probably make a start today.

from activejob.

dhh avatar dhh commented on July 28, 2024

Throw turned out to actually be more hassle than it was worth. I went with an explicit retry* implementation instead: ef4aff0

from activejob.

DouweM avatar DouweM commented on July 28, 2024

Alright, looks nice enough to me!

from activejob.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.