Code Monkey home page Code Monkey logo

Comments (8)

ankon avatar ankon commented on June 2, 2024 2

Quoting myself a bit:

I think passing a canceled context is definitely wrong.
I did find this interesting discussion: golang/go#54775. I'm almost leaning towards "don't pass a context to cleanup at all", or maybe: New context w/o any timeout unless it actually has a meaning in the operation.

I don't think there's any good timeout that certmagic would be able to define, and there's also no good sane way to ask for one in the configuration ("unlock-timeout-when-using-a-weird-storage"?), so it's probably the storage side that needs to know what makes sense.

The referenced discussion reads like "just stop and abandon properly, the context is now canceled and there's nothing wrong with that" -- and in that case I as storage should actually pass a new context to my functions that do funky I/O. Or in other words: If this would be not a function but inlined, then I would still have to call my I/O-y function, and I would have to pass a new context latest at that point.

(Disclaimer: Way out of my depth here, too, let's see what the audience joker gives :D)

What do you think, would that work for your deployment?

In our specific case a 10s timeout should be plenty to basically do a DynamoDB DeleteItem call, so it would work.

from certmagic.

ankon avatar ankon commented on June 2, 2024 1

Or maybe we SHOULD make a new context that isn't canceled, but has a timeout, so that unlock can still finish, but the time is set.

I think I have seen something like this when reading other golang code, but I cannot remember where and point to that pattern. In a way it sounds reasonable, and I don't think that I-as-storage should be allowed to take ages to cleanup.

But then again, I couldn't come up with a good timeout here -- 1s? 10s? 10% of the available timeout?


Incidentally, before I forget this yet again:

ctx := context.TODO() // TODO: get a proper context? from somewhere...
the answer to the TODO is probably ctx := clientHello.Context(). Can file PR if that makes sense, but it has this risk that there's another place where this now would break because we missed something somewhere.

... and that might answer your rhetorical question: Because actually GetCertificate passes a TODO context by default.

from certmagic.

mholt avatar mholt commented on June 2, 2024 1

As for the context used for locking...

I feel like the most correct thing to do is to pass a fresh new context with our own timeout. Of course, the obvious question is, what should the duration be?

I don't know of course... but maybe like 10s?

I made a Twitter poll just for fun: https://twitter.com/mholt6/status/1689331972114690048 -- but admittedly it's a niche/obscure question.

What do you think, would that work for your deployment?

from certmagic.

mholt avatar mholt commented on June 2, 2024 1

Ok, actually, I think your logic makes sense.

I'll change our code to pass in a fresh context that is not canceled and has no timeout. If we encounter problems with hanging unlocks, we can enforce a timeout. (Or the implementation just needs to be fixed; either way.)

from certmagic.

ankon avatar ankon commented on June 2, 2024

I found the underlying cause of the canceled context by now: The background renewal of a not-yet-expired certificate is using the same context as was given to GetCertificateWithContext, which in our case is actually the handshake context coming from https://pkg.go.dev/crypto/tls#ClientHelloInfo.Context.

See also #248: The background renewal should use a different context, IMHO it is wrong to assume that it can use the one given to GetCertificate, which is supposed to be called inside the TLS handshake and as such does have a limited time to do things.

from certmagic.

mholt avatar mholt commented on June 2, 2024

Is there anything we can do better here? For instance, would it be reasonable to pass another context into releaseLock/Unlock that is not canceled? Or should we "ignore" the cancelation on our side?

Good question! Hmm.

Part of me thinks, ignore the context because the point is to cleanup after the context is cancelled. The other part of me thinks, if the context is cancelled, it's important to abort to avoid a goroutine leak (or something).

Or maybe we SHOULD make a new context that isn't canceled, but has a timeout, so that unlock can still finish, but the time is set. Is that reasonable? Or do you think the storage implementation should have control of the timeout and set its own when it encounters an already-canceled context?

I found the underlying cause of the canceled context by now: The background renewal of a not-yet-expired certificate is using the same context as was given to GetCertificateWithContext, which in our case is actually the handshake context coming from https://pkg.go.dev/crypto/tls#ClientHelloInfo.Context.

See also #248: The background renewal should use a different context, IMHO it is wrong to assume that it can use the one given to GetCertificate, which is supposed to be called inside the TLS handshake and as such does have a limited time to do things.

I think I agree. Hadn't really considered this, but a background routine indeed should not be confined to a single handshake.

But why haven't we encountered this problem before? 🤔 (rhetorical question, and mostly for me)

from certmagic.

francislavoie avatar francislavoie commented on June 2, 2024

Go 1.21 adds context.WithoutCancel() that would be useful here. But we can't bump the minimum version for certmagic quite yet (not before Caddy does, and we usually wait till the next release so there's 2 versions supported at any given time)

from certmagic.

mholt avatar mholt commented on June 2, 2024

@ankon

the answer to the TODO is probably ctx := clientHello.Context(). Can file PR if that makes sense, but it has this risk that there's another place where this now would break because we missed something somewhere.

I somehow missed that Go 1.17 added that.

Scanning the code, I think that is a correct solution: the context should be limited to the handshake. If the client aborts the handshake, the server should abort whatever it's doing. UNLESS we start a background maintenance goroutine, in which case we should probably use a different context (done in #248).

from certmagic.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.