
trio's Introduction


Trio – a friendly Python library for async concurrency and I/O


The Trio project aims to produce a production-quality, permissively licensed, async/await-native I/O library for Python. Like all async libraries, its main purpose is to help you write programs that do multiple things at the same time with parallelized I/O. A web spider that wants to fetch lots of pages in parallel, a web server that needs to juggle lots of downloads and websocket connections simultaneously, a process supervisor monitoring multiple subprocesses... that sort of thing. Compared to other libraries, Trio attempts to distinguish itself with an obsessive focus on usability and correctness. Concurrency is complicated; we try to make it easy to get things right.

Trio was built from the ground up to take advantage of the latest Python features, and draws inspiration from many sources, in particular Dave Beazley's Curio. The resulting design is radically simpler than older competitors like asyncio and Twisted, yet just as capable. Trio is the Python I/O library I always wanted; I find it makes building I/O-oriented programs easier, less error-prone, and just plain more fun. Perhaps you'll find the same.

This project is young and still somewhat experimental: the overall design is solid, and the existing features are fully tested and documented, but you may encounter missing functionality or rough edges. We do encourage you to use it, but you should read and subscribe to issue #1 to get a warning and a chance to give feedback about any compatibility-breaking changes.

Where to next?

I want to try it out! Awesome! We have a friendly tutorial to get you started; no prior experience with async coding is required.

Ugh, I don't want to read all that – show me some code! If you're impatient, then here's a simple concurrency example, an echo client, and an echo server.

How does Trio make programs easier to read and reason about than competing approaches? Trio is based on a new way of thinking that we call "structured concurrency". The best theoretical introduction is the article Notes on structured concurrency, or: Go statement considered harmful. Or, check out this talk at PyCon 2018 to see a demonstration of implementing the "Happy Eyeballs" algorithm in an older library versus Trio.

Cool, but will it work on my system? Probably! As long as you have some kind of Python 3.8-or-better (CPython or currently maintained versions of PyPy3 are both fine), and are using Linux, macOS, Windows, or FreeBSD, then Trio will work. Other environments might work too, but those are the ones we test on. And all of our dependencies are pure Python, except for CFFI on Windows, which has wheels available, so installation should be easy (no C compiler needed).

I tried it, but it's not working. Sorry to hear that! You can try asking for help in our chat room or forum, filing a bug, or posting a question on StackOverflow, and we'll do our best to help you out.

Trio is awesome, and I want to help make it more awesome! You're the best! There's tons of work to do – filling in missing functionality, building up an ecosystem of Trio-using libraries, usability testing (e.g., maybe try teaching yourself or a friend to use Trio and make a list of every error message you hit and place where you got confused?), improving the docs, ... check out our guide for contributors!

I don't have any immediate plans to use it, but I love geeking out about I/O library design! That's a little weird? But let's be honest, you'll fit in great around here. We have a whole sub-forum for discussing structured concurrency (developers of other systems welcome!). Or check out our discussion of design choices, reading list, and issues tagged design-discussion.

I want to make sure my company's lawyers won't get angry at me! No worries, Trio is permissively licensed under your choice of MIT or Apache 2. See LICENSE for details.

Code of conduct

Contributors are requested to follow our code of conduct in all project spaces.


trio's Issues

Windows: wait_{read,writ}able limited to 512 sockets

This is because we currently use select.select. The limit is entirely artificial -- the underlying fd_set object is just a (int length, array of SOCKET integers) structure which we could allocate at whatever size we wanted, if we wanted. Or we could use WSAPoll.

Add mechanism to fetch current deadline

Motivation: this can be useful for deciding not to initiate costly actions if we know we won't have time, or for passing a timeout along to with requests using protocols that support that like gRPC.

But this is a bit tricky.

It's easy to figure out the current deadline if we only consider cancel scopes local to the current task -- just iterate over the cancel stack and take the minimum deadline.
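In sketch form (a minimal illustration, assuming each task keeps a _cancel_stack of scopes, each with a .deadline attribute that is math.inf when unset):

import math

def current_deadline_local(task):
    # Only considers this task's own cancel scopes -- see below for why
    # that's not the whole story.
    return min((scope.deadline for scope in task._cancel_stack),
               default=math.inf)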

But often this is not the most informative thing. The effective deadline is actually the minimum of all of the current task's deadlines and all of the ancestor tasks' deadlines.

# This deadline effectively applies to the child task
with move_on_after(10):
    async with open_nursery() as nursery:
        nursery.spawn(child)
        # This one doesn't
        with move_on_after(5):
            ...

async def child():
    print(current_deadline())

We can even write it in a more confusing way...

with move_on_after(10):
    async with open_nursery() as nursery:
        with move_on_after(5):
            nursery.spawn(child)

Maybe this requires refactoring so that cancel scopes and nurseries go onto a common stack?

pytest plugin

Extract what we've got into a proper standalone plugin

(Or alternatively make the regular trio package include an entry point? But probably it is better to decouple them a bit, in case we ever have to make an emergency release due to pytest breaking something -> don't want to force people to upgrade trio itself for that.)

Synchronization primitives

ParkingLot is cool and has rigorous tests, but beyond that our story is currently a little flaky. We should have a solid set of core synchronization primitives, probably something like:

  • Lock (possibly re-entrant only?)
  • RWLock (maybe?)
  • Event
  • Semaphore (bounded versus unbounded?)
  • Condition
  • Queue (leaning towards: regular queue is always bounded and takes the queue size as a mandatory argument; and then there's UnboundedQueue with a different API, e.g. get_all instead of get, no join but only join_nowait, etc.) [Edit: this split is not implemented]

(There's currently draft sketches of some of these things in trio._sync, but mostly untested.)

Design: alternative scheduling models

Currently we use a simple round-robin scheduling strategy. It isn't technically FIFO, because we randomize the order of execution within each "batch" in an attempt to force people not to make too many assumptions about the scheduling strategy :-). But it's basically FIFO in terms of sophistication.

Can we do better?

There's a rich literature on fancy scheduling, e.g. the weighted-fair queuing used by the Linux kernel's CFS ("completely fair scheduler"). And there are a lot of challenges in building async applications that come down to scheduling issues (e.g., #14 and https://stackoverflow.com/questions/40601119). But AFAICT a lot of the existing scheduling literature assumes that you have a pre-emptive scheduler; there's very little out there on applying these ideas to cooperative scheduling systems. In some ways the network packet scheduling literature is more relevant to us, because packets come in different sizes and the scheduler doesn't get to change that. (OTOH, packet scheduling algorithms tend to assume you can look into the future and predict how long a packet will spend transmitting, whereas when we start a task step we don't know how long it will run before yielding.)

Does it matter though?

It's not entirely clear that "fairness" is actually the solution to the problems linked above! Fair task scheduling is in part about negotiating between somewhat adversarial users (e.g. different TCP connections fighting for their share of a link), which isn't as obvious a fit to the tasks in our system. Though OTOH those tasks may be doing work on behalf of different competing users. And scheduler algorithms only matter when a program is CPU-bound, which hopefully trio programs usually aren't. But OTOH even programs that are I/O-bound overall will become CPU-bound in bursts, and it's exactly when things are melting down under load that you'd most like to handle things gracefully. OTOOH the real solution is often going to be something like load shedding or otherwise rate-limiting incoming work; better scheduling isn't a silver bullet to fix load issues.

So, I really don't know whether this is actually a good/useful idea for trio or not. But it might be a place where we can make a substantial difference to trio programs' usability in a central, principled way, so that seems worth exploring!

Options

In principle, a WFQ scheduler is actually very simple to implement (see below). What's not so clear is whether this would actually help in practice. There are two obvious issues:

  • If some task hogs the CPU for 100 ms, then there is no scheduling policy that can possibly avoid this causing a 100 ms spike for everyone else. You can get amortized fairness in the long run by exiling that task from the CPU for a comparatively long time afterwards, but the end result is still going to be jittery and spiky. Conclusion: nothing we do at the scheduler level is going to have a great effect unless user code is written to yield with a good frequency. I guess the solution here is to make sure we have great tools for profiling user code and warning of CPU hogs. The instrumentation API is a good start.

  • In a cooperative scheduler, tasks run until they block and become non-runnable. In pure WFQ, the scheduler only cares about runnable tasks; a task that blocks effectively disappears, and when it becomes runnable again it's treated like a new task with zero history. This is okay in a pre-emptive scheduler where tasks don't have a chance to "overdraw" CPU time, but doesn't make much sense for us. For us, the way you treat a task that wakes from sleep is the whole question.
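To make "very simple to implement" concrete, here is a minimal sketch of WFQ's virtual-time bookkeeping (illustrative only, not a proposed API; note that in a cooperative scheduler the cost of a step is only known after it runs, so it would have to be charged retroactively):

import heapq
import itertools

class WFQ:
    def __init__(self):
        self._vtime = 0.0      # global virtual clock
        self._heap = []        # entries: (virtual finish time, tiebreak, task)
        self._tiebreak = itertools.count()

    def enqueue(self, task, cost, weight=1.0):
        # Higher weight => charged less virtual time => scheduled more often.
        start = max(self._vtime, getattr(task, "vfinish", 0.0))
        task.vfinish = start + cost / weight
        heapq.heappush(self._heap, (task.vfinish, next(self._tiebreak), task))

    def next_task(self):
        # Always run the runnable task with the earliest virtual finish time.
        vfinish, _, task = heapq.heappop(self._heap)
        self._vtime = max(self._vtime, vfinish)
        return task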

Maybe what we really need is better knobs to let users set the priority of different tasks. (WFQ makes it easy to implement relative priorities, and relatively easy to implement hierarchical scheduling policies; it's also pretty straightforward to implement strictly tiered priorities like the Linux kernel's realtime priorities.) But "here's 100 knobs to tune, try tweaking some whenever your server performance degrades" is a pretty bad UX too. There's something to be said for the predictability of FIFO.

In general, this is a deep problem that will definitely require experience and data and visualizations of real apps on top of trio.

Prior art

Ironically, the Python GIL is essentially a cooperative scheduling system (though a weird one), and probably the one with the best public analysis!

The classic Mac OS used cooperative scheduling heavily. This was known as the "thread manager". I'm very curious what their scheduling strategy looked like, but I don't know how well documented it is. The "Inside Macintosh: Thread Manager" book (PDFs are floating around on the web) might have more details. [Edit: there's now some notes on this on the reading list wiki page.]

Alternatively...

Alternatively, if we decide that we don't want a fancy scheduling system, we could go the other way, and actually guarantee deterministic FIFO scheduling, on the theory that determinism is generally a nice thing if you aren't losing anything else to get it.

remove abort callback support from ParkingLot

I originally put this in because I was hoping to make ParkingLot the only public interface to sleeping, even at the hazmat level (i.e., keeping yield_indefinitely as an implementation detail). But:

  • I ended up exposing yield_indefinitely anyway because it really is the natural way to implement things like sleep and run_in_worker_thread

  • The abort callback doesn't really make sense in ParkingLot's semantics. If a task needs a non-trivial abort, then it's because before parking it set up some work to eventually cause itself to unpark. But abort is task-scoped while unpark is lot-scoped, so how would this work exactly? How do we know that the work being aborted would have woken up this task as opposed to another one? It really only makes sense if you use ParkingLot in a degenerate way with only 1 parking task. Which you can do, and currently it would be possible to implement something like run_in_worker_thread this way. But it's a bit clumsy compared to just using yield_indefinitely directly, and less efficient to boot.

So: let's remove the abort callback from ParkingLot.

Is our current strategy for handling instrument errors optimal?

Right now, if an instrumentation callback raises an exception, we (a) log the traceback to stderr (see #306), (b) disable that instrument, (c) carry on.

This is the only place in trio that discards exceptions instead of handling or propagating them. Is this the right choice?

The motivation is that instrumentation gets called at all kinds of weird times where it's quite difficult to propagate an error, and it's not clear where to propagate to in any case; and while we generally prefer to crash early and noisily, it's still probably true that no-one wants their instrumentation to take down their server (I think).

The downsides are the obvious ones: it's easy to miss stuff dumped to stderr, if some tool is trying to automatically collect instrumentation then it could get wedged in unexpected ways if the instrument just disappears, etc.

So, I'm not entirely sure this is the best approach, and would be interested to hear what others think.

system_task_wrapper doesn't properly handle MultiError

system_task_wrapper has some code whose intention is to let some exceptions through (like Cancelled) and wrap others in TrioInternalError. But right now this logic totally fails to handle MultiErrors.

It's not super urgent, since this probably only comes up if someone makes a system task that hosts children and lets their exceptions out. But still, it should be fixed...

Relevant: #49

Windows: improve usability for low-level IOCP operations

  • Is wait_overlapped the best way to handle "call a function that takes an OVERLAPPED"?
  • Make wait_overlapped return the information in OVERLAPPED_ENTRY, like dwNumberOfBytesTransferred
  • Document everything

These haven't really been thought through and need a good look over.

Design: task supervision and results

Ugh, this is a big open area.

I'm somewhat hesitant to build a specific supervision framework into trio right away, because it feels like an area that needs some exploration. OTOH I guess trio is unstable, so whatever? But what I'm most concerned about is getting the underlying APIs right; I'm pretty sure we want some lower level API to exist beyond the actual supervisor system.

I really don't like the current .join() design though, because it's way too easy to write task.join() and now you've just swallowed an exception.

trio.ssl handling for Unicode (IDNA) domains is deeply broken

trio.ssl relies on the stdlib ssl module for hostname checking. Unfortunately, when it comes to non-ASCII domain names, it's totally broken. Which means that trio.ssl is also totally broken. In other ways, trio handles IDNA well (better than the stdlib). But this is hard to work around.

We don't have a lot of great options here. We could:

  • Substantially rewrite trio.ssl to use PyOpenSSL instead. But this requires adding a bunch of tricky code, and also would break our API. (We currently take a stdlib ssl.SSLContext object as input, and it's impossible to read the SNI callback and other critical attributes off of an ssl.SSLContext object, which means that it's impossible to correctly convert a stdlib SSLContext into a PyOpenSSL equivalent. See bpo-32359.)

  • Hope that upstream ssl gets better, or fix it ourselves. Not impossible, but maybe not a thing to hold our breath on either.

  • Hope that PEP 543 comes along and saves us.

  • ....?

I'm not sure what to do, but at least now we have a bug to track the problem...

Original text:


IDNA support in trio.socket

Help I have no idea what I'm doing here.

Probably also touches on TLS support, #9.

Which clock should we use on Windows?

Right now we use time.monotonic everywhere. This is a bit problematic on Windows, where time.monotonic is GetTickCount64, which has ~15 ms resolution. The other option is QueryPerformanceCounter, which has much higher resolution, is also monotonic, and is exposed as time.perf_counter (checked with time.get_clock_info).

The reason for this appears to be that QueryPerformanceCounter has had a troubled past: https://www.python.org/dev/peps/pep-0418/#windows-queryperformancecounter

But all these issues are like "Virtualbox had a bug that was fixed 6 years ago" or "Windows XP is flaky" which is probably true but irrelevant since we don't support it – it's not clear that any of them apply anymore.

Advantages of using a higher-precision clock:

  • Right now if there's just one task running and it does a sleep for t seconds, the actual timeout passed to the underlying system call will be (t + current_time()) - current_time(), which can be pretty imprecise depending on whether the clock ticks over between the two calls. OTOH I don't know what the actual resolution of our sleeping syscall is (currently GetQueuedCompletionStatusEx), or whether anyone cares about millisecond accurate sleeps.

  • I've had two bugs already with tests that assumed that time always, like, passes. These were trivial (like replacing a < with a <=), but it's always annoying to have the tests pass locally then fail on appveyor.

  • If we implement a fancier scheduling system (#32) then we'll definitely need better than 15 ms precision to measure tasks running. (Though there's no reason that the clock we use for that has to match the clock we use for deadlines.)

Backwards-incompatible changes - SUBSCRIBE TO THIS THREAD if you use trio!

Stability is great for users! It lets them focus on solving their problem without worrying about their platform shifting under their feet. But stability is also bad for users! It means that anywhere an API is error-prone or hard to use or missing important functionality, they're stuck with it. Making the optimal trade-off here is tricky and context-dependent.

Trio is a very young project, that contains lots of new ideas, and that doesn't have much code built on top of it yet. So as we build things with it we'll probably discover places where the API is less awesome than desired, and for now we'll be relatively aggressive about fixing them. Hopefully we won't discover any real stinkers, but you never know. Then over time we'll gradually transition over to become more stable as we flush out the bad stuff and get more users.

This means that if you're an early adopter of Trio, it'd be good to have some strategy to make this as painless as possible. Our suggestions:

  • Pin your version. For example, in your install_requires= or requirements.txt, do not write: trio >= 0.1.0. Instead, write: trio ~= 0.1.0.

    • You can also use == if you prefer. The difference is that while both == 0.1.0 and ~= 0.1.0 will disallow upgrading to 0.2.0, ~= allows upgrading to 0.1.1 but == 0.1.0 does not. Our intention is that 0.x.y and 0.x.(y+1) will be backwards compatible.
  • Please do report back on how trio is working out for you, e.g. by posting a comment on this issue.

    • Especially any rough spots you ran into where the API wasn't as helpful as it could be.
    • Especially if you didn't run into any rough spots, because that information is incredibly valuable in helping us decide when to declare things stable!
  • Subscribe to this issue (for example, by pressing the little "Subscribe" button in the right column →). We'll bring up backwards-incompatible changes here before we make them, so this will give you fair warning and a chance to give feedback.

Design: do we need batched accept / accept_nowait?

https://bugs.python.org/issue27906 suggests that it's important to be able to accept multiple connections in a single event loop tick.

It's trivial to implement, too -- just

def accept_nowait(self):
    try:
        return self._sock.accept()
    except BlockingIOError:
        raise _core.WouldBlock from None
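and using it from the listener loop might look something like this (sketch; the wait_readable-style wakeup call is an assumption about the hazmat API):

async def accept_batch(listener):
    # Block until at least one connection is pending...
    await wait_readable(listener)
    # ...then drain everything that's immediately available, all in one
    # event loop tick.
    batch = []
    while True:
        try:
            batch.append(listener.accept_nowait())
        except _core.WouldBlock:
            return batch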

But...

I am... not currently able to understand how/why this can help. Consider a simple model where after accepting each connection, we do a fixed bolus of CPU-bound work taking T seconds and then immediately send the response, so each connection is handled in a single loop step. If we accept only 1 connection per loop step, then each loop step takes 1 * T seconds and we handle 1 connection/T seconds on average. If we accept 100 connections per loop step, then each loop step takes 100 * T seconds and we handle 1 connection/T seconds on average.

Did the folks in the bug report above really just need an increased backlog parameter to absorb bursts? Batching accept() certainly increases the effective listen queue depth (basically making it unlimited), but "make your queues unbounded" is not a generically helpful strategy.

The above analysis is simplified in that (a) it ignores other work going on in the system and (b) it assumes each connection triggers a fixed amount of synchronous work to do. If it's wrong it's probably because one of these factors matters somehow. The "other work" part obviously could matter, if the other work is throttlable at the event loop level, in the sense that if loop steps take longer then they actually do less work. Which is not clear here (the whole idea of batching accept is to make it not have this property, so if this is how we're designing all our components then it doesn't work...).

I guess one obvious piece of "other work" that scales with the number of passes through the loop is just, loop overhead. One would hope this is not too high, but it is not nothing.

If we do want this, then #13 will want to use it.

The same question probably applies to recvfrom / sendto -- right now a task can only send or receive 1 UDP packet per event loop tick.

fix cancel

Rename cancel_nowait -> cancel, and make it repeatable (calling it more than once should be safe).

Make it easier to tell where a Cancelled exception came from

Right now repr(exc) just gives Cancelled() which is not so helpful when trying to debug. Maybe something like <Cancelled, scope 3 in task some_func-3>?

(Even nicer if we could provide line numbers, but I think that would be expensive? Also misleading when the scope is hidden inside open_nursery or move_on_at.)

Missing: quick-start helper for servers

Some way to specify a listening socket (or maybe some endpoints? #8) and have it listen and spawn workers as connections come in.

Depends on #7 (task supervision). Maybe also #9 (TLS API).

Coalesce unix signals

Right now we queue up unix signals using a regular call_soon and a regular Queue(UNBOUNDED). This is suboptimal, because if a bunch of signals arrive all at once then we'll queue up a separate entry for every single one, even though the extra entries carry no information.

Unix signal semantics are that they are flags, not events, i.e., the information is "this signal was raised at least once". We totally can coalesce signals.

This has two parts:

In the guts of _runner.py: a version of call_soon that enqueues the callback into a set rather than a deque. call_soon(..., idempotent=True), maybe. This should be pretty straightforward and might well have other applications. The one tricky bit is how to do a batched get from a set in a thread/signal-safe way. I think the solution is:

batch = call_soon_set.copy()
for thunk in batch:
    call_soon_set.remove(thunk)
    thunk.call()

In trio._signals: for delivering the signals, use a bespoke kind of queue-like object that keeps a set instead of an ordered list of events to deliver.

The latter part changes API, so we should do it sooner rather than later.

OTOH this is not that urgent, it's really a bit of polish.

Handling of SO_REUSEADDR and friends

Before I looked into it, my assumption was that we should default SO_REUSEADDR to enabled. But it turns out that things are more complicated than that.

The situation as I understand it:

On Unix:

  • For clients, connect on an unbound socket will automatically pick a good port + interface, and part of picking a "good" one is that it knows who you're connecting to, so it can strategically re-use local ports. (It's okay to have two client connections use the same local port so long as the peers have different addresses.)
  • For servers, bind will by default disallow re-use of ports that are in TIME_WAIT, which is generally considered over-fussy these days. So generally it's recommended to enable SO_REUSEADDR, which allows binding to ports that are in TIME_WAIT but otherwise unused.
  • For clients that call bind before connect, you probably don't want to use SO_REUSEADDR, because it makes it possible to get a port that ends up failing when you call connect (ref). Though really you're best off not calling bind at all, because connect can do a better job of binding than you can, because it has more information at hand. [Edit: on recent Linux there's also sock.setsockopt(IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1); sock.bind((host, 0)), which means "bind me to this host, but delay picking the port until I call connect".]

On Windows:

  • For clients, the plain connect function acts similar to Unix, AFAIK. (The WSA-level functions like ConnectEx are different and require you to bind first, but ATM we aren't using those.)
  • For servers, you have to enable SO_EXCLUSIVEADDRUSE or else any program with the same uid can hijack your port. (Yes! At least this is better than it used to be – in XP and earlier, they didn't even have the uid check.) This is also required to prevent weird problems like being allowed to bind to a wildcard address + port where there is already another program bound to that port on all the concrete addresses. However, the downside is that it also prevents re-using ports that are in TIME_WAIT.
  • Never ever ever use SO_REUSEADDR, it's totally broken

(Reference for the delightful Windows behavior)

So one option would be to default-enable SO_REUSEADDR on Unix and SO_EXCLUSIVEADDRUSE on Windows. I'm a bit concerned about whether this will have a negative effect on clients, though – maybe we only want this to be the default for listening sockets? That's trickier. I guess we could set it in bind if not overridden? Or maybe we should keep it simple and say that it's default-enabled, and if you want to turn it off again then go for it.
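In code, that default would be roughly (sketch only; note socket.SO_EXCLUSIVEADDRUSE exists only on Windows):

import socket
import sys

def prepare_listen_socket(sock):
    if sys.platform == "win32":
        # Blocks same-uid port hijacking; sadly also blocks TIME_WAIT reuse.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
    else:
        # Allow rebinding to ports stuck in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)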

Also, we should probably just not even expose SO_REUSEADDR on Windows, b/c it is a massive trap. Or even make trying to access it raise AttributeError: no really you don't want this, see <link>.

Missing piece: TLS support

It should be easy and ergonomic to use TLS with trio. This needs to be in the core.

My general idea is that trio.socket only exposes raw sockets, not fake-sockets-that-have-ssl-wrapped-around-them, and then we have a Stream adaptor that applies TLS. Initially using ssl.SSLObject and the BIO interfaces, but we should also keep a close eye on the plans for a new set of TLS APIs in the 3.7 time-frame (unfortunately now is kind of the worst time to be defining a TLS API!): Lukasa/peps#1
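For reference, the SSLObject/BIO plumbing looks roughly like this (these are real stdlib APIs; the Stream adaptor around them is the part that needs designing, and the hostname here is just an example):

import ssl

ctx = ssl.create_default_context()
incoming = ssl.MemoryBIO()    # ciphertext received from the network goes in here
outgoing = ssl.MemoryBIO()    # ciphertext to be sent comes out here
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")
# The adaptor's job is a loop: feed raw socket bytes into incoming.write(),
# call tls.do_handshake()/tls.read()/tls.write(), and flush outgoing.read()
# back to the raw socket, retrying whenever SSLWantReadError is raised.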

Anyway, I like this general design, but there are questions about how to make it ergonomic. Getting a socket + TLS is a pretty fundamental thing; we don't want it to feel cumbersome. And you might want to do things like call getsockopt or getpeername on your socket, which is tricky if you just called something like create_connection that handed you back a wrapped stream that doesn't have those methods! (Or worse, might hand back either a stream or a socket depending on the arguments you pass.) OTOH it would be nice if we can also make it pleasant to work with more complicated things like SNI callbacks etc. -- in jongleur I found curio's helper stuff pretty useless and immediately moved to separate socket and wrapped-socket objects anyway.

The thing where, for clients, both the socket connect and the TLS handshake need to know the remote hostname is also a challenge for layering.

#8 is closely related.

Graceful handling of sockets (or whatever) getting closed while in use

Suppose we have one task happily doing its thing:

async def task1(sock):
    await sock.sendall(b"...")

and simultaneously, another task is a jerk:

async def task2(sock):
    sock.close()

It would be nice to handle this gracefully.

How graceful can we get? There is a limit here, which is that (a) it's Python so we can't actually stop people from closing things if they insist, and (b) the OS APIs we depend on don't necessarily handle this in a helpful way. Specifically I believe that for epoll and kqueue, if a file descriptor they're watching gets closed, they just silently stop watching it, which in the situation above would mean task1 blocks forever or until cancelled. (Windows select -- or at least the select.select wrapper on Windows -- seems to return immediately with the given socket object marked as readable.)

As an extra complication, there are really two cases here: the one where the object gets closed just before we hand it to the IO layer, and the one where it gets closed while in possession of the IO layer.

And for sockets, one more wrinkle: when a stdlib socket.socket object is closed, then its fileno() starts returning -1. This is actually kinda convenient, because at least we can't accidentally pass in a valid fd/handle that has since been assigned to a different object.

Some things we could do:

  • In our close methods, first check with the IOManager whether the object is in use, and if so cancel those uses first. (On Windows we can't necessarily cancel immediately, but I guess that's OK b/c on Windows it looks like closing the handle will essentially trigger a cancellation already; it's the other platforms where we have to emulate this.)

  • In IOManager methods that take an object-with-fileno()-or-fd-or-handle, make sure to validate the fd/handle while still in the caller's context. I think on epoll/kqueue we're OK right now because the wait_* methods immediately register the fd, and on Windows the register_for_iocp method is similar. But for Windows select, the socket could be invalid and we won't notice until it gets selected on in the select thread. Or it could become invalid on its way to the select thread, or in between calls to select... right now I think this will just cause the select loop to blow up.

Design: thread pools

Right now, run_in_worker_thread just always spawns a new thread for the operation, and then kills it after. This might sound ridiculous, but it's not so obviously wrong as it looks! There's a large comment in trio._threads talking about some of the issues.

Questions:

  • Should there be any global control over worker thread spawning?
  • If there is, should it be a strict limit on the number of threads, or something more subtle like a limiter on the rate at which they spawn?
  • How do administrators configure this stuff? Or instrument it?
  • What should the API to interact with it look like, e.g. do we need a way for a specific run_in_worker_thread to say that it shouldn't block waiting for a thread because it might unblock a thread?

Prior art: https://twistedmatrix.com/trac/ticket/5298

Interacting with the products at Rackspace which use Twisted, I've seen problems caused by thread-pool maximum sizes with some annoying regularity. The basic problem is this: if you have a hard limit on the number of threads, it is not possible to write a correct program which may require starting a new thread to un-block a blocked pool thread - glyph

Better ergonomics for working with MultiErrors

Unfortunately, MultiErrors aren't very nice to work with right now.

They don't print backtraces for the individual exceptions. (And unfortunately python doesn't make this particularly easy.) I'm sure we can do better than we are though.

A relatively common pattern is to want to catch a particular exception, or allow particular exceptions through (e.g., this comes up in trio._core._run quite a bit!), or transform particular exceptions, which requires "slicing" a MultiError. Right now we're doing this by hand when we need to, but it's obviously something that would be better handled by some utility functions with the right abstraction. (And there are places where we aren't handling it right, like system_task_wrapper – see #50.)
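As a strawman, a slicing utility might look like this (hypothetical helper; assumes a MultiError exposes its children as .exceptions, and ignores nested MultiErrors):

def split(exc, predicate):
    # Partition an exception (Multi or not) into (matched, unmatched) lists.
    subs = getattr(exc, "exceptions", None) or [exc]
    matched = [e for e in subs if predicate(e)]
    unmatched = [e for e in subs if not predicate(e)]
    return matched, unmatched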

Unfortunately I'm not entirely sure what this looks like. Maybe it will become clearer as we work with these things more.

Clean up / test / APIify the KeyboardInterrupt handling

This needs work in a few ways:

I'm not a big fan of how we currently respond to KeyboardInterrupt. Better idea: when the SIGINT handler fires, deliver a KeyboardInterruptCancelled exception (inherits from Cancelled, KeyboardInterrupt) to the current task immediately + all other tasks at their next cancellation point + set the special thing that makes new call_soon spawns get cancelled with this if they show up. This captures the idea that a KeyboardInterrupt is targeted at the whole process, not just one task! And then at the end the final exception should be a KeyboardInterrupt (or subclass) in most cases, rather than UnhandledExceptionError. (Maybe a strong supervision tree would help here, by routing all exceptions up through main rather than having them go straight to the crash machinery? Of course right now we literally send KeyboardInterrupt straight to the crash machinery!)

Also, there are no tests for the current KeyboardInterrupt handling.

Also, we need to provide some sort of reasonable public (hazmat) API for this stuff.

And finally we need to audit the codebase to make sure we're using this API to mark the correct things as protected. E.g. I'm pretty sure ParkingLot needs some annotations! And the run_in_trio_thread stuff is just wrong right now. (call_soon(..., spawn=True) needs to not enable interrupts, but currently it does; and then {run,await}_in_trio_thread need to enable them at the right point.)

Design: higher-level stream abstractions

There's a gesture towards moving beyond concrete objects like sockets and pipes in the trio._streams interfaces. I'm not sure how much of this belongs in trio proper (as opposed to a library on top), but I think even a simple bit of convention might go a long way. What should this look like?

Prior art: Twisted endpoints. I like the flexibility and power. I'm not as big a fan of the strings and plugin architecture, but hey.

Write a CONTRIBUTING.md

Some points to mention:

  • A test for every change

  • Keep the docs updated (uh... once we have docs)

  • Keep the test suite fast – no sleeping!

  • Speedups are awesome, but need: (a) microbenchmarks to demonstrate the change does what you expect, (b) data from some real use case to demonstrate that it matters or (c) doesn't complicate code. That's (a AND (b or c)).

  • Notes on issue labels (e.g. "todo soon" means "not aware of any big questions here, the idea is good and the plan seems straightforward")

Can probably crib most of it from someone else...

There should be some kind of rate-limiting on getaddrinfo threads

trio.socket.getaddrinfo (and some of its friends, like socket.resolve_*_address) internally spawn a thread. Right now there's no limit on this, so if you spawn 1000 tasks and they each make a socket connection, you'll briefly get 1000 threads at once. Probably it would be better if you instead got, like, 20 threads at once, 50 times. Or maybe not, I haven't tried it!

The obvious thing would be to make a global (well, thread-local) semaphore and require getaddrinfo to hold it when calling run_in_worker_thread. Not so obvious how to allow the number of threads to be tuned etc. though.
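Spelled out, the obvious thing is something like this (sketch; assumes trio.Semaphore works as an async context manager, hard-codes the limit we'd presumably want to make tunable, and uses run_in_worker_thread as elsewhere in this document):

import socket as stdlib_socket
import trio

_getaddrinfo_limit = trio.Semaphore(20)   # max concurrent resolver threads

async def getaddrinfo(*args, **kwargs):
    # Gate thread spawning: 1000 concurrent lookups now use at most 20
    # threads at a time, 50 times, instead of 1000 threads at once.
    async with _getaddrinfo_limit:
        return await run_in_worker_thread(
            stdlib_socket.getaddrinfo, *args, **kwargs)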

See also: #6

Debugging / introspection features

For quality of life:

  • A method on Task to get a traceback
  • Ability to introspect set of all tasks (compare to thread iteration APIs)
  • Given a task, quickly summarize what it's doing (see the sketch after this list). Curio has its "status" field. I don't like the manual bookkeeping this requires, and I think we can do better. Idea: walk the task's stack to find the innermost frame with a __trio_wchan__ local (or whatever), and report it as the wchan-equivalent. It could even be callable to get more details (e.g. run_in_worker_thread could set it to print the repr of the call being run). Or I guess we'd get something similar by just showing the name+arguments of the frame.
  • An async REPL that provides little utilities based on things like the above to see what's going on in your program.
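The stack-walking idea from the third bullet might look like this (entirely hypothetical convention; __trio_wchan__ is a made-up name from the bullet above):

def task_wchan(coro):
    # Follow the chain of awaited coroutines to the innermost frame that
    # defines a __trio_wchan__ local, and report it as the wchan-equivalent.
    wchan = None
    while coro is not None:
        frame = getattr(coro, "cr_frame", None)
        if frame is not None and "__trio_wchan__" in frame.f_locals:
            wchan = frame.f_locals["__trio_wchan__"]
        coro = getattr(coro, "cr_await", None)
    return wchan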

What should happen if a task is cancelled before it begins executing?

Currently, if a task is spawned and then immediately cancelled, we begin executing the task, with a note that Cancelled should be raised the first time it blocks. C# does it differently: if a task is cancelled before it begins executing, then it just never begins executing. Effectively there is an implicit cancellation point just before the task begins executing.

I feel like there is some reason why I did it this way, but I don't remember what that is :-). We should at least write down the rationale somewhere for the historical record...

One place where this makes a difference: Currently, spawn(fn, *args) calls fn(*args) immediately, within the caller's scope. Because fn is async, this doesn't actually start executing the code, but it does mean that errors like "fn is not callable" or "fn takes 3 arguments (2 given)" get raised immediately within the caller. It also means that we create the coroutine object immediately when spawn is called, and once we create the coroutine object we are committed to executing it, because otherwise Python will whine about "coroutine was never awaited". So if we switched to allowing pre-emptive cancellation to prevent task execution entirely, then we would also have to delay calling fn(*args) until just before we pumped the task for the first time. Doable, but definitely some extra complexity, and a change in user-visible semantics.

IO waits + control-C on Windows

Windows has no concept of EINTR; if you hit control-C while we're blocked in select or GetQueuedCompletionStatusEx, then the C-level signal handler runs, but the blocking call still runs to completion before returning to Python and letting the Python-level signal handler run. So currently if you do trio.run(trio.sleep, 100000) then on Windows you can't interrupt this with control-C. That's unfortunate! We should fix it.

CPython itself avoids this, e.g. time.sleep(10000) is interruptible by hitting control-C. The way they do it is: in the real C-level signal handler, in addition to setting the magic flag saying that a signal arrived, they also (if this is windows and the signal is SIGINT) do

SetEvent(sigint_event);

where sigint_event is a global singleton Event object which can be retrieved using:

PyAPI_FUNC(void*) _PyOS_SigintEvent(void);

So the basic idea is: before entering your blocking call, you (a) ResetEvent(sigint_event), and (b) arrange (somehow) for your blocking call to exit early if this handle becomes signaled. And then Python's regular signal-checking logic resumes and runs the Python-level handler and we're good to go. We already need the machinery for waking up the main thread when an Event becomes signaled, so that part shouldn't be a big deal, though it's not implemented yet. [Edit: and of course we also have to be careful to only turn this logic on if we are running in the main thread. And it might be important to explicitly check for signals after waking up? I'm not sure how often the interpreter checks, and if we wake up and then go back to sleep without checking then that's bad.]
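A rough ctypes sketch of that dance (Windows-only, for illustration; _PyOS_SigintEvent really is exported by CPython, the rest is schematic):

import ctypes

kernel32 = ctypes.windll.kernel32
ctypes.pythonapi._PyOS_SigintEvent.restype = ctypes.c_void_p
sigint_event = ctypes.c_void_p(ctypes.pythonapi._PyOS_SigintEvent())

kernel32.ResetEvent(sigint_event)
# ...then include sigint_event among the handles passed to the blocking
# wait (e.g. WaitForMultipleObjects alongside the IOCP handle), so that a
# control-C signals it and wakes us up early to run the Python-level handler.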

Open question: I have no idea what the equivalent of this is on PyPy. It's possible they simply don't have this machinery at all (i.e. I don't even know if pypy-on-windows allows you to break out of time.sleep(100000)). As of 2017-02-07 there is no version of pypy that can run trio on Windows (currently the 3.5-compatibility branch is linux-only), so the issue may not arise for a bit.

async file I/O

Maybe a wrapper around pathlib.Path that wraps everything in run_in_worker_thread, and where open returns a file-like object whose methods have again been wrapped in run_in_worker_thread? (#10 and #6 are relevant.)
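A toy version of the wrapper idea (names hypothetical; run_in_worker_thread as elsewhere in this document):

class AsyncFile:
    # Wrap a regular file object so each blocking method hops to a worker thread.
    def __init__(self, wrapped):
        self._wrapped = wrapped

    async def read(self, *args):
        return await run_in_worker_thread(self._wrapped.read, *args)

    async def write(self, data):
        return await run_in_worker_thread(self._wrapped.write, data)

async def open_file(*args, **kwargs):
    return AsyncFile(await run_in_worker_thread(open, *args, **kwargs))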

On Windows it'd be nice to go the extra step and use the IOCP methods for actual file I/O.

Design: daemonic tasks

I left them out for now because it complicated the lifecycle management. Specifically around crashes and shutting down.

I haven't yet hit a smoking gun where we absolutely need them, but there have been a few "close calls". For example, this batching code is a reasonable use for a daemonic task, though with a bit more work it could be written to automatically start and stop the task as work arrives.

Do we want them? How should they work?

happy eyeballs

Whatever our create_connection looks like (#8), it'd be neat if it automatically did happy eyeballs :-). This should be pretty trivial (though it'd help if #7 was sorted), basically something like:

async def create_connection(host, port):
    results = await getaddrinfo(...)
    # pick the first two results with different AF_* settings
    first_res = results[0]
    second_res = [res for res in results if res[0] != first_res[0]][0]
    async def connect1():
        return await connect_to(first_res)
    async def connect2():
        # give the preferred candidate a 300 ms head start
        await trio.sleep(0.300)
        return await connect_to(second_res)
    # 'race' runs several tasks, returns the one that finishes first, and cancels the others
    return await race(connect1, connect2)

(+ the mandatory caching that RFC 6555 specifies)

Windows sockets support quality-of-implementation issues

  • Currently limited to 512 sockets (#3)
  • The way socket notifications get funneled through call_soon is probably suboptimal
    • Especially because right now, after a socket becomes active the select thread keeps selecting on it until the main thread acknowledges it, which creates a call_soon storm! The select loop needs to have one-shot logic internally.
  • The new "RIO" sockets API might have fewer gotchas than classic sockets IOCP (see), but of course requires Windows 8 or better.

Missing feature: sendfile

On Unix, os.sendfile is present and just needs a simple (but slightly finicky) wrapper.

On Windows, os.sendfile doesn't exist, but we can wrap TransmitFile - shouldn't be too hard.

Implement more featureful mock clock for testing

Right now MockClock allows direct control over time. Which, I mean, when you put it like that it sounds pretty awesome. But there are some more features that would make it even more awesome:

  • .rate: Rate at which trio time passes, versus real time. Current behavior is rate = 0, but we should support other options too.

  • .autojump_threshold: If at any point the run loop goes idle for this long, then jump the clock forward to the next timeout (Idea stolen from https://github.com/majek/fluxcapacitor). Current behavior is autojump_threshold = inf.

To implement the latter, we should enhance wait_run_loop_idle to accept a threshold argument. This is useful in its own right (to give IO a chance to settle etc.). Then when autojump_threshold is set to a non-inf value, spawn a system task in the background that loops on wait_run_loop_idle and steps the clock. (current_statistics gives you the next deadline.)
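Sketched out, that system task might look like (pseudocode; the threshold argument, the statistics field name, and clock.jump_to are all assumptions about APIs that don't exist yet):

async def autojump_loop(clock, threshold):
    while True:
        # Wait until the run loop has been idle for `threshold` seconds...
        await wait_run_loop_idle(threshold)
        # ...then jump the mock clock straight to the next deadline.
        clock.jump_to(current_statistics().next_deadline)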

Subprocess support

Edit: if you're just coming to this issue, then this comment has a good overview of what there is to do: #4 (comment)


Original description

Lots of annoying and fiddly details, but important!

I think for waiting, the best plan is to just give up on SIGCHLD (seriously, SIGCHLD is the worst) and park a thread in waitpid for each child process. Threads are lighter weight than processes so one thread-per-process shouldn't be a big deal. At least on Linux - if we're feeling ambitious we can do better on kqueue platforms. On Windows, it depends on what the state of our WaitFor{Multiple,Single}Object-fu is.

Improve low-level IO monitor functions (for IOCP and kqueue)

These should take a queue, rather than returning a queue, to support the case where you want to register multiple filters and multiplex them onto the same queue. (This is probably more important for kqueue than IOCP, b/c in IOCP you can take the completion key and use it for multiple registrations? but either way.)

CompletionKeyEventInfo should also carry the actual completion key, b/c with multiple registrations multiplexed onto one queue it otherwise becomes ambiguous which one fired.

Stop using .throw method on coroutines

If an exception is raised inside an except block, Python does its lovely implicit exception chaining thing and shows us both tracebacks:

# Regular raise of KeyError while handling ValueError
def f():
    try:
        raise ValueError
    except:
        raise KeyError

>>> f()
Traceback (most recent call last):
  File "<stdin>", line 3, in f
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in f
KeyError

But if you use .throw() to inject an exception, then this does not happen – all record of the ValueError is lost! (Using generators in the example instead of coroutines b/c there's less boilerplate:)

# .throw() raises a KeyError while handling ValueError
def f2():
    try:
        raise ValueError
    except:
        yield

>>> gen = f2(); gen.send(None); gen.throw(KeyError)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in f2
KeyError

This can be avoided by avoiding throw, and instead sending in the exception object and re-raising it inside the generator callstack:

def f3():
    try:
        raise ValueError
    except:
        raise (yield)

>>> gen = f3(); gen.send(None); gen.send(KeyError)  # notice "send", not "throw"!
Traceback (most recent call last):
  File "<stdin>", line 3, in f3
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in f3
KeyError

Making this change for us is really easy -- right now to resume a coroutine we take a Result object, and have it either send or throw the unwrapped version of itself into the yield_* trap. All we need to do is switch it so we send in the Result object, and then have the yield_* trap call unwrap.
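Sketched with plain generators again (Error here is a stand-in for trio's internal Result machinery):

class Error:
    def __init__(self, exc):
        self.exc = exc
    def unwrap(self):
        raise self.exc   # raised *inside* the generator, so chaining survives

def f4():
    try:
        raise ValueError
    except:
        (yield).unwrap()

>>> gen = f4(); gen.send(None); gen.send(Error(KeyError()))

...and the output shows both tracebacks, chained, just like f3 above.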
