encode / httpcore Goto Github PK
View Code? Open in Web Editor NEWA minimal HTTP client. ⚙️
Home Page: https://www.encode.io/httpcore/
License: BSD 3-Clause "New" or "Revised" License
A minimal HTTP client. ⚙️
Home Page: https://www.encode.io/httpcore/
License: BSD 3-Clause "New" or "Revised" License
This seems to be an error with h2, just raising here to confirm it, as before using httpx with httpcore interface, I never saw this issue, at least not with this frequency.
Error: Invalid input ConnectionInputs.RECV_HEADERS in state ConnectionState.CLOSED
From encode/httpx#859 (comment)
Traceback (most recent call last):
File "client.py", line 13, in fetch
r = await client.get(target)
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1279, in get
return await self.request(
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1121, in request
response = await self.send(
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1142, in send
response = await self.send_handling_redirects(
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1169, in send_handling_redirects
response = await self.send_handling_auth(
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1206, in send_handling_auth
response = await self.send_single_request(request, timeout)
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1238, in send_single_request
) = await dispatcher.request(
File "/Users/florimond/Developer/python-projects/httpx/venv/lib/python3.8/site-packages/httpcore/_async/http_proxy.py", line 81, in request
return await self._tunnel_request(
File "/Users/florimond/Developer/python-projects/httpx/venv/lib/python3.8/site-packages/httpcore/_async/http_proxy.py", line 155, in _tunnel_request
proxy_response = await connection.request(
File "/Users/florimond/Developer/python-projects/httpx/venv/lib/python3.8/site-packages/httpcore/_async/connection.py", line 52, in request
assert url[:3] == self.origin
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "client.py", line 24, in <module>
loop.run_until_complete(main())
File "/Users/florimond/.pyenv/versions/3.8.2/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "client.py", line 20, in main
await asyncio.gather(*tasks)
File "client.py", line 15, in fetch
num_ok += 1
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1454, in __aexit__
await self.aclose()
File "/Users/florimond/Developer/python-projects/httpx/httpx/_client.py", line 1443, in aclose
await proxy.aclose()
File "/Users/florimond/Developer/python-projects/httpx/venv/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 275, in aclose
await self._remove_from_pool(connection)
File "/Users/florimond/Developer/python-projects/httpx/venv/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 258, in _remove_from_pool
self._connection_semaphore.release()
File "/Users/florimond/Developer/python-projects/httpx/venv/lib/python3.8/site-packages/httpcore/_backends/asyncio.py", line 210, in release
self.semaphore.release()
File "/Users/florimond/.pyenv/versions/3.8.2/lib/python3.8/asyncio/locks.py", line 533, in release
raise ValueError('BoundedSemaphore released too many times')
ValueError: BoundedSemaphore released too many times
Currently the test suite makes live requests to http://example.org
.
We need to update those to run against uvicorn or hypercorn instead.
The advantage of uvicorn would be that we can keep it in supported-versions lockstep with httpcore
.
The advatage of hypercorn would be that we can run any HTTP/2 tests.
Figure we probably(?) ought to stick with uvicorn as with httpx for right now, since that ought to get us up and going as quickly as possible, but might not be averse to switching to hypercorn if it makes sense sometime.
body = b''
async for chunk in stream:
body += chunk
This gets slower and slower as the size of body grows. Best to use a list and .join() at the end.
Is the usage of slots something to consider? I know some of the benefits are lower memory footprint and faster attribute access and I am curious if it something to consider here.
Strictly enforce 100% test coverage.
We have that with the current test suite, but it's not yet being enforced.
With a service deployed with httpx version with httpcore interface, I started receiving the following error:
<class 'httpcore._exceptions.WriteError'>: [Errno 104] Connection reset by peer
The only problem is that this is when connecting to a service that others applications have no problems. The connection is (should be) HTTP/1.1 in this case.
I think a sensible tack onto the proxy tests would be if our test environment included a running client proxy, and we made live requests via it.
I'm wondering what the most well used client proxy services are that we'd be able to have easily installed into our Travis / GitHub actions / Local developer setup, that we could use to exercise forwarding/tunneling requests?
Anyone got thoughts on which services we could easily have installed into the docker environment that'd meet our requirements here?
Our test suite performs real internet connections. This has worked well for us since it gives us full confidence that httpcore works in real circumstances. However it has some drawbacks:
In HTTPX we use a local test server based on Uvicorn which has worked well, the only drawback that I'm aware it's the lack of HTTP/2 support.
Should we go for the same approach here?
Related #108
import asyncio
import httpcore
async def main():
headers = proxy_headers = [(b"host", b"google.org")]
async with httpcore.AsyncHTTPProxy(
(b'http',b'127.0.0.1',12590),
proxy_headers=proxy_headers,
http2=True,
) as http:
http_version, status_code, reason_phrase, headers, stream = await http.request(
method=b'GET',
url=(b'https', b'www.google.com', 443, b'/'), headers=headers,
)
try:
body = []
async for chunk in stream:
body.append(chunk)
print(b"".join(body))
except Exception as e:
print(e)
finally:
await stream.aclose()
asyncio.run(main())
This is my sample code , I want to set a cookie (like a dict, {"xxxx": 'aaaa'}) for this request. But I don't know where to set.
Prompted by a comment from @hynek
We ought to add more __init__
controls to AsyncConnectionPool(...)
and SyncConnectionPool(...)
for advanced connection options, including...
It'd be useful to do a comprehensive review of...
Compare against controls available in urllib3
, aiohttp
.
On http_proxy.py coroutine is never awaited and return is never used. Is this on purpose?
Line:
read_body(proxy_stream)
We should hide anything except .request
and .close
on the ConnectionPool, but we might well want to provide a .get_connection_stats()
that returns eg...
>>> print(http.get_connection_stats())
{(b'https', b'example.org', 443): {ConnectionState.IDLE: 1}}
Above: A connection pool, maintaining a single keep-alive connection to https://example.org
Noticable in particular in the test cases which happen to reach inside intended-private API in order to deteremine the connection state.
master
.When connecting to HTTP2 streams, if the other side closes the connection (causing the socket to go into a CLOSE_WAIT
state), the Client/Stream doesn't handle that appropriately, instead it keeps reading b''
from a closed socket forever.
Install from master:
pip install git+https://github.com/encode/httpx.git@bacc2d18350d9f6645828461435fe6e19b9dc518
Generate self-signed keys:
openssl req -x509 -newkey rsa:2048 -nodes -sha256 -subj '/CN=localhost' -keyout key.pem -out cert.pem
I'm using nodejs
version 14.5 to create a small HTTP2 server that streams continuously, called server.js
.
const http2 = require('http2');
const fs = require('fs');
const server = http2.createSecureServer({
key: fs.readFileSync('key.pem'),
cert: fs.readFileSync('cert.pem')
});
server.on('error', (err) => console.error(err));
server.on('stream', (stream, headers, flags) => {
stream.respond({
':status': 200,
'content-type': 'text/event-stream'
});
setInterval(() => stream.write(`${Math.random()}\n`), 500);
});
server.listen(8443);
To install nodejs
:
$ curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -
..
$ sudo apt-get install -y nodejs
...
$ nodejs -v
v14.5.0
$
A small python client that talks to the server (client.py
):
import httpx
def get_client():
timeout = httpx.Timeout(read_timeout=5.0)
client = httpx.Client(
base_url="https://localhost:8443",
http2=True,
timeout=timeout,
verify="cert.pem",
)
return client
def main():
with get_client() as c, c.stream("GET", "") as s:
for i in s.iter_lines():
print(i)
if __name__ == '__main__':
main()
I start the server:
nodejs server.js
In another terminal I then connect with the client and I start getting things like:
$ python3.8 client.py
0.5366774977298967
0.282442060319521
0.1150984598113638
0.3190482876162004
...
I then kill the server:
$ nodejs server.js
^C
$
The client exits somewhat gracefully, perhaps logging a warning that the other side closed the connection.
The client just blocks. On inspecting with strace it seems to be continuously reading from an empty buffer:
$ sudo strace -p <PID OF client.py>
read(5, "", 5) = 0
ioctl(5, FIONBIO, [1]) = 0
read(5, "", 5) = 0
ioctl(5, FIONBIO, [1]) = 0
read(5, "", 5) = 0
ioctl(5, FIONBIO, [1]) = 0
read(5, "", 5) = 0
...
I was able to track the bug to SyncSocketStream.read
:
def read(self, n: int, timeout: TimeoutDict) -> bytes:
read_timeout = timeout.get("read")
exc_map = {socket.timeout: ReadTimeout, socket.error: ReadError}
with self.read_lock:
with map_exceptions(exc_map):
self.sock.settimeout(read_timeout)
return self.sock.recv(n) # returns b''
Here self.sock.recv(n)
returns simply b''
but that's never handled further up the call stack:
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_backends/sync.py(63)read()
-> return self.sock.recv(n)
(Pdb) p self.sock.recv(n)
b''
If I keep pressing n
it comes back around:
(Pdb) n
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_backends/sync.py(63)read()
-> return self.sock.recv(n)
(Pdb) n
--Return--
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_backends/sync.py(63)read()->b''
-> return self.sock.recv(n)
(Pdb)
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_sync/http2.py(205)receive_events()
-> events = self.h2_state.receive_data(data)
(Pdb)
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_sync/http2.py(206)receive_events()
-> for event in events:
(Pdb)
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_sync/http2.py(216)receive_events()
-> data_to_send = self.h2_state.data_to_send()
(Pdb)
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_sync/http2.py(217)receive_events()
-> self.socket.write(data_to_send, timeout)
(Pdb)
--Return--
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_sync/http2.py(217)receive_events()->None
-> self.socket.write(data_to_send, timeout)
(Pdb)
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_sync/http2.py(196)wait_for_event()
-> while not self.events[stream_id]:
(Pdb)
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_sync/http2.py(197)wait_for_event()
-> self.receive_events(timeout)
(Pdb)
> /home/<username>/.virtualenvs/sandbox3.8/lib/python3.8/site-packages/httpcore/_backends/sync.py(63)read()
-> return self.sock.recv(n)
I'm happy to provide more information if that's not sufficient.
With httpx using httpcore interface, started receiving the following error:
<ConnectionTerminated error_code:ErrorCodes.NO_ERROR, last_stream_id:1999, additional_data:None>
Don't know h2 specifics to say, but could NO_ERROR
be used for some indication like graceful connection closing rather than raising an error?
Since we're now officially using HTTPCore in HTTPX I thought it'd be good to define a policy for coordinating changes between the two.
In terms of releases, and given we're using semantic versioning, we could go:
Which ever we choose we should probably also define development workflows. For instance, I've seen repos that immediately after a release a version bump is merged indicating a new "development cycle". Another option is to enforce a version bump on each PR, or at least an entry in the Changelog describing whether the change should be considered bugfix, minor or major.
It's possible encode already has a policy but I thought it'd be good have an issue where we define it. 🙂
Thoughts?
We've got a byte stream interface on both the request and response bodies. There's a few bits there that'll need a bit of careful description:
.close()
methods..close()
, but clients should almost always do so. (eg. httpx
's context-managed stream API)close()
method to return a list of headers (which will typically be an empty list).httpcore
, but simply by matching its type signatures. We can't do that if we introduce a class that streams my strictly subclass. We might? want to annotate the stream types as AsyncIterable[bytes]
and Iterable[bytes]
with the close method being an optional under-typed method. (Ie. client code can use close = getattr(stream, 'close')
)From encode/httpx#1096
The httpcore.SyncByteStream
and httpcore.AsyncByteStream
classes provide a simple base implementation that allows users to pass iterator
/close
or aiterator
/aclose
parameters. That's actually a bit fiddly if you've just got a non-iterating bytestring that you want to pass. We ought to also support passing plain bytes to the base implementation. So...
"""
The base interface for request and response bodies.
Concrete implementations should subclass this class, and implement
the `\\__aiter__` method, and optionally the `close` method.
"""
def __init__(
self, content: Union[bytes, AsyncIterator[bytes]] = b"", aclose_func: Callable = None,
) -> None:
self.content = content
self.aclose_func = aclose_func
async def __aiter__(self) -> AsyncIterator[bytes]:
"""
Yield bytes representing the request or response body.
"""
if isinstance(self.content, bytes):
yield self.content
else:
async for chunk in self.content:
yield chunk
async def aclose(self) -> None:
"""
Must be called by the client to indicate that the stream has been closed.
"""
if self.aclose_func is not None:
await self.aclose_func()
Prompted by https://github.com/encode/httpx/pull/998#issuecomment-671070588…
cc @tomchristie
Right now our transport API looks like this... I'm using a slightly edited version of the README, where we only inspect the response headers:
async with httpcore.AsyncConnectionPool() as http:
http_version, status_code, reason_phrase, headers, stream = await http.request(
method=b'GET',
url=(b'https', b'example.org', 443, b'/'),
headers=[(b'host', b'example.org'), (b'user-agent', 'httpcore')]
)
print(status_code, body)
Although it's definitely not obvious to the un-trained eye, this code could be problematic in some situations: the stream
object is a wrapper over an open socket, and we're not enforcing that users close it properly before exiting the context of the associated request.
This example runs fine, but it's mostly by chance. AsyncConnectionPool
may open and close connections on a per-request basis, but this doesn't pose a problem because the lifespan of a connection typically outranges that of a request (eg because of Keep-Alive
). So the overall machinery occurs at the transport level. In other words, when the AsyncConnectionPool
block exits, all underlying connections are closed, and so the underlying socket is closed, and nothing leaks.
But…
Consider the case of ASGITransport
in encode/httpx#998, or any other example that requires a per-request I/O or concurrency resource (in the ASGITransport
case, a task group for running the ASGI app in the background).
Then currently we must do some unclassy __aenter__()
/__aexit__()
dance with that resource, so that request()
returns a stream
containing some kind of reference to that resource, with an aclose_func=...
that properly closes said resource. (This is tricky. I think we need to always pass sys.exc_info()
to __aexit__()
so that the resource is aware that it might be being closed in response to an exception within the block, but I'm not sure).
For example...
from somewhere import create_async_resource
class MyTransport:
async def request(self, ...):
resource = create_async_resource()
await resource.__aenter__() # "Acquire" resource.
async def aclose():
await resource.__aexit__(*sys.exc_info()) # "Cleanup" resource.
stream = httpcore.AsyncByteStream(..., aclose_func=aclose)
return (*..., stream)
And now my variant of the README example will be broken for this particular transport:
async with MyTransport() as http:
http_version, status_code, reason_phrase, headers, stream = await http.request(
method=b'GET',
url=(b'https', b'example.org', 443, b'/'),
headers=[(b'host', b'example.org'), (b'user-agent', 'httpcore')]
)
print(status_code, body)
# OOPS! `resource` has leaked...
One way to fix it would be to track instances of resource
on the transport class, and then on MyTransport.__aexit__()
make sure all resource
instances are closed. It might not generally be very practical to do this, eg resource
might fail if you're trying to close it again. And overall it feels like late patching, instead of fixing the problem as close to its source as possible (the request "context").
On the other hand, what if .request()
was returning a context manager, instead of a plain awaitable…?
Then we could do this:
from somewhere import create_async_resource
class MyTransport:
@asynccontextmanager
async def request(self, ...):
async with create_async_resource() as resource:.
stream = httpcore.AsyncByteStream(...)
yield (*..., stream)
The usage syntax will change slightly, to become:
async with MyTransport() as http:
async with http.request(
method=b'GET',
url=(b'https', b'example.org', 443, b'/'),
headers=[(b'host', b'example.org'), (b'user-agent', 'httpcore')]
) as (
http_version,
status_code,
reason_phrase,
headers,
stream,
):
print(status_code, body)
# Regardless of whether `stream.aclose()` was called, the stream is now properly closed,
# and `resource` was cleaned up.
Now the syntax guarantees us that resource
will be closed whenever we exit the request context.
Benefits:
The documentation currently reads:
url - (bytes, bytes, int, bytes) - The URL as a 4-tuple of (scheme, host, port, path).
Where does userinfo fit into this, e.g. foo:bar
in http://foo:[email protected]/some/path
?
host
? Should host
actually be authority
?TypedDict
?import typing
class URL(typing.TypedDict, total=False):
scheme: bytes
username: bytes
password: bytes
host: bytes
port: int
path: bytes
url: URL = {
"scheme": b"http",
"host": b"localhost",
"port": 8000,
"path": b"/hello?beautiful=world#today"
}
Another thing I noticed while reworking encode/httpx#998 and thinking about encode/httpx#1145... (Actually, anyone that tries to solve encode/httpx#1145 should arrive at the same conclusion than I'm exposing in this issue, since the HTTPX clients does the wrong thing there as well, which causes issues with exception handling.)
Right now when someone wants to implement a custom transport, we tell them:
.request(...) -> ...
..aclose() -> None
.I think we might want to consider switch to:
.request(...) -> ...
..__aenter__() -> None
and __aexit__(...) -> None
.That is, making it clear to transport implementers that transports are just context managers. More specifically, that if they want their transport to wrap around an async resource, they can do so by forwarding __aenter__()
and __aexit__()
to that resource.
We can probably (?) keep .aclose()
as a synonym of .__aexit__(None, None, None)
, but it's actually not clear to me whether keeping it wouldn't maintain some sort of confusion. If we really want to enforce context managed usage of transports (and it looks like we should), then transport = Transport()
(plain instantiation) + calling .aclose()
is not a correct option.
So, AsyncHTTPTransport
would become this:
class AsyncHTTPTransport:
"""
The base interface for sending HTTP requests.
Concete implementations should subclass this class, and implement
the `request` method, and optionally the `__aenter__` and/or `__aexit__` methods.
"""
def request(
self,
method: bytes,
url: URL,
headers: Headers = None,
stream: AsyncByteStream = None,
timeout: TimeoutDict = None,
) -> AsyncContextManager[
Tuple[bytes, int, bytes, List[Tuple[bytes, bytes]], AsyncByteStream]
]:
"""
The interface for sending a single HTTP request, and returning a response.
**Parameters:**
* **method** - `bytes` - The HTTP method, such as `b'GET'`.
* **url** - `Tuple[bytes, bytes, Optional[int], bytes]` - The URL as a 4-tuple of
(scheme, host, port, path).
* **headers** - `Optional[List[Tuple[bytes, bytes]]]` - Any HTTP headers
to send with the request.
* **stream** - `Optional[AsyncByteStream]` - The body of the HTTP request.
* **timeout** - `Optional[Dict[str, Optional[float]]]` - A dictionary of
timeout values for I/O operations. Supported keys are "pool" for acquiring a
connection from the connection pool, "read" for reading from the connection,
"write" for writing to the connection and "connect" for opening the connection.
Values are floating point seconds.
** Returns:**
An asynchronous context manager returning a five-tuple of:
* **http_version** - `bytes` - The HTTP version used by the server,
such as `b'HTTP/1.1'`.
* **status_code** - `int` - The HTTP status code, such as `200`.
* **reason_phrase** - `bytes` - Any HTTP reason phrase, such as `b'OK'`.
* **headers** - `List[Tuple[bytes, bytes]]` - Any HTTP headers included
on the response.
* **stream** - `AsyncByteStream` - The body of the HTTP response.
"""
raise NotImplementedError() # pragma: nocover
async def __aenter__(self) -> "AsyncHTTPTransport":
"""
Context manager hook implementation.
I/O and concurrency resources managed by this transport should be opened here.
"""
return
async def __aexit__(
self,
exc_type: Type[BaseException] = None,
exc_value: BaseException = None,
traceback: TracebackType = None,
) -> None:
"""
Close the implementation, which should close any outstanding I/O and concurrency
resources managed by this transport.
"""
async def aclose(self) -> None:
await self.__aexit__(None, None, None)
Other than asyncio and trio, curio seems to be the other "big" event loop implementation.
From python 3.8 CancelledError
a is subclass of BaseException
(https://docs.python.org/3.8/library/asyncio-exceptions.html#asyncio.CancelledError)
If coroutine is cancelled after connection was acquired and during request, connection is not removed from the pool:
https://github.com/encode/httpcore/blob/master/httpcore/_async/connection_pool.py#L179
try:
response = await connection.request(
method, url, headers=headers, stream=stream, timeout=timeout
)
except NewConnectionRequired:
connection = None
except Exception:
logger.trace("remove from pool connection=%r", connection)
await self._remove_from_pool(connection)
raise
I would like to record the HTTPS certificate.
One way is to have a custom Backend, but the January solution can't be used anymore: encode/httpx#782 (comment)
A partial update:
from ssl import SSLContext
from typing import Optional
import asyncio
import httpx
import uvloop
from httpcore._backends.auto import AutoBackend
from httpcore._backends.asyncio import SocketStream
from httpcore._async.connection_pool import AsyncConnectionPool
from httpcore._types import TimeoutDict
class CustomBackend(AutoBackend):
async def open_tcp_stream(
self,
hostname: bytes,
port: int,
ssl_context: Optional[SSLContext],
timeout: TimeoutDict,
) -> SocketStream:
value = await super().open_tcp_stream(hostname, port, ssl_context, timeout)
# use value.stream_reader._transport.get_extra_info('ssl_object')
return value
class CustomAsyncConnectionPool(AsyncConnectionPool):
def __init__(
self,
ssl_context: SSLContext = None,
max_connections: int = None,
max_keepalive: int = None,
keepalive_expiry: float = None,
http2: bool = False,
):
super().__init__(ssl_context, max_connections, max_keepalive, keepalive_expiry, http2)
self._backend = CustomBackend()
async def main():
dispatch = CustomAsyncConnectionPool(http2=True)
async with httpx.AsyncClient(dispatch=dispatch) as client:
response = await client.get('https://github.com/')
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Unfortunately, in the request method, AsyncHTTPConnection is instanciated without the Backend argument:
httpcore/httpcore/_async/connection_pool.py
Lines 143 to 145 in 59eb0a3
And the Backend is instanciated again:
httpcore/httpcore/_async/connection.py
Lines 20 to 40 in 59eb0a3
I guess the purpose is to match the SyncHTTPConnection constructor signature.
Some solutions:
backend
argument to the AsyncHTTPConnection constructor.We'll want to carefully document the four different values in the timeout configuration.
If we could push the timeouts any further out of the stack then that'd be a good thing to do, but I don't think that's possible, since eg. there's no visibility onto "acquire the connection pool semaphore" vs. "wait for a new connection" but both require individually configurable timeout settings.
Our four values are:
read
- Timeouts on read operations.write
- Timeouts on write operations, ie. failing to flush the buffer.connect
- Timeouts on new TCP connections.pool acquiry
- Timeout when waiting to not overload the connection pool.(All optional floats.)
proxy = httpx.HTTPProxy(
"http://{}".format(ip),
proxy_headers={"Host": url+':443',
"Proxy-Connection":"Keep-Alive"})
im sniffing these with charlesproxy and output is
CONNECT www.host.es:443 HTTP/1.1
host: https://www.host.es/m/m/login?:443
user-agent: python-httpx/0.7.4
accept-encoding: gzip, deflate, br
connection: keep-alive
proxy-connection: Keep-Alive
accept: */*
i would like to remove all unless proxy-connection and host
IIRC, RFC defines 2 messages that may carry error_code
:
In addition, h2
might generate ResetStream
event internally:
when the remote party has made a protocol error which only affects a single stream
I find it weird that there's only this one place where error_code
is evaluated in the library 🤔
We ought to place locking around the state changes in the connection pool.
We don't actually need these for the async cases, since the implementations are organised so that no async/await is used within the state changes, but we'll want it for the threaded case, to ensure thread safety.
Rather than add extra backend code, we could instead opt to do this in an implementation that only locks in the threaded case, something like...
class ThreadLock:
def __init__(self):
self.lock = threading.Lock()
def __enter__(...):
self.lock.acquire()
def __exit__(...):
self.lock.release()
def __aenter__(...):
pass
def __aexit__(...):
pass
The async with self.thread_lock:
blocks would in any case be good visual indicators that no async/await code should be used within the block.
At an implementation level, the question of if httpx.AsyncClient
should strictly enforce only being usable as a context-manager comes down to "should httpcore.AsyncConnectionPool
strictly enforce only being usable as a context-manager?"
It's not the Client that might at some point start background tasks, but the connection pool implementation.
Further notes at encode/httpx#769
Hi,
I am trying to build this package and I am constantly getting errors. I am wondering why. Could you please take a look? The package lives at https://build.opensuse.org/package/show/home:mcalabkova:branches:devel:languages:python/python-httpcore.
In short: the software constantly refuses to use HTTP2 even when it is allowed (first half of errors). HTTP1.1 tests then fail on the return code (second half of errors). Do you have any idea what could cause this?
Known problems: our buildservice doesn't have access to internet during the build. Also we do not have all the packages you are pulling in during your tests, but I think I have everything mentioned in the [test] subsection of requirements.txt.
Thanks a lot!
Coming from discussion on Gitter with @dalf…
Currently we are reading response data in chunks of 4kB…
httpcore/httpcore/_async/http11.py
Line 26 in f4240b6
httpcore/httpcore/_async/http2.py
Line 29 in f4240b6
Benchmarking using @dalf's pyhttp-benchmark tool led us to see that increasing this number to 64kB could lead 2-3x execution time improvement for large responses (typically > 256kB).
My rationale would be that reading N bytes in one go via a syscall is faster than reading n = N/k bytes k times — mostly because the kernel is way faster than Python.
Hey there and thanks for your awesome libraries!
I was planning on reviving @florimondmanca's https://github.com/florimondmanca/httpx-unixsocket-poc. I've managed to get it working but hesitate publishing a proper package for now as it's tightly coupled with things that are now private APIs.
I understand that publishing a lot of APIs might not be in scope for httpcore but it is useful in some cases like this one. Are there any plans to...
httpcore._backends.asyncio.SocketStream
and httpcore._async.connection.AsyncHTTPConnection
?AsyncHTTPConnection
to take a custom backend as an input?Also, publishing httpcore._exceptions
and maybe httpcore._types
might be a good idea as those are useful on the outside of the library (e. g. the only way to catch TimeoutException
only is to import it from the private module :-)
in AsyncConnectionPool.request
except Exception:
logger.trace("remove from pool connection=%r", connection)
await self._remove_from_pool(connection)
raise
the connection is not closed.
Individual PRs for each of...
max_keepalive: Optional[int] = None
(number >= 0, or None for no limits.)max_connections: Optional[int] = None
(number >= 1, or None for no limits.)keepalive_expiry: float = None
(time in seconds >= 0.0, or None for no limits.)We'll need to add the semaphores back in to support max_connections, and utilise the pool_acquiry
timeout. (I've dropped the semaphores from the backend code momentarily, so that we've only got implementation that we're actually using.)
Likewise we'll need to add .time()
back onto the backends to support keepalive expiry.
Bug described in encode/httpx#281 where HTTPX doesn't support waiting for other streams to close before creating a new stream on an HTTP/2 connection.
There's a discrepancy in how we deal with proxy headers between the two proxy modes.
Currently forwading extends the headers in the request with the proxy_headers argument, but tunneling only includes the proxy_headers. This forces the user into carefully constructing the proxy headers argument depending on the proxy mode which is annoying and also may result in unexpected errors when using the default proxy mode and requesting a mix of HTTP and HTTPS URLs.
While doing some tests with HTTP/2 and trace logging I noticed that when using concurrent requests connections are being created until hitting max_connections
, even though (I believe) only one should be created and new streams should be used within the same connection.
Here's a quick example:
import asyncio
import httpx
async def main(url, n):
async with httpx.AsyncClient(
pool_limits=httpx.PoolLimits(soft_limit=2, hard_limit=5),
timeout=httpx.Timeout(5.0),
http2=True,
) as client:
await asyncio.gather(*[request(client, url, i) for i in range(n)])
async def request(client, url, i):
response = await client.get(url)
assert response.status_code == 200
assert response.http_version == "HTTP/2"
if __name__ == "__main__":
asyncio.run(main("https://example.org", 10))
With some logging tweaks prints:
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=2
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=3
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=4
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=5
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=4
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=5
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=4
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=5
DEBUG [2020-05-10 10:36:32] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=5
Tracing the flow in the code I found that:
asyncio.wait_for
acquire will not return immediately and will trigger an event loop switchAnd sure enough, removing the wait_for
the output is:
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - created connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - get_connection_from_pool=(b'https', b'example.org', 443)
DEBUG [2020-05-10 10:43:35] httpcore._async.connection_pool - reuse connection=AsyncHTTPConnection origin=(b'https', b'example.org', 443) http2=True state=0 pool_size=1
Note a single connection is created and the following coroutines reuse it.
Obviously just removing the wait_for
is not the right thing to do since we want to keep the pooling timeout, but we probably need to think of a different way to achieve it to maximize throughput in HTTP/2.
Sketching this out...
class AsyncHTTPProxy(AsyncConnectionPool):
"""
A connection pool for making HTTP requests via an HTTP proxy.
**Parameters:**
* **proxy_url** - `Tuple[bytes, bytes, int, bytes]` - The URL of the proxy service as a 4-tuple of (scheme, host, port, path).
* **proxy_headers** - `Optional[List[Tuple[bytes, bytes]]]` - A list of proxy headers to include.
* **proxy_mode** - `Optional[str]` - A proxy mode to operate in. May be "DEFAULT", "FORWARD_ONLY", or "TUNNEL_ONLY".
* **ssl_context** - `Optional[SSLContext]` - An SSL context to use for verifying connections.
* **max_keepalive** - `Optional[int]` - The maximum number of keep alive connections to maintain in the pool.
* **max_connections** - `Optional[int]` - The maximum number of HTTP connections to allow. Attempting to establish a connection beyond this limit will block for the duration specified in the pool acquiry timeout.
"""
def __init__(
self,
proxy_origin: Tuple[bytes, bytes, int],
proxy_headers: List[Tuple[bytes, bytes]] = None,
proxy_mode: str = “DEFAULT”,
ssl_context: SSLContext = None,
):
self.proxy_origin = proxy_origin
self.proxy_headers = proxy_headers
self.proxy_mode = proxy_mode
self.ssl_context = ssl_context
async def request(
self,
method: bytes,
url: Tuple[bytes, bytes, int, bytes],
headers: List[Tuple[bytes, bytes]] = None,
stream: AsyncByteStream = None,
timeout: Dict[str, Optional[float]] = None,
) -> Tuple[bytes, int, bytes, List[Tuple[bytes, bytes]], AsyncByteStream]:
if ...:
return await self.forward_request(...)
else:
return await self.connect_request(...)
async def forward_request(
self,
method: bytes,
url: Tuple[bytes, bytes, int, bytes],
headers: List[Tuple[bytes, bytes]] = None,
stream: AsyncByteStream = None,
timeout: Dict[str, Optional[float]] = None,
) -> Tuple[bytes, int, bytes, List[Tuple[bytes, bytes]], AsyncByteStream]:
origin = self.proxy_origin
connection = await self._get_connection_from_pool(origin)
if connection is None:
connection = AsyncHTTP11Connection(
origin=origin, ssl_context=self.ssl_context,
)
async with self.thread_lock:
self.connections.setdefault(origin, set())
self.connections[origin].add(connection)
target = b'%b://%b:%d%b' % url
url = self.proxy_origin + (target,)
headers = self.proxy_headers + headers
response = await connection.request(
method, url, headers=headers, stream=stream, timeout=timeout
)
(http_version, status_code, reason_phrase, headers, stream,) = response
stream = ResponseByteStream(
stream, connection=connection, callback=self._response_closed
)
return http_version, status_code, reason_phrase, headers, stream
async def connect_request(
self,
method: bytes,
url: Tuple[bytes, bytes, int, bytes],
headers: List[Tuple[bytes, bytes]] = None,
stream: AsyncByteStream = None,
timeout: Dict[str, Optional[float]] = None,
) -> Tuple[bytes, int, bytes, List[Tuple[bytes, bytes]], AsyncByteStream]:
origin = url[:3]
connection = await self._get_connection_from_pool(origin)
if connection is None:
connection = AsyncHTTP11Connection(
origin=origin, ssl_context=self.ssl_context,
)
async with self.thread_lock:
self.connections.setdefault(origin, set())
self.connections[origin].add(connection)
# Issue a CONNECT request
target = b'%b:%d' % (url[1], url[2])
connect_url = self.proxy_origin + (target,)
connect_headers = self.proxy_headers
proxy_response = await connection.request(b"CONNECT", connect_url, headers=connect_headers, timeout=timeout)
async for chunk in proxy_response[4]:
pass
await proxy_response[4].close()
await connection.start_tls()
response = await connection.request(
method, url, headers=headers, stream=stream, timeout=timeout
)
(http_version, status_code, reason_phrase, headers, stream,) = response
stream = ResponseByteStream(
stream, connection=connection, callback=self._response_closed
)
return http_version, status_code, reason_phrase, headers, stream
To be raised for non http
/https
requests.
Note that this will allow use to drop enforce_http_url
in httpx
, and instead defer that handling entirely to the transport class.
Similar to https://www.python-httpx.org/contributing/
Two issues stemming from discussions in #57:
proxy_headers
argument, but tunneling only includes the proxy_headers
. This forces the user into carefully constructing the proxy headers argument depending on the proxy mode which is annoying and also may result in unexpected errors when using the default proxy mode and requesting a mix of HTTP and HTTPS URLs.Hi!
I don't see the license file anymore either on the package or here in the repository. Is that expected?
I ask because I'm one of the maintainers of this package on conda-forge, and we require all packages to have a license.
I notice the logger.trace()
calls we had previously in HTTPX are not present in this package anymore.
I suppose this was done out of simplicity, but I think we should bring them back. They were super useful to debug networking issues, connection management issues, etc.
So here's an issue for trackig purposes. :-)
I am using an HTTPS proxy, with the httpx 0.13.3 and httpcore 0.9.1 client on Mac OS (Darwin 19.5.0) httpx is the only direct dependency I have loaded... abbreviated stack trace
...
httpcore/_sync/httpx_proxy.py line 111 in request
httpcore/_sync/httpx_proxy.py line 214 in _tunnel_request
httpcore/_sync/connection.py line 126 in start_tls
httpcore/_sync/http11.py line 71 in start_tls
httpcore/_backends/sync.py line 50 in start_tls
self.sock, server_hostname=hostname.decode("ascii")
version/3.7.4/lib/python3.7/contextlib.py, line 130, in exit
httpcore._exceptions.ConnectError: [SSL: PRE_MAC_LENGTH_TOO_LONG] invalid alert (_ssl.c:1076)
I have tried increasing debug to trace level without anything more illuminating. the proxy works correctly with curl. I am not sure this is an actual bug but not sure how to proceed.
Here is the verbose TLS output from curl that worked:
Not sure if the last line there is a clue but it seems odd. Any help greatly appreciated.
We ought to add our "Test Suite" and PyPI badges to the README and to docs/index.md
Currently we have here:
AsyncDispatchInterface
SyncDispatchInterface
The Interface
suffix looks a bit jarring to me, as in "of course they are interfaces, aren't they?".
What was the motivation for departing from AsyncDispatcher
/SyncDispatcher
?
I was looking at the get_http_version() function in SocketStream:
class SocketStream(AsyncSocketStream):
def __init__(
self, stream_reader: asyncio.StreamReader, stream_writer: asyncio.StreamWriter,
):
self.stream_reader = stream_reader
self.stream_writer = stream_writer
self.read_lock = asyncio.Lock()
self.write_lock = asyncio.Lock()
def get_http_version(self) -> str:
ssl_object = self.stream_writer.get_extra_info("ssl_object")
if ssl_object is None:
return "HTTP/1.1"
So, it looks like if there is no SSL setup, the http version will always be HTTP/1.1
So, even if I specify http2=True with no SSL, the connection is always a HTTP/1.1 connection
Is there a way to do HTTP/2 over a connection with no SSL ?
Thanks.
This httpx chanelog PR, as part of a movement to sunset usage of urllib3 in favor of httpcore, mentions UDS support as a temporary casualty of the process because "maintaining that would have meant having to push back our work towards a 1.0 release".
Regarding putting this support back into httpcore, there has been recent work done in this proof of concept (thanks @florimondmanca ) that suggests that including UDS support inside the library would not be an overwhelming task.
I personally use a lot of inter-service communication via unix sockets so this would be (opinion) a welcome addition as a first-class citizen of the new release.
I am brand new to this library; I have only used pre-httpcore httpx. After upgrading past 0.12 of httpx, I was surprised that my code could no longer use the uds= keyword when creating clients, enough to blow up on the trio gitter (apologies). I now understand that keeping to a release schedule and making everyone happy is an extremely hard task!
@tomchristie suggested this issue be created to start a discussion here. Go!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.