Using httpx with trio's nursery

I’ve been experimenting with trio web scrapers for a while, and always struggled to find a working HTTP library (apart from asks, there really weren’t any) until I found httpx!

Now I’m wondering: I know httpx’s development is still new, especially the trio support, but I just want to confirm that httpx doesn’t work with trio’s nursery yet (because parallel requests aren’t set up yet), and to find out whether it will once parallel requests become a reality.

With this as an example:

import trio
import httpx
import time
from httpx.concurrency.trio import TrioBackend

async def asks_worker(client, thread):
    try:
        start_time = time.time()
        r = await client.get('https://website.com')
        print(r.status_code)
        timenow = time.time() - start_time
        print("[THREAD N°" + str(thread) + "] " + str(timenow) + " seconds elapsed")
    except Exception as e:
        print("error: " + str(e))
        pass


async def run_task():
    thread = 0
    async with trio.open_nursery() as nursery:
        async with httpx.AsyncClient(backend=TrioBackend()) as client:
            for x in range(500):
                nursery.start_soon(asks_worker, client, thread)
                thread += 1


try:
    total_time = time.time()

    trio.run(run_task)
    print("TOTAL TIME ELAPSED: " + str(time.time() - total_time))
except Exception as e:
    print(e)

This is an example of the kind of code I’d often use with asks: using trio’s nursery with asks to test request speed, you can easily send 15,000 to 20,000 requests per minute and process their responses, which is quite useful for a web scraper.

However, using a nursery with httpx results in some odd behaviour. In this example, while it should send 500 requests, it sends around 100 in 16 seconds and then prints 400 blank "error: " exceptions.

I just want to confirm this is normal behaviour, probably because parallel requests are still a work in progress, and that it’ll be functional once parallel requests are supported.

Hi!

In this example, while it should send 500 requests, it’ll send around 100 in 16 seconds and then print 400 blank "error: " Exceptions

Woah, that’s strange. Could it be that the trio backend can’t keep up, or scrambles the data it sends to servers, resulting in them sending 400 responses? Not sure about the 100-Continue part though. Or did you mean that it sent 100 failed (500) requests and then started getting 400 responses?

Ah nevermind, I think I figured it out. I usually use this code with trio to stress-test performance of concurrent tasks (mostly with asks).

By 500 requests I meant that I set up 500 concurrent tasks in trio’s nursery (for x in range(500)), each of which sends a request to a random website and prints out the status code. That works well with asks and other libraries (not the most efficient, sure, but I just want to test how much it can handle), but with httpx it would only send around 100 requests returning a 200 HTTP status, and then the remaining 400 tasks would just raise a blank " " Python exception.

But I realised I probably shouldn’t be stress-testing raw performance with httpx’s trio backend, which probably doesn’t handle the load the way asks does. So I set it up with trio’s CapacityLimiter at something more appropriate, such as a limit of 40 concurrent tasks, which unsurprisingly works a lot better.

It still encounters some issues though, such as “another task is currently receiving data on this SSLStream” exceptions and a few more of the strange blank " " exceptions, which after a bit of debugging seem to arise from “PoolTimeout()” exceptions. So I’m guessing I should keep httpx stress-testing to a minimum for now.

Ah thanks, it seems I was way off in understanding your initial use case! :smile:

It turns out HTTPX has pool limits in place to make sure not too many HTTP connections are open at the same time. The default pool limits are in httpx.config:

DEFAULT_POOL_LIMITS = PoolLimits(soft_limit=10, hard_limit=100, pool_timeout=5.0)

There you have it — 100 hard limit on the number of connections by default, which explains why only 100 of your requests went through. :wink:

I bet if you increase the pool limits to 500 you’ll be able to get them all through just fine.

client = httpx.AsyncClient(
    pool_limits=httpx.PoolLimits(soft_limit=10, hard_limit=500, pool_timeout=5),
    backend=TrioBackend()
)
async with client:
    ...

Pretty sure this information would benefit from being in a “Connection pooling” section in the HTTPX advanced docs.

Ah okay! Thanks a lot for the information! :slight_smile:

Turns out this is already somewhat documented in the API reference here. The default PoolLimits object is shown in the Client constructor signature, and a short description of the pool_limits parameter is also available.

I’m not sure if I should be opening a new discussion for this (technically it’s still within the same subject).

I’ve been playing around with HTTPX and it’s great!
The only issue I’ve had, however, is sending more than one request under the same session (especially to the same URL).

As an example, using a single session like so

import trio
import httpx
import time
import httpx.exceptions
from httpx.concurrency.trio import TrioBackend
async def asks_worker(s, thread, limit):

    async with limit:
        try:
            start_time = time.time()
            r = await s.post('https://google.com')
            print(r.status_code)
            timenow = time.time() - start_time
            print("[THREAD N°" + str(thread) + "] " + str(timenow) + " seconds elapsed")
        except Exception as e:
            print(e)
            pass

async def run_task():
    thread = 0
    limit = trio.CapacityLimiter(30)
    async with httpx.AsyncClient(pool_limits=httpx.PoolLimits(soft_limit=30, hard_limit=200, pool_timeout=8), backend=TrioBackend()) as s:

        async with trio.open_nursery() as nursery:
            for x in range(10):
                nursery.start_soon(asks_worker, s, thread, limit)
                thread += 1


total_time = time.time()

trio.run(run_task)
print("TOTAL TIME ELAPSED: " + str(time.time() - total_time))

is significantly faster than opening a new session per request like this (especially when you send 1,000, 10,000, or 10,000,000 requests, it can be multiple minutes faster):

import trio
import httpx
import time
import httpx.exceptions
from httpx.concurrency.trio import TrioBackend
async def asks_worker(thread, limit):

    async with limit:
        async with httpx.AsyncClient(pool_limits=httpx.PoolLimits(soft_limit=30, hard_limit=200, pool_timeout=8), backend=TrioBackend()) as s:
            try:
                start_time = time.time()
                r = await s.post('https://google.com')
                print(r.status_code)
                timenow = time.time() - start_time
                print("[THREAD N°" + str(thread) + "] " + str(timenow) + " seconds elapsed")
            except Exception as e:
                print(e)

async def run_task():
    thread = 0
    limit = trio.CapacityLimiter(30)

    async with trio.open_nursery() as nursery:
        for x in range(1000):
            nursery.start_soon(asks_worker, thread, limit)
            thread += 1


total_time = time.time()

trio.run(run_task)
print("TOTAL TIME ELAPSED: " + str(time.time() - total_time))

Now this is all well and good, but issues arise when I try to send more than one request per task, like this:

import trio
import httpx
import time
import httpx.exceptions
from httpx.concurrency.trio import TrioBackend
async def asks_worker(s, thread, limit):
    async with limit:
        try:
            start_time = time.time()
            r = await s.get('https://example.com')
            print(r.status_code)
            r2 = await s.get('https://website.com')
            print(r2.status_code)
            timenow = time.time() - start_time
            print("[THREAD N°" + str(thread) + "] " + str(timenow) + " seconds elapsed")
        except Exception as e:
            print(e)

async def run_task():
    thread = 0
    limit = trio.CapacityLimiter(40)
    async with httpx.AsyncClient(pool_limits=httpx.PoolLimits(soft_limit=40, hard_limit=200, pool_timeout=8), backend=TrioBackend()) as s:

        async with trio.open_nursery() as nursery:
            for x in range(1000):
                nursery.start_soon(asks_worker, s, thread, limit)
                thread += 1


total_time = time.time()

trio.run(run_task)
print("TOTAL TIME ELAPSED: " + str(time.time() - total_time))

This leads to “another task is currently receiving data on this SSLStream” trio exceptions and “HTTPConnection(origin=Origin(scheme='https' host='example.com' port=443))” exceptions.

Worst-case scenario, sending more than one request to the same base URL, like this:

import trio
import httpx
import time
import httpx.exceptions
from httpx.concurrency.trio import TrioBackend
async def asks_worker(s, thread, limit):
    async with limit:
        try:
            start_time = time.time()
            r = await s.get('https://example.com/something')
            print(r.status_code)
            r2 = await s.get('https://example.com/somethingelse')
            print(r2.status_code)
            timenow = time.time() - start_time
            print("[THREAD N°" + str(thread) + "] " + str(timenow) + " seconds elapsed")
        except Exception as e:
            print(e)

async def run_task():
    thread = 0
    limit = trio.CapacityLimiter(40)
    async with httpx.AsyncClient(pool_limits=httpx.PoolLimits(soft_limit=40, hard_limit=200, pool_timeout=8), backend=TrioBackend()) as s:

        async with trio.open_nursery() as nursery:
            for x in range(1000):
                nursery.start_soon(asks_worker, s, thread, limit)
                thread += 1


total_time = time.time()

trio.run(run_task)
print("TOTAL TIME ELAPSED: " + str(time.time() - total_time))

There, pretty much 9 out of 10 requests return the SSLStream and HTTPConnection errors.

Now, starting a new session for each request seems very counter-productive, especially for large numbers of requests. Is there nothing I can do to make this work efficiently, without the errors? Maybe a Client configuration I missed in the docs? Starting a single session in trio for large numbers of requests used to work fine with other asynchronous HTTP libraries like asks. Maybe I’m doing something wrong?

Nevermind, I just had to specify http_versions=["HTTP/1.1"].
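For anyone landing here later, the fix looks like this, assuming the httpx version from this thread (where AsyncClient still accepted http_versions, pool_limits, and a pluggable backend). Pinning the client to HTTP/1.1 presumably avoids multiplexing several concurrent requests over a single connection, which is what seems to trip the shared-SSLStream errors:

```python
import httpx
from httpx.concurrency.trio import TrioBackend

# Force HTTP/1.1 so each in-flight request gets its own pooled connection
# instead of sharing one stream, then reuse this client for all requests.
client = httpx.AsyncClient(
    http_versions=["HTTP/1.1"],
    pool_limits=httpx.PoolLimits(soft_limit=40, hard_limit=200, pool_timeout=8),
    backend=TrioBackend(),
)
```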