
Conversation

imptype commented Aug 29, 2025

Summary

The old handler hits 429s for concurrent requests and unknown sub-ratelimits. This new handler prioritizes a lower remaining value and a sooner reset_at value, and accounts for the token window changing during a sleep, so it still maintains the same throughput the old handler had.
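A minimal sketch of one way to read that prioritization (illustrative only, not the PR's actual code; merge_ratelimit and its parameters are hypothetical names):

# Hypothetical helper: when concurrent, out-of-order responses for the same bucket
# disagree, keep the lower remaining count and the sooner reset time so the
# tracked state never becomes looser than what Discord last reported.
def merge_ratelimit(remaining: int, reset_at: float, new_remaining: int, new_reset_at: float) -> tuple[int, float]:
    return min(remaining, new_remaining), min(reset_at, new_reset_at)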

Test code:

import time
import asyncio
import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True  # needed so prefix commands can read message content
bot = commands.Bot(intents=intents, command_prefix='.')

# Test 1: Send 200 messages sequentially in the same channel
@bot.command()
async def test1(ctx):
    start = time.time()
    for i in range(200):
        await ctx.send(f'hi {i}')
    print('Finished test1 in:', time.time() - start)

# Test 2: Send 50 messages concurrently in the same channel
@bot.command()
async def test2(ctx):
    start = time.time()
    await asyncio.gather(*[
        ctx.send(f'hi {i}') for i in range(50)
    ])
    print('Finished test2 in:', time.time() - start)

bot.run('TOKEN')

Old results:
Test 1: Finished in 213.27s with 0 429s.
Test 2: Finished in 46.29s with 86 429s.

New results:
Test 1: Finished in 208.48s with 0 429s.
Test 2: Finished in 47.15s with 0 429s.

Checklist

  • If code changes were made then they have been tested.
    • I have updated the documentation to reflect the changes.
  • This PR fixes an issue.
  • This PR adds something new (e.g. new method or parameters).
  • This PR is a breaking change (e.g. methods or parameters removed/renamed)
  • This PR is not a code change (e.g. documentation, README, ...)

Rapptz (Owner) commented Aug 30, 2025

Thanks for the PR.

I admit though this is a bit of a thinker for me. When I originally wrote the rate limit handling class, this was more or less the test that other contributors and I ran to ensure that it worked fine. It seems at some point one of the changes made it so 429s slip through, because this would not 429 before, especially not that much.

I don't have too much time to review this changeset (or even test it) but I don't really like the use of Event here since the original was made with flattening of asyncio resources in mind. It also seems weird to me to handle global rate limits within that class instead of outside since the global rate limit affects all requests. It also feels incredibly brittle to me that almost all of this code is essentially just retrying things multiple times just in case something concurrently happened in the background instead of properly synchronizing it.

Anyway this is failing CI, and I don't think I can test this code anyway.

HEROgold left a comment

I've added a few comments, specifically for fixing issues that prevent this PR from being testable.

And one comment about a docstring concern.

imptype (Author) commented Aug 30, 2025

> Thanks for the PR.
>
> I admit though this is a bit of a thinker for me. When I originally wrote the rate limit handling class, this was more or less the test that other contributors and I ran to ensure that it worked fine. It seems at some point one of the changes made it so 429s slip through, because this would not 429 before, especially not that much.
>
> I don't have too much time to review this changeset (or even test it) but I don't really like the use of Event here since the original was made with flattening of asyncio resources in mind. It also seems weird to me to handle global rate limits within that class instead of outside since the global rate limit affects all requests. It also feels incredibly brittle to me that almost all of this code is essentially just retrying things multiple times just in case something concurrently happened in the background instead of properly synchronizing it.
>
> Anyway this is failing CI, and I don't think I can test this code anyway.

24b619a still hits 429s for test 2.

asyncio.Event will never cause a deadlock because it's dismissed by asyncio.wait after 3 seconds, and personally I think lock + event is simpler than lock + future.

The global wait is inside the class in case the global ratelimit was hit during the wait to acquire, so this avoids more global 429s.

During a sleep, the window can change due to requests behind being late, so it must check if there is no remaining left in the new window and, if so, sleep again - the simplest way is to sleep and check rather than make it event-based or something; it still works fine.

The use case for concurrency is more frequent and relevant than you realize: asyncio.gather(50) in the same channel is the same as 50 people running a ping command in the same channel at the same time, which currently will hit 86 429s.

The remaining CI errors are typehinting or formatting issues which I can't fix.
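As a rough illustration of the Event-with-timeout pattern described above (a sketch under assumed names, not the PR's actual code):

import asyncio

async def wait_for_wake(event: asyncio.Event) -> None:
    # Wait on the event but give up after 3 seconds, so a missed wake-up
    # can never deadlock a waiter.
    task = asyncio.ensure_future(event.wait())
    done, pending = await asyncio.wait({task}, timeout=3)
    for t in pending:
        t.cancel()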

Rapptz (Owner) commented Aug 30, 2025

> 24b619a still hits 429s for test 2.

I remember now that the old rate limit, on Discord's side, got rewritten about 2-3 years ago to use GCRA instead of the previous windowed rate limit and that's probably where the discrepancy is. I know for a fact we tested this for weeks so something definitely changed.

> asyncio.Event will never cause a deadlock because it's dismissed by asyncio.wait after 3 seconds, and personally I think lock + event is simpler than lock + future.

The original code used a deque of futures because it was slightly more resource efficient, since this is a low level part of the library. In asyncio both Lock and Event are just interfaces over a deque of futures and a boolean, with no control over the wakers compared to a semaphore. I don't really get the confusion about a Future here.
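For context, a minimal sketch of the deque-of-futures idea being described (illustrative only, not the library's actual implementation):

import asyncio
from collections import deque

waiters: deque[asyncio.Future] = deque()

async def wait_for_turn() -> None:
    # Each waiter parks on its own future appended to the deque.
    fut = asyncio.get_running_loop().create_future()
    waiters.append(fut)
    await fut

def wake_next() -> None:
    # The waker decides exactly which (and how many) waiters to wake.
    while waiters:
        fut = waiters.popleft()
        if not fut.done():
            fut.set_result(None)
            break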

> The global wait is inside the class in case the global ratelimit was hit during the wait to acquire, so this avoids more global 429s.

I guess this is fair.

> During a sleep, the window can change due to requests behind being late, so it must check if there is no remaining left in the new window and, if so, sleep again - the simplest way is to sleep and check rather than make it event-based or something; it still works fine.

I'm not convinced about this yet. Sleeping and checking whether your value changed under you is just brittle to me.

> The use case for concurrency is more frequent and relevant than you realize: asyncio.gather(50) in the same channel is the same as 50 people running a ping command in the same channel at the same time, which currently will hit 86 429s.

This is literally not at all what I said. I mentioned earlier that we tested the gather use case, so why do you assume I don't know its importance? I'm saying that the concurrency handling within your implementation using sleep is poor.

> The remaining CI errors are typehinting or formatting issues which I can't fix.

You can fix your CI failure by running ruff format on the discord directory.

imptype (Author) commented Aug 30, 2025

@Rapptz According to Discord's public changelog, the last update to the ratelimit system was 5 years ago, before the current handler was written. Btw, I was referring to code readability. But even then, doesn't the old handler make multiple futures, whereas this only has 1 in the deque at a time because of the lock? Also, there's no simpler way to sleep longer after a sleep has finished other than using asyncio.sleep and checking whether to sleep again.

Rapptz (Owner) commented Aug 31, 2025

> According to Discord's public changelog, the last update to the ratelimit system was 5 years ago, before the current handler was written.

I don't get why you're doubting me on this when I speak to employees on a regular basis. The changelog is inaccurate. It doesn't even mention the X-RateLimit-Scope header addition, nor does it mention the change to GCRA back in late 2022, which took until late 2023 for it to reach all endpoints to my knowledge.

(attached screenshot: DiscordCanary_5b0eodrdic)

imptype (Author) commented Aug 31, 2025

@Rapptz Ok I didn't have insider info. What will it take to merge?

Rapptz (Owner) left a comment

Truthfully, I was willing to merge and test this out earlier despite this being a large changeset made with no prior discussion whatsoever, but your behaviour earlier made me way less willing to do so. Please remember that at the end of the day the person responsible for maintaining this stuff long term is me. For future purposes, please learn to communicate better. Either way, I'll do a review of this code real quick.

  1. This code breaks the Client.assume_unsync_clock option. That's what the use_clock parameter was for, and it is a public documented option. If you want to change this you either need to keep supporting that parameter or you need to deprecate it. As it stands, looking at this behaviour, you're changing it to always be True.
  2. I still don't like the Event usage but I'll let it slide for this since the resource usage difference is a single bool.
  3. I still don't like the strange design where you're sleeping and expecting the value to change right under you.

The rest of these comments are essentially questions.

discord/http.py Outdated
Comment on lines 435 to 444
async def _wait_global(self, start_time: float):
    # Sleep up to 3 times, to account for global reset at overwriting during sleeps
    for i in range(3):
        seconds = self.http.global_reset_at - start_time
        if seconds > 0:
            await asyncio.sleep(seconds)
            continue
        break
    else:
        raise ValueError('Global reset at changed more than 3 times')
Rapptz (Owner):

Due to the way the global rate limit has been rewritten, it is no longer as possible to refactor this out into a customisable lock for multi-process setups. It's always been part of the bucket list in order to implement this. If this changeset had been communicated earlier in the Discord, I would have also told you about this.

imptype (Author):

The global rate limit is not a lock anymore; it's just a changeable timestamp that all requests must sleep until before continuing. So, just replicate the value of http.global_reset_at to other processes when it is hit and it will still work. Btw, this will need to change loop.time() to time.time().
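A rough sketch of the timestamp-based global wait being described, using wall-clock time so the value could in principle be shared between processes (the names here are stand-ins, not the PR's actual attributes):

import asyncio
import time

global_reset_at = 0.0  # stand-in for http.global_reset_at

async def wait_for_global() -> None:
    # Sleep until the shared timestamp passes, re-checking afterwards in case
    # another request pushed the timestamp forward during the sleep.
    while True:
        delay = global_reset_at - time.time()
        if delay <= 0:
            return
        await asyncio.sleep(delay)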

Rapptz (Owner):

It's not possible to sync the value between processes.

imptype (Author):

I thought processes can send each other messages, and from that you can do http.global_reset_at = new value.

discord/http.py Outdated
        continue  # sleep again
    break
else:
    raise ValueError('Reset at changed more than 3 times')
Rapptz (Owner):

I don't get why this is a failure condition that has to bubble out. Ditto for the rest of them.

imptype (Author):

I expect reset_at to change during the sleep 3 times at most. More than that and I think the client is experiencing extreme lag or something, so the request should not continue.

Rapptz (Owner):

I'd rather it surface another exception either from aiohttp or HTTPException instead. ValueError is a terrible exception to raise for this stuff.

discord/http.py Outdated
Comment on lines 742 to 747
if tries == 4 or e.errno not in (54, 10054):
    raise ValueError('Connection reset by peer')
retry_seconds: int = 1 + tries * 2
fmt = 'OS error for %s %s. Retrying in %d seconds.'
_log.warning(fmt, method, url, retry_seconds)
await asyncio.sleep(retry_seconds)
Rapptz (Owner):

Why'd you swap this?

imptype (Author):

To avoid the continue and have 1 less line.

Contributor:

Code clarity matters more than line count. Also, in the case where the OSError isn't a connection reset, your code now gives the wrong error message.
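A sketch of the clearer control flow being suggested (illustrative only; do_request and the surrounding structure are hypothetical, not the PR's code):

import asyncio
import logging

_log = logging.getLogger(__name__)

async def request_with_retries(do_request, method: str, url: str):
    for tries in range(5):
        try:
            return await do_request(method, url)
        except OSError as e:
            # Only retry genuine connection resets while tries remain;
            # anything else propagates with its original error message.
            if e.errno in (54, 10054) and tries < 4:
                retry_seconds = 1 + tries * 2
                _log.warning('OS error for %s %s. Retrying in %d seconds.', method, url, retry_seconds)
                await asyncio.sleep(retry_seconds)
                continue
            raise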

discord/http.py Outdated
Comment on lines 749 to 751
# We've run out of retries, raise
if response is not None:
    raise HTTPException(response, data)
Rapptz (Owner):

Why'd you remove the 500 raise?

imptype (Author):

I forgot to add a tries check to fix waiting 9 extra seconds on the last try for no reason. Unless an aiohttp error happens on the last try, it's never met.

Rapptz (Owner):

Change it back.

Comment on lines +473 to +474
if copy == self.reset_at:
    self.reset()
Rapptz (Owner):

What is this for?

imptype (Author):

If reset_at hasn't changed, then we can set remaining = limit. A successful sleep, i.e. one where we don't need to sleep again, can refill the ratelimit remaining to the top.

If you mean the seeming redundancy of the copy variable, it's needed in case of a bucket change. Without it, Ratelimit.reset() would be called unnecessarily when reset_at has changed and there is self.remaining left. Saving a call is also good in case you want to replicate ratelimit state to other processes.

Comment on lines -669 to -678
# If the previous hash was an actual Discord hash then this means the
# hash has changed sporadically.
# This can be due to two reasons
# 1. It's a sub-ratelimit which is hard to handle
# 2. The rate limit information genuinely changed
# There is no good way to discern these, Discord doesn't provide a way to do so.
# At best, there will be some form of logging to help catch it.
# Alternating sub-ratelimits means that the requests oscillate between
# different underlying rate limits -- this can lead to unexpected 429s
# It is unavoidable.
Rapptz (Owner):

Not that it matters that much but why did you remove my comment?

imptype (Author):

It conflicts with other comments explaining the strategy.

    self._buckets[ratelimit.key] = ratelimit

# Global rate limit 429 wont have ratelimit headers (also can't tell if it's one-shot)
elif response.headers.get('X-RateLimit-Global'):
Rapptz (Owner):

Why are we using this instead of the global inner key which to my knowledge is more accurate and actually consistently given?

imptype (Author):

We haven't called json_or_text yet, so can only use headers at this point.

Rapptz (Owner):

Reorder the code so it is called. Retry-After header is in whole seconds, retry_after in the JSON is in float seconds.
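A small sketch of the difference being pointed out (the helper name is hypothetical): prefer the float retry_after from the parsed JSON body and fall back to the whole-second Retry-After header.

def get_retry_after(headers, data) -> float:
    # data is the parsed JSON body (when present); its retry_after has
    # sub-second precision, while the header is rounded to whole seconds.
    if isinstance(data, dict) and 'retry_after' in data:
        return float(data['retry_after'])
    return float(headers.get('Retry-After', 0))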

Rapptz (Owner) left a comment

I didn't tell you to remove Client.assume_unsync_clock.


imptype (Author) commented Sep 1, 2025

@Rapptz unsync_clock was unused so it was removed.

imptype (Author) commented Sep 2, 2025

@Rapptz A future replaced the event because you mentioned resource issues.

Rapptz (Owner) left a comment

Previous comments still apply.

  1. I did not ask for Client.assume_unsync_clock to be removed.
  2. I still think bubbling this error as a RuntimeError, or even at all, is not good design.

I raise RuntimeError under there because it's an unreachable exception. It is not meant to ever get hit and if it does then it's bizarre that it happened -- it's purely there to shut up the type checker and not publicly facing.
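A generic example of that pattern (not the PR's code): the final raise is unreachable in practice and exists only so the type checker sees every path end.

def first_positive(values: list) -> int:
    for v in values:
        if v > 0:
            return v
    # Callers guarantee at least one positive value, so this is unreachable;
    # it exists purely to satisfy the type checker.
    raise RuntimeError('unreachable')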

Anyway, I will be going on a trip for the month of September so this is my last review of this PR until I get home and I don't really have time to do an even deeper review right now since I'm preoccupied preparing for said trip.

    self.dirty = False
    self.reset_at = 0.0

def update(self, response: aiohttp.ClientResponse, data: Union[Dict[str, Any], str]) -> bool:
Rapptz (Owner):

This should probably only accept Dict[str, Any] or None by restricting it in the calling site, so you can get rid of the # type: ignore. I guess it'd also be good if you made a TypedDict with the rate limit expected response type, since that's what this is accepting, and narrowed to that.
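For illustration, a TypedDict along those lines might look like this (a sketch based on the documented shape of Discord's 429 JSON body, not an existing type in the library):

from typing import TypedDict

# Functional form because "global" is a Python keyword.
RateLimitedPayload = TypedDict(
    'RateLimitedPayload',
    {
        'message': str,
        'retry_after': float,
        'global': bool,
    },
)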

imptype (Author):

I tried to fix the typehint issue but that method adds extra calls, so type: ignore is the simplest fix.

discord/http.py Outdated
    self.reset_at = 0.0

def update(self, response: aiohttp.ClientResponse, data: Union[Dict[str, Any], str]) -> bool:
    # Shared scope 429 has longer "reset_at", determined using the retry-after field
Rapptz (Owner):

I don't think this is forced to be true.

A shared scope is just a shared resource, like a channel message rate limit where you have a set number of messages sent per channel, but that doesn't necessarily mean it's always going to be longer. Right now I can't think of any shared resources that are small in size, though I do know that Emoji resources are only shared.

def update(self, response: aiohttp.ClientResponse, data: Union[Dict[str, Any], str]) -> bool:
    # Shared scope 429 has longer "reset_at", determined using the retry-after field
    limit = int(response.headers['X-Ratelimit-Limit'])
    if response.headers.get('X-RateLimit-Scope') == 'shared':
Rapptz (Owner):

This should have a comment that this header is only returned on 429 responses.

discord/http.py Outdated

async def _wait(self):
    # Consider waiting if none is remaining
    if not self.remaining:
Rapptz (Owner):

I suggest using explicit checks instead of passive boolean ones for all the integer comparisons.

Suggested change:
-    if not self.remaining:
+    if self.remaining == 0:

discord/http.py Outdated
if not self.remaining:
    # If reset_at is not set yet, wait for the last request, if outgoing, to finish first
    # for up to 3 seconds instead of using aiohttp's default 5 min timeout.
    if not self.reset_at and (not self._last_request or self.http.loop.time() - self._last_request < 3):
Rapptz (Owner):

Suggested change:
-    if not self.reset_at and (not self._last_request or self.http.loop.time() - self._last_request < 3):
+    if self.reset_at == 0.0 and (not self._last_request or self.http.loop.time() - self._last_request < 3):

    delta = self.http.loop.time() - self._last_request
    return delta >= 300 and (self.one_shot or (self.outgoing == 0 and self.pending == 0))

def _wake(self) -> None:
Rapptz (Owner):

The first request will always have this future set to None, not sure if this is what you intended.

imptype (Author) commented Sep 3, 2025

@Rapptz The RuntimeError can be avoided with a while True, but raising may be preferred to prevent an infinite loop.
