Token Bucket vs Exponential Backoff: What Changed After 966 Runs
After 966 production runs of the Trustpilot scraper, I rewrote the rate-limit layer in five of our actors. The old code used exponential backoff with jitter — the textbook answer. The new code uses a token bucket. Reliability went up, cost went down, and the failure mode is finally something I can reason about on a Friday at 18:00 without staring at logs for an hour.
This post is the short version of why, with the code I actually shipped.
The shape of the bug
Exponential backoff is a reactive control. The scraper sends requests at full speed, gets a 429 or a soft block, and waits longer next time. It’s beautifully simple. It also has a quiet pathology in production:
- Every worker independently decides when to back off, so traffic stays bursty even after backoff kicks in.
- The “back off” period is wasted compute — the actor is alive, the proxy is paid for, but nothing is happening.
- On retry storms (10% of failures in our Trustpilot logs over a month), the combined backoff times exceed the actor’s hard timeout and the whole run fails with RUN-TIMEOUT-REACHED instead of a useful error.
Looking at the 966-run history, 31 runs (3.2%) died this way. Not a five-alarm fire, but enough to skew our success-rate dashboard and trigger spurious “is the site blocking us?” investigations.
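To make the timeout failure concrete, here is a rough sketch of the arithmetic. The retry count, base delay, cap, and the 3600-second run timeout are illustrative assumptions, not values from the actor's config, and `backoff_delay` / `worst_case_wait` are hypothetical helper names.

```python
import math
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay for a 0-indexed retry attempt: uniform in [0, ceiling]."""
    rng = rng or random.Random()
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0.0, ceiling)


def worst_case_wait(retries: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound on total sleep time if every retry draws its maximum delay."""
    return sum(min(cap, base * 2 ** attempt) for attempt in range(retries))


# Assumed numbers: 8 retries per URL, 1 s base delay, 60 s cap.
per_url = worst_case_wait(retries=8)  # 1 + 2 + 4 + 8 + 16 + 32 + 60 + 60 = 183.0

# Assumed hard timeout of one hour for the whole run.
run_timeout = 3600.0
stalled_urls_to_timeout = math.ceil(run_timeout / per_url)
print(per_url, stalled_urls_to_timeout)
```

Under those assumptions, a few dozen URLs stuck in the retry loop are enough to burn the whole run budget on sleeping, which is how a rate-limit problem ends up reported as a run timeout.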
The shape of the fix
A token bucket is a proactive control. You decide up front: “the target site gives us 60 requests per minute, with a bucket size of 10 for bursts.” The scraper cannot send faster than that, even if it wants to. Backoff becomes the exception, not the steady state.
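The rate and bucket size also give a predictable lower bound on how long a job takes. A minimal sketch of that arithmetic, assuming the bucket starts full and ignoring request latency (`min_duration_sec` is a hypothetical helper, not part of the shipped code):

```python
def min_duration_sec(n_requests: int, rate_per_sec: float, capacity: int) -> float:
    """Earliest time at which the n-th request can be sent, starting from a full bucket.

    The first `capacity` requests go out immediately as a burst; every request
    after that has to wait for one token to refill at `rate_per_sec`.
    """
    beyond_burst = max(0, n_requests - capacity)
    return beyond_burst / rate_per_sec


# 60 requests per minute is 1 token per second.
rate = 60 / 60.0
print(min_duration_sec(500, rate, 10))  # 490.0
print(min_duration_sec(5, rate, 10))    # 0.0 (fits inside the burst)
```

So a 500-request job cannot finish in under roughly eight minutes at these settings, which is useful to know before picking an actor timeout.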
For a single-worker actor, the implementation fits in one screen:
```python
import time
import threading


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec              # tokens added per second
        self.capacity = capacity              # maximum burst size
        self.tokens = float(capacity)         # start with a full bucket
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, n: int = 1, timeout: float = 30.0) -> bool:
        """Block until n tokens are available; return False on timeout."""
        deadline = time.monotonic() + timeout
        while True:
            with self.lock:
                # Refill based on elapsed time, never above capacity.
                now = time.monotonic()
                elapsed = now - self.last_refill
                self.tokens = min(
                    self.capacity,
                    self.tokens + elapsed * self.rate,
                )
                self.last_refill = now
                if self.tokens >= n:
                    self.tokens -= n
                    return True
            # Sleep outside the lock so other threads can acquire.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.05)


# Trustpilot: ~60 req/min sustained, burst of 10 is fine
bucket = TokenBucket(rate_per_sec=1.0, capacity=10)

# urls, session, and handle come from the rest of the actor.
for url in urls:
    if not bucket.acquire(timeout=15.0):
        raise RuntimeError(f"rate-limit acquire timeout for {url}")
    response = session.get(url)
    handle(response)
```
The contract is now explicit. If the bucket can’t give a token in 15 seconds, that is the failure — not a vague “site blocking us” timeout 8 minutes later. The error message tells me where to look. The logs tell me how full the bucket was when the run started.
What changed in the dashboards
I ran the new version on the Trustpilot actor for 30 days alongside the old one (A/B at the run-config level). Three things moved:
- Timeouts dropped from 3.2% to 0.4%. Most of the residual 0.4% are target-site outages — runs that should fail loudly, not silently retry.
- Average runtime dropped 11% on long jobs (≥500 reviews). The actor no longer waits out exponential backoff windows after a burst it caused itself.
- Compute-unit cost dropped 7% on the same workload. The bucket prevents the burst, which prevents the backoff, which prevents the paid-for idle time.
The third one was the surprise. I’d budgeted for “same cost, more reliability.” Removing self-inflicted backoff was free money.
When exponential backoff is still correct
Two cases where I kept it:
- Outbound calls to flaky third-party APIs we don’t own — Stripe webhooks, email providers, analytics pixels. Their rate limits are documented but noisy, and they actually want clients to back off on 429. A token bucket here is over-engineering.
- Cold-start retries for transient infrastructure errors (DNS, TLS handshake failures). One retry with 200 ms backoff is fine. Three retries with exponential backoff is also fine. Building a bucket for this is masochism.
The token bucket is for steady-state outbound traffic to a single target you expect to hit thousands of times. Scraping is exactly that. Webhooks to your own services are too. Random HTTP calls in a script — leave the backoff.
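For the cases where backoff stays, a small retry helper is usually all it takes. This is a minimal sketch, not the code from our actors: `retry_transient` is a hypothetical name, and the choice of `ConnectionError` and `TimeoutError` as the "transient" set is an assumption you would adapt to your HTTP client's exceptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    call: Callable[[], T],
    retries: int = 1,
    base_delay: float = 0.2,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a transient error, wait base_delay * 2**attempt and retry.

    After `retries` failed retries the last exception is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except transient:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the defaults this is the "one retry with 200 ms backoff" case; passing `retries=3` gives delays of 0.2 s, 0.4 s, and 0.8 s. The `sleep` parameter exists only so the helper can be exercised without actually waiting.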
The 5-minute migration plan
If you have a scraper running exponential backoff today and you want to try this without rewriting everything:
- Open the actor’s INPUT_SCHEMA.json. Add two fields: requests_per_minute (default 60) and burst_capacity (default 10).
- Drop the TokenBucket class above into your main.py.
- Initialize one bucket per target hostname at startup. Call acquire() before the existing HTTP call.
- Keep the exponential-backoff retry layer for the response side (5xx, 429). The bucket prevents most of those from happening; the retry catches the rest.
- Ship to one actor first. Watch runtime_seconds and successful_runs for a week before rolling to the others.
Total LOC added: ~40. Total LOC removed: 0 (the retry layer stays). Risk: low. The bucket only slows the scraper, it never speeds it up, so the worst case is “actor runs at the old rate but does so deterministically.”
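The "one bucket per hostname" step can be sketched roughly as below. `HostBuckets` and `fetch` are hypothetical names introduced for illustration, the bucket class is a condensed copy so the snippet runs on its own, and `get` stands in for whatever HTTP call (including its existing retry layer) the actor already makes.

```python
import threading
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class TokenBucket:
    """Condensed single-token version of the bucket from this post."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, timeout: float = 30.0) -> bool:
        deadline = time.monotonic() + timeout
        while True:
            with self.lock:
                now = time.monotonic()
                refill = (now - self.last_refill) * self.rate
                self.tokens = min(self.capacity, self.tokens + refill)
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.05)


class HostBuckets:
    """One bucket per target hostname, created lazily from the actor input."""

    def __init__(self, requests_per_minute: float = 60, burst_capacity: int = 10):
        self.rate_per_sec = requests_per_minute / 60.0
        self.capacity = burst_capacity
        self._buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def for_url(self, url: str) -> TokenBucket:
        host = urlsplit(url).hostname or ""
        with self._lock:
            if host not in self._buckets:
                self._buckets[host] = TokenBucket(self.rate_per_sec, self.capacity)
            return self._buckets[host]


def fetch(url: str, buckets: HostBuckets, get: Callable[[str], object],
          acquire_timeout: float = 15.0):
    """Take a token for the URL's host, then run the existing HTTP call."""
    if not buckets.for_url(url).acquire(timeout=acquire_timeout):
        raise RuntimeError(f"rate-limit acquire timeout for {url}")
    return get(url)
```

In an actor, `requests_per_minute` and `burst_capacity` would come from the run input, and `get` would be the call you already have, so the retry layer keeps handling 429 and 5xx responses unchanged.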
What I still don’t have
The token bucket is per-actor, per-process. Two parallel runs of the same actor on the same hostname will each have their own bucket — you can still self-DDoS across runs if you schedule them aggressively. Fixing that needs a shared bucket (Redis, KV, anything with atomic ops) and I haven’t found a clean enough API to ship it. If you’ve done this in production, I’d be curious to read the post.
For now, the per-actor bucket plus a maxConcurrency: 1 in the Apify run config covers the cases I care about. Trustpilot has been at 966 runs and counting, with the new rate-limit code shipping on run #851. The next hundred runs have been the least-eventful hundred in the actor’s history. That’s the only KPI I trust.
Originally published at blog.spinov.online — code-heavy notes from 32 Apify actors and ~2 200 lifetime runs.