May 15, 2026

Token Bucket vs Exponential Backoff: What Changed After 966 Runs

After 966 production runs of the Trustpilot scraper, I rewrote the rate-limit layer in five of our actors. The old code used exponential backoff with jitter — the textbook answer. The new code uses a token bucket. Reliability went up, cost went down, and the failure mode is finally something I can reason about on a Friday at 18:00 without staring at logs for an hour.

This post is the short version of why, with the code I actually shipped.

The shape of the bug

Exponential backoff is a reactive control. The scraper sends requests at full speed, gets a 429 or a soft block, and waits longer next time. It’s beautifully simple. It also has three quiet pathologies in production:

Every worker independently decides when to back off, so traffic stays bursty even after backoff kicks in.
The “back off” period is wasted compute — the actor is alive, the proxy is paid for, but nothing is happening.
In retry storms (10% of failures in our Trustpilot logs over a month), the summed backoff waits exceed the actor’s hard timeout and the whole run fails with RUN-TIMEOUT-REACHED instead of a useful error.

Looking at the 966-run history, 31 runs (3.2%) died this way. Not a five-alarm fire, but enough to skew our success-rate dashboard and trigger spurious “is the site blocking us?” investigations.
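To see how summed backoff waits can outrun a run timeout, here is a small arithmetic sketch. The base delay, growth factor, cap, retry count, and the 480-second timeout are all illustrative assumptions, not values taken from the actor's config.

```python
import math


def backoff_ceilings(base: float, factor: float, cap: float, retries: int) -> list[float]:
    """Upper bound of each wait for capped exponential backoff.

    With full jitter the real wait is drawn uniformly from [0, ceiling],
    so these are worst-case values, not typical ones.
    """
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


# Hypothetical settings: 1 s base, doubling, capped at 60 s, 8 retries.
ceilings = backoff_ceilings(base=1.0, factor=2.0, cap=60.0, retries=8)
worst_case_per_url = sum(ceilings)

# Hypothetical hard timeout for the whole run.
RUN_TIMEOUT_SEC = 480.0
urls_to_exhaust = math.ceil(RUN_TIMEOUT_SEC / worst_case_per_url)

print(ceilings)             # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(worst_case_per_url)   # 183.0
print(urls_to_exhaust)      # 3
```

Under these assumed numbers, three URLs that each exhaust their retries back to back are enough to consume the whole run budget in waiting alone.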

The shape of the fix

A token bucket is a proactive control. You decide up front: “the target site gives us 60 requests per minute, with a bucket size of 10 for bursts.” The scraper cannot send faster than that, even if it wants to. Backoff becomes the exception, not the steady state.
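As a sanity check on those numbers, the two bounds a token bucket gives you can be worked out directly. This is a back-of-the-envelope sketch that assumes the bucket starts full and ignores request latency; the function names are made up for illustration.

```python
def earliest_finish_seconds(n_requests: int, rate_per_sec: float, capacity: int) -> float:
    """Lower bound on the time needed to release n_requests from a bucket that starts full."""
    # The first `capacity` requests go out immediately; the rest wait for refills.
    return max(0, n_requests - capacity) / rate_per_sec


def max_requests_in_window(window_sec: float, rate_per_sec: float, capacity: int) -> float:
    """Upper bound on requests released in any window of window_sec seconds."""
    # Worst case: the window opens on a full bucket, then refills continuously.
    return capacity + rate_per_sec * window_sec


# 60 requests per minute is 1 token per second; burst capacity is 10.
print(earliest_finish_seconds(500, rate_per_sec=1.0, capacity=10))  # 490.0
print(max_requests_in_window(60, rate_per_sec=1.0, capacity=10))    # 70.0
```

The second number is worth noticing: with a burst of 10, the busiest possible minute carries 70 requests, not 60. If the target enforces a strict per-minute ceiling, the sustained rate or the burst size needs to leave that headroom.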

For a single-worker actor, the implementation fits in one screen:

import time
import threading

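class TokenBucket:
    """Thread-safe token bucket: refills at rate_per_sec, holds at most capacity tokens."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so the first burst goes out immediately
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, n: int = 1, timeout: float = 30.0) -> bool:
        """Block until n tokens are available; return False if timeout expires first."""
        if n > self.capacity:
            raise ValueError("n exceeds bucket capacity and could never be granted")
        deadline = time.monotonic() + timeout
        while True:
            with self.lock:
                # Lazy refill: credit the tokens earned since the last check.
                now = time.monotonic()
                elapsed = now - self.last_refill
                self.tokens = min(
                    self.capacity,
                    self.tokens + elapsed * self.rate,
                )
                self.last_refill = now
                if self.tokens >= n:
                    self.tokens -= n
                    return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.05)  # poll outside the lock so other threads can acquire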

# Trustpilot: ~60 req/min sustained, burst of 10 is fine
bucket = TokenBucket(rate_per_sec=1.0, capacity=10)

# urls, session, and handle() are assumed to be defined elsewhere in the actor
for url in urls:
    if not bucket.acquire(timeout=15.0):
        raise RuntimeError(f"rate-limit acquire timeout for {url}")
    response = session.get(url)
    handle(response)

The contract is now explicit. If the bucket can’t give a token in 15 seconds, that is the failure — not a vague “site blocking us” timeout 8 minutes later. The error message tells me where to look. The logs tell me how full the bucket was when the run started.
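The class shown here does not log anything by itself, so getting the bucket level into the logs and the error message takes a thin wrapper. The following is one possible sketch: `RateLimitTimeout`, `acquire_or_raise`, and the logger name are hypothetical, and the wrapper only assumes an object exposing `tokens`, `capacity`, and `acquire()`.

```python
import logging
from typing import Protocol

logger = logging.getLogger("rate_limit")  # hypothetical logger name


class Bucket(Protocol):
    """Anything shaped like the TokenBucket in this post."""
    tokens: float
    capacity: int

    def acquire(self, n: int = 1, timeout: float = 30.0) -> bool: ...


class RateLimitTimeout(RuntimeError):
    """Hypothetical error type: no token became available in time."""


def acquire_or_raise(bucket: Bucket, url: str, timeout: float = 15.0) -> None:
    # Snapshot only: tokens is refreshed lazily inside acquire(), so this
    # value is approximate and read without the bucket's lock.
    level_before = bucket.tokens
    if bucket.acquire(timeout=timeout):
        logger.debug(
            "token acquired for %s (bucket %.1f/%d before)",
            url, level_before, bucket.capacity,
        )
        return
    raise RateLimitTimeout(
        f"no token within {timeout:.0f}s for {url} "
        f"(bucket {level_before:.1f}/{bucket.capacity})"
    )
```

A dedicated exception type keeps rate-limit starvation distinguishable from network errors in whatever error reporting the actor already has.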

What changed in the dashboards

I ran the new version on the Trustpilot actor for 30 days alongside the old one (A/B at the run-config level). Three things moved:

Timeouts dropped from 3.2% to 0.4%. Most of the residual 0.4% are target-site outages — runs that should fail loudly, not silently retry.
Average runtime dropped 11% on long jobs (≥500 reviews). The actor no longer waits out exponential backoff windows after a burst it caused itself.
Compute-unit cost dropped 7% on the same workload. The bucket prevents the burst, which prevents the backoff, which prevents the paid-for idle time.

The third one was the surprise. I’d budgeted for “same cost, more reliability.” Removing self-inflicted backoff was free money.

When exponential backoff is still correct

Two cases where I kept it:

Outbound calls to flaky third-party APIs we don’t own — Stripe webhooks, email providers, analytics pixels. Their rate limits are documented but noisy, and they actually want clients to back off on 429. A token bucket here is over-engineering.
Cold-start retries for transient infrastructure errors (DNS, TLS handshake failures). One retry with 200 ms backoff is fine. Three retries with exponential backoff is also fine. Building a bucket for this is masochism.

The token bucket is for steady-state outbound traffic to a single target you expect to hit thousands of times. Scraping is exactly that. Webhooks to your own services are too. Random HTTP calls in a script — leave the backoff.
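For the cases where backoff stays, a small helper is usually enough. This is a generic sketch, not the retry layer from the actors: `retry_transient` is a hypothetical name, the 200 ms base mirrors the figure above, and the default exception types are placeholders, since HTTP clients generally raise their own exception classes for DNS and TLS failures.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.2,
    # Placeholder types; pass the exceptions your HTTP client actually raises.
    retriable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying up to `retries` times with full-jitter exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retriable:
            if attempt == retries:
                raise  # out of retries: surface the original error
            ceiling = base_delay * 2 ** attempt  # 0.2, 0.4, 0.8, ...
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable unless retries is negative")
```

The `sleep` parameter exists only so the helper can be exercised in tests without real waiting.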

The 5-minute migration plan

If you have a scraper running exponential backoff today and you want to try this without rewriting everything:

Open the actor’s INPUT_SCHEMA.json. Add two fields: requests_per_minute (default 60) and burst_capacity (default 10).
Drop the TokenBucket class above into your main.py.
Initialize one bucket per target hostname at startup. Call acquire() immediately before the existing HTTP call.
Keep the exponential-backoff retry layer for the response side (5xx, 429). The bucket prevents most of those from happening; the retry catches the rest.
Ship to one actor first. Watch runtime_seconds and successful_runs for a week before rolling to the others.

Total LOC added: ~40. Total LOC removed: 0 (the retry layer stays). Risk: low. The bucket only slows the scraper; it never speeds it up, so the worst case is “actor runs at the old rate but does so deterministically.”
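The input-to-bucket wiring from the steps above can be sketched roughly as follows. The field names and defaults are the ones from the plan; `bucket_params` and `PerHostBuckets` are hypothetical helpers, and the registry creates buckets lazily on first use rather than strictly at startup. It is not thread-safe as written, which is an assumption that holds for a single-worker actor.

```python
from typing import Callable, Dict, Generic, TypeVar
from urllib.parse import urlsplit

B = TypeVar("B")


def bucket_params(actor_input: dict) -> tuple[float, int]:
    """Turn the two input fields into (rate_per_sec, capacity)."""
    rpm = float(actor_input.get("requests_per_minute", 60))
    burst = int(actor_input.get("burst_capacity", 10))
    if rpm <= 0 or burst < 1:
        raise ValueError("requests_per_minute must be > 0 and burst_capacity >= 1")
    return rpm / 60.0, burst


class PerHostBuckets(Generic[B]):
    """One bucket per hostname, built on first use by the given factory."""

    def __init__(self, make_bucket: Callable[[], B]):
        self._make_bucket = make_bucket
        self._buckets: Dict[str, B] = {}

    def for_url(self, url: str) -> B:
        # urlsplit lowercases the hostname, so WWW.example.com and
        # www.example.com share a bucket.
        host = urlsplit(url).hostname or ""
        if host not in self._buckets:
            self._buckets[host] = self._make_bucket()
        return self._buckets[host]


# Intended use with the TokenBucket class from this post:
#   rate, burst = bucket_params(actor_input)
#   buckets = PerHostBuckets(lambda: TokenBucket(rate_per_sec=rate, capacity=burst))
#   if not buckets.for_url(url).acquire(timeout=15.0):
#       raise RuntimeError(f"rate-limit acquire timeout for {url}")
```

Keying by hostname means subdomains get separate buckets; if the target rate-limits across the whole domain, the key would need to change accordingly.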

What I still don’t have

The token bucket is per-actor, per-process. Two parallel runs of the same actor on the same hostname will each have their own bucket — you can still self-DDoS across runs if you schedule them aggressively. Fixing that needs a shared bucket (Redis, KV, anything with atomic ops) and I haven’t found a clean enough API to ship it. If you’ve done this in production, I’d be curious to read the post.
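For what it is worth, one possible shape for a shared bucket is a compare-and-set loop over a `(tokens, last_refill)` pair. The sketch below is untested against any real backend: `AtomicStore` is a hypothetical interface, `InMemoryStore` is only a local stand-in so the logic can run, and `now` must come from a clock all runs agree on (wall-clock time, not `time.monotonic()`, which is not comparable across processes).

```python
import threading
from typing import Dict, Optional, Protocol, Tuple

State = Tuple[float, float]  # (tokens, last_refill_timestamp)


class AtomicStore(Protocol):
    """Hypothetical interface a shared backend would need to offer."""

    def get(self, key: str) -> Optional[State]: ...
    def compare_and_set(self, key: str, expected: Optional[State], new: State) -> bool: ...


class InMemoryStore:
    """Local stand-in for a shared store; NOT shared across processes."""

    def __init__(self) -> None:
        self._data: Dict[str, State] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[State]:
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key: str, expected: Optional[State], new: State) -> bool:
        with self._lock:
            if self._data.get(key) != expected:
                return False  # someone else updated the bucket first
            self._data[key] = new
            return True


def try_acquire(store: AtomicStore, key: str, rate_per_sec: float,
                capacity: int, now: float, max_attempts: int = 20) -> bool:
    """Non-blocking: take one token if available, retrying on write conflicts."""
    for _ in range(max_attempts):
        state = store.get(key)
        tokens, last = state if state is not None else (float(capacity), now)
        # Lazy refill; max() guards against clocks that run slightly behind.
        tokens = min(float(capacity), tokens + max(0.0, now - last) * rate_per_sec)
        if tokens < 1.0:
            return False
        if store.compare_and_set(key, state, (tokens - 1.0, max(last, now))):
            return True
    return False  # too much contention; caller may retry later
```

Whether a given key-value service can provide the compare-and-set step atomically is exactly the open question; this only shows the logic that would sit on top of it.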

For now, the per-actor bucket plus a maxConcurrency: 1 in the Apify run config covers the cases I care about. The Trustpilot actor is at 966 runs and counting, with the new rate-limit code shipping on run #851. The hundred-odd runs since then have been the least eventful stretch in the actor’s history. That’s the only KPI I trust.


Originally published at blog.spinov.online — code-heavy notes from 32 Apify actors and ~2 200 lifetime runs.