Token Bucket vs Exponential Backoff: What Changed After 966 Runs


After 966 production runs of the Trustpilot scraper, I rewrote the rate-limit layer in five of our actors. The old code used exponential backoff with jitter — the textbook answer. The new code uses a token bucket. Reliability went up, cost went down, and the failure mode is finally something I can reason about on a Friday at 18:00 without staring at logs for an hour.

This post is the short version of why, with the code I actually shipped.

The shape of the bug

Exponential backoff is a reactive control. The scraper sends requests at full speed, gets a 429 or a soft block, and waits longer next time. It’s beautifully simple. It also has three quiet pathologies in production:

  • Every worker independently decides when to back off, so traffic stays bursty even after backoff kicks in.
  • The “back off” period is wasted compute — the actor is alive, the proxy is paid for, but nothing is happening.
  • On retry storms (10% of failures in our Trustpilot logs over a month), the combined backoff times exceed the actor’s hard timeout and the whole run fails with RUN-TIMEOUT-REACHED instead of a useful error.

Looking at the 966-run history, 31 runs (3.2%) died this way. Not a five-alarm fire, but enough to skew our success-rate dashboard and trigger spurious “is the site blocking us?” investigations.
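
To see how quickly combined backoff can eat a run's time budget, here is a rough back-of-the-envelope sketch. Every number in it is hypothetical (1 s base delay, doubling, a 60 s cap, 8 retries per URL, 4 bad URLs handled one after another, a 600 s run timeout); none of them come from the actor's real configuration.

```python
def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> list[float]:
    """Wait before each retry: base * factor**attempt, capped at `cap` seconds."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]

# Hypothetical retry storm: 4 URLs that each need 8 retries, handled sequentially.
per_url = backoff_delays(retries=8)
wait_per_url = sum(per_url)
storm_urls = 4
run_timeout = 600.0  # assumed hard timeout for the run, in seconds

total_wait = wait_per_url * storm_urls
print(per_url)                   # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(wait_per_url)              # 183.0
print(total_wait > run_timeout)  # True: 732 s of pure waiting against a 600 s budget
```

Jitter would change the individual waits but not the order of magnitude: a handful of stubborn URLs is enough to spend the whole timeout on sleeping.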

The shape of the fix

A token bucket is a proactive control. You decide up front: “the target site gives us 60 requests per minute, with a bucket size of 10 for bursts.” The scraper cannot send faster than that, even if it wants to. Backoff becomes the exception, not the steady state.
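
The arithmetic behind those two numbers is worth spelling out. A bucket that starts full can release at most its capacity plus whatever it refills during the window, so "60 per minute with a burst of 10" allows up to 70 requests in the first minute and 60 per minute after that. The helper name below is made up for illustration.

```python
def max_requests(window_sec: float, rate_per_sec: float, capacity: int) -> int:
    """Upper bound on requests a full bucket can release within window_sec."""
    return int(capacity + rate_per_sec * window_sec)

rate_per_sec = 60 / 60.0  # 60 requests per minute is 1 token per second

print(max_requests(0, rate_per_sec, 10))     # 10: the burst, available immediately
print(max_requests(60, rate_per_sec, 10))    # 70: burst plus one minute of refill
print(max_requests(3600, rate_per_sec, 10))  # 3610: over an hour the burst is noise
```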

For a single-worker actor, the implementation fits in one screen:

import time
import threading

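class TokenBucket:
    """Thread-safe token bucket: refills at rate_per_sec, holds at most capacity tokens."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so the first burst is allowed
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, n: int = 1, timeout: float = 30.0) -> bool:
        """Block until n tokens are available; return False once timeout elapses."""
        deadline = time.monotonic() + timeout
        while True:
            with self.lock:
                # Credit the tokens earned since the last refill, capped at capacity.
                now = time.monotonic()
                elapsed = now - self.last_refill
                self.tokens = min(
                    self.capacity,
                    self.tokens + elapsed * self.rate,
                )
                self.last_refill = now
                if self.tokens >= n:
                    self.tokens -= n
                    return True
            if time.monotonic() >= deadline:
                return False
            # Poll outside the lock so other threads can refill and acquire.
            time.sleep(0.05)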

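# Trustpilot: ~60 req/min sustained, burst of 10 is fine
bucket = TokenBucket(rate_per_sec=1.0, capacity=10)

# urls, session and handle() come from the existing scraper code.
for url in urls:
    # Fail fast with a specific error instead of hitting the run timeout later.
    if not bucket.acquire(timeout=15.0):
        raise RuntimeError(f"rate-limit acquire timeout for {url}")
    response = session.get(url)
    handle(response)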

The contract is now explicit. If the bucket can’t give a token in 15 seconds, that is the failure — not a vague “site blocking us” timeout 8 minutes later. The error message tells me where to look. The logs tell me how full the bucket was when the run started.
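
The class as shown does not log anything by itself, so the fill level has to be exposed somehow. One possible shape is sketched below. The `level()` method, the non-blocking `try_acquire()`, the injectable `clock` parameter and the logger name are all illustrative additions, not part of the class above.

```python
import logging
import threading
import time
from typing import Callable

log = logging.getLogger("rate_limit")  # hypothetical logger name

class ObservableBucket:
    """Token bucket variant that can report its fill level (illustrative)."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable so a demo or test needs no real waiting
        self.last_refill = clock()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        # Caller must hold self.lock.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def level(self) -> float:
        """Tokens available right now, after crediting elapsed time."""
        with self.lock:
            self._refill()
            return self.tokens

    def try_acquire(self, n: int = 1) -> bool:
        """Non-blocking acquire that logs the fill level when it refuses."""
        with self.lock:
            self._refill()
            if self.tokens >= n:
                self.tokens -= n
                return True
            log.warning("bucket short: %.2f of %d tokens", self.tokens, self.capacity)
            return False

# Demo with a fake clock: a burst of 12 at t=0, then three seconds pass.
fake_now = [0.0]
bucket = ObservableBucket(rate_per_sec=1.0, capacity=10, clock=lambda: fake_now[0])
granted = sum(bucket.try_acquire() for _ in range(12))
fake_now[0] += 3.0
print(granted, bucket.level())  # 10 3.0
```

Logging `level()` once at startup and again on every failed acquire gives the error message the context it needs.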

What changed in the dashboards

I ran the new version on the Trustpilot actor for 30 days alongside the old one (A/B at the run-config level). Three things moved:

  1. Timeouts dropped from 3.2% to 0.4%. Most of the residual 0.4% are target-site outages — runs that should fail loudly, not silently retry.
  2. Average runtime dropped 11% on long jobs (≥500 reviews). The actor no longer waits out exponential backoff windows after a burst it caused itself.
  3. Compute-unit cost dropped 7% on the same workload. The bucket prevents the burst, which prevents the backoff, which prevents the paid-for idle time.

The third one was the surprise. I’d budgeted for “same cost, more reliability.” Removing self-inflicted backoff was free money.

When exponential backoff is still correct

Two cases where I kept it:

  • Outbound calls to flaky third-party APIs we don’t own — Stripe webhooks, email providers, analytics pixels. Their rate limits are documented but noisy, and they actually want clients to back off on 429. A token bucket here is over-engineering.
  • Cold-start retries for transient infrastructure errors (DNS, TLS handshake failures). One retry with 200 ms backoff is fine. Three retries with exponential backoff is also fine. Building a bucket for this is masochism.

The token bucket is for steady-state outbound traffic to a single target you expect to hit thousands of times. Scraping is exactly that. Webhooks to your own services are too. Random HTTP calls in a script — leave the backoff.
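
For the cases where backoff stays, a small retry helper is usually all it takes. This is a minimal sketch assuming "full jitter" (a uniform random wait between zero and the capped exponential delay); the function name, its defaults and the exception types it retries on are assumptions for illustration, not the retry layer from the actors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(call: Callable[[], T], retries: int = 3,
                       base: float = 0.2, cap: float = 5.0,
                       retry_on: tuple = (ConnectionError, TimeoutError),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on a retryable error wait base * 2**attempt (jittered, capped)."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            delay = random.uniform(0.0, min(cap, base * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable")

# Usage with a hypothetical flaky call that fails twice, then succeeds.
attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda s: None))  # the demo skips real sleeping
```

The `sleep` parameter exists only so the helper can be exercised without waiting; production code would leave it at the default.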

The 5-minute migration plan

If you have a scraper running exponential backoff today and you want to try this without rewriting everything:

  1. Open the actor’s INPUT_SCHEMA.json. Add two fields: requests_per_minute (default 60) and burst_capacity (default 10).
  2. Drop the TokenBucket class above into your main.py.
  3. Initialize one bucket per target hostname at startup. Call acquire() immediately before the existing HTTP call.
  4. Keep the exponential-backoff retry layer for the response side (5xx, 429). The bucket prevents most of those from happening; the retry catches the rest.
  5. Ship to one actor first. Watch runtime_seconds and successful_runs for a week before rolling to the others.

Total LOC added: ~40. Total LOC removed: 0 (the retry layer stays). Risk: low. The bucket can only slow the scraper down, never speed it up, so the worst case is “actor runs at the old rate but does so deterministically.”
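
Steps 1, 3 and 4 could be wired together roughly like this. It is a sketch under assumptions: `HostBuckets` and `rate_limited_get` are hypothetical names, `actor_input` is a plain dict standing in for however your actor loads its input, `http_get` stands in for your existing call with its retry layer, and buckets are created lazily on first use rather than strictly at startup.

```python
import threading
from typing import Any, Callable, Dict
from urllib.parse import urlparse

class HostBuckets:
    """One rate-limit bucket per target hostname, created on first use."""

    def __init__(self, actor_input: Dict[str, Any],
                 bucket_factory: Callable[..., Any]):
        # Step 1: field names and defaults as described in the plan above.
        rpm = actor_input.get("requests_per_minute", 60)
        self.rate_per_sec = rpm / 60.0  # the bucket class takes tokens per second
        self.capacity = actor_input.get("burst_capacity", 10)
        self._factory = bucket_factory  # e.g. the TokenBucket class
        self._buckets: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def for_url(self, url: str) -> Any:
        """Return the bucket for the URL's hostname, creating it if needed."""
        host = urlparse(url).hostname or ""
        with self._lock:
            if host not in self._buckets:
                self._buckets[host] = self._factory(
                    rate_per_sec=self.rate_per_sec, capacity=self.capacity)
            return self._buckets[host]

def rate_limited_get(url: str, buckets: HostBuckets,
                     http_get: Callable[[str], Any],
                     acquire_timeout: float = 15.0) -> Any:
    """Step 3: take a token first, then run the existing HTTP call unchanged."""
    if not buckets.for_url(url).acquire(timeout=acquire_timeout):
        raise RuntimeError(f"rate-limit acquire timeout for {url}")
    # Step 4: http_get is whatever you call today, retry layer included.
    return http_get(url)
```

With the class from earlier in the post this would be used as `buckets = HostBuckets(actor_input, TokenBucket)` followed by `rate_limited_get(url, buckets, session.get)`.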

What I still don’t have

The token bucket is per-actor, per-process. Two parallel runs of the same actor on the same hostname will each have their own bucket — you can still self-DDoS across runs if you schedule them aggressively. Fixing that needs a shared bucket (Redis, KV, anything with atomic ops) and I haven’t found a clean enough API to ship it. If you’ve done this in production, I’d be curious to read the post.

For now, the per-actor bucket plus a maxConcurrency: 1 in the Apify run config covers the cases I care about. Trustpilot is at 966 runs and counting, and the new rate-limit code shipped on run #851. The hundred-odd runs since then have been the least eventful in the actor’s history. That’s the only KPI I trust.



Originally published at blog.spinov.online: code-heavy notes from 32 Apify actors and ~2,200 lifetime runs.