May 11, 2026

I write production scrapers. AI made 30% of them worse. Here's the rule of thumb.

I run a small fleet of production scrapers on Apify — 32 published actors, 1,819 lifetime runs, top one (Trustpilot reviews) past 962 runs without intervention. Some are mine, some are forks, some were rewrites of scrapers customers paid me to fix.

Over the last six months I tried to use AI assistants — Claude, Cursor, GitHub Copilot — for almost everything I touched in that fleet. I kept a running list of which tasks AI made better and which it quietly made worse.

The short version: AI is great at maybe 40% of scraping work, neutral on 30%, and actively harmful on 30%. That last bucket is the dangerous one, because the broken code passes review.

Here is the rule of thumb I now use before I let an AI write production scraper code.

The boring 40%: where AI just works

These tasks are now AI-default in my workflow. I no longer write them by hand:

CSS / XPath selector generation from a sample HTML chunk. Paste the markup, ask for a selector that survives one wrapper change. AI is good at this because the problem is local and verifiable in the same prompt.
Schema-to-validator code. “Given this Apify dataset row, write a Pydantic model with the right Optional fields.” This is mechanical translation. AI rarely gets it wrong.
Error-message decoding. Paste a stack trace, ask what likely caused it. Faster than Googling, and the failure mode is “wrong guess” not “silent corruption”.
Doc / README writing for an actor. This is where I save the most wall-clock time. Six published actor descriptions on my Apify store were AI-drafted, hand-edited.
jq filters and one-shot data shaping scripts. These run once, output is inspected immediately, errors are loud.

If a task is throwaway and local, and a human inspects the output before it touches production, AI is a real productivity boost. No surprise.

The neutral 30%: AI saves typing, not thinking

Boilerplate requests / httpx / Playwright skeletons. Faster than copying from another repo, but I’d be just as productive with a snippet.
Test fixtures. AI can produce plausible JSON examples, but I still have to verify them against the real API.
Refactoring a 50-line function into smaller pieces. Works, but it tends to invent abstractions I didn’t ask for.

I leave AI on for these. They don’t matter much either way.

The dangerous 30%: where AI quietly breaks production

This is the part I learned the hard way, mostly by reading my own incident postmortems.

1. Anti-bot detection logic

Ask any AI to “make this scraper avoid getting blocked” and you’ll get a beautifully confident answer that does the wrong thing. The most common AI-generated patches I’ve seen:

Adding time.sleep(random.uniform(1, 3)) between every request. This does almost nothing against modern bot detection (TLS fingerprinting, header order, mouse-movement heuristics) and triples your run cost.
Rotating User-Agent strings from a hardcoded list. Most of those strings are 2-3 years out of date and stick out worse than the default.
Adding “stealth” plugins to Playwright. Some work for one site for two weeks, then the site updates its detection and now your scraper looks both bot-like and like a known evasion tool.

The honest answer for “how do I avoid getting blocked” is: read the site’s robots.txt, respect rate limits, use residential proxies if the target is hostile, and accept that some sites will simply not be scrapable. AI assistants are trained on years of forum advice that mostly stopped working in 2024.
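As a rough illustration of the "read robots.txt, respect rate limits" part, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `my-scraper` user agent, and the helper names `build_policy` and `allowed_delay` are all assumptions made up for the example. A real run would fetch the file from the target site instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_delay(parser, user_agent, url, default_delay=1.0):
    """Return the seconds to wait before fetching url, or None if disallowed."""
    if not parser.can_fetch(user_agent, url):
        return None
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay is declared
    return float(delay) if delay is not None else default_delay


policy = build_policy(ROBOTS_TXT)
print(allowed_delay(policy, "my-scraper", "https://example.com/reviews"))
print(allowed_delay(policy, "my-scraper", "https://example.com/private/x"))
```

Returning `None` for disallowed URLs forces the caller to handle "do not fetch" explicitly rather than falling through to a default delay.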

For the production-scale version of this argument, including which proxy tier actually helps and which is theater, my recent sponsored piece on scaling scraping to 100k pages is the long-form companion. Numbers, not vibes.

2. Retry budgets and circuit breakers

AI loves retries. Ask for “make this more resilient” and you’ll get a @retry(tries=10, backoff=2) decorator on every function. What you actually need:

A total budget per run (“max 30 retries across all calls, then fail loud”).
A per-target circuit breaker (“after 5 consecutive 5xx on one domain, stop trying for this run”).
A reason field on every failure, so the dataset row carries failed: true plus the cause.

I have not seen an AI suggestion that gets all three right out of the gate. The default pattern it produces retries every call on its own schedule, with no shared cap across the run, and burns through Apify compute units while a target’s API is down. That’s a real bill at scale.
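The three requirements above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not a drop-in implementation. `RunGuard`, `BudgetExhausted`, and `fetch_row` are hypothetical names, and `fetch` stands in for whatever HTTP call the scraper makes, assumed here to return a `(status_code, body)` tuple. Backoff sleeps are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


class BudgetExhausted(RuntimeError):
    """Raised when the run-wide retry budget is spent, so the run fails loudly."""


@dataclass
class RunGuard:
    max_retries: int = 30         # total retries across all calls in one run
    max_consecutive_5xx: int = 5  # per-domain circuit-breaker threshold
    retries_used: int = 0
    consecutive_5xx: dict = field(default_factory=dict)

    def is_open(self, domain):
        return self.consecutive_5xx.get(domain, 0) >= self.max_consecutive_5xx

    def record(self, domain, status):
        # Any non-5xx response resets the streak for that domain.
        if 500 <= status < 600:
            self.consecutive_5xx[domain] = self.consecutive_5xx.get(domain, 0) + 1
        else:
            self.consecutive_5xx[domain] = 0

    def spend_retry(self):
        if self.retries_used >= self.max_retries:
            raise BudgetExhausted(f"retry budget of {self.max_retries} exhausted")
        self.retries_used += 1


def fetch_row(url, fetch, guard, max_attempts=3):
    """Fetch one URL; always return a row carrying failed plus a reason."""
    domain = urlsplit(url).netloc
    attempt = 0
    while True:
        if guard.is_open(domain):
            return {"url": url, "failed": True, "reason": f"circuit open for {domain}"}
        status, body = fetch(url)
        attempt += 1
        guard.record(domain, status)
        if status < 400:
            return {"url": url, "failed": False, "body": body}
        if status < 500:
            return {"url": url, "failed": True, "reason": f"HTTP {status} (not retried)"}
        if attempt >= max_attempts:
            return {"url": url, "failed": True, "reason": f"HTTP {status} after {attempt} attempts"}
        guard.spend_retry()  # raises BudgetExhausted once the run budget is gone
```

One `RunGuard` instance is meant to be shared by every call in a run. That sharing is what a per-function retry decorator cannot express.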

3. Schema drift handling

If a target site adds a new field, or renames one, AI-written scrapers tend to either:

Crash on the next run because a dict["price"] lookup raises KeyError.
Silently coerce the new value to a default, hiding the drift for weeks until someone notices the dashboard looks wrong.

The fix is contract tests: a tiny test suite that asserts the shape of the live response before you save anything to the dataset. I wrote about the three contract tests I keep in every scraper — same numbers, same scraper, just three asserts that save me 30 minutes a week of “why is this column null”.
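A contract check can be very small. The sketch below is one possible shape, not the three tests referred to above. The field names in `REQUIRED` and `OPTIONAL` are invented for a hypothetical review row, and `check_contract` is an illustrative helper name.

```python
# Hypothetical contract for one review row; the field names are illustrative.
REQUIRED = {"id": str, "rating": int, "text": str}
OPTIONAL = {"reply": str}


def check_contract(row):
    """Return a list of violations; an empty list means the shape matches."""
    problems = []
    for name, expected in REQUIRED.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(row[name]).__name__}"
            )
    for name, expected in OPTIONAL.items():
        value = row.get(name)
        if value is not None and not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Unknown fields are reported rather than dropped, so drift is visible.
    for name in sorted(set(row) - set(REQUIRED) - set(OPTIONAL)):
        problems.append(f"unexpected field: {name}")
    return problems


# A renamed field shows up as one missing and one unexpected field.
print(check_contract({"id": "r1", "stars": 5, "text": "Great"}))
```

Running a check like this on a sample of the live response before writing to the dataset turns both failure modes, the crash and the silent default, into an explicit list of differences.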

4. Concurrency and rate limiting

Ask an AI to “speed this up” and you’ll get asyncio.gather(*[scrape(u) for u in urls]) with no semaphore. The target’s WAF will rate-limit you within thirty seconds, then your retry logic (see above) will hammer it until your IP gets banned. Two days of cleanup.

The rule I follow: any AI suggestion that involves concurrency gets a manual code review, no exceptions.
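For reference, the missing piece in the unbounded `asyncio.gather` pattern is usually a semaphore. A minimal sketch follows, assuming `scrape` is some coroutine function that takes a URL. `scrape_all` and the default limit of 5 are illustrative choices, not recommendations for any particular target.

```python
import asyncio


async def scrape_all(urls, scrape, max_concurrency=5):
    """Run scrape(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await scrape(url)

    # gather still returns results in the same order as urls.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

A semaphore caps how many requests are in flight, not how many are sent per second. A target with a strict rate limit would still need a delay or token bucket on top of it.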

The rule of thumb

Before I let AI touch scraper code, I ask one question: “if this code is wrong, how will I find out?”

If the answer is “the next test run will crash” → AI is fine.
If the answer is “I’ll notice in a daily metric review” → AI with mandatory code review.
If the answer is “the customer will tell me three weeks later that their dataset has been quietly garbage” → write it by hand.

That third category covers most anti-bot, retry, and drift logic. It also covers anything that decides what to skip — skipped records don’t show up in the logs the way crashes do.
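One way to make skips as visible as crashes is to count them by reason and report the tally at the end of the run. The sketch below assumes a hypothetical filter on a `text` field. `filter_rows` and its threshold are made up for illustration.

```python
from collections import Counter


def filter_rows(rows, min_text_length=10):
    """Split rows into kept rows and a tally of why the rest were skipped."""
    kept, skipped = [], Counter()
    for row in rows:
        text = row.get("text")
        if text is None:
            skipped["missing text"] += 1
        elif len(text) < min_text_length:
            skipped["text too short"] += 1
        else:
            kept.append(row)
    return kept, skipped


rows = [{"text": "a long enough review"}, {"text": "short"}, {}]
kept, skipped = filter_rows(rows)
print(len(kept), dict(skipped))
```

If the tally is logged or stored next to the dataset, a sudden jump in one skip reason shows up in the same daily review as any other metric.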

This isn’t an anti-AI take. I have AI open in another window while writing this post. But after more than 1,800 production runs across 32 actors, I have a pretty clear map of which boxes AI fills in and which boxes it pretends to fill while leaving them empty.

If you write scrapers for a living and you want the longer playbook — the failure taxonomies, the contract-test patterns, the actual cost numbers from running this stuff at scale — the Apify store is where the runs live, and the blog is where the patterns are written up. Comments and counter-examples welcome.

More production scraping tips: t.me/scraping_ai