Spinov · Web Scraping & AI Research

  • Your AI Agent Re-Reads Every Page It Already Saw. I Measured the 8x Context Tax

    Jun 14, 2026

  • Your AI Agent Trusts a 200 OK. I Logged How Often the Page Was Garbage

    Jun 13, 2026

  • Give Your AI Agent a Web-Fetch Tool: a 60-Line MCP Server (Free, Self-Hosted)

    Jun 12, 2026

  • Your Scraper Re-Downloads Everything. Most Didn't Change.

    Jun 10, 2026

  • Your Scraper Got Clean Data. The Site Lied to It.

    Jun 9, 2026

  • Your Scraper Passes Every Run. It's Still Rotting.

    Jun 8, 2026

  • Your Scraper Collected 50 Rows. There Were 4,000.

    Jun 7, 2026

  • Your Scraper Died at Row 12,000. The Rerun Pattern.

    Jun 6, 2026

  • A 30-Line Probe That Tells You If a Page Needs a Browser

    Jun 5, 2026

  • You Pay for the Bandwidth That Returns Nothing

    Jun 4, 2026

  • A Budget Brake That Stops a Scraper Before $200

    Jun 3, 2026

  • Spoofing Your Scraper's Fingerprint Is a Losing Arcade

    Jun 2, 2026

  • Your Scraper Returned a Clean Row. It Was Wrong.

    Jun 1, 2026

  • 9 Free LLM APIs in 2026 You Can Use Without a Credit Card

    May 31, 2026

  • HTTP 200 Is a Lie: A 30-Line Schema Canary for Source Drift

    May 30, 2026

  • Feeding Raw HTML to Your LLM Is a Token Tax. I Measured It on 10 Real Pages — Median 7.4×, and It Hits Every Scheduled Run

    May 29, 2026

  • I've Run 2,190 Production Scrapes. The Framework You Pick Isn't What Breaks — Here's What Actually Does

    May 28, 2026

  • Scraping All the Text Is the Easy 10%. Keeping the Corpus Worth Training On Is the Other 90% — Notes From 962 Runs

    May 27, 2026

  • I've Run 2,190 Production Scrapes — "Ethical" Isn't a robots.txt Question, It's a Rate-Limit One

    May 25, 2026

  • Conditional GET in production scrapers: what I learned wiring it into 3 actors

    May 19, 2026

  • Three memory-leak patterns in long-running scrapers (and how I caught them after 968 Trustpilot runs)

    May 18, 2026

  • Token Economics of Agent-Driven Scraping: When LLM Agents Cost 50× More Than a Cron Job

    May 18, 2026

  • 5 Apify dataset deduplication patterns that stop double-billing your customers

    May 17, 2026

  • 5 Apify scheduler mistakes that quietly burn compute units

    May 15, 2026

  • Token Bucket vs Exponential Backoff: What Changed After 966 Runs

    May 15, 2026

  • Building a Proxy Health Monitor for 24/7 Scraper Uptime

    May 13, 2026

  • 5 production scraping failures from 1000+ runs (and the fixes that actually shipped)

    May 12, 2026

  • Description drift in serverless function catalogs — a monthly refresh playbook

    May 12, 2026

  • 3 Telegram Channels Worth Following for Production Data Engineering

    May 11, 2026

  • I write production scrapers. AI made 30% of them worse. Here's the rule of thumb.

    May 11, 2026

  • 5 Apify webhook patterns that turn one-off scrapers into reliable data pipelines

    May 3, 2026

  • 5 Apify run-log patterns that make production debugging 10x faster

    May 1, 2026

  • 5 Apify Scheduler Mistakes That Quietly Burn Compute Units (And the Cron Fixes)

    May 1, 2026

  • Five Apify Input Schema Mistakes And The Fixes That Stuck

    May 1, 2026

  • Apify vs. self-hosted: the three numbers I use to decide

    Apr 30, 2026

  • Cost per result: a 4-line worksheet for Apify actors

    Apr 30, 2026

  • Dead features in your own code: a self-audit story from my Apify actor

    Apr 30, 2026

  • DuckDB + dbt: a zero-cost analytics warehouse for projects under 100 GB

    Apr 30, 2026

  • Idempotent webhook receivers in 50 lines of Python

    Apr 30, 2026

  • Three operational rules I added after my Trustpilot scraper crossed 100 runs

    Apr 30, 2026

  • Why your retry logic is broken (and the 30-line fix)

    Apr 30, 2026

  • Schema drift killed our pipeline — three contract tests that catch it

    Apr 30, 2026

  • When NOT to scrape: 3 patterns where I now reach for an API instead

    Apr 30, 2026

  • Automate Your Backups with MinIO: Free S3-Compatible Storage for Everything

    Apr 29, 2026

  • Traefik + Docker: Zero-Config Reverse Proxy That Discovers Your Containers Automatically

    Apr 29, 2026

  • How my Trustpilot scraper survived 949 production runs (and the 3 things that almost killed it)

    Apr 29, 2026

  • Welcome — what this blog is for

    Apr 27, 2026

  • What 250 runs of a Trustpilot scraper taught me about anti-bot patterns

    Apr 25, 2026

© 2026 Aleksei Spinov · Apify Store · @scraping_ai on Telegram · spinov001@gmail.com

Some posts contain affiliate links to scraping/proxy providers (Oxylabs, Bright Data) — disclosed at the article level.