Production-grade web scraping & AI research, written by someone who ships it.
I'm Aleksei. I build Apify actors and write code-heavy tutorials about web scraping, data extraction, and AI workflows. Every article includes runnable Python and real benchmarks. I currently have 31 published actors (78 total in my portfolio) with real users.
Recent posts
-
Your AI Agent Re-Reads Every Page It Already Saw. I Measured the 8x Context Tax
A naive agent loop re-sends the whole transcript every turn, so walking 20 pages costs 8x what a bounded window costs. Here is the honest math, the prompt-caching counter, and a 40-line file you can run.
-
Your AI Agent Trusts a 200 OK. I Logged How Often the Page Was Garbage
Your agent's web-fetch tool returns HTTP 200 and a non-empty string, and your agent believes it. But that body can be a Cloudflare challenge, an empty JS shell, or a half-loaded page, and the agent plans on it anyway. Here's a 40-line sanity gate that tags every fetch OK / BLOCKED / EMPTY_SHELL / TRUNCATED before reasoning, with the real, deterministic stdout.
-
Give Your AI Agent a Web-Fetch Tool: a 60-Line MCP Server (Free, Self-Hosted)
Every MCP web-access tutorial this month points at a paid API. You don't need one. Here's a 60-line, self-hosted MCP server that hands your agent a web_fetch tool returning clean text, with the production defaults (timeout, size cap, SSRF guard) that tutorials skip. Real stdout included, tested on mcp 1.27.2.
-
Your Scraper Re-Downloads Everything. Most Didn't Change.
A scheduled scraper re-downloads its whole corpus every run, even though almost nothing changed since last time. The fix isn't faster fetching — it's deciding FETCH/SKIP/CONDITIONAL from a manifest before the first request. A 30-line planner, its real output, and the production trap (weak rotating ETags) that fakes the savings.
Need a custom scraper or research solution?
Pilot pricing: $100 for 1 article or $150 for a 3-article series. Email spinov001@gmail.com with the topic and I'll reply within 24 hours.