Production-grade web scraping & AI research, written by someone who ships it.
I'm Aleksei. I build Apify actors and write code-heavy tutorials about web scraping, data extraction, and AI workflows. Every article includes runnable Python and real benchmarks. I currently have 31 published actors with real users, out of 78 in my portfolio.
Recent posts
- Your Scraper Re-Downloads Everything. Most Didn't Change.
  A scheduled scraper re-downloads its whole corpus on every run, even though almost nothing has changed since the last one. The fix isn't faster fetching; it's deciding FETCH, SKIP, or CONDITIONAL from a manifest before the first request goes out. A 30-line planner, its real output, and the production trap (weak, rotating ETags) that fakes the savings.
- Your Scraper Got Clean Data. The Site Lied to It.
  A site can detect your scraper and serve a 200 response with a perfect schema and plausible values that are deliberately false. Status codes and sanity checks are blind to this by design. Here's a 30-line probe that grounds each row in an independent invariant, plus why naive cross-source consensus gets fooled too.
- Your Scraper Passes Every Run. It's Still Rotting.
  Your scraper exits 0 on every run. Schema valid, row count plausible. And yet the yield has been sliding for weeks. A 20-line lagged-baseline probe over your own run log catches the drift before it turns into a breakage.
- Your Scraper Collected 50 Rows. There Were 4,000.
  A scraper can finish green, return only valid rows, and still hand you a small fraction of the dataset. Pagination cutoffs are silent. Here is a 40-line completeness probe that catches them.
Need a custom scraper or research solution?
Pilot pricing: $100 for 1 article or $150 for a 3-article series. Email spinov001@gmail.com with the topic and I'll reply within 24 hours.