Cost per result: a 4-line worksheet for Apify actors
Most teams running Apify actors in production cannot answer this question in under 30 seconds:
“What does one record actually cost us, end-to-end?”
They know the Apify subscription line item. They know “we use proxies.” They sometimes know the storage cost. But the per-record number — the one you need to set pricing for clients, decide if a scrape is worth running, or kill an actor that quietly burns budget — is almost never sitting in a spreadsheet.
This is the worksheet I use when I onboard a new Apify portfolio and the first thing I produce in a paid audit. It is not exotic. It is four lines of math. The reason it works is that it forces you to count every variable cost together, in the same unit, on the same row.
The worksheet (one row per actor)
For every actor you run, fill in:
| Column | What goes here | Where to find it |
|---|---|---|
| actor_name | The actor slug | Apify console |
| runs_per_month | Last 30-day run count | GET /v2/acts/{id} → stats.totalRuns (delta) |
| avg_results_per_run | Mean dataset rows per run | GET /v2/acts/{id}/runs?limit=30 → average stats.itemCount |
| compute_units_per_run | Average CU consumed | Apify run detail → stats.computeUnits |
| proxy_gb_per_run | Proxy data transferred | Apify proxy report or your proxy provider dashboard |
| proxy_$_per_gb | Your residential rate | Proxy invoice |
| storage_$_per_run | Dataset retention cost | Apify storage report (often <$0.01) |
Now compute three derived fields, where $compute_unit_rate is the price per compute unit on your Apify plan:
cost_per_run = (compute_units_per_run × $compute_unit_rate)
             + (proxy_gb_per_run × proxy_$_per_gb)
             + storage_$_per_run
cost_per_result = cost_per_run / avg_results_per_run
monthly_cost = cost_per_run × runs_per_month
That is the entire framework. Three numbers, repeated for every actor in your account.
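As a minimal sketch, the three derived fields can be written as plain Python functions. The `ActorRow` class, the `_usd_` field spellings, and every number in the example row are illustrative assumptions, not figures from a real account; the compute unit rate in particular is a placeholder for whatever your plan charges.

```python
from dataclasses import dataclass


@dataclass
class ActorRow:
    """One worksheet row. Field names mirror the input columns,
    with `$` spelled as `usd` so they are valid identifiers."""
    actor_name: str
    runs_per_month: int
    avg_results_per_run: float
    compute_units_per_run: float
    proxy_gb_per_run: float
    proxy_usd_per_gb: float
    storage_usd_per_run: float


def cost_per_run(row: ActorRow, compute_unit_rate: float) -> float:
    """Compute + proxy + storage for a single run, all in USD."""
    return (
        row.compute_units_per_run * compute_unit_rate
        + row.proxy_gb_per_run * row.proxy_usd_per_gb
        + row.storage_usd_per_run
    )


def cost_per_result(row: ActorRow, compute_unit_rate: float) -> float:
    """Cost of one dataset row. An actor that returns nothing is
    reported as infinitely expensive rather than raising."""
    if row.avg_results_per_run <= 0:
        return float("inf")
    return cost_per_run(row, compute_unit_rate) / row.avg_results_per_run


def monthly_cost(row: ActorRow, compute_unit_rate: float) -> float:
    """Per-run cost scaled by the 30-day run count."""
    return cost_per_run(row, compute_unit_rate) * row.runs_per_month


# Hypothetical row and rate, for illustration only.
RATE = 0.30  # assumed USD per compute unit
example = ActorRow(
    actor_name="example-actor",
    runs_per_month=120,
    avg_results_per_run=2000,
    compute_units_per_run=0.5,
    proxy_gb_per_run=0.1,
    proxy_usd_per_gb=8.0,
    storage_usd_per_run=0.005,
)

if __name__ == "__main__":
    print(f"cost_per_run:    ${cost_per_run(example, RATE):.4f}")
    print(f"cost_per_result: ${cost_per_result(example, RATE):.6f}")
    print(f"monthly_cost:    ${monthly_cost(example, RATE):.2f}")
```

Keeping the rate as an argument rather than a column makes it easy to rerun the whole sheet when the plan changes.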
What you actually find
Most portfolios I have audited end up with a distribution that looks roughly like this once the numbers are in:
- Top 10–20% of actors: cost_per_result under $0.005. These are the actors that scale. Anything client-facing should sit here.
- Middle 60–70%: cost_per_result between $0.005 and $0.05. Acceptable, but worth checking if a small change to pageFunction or proxy pool drops them by 2–3×.
- Bottom 10–20%: cost_per_result above $0.05, sometimes over $0.50. These are usually the actors you forgot exist. Schedule them off, or rewrite them.
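The three bands can be applied mechanically once the cost_per_result column exists. The sketch below is one way to do it; the function names and the tier labels are assumptions, and only the $0.005 and $0.05 cut-offs come from the list above.

```python
from collections import Counter

TOP_MAX = 0.005     # below this: top tier
MIDDLE_MAX = 0.05   # up to and including this: middle tier


def tier(cost_per_result: float) -> str:
    """Bucket a per-result cost (USD) into top / middle / bottom."""
    if cost_per_result < TOP_MAX:
        return "top"
    if cost_per_result <= MIDDLE_MAX:
        return "middle"
    return "bottom"


def tier_shares(costs: list[float]) -> dict[str, float]:
    """Fraction of actors in each tier, to compare with the rough
    10-20% / 60-70% / 10-20% split described above."""
    if not costs:
        return {}
    counts = Counter(tier(c) for c in costs)
    return {name: counts[name] / len(costs) for name in ("top", "middle", "bottom")}
```

A portfolio whose bottom share is well above 20% is a sign that the audit will pay off quickly.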
The reason the bottom tier is the prize is that the cost is rarely caused by the thing the developer thought it was. It is almost never “the proxy is too expensive.” It is one of these four:
- Pagination loop: the actor re-paginates the same N pages on every run because the cursor is not persisted. Fix: persist last-seen ID in the request queue and skip on re-entry.
- Headless browser where HTTP would work: Puppeteer warming for a static HTML page that returns the same data via a fetch(). Fix: replace with Cheerio + axios, drop CU by 5–10×.
- Over-eager retries: every transient 503 triggers 5 retries with full proxy rotation, and the actor reports success because the last attempt resolved. Fix: cap retries at 2, log the rate, alert if it exceeds 8% of requests.
- Saving everything to default Dataset: including debug payloads, raw HTML, and screenshots. Fix: split logging into KV store, keep Datasets schema-clean.
You will find at least one of these in any portfolio of more than ten actors. I have not yet been wrong on this prediction.
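To make the retry fix concrete, here is a minimal sketch of a capped retry wrapper that also tracks the retry rate. It is not Apify or Crawlee code: `RetryBudget`, `TransientError`, and the `fetch` callable are hypothetical names, and "retry rate" is interpreted here as the share of requests that needed at least one retry.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP 503."""


class RetryBudget:
    """Caps retries per request and tracks how often retries happen."""

    def __init__(self, max_retries: int = 2, alert_threshold: float = 0.08):
        self.max_retries = max_retries
        self.alert_threshold = alert_threshold
        self.requests = 0
        self.retried_requests = 0

    def call(self, fetch, url):
        """Run fetch(url) with at most max_retries retries.
        Re-raises the last TransientError once the cap is reached."""
        self.requests += 1
        for attempt in range(self.max_retries + 1):
            try:
                return fetch(url)
            except TransientError:
                if attempt == self.max_retries:
                    raise  # cap reached: surface the failure
                if attempt == 0:
                    # Count the request once, however many retries follow.
                    self.retried_requests += 1

    def retry_rate(self) -> float:
        """Share of requests that needed at least one retry."""
        return self.retried_requests / self.requests if self.requests else 0.0

    def should_alert(self) -> bool:
        """True once the retry rate exceeds the alert threshold (8% by default)."""
        return self.retry_rate() > self.alert_threshold
```

Logging `retry_rate()` at the end of each run and checking `should_alert()` keeps a retry storm visible even when every request eventually succeeds.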
A worked example: Trustpilot scraper, 950 lifetime runs
The Trustpilot review scraper I publish on the Apify Store is a useful reference because the numbers are public:
- 950 lifetime runs as of the date of this post.
- Stable success rate across the run history (I do not publish ban statistics, but the scraper does not require bypassing Cloudflare or solving CAPTCHAs).
- Single-domain target (trustpilot.com), one anti-bot vector to manage, and a clean pagination model.
When I plug this actor into the worksheet, the per-result cost lands in the top tier (sub-$0.005). What pulled it there was not a clever proxy trick. It was deciding upfront that the actor would never use a headless browser, never persist anything to Dataset besides the final review object, and never retry more than twice. Three constraints, decided before a line of pageFunction was written. The cost discipline came from architecture, not from optimization.
This is the lesson the worksheet keeps surfacing: per-result cost is almost entirely set in the first 30 minutes of an actor’s design, and almost never recovered later by tuning.
The 30-minute portfolio audit
If you want to do this for your own account in a single sitting:
- Pull the last 30 days of runs across all actors via the Apify API.
- Fill in the seven input columns for every actor.
- Compute cost_per_result and sort.
- Pick the worst three. Open each actor’s pageFunction. Walk through the four-failure list above and identify which one is biting.
- Fix the cheapest of the three.
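The sort-and-pick steps can be sketched in a few lines of Python, assuming each worksheet row is a dict that already carries a `cost_per_result` value. The `est_fix_hours` field is a hypothetical estimate you fill in by hand after reading each actor's code; it is one possible way to decide which fix is "cheapest".

```python
def worst_actors(rows: list[dict], n: int = 3) -> list[dict]:
    """Return the n rows with the highest cost_per_result, worst first."""
    return sorted(rows, key=lambda r: r["cost_per_result"], reverse=True)[:n]


def cheapest_fix(candidates: list[dict]) -> dict:
    """Among the shortlisted actors, pick the one with the smallest
    estimated fix effort (est_fix_hours is a manual, hypothetical estimate)."""
    return min(candidates, key=lambda r: r["est_fix_hours"])


# Illustrative rows only; none of these numbers come from a real account.
rows = [
    {"actor_name": "a", "cost_per_result": 0.002, "est_fix_hours": 1},
    {"actor_name": "b", "cost_per_result": 0.40, "est_fix_hours": 8},
    {"actor_name": "c", "cost_per_result": 0.07, "est_fix_hours": 2},
    {"actor_name": "d", "cost_per_result": 0.03, "est_fix_hours": 4},
]

if __name__ == "__main__":
    shortlist = worst_actors(rows)
    print([r["actor_name"] for r in shortlist])
    print(cheapest_fix(shortlist)["actor_name"])
```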
You can typically recover 20–40% of monthly Apify spend in the first pass. After that, the marginal returns drop, and the discipline becomes “do not let new actors into production without a per-result number.”
The honest CTA
If your team runs more than 10 actors and you cannot produce the cost_per_result column for each of them in under an hour, an outside set of eyes pays for itself. I run a fixed-scope audit pack that produces this worksheet for your full portfolio plus a written 30-line fix anchor for the worst three actors. Five business days, async, no calls. Email spinov001@gmail.com with “Apify audit pack” in the subject and I will send the intake checklist.
If you want to dig further into the operational side first, the retry-logic walkthrough and the dead-features post cover two of the four failure modes above in detail.
More tips on running scrapers in production: @scraping_ai on Telegram.