5 Apify Scheduler Mistakes That Quietly Burn Compute Units (And the Cron Fixes)
I run 31 published Apify actors, and the single line item that catches me by surprise on the monthly compute-unit bill is always a misconfigured scheduler. Not the heavy actors. Not the per-result data volume. The scheduler.
The scheduler is the part of Apify that everybody sets once during onboarding and then forgets. It’s also the part that quietly burns compute units when you’re not looking. Below are five scheduler mistakes I’ve personally made or watched a customer make in the last six months — with the exact cron / actor.json fix for each, and the cost-of-mistake math so you know which ones to fix first.
Mistake 1: Scheduling more frequently than the actor’s average run-time
This is the most expensive one and also the most invisible. A scraper that takes 90 seconds on average looks like a candidate for “every 5 minutes” — until one day a target site is slow, the run takes 8 minutes, the next two scheduled runs queue up behind it, and now you have 3 instances of the same actor running in parallel for an hour, all writing into the same dataset.
The bug isn’t the slow target. The bug is that the scheduler has no idea the previous run isn’t done.
Fix: set isExclusive: true on the schedule object so Apify does not start a new run while the previous run from the same schedule is still active. Ticks that fall inside an active run are skipped rather than queued, so duplicates do not stack up behind a slow run. Pair this with a run timeout (see mistake #2) so a hung run cannot hold the schedule hostage.
{
  "cronExpression": "*/5 * * * *",
  "isExclusive": true,
  "actions": [{
    "type": "RUN_ACTOR",
    "actorId": "your-actor-id",
    "runInput": { "body": "{\"limit\": 100}", "contentType": "application/json; charset=utf-8" }
  }]
}
Cost-of-mistake on my Trustpilot scraper: one bad afternoon = 6 overlapping runs × ~$0.40 compute = $2.40 in 4 hours, plus a deduplication mess in the dataset that took an hour to clean up.
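To see how quickly overlap builds up, here is a minimal sketch of a toy model, not Apify's actual scheduler: run i starts at i × interval and lasts a given number of seconds. The function name, the exclusive flag, and the durations are all hypothetical numbers chosen for illustration.

```python
def max_overlap(interval_s: int, durations_s: list[int], exclusive: bool = False) -> int:
    """Return the peak number of simultaneously running instances.

    Run i is scheduled at i * interval_s and lasts durations_s[i] seconds.
    With exclusive=True, a tick is skipped while a previous run is still active.
    """
    running_until: list[int] = []  # end times of runs that are still active
    peak = 0
    for i, duration in enumerate(durations_s):
        start = i * interval_s
        # Drop runs that finished at or before this tick.
        running_until = [end for end in running_until if end > start]
        if exclusive and running_until:
            continue  # the scheduler skips this tick
        running_until.append(start + duration)
        peak = max(peak, len(running_until))
    return peak


# Hypothetical afternoon: a 5-minute schedule where four runs take 14 minutes each.
slow_afternoon = [90, 90, 840, 840, 840, 840, 90]
print(max_overlap(300, slow_afternoon))                  # 3
print(max_overlap(300, slow_afternoon, exclusive=True))  # 1
```

In this model the non-exclusive schedule peaks at three parallel instances, while the exclusive one never exceeds one and simply drops the ticks that land inside a slow run.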
Mistake 2: No timeoutSecs on the schedule
Closely related to mistake #1. If your actor hangs (network stall, infinite loop on a malformed input, a target that 200-OKs you forever) and you have no timeoutSecs ceiling on the schedule, the run will sit in RUNNING state until it hits the actor’s default timeout — which on free-tier accounts is 1 hour, on paid accounts up to 24 hours.
A 24-hour runaway run on a 4 GB memory actor is roughly $9.60 of compute units. For a single forgotten timeout. I’ve done this. My Reddit scraper had no timeoutSecs and one weekend a target subreddit’s old.reddit.com endpoint started rate-limiting in a way that didn’t return a real HTTP error — it just held the connection open. Saturday morning I had a $14 bill for one stuck run.
Fix: always set timeoutSecs explicitly in the runOptions of the schedule's action. For a scraper that normally takes 2 minutes, 600 seconds (10 minutes) is generous and bounds the worst case.
"actions": [{
  "type": "RUN_ACTOR",
  "actorId": "your-actor-id",
  "runOptions": { "timeoutSecs": 600, "memoryMbytes": 1024 }
}]
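The worst case a timeout buys you can be worked out ahead of time. The sketch below relies on Apify's definition of one compute unit as 1 GB of memory held for 1 hour; the function names are my own, and the price per compute unit is left as a parameter because it depends on your plan.

```python
def compute_units(memory_mbytes: int, runtime_secs: float) -> float:
    """Compute units consumed: memory in GB multiplied by run time in hours."""
    return (memory_mbytes / 1024) * (runtime_secs / 3600)


def worst_case_cost(memory_mbytes: int, timeout_secs: float, price_per_cu: float) -> float:
    """Upper bound on the cost of one run that gets killed by its timeout.

    price_per_cu is an assumption you supply from your own plan.
    """
    return compute_units(memory_mbytes, timeout_secs) * price_per_cu


# Bounded run: 1 GB with the 600-second timeout from the config above.
bounded = compute_units(1024, 600)
# Runaway run: 4 GB left alone for 24 hours.
runaway = compute_units(4096, 24 * 3600)
print(round(bounded, 3), runaway)  # 0.167 96.0
```

Multiplying either figure by your plan's per-unit price gives the ceiling for a single run, which is the number worth comparing across actors when deciding which timeout to tighten first.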
Mistake 3: Cron in UTC vs. cron in your head
Apify’s scheduler evaluates cron expressions in UTC unless you set a timezone on the schedule. Every onboarding I’ve seen has at some point produced a Slack message at 3 AM local time because the dev wrote 0 9 * * * thinking “9 AM my time”, and Apify cheerfully fired the run at 9 AM UTC, which is 10 or 11 AM in Berlin and 4 or 5 AM in Boston, depending on daylight saving.
This isn’t expensive in compute units. It’s expensive in trust. A daily report that lands at the wrong hour gets ignored, then disabled, then forgotten — and now you have an actor running every morning that nobody is reading the output of.
Fix: write the cron expression in UTC and put the local-time equivalent in the schedule’s description. If your team spans multiple timezones, schedule in UTC at an hour that is the least bad for everyone, typically 06:00–08:00 UTC for an EU/US-east mix.
{
  "name": "daily-trustpilot-digest",
  "description": "Runs at 06:00 UTC = 08:00 CEST (Berlin) / 02:00 EDT (US East). Posts to #data-daily.",
  "cronExpression": "0 6 * * *"
}
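The local-time equivalents in a description drift by an hour twice a year, so it helps to generate them rather than do the arithmetic by hand. This is a small sketch using Python's standard zoneinfo module; the helper name and the two sample dates are my own choices.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_times(utc_hour: int, on: datetime, zones: list[str]) -> dict[str, str]:
    """Map each IANA zone to the local HH:MM of a daily run at utc_hour:00 UTC."""
    fire = on.replace(hour=utc_hour, minute=0, second=0, microsecond=0,
                      tzinfo=timezone.utc)
    return {z: fire.astimezone(ZoneInfo(z)).strftime("%H:%M") for z in zones}


zones = ["Europe/Berlin", "America/New_York"]
summer = local_times(6, datetime(2024, 7, 1), zones)
winter = local_times(6, datetime(2024, 1, 15), zones)
print(summer)  # {'Europe/Berlin': '08:00', 'America/New_York': '02:00'}
print(winter)  # {'Europe/Berlin': '07:00', 'America/New_York': '01:00'}
```

The same 06:00 UTC run lands an hour earlier for both offices in winter, which is worth stating in the description if the report is supposed to arrive before a standing meeting.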
Mistake 4: No-op runs on unchanged sources
A scraper that runs every hour and reprocesses the same 200 reviews because nothing has changed on the source is still spending compute units. Apify charges by run-time and memory, not by “useful work done.”
The fix isn’t sexier scheduling. It’s a short guard at the top of the actor that exits early if the source hasn’t changed since the last run. The cheapest version of this is a HEAD request on the target and a comparison of its Last-Modified header against the value stored in the actor’s KV store.
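import asyncio
import httpx
from apify import Actor

TARGET_URL = 'https://example.com/reviews'  # placeholder: the page you scrape

async def main() -> None:
    async with Actor:
        store = await Actor.open_key_value_store()
        prev = await store.get_value('last_modified')
        # HEAD fetches headers only; the async client avoids blocking the event loop
        async with httpx.AsyncClient(follow_redirects=True, timeout=10) as client:
            resp = await client.head(TARGET_URL)
        head = resp.headers.get('last-modified')
        if head and head == prev:
            Actor.log.info('Source unchanged, exiting early')  # log calls are not awaited
            return
        await store.set_value('last_modified', head)
        # ... actual scrape continues here

if __name__ == '__main__':
    asyncio.run(main())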
On the Trustpilot scraper, adding this early-exit took the average daily compute usage from 22 minutes/day down to 4 minutes/day — because most hours nothing actually changed. That’s an 80% cut on a recurring bill, for ten lines of code.
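Not every server sends Last-Modified, and some send only an ETag. A guard that fails open (scrape when in doubt) is safer than one that silently skips. The sketch below shows one way to make that decision as a pure function; both helper names are hypothetical, and it assumes you store whichever validator the server returned last time.

```python
from typing import Mapping, Optional


def change_token(headers: Mapping[str, str]) -> Optional[str]:
    """Pick a validator from response headers: ETag first, then Last-Modified."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("etag") or lowered.get("last-modified")


def should_scrape(prev_token: Optional[str], headers: Mapping[str, str]) -> bool:
    """Skip only when the server sent a validator and it matches the stored one."""
    token = change_token(headers)
    return token is None or token != prev_token


print(should_scrape(None, {"ETag": '"abc"'}))     # True: first run, nothing stored
print(should_scrape('"abc"', {"etag": '"abc"'}))  # False: unchanged
print(should_scrape('"abc"', {}))                 # True: no validator, fail open
```

Keeping the decision separate from the HTTP call also makes it easy to unit-test without touching the target site.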
Mistake 5: Silent failure on schedule
Apify will email you when a run fails, but it will not always notify you when a schedule itself stops firing — for example, if the actor was paused, or if the schedule was disabled by another team member, or if your account hit a quota. The actor stops producing data, your downstream dashboard goes flat, and you find out three weeks later when someone asks “wait, did Trustpilot stop getting bad reviews?”
Fix: add a watchdog. The simplest version is another small actor (or a GitHub Action, or a cron on a tiny VM) that hits the Apify API once a day, checks the timestamp of the last successful run for each scheduled actor, and alerts you if it’s older than the expected interval × 2.
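#!/usr/bin/env bash
# Assumes APIFY_TOKEN, ACTOR_ID and SLACK_WEBHOOK are set; needs jq and GNU date.
# desc=1 sorts newest first; without it, limit=1 returns the oldest run.
LAST=$(curl -s -H "Authorization: Bearer $APIFY_TOKEN" \
  "https://api.apify.com/v2/acts/$ACTOR_ID/runs?status=SUCCEEDED&desc=1&limit=1" \
  | jq -r '.data.items[0].finishedAt')
AGE_HOURS=$(( ($(date +%s) - $(date -d "$LAST" +%s)) / 3600 ))
if [ "$AGE_HOURS" -gt 26 ]; then
  curl -s -X POST -H 'Content-Type: application/json' "$SLACK_WEBHOOK" \
    -d "{\"text\":\"⚠️ Actor $ACTOR_ID hasn't succeeded in $AGE_HOURS hours\"}"
fi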
I run this nightly across my top 5 scheduled actors. It’s caught a paused-by-mistake schedule once and a quota-exhausted account once — both before the customer noticed.
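If the watchdog grows beyond one actor, the staleness rule is easier to keep correct as a small tested function than as shell arithmetic. This is a sketch of the "older than the expected interval × 2" check only. It assumes the timestamp arrives as an ISO 8601 UTC string ending in Z, and it treats a missing timestamp as stale. The function names are my own.

```python
from datetime import datetime, timezone
from typing import Optional


def hours_since(finished_at: str, now: datetime) -> float:
    """Hours between an ISO 8601 UTC timestamp such as '2024-05-01T06:02:11.000Z' and now."""
    finished = datetime.fromisoformat(finished_at.replace("Z", "+00:00"))
    return (now - finished).total_seconds() / 3600


def is_stale(finished_at: Optional[str], now: datetime,
             interval_hours: float, factor: float = 2.0) -> bool:
    """True when there is no successful run, or the last one is older than interval * factor."""
    if not finished_at:
        return True
    return hours_since(finished_at, now) > interval_hours * factor


now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)
print(is_stale("2024-05-03T06:02:11.000Z", now, interval_hours=24))  # False
print(is_stale("2024-05-01T06:02:11.000Z", now, interval_hours=24))  # True
print(is_stale(None, now, interval_hours=24))                        # True
```

Passing now in as an argument, rather than reading the clock inside the function, is what makes the check deterministic enough to test.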
Order to fix in
If you’re auditing your own scheduled actors right now, do them in this order: timeout (mistake #2) → concurrency cap (mistake #1) → silent-failure watchdog (mistake #5) → no-op guard (mistake #4) → timezone audit (mistake #3). The first three protect you from runaway costs. The last two save money and trust on top of that.
I publish more posts like this — operational lessons from running real Apify actors at small scale — at blog.spinov.online. If you’re running scheduled actors and you’d like a second pair of eyes on a config, the contact is in the footer there.
Shorter scraping/automation notes go to my Telegram channel: t.me/scraping_ai.