Five Apify Input Schema Mistakes And The Fixes That Stuck
When I started publishing Apify actors, I treated the input schema as a formality: drop in a proxy field and a startUrls array, then ship it. Two years and 78 actors later, I have a folder of v2 redesigns sitting next to v1 production runs, each one a quiet apology to whoever ran v1 first.
This post is a tour of five mistakes I shipped, what each one cost, and the schema pattern I use now to avoid repeating them. All examples are real input schemas from actors in my Apify Store. None of this is theoretical.
Mistake 1: Free-text fields where an enum belongs
The first version of my Trustpilot review scraper had a field called language typed as a plain string. The documentation said “ISO 639-1 code, e.g. en, de, ru”. You can guess what happened.
Three weeks in, I had runs failing because users typed english, EN, Russian, cyrillic, pl-PL. A few got creative with auto-detect. Each failure was a support email I had to answer, and each support email started with the user explaining how the docs were unclear.
The fix is brutally simple: keep the type as string, but restrict it with an enum and a select editor:
```json
{
  "language": {
    "title": "Review language",
    "type": "string",
    "description": "Language of the reviews to scrape.",
    "editor": "select",
    "enum": ["en", "de", "fr", "ru", "pl", "es", "it"],
    "enumTitles": ["English", "German", "French", "Russian", "Polish", "Spanish", "Italian"],
    "default": "en"
  }
}
```
The input form renders this as a dropdown. Nobody can submit english any more, and the support emails about this field stopped. The fix took 4 minutes and saved somewhere around 6 hours of support time over the next year.
Rule: if a field has a finite set of valid values that you’ll have to validate anyway, make it an enum. The Apify input UI does the validation for free.
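The enum and enumTitles arrays are matched by position, so they can drift apart as the list of values grows. If you generate your schema from code, one option is to derive both arrays from a single mapping. The sketch below is minimal and assumes you build the schema JSON with a Python script. `LANGUAGES` and `select_property` are hypothetical names, not part of any Apify SDK.

```python
import json

# Hypothetical single source of truth: value -> human-readable title.
# Dicts preserve insertion order, so enum and enumTitles stay aligned.
LANGUAGES = {
    "en": "English",
    "de": "German",
    "fr": "French",
    "ru": "Russian",
    "pl": "Polish",
    "es": "Spanish",
    "it": "Italian",
}


def select_property(title: str, options: dict[str, str], default: str, description: str) -> dict:
    """Build a dropdown string property with enum and enumTitles kept in lockstep."""
    if default not in options:
        raise ValueError(f"default {default!r} is not one of the allowed values")
    return {
        "title": title,
        "type": "string",
        "description": description,
        "editor": "select",
        "enum": list(options.keys()),
        "enumTitles": list(options.values()),
        "default": default,
    }


if __name__ == "__main__":
    prop = select_property("Review language", LANGUAGES, "en", "Language of the reviews to scrape.")
    print(json.dumps({"language": prop}, indent=2))
```

Adding a language then becomes a one-line change, and an invalid default fails at build time instead of in the form.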
Mistake 2: Booleans that should have been three-state
A proxy-related field on one of my actors was useResidentialProxy: boolean. Default false. Looks fine.
The trap: a false default meant new users got datacenter proxies on a target site that aggressively blocks them. They saw a 403 on the first request, gave up, and rated the actor 1 star.
The honest schema has three states: auto, forced residential, and forced datacenter. The auto default uses residential proxies on known-protected domains and datacenter proxies elsewhere. The other two are escape hatches:
```json
{
  "proxyMode": {
    "title": "Proxy mode",
    "type": "string",
    "description": "Auto uses residential proxies on known-protected sites and datacenter proxies elsewhere.",
    "editor": "select",
    "enum": ["auto", "residential", "datacenter"],
    "enumTitles": [
      "Auto (recommended)",
      "Force residential proxies",
      "Force datacenter proxies"
    ],
    "default": "auto"
  }
}
```
A boolean implies “the right answer is one of two things you already know.” When you don’t know, give yourself room to add an auto mode without breaking older callers.
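The auto branch is the only part of proxyMode that contains logic. The sketch below shows one way an actor might resolve the mode for each target URL. It is minimal and assumes a hand-maintained set of protected domains. `PROTECTED_DOMAINS` and `resolve_proxy_mode` are hypothetical, and the step that maps the result onto real Apify proxy settings is left out.

```python
from urllib.parse import urlparse

# Hypothetical list of domains known to block datacenter IPs; maintain your own.
PROTECTED_DOMAINS = {"example-protected.com"}

VALID_MODES = {"auto", "residential", "datacenter"}


def resolve_proxy_mode(mode: str, url: str, protected_domains: set[str] = PROTECTED_DOMAINS) -> str:
    """Turn the proxyMode input into a concrete proxy type for one target URL."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown proxyMode {mode!r}")
    if mode != "auto":
        # Escape hatches: the user explicitly forced one proxy type.
        return mode
    host = (urlparse(url).hostname or "").lower()
    # A protected domain also covers its subdomains (www., m., ...).
    is_protected = any(host == d or host.endswith("." + d) for d in protected_domains)
    return "residential" if is_protected else "datacenter"
```

The forced modes return early. When the domain list changes, auto is therefore the only behaviour that needs re-checking.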
Mistake 3: Optional fields with no example
My Reddit scraper had subreddits: array<string>. No prefill, no example, no editor setting. The form rendered as an empty JSON array editor. New users typed:
- ["r/python"] (with the r/ prefix)
- ["python, javascript"] (one comma-separated string in one slot)
- ["/r/python"] (leading slash)
- An empty array (and then asked why the actor returned 0 rows)
I didn’t blame them. The empty [] editor told them nothing.
Fix: every array of strings gets a prefill and the stringList editor:
```json
{
  "subreddits": {
    "title": "Subreddits to scrape",
    "type": "array",
    "editor": "stringList",
    "prefill": ["python", "dataengineering", "MachineLearning"],
    "description": "Subreddit names without the r/ prefix.",
    "minItems": 1
  }
}
```
The prefill shows three valid inputs the moment the form loads. The description tells them about the prefix. minItems: 1 blocks empty submissions before the run even starts.
This one change cut Reddit-scraper support questions roughly in half over the next 60 days.
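The schema guides people who fill in the form, but prefill only populates the form, and a caller starting the actor through the API may never read the description. The sketch below adds a defensive clean-up step to the actor itself and handles exactly the mistakes listed above. It is minimal. `normalize_subreddits` is a hypothetical helper, and the allowed-character rule is an assumption about Reddit naming, not a verified spec.

```python
import re

# Assumption: subreddit names use only letters, digits and underscores.
SUBREDDIT_NAME = re.compile(r"[A-Za-z0-9_]+")


def normalize_subreddits(raw: list[str]) -> list[str]:
    """Turn messy subreddit input into bare names like "python"; raise on anything unusable."""
    names: list[str] = []
    seen: set[str] = set()
    for entry in raw:
        # One slot may hold several comma-separated names: "python, javascript".
        for part in entry.split(","):
            name = part.strip().lstrip("/")  # "/r/python" -> "r/python"
            if name.lower().startswith("r/"):
                name = name[2:]  # "r/python" -> "python"
            name = name.strip("/")
            if not name:
                continue  # skip blanks left by stray commas
            if not SUBREDDIT_NAME.fullmatch(name):
                raise ValueError(f"not a subreddit name: {part.strip()!r}")
            # Dedupe case-insensitively (assumes r/Python and r/python are the same).
            if name.lower() not in seen:
                seen.add(name.lower())
                names.append(name)
    if not names:
        raise ValueError("no subreddits given")
    return names
```

This raises an error rather than returning an empty list, which matches the intent of minItems: 1. An empty input should fail loudly, not produce a zero-row dataset.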
Mistake 4: One giant options object
A v1 actor of mine had a single field called options typed as a JSON object, “for advanced configuration.” Inside that object I wanted users to control retries, timeouts, concurrency, output format, and proxy country.
The user experience was: stare at an empty {} editor, give up, run the actor with defaults, get bad results, file a support ticket asking how to use the options.
The fix is the opposite of clever — flatten everything to top-level fields with sectioning:
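```json
{
  "title": "My Actor Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "sectionCaption": "What to scrape",
      "title": "Start URLs",
      "description": "Pages the actor starts from.",
      "type": "array",
      "editor": "requestListSources"
    },
    "maxConcurrency": {
      "sectionCaption": "Performance",
      "title": "Max concurrency",
      "description": "Maximum number of pages processed in parallel.",
      "type": "integer",
      "default": 10,
      "minimum": 1,
      "maximum": 50
    },
    "outputFormat": {
      "sectionCaption": "Output",
      "title": "Output format",
      "description": "Format of the exported results.",
      "type": "string",
      "editor": "select",
      "enum": ["json", "csv", "xlsx"],
      "default": "json"
    }
  }
}
```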
sectionCaption groups related fields visually. The form renders three labelled sections with collapsible headers. There is no JSON-blob editor anywhere. Users edit one field at a time, each with its own validation, default, and description.
Rule: any field whose type is “JSON object the user has to figure out” is a UX bug. Replace it with explicit primitive fields, even if the resulting schema has 25 properties.
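Flattening changes the contract for anyone still sending the old options blob. If those callers need to keep working during the transition, one approach is a small shim at the top of the actor. The sketch below is minimal. `LEGACY_OPTION_KEYS`, `DEFAULTS`, and `resolve_input` are hypothetical, and the v1 key names inside options are assumed.

```python
# Hypothetical shim for v1 callers that still send the old "options" blob.
# The v1 key names on the left are assumptions; use whatever your v1 accepted.
LEGACY_OPTION_KEYS = {
    "concurrency": "maxConcurrency",
    "format": "outputFormat",
}

# Mirrors the schema defaults from the flattened example (10 and "json").
DEFAULTS = {"maxConcurrency": 10, "outputFormat": "json"}


def resolve_input(actor_input: dict) -> dict:
    """Return a flat config: defaults < legacy options < explicit top-level fields."""
    legacy = actor_input.get("options") or {}
    # Translate old nested keys to their new top-level names.
    lifted = {new: legacy[old] for old, new in LEGACY_OPTION_KEYS.items() if old in legacy}
    # Everything except the retired blob passes through unchanged.
    flat = {key: value for key, value in actor_input.items() if key != "options"}
    # In a dict merge, later entries win, which gives the precedence in the docstring.
    return {**DEFAULTS, **lifted, **flat}
```

Explicit top-level fields always win, so a stale blob never overrides someone who has moved to the new form. Once v1 traffic dies down, you can delete the shim.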
Mistake 5: No maxItems on cost-driving inputs
This one cost me real money.
An older actor accepted searchTerms: array<string>, no maxItems. A user pasted a list of 12,000 search terms in one run. The actor dutifully ran for nine hours, hit my Apify usage budget, killed the dataset write halfway through, and produced an empty output. I refunded the user, ate the compute cost, and learned to put a ceiling on every input that maps to compute time:
```json
{
  "searchTerms": {
    "title": "Search terms",
    "type": "array",
    "editor": "stringList",
    "minItems": 1,
    "maxItems": 500,
    "description": "Up to 500 search terms per run. For larger batches, split into multiple runs."
  }
}
```
The 500 cap is specific to this actor; set yours to whatever your actor can finish inside a reasonable run. When a user submits 600 terms, the validation error is clear, and it fires before compute starts, not nine hours in.
Anything that scales compute — URLs, search queries, page counts, retry depth — needs a maxItems, maximum, or both. Defaults aren’t enough. Defaults protect users who don’t change them; explicit ceilings protect users who do.
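Once you have measured how long one item takes, picking the number is simple arithmetic. The rough sketch below assumes items are processed in parallel at a steady rate. The function and its inputs are hypothetical, and the default 0.5 safety factor is an arbitrary choice, not an Apify recommendation.

```python
import math


def max_items_for_budget(
    run_budget_s: float,
    seconds_per_item: float,
    concurrency: int,
    safety_factor: float = 0.5,
) -> int:
    """Rough ceiling on how many items one run can finish inside its time budget.

    seconds_per_item and concurrency should be your own measured numbers;
    safety_factor leaves headroom for retries, slow pages and blocked requests.
    """
    if run_budget_s <= 0 or seconds_per_item <= 0 or concurrency < 1:
        raise ValueError("budget, per-item time and concurrency must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    items_per_second = concurrency / seconds_per_item
    return math.floor(run_budget_s * items_per_second * safety_factor)
```

As an illustration with made-up numbers: a one-hour budget (3600 s), 20 s per search term and a concurrency of 5 give `max_items_for_budget(3600, 20, 5)`, which returns 450. You would then write that value into maxItems.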
What I do now, before publishing any actor
Here is the pre-publish checklist I now run on every new actor I ship:
- Every string field with a finite valid set is an enum, not free text.
- Every boolean that has a “we don’t know what’s right” middle state is a three-value enum with an auto default.
- Every array has a prefill, an editor, and either minItems or maxItems (usually both).
- There is no field of type object containing user-editable knobs. Knobs are top-level fields, sectioned.
- Every numeric or array input that scales runtime has a hard ceiling expressed in the schema, not in code.
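Some items on the checklist above are mechanical enough to automate. The sketch below is minimal and checks a parsed input schema for object fields, arrays missing prefill, editor or item limits, and integers without a maximum. `lint_input_schema` and `main` are hypothetical helpers, not an Apify tool. Two judgement calls are left to a human: whether a string should be an enum, and whether a boolean needs an auto state.

```python
import json


def lint_input_schema(schema: dict) -> list[str]:
    """Return checklist violations for an Apify-style input schema dict.

    Only mechanical checks are automated; whether a free-text string should
    be an enum, or a boolean needs an "auto" state, still needs a human.
    """
    problems: list[str] = []
    for name, prop in schema.get("properties", {}).items():
        kind = prop.get("type")
        if kind == "object":
            problems.append(f"{name}: object field; split its knobs into top-level fields")
        elif kind == "array":
            for key in ("prefill", "editor"):
                if key not in prop:
                    problems.append(f"{name}: array without {key}")
            if "minItems" not in prop and "maxItems" not in prop:
                problems.append(f"{name}: array without minItems or maxItems")
        elif kind == "integer" and "maximum" not in prop:
            # Not every integer scales runtime, but a missing ceiling is worth a look.
            problems.append(f"{name}: integer without maximum")
    return problems


def main(path: str) -> int:
    """Lint the schema file at `path`; return a non-zero exit code if anything is flagged."""
    with open(path, encoding="utf-8") as f:
        problems = lint_input_schema(json.load(f))
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

You could call `main()` from a pre-publish script or CI step, pointing it at wherever your actor keeps its input schema, and fail the build on a non-zero return.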
The schema is the contract between your actor and the next person who runs it. Most of the support load and most of the ratings drift I have seen on my own actors trace back to a schema that lied about what the actor accepts. Spending an hour on the schema before the first publish saves a hundred hours of explaining over the actor’s lifetime.
If you maintain Apify actors and want a second pair of eyes on your input schemas, drop a comment with a link to the actor or email me — I’ll point at the rough edges I see.
Spinov — building Apify actors and writing about web scraping. Telegram: @scraping_ai. Apify Store: apify.com/knotless_cadence.