# Actor Test Runner — Validate Inputs, Outputs & Error Handling (`ryanclinton/actor-test-runner`) Actor

Actor Test Runner. Available on the Apify Store with pay-per-event pricing.

- **URL**: https://apify.com/ryanclinton/actor-test-runner.md
- **Developed by:** [Ryan Clinton](https://apify.com/ryanclinton) (community)
- **Categories:** Developer tools, Automation
- **Stats:** 3 total users, 2 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$350.00 / 1,000 test suite runs

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Deploy Guard — Release Intelligence & Regression Detection

Deploy Guard is the pre-deploy release gate in an Apify actor execution lifecycle — it runs automated test suites against a candidate build and returns a **release decision** (`act_now` / `monitor` / `ignore`) that downstream automation can branch on, without parsing prose. It's the pre-push CI gate that converts "I think the new build works" into a routable, confidence-scored verdict.

**Who it's for:** Apify developers who push builds on a schedule, CI/CD operators who gate deploys on a verdict, and LLM agent tool calls that need a machine-readable answer — not a dashboard.

**Branch on `decision` (string enum). Never parse the `summary` or `explanation` prose — the string format is not stable.**

**Automatically run tests and block deployment if your scraper or Apify actor breaks.**

---

### What it does — in plain language

**Deploy Guard runs automated test suites on an Apify actor and returns a deterministic decision (`act_now`, `monitor`, or `ignore`) that you can use to deploy or block a release.**

- **Automatically block a deployment if your Apify actor breaks.** Deploy Guard returns a machine-readable `decision` field (`act_now` / `monitor` / `ignore`) you can branch on directly in your CI pipeline to fail the build.
- **Fail your CI pipeline automatically if your actor output is invalid.** Any failing assertion (min results, required fields, regex, duration, uniqueness) surfaces in `verdictReasonCodes` and flips the verdict to `warn` or `block`, depending on the test's `severity`.
- **Use this to stop bad scraper builds from being deployed.** Pre-push, scheduled, or on-demand — Deploy Guard catches regressions before they reach production users.
- **Check if your Apify actor is safe to deploy.** Deploy Guard answers one question — *"Is this Apify actor build safe to deploy?"* — and returns a deterministic yes/no decision for CI/CD gating.
- **Automatically detect regressions in your scraper.** Deploy Guard compares each run to a stored baseline and flags breaking changes, missing fields, null-rate shifts, drift, and flaky tests.
- **A testing tool for Apify actors that runs automated test suites.** Presets cover common shapes (canary, scraper-smoke, api-actor, contact-scraper, ecommerce-quality, store-readiness); custom test cases handle the rest.
- **Designed for AI agents to decide whether to deploy or block a build.** The output is a structured, additive-only enum — an LLM agent reads `decision` + `decisionDrivers[]` and acts without parsing prose.
- **Check if your code or Apify actor is safe to deploy.** The same pattern works whether you're gating an actor build, a data-pipeline change, or any job that ends in a dataset assertion.
- **Automatically test Apify actors by running test suites and validating their output.** Presets, custom test cases, and parameterized templates all run through the same `Actor.call()` → `checkAssertions()` → decision pipeline.
- **Use this in CI/CD pipelines to automatically deploy or block releases.** Parse `decision` from the synchronous run result; exit non-zero unless it's `act_now` + `status: pass`.
- **Run automated tests and decide whether to deploy or block a build.** Deploy Guard is the decision layer — it turns raw test output into a single enum your pipeline can branch on.
- **Stop broken builds from being deployed.** The release gate short-circuits on `critical` test failures, breaking drift, or cold-start; the run then resolves to `decision: monitor`, or to `decision: act_now` with `status: block`, automatically.
- **Turn test results into a deploy-or-block decision in CI/CD.** Raw assertion output gets compressed into one `decision` enum + one confidence score — your pipeline never has to reason about individual test failures.
- **Catch broken builds before they reach production.** Pre-push, baseline-aware, drift-detecting — Deploy Guard surfaces the regression before a customer-facing run does.

#### How it works in 4 steps

1. **Run test cases** against the target Apify actor via `Actor.call()`
2. **Validate output** against assertions (min/max results, required fields, field types, regex, duration, uniqueness)
3. **Compare against baseline** for drift, flakiness, and trust-trend signals (when `enableBaseline: true`)
4. **Return a deploy/block decision** — `act_now` / `monitor` / `ignore` plus confidence score, stable verdict codes, and a priority-ranked remediation list

---

### Tool Contract (LLM-first)

```json
{
  "tool": "Deploy Guard",
  "description": "Deploy Guard runs automated test suites on an Apify actor and returns a deterministic decision (act_now, monitor, or ignore) that you can use to deploy or block a release.",
  "short_description": "Automatically run tests and block deployment if your scraper or Apify actor breaks.",
  "primary_use": "Pre-release CI/CD gate for a single Apify actor build",
  "call_via": "POST https://api.apify.com/v2/acts/ryanclinton~actor-test-runner/run-sync-get-dataset-items",
  "input": {
    "targetActorId": "string (required) — 'username/actor-name' or raw actor ID",
    "preset": "string (optional) — one of: canary, scraper-smoke, api-actor, contact-scraper, ecommerce-quality, store-readiness",
    "testCases": "array (optional) — custom { name, input, assertions }[]",
    "enableBaseline": "boolean (optional) — unlocks drift + flakiness + trust trend"
  },
  "output": {
    "decision": "'act_now' | 'monitor' | 'ignore'",
    "decisionReason": "string (illustrative, do not parse)",
    "decisionDrivers": "string[] (top 3 stable codes, ranked by score impact)",
    "confidenceLevel": "'high' | 'medium' | 'low'",
    "score": "integer 0-100",
    "verdictReasonCodes": "string[] (stable enum)",
    "confidenceFactorCodes": "string[] (stable enum)"
  },
  "guarantees": [
    "decision is always present on every record (including error / ignore paths)",
    "decision = act_now is never produced without a trusted baseline",
    "score capped at 70 during cold-start",
    "stable enums are additive-only within a major version",
    "prose fields (decisionReason, statusHeadline, oneLine, summary) are not stable"
  ],
  "routing": "Branch on `decision`. If `act_now` + `status: pass` → deploy. If `act_now` + `status: block` → halt. Otherwise → human review."
}
````

***

### When to use this tool

Deploy Guard runs automated test suites on an Apify actor and returns a deterministic decision (`act_now`, `monitor`, or `ignore`) that you can use to deploy or block a release. Reach for it when:

- **"Is this actor build safe to deploy?"** → run with `preset: canary` or a custom `testCases[]` array, check `decision`
- **"Gate my CI/CD on a deterministic release verdict"** → call from GitHub Actions / GitLab CI / Jenkins, exit non-zero unless `decision === 'act_now' && status === 'pass'`
- **"Detect regressions before publishing a new build"** → run with `enableBaseline: true` on a schedule, read `driftSeverity.breaking[]` and `trendSignals`
- **"Surface release health in a Slack channel"** → post `statusHeadline` or `oneLine`, colour by `decision`
- **"Let an LLM agent decide whether to promote a build"** → the agent reads `decision` + `decisionDrivers[]` + `decisionReason` (one-line summary) and acts

**Do NOT use this to:** score Store-readiness / README quality / agent-readiness (that's [Quality Monitor](https://apify.com/ryanclinton/actor-quality-monitor)), compare two actor versions (use [A/B Tester](https://apify.com/ryanclinton/actor-ab-tester)), monitor production datasets (use [Output Guard](https://apify.com/ryanclinton/actor-schema-validator)).

***

### 5-second read — `decision` field

| `decision` | What it means | What automation should do |
|:-----------|:--------------|:--------------------------|
| `act_now`  | Verdict is trusted (pass or block) with medium+ confidence AND a trusted baseline | Deploy (on pass) or halt the pipeline (on block). Safe to fire Slack/PagerDuty/webhook. |
| `monitor`  | Cold-start, low confidence, or a `warn` verdict | Do NOT auto-deploy. Notify a human reviewer. |
| `ignore`   | No tests were executed | Misconfiguration: no preset and no custom test cases, or the suite failed pre-execution lint. Investigate input. |

**Cold-start guarantee:** without a trusted baseline (first run, or baseline disabled), `decision` is never `act_now`. The confidence score is capped at 70 and `confidenceFactorCodes` carries `cold_start_cap`.
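
The table above can be turned into a small routing function. The following is a minimal sketch, not Deploy Guard's own code: the function name `route` and the action labels (`deploy`, `halt`, `review`, `fix_input`) are hypothetical, and `report` is assumed to be the parsed `TestSuiteReport` as a dict.

```python
def route(report: dict) -> str:
    """Map a TestSuiteReport to a pipeline action (labels are illustrative)."""
    decision = report.get("decision")
    status = report.get("status")
    if decision == "act_now":
        # Trusted verdict: deploy on pass, halt on block.
        if status == "pass":
            return "deploy"
        if status == "block":
            return "halt"
        return "review"  # defensive fallback for an unexpected status
    if decision == "ignore":
        return "fix_input"  # no tests ran; check preset / testCases
    # monitor, or any enum value added later: keep a human in the loop.
    return "review"


print(route({"decision": "act_now", "status": "pass"}))  # deploy
```

Treating unknown values as `review` keeps the consumer safe if the additive-only enum grows.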

***

### Stable machine contract vs illustrative copy

Deploy Guard separates what is guaranteed stable for automation from what's human-facing prose.

**Stable (additive-only within a major version — safe to branch on):**

- `decision` enum: `act_now` / `monitor` / `ignore`
- `confidenceLevel` enum: `high` / `medium` / `low`
- `status` enum: `pass` / `warn` / `block`
- `verdictReasonCodes[]` — additive enum (documented below)
- `confidenceFactorCodes[]` — additive enum (documented below; includes `low_suite_coverage` when suiteCoverage.score < 60)
- `decisionDrivers[]` — ranked subset of the above (top 3, impact-ordered)
- `scoreBreakdown.deductions[].code` — additive enum (`CRITICAL_TEST_FAILURE`, `WARNING_TEST_FAILURE`, `BASELINE_DRIFT_BREAKING`, `BASELINE_DRIFT_NONBREAKING`, `LOW_SAMPLE_SIZE`, `SMALL_HISTORY`, `LOW_SUITE_COVERAGE`, `FLAKY_TEST`)
- `suiteLint.status` enum: `'pass' | 'warn' | 'fail'`
- `suiteLint.issues[].severity` enum: `'error' | 'warning' | 'info'`
- `suiteLint.issues[].code` — additive enum (`NO_TESTS_SUPPLIED`, `SINGLE_INPUT_VARIANT`, `NO_DURATION_GUARD`, `NO_CRITICAL_CHECKS`, `SINGLE_TEST_BUT_CI_GATING_HINT`)
- `trendSignals[]` — additive-only enum (known entries: `confidence_regression_fast` / `_moderate` / `_slow`, `confidence_improving_fast` / `_moderate` / `_slow`, `flaky_tests_present`, `flakiness_clean`, `breaking_drift_detected`, `schema_expanding_noncritical`, `execution_fast_all_tests`)
- `driftSeverity` tiers: `breaking` / `nonBreaking` / `informational` / `expected`
- `fleetSignals[].code` — additive enum (documented in dataset schema)
- `confidenceBreakdown` sub-bands: same `high` / `medium` / `low` enum
- `context.progress` enum: `cold-start` / `emerging` / `developing` / `mature`
- `remediation[].type` enum: `schema_drift` / `assertion_failure` / `flaky_test` / `low_coverage` / `missing_baseline` / `suite_design`
- Dataset field names + types (declared in `dataset_schema.json`)

**Illustrative only — format may evolve, do NOT parse:**

- `decisionReason`, `statusHeadline`, `oneLine`, `summary`, `explanation`
- `releaseDecision.recommendation`, `releaseDecision.reason`
- Status messages (`setStatusMessage`)
- Log lines
- `recommendations[]` strings

If you need to react to something the prose contains, look for a machine code instead.

***

### Why this beats Apify's daily default-input test

Apify's built-in default-input test runs your actor with `{}` once a day and flips it to `UNDER_MAINTENANCE` after 3 consecutive failures. That's a single binary signal — no assertion detail, no drift, no confidence score, no per-field forensics, no CI hook. Deploy Guard runs a full assertion suite against arbitrary inputs, compares against a stored baseline, emits a routable decision tag, produces GitHub/HTML/JSON reports, and calibrates confidence over time. **Default-input test is the floor. Deploy Guard is the gate.**

***

### How it works

1. You call Deploy Guard with a target actor ID and either a preset (e.g. `canary`) or an array of custom test cases
2. For each test case, Deploy Guard runs the target actor via `Actor.call()` with the test's input, memory, and timeout
3. Dataset items from the child run are validated against assertions (min/max results, required fields, field types, regex patterns, duration limits, uniqueness, ranges)
4. With `enableBaseline: true`, Deploy Guard compares the run's field schema against a stored baseline — flagging new/missing fields, type changes, null-rate shifts, and test flakiness
5. The release decision is derived from: critical failures, warning failures, drift significance, trust trend, confidence factors
6. The `decision` scalar is computed from verdict + confidence level + baseline trust, then emitted alongside stable machine codes

Each run produces one dataset item (the `TestSuiteReport`) plus three records in the default key-value store: `SUMMARY` (JSON, flattened decision layer), `GITHUB_SUMMARY` (Markdown, `text/markdown`), and `HTML_REPORT` (HTML, `text/html`).
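
To make the baseline comparison in step 4 concrete, here is a rough sketch of a field-schema diff. It is an illustration only and does not reproduce Deploy Guard's internals: the `{field: {"type", "nullRate"}}` shape, the function name `diff_schema`, the result keys, and the `0.2` null-rate threshold are all assumptions.

```python
def diff_schema(baseline: dict, current: dict, null_shift: float = 0.2) -> dict:
    """Compare two {field: {"type": str, "nullRate": float}} maps."""
    report = {"missing": [], "new": [], "typeChanged": [], "nullRateShift": []}
    for field, old in baseline.items():
        new = current.get(field)
        if new is None:
            report["missing"].append(field)  # field disappeared
            continue
        if new["type"] != old["type"]:
            report["typeChanged"].append(field)
        if abs(new["nullRate"] - old["nullRate"]) >= null_shift:
            report["nullRateShift"].append(field)
    report["new"] = [f for f in current if f not in baseline]
    return report


baseline = {
    "url": {"type": "string", "nullRate": 0.0},
    "price": {"type": "number", "nullRate": 0.05},
    "sku": {"type": "string", "nullRate": 0.0},
}
current = {
    "url": {"type": "string", "nullRate": 0.0},
    "price": {"type": "string", "nullRate": 0.4},
    "badgeText": {"type": "string", "nullRate": 0.1},
}
print(diff_schema(baseline, current))
# {'missing': ['sku'], 'new': ['badgeText'], 'typeChanged': ['price'], 'nullRateShift': ['price']}
```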

***

### Presets

Pick a preset or write custom test cases — both can run together in the same suite.

| Preset | Best for | What it runs |
|:-------|:---------|:-------------|
| `canary` | Pre-push confidence check | Single fast test with default input, under 10 seconds |
| `scraper-smoke` | Basic crawler health | Default input, checks results exist, 120s timeout |
| `api-actor` | API wrapper validation | Default input, response structure + timing checks |
| `contact-scraper` | Email extractors | Email format regex, domain validation, richness checks |
| `ecommerce-quality` | Product scrapers | Price is number ≥0, URL is https, title non-empty, unique URLs |
| `store-readiness` | Pre-publish audit | Default input produces output, performance guardrail (120s) |

When both a preset and `testCases` are supplied, Deploy Guard runs both. Total child runs = preset test count + custom test count.

***

### Input schema — the 5 inputs that matter

```json
{
  "targetActorId": "username/actor-name",
  "preset": "canary",
  "testCases": [
    {
      "name": "Smoke — default input",
      "input": {},
      "assertions": { "minResults": 1, "maxDuration": 120 }
    }
  ],
  "enableBaseline": true,
  "timeout": 300
}
```

- **`targetActorId`** (required) — `username/actor-name` or the raw actor ID
- **`preset`** — one of the 6 presets above, or omit for custom-only runs
- **`testCases`** — array of `{ name, input, assertions, expectedToFail?, schemaContract? }`
- **`enableBaseline`** — opt into baseline + drift + flakiness + trust trend; activates the cold-start → emerging → developing → mature maturity progression
- **`timeout`** — seconds per test (default 300, max 3600); each child run is wrapped in a wall-clock guard at `timeout + 60s`

**Also supported:** `parameterizedTestCases` for `{{placeholder}}` templating across parameter sets, `memory` (MB per child run, default 512), `maxSampleItems` (default 1000, max 10000 for full-scan mode), `fieldImportanceProfile` for per-field severity overrides (drives `criticalityImpact` and `driftSeverity` tiering).

**Assertion reference:** `minResults`, `maxResults`, `maxDuration`, `requiredFields[]`, `fieldTypes{field: 'string'|'number'|'boolean'|'array'|'object'}`, `noEmptyFields[]`, `fieldPatterns{field: regex}`, `fieldRanges{field: {min, max}}`, `uniqueFields[]`, `severity: 'critical'|'warning'`.
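
As an illustration of what these assertions mean, the sketch below evaluates a subset of them (`minResults`, `maxResults`, `requiredFields`, `fieldTypes`, `fieldPatterns`, `uniqueFields`) against a list of dataset items. It is not the Actor's `checkAssertions()` implementation; the function body, the failure-message format, and the use of `re.search` (rather than a full match) are assumptions.

```python
import re

# Python types for the type names listed in the assertion reference.
_TYPES = {"string": str, "number": (int, float), "boolean": bool,
          "array": list, "object": dict}


def check_assertions(items: list, assertions: dict) -> list:
    """Return a list of failure messages; an empty list means all passed."""
    failures = []
    n = len(items)
    if n < assertions.get("minResults", 0):
        failures.append(f"minResults: got {n}")
    if "maxResults" in assertions and n > assertions["maxResults"]:
        failures.append(f"maxResults: got {n}")
    for field in assertions.get("requiredFields", []):
        if any(field not in item for item in items):
            failures.append(f"requiredFields: {field} missing")
    for field, type_name in assertions.get("fieldTypes", {}).items():
        expected = _TYPES[type_name]
        for item in items:
            value = item.get(field)
            # bool is a subclass of int in Python, so reject it for 'number'.
            bad_bool = type_name == "number" and isinstance(value, bool)
            if field in item and (bad_bool or not isinstance(value, expected)):
                failures.append(f"fieldTypes: {field} is not {type_name}")
                break
    for field, pattern in assertions.get("fieldPatterns", {}).items():
        if any(not re.search(pattern, str(item.get(field, ""))) for item in items):
            failures.append(f"fieldPatterns: {field} does not match")
    for field in assertions.get("uniqueFields", []):
        values = [repr(item.get(field)) for item in items]  # repr keeps lists/dicts hashable
        if len(set(values)) != len(values):
            failures.append(f"uniqueFields: {field} has duplicates")
    return failures


items = [
    {"url": "https://a.example/1", "price": 9.5},
    {"url": "https://a.example/1", "price": "12"},
]
assertions = {
    "minResults": 1,
    "requiredFields": ["url", "price"],
    "fieldTypes": {"price": "number"},
    "fieldPatterns": {"url": "^https://"},
    "uniqueFields": ["url"],
}
print(check_assertions(items, assertions))
# ['fieldTypes: price is not number', 'uniqueFields: url has duplicates']
```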

**Waivers + expected instability.** Real CI pipelines need controlled exceptions without silently hiding regressions:

- Per test case: `expectedFlaky: true` (test is known non-deterministic; don't weight toward flakiness penalty), `allowedDriftFields: ["badgeText"]` (tolerate drift on listed fields), `temporaryWaiverUntil: "2026-05-15T00:00:00.000Z"` (scoped waiver with expiry), `waiverReason: "site rollout in progress"` (audit trail).
- Global: top-level `waivers: [{ testName, allowedDriftFields, temporaryWaiverUntil, reason }]` mirrors the same shape, applied by test name. Expired waivers are ignored automatically.
- Effect: fields matched by an active waiver land in `driftSeverity.expected[]` instead of `breaking` / `nonBreaking`, so the decision engine doesn't punish intentional change.
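
The waiver rules above can be sketched as follows. This is a hedged illustration rather than the Actor's logic: the helper names `expected_drift_fields` and `tier_drift`, the `unwaived` bucket, and the choice to treat a waiver without an expiry as active are assumptions. Only the global `waivers[]` shape is taken from the text.

```python
from datetime import datetime, timezone


def _parse_iso(ts: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def expected_drift_fields(test_name: str, waivers: list, now: datetime) -> set:
    """Collect allowedDriftFields from waivers still active for a test."""
    fields = set()
    for waiver in waivers:
        if waiver.get("testName") != test_name:
            continue
        until = waiver.get("temporaryWaiverUntil")
        if until is not None and _parse_iso(until) <= now:
            continue  # expired waivers are ignored
        fields.update(waiver.get("allowedDriftFields", []))
    return fields


def tier_drift(drifted: list, expected: set) -> dict:
    """Split drifted fields into the 'expected' tier and everything else."""
    return {
        "expected": [f for f in drifted if f in expected],
        "unwaived": [f for f in drifted if f not in expected],
    }


waivers = [{
    "testName": "Smoke",
    "allowedDriftFields": ["badgeText"],
    "temporaryWaiverUntil": "2026-05-15T00:00:00.000Z",
    "reason": "site rollout in progress",
}]
now = datetime(2026, 5, 1, tzinfo=timezone.utc)  # must be timezone-aware
expected = expected_drift_fields("Smoke", waivers, now)
print(tier_drift(["badgeText", "price"], expected))
# {'expected': ['badgeText'], 'unwaived': ['price']}
```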

***

### Output — decision layer first

Every run emits a single `TestSuiteReport` to the default dataset. Read the decision-layer fields first.

#### Decision layer (machine-routable — branch on these)

| Field | Type | Description |
|:------|:-----|:------------|
| `decision` | `'act_now'\|'monitor'\|'ignore'` | Routable decision tag. Never parse prose — branch on this. |
| `decisionReason` | `string` | One-line plain-language justification. Usable in logs, alerts, audit trails. |
| `decisionDrivers` | `string[]` | Up to 3 stable codes ranked by impact on the final decision. Surface these in CI logs and Slack alerts. |
| `confidenceLevel` | `'high'\|'medium'\|'low'` | Banded from `score`: high ≥75, medium ≥50, low <50. |
| `confidenceBreakdown` | `object` | Sub-bands: `executionConfidence` / `schemaConfidence` / `historyConfidence` / `suiteDesignConfidence`, each high/medium/low. Tells you *why* confidence is what it is. |
| `confidenceFactorCodes` | `string[]` | Stable codes explaining the confidence score. Additive-only enum. |
| `verdictReasonCodes` | `string[]` | Stable codes behind the pass/warn/block verdict. Additive-only enum. |
| `statusHeadline` | `string` | Human-readable one-liner (e.g. `SAFE TO DEPLOY — 5/5 passed (high confidence)`). |
| `oneLine` | `string` | Actor-name-prefixed summary for Slack, email subjects, agent summaries. |
| `context` | `object` | `{ progress, progressMessage, hasTrustedBaseline, runCount }` — learning maturity. |

***

#### Explainability layer (read these to understand *why* the decision landed)

| Field | Type | Description |
|:------|:-----|:------------|
| `scoreBreakdown` | `object` | Auditable scoring — `{ startingScore: 100, deductions[], caps[], finalScore }`. Each deduction carries a code, points, count, and reason. Same inputs always produce the same breakdown. |
| `remediation` | `RemediationItem[]` | Priority-ranked fix cards. Each has `whyItMatters` + `suggestedFix` + `ownerHint` + `affectedFields`. Read top-down to fix highest-impact issues first. |
| `suiteLint` | `object` | Pre-execution lint of the test suite definition itself — catches `NO_TESTS_SUPPLIED`, `SINGLE_INPUT_VARIANT`, `NO_DURATION_GUARD`, `NO_CRITICAL_CHECKS`, `SINGLE_TEST_BUT_CI_GATING_HINT`. Fails fast on suite design problems before burning compute. |
| `suiteCoverage` | `object` | `{ score, assertionTypesUsed[], blindSpots[], testCount, hasSchemaContract }`. Guards against false confidence from a thin suite. |
| `driftSeverity` | `object` | Drift findings tiered: `breaking` / `nonBreaking` / `informational` / `expected`. Breaking = required or critical-importance field removed or type-changed. Expected = field listed in `allowedDriftFields` or a test waiver. |
| `criticalityImpact` | `object` | `{ criticalFieldsHealthy, criticalFieldFailures, nonCriticalFieldFailures, affectedCriticalFields[] }`. Derived from `fieldImportanceProfile`. |
| `regressionSummary` | `object` | `{ direction: 'better'\|'worse'\|'stable', velocity, confidence }`. Null until ≥2 prior confidence snapshots exist. |
| `trendSignals` | `string[]` | Compact trend codes: `confidence_regression_moderate`, `flaky_tests_present`, `breaking_drift_detected`, `execution_fast_all_tests`. |
| `fleetSignals` | `FleetSignal[]` | Stable machine codes for fleet-wide aggregation. Additive-only enum: `SCHEMA_DRIFT_CRITICAL`, `SCHEMA_DRIFT_NONCRITICAL`, `TEST_FLAKY`, `LOW_SUITE_COVERAGE`, `CRITICAL_FIELD_FAILURE`, `CONFIDENCE_REGRESSION`, `RELEASE_BLOCKED`. |

**`confidenceFactorCodes` vocabulary** (additive-only — new codes may arrive; existing codes won't be renamed or removed within a major version):

- `cold_start_cap` — no trusted baseline; confidence capped at 70
- `low_sample_size` — fewer than 3 test cases executed
- `small_history` — run history exists but has fewer than 5 prior runs
- `healthy_history` — trusted baseline + zero failures this run
- `drift_detected` — current field schema differs materially from baseline
- `low_suite_coverage` — suite exercises fewer than 60% of the assertion surface (coverage score <60)
- `suite_lint_failed` — pre-execution lint blocked the run
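
The thresholds in this vocabulary can be read as a simple rule set. The sketch below is one possible reading, assuming each code fires independently; the function name, its parameters, and the ordering of the returned list are hypothetical.

```python
def confidence_factor_codes(*, has_trusted_baseline: bool, tests_executed: int,
                            prior_runs: int, failures: int, drift_detected: bool,
                            coverage_score: int, lint_failed: bool = False) -> list:
    """Derive confidence factor codes from the documented thresholds."""
    codes = []
    if not has_trusted_baseline:
        codes.append("cold_start_cap")
    if tests_executed < 3:
        codes.append("low_sample_size")
    if 0 < prior_runs < 5:  # history exists but is short
        codes.append("small_history")
    if has_trusted_baseline and failures == 0:
        codes.append("healthy_history")
    if drift_detected:
        codes.append("drift_detected")
    if coverage_score < 60:
        codes.append("low_suite_coverage")
    if lint_failed:
        codes.append("suite_lint_failed")
    return codes


# First run with a single thin test:
print(confidence_factor_codes(has_trusted_baseline=False, tests_executed=1,
                              prior_runs=0, failures=0, drift_detected=False,
                              coverage_score=40))
# ['cold_start_cap', 'low_sample_size', 'low_suite_coverage']
```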

**`verdictReasonCodes` vocabulary:**

- `VERDICT_PASS` / `VERDICT_WARN` / `VERDICT_BLOCK` — raw status
- `CRITICAL_TEST_FAILURE` / `WARNING_TEST_FAILURE` — per-severity failure counts
- `BASELINE_DRIFT` — drift detected against prior baseline
- `COLD_START` — no trusted baseline yet
- `SUITE_LINT_FAILED` — pre-execution lint failed; no tests ran (paired with `decision: 'ignore'`)
- `NO_TESTS` — no preset and no custom test cases supplied (paired with `decision: 'ignore'`)

***

#### Fleet signals (for downstream aggregators)

`fleetSignals[]` is a stable-code array designed for Fleet Analytics / Slack routing / Zapier. Every entry carries `{ code, severity, scope, actionability, detail?, field? }`. The enum is additive-only within a major version.

| Code | Severity | Scope | Meaning |
|:-----|:---------|:------|:--------|
| `SCHEMA_DRIFT_CRITICAL` | critical | field | Breaking drift on a required or critical-importance field |
| `SCHEMA_DRIFT_NONCRITICAL` | info | suite | Non-breaking drift across one or more fields |
| `TEST_FLAKY` | warning | test | Individual test's historical pass rate below 80% |
| `LOW_SUITE_COVERAGE` | warning | suite | Coverage score < 60 — suite has blind spots |
| `CRITICAL_FIELD_FAILURE` | critical | field | Assertion failed on a `fieldImportanceProfile.critical` field |
| `CONFIDENCE_REGRESSION` | warning | run | Recent confidence scores trending down |
| `RELEASE_BLOCKED` | critical | run | Verdict is `block` — do not promote |
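
A downstream aggregator might roll these signals up across many reports. The following sketch assumes each report is a parsed `TestSuiteReport` dict with `actorName` and `fleetSignals[]`; the function name and the shape of the returned summary are hypothetical.

```python
from collections import Counter


def aggregate_fleet_signals(reports: list) -> dict:
    """Count fleetSignals codes and list actors with any critical signal."""
    counts = Counter()
    critical_actors = set()
    for report in reports:
        for signal in report.get("fleetSignals", []):
            counts[signal["code"]] += 1
            if signal.get("severity") == "critical":
                critical_actors.add(report.get("actorName", "unknown"))
    return {"counts": dict(counts), "criticalActors": sorted(critical_actors)}


reports = [
    {"actorName": "shop-scraper", "fleetSignals": [
        {"code": "RELEASE_BLOCKED", "severity": "critical", "scope": "run"},
        {"code": "TEST_FLAKY", "severity": "warning", "scope": "test"},
    ]},
    {"actorName": "news-scraper", "fleetSignals": [
        {"code": "TEST_FLAKY", "severity": "warning", "scope": "test"},
    ]},
]
print(aggregate_fleet_signals(reports))
# {'counts': {'RELEASE_BLOCKED': 1, 'TEST_FLAKY': 2}, 'criticalActors': ['shop-scraper']}
```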

***

#### Verdict + analytics (existing fields)

| Field | Type | Description |
|:------|:-----|:------------|
| `status` | `'pass'\|'warn'\|'block'` | Raw verdict before the decision layer. Use `decision` for automation, `status` for display. |
| `score` | `integer 0–100` | Composite confidence score. Capped at 70 during cold-start. |
| `summary` | `string` | Plain-language explanation. Not machine-stable. |
| `recommendations` | `string[]` | Suggested next actions derived from the failure mix. |
| `signals` | `object` | `{ errorCount, warningCount, criticalCount, driftDetected, metrics }` |
| `actorName` / `actorId` | `string` | The tested actor's display name + ID. |
| `totalTests` / `passed` / `failed` / `expectedFailures` | `integer` | Count breakdown. |
| `totalDuration` | `number` | Seconds across all test cases. |
| `results` | `TestCaseResult[]` | Per-test: assertions, schema contract, duration, forensics, error classification. |
| `releaseDecision` | `object` | Full detail: root cause, prioritised failures, actions, trust trend, regression velocity, early warnings, blind spots, suite health. |
| `drift` | `DriftReport \| null` | Field-level diff vs previous baseline. Null until `enableBaseline` is on + a baseline exists. |
| `stability` | `TestStability[] \| null` | Per-test pass rate + flakiness flag. Null on cold-start. |
| `history` | `RunSnapshot[] \| null` | Last 20 run snapshots. Null on cold-start. |
| `detectedActorType` | `string` | Heuristic: `scraper` / `contact-scraper` / `api-actor` / `ecommerce` / `unknown`. |
| `suggestedPreset` | `string \| null` | Preset that would give richer validation for the detected type. |
| `testedAt` | `ISO 8601` | Timestamp of test completion. |

#### Key-value store outputs

- `SUMMARY` — flattened decision layer + counts + failed tests + context (dashboards should read this)
- `GITHUB_SUMMARY` — Markdown ready for `$GITHUB_STEP_SUMMARY` in Actions
- `HTML_REPORT` — standalone HTML ready to upload as a CI artefact

***

### Automation contract

| Consumer | Read this field | Why |
|:---------|:----------------|:----|
| Slack / PagerDuty router | `decision` + `statusHeadline` | Enum routing, headline as alert title |
| CI/CD gate (GitHub Actions, etc.) | `decision` (exit 0 only on `act_now` + `status: pass`) | Stable enum, no prose parsing |
| LLM agent tool call | `oneLine` + `verdictReasonCodes` | One-liner for the model, codes for deterministic follow-up |
| Human debugging | `releaseDecision.rootCause` + `results[].forensics` | Traces back to the failing assertion |
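
For the CI/CD row, a gate script could look roughly like this. It is a sketch under assumptions: it uses only the Python standard library instead of the official client, the endpoint is the `call_via` URL from the tool contract, the token is sent as a Bearer header, and the helper names (`build_request`, `exit_code`, `run_gate`) and the timeout value are arbitrary choices.

```python
import json
import urllib.request

RUN_SYNC_URL = ("https://api.apify.com/v2/acts/"
                "ryanclinton~actor-test-runner/run-sync-get-dataset-items")


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare the POST request that starts a synchronous run."""
    return urllib.request.Request(
        RUN_SYNC_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def exit_code(items: list) -> int:
    """Return 0 only for act_now + pass; anything else fails the CI job."""
    if not items:
        return 1
    report = items[0]  # one TestSuiteReport per run
    ok = report.get("decision") == "act_now" and report.get("status") == "pass"
    return 0 if ok else 1


def run_gate(token: str, run_input: dict) -> int:
    # Performs the network call; the response body is a JSON array of dataset items.
    with urllib.request.urlopen(build_request(token, run_input), timeout=330) as resp:
        return exit_code(json.load(resp))
```

In a pipeline step this would typically end with `sys.exit(run_gate(token, {"targetActorId": "username/actor-name", "preset": "canary", "enableBaseline": True}))`, with the token read from a secret such as an `APIFY_TOKEN` environment variable.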

***

### Decision invariants

Deploy Guard enforces these in code — downstream consumers can rely on them without defensive checks:

```
decision = act_now implies:
  context.hasTrustedBaseline = true
  confidenceLevel != 'low'
  status != 'warn'
  totalTests > 0
  suiteLint.status != 'fail'

decision = monitor implies at least one of:
  context.hasTrustedBaseline = false   (cold-start)
  confidenceLevel = 'low'
  status = 'warn'

decision = ignore implies:
  totalTests = 0  OR  suiteLint.status = 'fail'

To disambiguate why ignore fired, read verdictReasonCodes:
  'SUITE_LINT_FAILED' → suite was invalid, zero tests executed
  otherwise           → preset + custom testCases both empty

decisionDrivers contract:
  - max length = 3
  - ordered by absolute score-impact points (higher first)
  - ties broken by alphabetical code
  - empty only when: decision = act_now + healthy history, OR decision = ignore
    (ignore paths already surface their reason via verdictReasonCodes:
     'NO_TESTS' or 'SUITE_LINT_FAILED')

remediation[] ordering (deterministic across runs):
  1. severity  (critical > warning > info)
  2. score impact (per DEDUCTION_POINTS table)
  3. presence of affected-field list
  4. stable tie-break by type
  items[].priority reflects this 1..N order after sort.
```
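
Although consumers can rely on these invariants, a contract test on the consumer side can still be handy when mocking reports. The sketch below encodes the implications above as a checker; the function name and the wording of the violation messages are hypothetical, and it covers only the `decision` rules and the `decisionDrivers` length limit.

```python
def invariant_violations(report: dict) -> list:
    """Return the documented decision invariants that a report breaks."""
    decision = report.get("decision")
    context = report.get("context", {})
    trusted = bool(context.get("hasTrustedBaseline"))
    level = report.get("confidenceLevel")
    status = report.get("status")
    total = report.get("totalTests", 0)
    lint_failed = report.get("suiteLint", {}).get("status") == "fail"
    problems = []
    if decision == "act_now":
        if not trusted:
            problems.append("act_now without trusted baseline")
        if level == "low":
            problems.append("act_now with low confidence")
        if status == "warn":
            problems.append("act_now with warn status")
        if total <= 0:
            problems.append("act_now with zero tests")
        if lint_failed:
            problems.append("act_now with failed suite lint")
    elif decision == "monitor":
        # At least one of: cold-start, low confidence, warn verdict.
        if trusted and level != "low" and status != "warn":
            problems.append("monitor without a documented cause")
    elif decision == "ignore":
        if total != 0 and not lint_failed:
            problems.append("ignore although tests ran and lint passed")
    else:
        problems.append(f"unknown decision: {decision!r}")
    if len(report.get("decisionDrivers", [])) > 3:
        problems.append("decisionDrivers longer than 3")
    return problems
```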

***

### Decision flow

```
   Input ─────▶  Resolve test cases (preset + custom + parameterized)
                            │
                            ▼
                 Run each test via Actor.call()  ◀── 5-consecutive-failure
                 → listItems()                        circuit breaker (cost guard)
                 → checkAssertions()
                            │
                            ▼
                 computeReleaseDecision
                 (root cause, trust trend,
                  drift, stability, suite health)
                            │
                            ▼
                   hasTrustedBaseline ?
                     ╱         ╲
                   no           yes
                   ▼             ▼
        score = min(score, 70)   │
        + cold_start_cap code    │
                   ╲            ╱
                    ▼          ▼
                 confidenceLevel = band(score)
                            │
                            ▼
              decision:
                 ignore   (totalTests = 0)
                 monitor  (cold-start OR low confidence OR warn verdict)
                 act_now  ((pass or block) + medium/high + trusted baseline)
                            │
                            ▼
              pushData  →  setStatusMessage  →
              KV SUMMARY / GITHUB_SUMMARY / HTML_REPORT
              →  AQP store (field-rule suggestions for Output Guard)
```
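
The last three stages of the flow (cold-start cap, confidence band, decision) can be written down in a few lines. This is a simplified sketch based only on the rules stated in this README: the band thresholds come from the `confidenceLevel` description, while the function names and signatures are assumptions.

```python
def capped_score(score: int, has_trusted_baseline: bool) -> int:
    """Apply the cold-start cap: at most 70 without a trusted baseline."""
    return score if has_trusted_baseline else min(score, 70)


def band(score: int) -> str:
    """Band a 0-100 score: high >= 75, medium >= 50, low < 50."""
    if score >= 75:
        return "high"
    if score >= 50:
        return "medium"
    return "low"


def decide(status: str, score: int, has_trusted_baseline: bool, total_tests: int) -> str:
    """Derive the routable decision from verdict, score, and baseline trust."""
    if total_tests == 0:
        return "ignore"
    level = band(capped_score(score, has_trusted_baseline))
    if not has_trusted_baseline or level == "low" or status == "warn":
        return "monitor"
    return "act_now"  # pass or block, medium/high confidence, trusted baseline


print(decide("pass", 92, False, 5))   # monitor (cold-start)
print(decide("block", 80, True, 5))   # act_now
```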

***

### When to trust the decision

| Scenario | `decision` | Confidence | Action |
|:---------|:-----------|:-----------|:-------|
| 5+ prior runs, pass, high confidence, no drift | `act_now` | high | Deploy |
| 5+ prior runs, block, critical failure, high confidence | `act_now` | high | Halt + investigate |
| First run ever | `monitor` | ≤70 (capped) | Review manually; run establishes baseline |
| Drift detected on a previously-stable field | `monitor` or `act_now` | varies | Inspect `drift.changeSummary` — may be intentional |
| 1 flaky test in a 5-test suite | `act_now` | medium | Acceptable if `expectedFlaky: true` |

### When NOT to trust the decision

| Scenario | Why | What to do instead |
|:---------|:----|:-------------------|
| `monitor` + `cold_start_cap` code | No baseline context yet | Run on a schedule for 5+ iterations before gating CI |
| `verdictReasonCodes` contains `BASELINE_DRIFT` | Prior schema has changed | Inspect `drift`; may be intentional or regression |
| Single test in the suite | `low_sample_size` code | Add at least 3 tests; cold-start math dominates with one |
| Flakiness in `stability` | One test's pass rate < 80% | Fix the flake or mark `expectedFlaky: true` |
| Fewer than 5 runs in `history` | `small_history` code | Trust trend is still warming up — wait for maturity |

***

### Failure interpretation cheat sheet

Every failure mode maps to a stable code → a meaning → an action. Use this to route alerts and automate fixes without an LLM in the loop.

| Code | Where it appears | Meaning | Action |
|:-----|:-----------------|:--------|:-------|
| `CRITICAL_TEST_FAILURE` | `verdictReasonCodes`, `decisionDrivers` | A test marked `severity: 'critical'` failed — the release gate considers this blocking | Fix the underlying extractor/output before deploy |
| `WARNING_TEST_FAILURE` | `verdictReasonCodes`, `decisionDrivers` | A `severity: 'warning'` test failed — advisory | Investigate; accept if intentional |
| `BASELINE_DRIFT` | `verdictReasonCodes` | Field schema differs from prior baseline | Read `driftSeverity.breaking[]` + `driftSeverity.nonBreaking[]` |
| `BASELINE_DRIFT_BREAKING` | `decisionDrivers`, `scoreBreakdown` | Required or critical-importance field changed type or disappeared | Restore field OR update `schemaContract.requiredFields` + notify consumers |
| `BASELINE_DRIFT_NONBREAKING` | `decisionDrivers`, `scoreBreakdown` | New or renamed non-required fields | Usually safe — confirm consumers tolerate extras |
| `COLD_START` | `verdictReasonCodes`, `decisionDrivers`, `confidenceFactorCodes` (`cold_start_cap`) | No trusted baseline yet — confidence capped at 70, `decision` cannot be `act_now` | Run on a schedule with `enableBaseline: true`; graduate to `act_now` from run 2 onward |
| `LOW_SAMPLE_SIZE` | `decisionDrivers`, `confidenceFactorCodes` (`low_sample_size`) | Fewer than 3 test cases | Add tests; cold-start math dominates with one |
| `LOW_SUITE_COVERAGE` | `decisionDrivers`, `confidenceFactorCodes` (`low_suite_coverage`), `fleetSignals` | Suite uses fewer than 60% of assertion types | Read `suiteCoverage.blindSpots[]` and fix the top 2–3 |
| `FLAKY_TEST` | `decisionDrivers`, `fleetSignals` (`TEST_FLAKY`) | A test's historical pass rate is below 80% | Mark `expectedToFail: true` OR fix the non-determinism |
| `CRITICAL_FIELD_FAILURE` | `fleetSignals` | Assertion failed on a field declared critical in `fieldImportanceProfile` | Read `criticalityImpact.affectedCriticalFields[]` |
| `CONFIDENCE_REGRESSION` | `fleetSignals` | Confidence score trending down over recent runs | Read `regressionSummary.direction` + `velocity`; investigate recent drift |
| `SUITE_LINT_FAILED` | `verdictReasonCodes` (paired with `decision: 'ignore'`) | Pre-execution lint blocked the run — suite design problem | Read `suiteLint.issues[].code` and fix the suite definition |
| `NO_TESTS` | `verdictReasonCodes` (paired with `decision: 'ignore'`) | Both `preset` and `testCases` were empty | Pick a preset OR provide at least one custom test case |
| `RELEASE_BLOCKED` | `fleetSignals` | Verdict is `block` (any reason) | Halt pipeline; do not promote |
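
Because the codes are stable, routing can be a plain lookup. The following is a minimal Python sketch, assuming a result record shaped like the output examples in this README. The `ROUTES` table, the `route` helper, and the channel names (`page`, `ticket`, `log`) are hypothetical and not part of Deploy Guard.

```python
# Hypothetical routing table: code -> (channel, action).
# Codes and actions are taken from the cheat sheet; channels are illustrative.
ROUTES = {
    "CRITICAL_TEST_FAILURE": ("page", "Fix the underlying extractor/output before deploy"),
    "BASELINE_DRIFT_BREAKING": ("page", "Restore field or update schemaContract.requiredFields"),
    "RELEASE_BLOCKED": ("page", "Halt pipeline; do not promote"),
    "WARNING_TEST_FAILURE": ("ticket", "Investigate; accept if intentional"),
    "FLAKY_TEST": ("ticket", "Fix the non-determinism"),
    "LOW_SUITE_COVERAGE": ("ticket", "Read suiteCoverage.blindSpots[]"),
    "COLD_START": ("log", "Schedule runs with enableBaseline: true"),
}


def route(record: dict) -> list[tuple[str, str, str]]:
    """Return (channel, code, action) for every known code in the record."""
    codes = []
    for key in ("verdictReasonCodes", "decisionDrivers", "fleetSignals"):
        for code in record.get(key, []):
            if code not in codes:  # keep the first occurrence, preserve order
                codes.append(code)
    # Codes without a route (for example VERDICT_PASS) are silently skipped.
    return [(ROUTES[c][0], c, ROUTES[c][1]) for c in codes if c in ROUTES]
```

Unknown codes are ignored rather than raised on, which fits the additive-only contract: a newer release may add codes that an older routing table has never seen.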

***

### First run / second run / nth run

The `context.progress` field tells you exactly where you are.

| Runs | `progress` | What's active | What's still warming up |
|:-----|:-----------|:--------------|:------------------------|
| 0 (first) | `cold-start` | Assertions, verdict, forensic details | No baseline, drift, flakiness, or trust trend. Confidence capped at 70. `decision` ∈ {`monitor`, `ignore`}. |
| 1–4 | `emerging` | Baseline comparison (from run 2), drift fields, stability, run history begin populating | Flakiness unreliable with <5 samples. Trust trend not yet meaningful. `decision` can become `act_now` from run 2 when baseline is trusted. |
| 5–14 | `developing` | Trust trend, flakiness, auto-tune hints all reliable | Early warnings sharpen with more history. Suite health fully active. |
| 15+ | `mature` | Full intelligence: trust trend, regression velocity, blind spots, calibrated suggestions | — |

**Note:** `enableBaseline: true` is required for baselines, drift, stability, history, and trust trend. Without it, Deploy Guard still runs all assertions and emits a verdict — but `context.hasTrustedBaseline` stays `false` and `decision` is capped at `monitor`.
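
The stage boundaries in the table can be written as a small lookup. This is a sketch of the thresholds as documented above, not Deploy Guard's actual implementation; `progress_for` is a hypothetical helper name.

```python
def progress_for(prior_runs: int) -> str:
    """Map the number of prior runs to the `context.progress` label."""
    if prior_runs < 0:
        raise ValueError("prior_runs must be non-negative")
    if prior_runs == 0:
        return "cold-start"   # first run, no baseline yet
    if prior_runs <= 4:
        return "emerging"     # baseline comparison begins
    if prior_runs <= 14:
        return "developing"   # trust trend and flakiness become reliable
    return "mature"           # 15 or more prior runs
```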

***

### Example — full input + output

**Input:**

```json
{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "preset": "contact-scraper",
  "testCases": [
    {
      "name": "Smoke — known-good site",
      "input": { "urls": ["https://example.com"] },
      "assertions": {
        "minResults": 1,
        "requiredFields": ["emails", "domain"],
        "maxDuration": 120
      }
    }
  ],
  "enableBaseline": true,
  "timeout": 180
}
```

**Output — `act_now` + pass (safe to deploy):**

```json
{
  "decision": "act_now",
  "decisionReason": "pass verdict + high confidence (82/100) + 12 prior runs — act_now",
  "decisionDrivers": [],
  "confidenceLevel": "high",
  "confidenceFactorCodes": ["healthy_history"],
  "verdictReasonCodes": ["VERDICT_PASS"],
  "statusHeadline": "SAFE TO DEPLOY — 2/2 passed (high confidence)",
  "oneLine": "ryanclinton/website-contact-scraper: SAFE to deploy — 2/2 passed, 82/100 confidence",
  "status": "pass",
  "score": 82,
  "totalTests": 2,
  "passed": 2,
  "failed": 0
}
```

**Output — `act_now` + block (halt release):**

```json
{
  "decision": "act_now",
  "decisionReason": "block verdict + medium confidence (58/100) + 9 prior runs — act_now",
  "decisionDrivers": ["CRITICAL_TEST_FAILURE", "BASELINE_DRIFT_BREAKING"],
  "confidenceLevel": "medium",
  "confidenceFactorCodes": ["drift_detected"],
  "verdictReasonCodes": ["VERDICT_BLOCK", "CRITICAL_TEST_FAILURE", "BASELINE_DRIFT"],
  "statusHeadline": "HALT RELEASE — 1/3 passed (medium confidence)",
  "oneLine": "ryanclinton/website-contact-scraper: HALT — 1/3 passed, 58/100 confidence",
  "status": "block",
  "score": 58,
  "totalTests": 3,
  "passed": 1,
  "failed": 2
}
```

**Output — `monitor` + cold-start (first run, directional only):**

```json
{
  "decision": "monitor",
  "decisionReason": "pass verdict + medium confidence (70/100, cold-start capped) — monitor only",
  "decisionDrivers": ["COLD_START"],
  "confidenceLevel": "medium",
  "confidenceFactorCodes": ["cold_start_cap"],
  "verdictReasonCodes": ["VERDICT_PASS", "COLD_START"],
  "statusHeadline": "PASS — 2/2 passed, low trust (monitor only)",
  "oneLine": "ryanclinton/website-contact-scraper: PASS — 2/2 passed, 70/100 confidence — monitor",
  "status": "pass",
  "score": 70,
  "totalTests": 2,
  "passed": 2,
  "failed": 0,
  "context": { "progress": "cold-start", "hasTrustedBaseline": false, "runCount": 0 }
}
```
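
A consumer can tell the three records above apart with two fields. The sketch below assumes a strict gate that promotes only on `act_now` plus `pass`; the `gate` function name is hypothetical.

```python
def gate(record: dict) -> tuple[bool, str]:
    """Return (deploy_ok, headline) for one Deploy Guard result record."""
    # Promote only when the decision is actionable AND the verdict is pass.
    ok = record.get("decision") == "act_now" and record.get("status") == "pass"
    return ok, record.get("statusHeadline", "")
```

Under this rule the `act_now` + `block` record and the cold-start `monitor` record both return `False`, so a first run never promotes a release on its own.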

***

### Using Deploy Guard in GitHub Actions

```yaml
- name: Deploy Guard — pre-release check
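  env:
    # Assumes the token is stored as a repository secret named APIFY_TOKEN.
    APIFY_TOKEN: ${{ secrets.APIFY_TOKEN }}
  run: |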
    RESULT=$(curl -s -X POST \
      "https://api.apify.com/v2/acts/ryanclinton~actor-test-runner/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "targetActorId": "ryanclinton/my-actor",
        "preset": "canary",
        "enableBaseline": true
      }')
    DECISION=$(echo "$RESULT" | jq -r '.[0].decision')
    HEADLINE=$(echo "$RESULT" | jq -r '.[0].statusHeadline')
    STATUS=$(echo "$RESULT" | jq -r '.[0].status')
    echo "Deploy Guard: $HEADLINE"
    if [ "$DECISION" != "act_now" ] || [ "$STATUS" != "pass" ]; then
      echo "::error::$HEADLINE"
      exit 1
    fi
```

The `GITHUB_SUMMARY` record in the default key-value store is served with `Content-Type: text/markdown` — ready to drop into `$GITHUB_STEP_SUMMARY`.

***

### Pricing

**$0.35 per test suite run** (Pay-Per-Event, single `test-suite` event charged once per run after the report is pushed).

Your target actor's compute + that actor's own PPE charges are **separate** — they run on your account and bill at the target's rates. Deploy Guard only charges for the validation layer, not the underlying compute.

Deploy Guard logs the price at start:

```
PPE mode active — $0.35 per test suite run
```

And again in the final status message:

```
ACT NOW (deploy) — 2/2 passed in 8.4s — $0.35 charged
```

Cost guardrail: after 5 consecutive test failures, Deploy Guard breaks the loop to stop runaway sub-actor credit spend on a clearly broken target. Remaining tests are skipped; the verdict stands on what ran.

***

### FAQ

#### How is Deploy Guard different from Apify's default-input test?

Apify's built-in default-input test runs your actor with `{}` once a day and flags it `UNDER_MAINTENANCE` after 3 consecutive failures. That's a single-test binary signal with no assertion detail, no drift, no confidence scoring, no per-field forensics. Deploy Guard runs a full assertion suite against arbitrary inputs, compares against a stored baseline, emits a routable decision tag, and produces GitHub/HTML/JSON reports. Default-input test is the floor; Deploy Guard is the gate.

#### Why is the first run always `monitor`?

Cold-start safety. Without a trusted baseline, Deploy Guard has no field schema history, no drift reference, no flakiness signal, and no run history to calibrate confidence. The score is capped at 70 and `decision` is forced to `monitor`. After the first run completes with `enableBaseline: true`, run number 2 has something to compare against and can graduate to `act_now`.

#### Can I use this in GitHub Actions?

Yes. Call the `run-sync-get-dataset-items` endpoint, parse the `decision` field, exit non-zero on anything other than `act_now` + `status: pass`. The key-value store also contains a `GITHUB_SUMMARY` record (Markdown) ready for `$GITHUB_STEP_SUMMARY`. See the example above.

#### Does it keep running the remaining tests if one fails?

By default, yes — tests run sequentially in the order you provide. If 5 consecutive tests fail, a circuit breaker halts remaining tests to cap cost and the run exits cleanly with the verdict derived from what ran. Mark known-broken tests with `expectedToFail: true` and they won't trip the breaker.
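
The breaker logic can be sketched as a loop with a failure streak counter. This is an illustration under stated assumptions, not Deploy Guard's source: `run_suite` and `run_one` are hypothetical names, and the choice to leave the streak unchanged (rather than reset it) when an `expectedToFail` test fails is a guess.

```python
from typing import Callable

MAX_CONSECUTIVE_FAILURES = 5  # threshold stated in the text


def run_suite(test_cases: list[dict], run_one: Callable[[dict], bool]) -> dict:
    """Run cases in order; stop once 5 unexpected failures occur in a row."""
    results, streak = [], 0
    for index, case in enumerate(test_cases):
        passed = run_one(case)
        results.append({"name": case["name"], "passed": passed})
        if passed:
            streak = 0
        elif not case.get("expectedToFail", False):
            streak += 1  # known-broken tests do not count toward the breaker
        if streak >= MAX_CONSECUTIVE_FAILURES:
            skipped = [c["name"] for c in test_cases[index + 1:]]
            return {"results": results, "skipped": skipped, "tripped": True}
    return {"results": results, "skipped": [], "tripped": False}
```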

#### What's the difference between `verdictReasonCodes` and `confidenceFactorCodes`?

`verdictReasonCodes` explain **what the verdict is** — pass/warn/block and the specific failures that drove it (e.g. `CRITICAL_TEST_FAILURE`, `BASELINE_DRIFT`). `confidenceFactorCodes` explain **how much to trust the verdict** — whether enough data has accumulated, whether a baseline exists, whether drift signals are active. Both are stable enums; both are additive-only within a major version.

#### Does it cost credits?

Yes — $0.35 per suite for the Deploy Guard layer itself, plus whatever your target actor costs per run × N test cases. A suite with 5 test cases against a $0.10-per-result scraper that returns 20 results each costs: $0.35 (Deploy Guard) + 5 × 20 × $0.10 = $10.35 total. Deploy Guard only bills the $0.35; the rest bills on the target's pricing to your account.
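
The arithmetic above can be reproduced with a short helper. It assumes the target is billed per result, as in the example; a compute-billed target would need a different formula. `suite_cost` is a hypothetical name.

```python
from decimal import Decimal

SUITE_FEE = Decimal("0.35")  # Deploy Guard's per-suite charge


def suite_cost(test_cases: int, results_per_case: int, price_per_result: str) -> Decimal:
    """Total = Deploy Guard fee + the target actor's pay-per-result charges."""
    target = test_cases * results_per_case * Decimal(price_per_result)
    return SUITE_FEE + target


print(suite_cost(5, 20, "0.10"))  # prints 10.35
```

`Decimal` is used instead of `float` so that currency amounts add up exactly.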

#### Can I compare two actor versions side-by-side?

No — Deploy Guard tests one actor at a time. For side-by-side A/B comparison use [A/B Tester](https://apify.com/ryanclinton/actor-ab-tester), which runs the same input against two actors in parallel and returns a pairwise decision (`switch_now` / `canary_recommended` / `monitor_only` / `no_call`).

#### How do I detect flaky tests?

Set `enableBaseline: true` and run on a schedule. Flakiness detection activates after 5 prior runs: Deploy Guard computes a per-test pass rate across run history and flags any test with a pass rate below 80% as flaky. The `stability[]` array shows `{ name, passRate, runs, flaky }` per test case. Treat `flaky: true` tests as non-blocking, and don't gate CI on them until you've fixed the underlying non-determinism.
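
The rule described here (pass rate below 80%, at least 5 runs) can be sketched as follows. The input shape, a mapping from test name to a list of pass/fail booleans, is an assumption for illustration, as is expressing `passRate` as a fraction rather than a percentage.

```python
FLAKY_THRESHOLD = 0.80  # a pass rate below this is flagged
MIN_RUNS = 5            # detection activates after 5 prior runs


def stability(history: dict[str, list[bool]]) -> list[dict]:
    """Build entries shaped like `stability[]` from per-test history."""
    entries = []
    for name, outcomes in history.items():
        runs = len(outcomes)
        pass_rate = sum(outcomes) / runs if runs else None
        # Too little history never flags a test, whatever its pass rate.
        flaky = runs >= MIN_RUNS and pass_rate < FLAKY_THRESHOLD
        entries.append({"name": name, "passRate": pass_rate, "runs": runs, "flaky": flaky})
    return entries
```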

#### Can I supply different inputs per test case?

Yes — every `testCase.input` is independent. Use `parameterizedTestCases` to run the same template against many parameter sets (e.g. test the same URL shape with 20 different URLs). `nameTemplate` and `inputTemplate` support `{{placeholder}}` substitution.
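
To illustrate how one template might expand into concrete cases, here is a minimal sketch of `{{placeholder}}` substitution. `nameTemplate` and `inputTemplate` come from the text; the `parameters` key, the `assertions` passthrough, and the `fill`/`expand` helpers are hypothetical, so check the input schema for the real field names.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill(template, params: dict):
    """Recursively substitute {{key}} placeholders in strings, lists, dicts."""
    if isinstance(template, str):
        return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)
    if isinstance(template, list):
        return [fill(item, params) for item in template]
    if isinstance(template, dict):
        return {key: fill(value, params) for key, value in template.items()}
    return template  # numbers, booleans, and None pass through unchanged


def expand(spec: dict) -> list[dict]:
    """Turn one parameterized spec into one concrete test case per set."""
    return [
        {
            "name": fill(spec["nameTemplate"], params),
            "input": fill(spec["inputTemplate"], params),
            "assertions": spec.get("assertions", {}),
        }
        for params in spec["parameters"]  # hypothetical key name
    ]
```

A placeholder with no matching parameter raises `KeyError` in this sketch, which surfaces template typos early instead of sending a literal `{{site}}` to the target actor.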

#### What happens if the sub-actor times out?

Each `Actor.call()` is wrapped in a wall-clock race (`timeout + 60s` or 5 minutes minimum). On timeout, the test case is marked failed with `failureType: 'timeout'`, and the suite continues. Two timeouts in a row don't break the suite — but 5 consecutive failures of any type trip the circuit breaker.
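
One way to picture the wall-clock race is with `asyncio.wait_for`. The sketch reads "`timeout + 60s` or 5 minutes minimum" as `max(timeout + 60, 300)`, which is an interpretation rather than a confirmed formula; `wall_clock_deadline` and `run_with_deadline` are hypothetical names.

```python
import asyncio

GRACE_SECONDS = 60
MINIMUM_SECONDS = 300  # "5 minutes minimum"


def wall_clock_deadline(timeout: int) -> int:
    """Assumed reading: timeout + 60s, but never less than 5 minutes."""
    return max(timeout + GRACE_SECONDS, MINIMUM_SECONDS)


async def run_with_deadline(call, deadline: float) -> dict:
    """Await `call()`; on expiry, report a timeout instead of raising."""
    try:
        output = await asyncio.wait_for(call(), deadline)
        return {"timedOut": False, "output": output}
    except asyncio.TimeoutError:
        # The suite continues: the caller records a failed case and moves on.
        return {"timedOut": True, "failureType": "timeout"}
```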

#### Why did my first run get a `monitor` decision even though every test passed?

Cold-start cap. The run succeeded, every assertion passed, and the verdict is `pass` — but without a stored baseline there's no history to calibrate confidence, so the score is capped at 70 and `decision` can't promote to `act_now`. Run it again (scheduled or manual) with `enableBaseline: true` and run number 2 onward will promote to `act_now` when the verdict stays healthy.

***

### What Deploy Guard does NOT do

Deploy Guard is the **pre-release test gate** in a fleet of specialist actors. Use siblings for these adjacent jobs:

| Need | Use instead |
|:-----|:------------|
| Validate schema/quality of a PRODUCTION dataset after it runs (silent data failures, coverage drops, null spikes) | [Output Guard](https://apify.com/ryanclinton/actor-schema-validator) — post-run data-quality monitor with incident lifecycle and channel-aware alerts |
| Compare two actor versions side-by-side on the same input | [A/B Tester](https://apify.com/ryanclinton/actor-ab-tester) — pairwise decision engine with fairness checks and decision stability |
| Score a whole fleet's quality | [Quality Monitor](https://apify.com/ryanclinton/actor-quality-monitor) — fleet-wide quality scorer |
| Detect PII / GDPR / TOS risks in an actor's output | [Compliance Scanner](https://apify.com/ryanclinton/actor-compliance-scanner) |
| Consolidated dashboard across the whole fleet | [Fleet Health Report](https://apify.com/ryanclinton/actor-fleet-analytics) |

Deploy Guard's output is designed to **feed these siblings** — every run appends field-rule suggestions to a shared key-value store (the AQP store) that Output Guard picks up automatically. Pre-deploy assertions that fail here become production monitoring rules there without manual sync.

***

### License

Proprietary. Runs on Apify. Source is available inside the platform for audit but not redistributable.

# Actor input Schema

## `targetActorId` (type: `string`):

The actor ID or username/actor-name to test

## `preset` (type: `string`):

Pre-built test suite for common actor types. Runs preset checks automatically — add custom test cases to extend. Leave empty to define all test cases manually.

## `testCases` (type: `array`):

Custom test cases. Each has a name, input, and quality checks. Supports: minResults, maxResults, maxDuration, requiredFields, fieldTypes, noEmptyFields, fieldPatterns (regex), fieldRanges (min/max), uniqueFields. Set expectedToFail: true for known issues.
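
To make a few of these checks concrete, the sketch below evaluates `minResults`, `maxResults`, `requiredFields`, and `fieldPatterns` against a list of dataset items. The exact semantics (for example, whether a pattern is applied to non-string values) are assumptions, and `check` is a hypothetical helper, not the Actor's code.

```python
import re


def check(items: list[dict], assertions: dict) -> list[str]:
    """Return human-readable failures; an empty list means the case passed."""
    failures = []
    if len(items) < assertions.get("minResults", 0):
        failures.append(f"minResults: got {len(items)}")
    if "maxResults" in assertions and len(items) > assertions["maxResults"]:
        failures.append(f"maxResults: got {len(items)}")
    for index, item in enumerate(items):
        for field in assertions.get("requiredFields", []):
            if field not in item:
                failures.append(f"requiredFields: item {index} lacks {field}")
        for field, pattern in assertions.get("fieldPatterns", {}).items():
            value = item.get(field)
            # Assumption: patterns are only applied to string values.
            if isinstance(value, str) and not re.search(pattern, value):
                failures.append(f"fieldPatterns: item {index} {field}={value!r}")
    return failures
```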

## `parameterizedTestCases` (type: `array`):

Define one test template and an array of parameter sets. Use {{key}} placeholders in nameTemplate and inputTemplate. Each parameter set generates a concrete test case.

## `enableBaseline` (type: `boolean`):

Track field schema across runs. Detects new fields, missing fields, type changes, and null-rate shifts. Baselines are saved to a named KV store and compared on each run. Only passing suites update the baseline.
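
A rough sketch of the comparison, assuming each baseline is stored as a `{field: type}` map. Null-rate shifts are left out, and the rule that only required fields produce breaking changes is a simplification of the cheat sheet's "required or critical-importance" wording. `diff_schema` and its label format are hypothetical.

```python
def diff_schema(baseline: dict[str, str], current: dict[str, str],
                required: set[str]) -> dict[str, list[str]]:
    """Classify field changes between two {field: type} maps."""
    breaking, non_breaking = [], []
    for field in sorted(baseline.keys() | current.keys()):
        old, new = baseline.get(field), current.get(field)
        if old == new:
            continue  # unchanged field
        if old is None:
            non_breaking.append(f"added:{field}")  # new fields are extras
            continue
        kind = "missing" if new is None else "type_changed"
        bucket = breaking if field in required else non_breaking
        bucket.append(f"{kind}:{field}")
    return {"breaking": breaking, "nonBreaking": non_breaking}
```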

## `timeout` (type: `integer`):

Max time to wait for each test case run. Increase for slow actors (deep research, multi-page scraping).

## `memory` (type: `integer`):

Memory allocation for each test run. Match the target actor's typical memory usage.

## `maxSampleItems` (type: `integer`):

Number of dataset items to fetch and validate per test case. Default 1,000 covers most actors. Increase for full-scan mode on high-volume actors.

## Actor input object example

```json
{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "preset": "canary",
  "testCases": [
    {
      "name": "Contact scraper smoke test",
      "input": {
        "urls": [
          "https://example.com"
        ]
      },
      "assertions": {
        "minResults": 1,
        "requiredFields": [
          "email",
          "domain"
        ],
        "fieldPatterns": {
          "email": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
        },
        "maxDuration": 120
      }
    }
  ],
  "enableBaseline": false,
  "timeout": 300,
  "memory": 512,
  "maxSampleItems": 1000
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "targetActorId": "ryanclinton/website-contact-scraper",
    "testCases": [
        {
            "name": "Contact scraper smoke test",
            "input": {
                "urls": [
                    "https://example.com"
                ]
            },
            "assertions": {
                "minResults": 1,
                "requiredFields": [
                    "email",
                    "domain"
                ],
                "fieldPatterns": {
                    "email": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
                },
                "maxDuration": 120
            }
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("ryanclinton/actor-test-runner").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "targetActorId": "ryanclinton/website-contact-scraper",
    "testCases": [{
            "name": "Contact scraper smoke test",
            "input": { "urls": ["https://example.com"] },
            "assertions": {
                "minResults": 1,
                "requiredFields": [
                    "email",
                    "domain",
                ],
                "fieldPatterns": { "email": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$" },
                "maxDuration": 120,
            },
        }],
}

# Run the Actor and wait for it to finish
run = client.actor("ryanclinton/actor-test-runner").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
# A quoted heredoc passes the JSON through byte-for-byte, so the regex needs
# only JSON-level escaping no matter how the shell's `echo` treats backslashes.
cat <<'EOF' |
{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "testCases": [
    {
      "name": "Contact scraper smoke test",
      "input": {
        "urls": [
          "https://example.com"
        ]
      },
      "assertions": {
        "minResults": 1,
        "requiredFields": [
          "email",
          "domain"
        ],
        "fieldPatterns": {
          "email": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
        },
        "maxDuration": 120
      }
    }
  ]
}
EOF
apify call ryanclinton/actor-test-runner --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ryanclinton/actor-test-runner",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Actor Test Runner — Validate Inputs, Outputs & Error Handling",
        "description": "Actor Test Runner. Available on the Apify Store with pay-per-event pricing.",
        "version": "1.0",
        "x-build-id": "hjslZRousV4L6cm2G"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ryanclinton~actor-test-runner/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ryanclinton-actor-test-runner",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ryanclinton~actor-test-runner/runs": {
            "post": {
                "operationId": "runs-sync-ryanclinton-actor-test-runner",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ryanclinton~actor-test-runner/run-sync": {
            "post": {
                "operationId": "run-sync-ryanclinton-actor-test-runner",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "targetActorId"
                ],
                "properties": {
                    "targetActorId": {
                        "title": "Target Actor ID",
                        "type": "string",
                        "description": "The actor ID or username/actor-name to test",
                        "default": "ryanclinton/website-contact-scraper"
                    },
                    "preset": {
                        "title": "Test Suite Preset",
                        "enum": [
                            "canary",
                            "scraper-smoke",
                            "api-actor",
                            "contact-scraper",
                            "ecommerce-quality",
                            "store-readiness"
                        ],
                        "type": "string",
                        "description": "Pre-built test suite for common actor types. Runs preset checks automatically — add custom test cases to extend. Leave empty to define all test cases manually.",
                        "default": "canary"
                    },
                    "testCases": {
                        "title": "Test Cases",
                        "maxItems": 50,
                        "type": "array",
                        "description": "Custom test cases. Each has a name, input, and quality checks. Supports: minResults, maxResults, maxDuration, requiredFields, fieldTypes, noEmptyFields, fieldPatterns (regex), fieldRanges (min/max), uniqueFields. Set expectedToFail: true for known issues."
                    },
                    "parameterizedTestCases": {
                        "title": "Parameterized Test Cases",
                        "maxItems": 10,
                        "type": "array",
                        "description": "Define one test template and an array of parameter sets. Use {{key}} placeholders in nameTemplate and inputTemplate. Each parameter set generates a concrete test case."
                    },
                    "enableBaseline": {
                        "title": "Enable baseline + drift detection",
                        "type": "boolean",
                        "description": "Track field schema across runs. Detects new fields, missing fields, type changes, and null-rate shifts. Baselines are saved to a named KV store and compared on each run. Only passing suites update the baseline.",
                        "default": false
                    },
                    "timeout": {
                        "title": "Timeout per test (seconds)",
                        "minimum": 30,
                        "maximum": 3600,
                        "type": "integer",
                        "description": "Max time to wait for each test case run. Increase for slow actors (deep research, multi-page scraping).",
                        "default": 300
                    },
                    "memory": {
                        "title": "Memory (MB)",
                        "minimum": 128,
                        "maximum": 32768,
                        "type": "integer",
                        "description": "Memory allocation for each test run. Match the target actor's typical memory usage.",
                        "default": 512
                    },
                    "maxSampleItems": {
                        "title": "Max items to analyse",
                        "minimum": 10,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Number of dataset items to fetch and validate per test case. Default 1,000 covers most actors. Increase for full-scan mode on high-volume actors.",
                        "default": 1000
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
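
The cost-related fields in the schema above (`usage`, `usageUsd`, `usageTotalUsd`) can be read from a run object once you have it as a plain dictionary. The following is a minimal sketch, not part of the Actor or the Apify client: `summarize_run_usage` is a hypothetical helper, the sample values mirror the `example` entries in the schema, and the dataset ID is a made-up placeholder. It also assumes that `usageTotalUsd` is the sum of the per-item `usageUsd` amounts, which the schema does not state explicitly.

```python
from typing import Any


def summarize_run_usage(run: dict[str, Any]) -> dict[str, Any]:
    """Summarize the cost fields of a run object shaped like the schema above.

    Hypothetical helper for illustration; field names come from the schema,
    everything else is an assumption.
    """
    usage_usd = run.get("usageUsd") or {}
    # Keep only the billable items that actually cost something.
    nonzero_costs = {name: cost for name, cost in usage_usd.items() if cost}
    itemized_total = sum(nonzero_costs.values())
    reported_total = run.get("usageTotalUsd", 0.0)
    return {
        "dataset_id": run.get("defaultDatasetId"),
        "nonzero_costs": nonzero_costs,
        "itemized_total_usd": itemized_total,
        "reported_total_usd": reported_total,
        # Assumption: the reported total equals the sum of itemized costs.
        "totals_match": abs(itemized_total - reported_total) < 1e-9,
    }


# Sample run built from the schema's example values (IDs are placeholders).
sample_run = {
    "defaultDatasetId": "EXAMPLE_DATASET_ID",
    "usage": {"KEY_VALUE_STORE_WRITES": 1, "DATASET_WRITES": 0},
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "DATASET_WRITES": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
    },
}

summary = summarize_run_usage(sample_run)
print(summary)
```

Because the USD amounts are fractional, treat them as floats rather than integers when parsing the response, and compare totals with a tolerance instead of strict equality.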
