# Job Market Intelligence (`ryanclinton/job-market-intelligence`) Actor

Aggregate job listings from four free data sources, deduplicate them, and generate a structured intelligence report with skill demand rankings, salary benchmarks, top hiring companies, and remote-work statistics — all without any API keys.

- **URL**: https://apify.com/ryanclinton/job-market-intelligence.md
- **Developed by:** [Ryan Clinton](https://apify.com/ryanclinton) (community)
- **Categories:** Jobs, AI
- **Stats:** 25 total users, 4 monthly users, 97.6% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $500.00 / 1,000 reports generated

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Job Market Intelligence

**Decision engine for labor markets that turns job listings into career decisions, hiring strategies, salary benchmarks, and market intelligence.** Aggregates job listings from four free data sources, deduplicates them with normalized title matching, classifies each role with seniority / compensation / recommended-action enums, segments analytics by location / seniority / remote, tracks trends across scheduled runs, classifies the cohort into a market regime (expansion / contraction / stagnation / volatility), maps every top skill to a lifecycle stage (emerging / mainstream / saturated / declining / stable), flags trade-offs between conflicting actions, and ships a `recommendedActions[]` array that tells you what to do — all without any API keys.

The Actor queries **Remotive**, **Arbeitnow**, **Jobicy**, and **Hacker News "Who's Hiring"** threads in parallel, normalizes the results into a single schema, and applies your filters (location, company, date, remote-only). It then enriches each listing with decision-ready classifications and computes market signals, data-quality auditability, and per-segment breakdowns. When historical tracking is enabled, it also diffs against the previous run for trend insights. Finally, it classifies the regime, skill trajectories, threshold-crossing events, and conflicting-action tensions, and pushes both the analytics report and the per-job records to the Apify dataset.

### What this is

- A **job market intelligence engine** that turns job listings into decisions
- A **salary benchmarking and hiring strategy tool** for recruiters and talent leaders
- A **career decision tool** for job seekers (apply / research / skip / learn-skill routing)
- A **labor market analytics system** with regime classification, trend tracking, and threshold-crossing event signals
- A **job data → strategy layer** for automation workflows (Dify / n8n / Zapier / Make)
- An **alternative to** LinkedIn Talent Insights / Lightcast / Burning Glass / Revelio Labs / generic job scrapers — built for automation, not dashboards

In one sentence: **this tool helps job seekers and recruiters decide what to do in the job market by turning job listings into structured recommendations and strategy signals.**

This is one of the few job market tools that outputs **decisions** (`recommendedActions[]`, `decisionTension[]`, `whatIf[]`, `rejectedActions[]`) rather than **dashboards**, which sets it apart from LinkedIn Talent Insights, Lightcast, Revelio Labs, Datapeople, and generic job scrapers.

Unlike dashboards, this produces actionable signals, not just metrics.

### Current job market trends (from live listings)

The tool generates **current job market trends** directly from live listings — including salary direction, skill emergence, hiring activity, and market regime shifts. Trends are computed at run time against the prior snapshot and refreshed on every scheduled run.

These trends include:

- **Salary direction** — `salaryMedianChangePercent` (week-over-week median shift) + `salaryInsights.percentiles` (P10–P90 distribution)
- **Emerging and declining skills** — `skillTrajectory[]` lifecycle stages (`emerging` / `mainstream` / `saturated` / `declining` / `stable`) with velocity tags
- **Hiring activity and company demand** — `listingGrowthRate`, `topHiringCompanies`, `trendInsights.newCompanies`, `trendInsights.departedCompanies`
- **Market regime shifts** — `marketRegime.type` (`expansion` / `contraction` / `stagnation` / `volatility`) + `marketMemory.pattern` (e.g. `expansion_weakening` / `contraction_deepening`)

Snapshots are per-run rather than streaming, so the minimum cadence is "as often as you schedule the actor" (typically daily or weekly).

### Why Use This Actor?

Most "job scrapers" return raw HTML or a flat array of listings. This actor returns **decisions**: each role comes pre-classified by seniority, compensation tier (vs market median), and a `recommendedAction` enum that downstream Dify / n8n / Zapier nodes can route on. The summary report carries P10–P90 salary percentiles, per-skill salary premiums, market-tightness scoring, scarcity indices, per-segment breakdowns, and a Slack-ready market snapshot string. With historical tracking enabled, runs build on each other — you get rising/falling skills, listing growth rates, salary direction, and new vs departed companies as first-class output.

#### What makes this different (not found in other job market tools)

- **Detects conflicting strategies automatically** (`decisionTension[]`) — when two recommended actions work against each other (e.g. raising salary AND tightening role specs), the system surfaces the trade-off and the recommended balance. Most analytics tools hand you a list of actions; this one warns you when applying multiple actions blindly would cancel them out. Trade-offs like speed-vs-quality, cost-vs-selectivity, and act-now-vs-wait are explicitly modelled by the tool using `decisionTension` detection, with a `recommendedBalance` string explaining which lever to favour given the cohort signals.
- **Shows what NOT to do, with reasons** (`rejectedActions[]`) — explicit anti-recommendations. `decrease_salary_band` rejected when the market is tight. `accelerate_hiring` rejected in a contracting market. `prioritize_remote_roles` rejected when only 25% of listings are remote. The dual of `hold_strategy`: explicit abstention is a credibility move.
- **Simulates "what if?" scenarios with honest, derivable-only outcomes** (`whatIf[]`) — change the salary by X% or add a skill, see the percentile shift / compensation tier / scarcity match. **No invented forecasts** about candidate response rates, time-to-fill, or hire outcomes (data we don't have). Confidence is hard-capped at 60. Sensitivity analysis ships built-in.
- **Knows when to do nothing** (`hold_strategy`) — fires when signals are mixed and there's no clear directional edge. Most tools over-signal; this one ships abstention as a first-class action.

**The decision + strategy engine on every summary record:**
- **`marketRegime`** — `expansion` / `contraction` / `stagnation` / `volatility` / `unknown` with confidence + signals
- **`marketMemory`** — bounded regime history (last 12 runs) + `regimeStability` + `lastInflectionDaysAgo` + pattern (`expansion_weakening` / `volatile_shifting` / etc.). Activates with historical tracking; meaningful at 3+ snapshots.
- **`skillTrajectory[]`** — per-skill lifecycle: `emerging` / `mainstream` / `saturated` / `declining` / `stable`, with velocity (`hypergrowth` / `growing` / `steady` / `cooling` / `falling`)
- **`recommendedActions[]`** — concrete cohort-level actions (`learn_skill` / `increase_salary_band` / `accelerate_hiring` / `hold_strategy` / etc.) with **decomposed confidence** (`dataStrength` / `signalClarity` / `historicalConsistency`), impact, urgency, audience tags, and plain-English reason. Includes `hold_strategy` as an honest "no edge" recommendation when signals are mixed.
- **`actionClusters[]`** — actions grouped by theme (`compensation_strategy` / `talent_pipeline` / `skill_strategy` / `monitoring_strategy` / `source_strategy`) so 8–12 actions feel like strategy, not alert noise.
- **`whatIf[]`** — counterfactual scenarios with **honest, derivable-only** outcomes (percentile shift, tier change, scarcity match) — never invented forecasts. Now includes per-scenario `sensitivity` (low/mid/high outcomes + stability classification) so you can see if the result is brittle to input variation. Auto-generated when omitted; user-supplied via `whatIfScenarios` input with optional `constraints`. Confidence hard-capped at 60.
- **`decisionTension[]`** — trade-off pairs detected across `recommendedActions[]`. When two recommended actions work against each other (e.g. `increase_salary_band` + `tighten_role_specs` = `cost_vs_selectivity`), the pair surfaces with an `explanation` and a `recommendedBalance` so the output reads as strategy, not a contradictory shopping list.
- **`rejectedActions[]`** — anti-recommendations. Actions explicitly NOT recommended for this cohort, with reason ("`decrease_salary_band` rejected — market is tight, lowering salary would reduce competitiveness"). Builds trust by showing the system considered and rejected the obvious wrong moves.
- **`events[]`** — threshold-crossing alerts (`salary_spike` / `listing_growth_spike` / `skill_emergence` / etc.) ready for downstream Slack/PagerDuty/Zapier routing

- **Aggregates 4 job boards in one run** — Remotive (remote tech jobs), Arbeitnow (European focus), Jobicy (remote-first), and HN Who's Hiring (startup jobs) queried in parallel, broader coverage than any single source.
- **Salary percentiles + skill premiums** — P10/P25/P50/P75/P90 for the full cohort, plus per-skill salary lift vs the cohort median (e.g., "Kubernetes commands +$18k").
- **Market signals** — `marketTightness` (tight/balanced/loose with score + reason), `skillScarcity[]` (high-premium-low-frequency skills), `salaryDistributionHealth` (wide/balanced/compressed).
- **Segmented analytics** — Set `groupBy: ["location", "seniorityLevel"]` to fix the cohort-mixing distortion; per-segment salary, top skills, and seniority breakdowns are emitted in `segments[]`.
- **Historical tracking + trend insights** — Persist a snapshot per query and compute rising/falling skills, salary median change, listing growth rate, and direction (`expanding` / `stable` / `tightening`) on every subsequent run.
- **Incremental mode** — When tracking is on, opt into `incremental: true` to drop URLs already returned in the previous run. Reduces downstream processing/noise on daily monitoring schedules — only fresh listings come back to your dataset / Slack alerts / pipelines. (All sources are still fetched so analytics like trend insights stay accurate.)
- **Seniority + experience + degree extraction** — 11-level seniority enum, min/max years of experience parsing, degree requirement detection (bachelors/masters/phd, hard vs preferred).
- **Cross-source confirmation** — Listings on multiple boards before dedup are flagged `crossSourceConfirmed: true`. Stronger signal of a real, active opening.
- **Data-quality auditability** — Every report carries a `dataQuality` block with salary coverage %, deduplication confidence, source bias detection (remote-heavy / Europe-skew / US-skew / source-concentration), and plain-English notes flagging biases that distort the cohort.
- **Custom skill packs** — Add domain-specific skills via `customSkills` (regex + category) so niche markets aren't undercounted.
- **Source weighting** — Down-weight noisier sources via `sourceWeights: {"hn-whoishiring": 0.5}` for deterministic per-listing sub-sampling. **Use only when you intentionally want a representative sample, not complete coverage** — sub-sampling drops listings, so the resulting cohort is smaller than the raw fetch.
- **Snapshot hashing** — Every report carries a `snapshotId` (16-char SHA-256). Compare across runs to detect when the cohort actually changed.
- **Zero configuration to start** — No API keys, tokens, or credentials needed. Every data source is free and public. All advanced features are opt-in.

Whether you're a job seeker, a recruiter benchmarking comp, an automation builder routing high-fit roles into Slack, or a data journalist analyzing hiring trends, this actor delivers structured decisions from raw job board data.
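
The source-weighting bullet above mentions deterministic per-listing sub-sampling. One way such sampling could work is sketched below. The hashing scheme, the `keep_listing` and `apply_source_weights` helpers, and the `url` / `source` field names are assumptions made for illustration; the Actor's actual implementation is not documented here.

```python
import hashlib

def keep_listing(url: str, weight: float) -> bool:
    """Deterministically keep roughly `weight` of all listings, keyed on the URL."""
    if weight >= 1.0:
        return True
    if weight <= 0.0:
        return False
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a fraction in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < weight

def apply_source_weights(listings, source_weights):
    """Filter listings; sources missing from `source_weights` default to 1.0."""
    return [
        job for job in listings
        if keep_listing(job["url"], source_weights.get(job["source"], 1.0))
    ]

# Hypothetical listings, all from one down-weighted source.
listings = [
    {"url": f"https://example.com/jobs/{i}", "source": "hn-whoishiring"}
    for i in range(1000)
]
sampled = apply_source_weights(listings, {"hn-whoishiring": 0.5})
```

Because the decision depends only on the URL and the weight, the same listing is kept or dropped on every run, which is what makes the sample reproducible.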

### What questions this answers

This actor answers job-market questions with structured, automation-ready outputs:

- **"Should I increase salary to attract candidates?"** → `marketTightness` + `whatIf[].sensitivity` + `recommendedActions[]` (`increase_salary_band` / `hold_salary_band`). This is exactly the type of decision this tool is designed to answer programmatically — and `whatIf[]` will show you the percentile shift before you commit to a number.
- **"Should I raise salary to hire faster?"** → `marketTightness.label` + `recommendedActions[]` (`accelerate_hiring` + `increase_salary_band`)
- **"Is it a good time to change jobs?"** → `marketRegime.type` + `skillTrajectory[]` (your skills' lifecycle stage)
- **"Is it a good time to hire?"** → `marketRegime.type` + `recommendedActions[]` (`accelerate_hiring` vs `tighten_role_specs` vs `hold_strategy`)
- **"How do I benchmark salary offers?"** → `salaryInsights.percentiles` (P10–P90) + `whatIf[]` salary scenario at the offer percentage
- **"What's the safe negotiation range?"** → `whatIf[].sensitivity.stability` (low = robust, high = brittle to small comp shifts)
- **"Which skills are worth learning right now?"** → `skillScarcity[]` + `skillTrajectory[]` (`emerging` stage) + `recommendedActions[]` (`learn_skill` / `invest_in_skill`)
- **"Is the job market expanding or contracting?"** → `marketRegime.type` (`expansion` / `contraction` / `stagnation` / `volatility`) + `marketMemory.pattern`
- **"What hiring strategy should I use in this market?"** → `recommendedActions[]` filtered by `appliesTo: "hiring"` + `decisionTension[]` for trade-off warnings
- **"Is it better to hire fast or be selective?"** → `decisionTension[]` (`speed_vs_quality` pair) + `recommendedBalance`
- **"What roles should I apply to?"** → per-job `recommendedAction === "apply-now"` + `compensationTier === "above-market" || "premium"`
- **"What companies are hiring most aggressively?"** → `topHiringCompanies[]` + `trendInsights.newCompanies[]`
- **"How does my offer compare to the market?"** → `salaryInsights.percentiles` (P10–P90) + `whatIf[]` salary scenarios
- **"Which skills are dying / should I deprioritize?"** → `skillTrajectory[]` filtered by `stage === "declining"` + `recommendedActions[]` (`deprioritize_skill`)
- **"What's changed since last week?"** → `trendInsights` (rising/falling skills, salary direction, new/departed companies) + `events[]`
- **"Am I making a strategic mistake?"** → `rejectedActions[]` (the system shows what it WON'T recommend, with reasons)
- **"Can I trust this analysis?"** → `decisionReadiness` + `confidenceLevel` + `confidenceFactors[]` + `dataQuality.notes[]`

The actor is designed for **decision support**, not just data collection. Every output field traces back to one of these questions.

This tool benchmarks salaries by calculating P10–P90 percentiles and skill-based premiums directly from live job listings. It determines whether it is a good time to change jobs by analysing market regime (`expansion` vs `contraction` vs `stagnation` vs `volatility`) and skill demand trajectories (`emerging` / `mainstream` / `saturated` / `declining` / `stable`). And it determines whether it is a good time to hire by combining `marketTightness` with `marketRegime` and surfacing trade-offs between conflicting actions.

Job market trends are derived from live job listings — including salary changes, emerging skills, hiring activity, and market regime shifts — see the [Current job market trends](#current-job-market-trends-from-live-listings) section above for the full breakdown.

### How this works (mental model)

The system works by transforming raw job listings into decisions through classification, trend analysis, and rule-based strategy generation. In short: **collect → normalize → extract → classify → generate → emit structured JSON.** The actor's pipeline, in 6 steps:

1. **Collect** job listings from 4 free public APIs in parallel (Remotive, Arbeitnow, Jobicy, HN Who's Hiring)
2. **Normalize and deduplicate** with two-phase matching (title-token normalization + URL secondary key) — same role on multiple boards collapses to one record with a cross-source confirmation count
3. **Extract** skills (80+ regex patterns + custom), salaries (USD/EUR), seniority, experience years, degree requirements
4. **Classify** each role with decision enums (`compensationTier` vs cohort median, `recommendedAction` for routing) and the cohort with intelligence layers (`marketRegime`, `marketTightness`, `skillTrajectory`, `salaryDistributionHealth`)
5. **Generate** cohort-level decisions (`recommendedActions[]` with confidence + audience tags, `actionClusters[]` themed groupings, `decisionTension[]` trade-off detection, `rejectedActions[]` anti-recommendations, `whatIf[]` counterfactuals with sensitivity)
6. **Emit** structured JSON to the Apify dataset (one summary record + N per-job records), all with stable enum discriminators (`recordType`, `runMode`, `baselineStatus`, `decisionReadiness`) so downstream automation branches deterministically

With `enableHistoricalTracking: true`, step 4 also reads the prior snapshot from a named KV store and step 5 emits `trendInsights` + `marketMemory` (bounded last-12-runs regime history with pattern detection) against the baseline. Step 6 then writes the updated snapshot back for the next run.

No LLM is called at any step. Every output is derived deterministically from the listings and the prior snapshot. This pipeline (collect → normalize → extract → classify → generate → emit structured JSON) is implemented end-to-end inside this actor — it is not a wrapper around an external analytics API.
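
Step 2 of the pipeline (two-phase deduplication with a cross-source count) can be sketched roughly as follows. This is a minimal illustration under assumptions: the token pattern, the use of company plus sorted title tokens as the primary key, and the `title` / `company` / `url` / `source` field names are hypothetical and may differ from what the Actor does internally.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    tokens = re.findall(r"[a-z0-9+#]+", title.lower())
    return " ".join(sorted(tokens))

def deduplicate(listings):
    """Collapse duplicates by (company, normalized title), with URL as a secondary key."""
    by_key = {}
    seen_urls = {}
    for job in listings:
        key = (job["company"].strip().lower(), normalize_title(job["title"]))
        # Phase 2: a URL that was already seen reuses the key it was filed under.
        key = seen_urls.get(job["url"], key)
        seen_urls[job["url"]] = key
        record = by_key.setdefault(key, {**job, "sources": set()})
        record["sources"].add(job["source"])
    for record in by_key.values():
        record["crossSourceCount"] = len(record["sources"])
        record["crossSourceConfirmed"] = record["crossSourceCount"] > 1
    return list(by_key.values())

jobs = [
    {"title": "Senior Python Engineer", "company": "Acme",
     "url": "https://a.example/1", "source": "remotive"},
    {"title": "Python Engineer, Senior", "company": "acme",
     "url": "https://b.example/9", "source": "jobicy"},
    {"title": "Data Engineer", "company": "Acme",
     "url": "https://a.example/2", "source": "remotive"},
]
unique = deduplicate(jobs)
```

In this sample the first two listings collapse into one record confirmed by two sources, while the third stays separate.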

### Start here — quickstart by persona

Pick the input that matches your job. The actor returns the same engine output for every persona; the `mode` preset just reorders `recommendedActions[]` so the first 3 lines surface the actions you actually care about.

**Job seeker** — find roles to apply to, learn-skill recommendations, market-leverage signals
```json
{ "query": "senior python engineer", "remoteOnly": true, "mode": "job_seeker" }
````

**Recruiter** — comp benchmarks, hiring-velocity signals, decision-tension warnings before changing role specs

```json
{ "query": "platform engineer", "mode": "recruiter", "groupBy": ["seniorityLevel", "remote"] }
```

**Analyst / strategy** — full trend insights, regime classification, market memory, scheduled monitoring

```json
{
    "query": "machine learning engineer",
    "mode": "analyst",
    "enableHistoricalTracking": true,
    "lookbackDays": 14
}
```

(Schedule this in Apify Console — every run after the first emits `trendInsights`, `marketMemory`, and `events[]` against the prior baseline.)

**Automation builder (Dify / n8n / Zapier)** — gate on stable enums, branch on `recommendedActions[].action`

```json
{ "query": "data engineer", "enableHistoricalTracking": true, "incremental": true }
```

See the [Automation snippets](#automation-snippets) section for paste-ready Slack / n8n / recruiter workflow examples.
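
The persona inputs above can also be sent from Python. The sketch below assumes the official `apify-client` package and that `call()` returns a run dictionary with a `defaultDatasetId` key, as in the 1.x client; `build_run_input` and `run_actor` are hypothetical helper names, and the token placeholder must be replaced with your own.

```python
def build_run_input(query, mode="default", **options):
    """Assemble an input object from fields documented in this README."""
    allowed_modes = {"default", "job_seeker", "recruiter", "analyst"}
    if mode not in allowed_modes:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"query": query, "mode": mode, **options}

def run_actor(token, run_input):
    """Start a run, wait for it to finish, and return all dataset items."""
    from apify_client import ApifyClient  # requires: pip install apify-client

    client = ApifyClient(token)
    run = client.actor("ryanclinton/job-market-intelligence").call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Same fields as the analyst quickstart above.
analyst_input = build_run_input(
    "machine learning engineer",
    mode="analyst",
    enableHistoricalTracking=True,
    lookbackDays=14,
)
# items = run_actor("<YOUR_APIFY_TOKEN>", analyst_input)
```

The client import is deferred into `run_actor` so the input builder can be used and tested without network access.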

### Read these fields first

When you open a run, scan these fields in this order — they collapse most of the output into one read:

| Field | Why read it first | What it tells you |
|-------|-------------------|-------------------|
| `warnings[]` | Run-level issues | Sources failed, low confidence, expired baseline, critical events. **Empty array means no run-level concerns.** |
| `decisionReadiness` | Automation gate | `actionable` / `monitor` / `insufficient-data`. Branch all downstream automation on this scalar. |
| `marketRegime.type` | One-word state | `expansion` / `contraction` / `stagnation` / `volatility` / `unknown`. Strategic posture in one read. |
| `recommendedActions[0..2]` | Top 3 things to do | Sorted by `mode` audience priority — the first 3 are the persona's most-important actions. |
| `decisionTension[]` | Trade-off warnings | Empty in most cohorts. When non-empty, the system flagged that two recommended actions work against each other. |
| `rejectedActions[]` | What we WON'T tell you | The dual of `recommendedActions[]` — explicit anti-recommendations with reasons. |

If those fields look right, drill into the rest. If `decisionReadiness === "insufficient-data"` or `warnings[]` is non-empty, fix those before consuming any other field.
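
As a rough illustration of that reading order, the gate below inspects a summary record before anything else consumes it. The field names come from the table above; the `automation_gate` helper and its `"stop"` / `"review"` / `"proceed"` return labels are hypothetical.

```python
def automation_gate(summary: dict) -> str:
    """Decide whether downstream automation should consume this run."""
    warnings = summary.get("warnings") or []
    readiness = summary.get("decisionReadiness", "insufficient-data")
    if readiness == "insufficient-data":
        return "stop"
    if readiness == "actionable" and not warnings:
        return "proceed"
    # Either the run says "monitor" or it carries run-level warnings.
    return "review"

def top_actions(summary: dict, limit: int = 3) -> list:
    """Return the names of the first few recommended actions."""
    actions = summary.get("recommendedActions") or []
    return [entry.get("action") for entry in actions[:limit]]

# Hypothetical summary record, trimmed to the fields used here.
summary = {
    "warnings": [],
    "decisionReadiness": "actionable",
    "recommendedActions": [
        {"action": "increase_salary_band"},
        {"action": "learn_skill"},
    ],
}
gate = automation_gate(summary)
```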

### How to interpret the output (intent → field)

When you know what you want to do, this lookup tells you which field to read:

| Your intent | Read this field |
|-------------|------------------|
| **Want to act?** | `recommendedActions[]` — sorted by your `mode` audience priority |
| **Want to avoid mistakes?** | `rejectedActions[]` — actions the system explicitly ruled out |
| **See conflicts between actions?** | `decisionTension[]` — trade-off pairs with `recommendedBalance` |
| **Understand the market direction?** | `marketRegime.type` + `marketMemory.pattern` |
| **Test a strategy before committing?** | `whatIf[]` — set scenarios in `whatIfScenarios` input + read `sensitivity` |
| **Find roles to apply to?** | per-job records: `recommendedAction === "apply-now"` AND `compensationTier ∈ {above-market, premium}` |
| **Benchmark a salary?** | `salaryInsights.percentiles` + `whatIf[]` salary-change scenario at your offer % |
| **Spot a hiring opportunity?** | `topHiringCompanies[]` + `trendInsights.newCompanies[]` |
| **Spot skill scarcity?** | `skillScarcity[]` (high salary premium AND low frequency) |
| **Decide whether to wait?** | `marketTightness.label` + `marketRegime.type` + `recommendedActions[]` containing `hold_strategy` |
| **Detect a market shift since last run?** | `trendInsights.direction` + `events[]` + `marketMemory.lastInflectionDaysAgo` |
| **Trust this run for automation?** | `decisionReadiness === "actionable"` AND `warnings.length === 0` |
| **Audit the analytics?** | `dataQuality` + `confidenceFactors[]` + `analysisMetadata` |

Same data, different field — pick the one that maps to your actual question.
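
For example, the "find roles to apply to" row could be applied to per-job records like this. It is a small sketch: the enum values are taken from the table above, while `roles_to_apply` and the sample records are made up for illustration.

```python
PREFERRED_TIERS = {"above-market", "premium"}

def roles_to_apply(job_records):
    """Keep jobs tagged apply-now whose pay sits above the cohort median."""
    return [
        job for job in job_records
        if job.get("recommendedAction") == "apply-now"
        and job.get("compensationTier") in PREFERRED_TIERS
    ]

# Hypothetical per-job records, trimmed to the two routing fields.
records = [
    {"title": "Staff Engineer", "recommendedAction": "apply-now", "compensationTier": "premium"},
    {"title": "Backend Dev", "recommendedAction": "apply-now", "compensationTier": "at-market"},
    {"title": "ML Engineer", "recommendedAction": "research-company", "compensationTier": "premium"},
]
shortlist = roles_to_apply(records)
```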

### Features

#### Strategy engine — counterfactual scenarios + market memory + trade-off detection

- **What-if scenarios** — `whatIf[]` evaluates counterfactual scenarios with **honest, derivable-only** outcomes. Two scenario types: `salary_change` (% delta) and `skill_emphasis` (named skill). Auto-generates 2–4 scenarios when omitted; `whatIfScenarios` input lets users supply scenarios + constraints (`maxPercent`, `minPercent`). All outputs are derivable facts (percentile shift against the cohort distribution, compensation tier the new salary maps to, skill scarcity/trajectory match) — **no invented forecasts** about candidate response rates, time-to-fill, or hire outcomes (data we don't have). Confidence is hard-capped at 60. Every result carries mandatory `caveats[]`.
- **Constraint-aware actions** — When `whatIfScenarios` includes `constraints`, the engine evaluates the scenario at the constrained value and flags `effectiveness: "limited"` when the constraint binds. Honest about real-world tradeoffs.
- **Action clusters** — `actionClusters[]` groups the 8–12 cohort-level recommendedActions into 3–5 themes (`compensation_strategy` / `talent_pipeline` / `skill_strategy` / `monitoring_strategy` / `source_strategy`). Reduces noise so output feels like strategy, not alerts.
- **Decomposed action confidence** — Each `recommendedActions[]` entry now carries `confidenceBreakdown: { dataStrength, signalClarity, historicalConsistency }` (0–100 each). Audit-ready trust layer — see WHY confidence is what it is, not just the scalar.
- **`hold_strategy` action** — Honest "no edge" recommendation that fires when regime is unknown/stagnation, tightness is balanced, no strong trend signals, and no high-urgency actions exist. Most tools over-signal — we ship abstention as a first-class verdict.
- **Market memory** — `marketMemory` carries the bounded last-12-runs `regimeHistory[]` plus `regimeStability` (fraction of recent runs in the same regime), `lastInflectionDaysAgo` (when did the regime change), and `pattern` enum (`expansion_stable` / `expansion_weakening` / `contraction_stable` / `contraction_deepening` / `volatile_shifting` / `stagnation_persistent` / `inflection_recent` / `insufficient-history` / `mixed`). Activates with historical tracking; meaningful at 3+ snapshots. Lets you reason in patterns, not just deltas.
- **Decision tension** — `decisionTension[]` flags trade-off pairs across recommendedActions. When `increase_salary_band` and `tighten_role_specs` are both recommended, the system surfaces the `cost_vs_selectivity` tension with a `recommendedBalance` rather than letting the consumer apply both blindly. Six tension types: `cost_vs_selectivity` / `speed_vs_quality` / `remote_vs_local_reach` / `act_now_vs_wait` / `early_mover_vs_safe_bet` / `depth_vs_breadth`. Real strategic decisions are trade-offs.
- **Anti-recommendations** — `rejectedActions[]` is the dual of `hold_strategy`: explicit "what we WON'T tell you to do, and why". Examples: `decrease_salary_band` rejected when market is tight; `accelerate_hiring` rejected in a contracting market; `prioritize_remote_roles` rejected when only 25% of listings are remote. Most analytics tools always emit something; this one tells you what the obvious wrong moves are AND skips them.
- **Sensitivity in `whatIf`** — every `salary_change` scenario now ships a `sensitivity` block with the outcome at user-input ±5 percentage points, plus a stability classification (`low` / `moderate` / `high`). Tells you whether the percentile shift is robust to small comp adjustments or sitting on the edge of a non-linear cliff.
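
The sensitivity idea in the last bullet can be approximated as follows: evaluate the scenario at the requested change and at 5 percentage points either side, then compare where each lands in the cohort distribution. This is a sketch under assumptions. The percentile-rank definition, the `salary_sensitivity` helper, and the sample cohort are illustrative, and the thresholds behind the Actor's `low` / `moderate` / `high` stability labels are not documented, so only the raw spread is computed.

```python
from bisect import bisect_right

def percentile_rank(sorted_salaries, salary):
    """Percent of cohort salaries at or below `salary`."""
    return 100.0 * bisect_right(sorted_salaries, salary) / len(sorted_salaries)

def salary_sensitivity(cohort, base_salary, change_percent, step=5.0):
    """Percentile rank at change_percent - step, change_percent, and change_percent + step."""
    ordered = sorted(cohort)
    outcomes = {}
    for label, pct in (("low", change_percent - step),
                       ("mid", change_percent),
                       ("high", change_percent + step)):
        new_salary = base_salary * (1 + pct / 100.0)
        outcomes[label] = percentile_rank(ordered, new_salary)
    # A large spread means the result is sensitive to small comp changes.
    outcomes["spread"] = outcomes["high"] - outcomes["low"]
    return outcomes

# Hypothetical cohort of ten salaries in USD.
cohort = [90_000, 100_000, 110_000, 120_000, 130_000,
          140_000, 150_000, 160_000, 170_000, 180_000]
result = salary_sensitivity(cohort, base_salary=120_000, change_percent=10)
```

With this cohort, a 5% raise lands at the 40th percentile and both the 10% and 15% raises land at the 50th, so the spread is 10 points.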

#### Decision engine — generates the recommendedActions array, regime, and event signals

- **Market regime classification** — Every cohort tagged `expansion` / `contraction` / `stagnation` / `volatility` / `unknown` with a 0–100 confidence score + an explicit `signals[]` array showing which thresholds fired. Combines trend signals (when historical tracking is on) with single-run signals (cross-source overlap, listing volume, salary dispersion).
- **Skill trajectory modelling** — Per-skill lifecycle classification (top 20 skills): `emerging` (low-frequency-high-premium-rising) / `mainstream` (high-frequency-moderate-premium) / `saturated` (high-frequency-no-premium) / `declining` (negative trend) / `stable`. Plus a velocity tag (`hypergrowth` / `growing` / `steady` / `cooling` / `falling`). Bridge between rising-skill counts and "should I learn this?"
- **Recommended actions array** — Cohort-level action engine. Each action: `{ action, target?, confidence, impact, urgency, appliesTo[], reason }`. Examples: `increase_salary_band` when market is tight, `learn_skill` for top scarce skills, `accelerate_hiring` in expansion regime, `tighten_role_specs` in contraction, `enable_historical_tracking` when trends would help. Reordered by `mode` preset (default / job\_seeker / recruiter / analyst). Capped at 12.
- **Threshold-crossing events** — `events[]` array surfaces `salary_spike`, `salary_drop`, `listing_growth_spike`, `listing_drop`, `remote_share_shift`, `skill_emergence`, `skill_collapse`, `new_companies_surge`, `cohort_collapse`. Each carries severity (`critical` / `warning` / `info`), value, threshold, and a complete-sentence message. User-overridable thresholds via the `eventThresholds` input. Sorted critical → warning → info. Drop straight into Slack / PagerDuty / Zapier without parsing prose.
- **Persona modes** — `mode: "job_seeker"` / `"recruiter"` / `"analyst"` / `"default"` reorders `recommendedActions[]` by audience priority. Same actions, different prioritisation per persona.
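
A consumer of `events[]` might re-sort and route entries by severity along these lines. The severity values and event names come from the bullets above; the `type` and `message` key names, the channel labels, and the helper functions are assumptions for illustration.

```python
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}

def sort_events(events):
    """Order events critical first, then warning, then info; unknown severities last."""
    return sorted(
        events,
        key=lambda event: SEVERITY_ORDER.get(event.get("severity"), len(SEVERITY_ORDER)),
    )

def route_events(events):
    """Send critical events to paging and everything else to chat (hypothetical channels)."""
    routes = {"pager": [], "chat": []}
    for event in sort_events(events):
        channel = "pager" if event.get("severity") == "critical" else "chat"
        routes[channel].append(event["message"])
    return routes

# Hypothetical events, trimmed to the fields used here.
events = [
    {"type": "skill_emergence", "severity": "info", "message": "Rust is emerging."},
    {"type": "cohort_collapse", "severity": "critical", "message": "Listings fell sharply."},
    {"type": "salary_spike", "severity": "warning", "message": "Median salary jumped."},
]
routes = route_events(events)
```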

#### Per-job decision layer — classifies each role for downstream routing

- **Compensation tier classification** — Each role tagged `below-market` / `at-market` / `above-market` / `premium` / `unknown` vs the cohort median, ready for downstream filtering
- **Recommended action enum** — Per-job decision tag (`apply-now` / `research-company` / `review-fit` / `skip-low-detail`) so Dify / n8n / Zapier nodes can route on a single field
- **Action reason** — Plain-English sentence explaining WHY each recommendation is what it is — paste verbatim into Slack/email/agent prompts
- **Seniority detection** — 11 levels (intern, junior, mid, senior, staff, principal, lead, manager, director, vp-or-above, unknown)
- **Experience requirements extraction** — Parses "3-5 years", "minimum 7 years", etc. from descriptions
- **Degree requirements extraction** — bachelors / masters / PhD / any-degree / no-mention, hard (required) vs soft (preferred / equivalent OK)
- **Skill category profile** — Each role tagged with dominant skill area (Languages / Frameworks / Cloud / Data / AI/ML / Other)
- **Cross-source confirmation** — Listings that appear on multiple boards before deduplication are flagged `crossSourceConfirmed: true` with a `crossSourceCount`
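
Experience parsing of the kind described above ("3-5 years", "minimum 7 years") could look roughly like this. The regular expressions and the `extract_experience` helper are illustrative assumptions, not the Actor's actual patterns, and real descriptions will contain phrasings they miss.

```python
import re

# "3-5 years", "3 to 5 yrs", also with an en dash between the numbers
RANGE_RE = re.compile(
    r"(\d{1,2})\s*(?:[-\u2013]|to)\s*(\d{1,2})\+?\s*(?:years?|yrs?)",
    re.IGNORECASE,
)
# "7 years", "5+ years", "minimum 7 years"
SINGLE_RE = re.compile(r"(\d{1,2})\+?\s*(?:years?|yrs?)", re.IGNORECASE)

def extract_experience(text):
    """Return (min_years, max_years); max is None when only a minimum is stated."""
    match = RANGE_RE.search(text)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return (min(low, high), max(low, high))
    match = SINGLE_RE.search(text)
    if match:
        return (int(match.group(1)), None)
    return (None, None)

examples = {
    "3-5 years of backend experience": extract_experience("3-5 years of backend experience"),
    "minimum 7 years": extract_experience("minimum 7 years"),
    "no experience stated": extract_experience("We value curiosity."),
}
```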

#### Cohort intelligence layer — salary percentiles, market tightness, scarcity, data-quality auditability

- **Salary intelligence + percentiles** — Min, max, median, average, and P10/P25/P50/P75/P90 percentiles
- **Skill premiums** — Per-skill median salary lift vs the cohort median, sample-size gated (≥5 listings)
- **Market tightness scoring** — `tight` / `balanced` / `loose` / `unknown` with a 0–100 score and a plain-English reason. Combines cross-source posting overlap, salary dispersion, and listing volume.
- **Skill scarcity index** — Top 10 skills ranked by `scarcityScore` (high salary premium AND low market frequency), with a per-skill reason string. The data engineering & talent-strategy moneymaker.
- **Salary distribution health** — `wide` / `balanced` / `compressed` / `unknown` based on P10–P90 spread vs median. Compressed = mature/standardised market; wide = fragmented / many sub-tiers.
- **Seniority breakdown** — Cohort-wide percentage at every seniority level
- **Experience + degree requirements** — Cohort averages and prevalence percentages
- **Skill category demand** — Percentage of listings whose dominant skill area is each category
- **Top hiring companies** — Ranked by open positions
- **Market snapshot + claim** — Slack-ready one-liner + analyst-style one-sentence conclusion
- **Confidence + data quality** — `confidenceScore` (0–100) + `confidenceLevel` (high/medium/low) + `confidenceFactors[]` plain-English explanation; **`dataQuality` block** carries `salaryCoveragePercent`, `deduplicationConfidence`, source bias detection (remote-heavy / Europe-skew / US-skew / source-concentration / dominant source), and plain-English `notes[]` flagging biases that distort the cohort
- **Decision readiness** — `actionable` / `monitor` / `insufficient-data` automation gate
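
To make the skill-premium bullet concrete, here is a minimal sketch of a median-lift calculation with the ≥5 sample gate. The input shape (a flat `salary` number and a `skills` list per listing) and the `skill_premium` helper are assumptions for illustration; the Actor's internal implementation may differ.

```python
from statistics import median

MIN_SAMPLE = 5  # gate described above: at least 5 listings per skill


def skill_premium(skill, listings, cohort_median):
    """Return premium stats for one skill, or None when the sample is too small.

    Each listing is assumed to be a dict with a numeric `salary` and a
    `skills` list (a simplified shape used only in this sketch).
    """
    salaries = [item["salary"] for item in listings if skill in item["skills"]]
    if len(salaries) < MIN_SAMPLE:
        return None
    skill_median = median(salaries)
    lift = skill_median - cohort_median
    return {
        "skill": skill,
        "sampleSize": len(salaries),
        "medianSalary": skill_median,
        "premiumVsMarket": lift,
        "premiumPercent": round(lift / cohort_median * 100, 1),
    }


cohort = [
    {"salary": s, "skills": ["Kubernetes"]}
    for s in (165000, 170000, 175000, 180000, 185000)
]
premium = skill_premium("Kubernetes", cohort, cohort_median=155000)
print(premium["premiumVsMarket"], premium["premiumPercent"])  # 20000 12.9
```

With a skill median of 175,000 against a cohort median of 155,000, the lift is 20,000, or 12.9%.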

#### Segmentation — per-segment analytics by location / seniority / remote

- **Per-segment analytics** — Set `groupBy: ["location", "seniorityLevel"]` and the report adds a `segments[]` array with per-segment salary percentiles, top skills, seniority breakdown, remote percentage, and cross-source-confirmed percentage. Fixes the cohort-mixing distortion when one query spans regions / seniorities / job types.
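
The grouping idea can be sketched as follows. This is a simplified illustration, not the Actor's code: it assumes each listing carries a single numeric `salary`, and it computes only a few of the per-segment fields.

```python
from collections import defaultdict
from statistics import median


def segment_listings(listings, group_by):
    """Group listings by the `group_by` fields and summarise each segment."""
    groups = defaultdict(list)
    for item in listings:
        key = tuple(item.get(field) for field in group_by)
        groups[key].append(item)

    segments = []
    for key, members in groups.items():
        salaries = [m["salary"] for m in members if m.get("salary") is not None]
        remote_count = sum(1 for m in members if m.get("remote"))
        segments.append({
            "key": dict(zip(group_by, key)),
            "listings": len(members),
            "medianSalary": median(salaries) if salaries else None,
            "remotePercentage": round(remote_count / len(members) * 100, 1),
        })
    return segments


sample = [
    {"location": "United States", "seniorityLevel": "senior", "salary": 180000, "remote": True},
    {"location": "United States", "seniorityLevel": "senior", "salary": 170000, "remote": False},
    {"location": "Europe", "seniorityLevel": "senior", "salary": 95000, "remote": True},
]
segments = segment_listings(sample, ["location", "seniorityLevel"])
for segment in segments:
    print(segment["key"], segment["medianSalary"], segment["remotePercentage"])
```

Computing the median inside each group is what keeps a lower-paying region from dragging down the benchmark for a higher-paying one.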

#### Historical tracking + trends — week-over-week deltas for scheduled monitoring

- **Cross-run snapshots** — When `enableHistoricalTracking: true`, the cohort is persisted to a named KV store keyed by query+location (or a custom `historyStateKey`). Capped lookback via `lookbackDays` (default 30).
- **Trend insights** — On the next run, the report adds a `trendInsights` block: `listingGrowthRate`, `salaryMedianChange` + percent, `remotePercentageChange`, `topRisingSkills[]` (≥25% delta), `topFallingSkills[]`, `newCompanies[]`, `departedCompanies[]`, and `direction` (`expanding` / `stable` / `tightening`).
- **Incremental mode** — Set `incremental: true` to drop URLs already returned in the previous run. Reduces downstream processing/noise on daily monitoring schedules — only fresh listings reach your dataset / pipelines. (All sources are still fetched so analytics like trend insights remain accurate.)
- **Snapshot hashing** — Every run emits a 16-char `snapshotId` hashed over query + location + sources + listing fingerprint. Compare it across runs to detect when the cohort actually changed.
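
A snapshot id of this kind could be derived roughly as shown below. The field reference describes `snapshotId` as a 16-char SHA-256 hash; everything else here (the JSON payload layout, lowercasing, and using sorted listing URLs as the fingerprint) is an assumption for illustration and will not reproduce the Actor's actual ids.

```python
import hashlib
import json


def snapshot_id(query, location, sources, listing_urls):
    """Return a 16-character hex id for a cohort (illustrative sketch only)."""
    payload = json.dumps(
        {
            "query": query.strip().lower(),
            "location": (location or "").strip().lower(),
            "sources": sorted(sources),
            # Assumed fingerprint: the sorted listing URLs.
            "listings": sorted(listing_urls),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


first = snapshot_id("Data Engineer", None, ["remotive", "jobicy"], ["u2", "u1"])
same = snapshot_id("data engineer", None, ["jobicy", "remotive"], ["u1", "u2"])
changed = snapshot_id("data engineer", None, ["jobicy", "remotive"], ["u1", "u2", "u3"])
print(first == same, first == changed)  # True False
```

Sorting the inputs before hashing makes the id independent of fetch order, so it only changes when the cohort itself does.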

#### Customisation — domain-specific skills + source weighting

- **Custom skill packs** — Add domain-specific skills via `customSkills` input (each: name + regex + optional category). Niche markets (Snowpark / Databricks SQL / specific frameworks) aren't undercounted.
- **Source weighting** — `sourceWeights: {"hn-whoishiring": 0.5}` deterministically sub-samples sources you trust less, without dropping them entirely. ⚠️ Use only when you intentionally want a representative sample, not complete coverage — sub-sampling drops listings, so cohort size shrinks.

#### Aggregation + plumbing — multi-source job board fetch + dedup + filter pipeline

- **Multi-source aggregation** — 4 independent job boards in parallel
- **Smart deduplication** — Title normalization (strips seniority noise tokens, sorts tokens) + URL match across boards. Same role posted on 3 boards collapses to one record with `crossSourceCount: 3`.
- **Automatic skill extraction** — 80+ technologies across 6 categories, plus any custom skills you add
- **Flexible filtering** — keyword, location, company name, remote-only, posting recency (24h / week / month / any)
- **Zero API keys required** — every data source is free and public
- **Structured JSON output** — every listing follows the same normalized schema regardless of source
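
The deduplication step described above can be approximated like this. The noise-token set, the inclusion of the company name in the match key, and the helper names are assumptions made for this sketch; the Actor's real rules are not spelled out here.

```python
import re

# Hypothetical noise-token list; the Actor's real list is not documented here.
SENIORITY_NOISE = {"senior", "sr", "junior", "jr", "lead", "staff", "principal", "mid"}


def normalize_title(title):
    """Lowercase, drop seniority noise tokens, and sort the remaining tokens."""
    tokens = re.findall(r"[a-z0-9+#]+", title.lower())
    return " ".join(sorted(t for t in tokens if t not in SENIORITY_NOISE))


def deduplicate(listings):
    """Collapse listings that share a URL or a (company, normalized title) pair."""
    merged = []
    by_key = {}
    for item in listings:
        title_key = ("title", item["company"].strip().lower(), normalize_title(item["title"]))
        url_key = ("url", item["url"])
        record = by_key.get(url_key) or by_key.get(title_key)
        if record is None:
            record = {**item, "sources": set()}
            merged.append(record)
        record["sources"].add(item["source"])
        by_key[url_key] = by_key[title_key] = record
    for record in merged:
        sources = record.pop("sources")
        record["crossSourceCount"] = len(sources)
        record["crossSourceConfirmed"] = len(sources) > 1
    return merged


raw = [
    {"title": "Senior Data Engineer", "company": "Acme", "url": "https://a.example/1", "source": "remotive"},
    {"title": "Data Engineer, Sr.", "company": "Acme", "url": "https://b.example/9", "source": "jobicy"},
    {"title": "Engineer Data (Senior)", "company": "ACME", "url": "https://c.example/4", "source": "arbeitnow"},
    {"title": "Backend Engineer", "company": "Acme", "url": "https://a.example/2", "source": "remotive"},
]
deduped = deduplicate(raw)
print([(r["title"], r["crossSourceCount"]) for r in deduped])
```

The three differently worded "data engineer" postings collapse into one record with `crossSourceCount: 3`, while the backend role stays separate.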

### How to Use

1. **Open the Actor** in the Apify Console and click "Start"
2. **Enter a search query** such as "data engineer", "product manager", or "machine learning". This is the only required field.
3. **Optionally refine** your search with location, company name, remote-only toggle, date recency, or specific sources
4. **Run the Actor** and wait for it to finish (typically under 60 seconds). The dataset contains a summary report as the first item, followed by individual job listings
5. **Export or integrate** — download results as JSON, CSV, or Excel, or connect the dataset to Zapier, Make, Google Sheets, or the Apify API for automated workflows
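
The same steps can be run programmatically. The sketch below uses the official Python client (`pip install apify-client`) and assumes its `ActorClient.call()` returns a run dictionary with a `defaultDatasetId` key, as in the client versions this was written against; check the client documentation for your installed version. `split_items` and `run_market_report` are hypothetical helper names.

```python
ACTOR_ID = "ryanclinton/job-market-intelligence"


def split_items(items):
    """Separate the summary report from the job listings in one dataset."""
    summary = None
    jobs = []
    for item in items:
        if item.get("type") == "summary" and summary is None:
            summary = item
        else:
            jobs.append(item)
    return summary, jobs


def run_market_report(token, run_input):
    """Start the Actor, wait for it to finish, and return (summary, jobs)."""
    # Imported here so split_items() works even without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    items = client.dataset(run["defaultDatasetId"]).iterate_items()
    return split_items(items)


# Usage (requires a valid API token and triggers a paid run):
# summary, jobs = run_market_report("<YOUR_APIFY_TOKEN>", {"query": "data engineer"})

demo_summary, demo_jobs = split_items([
    {"type": "summary", "totalListings": 2},
    {"type": "job", "title": "Data Engineer"},
    {"type": "job", "title": "Senior Data Engineer"},
])
print(demo_summary["totalListings"], len(demo_jobs))  # 2 2
```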

### Input Parameters

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `query` | String | Yes | `"software engineer"` | Job search keyword (e.g., "data scientist", "devops", "product manager") |
| `location` | String | No | — | Filter by location substring (e.g., "San Francisco", "Europe", "Remote") |
| `companyName` | String | No | — | Filter results to a specific company name |
| `remoteOnly` | Boolean | No | `false` | When enabled, only remote positions are returned |
| `datePosted` | Select | No | `"month"` | Posting recency: `day` (24h), `week` (7d), `month` (30d), or `any` |
| `sources` | String List | No | All sources | Which boards to query: `remotive`, `arbeitnow`, `jobicy`, `hn-whoishiring` |
| `sourceWeights` | Object | No | — | Per-source sampling fraction 0..1 (e.g., `{"hn-whoishiring": 0.5}`). Sources not listed pass through whole. Deterministic per-listing hash so re-runs are reproducible. **Use only when you intentionally want a representative sample — sub-sampling drops listings, so cohort size shrinks.** |
| `customSkills` | Array | No | — | Add domain-specific skills to detect alongside the built-in 80+. Each: `{ name, regex, category? }`. |
| `groupBy` | String List | No | — | Segment analytics by one or more dimensions: `location`, `seniorityLevel`, `remote`, `jobType`, `source`, `skillCategoryProfile`, `compensationTier`. Adds `segments[]` to the summary. |
| `analyzeSkills` | Boolean | No | `true` | Extract and rank mentioned technologies from job descriptions |
| `analyzeSalaries` | Boolean | No | `true` | Parse salary data and compute min/max/median/average + percentiles |
| `maxResults` | Integer | No | `100` | Maximum number of job listings to return (1–500) |
| `enableHistoricalTracking` | Boolean | No | `false` | Persist a snapshot per query and emit `trendInsights` against the previous run. First run returns `trendInsights: null` and writes the baseline. |
| `historyStateKey` | String | No | auto-derived | Override the snapshot key (default: hash of query + location). Stable string for cross-run comparisons. |
| `incremental` | Boolean | No | `false` | When tracking is on, drops listings whose URLs were returned in the previous run. Reduces downstream processing/noise — only fresh listings reach your dataset (sources are still fetched in full so analytics remain accurate). |
| `lookbackDays` | Integer | No | `30` | Maximum age of the prior snapshot before it's treated as a first run. |
| `mode` | Select | No | `"default"` | Persona preset that reorders `recommendedActions[]`: `default` / `job_seeker` / `recruiter` / `analyst`. Same action set, different audience-priority ordering. |
| `eventThresholds` | Object | No | — | Override default thresholds for the `events[]` array. Defaults: `salarySpikePercent: 5`, `salaryDropPercent: -5`, `listingGrowthSpikePercent: 25`, `listingDropPercent: -25`, `remoteShiftPoints: 5`, `skillEmergenceDeltaPercent: 100`. Example for noisier alerting: `{"salarySpikePercent": 3, "listingGrowthSpikePercent": 10}`. |
| `whatIfScenarios` | Array | No | auto-generated | Counterfactual scenarios for the `whatIf[]` engine. Each: `{ type: "salary_change" \| "skill_emphasis", percent? (for salary), skill? (for skill), constraints?: { maxPercent?, minPercent? } }`. When omitted, the Actor auto-generates 2–4 representative scenarios. Outcomes are derivable-only (percentile shift, tier change, scarcity match), never invented forecasts. |

#### Input Examples

**Broad market scan for data engineers**:

```json
{
    "query": "data engineer",
    "datePosted": "month",
    "analyzeSkills": true,
    "analyzeSalaries": true,
    "maxResults": 200
}
```

**Remote-only React developer roles in Europe**:

```json
{
    "query": "react developer",
    "location": "Europe",
    "remoteOnly": true,
    "datePosted": "week",
    "sources": ["remotive", "arbeitnow", "jobicy"]
}
```

**Monitor a specific company's hiring**:

```json
{
    "query": "engineer",
    "companyName": "Stripe",
    "maxResults": 50
}
```

**Quick pulse check from HN startups only**:

```json
{
    "query": "machine learning",
    "sources": ["hn-whoishiring"],
    "datePosted": "month",
    "maxResults": 100
}
```

**Segmented salary analysis (US vs Europe, junior vs senior, remote vs on-site)**:

```json
{
    "query": "data engineer",
    "groupBy": ["location", "seniorityLevel", "remote"],
    "maxResults": 300
}
```

**Daily monitoring schedule with trend insights + incremental fetch**:

```json
{
    "query": "rust engineer",
    "remoteOnly": true,
    "datePosted": "week",
    "enableHistoricalTracking": true,
    "incremental": true,
    "lookbackDays": 30
}
```

Schedule this in Apify Console once a day. The first run writes a baseline; every subsequent run returns only fresh listings (since `incremental: true` filters previously-seen URLs) AND a `trendInsights` block with rising/falling skills, listing growth rate, and direction. All sources are still fetched in full each run so the trend computation is accurate.
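
For intuition, the two mechanisms in this schedule (skill deltas against the baseline and filtering of previously seen URLs) can be sketched as below. The 25% rising cut-off comes from the feature list; the helper names, the treatment of skills missing from either snapshot, and the "any negative delta counts as falling" rule are assumptions.

```python
RISING_THRESHOLD = 25.0  # percent; documented cut-off for topRisingSkills


def skill_deltas(previous_counts, current_counts):
    """Percent change in mention count for skills present in both snapshots."""
    deltas = []
    for skill, prev in previous_counts.items():
        curr = current_counts.get(skill)
        if curr is None or prev == 0:
            continue  # no usable baseline to compare against
        pct = round((curr - prev) / prev * 100, 1)
        deltas.append({"skill": skill, "previousCount": prev,
                       "currentCount": curr, "deltaPercent": pct})
    return deltas


def fresh_listings(current, previous_urls):
    """Keep only listings whose URL was not returned by the previous run."""
    return [item for item in current if item["url"] not in previous_urls]


deltas = skill_deltas({"Rust": 4, "Databricks": 8, "Hadoop": 6},
                      {"Rust": 11, "Databricks": 14, "Hadoop": 2})
rising = [d["skill"] for d in deltas if d["deltaPercent"] >= RISING_THRESHOLD]
falling = [d["skill"] for d in deltas if d["deltaPercent"] < 0]
print(rising, falling)  # ['Rust', 'Databricks'] ['Hadoop']

fresh = fresh_listings(
    [{"url": "https://a.example/1"}, {"url": "https://a.example/2"}],
    previous_urls={"https://a.example/1"},
)
print(len(fresh))  # 1
```

Note that the deltas are computed on the full cohort before the incremental filter is applied, which mirrors the statement that all sources are still fetched in full.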

**Niche market with custom skill packs (Snowflake / Databricks ecosystem)**:

```json
{
    "query": "data engineer",
    "customSkills": [
        { "name": "Snowpark", "regex": "\\bsnowpark\\b", "category": "Data" },
        { "name": "dbt", "regex": "\\bdbt\\b", "category": "Data" },
        { "name": "Databricks SQL", "regex": "databricks\\s+sql", "category": "Data" },
        { "name": "Unity Catalog", "regex": "unity\\s+catalog", "category": "Data" }
    ]
}
```
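
Before spending a run on new patterns, it can help to try them against a sample description locally. The sketch below does that with Python's `re` module; it assumes case-insensitive matching and that the Actor's regex dialect behaves like Python's for these simple patterns, neither of which is confirmed here. Note that the doubled backslashes in the JSON become single backslashes in a Python raw string.

```python
import re

custom_skills = [
    {"name": "Snowpark", "regex": r"\bsnowpark\b", "category": "Data"},
    {"name": "dbt", "regex": r"\bdbt\b", "category": "Data"},
    {"name": "Databricks SQL", "regex": r"databricks\s+sql", "category": "Data"},
    {"name": "Unity Catalog", "regex": r"unity\s+catalog", "category": "Data"},
]


def detect_custom_skills(description, skills):
    """Return the names of custom skills whose pattern matches the description."""
    return [
        skill["name"]
        for skill in skills
        # Case-insensitive matching is an assumption of this sketch.
        if re.search(skill["regex"], description, flags=re.IGNORECASE)
    ]


text = "Build pipelines with dbt and Databricks  SQL; Snowpark experience is a plus."
print(detect_custom_skills(text, custom_skills))
# ['Snowpark', 'dbt', 'Databricks SQL']
print(detect_custom_skills("We maintain internal dbtools.", custom_skills))
# []
```

The word boundaries in `\bdbt\b` are what stop "dbtools" from being counted as a dbt mention.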

**Down-weight noisier sources (HN comments) without dropping them entirely**:

```json
{
    "query": "site reliability engineer",
    "sourceWeights": { "hn-whoishiring": 0.3 }
}
```
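
One way to get deterministic, reproducible sub-sampling is to hash each listing to a stable fraction and keep it when the fraction falls below the source weight. The sketch below illustrates that idea; hashing the listing URL with SHA-256 is an assumption, not the Actor's documented method.

```python
import hashlib


def keep_listing(url, weight):
    """Deterministically decide whether a listing survives sub-sampling.

    The URL is hashed to a stable fraction in [0, 1); the listing is kept
    when that fraction is below the source weight.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < weight


def apply_source_weights(listings, source_weights):
    """Sub-sample weighted sources; sources without a weight pass through whole."""
    return [
        item for item in listings
        if keep_listing(item["url"], source_weights.get(item["source"], 1.0))
    ]


listings = [
    {"url": f"https://example.com/job/{i}", "source": "hn-whoishiring"}
    for i in range(1000)
] + [{"url": "https://example.com/remotive/1", "source": "remotive"}]

sampled = apply_source_weights(listings, {"hn-whoishiring": 0.3})
again = apply_source_weights(listings, {"hn-whoishiring": 0.3})
print(len(sampled), sampled == again)
```

Because the decision depends only on the listing and the weight, re-running with the same inputs keeps exactly the same subset, and roughly 30% of the down-weighted source survives.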

**Recruiter mode — actions prioritized for hiring teams**:

```json
{
    "query": "platform engineer",
    "mode": "recruiter",
    "enableHistoricalTracking": true,
    "groupBy": ["seniorityLevel", "remote"]
}
```

The `recommendedActions[]` array surfaces `increase_salary_band`, `accelerate_hiring`, and `tighten_role_specs` ahead of curriculum / job-seeker actions.

**Analyst mode with sensitive event thresholds**:

```json
{
    "query": "machine learning engineer",
    "mode": "analyst",
    "enableHistoricalTracking": true,
    "eventThresholds": {
        "salarySpikePercent": 3,
        "listingGrowthSpikePercent": 10,
        "skillEmergenceDeltaPercent": 50
    }
}
```

Lower thresholds make events fire on smaller changes, which is useful for early-warning monitoring of volatile markets.
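
To see how overrides change what fires, the sketch below merges custom thresholds over the documented defaults and checks a `trendInsights`-style block against them. It covers only the salary and listing-volume thresholds; the event names and the inclusive comparisons are assumptions for illustration.

```python
DEFAULT_THRESHOLDS = {
    "salarySpikePercent": 5,
    "salaryDropPercent": -5,
    "listingGrowthSpikePercent": 25,
    "listingDropPercent": -25,
    "remoteShiftPoints": 5,
    "skillEmergenceDeltaPercent": 100,
}


def fired_events(trend, overrides=None):
    """Return hypothetical event names that a trend block would trigger."""
    t = {**DEFAULT_THRESHOLDS, **(overrides or {})}
    events = []
    if trend["salaryMedianChangePercent"] >= t["salarySpikePercent"]:
        events.append("salary_spike")
    if trend["salaryMedianChangePercent"] <= t["salaryDropPercent"]:
        events.append("salary_drop")
    if trend["listingGrowthRate"] >= t["listingGrowthSpikePercent"]:
        events.append("listing_growth_spike")
    if trend["listingGrowthRate"] <= t["listingDropPercent"]:
        events.append("listing_drop")
    return events


trend = {"salaryMedianChangePercent": 4.7, "listingGrowthRate": 12.5}
print(fired_events(trend))  # []
print(fired_events(trend, {"salarySpikePercent": 3, "listingGrowthSpikePercent": 10}))
# ['salary_spike', 'listing_growth_spike']
```

A +4.7% median move and +12.5% listing growth stay silent under the defaults but trigger both events with the more sensitive thresholds.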

**Constrained what-if simulation (recruiter with a 5% comp-budget cap)**:

```json
{
    "query": "platform engineer",
    "mode": "recruiter",
    "whatIfScenarios": [
        { "type": "salary_change", "percent": 10, "constraints": { "maxPercent": 5 } },
        { "type": "salary_change", "percent": -3 },
        { "type": "skill_emphasis", "skill": "Kubernetes" },
        { "type": "skill_emphasis", "skill": "Rust" }
    ]
}
```

The first scenario asks "what if I raise comp 10%?" but constrains the answer to 5% (the recruiter's actual budget cap). The output's `effectiveness: "limited"` flags when the constraint binds. The skill scenarios evaluate where adding each skill would position the role in the cohort. **Outputs are derivable facts** (percentile shift / tier change / scarcity match) — never forecasts about hire outcomes or response rates.
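
The constraint handling amounts to clamping the requested change to the allowed range. A minimal sketch, assuming simple min/max clamping; `requestedPercent` and `constraintBinds` are hypothetical names, while `appliedPercent` mirrors the field shown in the output example.

```python
def apply_constraints(percent, constraints=None):
    """Clamp a requested salary change to the scenario's min/max bounds."""
    constraints = constraints or {}
    applied = percent
    if "maxPercent" in constraints:
        applied = min(applied, constraints["maxPercent"])
    if "minPercent" in constraints:
        applied = max(applied, constraints["minPercent"])
    return {
        "requestedPercent": percent,
        "appliedPercent": applied,
        "constraintBinds": applied != percent,
    }


def scenario_median(current_median, applied_percent):
    """Median salary after applying a percentage change, rounded to whole units."""
    return round(current_median * (1 + applied_percent / 100))


capped = apply_constraints(10, {"maxPercent": 5})
print(capped)  # {'requestedPercent': 10, 'appliedPercent': 5, 'constraintBinds': True}
print(scenario_median(155000, capped["appliedPercent"]))  # 162750
print(scenario_median(155000, 10))  # 170500
```

When the constraint binds, the scenario is evaluated at the capped 5% rather than the requested 10%.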

#### Tips for Input

- **Start broad, then filter** — Run a general query like "engineer" first to see the full landscape, then narrow with location or company filters in subsequent runs.
- **Source selection** — Remotive and Jobicy focus on remote roles, Arbeitnow covers European markets heavily, and HN Who's Hiring surfaces startup opportunities. Use `sources` to target specific ecosystems.
- **Date filter** — `day` = last 24 hours, `week` = last 7 days, `month` = last 30 days, `any` = no time restriction.
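
The date windows translate directly into a cutoff check. The sketch below applies them to an ISO-8601 `postedDate` string like the ones in the output; the `is_recent` helper is hypothetical and only useful if you want to re-filter exported listings client-side.

```python
from datetime import datetime, timedelta, timezone

# Window sizes from the tip above; `any` means no cutoff.
DATE_WINDOWS = {
    "day": timedelta(hours=24),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "any": None,
}


def is_recent(posted_date, date_posted, now=None):
    """Check an ISO-8601 `postedDate` against a `datePosted` window."""
    window = DATE_WINDOWS[date_posted]
    if window is None:
        return True
    now = now or datetime.now(timezone.utc)
    # Replace the trailing "Z" so older Python versions can parse the string.
    posted = datetime.fromisoformat(posted_date.replace("Z", "+00:00"))
    return now - posted <= window


run_time = datetime(2026, 5, 2, 14, 32, tzinfo=timezone.utc)
print(is_recent("2026-05-02T08:00:00.000Z", "day", now=run_time))   # True
print(is_recent("2026-04-20T08:00:00.000Z", "week", now=run_time))  # False
print(is_recent("2026-04-20T08:00:00.000Z", "month", now=run_time)) # True
```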

### Output Example

The dataset contains two types of records. The first item is always a **summary report**:

```json
{
    "type": "summary",
    "query": "data engineer",
    "location": null,
    "analyzedAt": "2026-05-02T14:32:00.000Z",
    "totalListings": 87,
    "sourceBreakdown": { "remotive": 24, "arbeitnow": 31, "jobicy": 18, "hn-whoishiring": 14 },
    "topSkills": [
        { "skill": "Python", "count": 62, "percentage": 71.3 },
        { "skill": "SQL", "count": 58, "percentage": 66.7 },
        { "skill": "AWS", "count": 41, "percentage": 47.1 },
        { "skill": "Spark", "count": 33, "percentage": 37.9 },
        { "skill": "Kafka", "count": 28, "percentage": 32.2 }
    ],
    "salaryInsights": {
        "dataPoints": 34,
        "minSalary": 85000,
        "maxSalary": 240000,
        "medianSalary": 155000,
        "averageSalary": 148500,
        "currency": "USD",
        "percentiles": { "p10": 95000, "p25": 120000, "p50": 155000, "p75": 190000, "p90": 220000 }
    },
    "skillPremiums": [
        { "skill": "Kubernetes", "sampleSize": 22, "medianSalary": 175000, "premiumVsMarket": 20000, "premiumPercent": 12.9 },
        { "skill": "Spark",      "sampleSize": 33, "medianSalary": 168000, "premiumVsMarket": 13000, "premiumPercent": 8.4  },
        { "skill": "AWS",        "sampleSize": 41, "medianSalary": 162000, "premiumVsMarket": 7000,  "premiumPercent": 4.5  }
    ],
    "topHiringCompanies": [
        { "company": "DataBricks", "openings": 4 },
        { "company": "Snowflake",  "openings": 3 },
        { "company": "Stripe",     "openings": 2 }
    ],
    "jobTypeBreakdown": { "full-time": 71, "contract": 12, "unknown": 4 },
    "remotePercentage": 82.8,
    "seniorityBreakdown": {
        "intern": 0, "junior": 8.0, "mid": 21.8, "senior": 41.4, "staff": 6.9,
        "principal": 3.4, "lead": 5.7, "manager": 4.6, "director": 1.1,
        "vp-or-above": 0, "unknown": 7.1
    },
    "experienceRequirements": {
        "averageYearsMin": 4.2,
        "averageYearsMax": 7.1,
        "requireExperiencePercent": 78.2,
        "sampleSize": 68
    },
    "degreeRequirements": {
        "bachelorsRequiredPercent": 34.5,
        "mastersOrAbovePercent": 6.9,
        "noDegreeMentionedPercent": 51.7,
        "hardRequirementPercent": 12.6
    },
    "skillCategoryDemand": {
        "Languages": 28.7, "Frameworks": 11.5, "Cloud": 18.4,
        "Data": 33.3, "AI/ML": 5.7, "Other": 2.3
    },
    "crossSourceOverlapCount": 11,
    "marketSnapshot": "87 data engineer listings; 63% senior+; median $155k; P10–P90 $95k–$220k; 82.8% remote; Data 33.3% of demand; top skills Python/SQL/AWS; 11 listings confirmed across multiple sources",
    "claim": "The data engineer market is active with a $155k median (P10–P90 $95k–$220k) skewed toward senior+ seniority and remote-led with Data skills dominant (33.3% of demand).",
    "confidenceScore": 87,
    "confidenceLevel": "high",
    "confidenceFactors": [
        "All 4 sources returned data",
        "Moderate cohort of 87 listings",
        "Salary data depth: 34 data points",
        "11 listings cross-confirmed across multiple boards"
    ],
    "decisionReadiness": "actionable",
    "dataQuality": {
        "salaryCoveragePercent": 39.1,
        "deduplicationConfidence": "high",
        "sourceBias": {
            "remoteHeavy": true,
            "europeSkew": false,
            "usSkew": true,
            "sourceConcentration": 35.6,
            "dominantSource": "arbeitnow"
        },
        "notes": [
            "82.8% of listings are remote — on-site benchmarks under-represented.",
            "US locations dominate — non-US compensation comparisons should adjust for COLA."
        ]
    },
    "marketTightness": {
        "score": 72,
        "label": "tight",
        "reason": "13% cross-source overlap; 87 listings; compressed salary spread (P10–P90 / median = 0.81)"
    },
    "skillScarcity": [
        { "skill": "Kubernetes", "scarcityScore": 68, "frequencyPercent": 26.4, "premiumPercent": 12.9, "reason": "+12.9% salary premium with 26.4% market frequency" },
        { "skill": "Spark",      "scarcityScore": 62, "frequencyPercent": 37.9, "premiumPercent": 8.4,  "reason": "+8.4% salary premium with 37.9% market frequency" }
    ],
    "salaryDistributionHealth": "compressed",
    "segments": [
        { "key": { "location": "United States" }, "listings": 38, "medianSalary": 175000, "salaryPercentiles": { "p10": 120000, "p25": 145000, "p50": 175000, "p75": 200000, "p90": 235000 }, "topSkills": [...], "seniorityBreakdown": {...}, "remotePercentage": 71.1, "crossSourceConfirmedPercent": 18.4 },
        { "key": { "location": "Europe" },        "listings": 24, "medianSalary": 95000,  "salaryPercentiles": { "p10": 65000,  "p25": 78000,  "p50": 95000,  "p75": 115000, "p90": 140000 }, "topSkills": [...], "seniorityBreakdown": {...}, "remotePercentage": 91.7, "crossSourceConfirmedPercent": 8.3  }
    ],
    "trendInsights": {
        "sinceLastRun": true,
        "previousRunAt": "2026-04-25T14:32:00.000Z",
        "daysSincePreviousRun": 7.0,
        "listingGrowthRate": 12.5,
        "salaryMedianChange": 7000,
        "salaryMedianChangePercent": 4.7,
        "remotePercentageChange": 2.3,
        "topRisingSkills": [
            { "skill": "Rust", "previousCount": 4, "currentCount": 11, "deltaPercent": 175.0 },
            { "skill": "Databricks", "previousCount": 8, "currentCount": 14, "deltaPercent": 75.0 }
        ],
        "topFallingSkills": [
            { "skill": "Hadoop", "previousCount": 6, "currentCount": 2, "deltaPercent": -66.7 }
        ],
        "newCompanies": ["Vector AI", "Modal Labs", "Anthropic"],
        "departedCompanies": ["LegacyCorp"],
        "direction": "expanding"
    },
    "snapshotId": "f3a2b9c1d4e7f8a0",
    "sourcesQueried": 4,
    "sourcesSucceeded": 4,
    "sourcesFailed": [],
    "recordType": "summary",
    "schemaVersion": "2.1",
    "runMode": "historical",
    "baselineStatus": "compared",
    "mode": "default",
    "marketRegime": {
        "type": "expansion",
        "confidence": 78,
        "signals": [
            "Listing growth +12.5%",
            "Salary median +4.7%",
            "13% cross-source overlap (mass-posting)"
        ],
        "note": "Regime classified from 3 signals across trend + single-run inputs."
    },
    "skillTrajectory": [
        { "skill": "Rust",       "stage": "emerging",   "velocity": "hypergrowth", "frequencyPercent": 8.1,  "premiumPercent": 14.2, "deltaPercent": 175.0, "confidence": 100, "reason": "8.1% market frequency; +14.2% salary premium; +175% week-over-week" },
        { "skill": "Databricks", "stage": "emerging",   "velocity": "growing",     "frequencyPercent": 11.3, "premiumPercent": 9.8,  "deltaPercent": 75.0,  "confidence": 100, "reason": "11.3% market frequency; +9.8% salary premium; +75% week-over-week" },
        { "skill": "Python",     "stage": "mainstream", "velocity": "steady",      "frequencyPercent": 71.3, "premiumPercent": 2.1,  "deltaPercent": null,  "confidence": 75,  "reason": "71.3% market frequency; +2.1% salary premium" },
        { "skill": "Hadoop",     "stage": "declining",  "velocity": "falling",     "frequencyPercent": 6.7,  "premiumPercent": -3.2, "deltaPercent": -66.7, "confidence": 100, "reason": "6.7% market frequency; -3.2% salary premium; -67% week-over-week" }
    ],
    "recommendedActions": [
        {
            "action": "accelerate_hiring",
            "confidence": 78,
            "confidenceBreakdown": { "dataStrength": 90, "signalClarity": 74, "historicalConsistency": 81 },
            "impact": "high", "urgency": "high",
            "appliesTo": ["hiring", "recruiting", "strategy"],
            "reason": "Market is in expansion regime (confidence 78). Listing growth +12.5%; Salary median +4.7%. Move now while supply still meets demand."
        },
        {
            "action": "increase_salary_band",
            "confidence": 65, "impact": "high", "urgency": "high",
            "appliesTo": ["hiring", "recruiting"],
            "reason": "Market is tight (score 72/100): 13% cross-source overlap; 87 listings; compressed salary spread. Median is $155k — bands below this will struggle to attract candidates."
        },
        {
            "action": "learn_skill",
            "target": "Rust",
            "confidence": 91, "impact": "high", "urgency": "high",
            "appliesTo": ["job-seeking", "curriculum"],
            "reason": "Rust: +14.2% salary premium with 8.1% market frequency. Scarcity score 78/100 — high salary lift with low market saturation."
        },
        {
            "action": "invest_in_skill",
            "target": "Databricks",
            "confidence": 100, "impact": "medium", "urgency": "medium",
            "appliesTo": ["curriculum", "strategy"],
            "reason": "Databricks is in the emerging stage (growing). 11.3% market frequency; +9.8% salary premium; +75% week-over-week. Early adopters get the premium before mainstream saturation."
        }
    ],
    "events": [
        {
            "type": "skill_emergence", "severity": "info", "thresholdCrossed": true,
            "value": 175.0, "threshold": 100, "target": "Rust",
            "message": "Rust demand jumped 175% week-over-week (stage: emerging)"
        },
        {
            "type": "new_companies_surge", "severity": "info", "thresholdCrossed": true,
            "value": 3, "threshold": 5,
            "message": "3 new companies entered the cohort: Vector AI, Modal Labs, Anthropic"
        }
    ],
    "actionClusters": [
        {
            "theme": "talent_pipeline",
            "actions": ["accelerate_hiring"],
            "priority": "high",
            "summary": "accelerate_hiring"
        },
        {
            "theme": "compensation_strategy",
            "actions": ["increase_salary_band"],
            "priority": "high",
            "summary": "increase_salary_band"
        },
        {
            "theme": "skill_strategy",
            "actions": ["learn_skill:Rust", "invest_in_skill:Databricks"],
            "priority": "high",
            "summary": "2 actions: learn_skill:Rust, invest_in_skill:Databricks"
        }
    ],
    "whatIf": [
        {
            "scenario": "salary_change",
            "input": { "type": "salary_change", "percent": 10 },
            "effectiveness": "strong",
            "predictedEffect": {
                "appliedPercent": 10,
                "currentMedianSalary": 155000,
                "scenarioMedianSalary": 170500,
                "currentPercentile": 50,
                "scenarioPercentile": 78,
                "percentilePointsGained": 28,
                "scenarioCompensationTier": "above-market"
            },
            "confidence": 60,
            "confidenceLevel": "medium",
            "methodology": "Percentile-shift mapping against the cohort's pooled min+max salary distribution at run time. Tier classification uses fixed cohort-median ratio thresholds (0.85 / 1.10 / 1.35).",
            "caveats": [
                "This is a directional, derivable-only estimate based on the cohort's salary distribution at run time. It is not a forecast.",
                "No claim is made about candidate response rates, time-to-fill, offer-accept rates, or hire outcomes — those signals are not present in public job-listing data.",
                "Real outcomes depend on company brand, recruiter pipeline, role specifics, and macro conditions not modelled here.",
                "Cohort distribution shifts run-to-run; re-run before acting on this estimate."
            ],
            "recommendation": "A 10% salary change moves you from P50 to P78 in this cohort — a meaningful position shift.",
            "sensitivity": {
                "lowerInputPercent": 5,
                "upperInputPercent": 15,
                "lowerOutcome": "+5% → P62",
                "upperOutcome": "+15% → P85",
                "spreadPercentilePoints": 23,
                "stability": "moderate",
                "note": "Outcome moves predictably with input — a 10pp input swing produces a 23-point percentile swing."
            }
        },
        {
            "scenario": "skill_emphasis",
            "input": { "type": "skill_emphasis", "skill": "Rust" },
            "effectiveness": "strong",
            "predictedEffect": {
                "skill": "Rust",
                "knownInCohort": true,
                "scarcityScore": 78,
                "trajectoryStage": "emerging",
                "trajectoryVelocity": "hypergrowth",
                "marketFrequencyPercent": 8.1,
                "salaryPremiumPercent": 14.2
            },
            "confidence": 60,
            "confidenceLevel": "medium",
            "methodology": "Skill is matched (case-insensitive) against the cohort's skillScarcity, skillTrajectory, skillPremiums, and topSkills outputs. No external benchmark or hire-outcome data is used.",
            "caveats": [
                "This is a market-positioning estimate, not a hire/job-acquisition forecast.",
                "Skill demand changes over time; re-run before acting on this estimate.",
                "Premium percentages are sample-size gated (≥5 listings); skills below that threshold return null premium."
            ],
            "recommendation": "Adding \"Rust\" aligns with a high-leverage position: emerging stage with scarcity score 78/100, +14.2% salary premium.",
            "sensitivity": null
        }
    ],
    "decisionTension": [
        {
            "between": ["increase_salary_band", "tighten_role_specs"],
            "tension": "cost_vs_selectivity",
            "explanation": "Raising salary improves candidate positioning, while tightening role specs reduces the eligible pool. Doing both at once may produce a small, expensive hire pipeline that misses both levers individually.",
            "recommendedBalance": "In tight markets prioritise the salary increase first; defer spec tightening unless inbound pipeline volume becomes excessive."
        }
    ],
    "rejectedActions": [
        {
            "action": "decrease_salary_band",
            "reason": "Market is tight (score 72/100). Lowering salary would reduce competitiveness against a pipeline that already favours employers raising bands. Not recommended."
        },
        {
            "action": "expand_geographic_search",
            "reason": "82.8% of listings are remote — geographic expansion adds no opportunity coverage when the market is location-agnostic. Use remote-first sourcing instead."
        },
        {
            "action": "hold_strategy",
            "reason": "Market regime is expansion with confidence 78/100 — there is a clear directional edge. Doing nothing is not the right read for this cohort."
        }
    ],
    "marketMemory": {
        "regimeHistory": [
            { "regime": "expansion", "at": "2026-04-04T14:32:00.000Z" },
            { "regime": "expansion", "at": "2026-04-11T14:32:00.000Z" },
            { "regime": "expansion", "at": "2026-04-18T14:32:00.000Z" },
            { "regime": "expansion", "at": "2026-04-25T14:32:00.000Z" },
            { "regime": "expansion", "at": "2026-05-02T14:32:00.000Z" }
        ],
        "regimeStability": 1.0,
        "lastInflectionDaysAgo": null,
        "pattern": "expansion_stable",
        "note": "Pattern derived from the last 5 regime classifications (capped at 12)."
    },
    "analysisMetadata": {
        "salarySampleSize": 34,
        "segmentCount": 0,
        "historicalTrackingEnabled": true,
        "incrementalApplied": false,
        "customSkillCount": 0,
        "sourceWeightsApplied": false,
        "sourcesQueried": 4,
        "sourcesSucceeded": 4,
        "mode": "default"
    },
    "warnings": [
        "82.8% of listings are remote — on-site benchmarks under-represented.",
        "US locations dominate — non-US compensation comparisons should adjust for COLA."
    ]
}
```
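
A downstream automation would typically check a few of these fields before acting. The sketch below shows one possible gate built on `decisionReadiness`, `confidenceLevel`, and `sourcesFailed`; the `should_auto_act` helper and its policy (for example, refusing to act when any source failed) are assumptions, so adjust them to your own risk tolerance.

```python
def should_auto_act(summary, min_confidence_level="high"):
    """Decide whether an automation may act on a summary record without review."""
    order = {"low": 0, "medium": 1, "high": 2}
    if summary.get("type") != "summary":
        raise ValueError("expected the summary record")
    ready = summary.get("decisionReadiness") == "actionable"
    level = order.get(summary.get("confidenceLevel"), -1)
    confident = level >= order[min_confidence_level]
    all_sources_ok = not summary.get("sourcesFailed")
    return ready and confident and all_sources_ok


report = {
    "type": "summary",
    "decisionReadiness": "actionable",
    "confidenceLevel": "high",
    "sourcesFailed": [],
}
print(should_auto_act(report))  # True
print(should_auto_act({**report, "decisionReadiness": "monitor"}))  # False
```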

Each subsequent item is a **normalized job listing**:

```json
{
    "type": "job",
    "source": "remotive",
    "title": "Senior Data Engineer",
    "company": "Snowflake",
    "location": "Worldwide",
    "remote": true,
    "jobType": "full-time",
    "salaryMin": 160000,
    "salaryMax": 210000,
    "salaryCurrency": "USD",
    "description": "We are looking for a Senior Data Engineer to build and maintain our core data platform...",
    "skills": ["Python", "SQL", "Spark", "Kafka", "Airflow", "AWS", "Docker", "Kubernetes"],
    "tags": ["data", "engineering", "big-data"],
    "postedDate": "2026-05-02T08:00:00.000Z",
    "url": "https://remotive.com/remote-jobs/software-dev/senior-data-engineer-12345",
    "applyUrl": "https://remotive.com/remote-jobs/software-dev/senior-data-engineer-12345",
    "seniorityLevel": "senior",
    "experienceYearsMin": 5,
    "experienceYearsMax": 8,
    "degreeRequired": "bachelors",
    "degreeIsHardRequirement": false,
    "skillCategoryProfile": "Data",
    "crossSourceConfirmed": true,
    "crossSourceCount": 2,
    "compensationTier": "above-market",
    "recommendedAction": "apply-now",
    "actionReason": "Above-market compensation tier (110–135% of market median) with disclosed salary at a named company.",
    "recordType": "job"
}
```
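
The `compensationTier` value can be sanity-checked against the summary's median. The sketch below uses the ratio thresholds quoted in the what-if methodology string (0.85 / 1.10 / 1.35). Only the `above-market` label appears in this README; the other three labels and the use of the salary midpoint are assumptions for illustration.

```python
# (threshold, label) pairs checked from highest to lowest ratio.
# Only "above-market" is a label shown in this README; the rest are placeholders.
TIERS = [(1.35, "top-of-market"), (1.10, "above-market"), (0.85, "at-market")]


def compensation_tier(salary_min, salary_max, cohort_median):
    """Classify a listing by the ratio of its salary midpoint to the cohort median."""
    if salary_min is None or salary_max is None or not cohort_median:
        return None
    ratio = (salary_min + salary_max) / 2 / cohort_median
    for threshold, label in TIERS:
        if ratio >= threshold:
            return label
    return "below-market"


# The listing above: midpoint 185,000 against the 155,000 cohort median.
print(compensation_tier(160000, 210000, 155000))  # above-market
```

The midpoint of 185,000 is about 119% of the 155,000 median, which falls in the 110–135% band named in `actionReason`.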

#### Output Fields — Summary Report

| Field | Type | Description |
|-------|------|-------------|
| `type` | string | Always `"summary"` for the report record |
| `query` | string | The search query used |
| `location` | string\|null | Location filter applied (if any) |
| `analyzedAt` | string | ISO timestamp of when the analysis ran |
| `totalListings` | number | Total deduplicated job listings found |
| `sourceBreakdown` | object | Count of listings per source (e.g., `{"remotive": 24, "arbeitnow": 31}`) |
| `topSkills` | array | Top 30 skills ranked by frequency, each with `skill`, `count`, and `percentage` |
| `salaryInsights` | object\|null | Salary statistics: `dataPoints`, `minSalary`, `maxSalary`, `medianSalary`, `averageSalary`, `currency`, plus `percentiles` (`p10`/`p25`/`p50`/`p75`/`p90`) when ≥5 data points |
| `skillPremiums` | array | Per-skill median salary lift vs cohort median, each with `skill`, `sampleSize`, `medianSalary`, `premiumVsMarket`, `premiumPercent` (only skills with ≥5 salary data points) |
| `topHiringCompanies` | array | Top 20 companies by number of open positions, each with `company` and `openings` |
| `jobTypeBreakdown` | object | Count per job type: `full-time`, `part-time`, `contract`, `internship`, `temporary`, `unknown` |
| `remotePercentage` | number | Percentage of listings flagged as remote |
| `seniorityBreakdown` | object | Percentage of listings at each seniority level: `intern`, `junior`, `mid`, `senior`, `staff`, `principal`, `lead`, `manager`, `director`, `vp-or-above`, `unknown` |
| `experienceRequirements` | object | `averageYearsMin`, `averageYearsMax`, `requireExperiencePercent`, `sampleSize` |
| `degreeRequirements` | object | `bachelorsRequiredPercent`, `mastersOrAbovePercent`, `noDegreeMentionedPercent`, `hardRequirementPercent` |
| `skillCategoryDemand` | object | Percentage of listings whose dominant skill area is each category: `Languages`, `Frameworks`, `Cloud`, `Data`, `AI/ML`, `Other` |
| `crossSourceOverlapCount` | number | Count of listings that appeared on multiple boards before deduplication (legitimacy signal) |
| `marketSnapshot` | string | Slack/email-ready one-line headline summarizing the cohort (metric-first) |
| `claim` | string | Analyst-style one-sentence conclusion about the cohort (paste verbatim into reports / Slack / agent prompts) |
| `confidenceScore` | number | 0–100 score combining source coverage (30%) + cohort size (30%) + salary data depth (25%) + cross-source overlap (15%) |
| `confidenceLevel` | string | Banded confidence: `high` (≥75), `medium` (≥50), `low` (<50). Use this in Dify/n8n switch nodes. |
| `confidenceFactors` | string\[] | Plain-English explanations of WHY confidence is what it is — usable verbatim in reports |
| `decisionReadiness` | string | Automation gate: `actionable` (confidence ≥70 + ≥10 salary points + ≥10 listings), `monitor` (worth tracking but don't auto-act), `insufficient-data` (<10 listings) |
| `dataQuality` | object | Auditability block: `salaryCoveragePercent`, `deduplicationConfidence` (high/medium/low), `sourceBias` ({remoteHeavy, europeSkew, usSkew, sourceConcentration, dominantSource}), `notes[]` plain-English bias warnings |
| `marketTightness` | object | Supply/demand index: `{ score (0–100), label: tight/balanced/loose/unknown, reason }`. Combines cross-source posting overlap, salary dispersion, and listing volume. |
| `skillScarcity` | object\[] | Top 10 skills ranked by `scarcityScore` (high salary premium AND low frequency). Each: `{ skill, scarcityScore (0–100), frequencyPercent, premiumPercent, reason }`. Empty when cohort < 20 listings. |
| `salaryDistributionHealth` | string | `wide` (P10–P90 spread > 1.2× median) / `balanced` / `compressed` (< 0.5×) / `unknown`. Compressed = mature/standardised market. |
| `segments` | object\[] | Per-segment analytics when `groupBy` is set. Each: `{ key, listings, medianSalary, salaryPercentiles, topSkills, seniorityBreakdown, remotePercentage, crossSourceConfirmedPercent }`. Capped at 50. |
| `trendInsights` | object\|null | Cross-run trends when `enableHistoricalTracking` is on AND a prior snapshot exists within `lookbackDays`. `{ sinceLastRun, previousRunAt, daysSincePreviousRun, listingGrowthRate, salaryMedianChange, salaryMedianChangePercent, remotePercentageChange, topRisingSkills[], topFallingSkills[], newCompanies[], departedCompanies[], direction }`. Null on first run. |
| `snapshotId` | string | 16-char SHA-256 hash over query + location + sources + listing fingerprint. Compare across runs to detect when the cohort actually changed. |
| `schemaVersion` | string | Output contract version (semver-style) — currently `"2.1"`. Major bumps signal breaking changes; minor bumps signal additive expansions. 2.1 is additive-only since 2.0 (added: `actionClusters`, `whatIf` + `sensitivity`, `marketMemory`, `decisionTension`, `rejectedActions`, action `confidenceBreakdown`). Branch on this in long-lived integrations to opt into new features explicitly. |
| `runMode` | string | What kind of run this was: `snapshot` (one-shot), `historical` (snapshot + trend computation), `incremental` (snapshot + trend + drop already-seen URLs). |
| `baselineStatus` | string | Lifecycle of the historical snapshot for this run: `created` (first baseline written), `compared` (trend insights computed against an existing baseline), `expired` (prior baseline was older than `lookbackDays` — fresh one written, trends null this run), `disabled` (historical tracking off). |
| `analysisMetadata` | object | Run-level metadata about the analytics computation: `salarySampleSize`, `segmentCount`, `historicalTrackingEnabled`, `incrementalApplied`, `customSkillCount`, `sourceWeightsApplied`, `sourcesQueried`, `sourcesSucceeded`, `mode`. Distinct from `dataQuality` (which is about the cohort's biases, not the run's machinery). |
| `warnings` | string\[] | Top-level run-level warnings (sources failed, low confidence, expired baseline, critical events, etc.). Promotes `dataQuality.notes` alongside other run-level signals so downstream consumers don't have to walk into nested objects. Empty array when nothing notable. **Read this before acting on the cohort's analytics.** |
| `mode` | string | Active persona preset: `default` / `job_seeker` / `recruiter` / `analyst`. Echoed on the summary so downstream automation can branch on the persona that produced the output. |
| `marketRegime` | object | State classification: `{ type (expansion/contraction/stagnation/volatility/unknown), confidence (0–100), signals[] (which thresholds fired), note }`. Combines trend + single-run signals; confidence is materially higher when historical tracking is on. |
| `recommendedActions` | object\[] | Cohort-level action engine (capped at 12). Each: `{ action, target?, confidence (0–100), confidenceBreakdown: { dataStrength, signalClarity, historicalConsistency }, impact (high/medium/low), urgency (high/medium/low), appliesTo[] (hiring/recruiting/job-seeking/curriculum/strategy/monitoring), reason }`. Sorted by `mode` audience priority, then urgency, then confidence. **Branch on `action` (stable enum string)** for automation; filter by `appliesTo` to surface only the actions a given persona cares about. Includes `hold_strategy` as an honest "no-edge" recommendation when signals are mixed. |
| `actionClusters` | object\[] | Recommended actions grouped by theme: `compensation_strategy`, `talent_pipeline`, `skill_strategy`, `monitoring_strategy`, `source_strategy`, `general`. Each: `{ theme, actions[], priority (high/medium/low), summary }`. Sorted high → low priority then by cluster size. Reduces noise when 8–12 actions belong to a few strategic surfaces. |
| `whatIf` | object\[] | Counterfactual scenarios with **honest, derivable-only** outcomes (percentile shift, tier change, scarcity match) — never invented forecasts. Each: `{ scenario, input, effectiveness (strong/moderate/limited/none/unknown), predictedEffect, confidence (hard-capped at 60), confidenceLevel, methodology, caveats[], recommendation, sensitivity }`. `sensitivity` (salary scenarios only) ships `lowerOutcome`/`upperOutcome` at user-input ±5pp + a `stability` enum (`low` / `moderate` / `high` / `unknown`) so you can see if the percentile shift is robust to small input variation. Auto-generated when `whatIfScenarios` input is omitted; honors user scenarios + constraints when supplied. Scenario types: `salary_change` (% delta) and `skill_emphasis` (named skill). |
| `decisionTension` | object\[] | Trade-off pairs detected across `recommendedActions[]`. Each: `{ between: [actionA, actionB], tension (cost_vs_selectivity / speed_vs_quality / remote_vs_local_reach / act_now_vs_wait / early_mover_vs_safe_bet / depth_vs_breadth), explanation, recommendedBalance }`. Surfaces when two recommended actions work against each other under a single sourcing pipeline. Empty when no contradictory pairs are present. |
| `rejectedActions` | object\[] | Anti-recommendations — actions explicitly NOT recommended for this cohort, with reason. Each: `{ action, target?, reason }`. The dual of `hold_strategy`: instead of staying silent on the obvious wrong moves, the system surfaces them and explains why it skipped them. Builds trust by showing the engine considered alternatives. Empty when no anti-recommendations apply. |
| `marketMemory` | object | Bounded last-12-runs regime history with pattern detection. `{ regimeHistory[] (regime + at), regimeStability (0..1), lastInflectionDaysAgo, pattern, note }`. Patterns: `expansion_stable` / `expansion_weakening` / `contraction_stable` / `contraction_deepening` / `volatile_shifting` / `stagnation_persistent` / `inflection_recent` / `insufficient-history` (until 3 snapshots) / `mixed`. Activates with `enableHistoricalTracking`; meaningful at 3+ snapshots. Lets you reason in patterns, not just deltas. |
| `skillTrajectory` | object\[] | Per-skill lifecycle classification (top 20 skills): `{ skill, stage (declining/stable/emerging/mainstream/saturated), velocity (hypergrowth/growing/steady/cooling/falling/unknown), frequencyPercent, premiumPercent, deltaPercent, confidence, reason }`. Sorted emerging → mainstream → other. The bridge between rising/falling counts and "what does it mean for me?" |
| `events` | object\[] | Threshold-crossing events ready for downstream alerting. Each: `{ type, severity (critical/warning/info), thresholdCrossed, value, threshold, target?, message }`. Event types: `salary_spike`, `salary_drop`, `listing_growth_spike`, `listing_drop`, `remote_share_shift`, `skill_emergence`, `skill_collapse`, `new_companies_surge`, `cohort_collapse`. Thresholds user-overridable via the `eventThresholds` input. Sorted critical → warning → info. |
| `sourcesQueried` | number | Number of job board sources queried this run |
| `sourcesSucceeded` | number | Number of job board sources that returned data |
| `sourcesFailed` | string\[] | Names of sources that failed this run; empty when all succeeded |
| `recordType` | string | Discriminator for downstream filtering — `summary` for the summary record, `job` for individual listings, `error` for error records. (`type` is a deprecated alias kept for back-compat.) |
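
The gate fields above (`confidenceScore`, `confidenceLevel`, `decisionReadiness`, `sourcesFailed`) are meant to be branched on. A minimal Python sketch of such a router follows. The banding cut-offs come from the table; the branch names `skip`, `auto-act`, and `review` are hypothetical, as is the choice to downgrade a run when any source failed.

```python
from typing import Any


def confidence_level(score: float) -> str:
    """Band a 0-100 confidence score using the documented cut-offs."""
    if score >= 75:
        return "high"
    if score >= 50:
        return "medium"
    return "low"


def route_summary(summary: dict[str, Any]) -> str:
    """Pick a downstream branch from the summary record's gate fields.

    Branch names are illustrative, not part of the actor's output.
    """
    readiness = summary.get("decisionReadiness")
    if readiness == "insufficient-data":
        return "skip"
    # Assumption: a run with failed sources should get a human look first.
    if readiness == "actionable" and not summary.get("sourcesFailed"):
        return "auto-act"
    return "review"
```

In practice you would read `confidenceLevel` straight from the record; the banding function is only useful when you store raw scores and want to re-band them later.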

#### Output Fields — Job Listing

| Field | Type | Description |
|-------|------|-------------|
| `type` | string | Always `"job"` for individual listings (deprecated alias of `recordType`, kept for back-compat) |
| `source` | string | Which board the listing came from: `remotive`, `arbeitnow`, `jobicy`, or `hn-whoishiring` |
| `title` | string | Job title (extracted or parsed from source) |
| `company` | string | Company name (HN listings may show `"Unknown (HN)"` if parsing fails) |
| `location` | string\|null | Job location (may be `"Remote"`, a city, or `null`) |
| `remote` | boolean | Whether the position is remote |
| `jobType` | string\|null | Normalized job type: `full-time`, `part-time`, `contract`, `internship`, `temporary` |
| `salaryMin` | number\|null | Minimum salary (annual, in stated currency) |
| `salaryMax` | number\|null | Maximum salary (annual, in stated currency) |
| `salaryCurrency` | string\|null | Currency code: `USD` or `EUR` |
| `description` | string | Job description text (HTML stripped, max 2,000 chars) |
| `skills` | string\[] | Technologies detected in the description (e.g., `["Python", "AWS", "Docker"]`) |
| `tags` | string\[] | Tags from the source API (empty for HN listings) |
| `postedDate` | string\|null | ISO timestamp of when the job was posted |
| `url` | string | URL to the original listing |
| `applyUrl` | string\|null | Direct application URL (when available) |
| `seniorityLevel` | string | One of `intern`, `junior`, `mid`, `senior`, `staff`, `principal`, `lead`, `manager`, `director`, `vp-or-above`, `unknown` |
| `experienceYearsMin` | number\|null | Minimum years of experience requested (parsed from description) |
| `experienceYearsMax` | number\|null | Maximum years of experience requested |
| `degreeRequired` | string | `bachelors`, `masters`, `phd`, `any-degree`, `no-mention` |
| `degreeIsHardRequirement` | boolean | True if the degree is required (vs preferred / equivalent experience accepted) |
| `skillCategoryProfile` | string\|null | Dominant skill area for this role: `Languages`, `Frameworks`, `Cloud`, `Data`, `AI/ML`, `Other` |
| `crossSourceConfirmed` | boolean | True if this listing appeared on multiple job boards before deduplication |
| `crossSourceCount` | number | Number of source boards this listing appeared on |
| `compensationTier` | string | Salary vs market median for this query: `below-market` (<85%), `at-market` (85–110%), `above-market` (110–135%), `premium` (>135%), `unknown` (no salary data) |
| `recommendedAction` | string | Decision enum for routing in Dify/n8n workflows: `apply-now`, `research-company`, `review-fit`, `skip-low-detail` |
| `actionReason` | string | Plain-English sentence explaining WHY `recommendedAction` is what it is — paste verbatim into Slack/email/agent prompts |
| `recordType` | string | Always `"job"` for listings (mirrors `type` for forward-compatibility with the standard Apify discriminator pattern) |
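
To make the `compensationTier` bands concrete, here is a small Python sketch that reproduces the documented thresholds. It is not the actor's code: the table does not say which figure is compared against the median (min, max, or midpoint) or which side of each boundary is inclusive, so the function takes a single salary figure and the boundary handling is an assumption.

```python
from typing import Optional


def compensation_tier(salary: Optional[float], market_median: Optional[float]) -> str:
    """Classify a salary against the cohort median using the documented bands."""
    if salary is None or not market_median:
        return "unknown"  # no salary data, or no median to compare against
    ratio = salary / market_median
    if ratio < 0.85:
        return "below-market"
    if ratio <= 1.10:  # assumption: upper bounds are inclusive
        return "at-market"
    if ratio <= 1.35:
        return "above-market"
    return "premium"
```

For example, a listing at 120,000 against a 100,000 median falls in `above-market`, and one at 140,000 falls in `premium`.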

### Common workflows

#### One-shot market pulse (no schedule)

Run with no historical-tracking flags. Read the summary record's `marketSnapshot` and `claim` for an instant Slack/email digest, then iterate the per-job records and filter on `recommendedAction === "apply-now"` for high-priority leads.

#### Weekly salary trend monitoring (scheduled)

Set `enableHistoricalTracking: true` + `lookbackDays: 14`. Schedule weekly. Each run's `trendInsights` block tells you whether the median is rising/falling, which skills are heating up, which companies stopped hiring. Pipe into a Slack alert: `if (trendInsights.salaryMedianChangePercent > 5) sendAlert(...)`.
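
The inline check above can be written out as a small Python helper. This is a sketch under stated assumptions: it alerts on moves in either direction (the one-liner above only covers rises), the 5% default is just the example threshold, and the message wording is hypothetical. It returns `None` when `trendInsights` is null, which the field reference says happens on the first run.

```python
from typing import Any, Optional


def salary_alert(summary: dict[str, Any], threshold_percent: float = 5.0) -> Optional[str]:
    """Return an alert message when the median salary moved past the threshold."""
    trend = summary.get("trendInsights")
    if not trend:
        return None  # no prior snapshot to compare against
    change = trend.get("salaryMedianChangePercent")
    if change is None or abs(change) <= threshold_percent:
        return None
    direction = "up" if change > 0 else "down"
    return f"Median salary {direction} {abs(change):.1f}% since last run"
```

Pass the returned string to whatever notifier you already use; nothing here sends a message on its own.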

#### Daily fresh-listings feed (scheduled, incremental)

`enableHistoricalTracking: true` + `incremental: true`. Schedule daily. Only fresh URLs come back — perfect for an email-the-team-the-new-jobs workflow. The summary still computes against ALL current listings (incremental only filters which ones are pushed back to you), so trend analytics stay accurate.

#### Cross-region salary comparison (single run)

`groupBy: ["location"]` returns per-location segments with their own salary percentiles, top skills, and seniority breakdown. Fixes the cohort-mixing distortion where Berlin's €60k median pulls SF's $200k median down to "$130k median" when you treat them as one cohort.
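
Once the run finishes, the `segments` array on the summary record can be ranked directly. The sketch below uses only the documented `key`, `listings`, and `medianSalary` fields; the `min_listings` cut-off is a hypothetical filter for thin segments. The segment shape does not list a currency, so check that segments share one before comparing medians across regions.

```python
from typing import Any


def rank_segments(summary: dict[str, Any], min_listings: int = 5) -> list[dict[str, Any]]:
    """Return segments with a median salary, highest median first."""
    usable = [
        seg
        for seg in summary.get("segments") or []
        if seg.get("medianSalary") is not None and seg.get("listings", 0) >= min_listings
    ]
    return sorted(usable, key=lambda seg: seg["medianSalary"], reverse=True)
```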

#### Talent pipeline monitor for a single company

`companyName: "Stripe"` + `enableHistoricalTracking: true`. Schedule weekly. `trendInsights.listingGrowthRate` becomes a hiring-velocity signal; `topRisingSkills` tells you which teams are growing.

#### Niche-market intelligence (custom skills)

Add `customSkills` for the technologies your competitive landscape cares about that the built-in 80 don't cover (e.g. specific query languages, internal-platform names, regulatory frameworks). Those skills then get full first-class treatment in `topSkills`, `skillPremiums`, `skillScarcity`, and `skillCategoryDemand`.

### What makes this actor different (vs other job market analysis tools)

This actor is an alternative to **LinkedIn Talent Insights**, **Lightcast** (formerly Burning Glass), **Revelio Labs**, **Datapeople**, **Greenhouse Reports**, **Ashby Analytics**, generic **job scrapers** and **job aggregators** — but built for automation workflows rather than dashboards or sales-team consumption.

Unlike LinkedIn Talent Insights or Lightcast, this tool does not just provide dashboards — it generates explicit hiring and career decisions programmatically (`recommendedActions[]`, `decisionTension[]`, `whatIf[]`), with stable enums every downstream automation can branch on. The output is decisions, not visualisations.

| Approach | What you get | What's missing |
|----------|--------------|----------------|
| Generic job board scraper (single-source) | Raw listings | No skill extraction, no salary stats, no decision layer, no cross-board overlap signal |
| LinkedIn / Indeed / Glassdoor scrapers | Larger volume | No multi-source aggregation; auth-walled; high block risk; flat output |
| Lightcast / Revelio / LinkedIn Talent Insights (enterprise) | Macro labor data, employee-level intel | $$$$ and behind sales-call paywalls; not embeddable in your automation |
| **Job Market Intelligence (this actor)** | Decision-ready output (`recommendedAction`, `compensationTier`, `decisionReadiness`); cohort analytics (percentiles, premiums, market tightness, scarcity); per-segment breakdowns; cross-run trend insights; data-quality auditability; trade-off detection (`decisionTension`); anti-recommendations (`rejectedActions`); counterfactual simulation (`whatIf` with sensitivity) | Public-API coverage only (Remotive / Arbeitnow / Jobicy / HN); no LinkedIn / Indeed / Glassdoor; no candidate-side data |

The positioning is **composable labor-market strategy engine for automation**: stable enums on every record so Dify / n8n / Zapier / SQL can branch without prompt engineering, plus the cohort-level analytics and trend layers that turn one-shot scrapes into a monitoring product, plus the strategy layer (recommended actions / trade-offs / what-if scenarios) that turns analytics into decisions.

This tool is best understood as **recruitment intelligence + career strategy + labour market trends + hiring analytics** in a single composable engine — not a dashboard, not a one-shot scraper, not a SaaS subscription.

### Use Cases

- **Job seekers** — Search for roles matching your skills, compare salary ranges across companies, and discover which technologies are most in-demand for your target position
- **Recruiters and talent acquisition teams** — Monitor competitor hiring activity, understand which skills the market demands, and benchmark compensation packages before writing job descriptions
- **HR and workforce planning analysts** — Track hiring trends over time by scheduling periodic runs to build a longitudinal dataset of skill demand and salary movement
- **Career coaches and bootcamp instructors** — Identify the most requested programming languages, frameworks, and cloud platforms so you can align curriculum with real employer needs
- **Startup founders** — Research the talent landscape before hiring. See what competitors pay, which skills are scarce, and whether remote or on-site roles dominate your niche
- **Data journalists and researchers** — Gather structured, source-attributed job market data for articles, reports, or academic studies on labor economics and tech hiring

### API & Programmatic Access

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/job-market-intelligence").call(run_input={
    "query": "data engineer",
    "remoteOnly": True,
    "analyzeSkills": True,
    "analyzeSalaries": True,
    "maxResults": 200,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # `recordType` is the discriminator; `type` is its deprecated alias.
    record_type = item.get("recordType") or item.get("type")
    if record_type == "summary":
        print(f"Total listings: {item['totalListings']}")
        print(f"Remote %: {item['remotePercentage']}%")
        if item.get("salaryInsights"):
            si = item["salaryInsights"]
            cur = si.get("currency", "")
            print(f"Salary range: {si['minSalary']:,} - {si['maxSalary']:,} {cur}")
            print(f"Median: {si['medianSalary']:,} {cur}")
        for s in item.get("topSkills", [])[:10]:
            print(f"  {s['skill']}: {s['count']} ({s['percentage']}%)")
    elif record_type == "job":
        print(f"{item['company']} - {item['title']} ({item['source']})")
    # Other record types (e.g. `error`) are skipped here.
```

#### JavaScript

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('ryanclinton/job-market-intelligence').call({
    query: 'data engineer',
    remoteOnly: true,
    analyzeSkills: true,
    analyzeSalaries: true,
    maxResults: 200,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
// `recordType` is the discriminator; `type` is its deprecated alias.
const kind = (i) => i.recordType ?? i.type;
const summary = items.find(i => kind(i) === 'summary');
const jobs = items.filter(i => kind(i) === 'job');

// The summary can be missing if the run only produced error records.
if (summary) {
    console.log(`Found ${summary.totalListings} listings, ${summary.remotePercentage}% remote`);
    console.log('Top skills:', (summary.topSkills ?? []).slice(0, 5).map(s => s.skill).join(', '));
}
jobs.forEach(j => console.log(`${j.company} - ${j.title} (${j.source})`));
```

#### cURL

```bash
# Start the actor
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~job-market-intelligence/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "data engineer",
    "remoteOnly": true,
    "analyzeSkills": true,
    "maxResults": 200
  }'

# Fetch results (use defaultDatasetId from the response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```

### How It Works — Technical Details

```
Input: query, location, remoteOnly, datePosted, sources, maxResults
  │
  ▼
┌──────────────────────────────────────────────────────────────────┐
│ PARALLEL FETCH (Promise.allSettled — failures don't crash run)  │
│                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────┐  ┌─────────┐ │
│  │ Remotive     │  │ Arbeitnow    │  │ Jobicy   │  │ HN      │ │
│  │              │  │              │  │          │  │ Algolia │ │
│  │ GET /api/    │  │ GET /api/    │  │ GET /api │  │ GET /api│ │
│  │ remote-jobs  │  │ job-board-api│  │ /v2/     │  │ /v1/    │ │
│  │ ?search=X    │  │ ?search=X    │  │ remote-  │  │ search  │ │
│  │ &limit=N     │  │ &page=1..3   │  │ jobs     │  │ ?query= │ │
│  │              │  │              │  │ ?count=N │  │ X&tags= │ │
│  │ Salary from  │  │ Salary from  │  │ &tag=X   │  │ comment │ │
│  │ field +      │  │ description  │  │          │  │ ,ask_hn │ │
│  │ description  │  │ regex        │  │ Salary   │  │         │ │
│  │ fallback     │  │              │  │ from API │  │ Last    │ │
│  │              │  │ created_at   │  │ fields   │  │ 90 days │ │
│  │ Remote-only  │  │ = Unix epoch │  │          │  │         │ │
│  │ board        │  │              │  │ Remote-  │  │ Parse:  │ │
│  │              │  │ European     │  │ only     │  │ company │ │
│  │              │  │ focus        │  │ board    │  │ from 1st│ │
│  │              │  │              │  │          │  │ line    │ │
│  └──────┬───────┘  └──────┬───────┘  └────┬─────┘  └────┬────┘ │
│         │                 │               │              │      │
└─────────┼─────────────────┼───────────────┼──────────────┼──────┘
          │                 │               │              │
          ▼                 ▼               ▼              ▼
    ┌─────────────────────────────────────────────────────────┐
    │ NORMALIZE to NormalizedJob schema                       │
    │ (title, company, location, remote, salary, skills...)   │
    │                                                         │
    │ Skills: 80+ regex patterns across 6 categories          │
    │ (extensible via customSkills input)                     │
    │ Salary: USD/EUR regex from fields + description text    │
    │ Job type: normalize → full-time/part-time/contract/etc  │
    │ Description: strip HTML, max 2,000 chars                │
    └─────────────────────┬───────────────────────────────────┘
                          │
                          ▼
    ┌─────────────────────────────────────────────────────────┐
    │ FILTER PIPELINE (sequential)                            │
    │                                                         │
    │  1. Date filter (day=24h, week=7d, month=30d)           │
    │  2. Remote-only filter (j.remote === true)              │
    │  3. Location filter (case-insensitive substring)        │
    │     └─ Graceful fallback: if ALL removed, re-include    │
    │  4. Company name filter (case-insensitive substring)    │
    │  5. Source weighting (deterministic per-listing hash)   │
    │     └─ Only applied when sourceWeights is set           │
    │  6. Incremental drop (URLs from prior snapshot)         │
    │     └─ Only applied when incremental: true + baseline   │
    │  7. Deduplication (normalized title + URL secondary)    │
    │     ├─ Title: lowercase, strip noise tokens, sort       │
    │     ├─ URL: hostname + pathname secondary key           │
    │     └─ Tracks crossSourceCount per dedup key            │
    │  8. Cap at maxResults                                   │
    │  9. Compute market median (single salary pass)          │
    └─────────────────────┬───────────────────────────────────┘
                          │
                          ▼
    ┌─────────────────────────────────────────────────────────┐
    │ PER-JOB ENRICHMENT                                      │
    │                                                         │
    │  • seniorityLevel (regex over title + first 400 chars)  │
    │  • experienceYearsMin/Max (regex on description)        │
    │  • degreeRequired + degreeIsHardRequirement             │
    │  • skillCategoryProfile (dominant skill area)           │
    │  • crossSourceConfirmed + crossSourceCount              │
    │  • compensationTier (vs market median)                  │
    │  • recommendedAction + actionReason (decision enum)     │
    └─────────────────────┬───────────────────────────────────┘
                          │
                          ▼
    ┌─────────────────────────────────────────────────────────┐
    │ BUILD SUMMARY REPORT                                    │
    │                                                         │
    │  • Source breakdown + sourcesQueried/Succeeded/Failed   │
    │  • Top 30 skills by frequency + percentage              │
    │  • Salary: min, max, median, average + P10/25/50/75/90  │
    │  • Skill premiums (≥5 sample) vs cohort median          │
    │  • Top 20 hiring companies by openings                  │
    │  • Job type breakdown                                   │
    │  • Remote percentage                                    │
    │  • Seniority / experience / degree breakdowns           │
    │  • Skill category demand (% per category)               │
    │  • Cross-source overlap count                           │
    │  • marketTightness + skillScarcity + distribution health│
    │  • Per-segment analytics (when groupBy is set)          │
    │  • dataQuality + warnings + analysisMetadata            │
    │  • marketSnapshot + claim (Slack/email-ready)           │
    │  • snapshotId (cohort fingerprint)                      │
    │  • runMode + baselineStatus + schemaVersion             │
    └─────────────────────┬───────────────────────────────────┘
                          │
                          ▼
            ┌─────────────────────────────────┐
            │ HISTORICAL SNAPSHOT (opt-in)    │
            │                                 │
            │  enableHistoricalTracking: true │
            │   ├─ Read prior snapshot from   │
            │   │  named KV store             │
            │   ├─ Compute trendInsights      │
            │   │  (rising/falling skills,    │
            │   │  salary direction, growth)  │
            │   └─ Write fresh snapshot       │
            └─────────────────┬───────────────┘
                              │
                              ▼
              Push to Dataset:
              [summary, ...jobs]
              + Actor.setValue('SUMMARY', summary)
```
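
The parallel-fetch stage in the diagram relies on `Promise.allSettled` so that one failing board does not abort the run. The same pattern can be sketched in Python with `asyncio.gather(..., return_exceptions=True)`. The fetchers below are stubs that make no network calls, and `fetch_all` is a hypothetical name; only the `sourcesQueried` / `sourcesSucceeded` / `sourcesFailed` field names come from this README.

```python
import asyncio
from typing import Any, Awaitable, Callable

Fetcher = Callable[[], Awaitable[list[dict[str, Any]]]]


async def fetch_all(fetchers: dict[str, Fetcher]) -> dict[str, Any]:
    """Run every source fetcher concurrently and tolerate individual failures."""
    names = list(fetchers)
    results = await asyncio.gather(
        *(fetchers[name]() for name in names), return_exceptions=True
    )
    jobs: list[dict[str, Any]] = []
    failed: list[str] = []
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            failed.append(name)  # record the failure, keep the run alive
        else:
            jobs.extend(result)
    return {
        "jobs": jobs,
        "sourcesQueried": len(names),
        "sourcesSucceeded": len(names) - len(failed),
        "sourcesFailed": failed,
    }


# Stub fetchers standing in for real HTTP calls.
async def _stub_ok() -> list[dict[str, Any]]:
    return [{"title": "Data Engineer", "source": "remotive"}]


async def _stub_fail() -> list[dict[str, Any]]:
    raise RuntimeError("HTTP 503")


report = asyncio.run(fetch_all({"remotive": _stub_ok, "jobicy": _stub_fail}))
```

Here `report["sourcesFailed"]` holds `["jobicy"]` while the Remotive listing is still returned.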

#### Data Source Details

| Source | API Endpoint | Coverage | Salary Data | Notes |
|--------|-------------|----------|-------------|-------|
| **Remotive** | `remotive.com/api/remote-jobs` | Remote tech jobs worldwide | Structured field + description regex | Single page, `?search=X&limit=N` |
| **Arbeitnow** | `arbeitnow.com/api/job-board-api` | European focus, all job types | Description regex only | Paginated up to 3 pages, `created_at` is Unix timestamp |
| **Jobicy** | `jobicy.com/api/v2/remote-jobs` | Remote-first jobs | Structured `annualSalaryMin/Max` fields | `?count=N&tag=X` |
| **HN Who's Hiring** | `hn.algolia.com/api/v1/search` | Startup jobs from monthly threads | Description regex only | Searches comments from last 90 days, parses company from first line |

#### Skill Detection System

The actor scans each job description against 80+ built-in technology patterns organized into 6 categories. Add domain-specific skills via the `customSkills` input — they're treated as first-class members of the categorisation, premium, and scarcity systems.

| Category | Skills Detected |
|----------|----------------|
| **Languages** | Python, JavaScript, TypeScript, Java, Rust, C++, Ruby, PHP, Swift, Kotlin, Scala, SQL, R, Go |
| **Frameworks** | React, Angular, Vue, Next.js, Django, Flask, Spring, Rails, Laravel, FastAPI, Express, Node.js, Svelte, NestJS, .NET |
| **Cloud** | AWS, Azure, GCP, Docker, Kubernetes, Terraform, CI/CD, Jenkins, GitHub Actions, CloudFormation |
| **Data** | PostgreSQL, MongoDB, Redis, Elasticsearch, Kafka, Spark, Snowflake, BigQuery, Airflow, MySQL, DynamoDB, Cassandra, Redshift |
| **AI/ML** | Machine Learning, Deep Learning, NLP, Computer Vision, PyTorch, TensorFlow, LLM, GPT, RAG, Generative AI, Neural Network |
| **Other** | Git, Linux, Agile, REST, GraphQL, gRPC, Microservices, Scrum, DevOps, SRE |

**Special handling**: R and Go use context-aware regex to avoid false positives (e.g., "R" only matches when near "programming", "language", or other languages; "Go" matches "Golang" or "Go" in programming context).
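
One way to approximate this kind of context-aware matching in Python is to accept an ambiguous token only when a programming-related word appears nearby. The actor's real patterns are not documented here, so the context vocabulary, the 40-character window, and the function names below are all assumptions made for illustration.

```python
import re

# Hypothetical context vocabulary; the actor's real word list is not documented.
CONTEXT = re.compile(
    r"\b(programming|language|languages|python|java|sql|scala|rust|julia|backend)\b",
    re.IGNORECASE,
)


def mentions_in_context(text: str, token: str, window: int = 40) -> bool:
    """True if `token` appears as a standalone word near a context word."""
    # Reject matches glued to word characters or to +, #, & (e.g. "HR", "R&D").
    standalone = re.compile(rf"(?<![\w+#&]){re.escape(token)}(?![\w+#&])")
    for match in standalone.finditer(text):
        before = text[max(0, match.start() - window):match.start()]
        after = text[match.end():match.end() + window]
        if CONTEXT.search(before) or CONTEXT.search(after):
            return True
    return False


def detect_r(text: str) -> bool:
    return mentions_in_context(text, "R")


def detect_go(text: str) -> bool:
    # "Golang" is unambiguous; bare "Go" needs supporting context.
    if re.search(r"\bgolang\b", text, re.IGNORECASE):
        return True
    return mentions_in_context(text, "Go")
```

With this sketch, "Experience with R, Python and SQL" counts as an R mention, while "Join our R&D team" and "Go to our careers page to apply" do not match.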

#### Salary Extraction

Salary parsing uses multiple regex patterns applied to both structured API fields and free-text descriptions:

| Pattern | Example | Currency |
|---------|---------|----------|
| `$Xk - $Xk` | $120k - $180k | USD |
| `$X,XXX - $X,XXX` | $120,000 - $180,000 | USD |
| `$Xk/year` | $150k/year | USD |
| `$X,XXX/year` | $150,000/year | USD |
| `€X - €X` | €50,000 - €80,000 | EUR |

Values under 1,000 are automatically multiplied by 1,000 (treating "150" as "$150k"). The summary report computes statistics from the sorted union of all min and max salary values.
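
The pattern table and the under-1,000 rule can be approximated with two regular expressions. The following Python sketch is a reconstruction for illustration, not the actor's source: the exact regexes, the `parse_salary` and `salary_values` names, and the choice to report a single figure as both min and max are assumptions.

```python
import re
from typing import Any, Optional

CURRENCY = {"$": "USD", "€": "EUR"}
_AMOUNT = r"(\d{1,3}(?:,\d{3})+|\d+)\s*(k)?"
_RANGE = re.compile(rf"([$€])\s*{_AMOUNT}\s*(?:-|–|to)\s*[$€]?\s*{_AMOUNT}", re.IGNORECASE)
_SINGLE = re.compile(rf"([$€])\s*{_AMOUNT}\s*/\s*(?:year|yr)", re.IGNORECASE)


def _to_number(digits: str, k_suffix: Optional[str]) -> int:
    value = int(digits.replace(",", ""))
    # A "k" suffix, or any bare value under 1,000, is read as thousands.
    if k_suffix or value < 1000:
        value *= 1000
    return value


def parse_salary(text: str) -> Optional[tuple[int, int, str]]:
    """Return (min, max, currency) for the first salary found, else None."""
    match = _RANGE.search(text)
    if match:
        symbol, low, low_k, high, high_k = match.groups()
        return _to_number(low, low_k), _to_number(high, high_k), CURRENCY[symbol]
    match = _SINGLE.search(text)
    if match:
        symbol, amount, k_suffix = match.groups()
        value = _to_number(amount, k_suffix)
        return value, value, CURRENCY[symbol]  # assumption: single figure fills both
    return None


def salary_values(jobs: list[dict[str, Any]]) -> list[int]:
    """Sorted union of all non-null salaryMin and salaryMax values."""
    return sorted(
        value
        for job in jobs
        for value in (job.get("salaryMin"), job.get("salaryMax"))
        if value is not None
    )
```

Feeding `salary_values(...)` into `statistics.median` gives a cohort median computed the way the paragraph above describes.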

#### Deduplication Algorithm

Two-phase deduplication for resilience against the same role posted across multiple boards with cosmetic title differences.

1. **Title normalization** — the title is lowercased, stripped of punctuation, and tokenized. Noise tokens (`senior`, `sr`, `jr`, `mid`, `junior`, `staff`, `principal`, `lead`, `remote`, `fulltime`, `i`, `ii`, `iii`, articles, prepositions) are removed so `"Senior React Engineer"` and `"React Engineer (Sr)"` collapse to the same key. Remaining tokens are alphabetised and capped at 80 characters.
2. **Primary dedup key** = `company.toLowerCase().trim() + "::" + normalizedTitle`.
3. **URL secondary key** = `hostname + pathname` from `job.url`. If the same URL has been seen under any primary key, the listing is folded into that key's `crossSourceCount` rather than re-counted.
4. The first listing encountered for each primary key is kept; subsequent duplicates increment `crossSourceCount` on the surviving record. `crossSourceConfirmed: true` fires when count > 1.

The two-phase approach catches both (a) the same role with cosmetic title variants and (b) the exact same URL re-syndicated to multiple boards.
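
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actor's implementation: the article and preposition list is a guess, punctuation is removed without inserting spaces (consistent with `fulltime` being a noise token), and `crossSourceCount` here counts every folded occurrence rather than distinct boards.

```python
import re
from typing import Any
from urllib.parse import urlparse

NOISE = {
    "senior", "sr", "jr", "mid", "junior", "staff", "principal", "lead",
    "remote", "fulltime", "i", "ii", "iii",
    # Assumed article/preposition list; the README does not enumerate it.
    "a", "an", "the", "of", "for", "in", "at", "to",
}


def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation and noise tokens, sort, cap at 80 chars."""
    tokens = re.sub(r"[^a-z0-9\s]", "", title.lower()).split()
    return " ".join(sorted(t for t in tokens if t not in NOISE))[:80]


def dedupe(jobs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    by_key: dict[str, dict[str, Any]] = {}  # primary key -> surviving listing
    url_to_key: dict[str, str] = {}         # hostname+path -> primary key
    for job in jobs:
        primary = f"{job['company'].lower().strip()}::{normalize_title(job['title'])}"
        parsed = urlparse(job.get("url") or "")
        url_key = f"{parsed.hostname}{parsed.path}" if parsed.hostname else None
        # A URL seen before folds this listing into the key that first used it.
        key = url_to_key.get(url_key, primary) if url_key else primary
        if key in by_key:
            by_key[key]["crossSourceCount"] += 1
        else:
            by_key[key] = {**job, "crossSourceCount": 1}
        if url_key:
            url_to_key.setdefault(url_key, key)
    survivors = list(by_key.values())
    for survivor in survivors:
        survivor["crossSourceConfirmed"] = survivor["crossSourceCount"] > 1
    return survivors
```

With this sketch, `"Senior React Engineer"` and `"React Engineer (Sr)"` both normalize to `engineer react`, so two such listings from the same company collapse into one record with `crossSourceCount` of 2.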

#### HN Who's Hiring Comment Parsing

Hacker News comments are unstructured text. The actor extracts structured data via:

- **Company**: Regex on first line: `/^([A-Z][A-Za-z0-9\s&.'-]+?)[\s]*[|(\-–]/` (expects "Company | Role" format)
- **Role**: Matches patterns like "hiring/looking for/seeking X" or "Company | X"
- **Remote**: Word boundary match for `/\bremote\b/i`
- **Location**: Matches "location/based in/office in: X"
- **Minimum length**: Comments under 50 characters are skipped
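
A minimal Python sketch of the company, remote, and minimum-length rules above. It reuses the company regex as documented and the `"Unknown (HN)"` fallback from the field reference. The role and location patterns are not spelled out in enough detail to reproduce, so they are omitted. The sketch also assumes the comment has already been converted to plain text and that length is measured after trimming whitespace.

```python
import re
from typing import Any, Optional

COMPANY_RE = re.compile(r"^([A-Z][A-Za-z0-9\s&.'-]+?)[\s]*[|(\-–]")
REMOTE_RE = re.compile(r"\bremote\b", re.IGNORECASE)
MIN_LENGTH = 50


def parse_hn_comment(text: str) -> Optional[dict[str, Any]]:
    """Extract company and remote flag from a plain-text HN hiring comment."""
    stripped = text.strip()
    if len(stripped) < MIN_LENGTH:
        return None  # too short to be a real job post
    first_line = stripped.splitlines()[0]
    match = COMPANY_RE.match(first_line)
    return {
        "company": match.group(1).strip() if match else "Unknown (HN)",
        "remote": bool(REMOTE_RE.search(stripped)),
    }
```

Because the character class and the delimiter set both contain a hyphen, a hyphenated company name is cut at the first hyphen, which is one reason company parsing on HN listings is best treated as approximate.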

### How Much Does It Cost?

The Job Market Intelligence actor uses minimal compute resources because it calls lightweight REST APIs rather than rendering web pages. No proxies are required.

The actor is billed pay-per-event: **one `report-generated` charge per successful run** regardless of result count, source count, or whether segmentation / historical tracking / incremental mode are enabled. Apify platform compute is billed separately at standard rates and depends on memory and runtime — runs typically complete in well under a minute, and the actor's defaults (512 MB) keep platform compute modest. A scheduled daily run for monitoring is significantly cheaper than running ad-hoc scrapes against multiple sources individually.

The exact PPE price for the report-generated event is shown in the Apify Store listing and logged at the start of every run.


### Tips

- **Start broad, then filter** — Run a general query like "engineer" first to see the full landscape, then narrow with location or company filters in subsequent runs.
- **Combine sources strategically** — Remotive and Jobicy focus on remote roles, Arbeitnow covers European markets heavily, and HN Who's Hiring surfaces startup opportunities. Use the `sources` parameter to target specific ecosystems.
- **Schedule weekly runs** to build a time-series dataset of skill demand trends. Export to Google Sheets and chart how Python vs. Rust demand changes month over month.
- **Use `maxResults: 500`** for comprehensive market reports, or keep it at 50 for quick daily pulse checks.
- **Filter by company name** to monitor a specific competitor's hiring velocity — a sudden spike in open roles often signals a new product launch or funding round.
- **Disable salary or skill analysis** with the toggle fields if you only need raw listings. This slightly reduces processing time for very large result sets.
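
The tips above can be folded into a small input builder. This is a sketch: `build_input` and `REMOTE_SOURCES` are hypothetical helper names, while the keys it sets (`query`, `maxResults`, `sources`, `companyName`) come from the input schema.

```python
from typing import Any, Dict, List, Optional

# Sources described above as focusing on remote roles.
REMOTE_SOURCES = ["remotive", "jobicy"]


def build_input(
    query: str,
    *,
    comprehensive: bool = False,
    sources: Optional[List[str]] = None,
    company: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an Actor input for a quick pulse check or a comprehensive report."""
    run_input: Dict[str, Any] = {
        "query": query,
        "maxResults": 500 if comprehensive else 50,
    }
    if sources:
        run_input["sources"] = sources       # omit to search all sources
    if company:
        run_input["companyName"] = company   # monitor one company's hiring
    return run_input
```

The resulting dictionary can be passed as `run_input` to the Python client's `call` method.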

### This is NOT for you if

Skip this actor if any of these describe you — there's a better tool for your job:

- **You only want raw job listings** with no analytics layer → use a basic single-source scraper
- **You need LinkedIn, Indeed, or Glassdoor data specifically** → use a dedicated scraper for that platform; those sites are auth-walled and explicitly out of scope here
- **You're not making decisions from job market data** → if you just want to display listings to end-users, the decision-engine layer is overhead you won't use
- **You need real-time / streaming hiring velocity** (sub-hour) → snapshots are per-run, not streaming. The minimum cadence is "as often as you schedule the actor"
- **You need candidate-side data** (LinkedIn profiles, resumes, talent pools) → this actor covers job postings only; it doesn't model the candidate pool
- **You need to auto-apply / auto-submit applications** → out of scope and against most boards' ToS
- **You need salary parsing in GBP / CAD / AUD / JPY** → only USD and EUR salary patterns are recognized; other currencies pass through unparsed in `description`
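
To make the currency limitation concrete, here is a minimal sketch of USD/EUR-only salary matching. The regex and `find_salaries` are assumptions for illustration and are not the actor's actual patterns.

```python
import re
from typing import List, Tuple

# Assumed pattern: a "$" or "€" symbol, digits with optional thousands
# separators, and an optional "k" suffix (for example "$150k").
SALARY_RE = re.compile(r"([$€])\s?(\d{1,3}(?:,\d{3})+|\d+)(k\b)?", re.IGNORECASE)
CURRENCY = {"$": "USD", "€": "EUR"}


def find_salaries(text: str) -> List[Tuple[str, int]]:
    """Return (currency, amount) pairs for USD and EUR figures found in text."""
    results = []
    for symbol, digits, kilo in SALARY_RE.findall(text):
        amount = int(digits.replace(",", ""))
        if kilo:
            amount *= 1000
        results.append((CURRENCY[symbol], amount))
    return results
```

Figures in other currencies, such as `£60,000`, match nothing here, which mirrors how unsupported currencies stay unparsed in `description`.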

### What this actor does NOT do

Honest scope so you don't buy the wrong tool:

| Need | Use this instead |
|------|------------------|
| LinkedIn / Indeed / Glassdoor coverage | Dedicated single-source scrapers — those platforms require auth and anti-bot handling that this actor explicitly does not do |
| Glassdoor company review / sentiment / rating enrichment | A separate Glassdoor scraper — joining is a downstream task |
| Layoff cross-reference (layoffs.fyi) | A separate layoff-tracker actor — keeps this actor's PPE economics simple |
| Candidate-side data (LinkedIn profiles, resumes, talent pools) | Out of scope: this actor returns job postings (employer-side data), not candidate data |
| Auto-applying / auto-submitting applications | Out of scope and against most boards' ToS |
| GBP / CAD / AUD / JPY salary parsing | Only USD and EUR salary patterns are recognized; other currencies pass through unparsed in `description` |
| Real-time hiring-velocity tracking | Schedule the actor with `enableHistoricalTracking: true` — `trendInsights` gives you listing-growth-rate, salary direction, rising/falling skills, new vs departed companies on every subsequent run. Sub-hour velocity isn't supported (snapshots are per-run, not streaming). |

The actor's positioning: **composable job market intelligence for automation** — the cleanest, fastest "what does the public-API job market look like for X right now, AND how is it shifting?" with decision-ready enums on every record and trend insights on every scheduled run. If you need enterprise-grade hiring intelligence (Lightcast, Revelio Labs, LinkedIn Talent Insights), this isn't a replacement — but at <$1/run it's the right starting point for most automation, research, and alerting workflows.

### Limitations

- **Source coverage** — Only four job boards are queried. Major platforms like LinkedIn, Indeed, and Glassdoor are not included due to their authentication requirements and anti-bot measures.
- **Salary data availability** — Not all listings include salary information. The salary statistics are based only on listings that provide parseable salary data, which may skew toward certain markets or seniority levels.
- **Currency support** — Only USD (`$`) and EUR (`€`) salary patterns are recognized. Salaries in GBP, CAD, AUD, or other currencies will not be extracted into structured salary fields.
- **Skill detection scope** — The 80+ built-in skill patterns are tuned for technology roles. Non-tech skills (e.g., "project management", "sales") are not tracked. False positives are possible for ambiguous terms. Use the `customSkills` input to add domain-specific terms.
- **HN comment parsing** — Hacker News "Who's Hiring" comments are free-form text. Company name, role, and location extraction is best-effort via regex and may produce incorrect results for non-standard formats.
- **No direct application** — The actor collects listing URLs but does not submit job applications on your behalf.
- **Real-time freshness** — Data comes from live API calls, but the underlying job boards may have their own delays in indexing new postings.
- **Deduplication limits** — The deduplication key uses company name + first 60 characters of the title. Listings with slightly different titles for the same role may not be caught.
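
The deduplication limit is easier to see in code. The sketch below builds a key from the company name plus the first 60 characters of the title, as described above; the lowercasing, the whitespace trimming, and the `company` / `title` field names are assumptions.

```python
from typing import Dict, Iterable, List


def dedup_key(listing: Dict[str, str]) -> str:
    """Company name plus the first 60 characters of the title."""
    company = listing.get("company", "").strip().lower()
    title = listing.get("title", "").strip().lower()[:60]
    return f"{company}|{title}"


def deduplicate(listings: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first listing seen for each key."""
    seen = set()
    unique = []
    for listing in listings:
        key = dedup_key(listing)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique
```

Under this scheme "Senior Engineer" and "Sr. Engineer" at the same company produce different keys, so both listings survive.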

### Responsible Use

This actor accesses only publicly available job board APIs that are designed for programmatic access. It does not bypass authentication, scrape private data, or violate any terms of service. When using job market data:

- Use data for legitimate research, job seeking, or workforce planning purposes
- Do not use automated data to discriminate against job seekers or companies
- Respect the intellectual property of job descriptions and company information
- Comply with all applicable employment and data protection laws in your jurisdiction
- See [Apify's guide on web scraping legality](https://blog.apify.com/is-web-scraping-legal/) for general guidance

### FAQ

**Do I need any API keys to use this actor?**
No. All four data sources (Remotive, Arbeitnow, Jobicy, HN Algolia) are free public APIs. No authentication is required.

**How many jobs can I get per run?**
The actor can return up to 500 listings per run. The actual count depends on how many matches exist for your query across all four sources.

**Does this actor work for non-tech jobs?**
Yes. While the skill extraction is tuned for technology roles, the job search itself works for any keyword — "marketing manager", "nurse", "accountant", or any other role. The skill analysis will simply return fewer matches for non-tech positions.

**How fresh is the data?**
Listing data is fetched live at run time. Use the `datePosted` filter to restrict results to the last 24 hours, week, or month. Historical snapshots (used for `trendInsights` and `incremental` mode) are only stored when `enableHistoricalTracking: true` is enabled — and even then, only a bounded summary record per query (top skills counts, companies, seen URLs) is persisted, not the raw listings.

**Can I filter for a specific country or city?**
Yes. Enter the location in the `location` field (e.g., "Germany", "London", "USA"). The actor performs a case-insensitive substring match against each listing's location field. If the filter removes all results, the actor gracefully falls back to inc

# Actor input Schema

## `query` (type: `string`):

Job search query (e.g., 'software engineer', 'data scientist', 'product manager')

## `location` (type: `string`):

Location filter (e.g., 'San Francisco', 'Remote', 'New York')

## `companyName` (type: `string`):

Filter by specific company name

## `remoteOnly` (type: `boolean`):

Only show remote jobs

## `datePosted` (type: `string`):

How recent should the job postings be

## `sources` (type: `array`):

Which sources to search (leave empty for all). Available: remotive, arbeitnow, jobicy, hn-whoishiring

## `analyzeSkills` (type: `boolean`):

Extract and rank mentioned skills from job descriptions

## `analyzeSalaries` (type: `boolean`):

Extract and summarize salary data from job postings

## `maxResults` (type: `integer`):

Maximum number of job listings to return. Analytics quality plateaus past ~200 — going higher mostly increases runtime, not insight.

## `sourceWeights` (type: `object`):

Per-source sub-sampling fraction (0..1). Use this to down-weight noisier sources — e.g. {"hn-whoishiring": 0.5} keeps roughly half of HN listings while taking the other sources whole. Sources not listed here pass through at full weight. Sub-sampling is deterministic via per-listing hash, so re-runs are reproducible. ⚠️ Use only when you intentionally want a representative sample, not complete coverage — sub-sampling drops listings, so cohort size shrinks.
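
One way to get deterministic, hash-based sub-sampling is sketched below. The choice of SHA-256, hashing the listing URL, and the `url` / `source` field names are assumptions; the description only states that a per-listing hash is used.

```python
import hashlib
from typing import Dict, List


def keep_listing(url: str, weight: float) -> bool:
    """Map the URL hash to a fraction in [0, 1) and compare it to the weight."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:4], "big") / 2**32
    return fraction < weight


def apply_source_weights(
    listings: List[Dict[str, str]], weights: Dict[str, float]
) -> List[Dict[str, str]]:
    """Sources without an entry in weights pass through at full weight (1.0)."""
    return [
        item
        for item in listings
        if keep_listing(item["url"], weights.get(item["source"], 1.0))
    ]
```

Because the decision depends only on the listing itself, the same listing is kept or dropped on every run with the same weight.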

## `customSkills` (type: `array`):

Add domain-specific skills to detect alongside the built-in 80+ technologies. Each entry: { name, regex, category? }. Example: `[{"name": "Snowpark", "regex": "\\bsnowpark\\b", "category": "Data"}]`. Categories: Languages, Frameworks, Cloud, Data, AI/ML, Other (default). Invalid regex entries are logged and skipped.
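
The "logged and skipped" behaviour for invalid entries could look something like this sketch. It uses Python's `re` module, so regex dialect details may differ from the actor's runtime, and the case-insensitive flag is an assumption.

```python
import re
from typing import Any, Dict, List, Pattern, Tuple


def compile_custom_skills(
    entries: List[Dict[str, Any]],
) -> List[Tuple[str, str, Pattern[str]]]:
    """Compile each entry's regex; log and skip entries that are invalid."""
    compiled = []
    for entry in entries:
        try:
            name = entry["name"]
            pattern = re.compile(entry["regex"], re.IGNORECASE)
        except (KeyError, TypeError, re.error) as exc:
            print(f"Skipping invalid custom skill {entry!r}: {exc}")
            continue
        # Category falls back to "Other" when omitted.
        compiled.append((name, entry.get("category", "Other"), pattern))
    return compiled
```

Note that in a JSON input the word-boundary escape must be written as `\\b`, since a bare `\b` inside a JSON string is a backspace character.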

## `groupBy` (type: `array`):

Split analytics into per-segment reports — fixes the cohort-mixing distortion when one query spans regions / seniorities / job types. Pick one or more dimensions; output adds a `segments[]` array with per-segment salary, top skills, seniority breakdown, and remote percentage. Empty = single cohort.
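
Conceptually, segmentation groups listings by a dimension and computes each metric per group. The sketch below does this for remote percentage only; the `seniority` and `remote` field names and the output keys are hypothetical, since the real `segments[]` shape is not documented here.

```python
from collections import defaultdict
from typing import Any, Dict, List


def segment_remote_share(
    listings: List[Dict[str, Any]], dimension: str
) -> List[Dict[str, Any]]:
    """Group listings by one dimension and report the remote share per group."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for listing in listings:
        groups[str(listing.get(dimension, "unknown"))].append(listing)
    return [
        {
            "segment": key,
            "count": len(items),
            "remotePercent": round(
                100 * sum(1 for item in items if item.get("remote")) / len(items), 1
            ),
        }
        for key, items in sorted(groups.items())
    ]
```

Computing the metric per group avoids the situation where one large segment dominates a blended average.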

## `enableHistoricalTracking` (type: `boolean`):

Persist a cohort snapshot to a named KV store (job-market-intelligence-history) and compute trend insights against the previous run. Adds: salaryMedianChange, listingGrowthRate, topRising/FallingSkills, newCompanies/departedCompanies, direction (expanding/stable/tightening). First run with this on returns trendInsights=null and writes the baseline snapshot.
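
A simplified view of how trend insights might be derived from two snapshots is sketched below. The snapshot field names (`listingCount`, `salaryMedian`, `companies`) and the percent-change formula are assumptions; only the output names are taken from the description above.

```python
from typing import Any, Dict, Optional


def percent_change(old: float, new: float) -> Optional[float]:
    """Percent change from old to new, or None when old is zero or missing."""
    if not old:
        return None
    return round(100 * (new - old) / old, 1)


def compute_trend(
    previous: Optional[Dict[str, Any]], current: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return None on the first run (no baseline), else a comparison."""
    if previous is None:
        return None
    before = set(previous["companies"])
    after = set(current["companies"])
    return {
        "listingGrowthRate": percent_change(previous["listingCount"], current["listingCount"]),
        "salaryMedianChange": percent_change(previous["salaryMedian"], current["salaryMedian"]),
        "newCompanies": sorted(after - before),
        "departedCompanies": sorted(before - after),
    }
```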

## `historyStateKey` (type: `string`):

Override the auto-derived snapshot key (default: hash of query + location). Use a stable string when monitoring the same cohort across runs with slightly different filters.
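
The key derivation might be pictured as follows. The hash function, the normalization, and the 16-character truncation are assumptions; the description only says the default is a hash of query and location.

```python
import hashlib
from typing import Optional


def snapshot_key(
    query: str, location: Optional[str] = None, override: Optional[str] = None
) -> str:
    """Use the override when given, else hash the normalized query and location."""
    if override:
        return override
    normalized = f"{query.strip().lower()}|{(location or '').strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
```

A fixed override keeps trend comparisons pointed at the same snapshot even when the query wording or filters change between runs.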

## `incremental` (type: `boolean`):

When enabled together with historical tracking, drops listings whose URLs were already returned in the previous run. Reduces downstream processing/noise — only fresh listings reach your dataset/pipelines (all sources are still fetched in full so analytics like trend insights remain accurate). First run for a key returns everything (no prior URLs to filter).
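
In essence, incremental mode is a set-membership filter on listing URLs. A minimal sketch, assuming each listing carries a `url` field and the prior snapshot exposes the previously seen URLs as a set:

```python
from typing import Dict, List, Optional, Set


def filter_incremental(
    listings: List[Dict[str, str]], seen_urls: Optional[Set[str]]
) -> List[Dict[str, str]]:
    """Drop listings already returned previously; the first run keeps everything."""
    if seen_urls is None:  # no prior snapshot for this key
        return list(listings)
    return [item for item in listings if item["url"] not in seen_urls]
```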

## `lookbackDays` (type: `integer`):

Maximum age of the prior snapshot for trend computation. Snapshots older than this are treated as a first run. Default 30.
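
The age check could be expressed as in this sketch. The `takenAt` field name and its ISO 8601 format are assumptions about how a snapshot records its timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def usable_snapshot(
    snapshot: Optional[Dict[str, Any]], now: datetime, lookback_days: int = 30
) -> Optional[Dict[str, Any]]:
    """Return the snapshot if it is recent enough, else None (treated as a first run)."""
    if snapshot is None:
        return None
    taken_at = datetime.fromisoformat(snapshot["takenAt"])
    if now - taken_at > timedelta(days=lookback_days):
        return None
    return snapshot
```

For a weekly schedule the default of 30 days leaves room for a few missed runs; a monthly schedule would likely need a larger value so that the previous snapshot is not discarded.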

## `mode` (type: `string`):

Reorders the recommendedActions\[] block by audience priority. default = balanced; job\_seeker = bubbles learn-skill / apply-now / curriculum actions; recruiter = bubbles salary-band / hiring-velocity / role-spec actions; analyst = bubbles monitoring / strategy / regime actions. The full action list is always emitted — mode only affects ordering.
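
Reordering without dropping anything can be done with a stable sort, as in this sketch. The `type` field on each action and the exact priority lists are assumptions based on the action names mentioned above.

```python
from typing import Dict, List

# Assumed mapping from mode to the action types it moves to the front.
MODE_PRIORITY = {
    "job_seeker": ["learn-skill", "apply-now", "curriculum"],
    "recruiter": ["salary-band", "hiring-velocity", "role-spec"],
    "analyst": ["monitoring", "strategy", "regime"],
}


def reorder_actions(actions: List[Dict[str, str]], mode: str = "default") -> List[Dict[str, str]]:
    """Move prioritized action types first; all other actions keep their order."""
    priority = MODE_PRIORITY.get(mode, [])

    def rank(action: Dict[str, str]) -> int:
        kind = action["type"]
        return priority.index(kind) if kind in priority else len(priority)

    return sorted(actions, key=rank)  # sorted() is stable
```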

## `eventThresholds` (type: `object`):

Override the default thresholds that trigger entries in the events\[] array. Defaults: salarySpikePercent 5, salaryDropPercent -5, listingGrowthSpikePercent 25, listingDropPercent -25, remoteShiftPoints 5, skillEmergenceDeltaPercent 100. Example: {"salarySpikePercent": 3, "listingGrowthSpikePercent": 10} for noisier alerting.
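
Overrides are most naturally merged on top of the defaults listed above. In the sketch below the default values come from this description, while the event names and the inclusive comparisons are assumptions.

```python
from typing import Dict, List, Optional

DEFAULT_THRESHOLDS = {
    "salarySpikePercent": 5,
    "salaryDropPercent": -5,
    "listingGrowthSpikePercent": 25,
    "listingDropPercent": -25,
    "remoteShiftPoints": 5,
    "skillEmergenceDeltaPercent": 100,
}


def resolve_thresholds(overrides: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Defaults apply unless a key is explicitly overridden."""
    return {**DEFAULT_THRESHOLDS, **(overrides or {})}


def salary_events(change_percent: float, thresholds: Dict[str, float]) -> List[str]:
    """Return hypothetical event names triggered by a salary median change."""
    events = []
    if change_percent >= thresholds["salarySpikePercent"]:
        events.append("salary_spike")
    if change_percent <= thresholds["salaryDropPercent"]:
        events.append("salary_drop")
    return events
```

With the defaults, a 4% salary increase triggers nothing; lowering `salarySpikePercent` to 3 turns the same change into an event.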

## `whatIfScenarios` (type: `array`):

Counterfactual scenarios to evaluate against the cohort. Each: { type: 'salary\_change' | 'skill\_emphasis', percent? (for salary\_change), skill? (for skill\_emphasis), constraints?: { maxPercent?, minPercent? } }. When omitted, the actor auto-generates 2–4 representative scenarios. Outputs are derivable-only (percentile shift, tier change, scarcity match) — never forecasts. Example: \[{"type": "salary\_change", "percent": 10, "constraints": {"maxPercent": 5}}, {"type": "skill\_emphasis", "skill": "Rust"}].

## Actor input object example

```json
{
  "query": "software engineer",
  "remoteOnly": false,
  "datePosted": "month",
  "analyzeSkills": true,
  "analyzeSalaries": true,
  "maxResults": 100,
  "enableHistoricalTracking": false,
  "incremental": false,
  "lookbackDays": 30,
  "mode": "default"
}
```
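
Before submitting a run, it can help to check an input against the constraints listed in the OpenAPI schema further down (required `query`, `maxResults` 1 to 500, `lookbackDays` 1 to 365, and the `datePosted` and `mode` enums). `validate_input` below is a hypothetical client-side helper, not part of the Apify client, and its non-empty check on `query` is stricter than the schema itself.

```python
from typing import Any, Dict, List

DATE_POSTED = {"day", "week", "month", "any"}
MODES = {"default", "job_seeker", "recruiter", "analyst"}
INT_RANGES = {"maxResults": (1, 500), "lookbackDays": (1, 365)}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors: List[str] = []
    query = run_input.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query is required and must be a non-empty string")
    for field, (low, high) in INT_RANGES.items():
        value = run_input.get(field)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{field} must be an integer between {low} and {high}")
    if run_input.get("datePosted", "month") not in DATE_POSTED:
        errors.append("datePosted must be one of: day, week, month, any")
    if run_input.get("mode", "default") not in MODES:
        errors.append("mode must be one of: default, job_seeker, recruiter, analyst")
    return errors
```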

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "software engineer"
};

// Run the Actor and wait for it to finish
const run = await client.actor("ryanclinton/job-market-intelligence").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "query": "software engineer" }

# Run the Actor and wait for it to finish
run = client.actor("ryanclinton/job-market-intelligence").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "software engineer"
}' |
apify call ryanclinton/job-market-intelligence --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ryanclinton/job-market-intelligence",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Job Market Intelligence",
        "description": "Aggregate job listings from four free data sources, deduplicate them, and generate a structured intelligence report with skill demand rankings, salary benchmarks, top hiring companies, and remote-work statistics — all without any API keys.",
        "version": "1.0",
        "x-build-id": "e9qaeDYtjR3HQYCzp"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ryanclinton~job-market-intelligence/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ryanclinton-job-market-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ryanclinton~job-market-intelligence/runs": {
            "post": {
                "operationId": "runs-sync-ryanclinton-job-market-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ryanclinton~job-market-intelligence/run-sync": {
            "post": {
                "operationId": "run-sync-ryanclinton-job-market-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "query"
                ],
                "properties": {
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Job search query (e.g., 'software engineer', 'data scientist', 'product manager')",
                        "default": "software engineer"
                    },
                    "location": {
                        "title": "Location",
                        "type": "string",
                        "description": "Location filter (e.g., 'San Francisco', 'Remote', 'New York')"
                    },
                    "companyName": {
                        "title": "Company Name",
                        "type": "string",
                        "description": "Filter by specific company name"
                    },
                    "remoteOnly": {
                        "title": "Remote Only",
                        "type": "boolean",
                        "description": "Only show remote jobs",
                        "default": false
                    },
                    "datePosted": {
                        "title": "Date Posted",
                        "enum": [
                            "day",
                            "week",
                            "month",
                            "any"
                        ],
                        "type": "string",
                        "description": "How recent should the job postings be",
                        "default": "month"
                    },
                    "sources": {
                        "title": "Sources",
                        "type": "array",
                        "description": "Which sources to search (leave empty for all). Available: remotive, arbeitnow, jobicy, hn-whoishiring",
                        "items": {
                            "type": "string"
                        }
                    },
                    "analyzeSkills": {
                        "title": "Analyze Skills",
                        "type": "boolean",
                        "description": "Extract and rank mentioned skills from job descriptions",
                        "default": true
                    },
                    "analyzeSalaries": {
                        "title": "Analyze Salaries",
                        "type": "boolean",
                        "description": "Extract and summarize salary data from job postings",
                        "default": true
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of job listings to return. Analytics quality plateaus past ~200 — going higher mostly increases runtime, not insight.",
                        "default": 100
                    },
                    "sourceWeights": {
                        "title": "Source Weights",
                        "type": "object",
                        "description": "Per-source sub-sampling fraction (0..1). Use this to down-weight noisier sources — e.g. {\"hn-whoishiring\": 0.5} keeps roughly half of HN listings while taking the other sources whole. Sources not listed here pass through at full weight. Sub-sampling is deterministic via per-listing hash, so re-runs are reproducible. ⚠️ Use only when you intentionally want a representative sample, not complete coverage — sub-sampling drops listings, so cohort size shrinks."
                    },
                    "customSkills": {
                        "title": "Custom Skills",
                        "type": "array",
                        "description": "Add domain-specific skills to detect alongside the built-in 80+ technologies. Each entry: { name, regex, category? }. Example: [{\"name\": \"Snowpark\", \"regex\": \"\\\\bsnowpark\\\\b\", \"category\": \"Data\"}]. Categories: Languages, Frameworks, Cloud, Data, AI/ML, Other (default). Invalid regex entries are logged and skipped."
                    },
                    "groupBy": {
                        "title": "Segment By",
                        "type": "array",
                        "description": "Split analytics into per-segment reports — fixes the cohort-mixing distortion when one query spans regions / seniorities / job types. Pick one or more dimensions; output adds a `segments[]` array with per-segment salary, top skills, seniority breakdown, and remote percentage. Empty = single cohort.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "enableHistoricalTracking": {
                        "title": "Enable Historical Tracking",
                        "type": "boolean",
                        "description": "Persist a cohort snapshot to a named KV store (job-market-intelligence-history) and compute trend insights against the previous run. Adds: salaryMedianChange, listingGrowthRate, topRising/FallingSkills, newCompanies/departedCompanies, direction (expanding/stable/tightening). First run with this on returns trendInsights=null and writes the baseline snapshot.",
                        "default": false
                    },
                    "historyStateKey": {
                        "title": "History State Key",
                        "type": "string",
                        "description": "Override the auto-derived snapshot key (default: hash of query + location). Use a stable string when monitoring the same cohort across runs with slightly different filters."
                    },
                    "incremental": {
                        "title": "Incremental Mode",
                        "type": "boolean",
                        "description": "When enabled together with historical tracking, drops listings whose URLs were already returned in the previous run. Reduces downstream processing/noise — only fresh listings reach your dataset/pipelines (all sources are still fetched in full so analytics like trend insights remain accurate). First run for a key returns everything (no prior URLs to filter).",
                        "default": false
                    },
                    "lookbackDays": {
                        "title": "Lookback Days",
                        "minimum": 1,
                        "maximum": 365,
                        "type": "integer",
                        "description": "Maximum age of the prior snapshot for trend computation. Snapshots older than this are treated as a first run. Default 30.",
                        "default": 30
                    },
                    "mode": {
                        "title": "Mode (Persona Preset)",
                        "enum": [
                            "default",
                            "job_seeker",
                            "recruiter",
                            "analyst"
                        ],
                        "type": "string",
                        "description": "Reorders the recommendedActions[] block by audience priority. default = balanced; job_seeker = bubbles learn-skill / apply-now / curriculum actions; recruiter = bubbles salary-band / hiring-velocity / role-spec actions; analyst = bubbles monitoring / strategy / regime actions. The full action list is always emitted — mode only affects ordering.",
                        "default": "default"
                    },
                    "eventThresholds": {
                        "title": "Event Thresholds",
                        "type": "object",
                        "description": "Override the default thresholds that trigger entries in the events[] array. Defaults: salarySpikePercent 5, salaryDropPercent -5, listingGrowthSpikePercent 25, listingDropPercent -25, remoteShiftPoints 5, skillEmergenceDeltaPercent 100. Example: {\"salarySpikePercent\": 3, \"listingGrowthSpikePercent\": 10} for noisier alerting."
                    },
                    "whatIfScenarios": {
                        "title": "What-If Scenarios",
                        "type": "array",
                        "description": "Counterfactual scenarios to evaluate against the cohort. Each: { type: 'salary_change' | 'skill_emphasis', percent? (for salary_change), skill? (for skill_emphasis), constraints?: { maxPercent?, minPercent? } }. When omitted, the actor auto-generates 2–4 representative scenarios. Outputs are derivable-only (percentile shift, tier change, scarcity match) — never forecasts. Example: [{\"type\": \"salary_change\", \"percent\": 10, \"constraints\": {\"maxPercent\": 5}}, {\"type\": \"skill_emphasis\", \"skill\": \"Rust\"}]."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
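
The run object described by the schema above carries the storage IDs, run options, and cost figures a caller usually needs after a run finishes. The following is a minimal sketch of reading those fields, assuming the run object has already been parsed into a Python dict. `summarize_run` is a hypothetical helper, not part of any Apify library, and the IDs in `sample_run` are placeholders rather than real values.

```python
from typing import Any


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Collect storage IDs, run options, and cost figures from a run object.

    Hypothetical helper: field names follow the schema above, and every
    field is treated as optional so a partial object does not raise.
    """
    options = run.get("options") or {}
    usage_usd = run.get("usageUsd") or {}
    # Keep only the cost categories with a non-zero charge.
    billed = {name: cost for name, cost in usage_usd.items() if cost}
    return {
        "dataset_id": run.get("defaultDatasetId"),
        "key_value_store_id": run.get("defaultKeyValueStoreId"),
        "request_queue_id": run.get("defaultRequestQueueId"),
        "build": options.get("build"),
        "timeout_secs": options.get("timeoutSecs"),
        "memory_mbytes": options.get("memoryMbytes"),
        "total_usd": run.get("usageTotalUsd", 0),
        "billed_categories": billed,
    }


# Sample object built from the schema's example values; the IDs are placeholders.
sample_run = {
    "options": {"build": "latest", "timeoutSecs": 300, "memoryMbytes": 1024, "diskMbytes": 2048},
    "defaultKeyValueStoreId": "kvs-placeholder",
    "defaultDatasetId": "dataset-placeholder",
    "defaultRequestQueueId": "queue-placeholder",
    "usageTotalUsd": 0.00005,
    "usageUsd": {"ACTOR_COMPUTE_UNITS": 0, "DATASET_WRITES": 0, "KEY_VALUE_STORE_WRITES": 0.00005},
}

summary = summarize_run(sample_run)
print(summary)
```

Because this Actor is priced per event, `usageTotalUsd` reflects platform usage recorded for the run and may differ from the amount charged for the report itself.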
