# CatchAll (`newscatcher/catchall`) Actor

Submit a CatchAll job, poll until completion, and retrieve all valid records. Results are saved to the Dataset and Key-Value Store.

- **URL**: https://apify.com/newscatcher/catchall.md
- **Developed by:** [Newscatcher-CatchAll](https://apify.com/newscatcher) (community)
- **Categories:** News, AI, Automation
- **Stats:** 4 total users, 1 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$0.01 / 1,000 valid\_records

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
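
As a rough worked example of the listed price, the sketch below estimates the charge for a run. It assumes the fee scales linearly with the number of valid records (the listing only states the price per 1,000), and `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Listed price: $0.01 per 1,000 valid records.
PRICE_PER_1000_VALID_RECORDS_USD = Decimal("0.01")


def estimate_cost_usd(valid_records: int) -> Decimal:
    """Estimate the pay-per-event charge, assuming linear per-record pricing."""
    if valid_records < 0:
        raise ValueError("valid_records must be non-negative")
    return PRICE_PER_1000_VALID_RECORDS_USD * Decimal(valid_records) / Decimal(1000)


# A run that returns 50 valid records (the default limit):
print(estimate_cost_usd(50))
```

Using `Decimal` rather than floats avoids rounding noise at these very small amounts.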

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CatchAll — structured web research

CatchAll transforms plain-text questions into structured, validated datasets extracted from billions of web pages. Enter a query like "Series B funding rounds for SaaS startups" and receive structured JSON records with company names, deal sizes, dates, and source citations — no scraping logic required.

**CatchAll is not a traditional web scraper.** It searches NewsCatcher's proprietary index of 2+ billion web pages, clusters related pages into real-world events, validates relevance, and extracts structured data — all in a single run.

### What can CatchAll do?

- **Find specific events at scale** — acquisitions, funding rounds, product launches, regulatory approvals, executive changes, and more
- **Return structured JSON** — each record includes extracted fields, confidence scores, and source citations
- **Handle the full job lifecycle** — this Actor submits a job, polls until completion, and retrieves all results automatically
- **Save results to Apify storage** — records are stored in both a Dataset and a Key-value store for easy export

CatchAll pairs well with the Apify platform. Schedule recurring runs, chain with other Actors using integrations, export results via API, or send data to external services through webhooks.

### How to use CatchAll

1. Go to the [CatchAll Actor page](https://apify.com/newscatcher/catchall) and click **Try for free**.
2. Enter your **CatchAll API key** (get one at [platform.newscatcherapi.com](https://platform.newscatcherapi.com)).
3. Type a plain-text **query** describing what you want to find.
4. Optionally adjust the record limit, or add custom validators and enrichments as JSON.
5. Click **Save & Start**. The Actor submits the job, polls for status, and retrieves results when complete.
6. Open the **Output** tab to review records, or go to **Storage** to download the dataset as JSON, CSV, or Excel.

A typical run takes 10–15 minutes depending on query complexity and the number of web pages processed.

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `apiKey` | String | Yes | Your CatchAll API key |
| `query` | String | Yes | Plain-text question describing what to find |
| `context` | String | No | Additional guidance to focus extraction |
| `limit` | Integer | No | Maximum number of records to return (default: 50, minimum: 11) |
| `validatorsJson` | String | No | JSON array of validator objects. Example: `[{"name":"is_acquisition","description":"...","type":"boolean"}]` |
| `enrichmentsJson` | String | No | JSON array of enrichment objects. Example: `[{"name":"acquirer_company","description":"...","type":"company"}]` |
| `pollIntervalSeconds` | Integer | No | How often to check job status, in seconds (default: 60) |
| `timeoutMinutes` | Integer | No | Stop polling after this many minutes (default: 30) |
| `pageSize` | Integer | No | Records to fetch per page when pulling results (default: 100) |

If you leave `validatorsJson` and `enrichmentsJson` empty (or as `[]`), CatchAll generates them automatically based on your query.

#### Input example

```json
{
  "apiKey": "YOUR_CATCHALL_API_KEY",
  "query": "AI company acquisitions",
  "context": "Focus on deal size and acquiring company details",
  "limit": 11
}
````

#### Input example with custom enrichments

```json
{
  "apiKey": "YOUR_CATCHALL_API_KEY",
  "query": "AI company acquisitions",
  "context": "Focus on deal size and acquiring company details",
  "limit": 11,
  "validatorsJson": "[{\"name\":\"is_acquisition\",\"description\":\"true if the page describes a company acquisition\",\"type\":\"boolean\"}]",
  "enrichmentsJson": "[{\"name\":\"acquiring_company\",\"description\":\"Name of the acquiring company\",\"type\":\"company\"},{\"name\":\"deal_value\",\"description\":\"Deal value in USD\",\"type\":\"number\"}]"
}
```
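
Hand-escaping the JSON strings for `validatorsJson` and `enrichmentsJson` is error-prone. A minimal sketch of building the same input in Python, letting `json.dumps` handle the escaping; the variable names are illustrative, and `limit` is set to 11 because the input schema states a minimum of 11:

```python
import json

# Validator and enrichment definitions as ordinary Python data.
validators = [
    {
        "name": "is_acquisition",
        "description": "true if the page describes a company acquisition",
        "type": "boolean",
    }
]
enrichments = [
    {"name": "acquiring_company", "description": "Name of the acquiring company", "type": "company"},
    {"name": "deal_value", "description": "Deal value in USD", "type": "number"},
]

actor_input = {
    "apiKey": "YOUR_CATCHALL_API_KEY",
    "query": "AI company acquisitions",
    "context": "Focus on deal size and acquiring company details",
    "limit": 11,
    # The Actor expects these two fields as JSON-encoded strings, not arrays.
    "validatorsJson": json.dumps(validators),
    "enrichmentsJson": json.dumps(enrichments),
}

print(json.dumps(actor_input, indent=2))
```

The printed object can be pasted into the Actor's JSON input editor or passed as the run input from a client library.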

### Output

Each record in the output dataset contains:

| Field | Description |
|-------|-------------|
| `record_id` | Unique identifier for the record |
| `record_title` | Short title summarizing the event |
| `enrichment` | Structured data extracted from web pages (dynamic fields) |
| `enrichment.enrichment_confidence` | Overall confidence score: `low`, `medium`, or `high` |
| `citations` | Array of source documents with title, URL, and publication date |

The `enrichment` object uses dynamic schemas — field names are generated based on your query. For example, a funding query might return `funding_amount`, `investee_company`, and `funding_date`. If you need consistent field names across runs, define custom enrichments in `enrichmentsJson`.

#### Output example

```json
{
  "record_id": "6983973854314692457",
  "record_title": "VulnCheck Raises $25M Series B Funding",
  "enrichment": {
    "enrichment_confidence": "high",
    "funding_amount": 25000000,
    "funding_currency": "USD",
    "funding_date": "2026-02-17",
    "investee_company": {
      "source_text": "VulnCheck",
      "confidence": 0.99,
      "metadata": {
        "name": "VulnCheck",
        "domain_url": "vulncheck.com",
        "domain_url_confidence": "high"
      }
    },
    "investor_company": {
      "source_text": "Sorenson Capital",
      "confidence": 0.99,
      "metadata": {
        "name": "Sorenson Capital",
        "domain_url": null,
        "domain_url_confidence": null
      }
    }
  },
  "citations": [
    {
      "title": "Exclusive: VulnCheck raises $25M funding to help companies patch software bugs",
      "link": "https://example.com/article",
      "published_date": "2026-02-17T10:00:00Z"
    }
  ]
}
```
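
Because the `enrichment` fields vary by query, downstream code may want to separate the stable parts of a record from the dynamic ones. The following is a minimal sketch, assuming records shaped like the output example; `summarize_record` is a hypothetical helper, not part of the Actor:

```python
from typing import Any


def summarize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Flatten one CatchAll record into a few stable columns."""
    enrichment = record.get("enrichment") or {}
    # Everything except the confidence label is a query-dependent field.
    dynamic_fields = {k: v for k, v in enrichment.items() if k != "enrichment_confidence"}
    return {
        "record_id": record.get("record_id"),
        "title": record.get("record_title"),
        "confidence": enrichment.get("enrichment_confidence"),
        "dynamic_field_names": sorted(dynamic_fields),
        "source_links": [c.get("link") for c in record.get("citations") or []],
    }


# Trimmed copy of the output example.
example = {
    "record_id": "6983973854314692457",
    "record_title": "VulnCheck Raises $25M Series B Funding",
    "enrichment": {
        "enrichment_confidence": "high",
        "funding_amount": 25000000,
        "funding_currency": "USD",
    },
    "citations": [
        {
            "title": "Exclusive: VulnCheck raises $25M funding to help companies patch software bugs",
            "link": "https://example.com/article",
        }
    ],
}

print(summarize_record(example))
```

Using `.get` throughout keeps the helper tolerant of records where a field is absent.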

### Tips for effective queries

- **Be specific about what you're looking for.** "Series B funding rounds for SaaS startups" works better than "startup funding."
- **Use the journalist test.** If a journalist would write a news article about it, CatchAll can find it.
- **Target single entities or related entities.** "Apple OR Google acquisitions in healthcare" is effective. Mixing unrelated topics in one query reduces accuracy.
- **Add context to guide extraction.** Use the `context` field to specify what data points matter most.
- **Start with a small limit for testing.** You can expand results later with the [CatchAll Continue](https://apify.com/newscatcher/catchall-continue) Actor without reprocessing.

### How much does it cost?

This Actor is free to use on Apify — you only pay for Apify platform usage (compute units). However, each run consumes credits from your CatchAll API plan. Check your plan limits at [platform.newscatcherapi.com](https://platform.newscatcherapi.com).

### Other CatchAll Actors

CatchAll also offers utility Actors for building custom workflows. Each maps to a single API endpoint:

| Actor | Description |
|-------|-------------|
| [CatchAll Initialize](https://apify.com/newscatcher/catchall-initialize) | Get suggested validators, enrichments, and date ranges before submitting |
| [CatchAll Create Job](https://apify.com/newscatcher/catchall-create-job) | Submit a job without polling or fetching results |
| [CatchAll Get Job Status](https://apify.com/newscatcher/catchall-get-job-status) | Check current job status and step progress |
| [CatchAll Pull Results](https://apify.com/newscatcher/catchall-pull-results) | Retrieve all records from a completed job |
| [CatchAll Early Results](https://apify.com/newscatcher/catchall-early-results) | Get partial results before a job completes |
| [CatchAll Continue](https://apify.com/newscatcher/catchall-continue) | Expand a job to process more records |
| [CatchAll Create Monitor](https://apify.com/newscatcher/catchall-create-monitor) | Schedule recurring jobs |
| [CatchAll Update Monitor](https://apify.com/newscatcher/catchall-update-monitor) | Update a monitor's webhook configuration |
| [CatchAll Start/Stop Monitor](https://apify.com/newscatcher/catchall-start-stop-monitor) | Pause or resume a monitor |

Chain these Actors using Apify's built-in [integrations](https://docs.apify.com/platform/integrations/actors) and [webhooks](https://docs.apify.com/platform/integrations/webhooks) to build automated data pipelines.

### FAQ

**How long does a run take?**
A typical CatchAll job processes 50,000+ web pages and takes 10–15 minutes. The Actor polls the API automatically until the job completes or the timeout is reached (default: 30 minutes).
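
The interaction between `pollIntervalSeconds` and `timeoutMinutes` can be pictured with a small polling loop. This is only an illustration of the pattern, not the Actor's actual code: `get_status` is a hypothetical callable, and the status names `"completed"` and `"failed"` are assumptions.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    poll_interval_seconds: float = 60,
    timeout_minutes: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout_minutes * 60
    while True:
        status = get_status()
        # "completed" and "failed" are assumed terminal status names.
        if status in ("completed", "failed"):
            return status
        # Give up if the next check would land after the deadline.
        if clock() + poll_interval_seconds > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_minutes} minutes")
        sleep(poll_interval_seconds)


# Simulated job that completes on the third status check; no real waiting.
statuses = iter(["submitted", "running", "completed"])
result = poll_until_done(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

With the defaults (60-second interval, 30-minute timeout), a loop like this would check the status roughly 30 times before giving up.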

**What are dynamic schemas?**
CatchAll generates response schemas dynamically for each job. Field names in the `enrichment` object can vary between runs, even with the same query. To get consistent field names, define custom enrichments in `enrichmentsJson`. Learn more in the [dynamic schemas guide](https://www.newscatcherapi.com/docs/web-search-api/guides-and-concepts/dynamic-schemas).
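
One way to spot schema variation in collected results is to count how often each enrichment field appears. A minimal sketch with made-up records; `enrichment_field_counts` is a hypothetical helper, and the field names in the sample data are invented for illustration:

```python
from collections import Counter
from typing import Any, Iterable


def enrichment_field_counts(records: Iterable[dict[str, Any]]) -> Counter:
    """Count how many records contain each enrichment field name."""
    counts: Counter = Counter()
    for record in records:
        counts.update((record.get("enrichment") or {}).keys())
    return counts


# Two made-up records whose field names differ, as can happen between runs.
records = [
    {"enrichment": {"enrichment_confidence": "high", "funding_amount": 25000000}},
    {"enrichment": {"enrichment_confidence": "medium", "amount_raised": 10000000}},
]

counts = enrichment_field_counts(records)
# Fields missing from some records hint at schema variation.
inconsistent = sorted(name for name, n in counts.items() if n < len(records))
print(inconsistent)
```

Fields that show up in only some records are candidates for pinning down with custom enrichments.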

**Can I get more results after a run completes?**
Yes. Use the [CatchAll Continue](https://apify.com/newscatcher/catchall-continue) Actor with the same `jobId` to process additional records without restarting the job.

**Where can I get help?**

- [CatchAll documentation](https://www.newscatcherapi.com/docs/web-search-api/get-started/introduction)
- [Write effective queries](https://www.newscatcherapi.com/docs/web-search-api/how-to/write-effective-queries)
- Open an issue on the Actor's **Issues** tab in Apify Console

# Actor input Schema

## `apiKey` (type: `string`):

Your CatchAll API key.

## `query` (type: `string`):

Natural language query describing the news you want.

## `context` (type: `string`):

Extra instructions for extraction/enrichment.

## `limit` (type: `integer`):

Maximum number of records to return. Must be >= 11.

## `validatorsJson` (type: `string`):

Paste a JSON array. Example: `[{"name":"is_acquisition_event","description":"...","type":"boolean"}]`

## `enrichmentsJson` (type: `string`):

Paste a JSON array. Example: `[{"name":"acquirer_company","description":"...","type":"company"}]`

## `pollIntervalSeconds` (type: `integer`):

How often to check job status.

## `timeoutMinutes` (type: `integer`):

Stop polling after this many minutes.

## `pageSize` (type: `integer`):

How many records to fetch per page when pulling results.

## Actor input object example

```json
{
  "apiKey": "YOUR_CATCHALL_API_KEY",
  "query": "AI company acquisitions",
  "limit": 50,
  "validatorsJson": "[]",
  "enrichmentsJson": "[]",
  "pollIntervalSeconds": 60,
  "timeoutMinutes": 30,
  "pageSize": 100
}
```
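
Before starting a run, the constraints listed in this schema can be checked locally. The sketch below covers only the rules stated here (required `apiKey` and `query`, `limit` of at least 11, JSON arrays for the two `...Json` fields); `check_actor_input` is a hypothetical helper, and the Actor may enforce further rules.

```python
import json
from typing import Any


def check_actor_input(actor_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the input; an empty list means none were found."""
    problems = []
    for field in ("apiKey", "query"):
        if not actor_input.get(field):
            problems.append(f"missing required field: {field}")
    if actor_input.get("limit", 50) < 11:
        problems.append("limit must be >= 11")
    for field in ("validatorsJson", "enrichmentsJson"):
        raw = actor_input.get(field, "[]")
        try:
            # These fields are strings that must decode to a JSON array.
            if not isinstance(json.loads(raw), list):
                problems.append(f"{field} must be a JSON array")
        except (TypeError, json.JSONDecodeError):
            problems.append(f"{field} is not valid JSON")
    return problems


print(check_actor_input({"limit": 10, "validatorsJson": "{}"}))
print(check_actor_input({"apiKey": "YOUR_CATCHALL_API_KEY", "query": "AI company acquisitions"}))
```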

# Actor output Schema

## `overview` (type: `string`):

No description

## `outputJson` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

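// Prepare Actor input (apiKey and query are required)
const input = {
    "apiKey": "YOUR_CATCHALL_API_KEY",
    "query": "AI company acquisitions"
};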

// Run the Actor and wait for it to finish
const run = await client.actor("newscatcher/catchall").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

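# Prepare the Actor input (apiKey and query are required)
run_input = {
    "apiKey": "YOUR_CATCHALL_API_KEY",
    "query": "AI company acquisitions",
}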

# Run the Actor and wait for it to finish
run = client.actor("newscatcher/catchall").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"apiKey": "YOUR_CATCHALL_API_KEY", "query": "AI company acquisitions"}' |
apify call newscatcher/catchall --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=newscatcher/catchall",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CatchAll",
        "description": "Submit a CatchAll job, poll until completion, and retrieve all valid records. Results are saved to the Dataset and Key-Value Store.",
        "version": "0.0",
        "x-build-id": "aS8ELvjgGRenLdi9x"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/newscatcher~catchall/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-newscatcher-catchall",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/newscatcher~catchall/runs": {
            "post": {
                "operationId": "runs-sync-newscatcher-catchall",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/newscatcher~catchall/run-sync": {
            "post": {
                "operationId": "run-sync-newscatcher-catchall",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "apiKey",
                    "query"
                ],
                "properties": {
                    "apiKey": {
                        "title": "CatchAll API Key",
                        "type": "string",
                        "description": "Your CatchAll API key."
                    },
                    "query": {
                        "title": "Query",
                        "type": "string",
                        "description": "Natural language query describing the news you want."
                    },
                    "context": {
                        "title": "Context (optional)",
                        "type": "string",
                        "description": "Extra instructions for extraction/enrichment."
                    },
                    "limit": {
                        "title": "Limit",
                        "type": "integer",
                        "description": "Maximum number of records to return. Must be >= 11.",
                        "default": 50
                    },
                    "validatorsJson": {
                        "title": "Validators JSON (optional)",
                        "type": "string",
                        "description": "Paste a JSON array. Example: [{\"name\":\"is_acquisition_event\",\"description\":\"...\",\"type\":\"boolean\"}]",
                        "default": "[]"
                    },
                    "enrichmentsJson": {
                        "title": "Enrichments JSON (optional)",
                        "type": "string",
                        "description": "Paste a JSON array. Example: [{\"name\":\"acquirer_company\",\"description\":\"...\",\"type\":\"company\"}]",
                        "default": "[]"
                    },
                    "pollIntervalSeconds": {
                        "title": "Poll Interval (seconds)",
                        "type": "integer",
                        "description": "How often to check job status.",
                        "default": 60
                    },
                    "timeoutMinutes": {
                        "title": "Timeout (minutes)",
                        "type": "integer",
                        "description": "Stop polling after this many minutes.",
                        "default": 30
                    },
                    "pageSize": {
                        "title": "Page Size",
                        "type": "integer",
                        "description": "How many records to fetch per page when pulling results.",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
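
For projects without a client library, the `/acts/newscatcher~catchall/runs` path from the specification above can be called directly. The sketch below only builds the request using the standard library and does not send it; `build_run_request` is a hypothetical helper, and the token and API key are placeholders.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.apify.com/v2"  # from the "servers" entry in the spec


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the /acts/newscatcher~catchall/runs path."""
    # The spec declares the token as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    url = f"{BASE_URL}/acts/newscatcher~catchall/runs?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "<YOUR_API_TOKEN>",
    {"apiKey": "YOUR_CATCHALL_API_KEY", "query": "AI company acquisitions"},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

Since a typical run is described as taking 10 to 15 minutes, starting the run through this path and fetching the dataset afterwards is likely a better fit than the synchronous paths, though that is a judgment call rather than something the specification states.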
