# USA HealthData.gov HHS Open Data Scraper (`parseforge/healthdata-scraper`) Actor

Collect health data catalog information from HealthData.gov. Filter by category, tags, view type, authority, and search terms to find exactly what you need. Perfect for researchers, data analysts, and healthcare professionals who need to discover and access public health datasets efficiently.

- **URL**: https://apify.com/parseforge/healthdata-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Developer tools, Other, Automation
- **Stats:** 4 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price decreases as your subscription plan tier increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 📊 USA HealthData.gov HHS Open Data Scraper

> 🚀 **Collect health datasets, stories, charts, and maps from the U.S. HHS Open Data Catalog in seconds.** Filter by category, tags, view type, and authority. No coding, no API keys required.

> 🕒 **Last updated:** 2026-04-16 · **📊 40 fields** · **🏥 HHS Open Data** · **📂 Datasets, Stories, Charts, Maps**

The HealthData.gov Scraper automates the discovery and collection of health datasets from the U.S. Department of Health and Human Services Open Data Catalog. Each record includes the dataset name, unique ID, description, publisher, contact information, categories, tags, view and download counts, license details, file download links, and timestamps. You can filter by **keyword**, **category** (CDC, FDA, CMS, NIH, HHS), **tags**, **view type** (datasets, stories, charts, maps, files), **authority** (official or community), and **sort order**. Free users can collect up to **10 items** per run, while paid users can retrieve up to **1,000,000 records**.

Whether you are a healthcare researcher tracking new CDC datasets, a data scientist building a catalog of open health data, or a policy analyst monitoring HHS publications, this tool eliminates the manual browsing that HealthData.gov requires. Results export to **JSON, CSV, or Excel**, making it easy to load records into your database, BI tool, or analysis pipeline. Schedule recurring runs to automatically detect new datasets as they are published. The scraper handles pagination, normalizes metadata fields, and processes multiple content types including datasets, stories, charts, maps, and downloadable files.

| Target Audience | Use Cases |
|---|---|
| Healthcare Researchers | Discover and catalog open health datasets for analysis |
| Data Scientists | Build metadata indexes of available HHS data sources |
| Policy Analysts | Monitor new publications from CDC, FDA, CMS, and NIH |
| Public Health Teams | Track epidemiological datasets and surveillance data |
| Journalists | Find health data for investigative reporting |
| Academic Institutions | Locate research datasets for grant-funded projects |

---

### 📋 What the HealthData.gov Scraper does

- 📝 **Dataset names and IDs** - capture the title, unique identifier, and description for every item in the HHS catalog
- 🔗 **Direct URLs** - collect working links to each dataset page for quick access and verification
- 📊 **Engagement metrics** - pull view counts and download counts to identify the most popular datasets
- 👤 **Publisher and contact info** - identify which health authority published the data and how to reach them
- 🏷️ **Categories and tags** - classify items by health topic, authority (CDC, FDA, CMS), and custom tags
- 📁 **File downloads** - extract download links with format and size information for each available file

The scraper connects to the HealthData.gov catalog API and iterates through results using your specified filters. It processes datasets, stories, charts, maps, files, and calendars. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. The tool supports both URL-based browsing (paste a HealthData.gov browse URL) and filter-based searching (set keywords and categories directly).

> 💡 **Why it matters:** HealthData.gov hosts thousands of datasets from dozens of health agencies. Manually browsing and cataloging this content is time-consuming. This scraper gives you structured metadata for the entire catalog in minutes.

---

### 🎬 Full Demo

_🚧 Coming soon..._

---

### ⚙️ Input

<table>
<tr><th>Field</th><th>Type</th><th>Required</th><th>Description</th></tr>
<tr><td>startUrl</td><td>string</td><td>No</td><td>Direct URL to a HealthData.gov browse page. Use this OR search filters, not both.</td></tr>
<tr><td>maxItems</td><td>integer</td><td>No</td><td>Maximum items to collect. Free: 10. Paid: up to 1,000,000.</td></tr>
<tr><td>q</td><td>string</td><td>No</td><td>Search term to find datasets (e.g., "diabetes", "vaccination").</td></tr>
<tr><td>category</td><td>string</td><td>No</td><td>Filter by category: CDC, FDA, CMS, HHS, NIH, Hospital, State.</td></tr>
<tr><td>tags</td><td>string</td><td>No</td><td>Filter by tags (comma-separated values).</td></tr>
<tr><td>limitTo</td><td>string</td><td>No</td><td>Content type: Datasets, Stories, Charts, Maps, Forms, Files, Calendars.</td></tr>
<tr><td>authority</td><td>string</td><td>No</td><td>Official health agency data or community-contributed content.</td></tr>
<tr><td>sortBy</td><td>string</td><td>No</td><td>Sort order: newest, alpha, most_accessed, relevance, updated.</td></tr>
</table>

**Example 1: Browse newest datasets**
```json
{
  "startUrl": "https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20",
  "maxItems": 50
}
````

**Example 2: Search for vaccination data**

```json
{
  "q": "vaccination",
  "category": "Health",
  "limitTo": "datasets",
  "sortBy": "most_accessed",
  "maxItems": 100
}
```

> ⚠️ **Good to Know:** Free users are automatically limited to 10 items per run. Use either startUrl OR the search filters (q, category, tags), not both at the same time. The limitTo field lets you focus on specific content types like datasets or charts.
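
The rule above (either `startUrl` or the search filters, never both) can be checked on the client before a run is started. The following is a minimal sketch, not part of the Actor: `build_input` and `SEARCH_FILTERS` are hypothetical names, and how the Actor itself reacts to a conflicting input is not documented here, so the helper simply refuses to build one.

```python
from typing import Any, Dict, Optional

# Filters the note above names as mutually exclusive with startUrl.
SEARCH_FILTERS = ("q", "category", "tags")


def build_input(start_url: Optional[str] = None,
                max_items: Optional[int] = None,
                **fields: Any) -> Dict[str, Any]:
    """Hypothetical helper: assemble an input object and catch conflicts early."""
    tags = fields.get("tags")
    if isinstance(tags, (list, tuple)):
        # The input table describes tags as comma-separated values.
        fields["tags"] = ",".join(tags)

    conflicts = [name for name in SEARCH_FILTERS if fields.get(name)]
    if start_url and conflicts:
        raise ValueError(f"startUrl cannot be combined with: {', '.join(conflicts)}")

    # Drop empty values so only the fields that were actually set are sent.
    run_input = {name: value for name, value in fields.items() if value}
    if start_url:
        run_input["startUrl"] = start_url
    if max_items is not None:
        # Bounds taken from the input schema (minimum 1, maximum 1,000,000).
        if not 1 <= max_items <= 1_000_000:
            raise ValueError("maxItems must be between 1 and 1,000,000")
        run_input["maxItems"] = max_items
    return run_input


print(build_input(q="vaccination", tags=["covid", "cases"],
                  sortBy="most_accessed", max_items=100))
```

The returned dictionary can be passed as `run_input` to the Python client call shown later in this README.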

***

### 📊 Output

#### 🧾 Schema

| Emoji | Field | Type | Description |
|---|---|---|---|
| 📝 | datasetId | string | Unique identifier for the dataset |
| 🏷️ | datasetName | string | Title of the dataset or resource |
| 🔗 | datasetUrl | string | Direct link to the dataset page |
| 📄 | description | string | Full description of the dataset |
| 👤 | publisher | string | Agency or organization that published the data |
| 📧 | contactEmail | string | Contact email for the dataset publisher |
| 🏷️ | categories | array | Topic categories assigned to the dataset |
| 🔖 | tags | array | Topic tags for filtering and discovery |
| 📊 | viewCount | number | Total number of views |
| 📥 | downloadCount | number | Total number of downloads |
| 📜 | license | string | License type for the dataset |
| 📅 | createdAt | string | Date the dataset was first published |
| 📅 | publicationDate | string | Official publication date |
| 🔄 | lastUpdated | string | Most recent update timestamp |
| 📁 | downloads | array | Available file downloads with format and size |
| 🕐 | scrapedAt | string | Timestamp of data collection |
| ⚠️ | error | string | Error message if processing failed |

#### 📦 Sample records

<details>
<summary>📝 CDC dataset record</summary>

```json
{
  "datasetId": "abc123-def456",
  "datasetName": "COVID-19 Case Surveillance Public Use Data",
  "datasetUrl": "https://healthdata.gov/dataset/COVID-19-Case-Surveillance",
  "description": "Nationwide case-level data reported to CDC including demographics and outcomes.",
  "publisher": "Centers for Disease Control and Prevention",
  "contactEmail": "data@cdc.gov",
  "categories": ["Health", "COVID-19"],
  "tags": ["covid", "surveillance", "cases"],
  "viewCount": 450000,
  "downloadCount": 120000,
  "license": "Public Domain",
  "createdAt": "2020-05-01",
  "lastUpdated": "2026-03-15",
  "scrapedAt": "2026-04-16T12:00:00.000Z"
}
```

</details>

<details>
<summary>📝 Hospital quality dataset</summary>

```json
{
  "datasetId": "xyz789-ghi012",
  "datasetName": "Hospital General Information",
  "datasetUrl": "https://healthdata.gov/dataset/Hospital-General-Information",
  "description": "General information about all registered hospitals including quality ratings.",
  "publisher": "Centers for Medicare & Medicaid Services",
  "contactEmail": "data@cms.gov",
  "categories": ["Hospital", "Quality"],
  "tags": ["hospital", "quality", "ratings"],
  "viewCount": 280000,
  "downloadCount": 85000,
  "license": "Public Domain",
  "createdAt": "2018-11-15",
  "lastUpdated": "2026-02-28",
  "scrapedAt": "2026-04-16T12:00:00.000Z"
}
```

</details>

<details>
<summary>📝 Health story record</summary>

```json
{
  "datasetId": "sto456-789abc",
  "datasetName": "How States Are Using Data to Fight the Opioid Crisis",
  "datasetUrl": "https://healthdata.gov/stories/opioid-crisis-data",
  "description": "A narrative exploration of state-level approaches to using open data in opioid response.",
  "publisher": "HHS Office of the CTO",
  "categories": ["Health", "State"],
  "tags": ["opioid", "data-driven", "states"],
  "viewCount": 15000,
  "downloadCount": 0,
  "license": "Public Domain",
  "createdAt": "2023-06-20",
  "lastUpdated": "2023-06-20",
  "scrapedAt": "2026-04-16T12:00:00.000Z"
}
```

</details>
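
Records shaped like the samples above are easy to post-process. As a rough sketch, the hypothetical helper `top_by_downloads` skips records that carry an `error` value and ranks the rest by `downloadCount`. It assumes `error` is only populated on failed records, as the schema description suggests. The last input record is invented purely to exercise that branch.

```python
from typing import Any, Dict, Iterable, List


def top_by_downloads(items: Iterable[Dict[str, Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Return up to `limit` error-free records, most downloaded first."""
    usable = [item for item in items if not item.get("error")]
    # Treat a missing or null downloadCount as zero so sorting never fails.
    ranked = sorted(usable, key=lambda item: item.get("downloadCount") or 0, reverse=True)
    return ranked[:limit]


records = [
    {"datasetName": "Hospital General Information", "downloadCount": 85000},
    {"datasetName": "How States Are Using Data to Fight the Opioid Crisis", "downloadCount": 0},
    {"datasetName": "COVID-19 Case Surveillance Public Use Data", "downloadCount": 120000},
    # Hypothetical failed record, included only to show the filter.
    {"datasetId": "hypothetical-failed-id", "error": "hypothetical failure message"},
]

top_names = [record["datasetName"] for record in top_by_downloads(records, limit=2)]
print(top_names)
```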

***

### ✨ Why choose this Actor

| Feature | This Actor | Alternatives |
|---|---|---|
| Filter by health authority (CDC, FDA, CMS, NIH) | Yes | No |
| Multiple content types (datasets, stories, charts, maps) | Yes | Datasets only |
| View and download count metrics | Yes | Rarely included |
| Publisher and contact information | Yes | No |
| File download links with format info | Yes | No |
| Up to 1,000,000 results per run | Yes | Capped lower |
| Export to JSON, CSV, and Excel | Yes | JSON only |

> 📊 **HealthData.gov hosts thousands of health datasets from over a dozen federal agencies. This scraper gives you structured access to the full catalog with engagement metrics and file download links.**

***

### 📈 How it compares to alternatives

| Capability | This Actor | Manual Browsing | Generic Web Scrapers |
|---|---|---|---|
| Health-specific filters (category, tags, authority) | Yes | Yes | No |
| Engagement metrics (views, downloads) | Yes | Visible per page | No |
| Automatic pagination | Yes | No | Partial |
| Multiple content types in one run | Yes | Manual switching | No |
| Scheduled recurring runs | Yes | No | Varies |
| No coding required | Yes | Yes | No |

This scraper is purpose-built for HealthData.gov and handles the catalog's specific API structure, content types, and metadata fields out of the box.

***

### 🚀 How to use

1. **Sign up** - [Create a free Apify account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp)
2. **Find the Actor** - Search for "HealthData.gov Scraper" in the Apify Store
3. **Configure your search** - Set keywords, category, content type, and max items
4. **Start the run** - Click "Start" and watch results appear in real time
5. **Export your data** - Download as JSON, CSV, or Excel from the dataset tab

> 🕒 **Typical run time:** 30 seconds to 2 minutes for up to 50 items. Larger runs with 500+ items may take 5 to 15 minutes.

***

### 💼 Business use cases

<table>
<tr>
<td>

**Healthcare Research**

- Discover new CDC and NIH datasets for epidemiological studies
- Build metadata catalogs of available health data sources
- Track dataset updates to ensure analyses use current data
- Identify high-download datasets for literature review context

</td>
<td>

**Public Health Monitoring**

- Monitor new HHS publications weekly for surveillance data
- Track COVID-19 and infectious disease dataset updates
- Catalog hospital quality and safety datasets by state
- Build notification systems for new health data releases

</td>
</tr>
<tr>
<td>

**Policy and Journalism**

- Find data sources for health policy analysis and reporting
- Track which health agencies are publishing the most data
- Identify trending datasets by view and download counts
- Build evidence bases for policy recommendations

</td>
<td>

**Data Engineering**

- Catalog available APIs and downloadable files for pipeline planning
- Monitor dataset freshness and update frequency
- Build automated ingestion workflows triggered by new publications
- Track license types across datasets for compliance

</td>
</tr>
</table>

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Empirical datasets for papers, thesis work, and coursework
- Longitudinal studies tracking changes across snapshots
- Reproducible research with cited, versioned data pulls
- Classroom exercises on data analysis and ethical scraping

</td>
<td width="50%">

#### 🎨 Personal and creative

- Side projects, portfolio demos, and indie app launches
- Data visualizations, dashboards, and infographics
- Content research for bloggers, YouTubers, and podcasters
- Hobbyist collections and personal trackers

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Transparency reporting and accountability projects
- Advocacy campaigns backed by public-interest data
- Community-run databases for local issues
- Investigative journalism on public records

</td>
<td width="50%">

#### 🧪 Experimentation

- Prototype AI and machine-learning pipelines with real data
- Validate product-market hypotheses before engineering spend
- Train small domain-specific models on niche corpora
- Test dashboard concepts with live input

</td>
</tr>
</table>

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20USA%20HealthData.gov%20HHS%20Open%20Data%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20USA%20HealthData.gov%20HHS%20Open%20Data%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20USA%20HealthData.gov%20HHS%20Open%20Data%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20USA%20HealthData.gov%20HHS%20Open%20Data%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)

### ❓ Frequently Asked Questions

<details>
<summary><b>💳 Do I need a paid Apify plan to run this Actor?</b></summary>

No. You can start right now on the free Apify plan, which includes **$5 in free monthly credit**. That is enough to run this Actor several times and explore the output before committing to anything. Paid plans unlock higher limits, more concurrent runs, and larger datasets. [Create a free Apify account here](https://console.apify.com/sign-up?fpr=vmoqkp) to get started.

</details>

<details>
<summary><b>🚨 What happens if my run fails or returns no results?</b></summary>

Failed runs are not charged. If the source site changes, proxies get rate-limited, or a specific input matches nothing, re-run the Actor or open our [contact form](https://tally.so/r/BzdKgA) and we will investigate. You can also check the run log in the Apify console to see why the run stopped.

</details>

<details>
<summary><b>📏 How many items can I scrape per run?</b></summary>

Free users are limited to **10 items per run** so you can preview the output and confirm the Actor works for your use case. Paid users can raise `maxItems` up to **1,000,000** per run. [Upgrade here](https://console.apify.com/sign-up?fpr=vmoqkp) if you need full scale.

</details>

<details>
<summary><b>🕒 How fresh is the data?</b></summary>

Every run fetches live data at the moment of execution. There is no cache or delay: the records you get reflect what the source returned at that moment. Schedule the Actor to maintain a rolling snapshot of the data you need.

</details>

<details>
<summary><b>🧑‍💻 Can I call this Actor from my own code?</b></summary>

Yes. Apify exposes every Actor as a REST endpoint and ships official API clients for [Node.js](https://docs.apify.com/api/client/js/) and [Python](https://docs.apify.com/api/client/python/). You can start a run, read the dataset, and handle webhooks from your own app in a few lines. All you need is your Apify API token.

</details>

<details>
<summary><b>📤 How do I export the data?</b></summary>

Every Apify dataset can be downloaded in one click from the console as CSV, JSON, JSONL, Excel, HTML, XML, or RSS. You can also pull results programmatically via the [Apify API](https://docs.apify.com/api/v2) or stream them into BigQuery, S3, and other destinations through built-in integrations.

</details>

<details>
<summary><b>📅 Can I schedule the Actor to run automatically?</b></summary>

Yes. Use the Apify scheduler to run the Actor on any cadence, from hourly to monthly. Results are saved to your dataset and can be delivered to webhooks, email, Slack, cloud storage, or automation tools such as Zapier and Make.

</details>

***

### 🔌 Automating HealthData.gov Scraper

**Node.js example:**

```javascript
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish
const run = await client.actor('parseforge/healthdata-scraper').call({
    q: 'vaccination',
    maxItems: 50,
    sortBy: 'newest'
});

// Read the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

**Python example:**

```python
from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient('YOUR_API_TOKEN')

# Start the Actor and wait for the run to finish
run = client.actor('parseforge/healthdata-scraper').call(run_input={
    'q': 'vaccination',
    'maxItems': 50,
    'sortBy': 'newest'
})

# Read the results from the run's default dataset
items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(items)
```
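
Once `items` is in memory, array fields such as `categories` and `tags` usually need flattening before they fit a spreadsheet row. The sketch below is one possible approach using only the standard library. `records_to_csv` is a hypothetical helper, and joining list values with `"; "` is an arbitrary choice rather than the format of the platform's built-in CSV export.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def records_to_csv(items: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Serialize the chosen columns to CSV text, joining list values into one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column, "")
            # Arrays (categories, tags) become a single "; "-separated cell.
            row[column] = "; ".join(map(str, value)) if isinstance(value, list) else value
        writer.writerow(row)
    return buffer.getvalue()


sample_items = [
    {"datasetId": "abc123-def456", "tags": ["covid", "surveillance", "cases"], "viewCount": 450000},
]
csv_text = records_to_csv(sample_items, ["datasetId", "tags", "viewCount"])
print(csv_text)
```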

- [Apify API documentation](https://docs.apify.com/api/v2)
- [Node.js client docs](https://docs.apify.com/api/client/js/)
- [Python client docs](https://docs.apify.com/api/client/python/)

**Schedules:** Set up daily or weekly runs to detect new health datasets as they are published. Combine with Slack or email integrations to get notified whenever new data matches your search criteria.
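
Detecting new datasets between scheduled runs comes down to comparing `datasetId` values. A minimal sketch follows, assuming `datasetId` is stable across runs (the output schema calls it a unique identifier). `find_new_records` is a hypothetical helper, and where the previously seen IDs are stored (a file, a database, a key-value store) is left open.

```python
from typing import Any, Dict, Iterable, List, Set


def find_new_records(seen_ids: Set[str], items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return records whose datasetId was not present in earlier runs."""
    fresh = []
    for item in items:
        dataset_id = item.get("datasetId")
        # Skip records without an ID and anything already reported.
        if dataset_id and dataset_id not in seen_ids:
            fresh.append(item)
            seen_ids.add(dataset_id)  # also de-duplicates within the current run
    return fresh


seen = {"abc123-def456"}
latest_run = [
    {"datasetId": "abc123-def456", "datasetName": "COVID-19 Case Surveillance Public Use Data"},
    {"datasetId": "xyz789-ghi012", "datasetName": "Hospital General Information"},
]
new_records = find_new_records(seen, latest_run)
print([record["datasetName"] for record in new_records])
```

Because `seen_ids` is updated in place, the same set can be persisted after each run and reused for the next comparison.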

### 🔌 Integrate with any app

- [Make](https://docs.apify.com/platform/integrations/make) - Automate health data workflows and route datasets to your team
- [Zapier](https://docs.apify.com/platform/integrations/zapier) - Connect to 5,000+ apps and trigger actions on new health data
- [Slack](https://docs.apify.com/platform/integrations/slack) - Get notifications when new datasets match your criteria
- [Airbyte](https://docs.apify.com/platform/integrations/airbyte) - Stream health data metadata into your data warehouse
- [GitHub](https://docs.apify.com/platform/integrations/github) - Version control your scraper configurations
- [Google Drive](https://docs.apify.com/platform/integrations/drive) - Export results directly to Google Sheets

***

### 🔗 Recommended Actors

| Actor | Description |
|---|---|
| [GSA eLibrary Scraper](https://apify.com/parseforge/gsa-elibrary-scraper) | Collect government contractor and vendor data from the GSA eLibrary |
| [USAspending Scraper](https://apify.com/parseforge/usaspending-scraper) | Extract federal spending data and contract information |
| [PR Newswire Scraper](https://apify.com/parseforge/pr-newswire-scraper) | Collect press releases and news articles from PR Newswire |
| [FINRA BrokerCheck Scraper](https://apify.com/parseforge/finra-brokercheck-scraper) | Search broker and firm registration data from the FINRA registry |
| [FAA Aircraft Registry Scraper](https://apify.com/parseforge/faa-aircraft-registry-scraper) | Look up aircraft registration records by N-number from the FAA |

> 💡 **Pro Tip:** Combine the HealthData.gov Scraper with the USAspending Scraper to cross-reference health datasets with federal health spending records.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue. We typically respond within 24 hours.

***

> **Disclaimer:** This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the U.S. Department of Health and Human Services, HealthData.gov, CDC, FDA, CMS, or NIH. All trademarks mentioned are the property of their respective owners.

# Actor input Schema

## `startUrl` (type: `string`):

Direct URL to scrape from HealthData.gov. Use this OR search filters below, not both. Example: `https://healthdata.gov/browse?category=Health&sortBy=newest`
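
A browse URL like the example above can be assembled programmatically instead of by hand. This sketch only uses query parameters that appear in this document's own example URLs (`category`, `sortBy`, `page`, `pageSize`). Whether HealthData.gov accepts other parameter names is not covered here, and `browse_url` is a hypothetical helper.

```python
from typing import Any
from urllib.parse import urlencode

BROWSE_URL = "https://healthdata.gov/browse"


def browse_url(**params: Any) -> str:
    """Build a browse URL, skipping parameters that were left unset."""
    query = urlencode({name: value for name, value in params.items() if value is not None})
    return f"{BROWSE_URL}?{query}" if query else BROWSE_URL


start_url = browse_url(category="Health", sortBy="newest")
print(start_url)
# https://healthdata.gov/browse?category=Health&sortBy=newest
```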

## `maxItems` (type: `integer`):

Maximum number of items to scrape. Free users: Limited to 100. Paid users: Optional, max 1,000,000.

## `q` (type: `string`):

Search term to filter datasets. Use this OR startUrl above, not both.

## `category` (type: `string`):

Filter by category

## `tags` (type: `string`):

Filter by tags (comma-separated)

## `limitTo` (type: `string`):

Filter by view type

## `authority` (type: `string`):

Filter by authority

## `sortBy` (type: `string`):

Sort order for results. Options: `newest` (Recently added), `alpha` (A to Z), `most_accessed` (Most viewed), `relevance` (Most relevant), `updated` (Recently updated)

## Actor input object example

```json
{
  "startUrl": "https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20",
  "maxItems": 10
}
```

# Actor output Schema

## `overview` (type: `string`):

Overview view for all catalog items (datasets, charts, maps, files, etc.) with key fields displayed in a table format

## `stories` (type: `string`):

Stories view with story-specific fields (title, authors, content, sections, links) displayed prominently

## `files` (type: `string`):

Files view with file-specific fields (downloads, blob information, MIME type, file size) displayed prominently

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrl": "https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20",
    "maxItems": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/healthdata-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrl": "https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20",
    "maxItems": 10,
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/healthdata-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
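
If installing `apify-client` is not an option, the same run can be sketched with only the Python standard library against the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification in this document. The server URL, path, and `token` query parameter come from that specification. The helper names are hypothetical, the 300-second timeout is an arbitrary default, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/parseforge~healthdata-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_items(token: str, run_input: Dict[str, Any], timeout: float = 300) -> List[Dict[str, Any]]:
    """Send the request and decode the body (assumed to be a JSON array of items)."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request is side-effect free; fetch_items performs the network call.
request = build_run_request("<YOUR_API_TOKEN>", {"maxItems": 10})
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect or log before any run is started.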

## CLI example

```bash
echo '{
  "startUrl": "https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20",
  "maxItems": 10
}' |
apify call parseforge/healthdata-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/healthdata-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "USA HealthData.gov HHS Open Data Scraper",
        "description": "Collect health data catalog information from HealthData.gov. Filter by category, tags, view type, authority, and search terms to find exactly what you need. Perfect for researchers, data analysts, and healthcare professionals who need to discover and access public health datasets efficiently.",
        "version": "1.0",
        "x-build-id": "uHBe0VGjsXA4jT6x9"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~healthdata-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-healthdata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~healthdata-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-healthdata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~healthdata-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-healthdata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrl": {
                        "title": "Start URL",
                        "type": "string",
                        "description": "Direct URL to scrape from HealthData.gov. Use this OR search filters below, not both. Example: https://healthdata.gov/browse?category=Health&sortBy=newest"
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Maximum number of items to scrape. Free users: Limited to 100. Paid users: Optional, max 1,000,000."
                    },
                    "q": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search term to filter datasets. Use this OR startUrl above, not both."
                    },
                    "category": {
                        "title": "Category",
                        "enum": [
                            "Blog",
                            "Community",
                            "Health",
                            "HHS",
                            "Hospital",
                            "ACF",
                            "ACL",
                            "AHRQ",
                            "ASPR",
                            "ATSDR",
                            "CDC",
                            "CMS",
                            "FDA",
                            "HRSA",
                            "IHS",
                            "NIH",
                            "SAMHSA",
                            "National",
                            "State"
                        ],
                        "type": "string",
                        "description": "Filter by category"
                    },
                    "tags": {
                        "title": "Tags",
                        "type": "string",
                        "description": "Filter by tags (comma-separated)"
                    },
                    "limitTo": {
                        "title": "View Type",
                        "enum": [
                            "dataset",
                            "story",
                            "chart",
                            "map",
                            "form",
                            "measure",
                            "calendar",
                            "filtered_view",
                            "external_dataset",
                            "file"
                        ],
                        "type": "string",
                        "description": "Filter by view type"
                    },
                    "authority": {
                        "title": "Authority",
                        "enum": [
                            "Community",
                            "Official"
                        ],
                        "type": "string",
                        "description": "Filter by authority"
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "newest",
                            "alpha",
                            "most_accessed",
                            "relevance",
                            "updated"
                        ],
                        "type": "string",
                        "description": "Sort order for results. Options: newest (Recently added), alpha (A to Z), most_accessed (Most viewed), relevance (Most relevant), updated (Recently updated)"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
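
The `inputSchema` above can be checked on the client side before a run is started. The following is a minimal sketch, not part of the Actor itself: the helper names `build_input` and `build_run_sync_url` are hypothetical, the base URL `https://api.apify.com/v2` is assumed to be the standard Apify API base, and the rule that `startUrl` excludes `q`, `category`, `tags`, `limitTo`, and `authority` is one reading of the "Use this OR search filters below, not both" description.

```python
from urllib.parse import urlencode

# Enum values copied from the inputSchema in the OpenAPI definition above.
CATEGORIES = {
    "Blog", "Community", "Health", "HHS", "Hospital", "ACF", "ACL", "AHRQ",
    "ASPR", "ATSDR", "CDC", "CMS", "FDA", "HRSA", "IHS", "NIH", "SAMHSA",
    "National", "State",
}
VIEW_TYPES = {
    "dataset", "story", "chart", "map", "form", "measure", "calendar",
    "filtered_view", "external_dataset", "file",
}
AUTHORITIES = {"Community", "Official"}
SORT_OPTIONS = {"newest", "alpha", "most_accessed", "relevance", "updated"}

# Assumption: the standard Apify API base URL; the path comes from the spec above.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/parseforge~healthdata-scraper/run-sync"


def build_input(*, start_url=None, q=None, category=None, tags=None,
                limit_to=None, authority=None, sort_by=None, max_items=None):
    """Hypothetical helper: return an input dict that satisfies inputSchema."""
    filters = {"q": q, "category": category, "tags": tags,
               "limitTo": limit_to, "authority": authority}
    used_filters = {k: v for k, v in filters.items() if v is not None}

    # Assumed interpretation of "Use this OR search filters below, not both".
    if start_url is not None and used_filters:
        raise ValueError("Use either startUrl or search filters, not both")

    # Enum checks mirror the "enum" arrays in the schema.
    for name, value, allowed in (
        ("category", category, CATEGORIES),
        ("limitTo", limit_to, VIEW_TYPES),
        ("authority", authority, AUTHORITIES),
        ("sortBy", sort_by, SORT_OPTIONS),
    ):
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")

    # maxItems is an integer with minimum 1 and maximum 1000000.
    if max_items is not None:
        is_int = isinstance(max_items, int) and not isinstance(max_items, bool)
        if not is_int or not 1 <= max_items <= 1_000_000:
            raise ValueError("maxItems must be an integer from 1 to 1000000")

    payload = dict(used_filters)
    if start_url is not None:
        payload["startUrl"] = start_url
    if sort_by is not None:
        payload["sortBy"] = sort_by
    if max_items is not None:
        payload["maxItems"] = max_items
    return payload


def build_run_sync_url(token):
    """Hypothetical helper: run-sync URL with the required token query parameter."""
    return f"{API_BASE}{RUN_SYNC_PATH}?{urlencode({'token': token})}"
```

The dictionary returned by `build_input` is meant to be sent as the `application/json` request body of a `POST` to the URL from `build_run_sync_url`, using whichever HTTP client the project already depends on.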
