# Hacker News Scraper — Stories, Comments & Jobs (`cryptosignals/hackernews-scraper`) Actor

Scrape Hacker News stories, comments, and user profiles — extract title, URL, score, author, comment threads, and submission time. CSV/JSON output.

- **URL**: https://apify.com/cryptosignals/hackernews-scraper.md
- **Developed by:** [Web Data Labs](https://apify.com/cryptosignals) (community)
- **Categories:** Social media, News, AI
- **Stats:** 6 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$5.00 / 1,000 results scraped

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
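
As a rough illustration of the price above, the sketch below estimates what a run might cost from the number of results. It assumes every dataset item counts as one billed result, which this page does not confirm. `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Price taken from the Pricing section: $5.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("5.00")


def estimate_cost_usd(result_count: int) -> Decimal:
    """Estimate the charge in USD, assuming one billed event per result."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS * result_count / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(50))   # 0.25
print(estimate_cost_usd(500))  # 2.50
```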

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hacker News Scraper: Stories, Comments & Users

The most comprehensive **Hacker News scraper** on Apify. Extract HN stories, comments, and user profiles at scale. Search Hacker News posts by keyword. Export to JSON, CSV, or Excel. Built on the official HN Firebase API for maximum reliability.

### Why Use This Hacker News Scraper?

- **Tech Trend Analysis** - Track what's trending on Hacker News. Monitor emerging technologies, frameworks, and tools before they go mainstream.
- **Startup Scouting** - Discover new startups from Show HN and Launch HN posts. Get early intelligence on companies before they hit the press.
- **Hiring Intelligence** - Monitor "Who is Hiring" threads and job postings. Analyze which companies are hiring and for what roles.
- **Competitive Research** - Track mentions of competitors, products, or technologies. See what the developer community thinks.
- **Content Curation** - Build datasets of top-performing HN content. Understand what resonates with the tech community.
- **Academic Research** - Analyze HN data for sentiment analysis, trend detection, and community dynamics research.

### Features

| Feature | Description |
|---------|-------------|
| **6 Story Categories** | Top, New, Best, Ask HN, Show HN, Jobs |
| **Full-Text Search** | Search HN posts via Algolia API |
| **Comment Trees** | Fetch full comment threads for any story |
| **User Profiles** | Get karma, about, and submission counts |
| **Pagination** | Control output size with maxItems |
| **Concurrent Fetching** | Fast parallel requests with rate limiting |
| **Multiple Export Formats** | JSON, CSV, Excel, XML via Apify |

### Input Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `category` | string | `"top"` | Story category: `top`, `new`, `best`, `ask`, `show`, `jobs`, or `search` |
| `searchQuery` | string | `""` | Search query (required when category is `search`) |
| `maxItems` | integer | `100` | Maximum stories to return (1-500) |
| `includeComments` | boolean | `false` | Fetch comment trees for each story |
| `maxCommentsPerStory` | integer | `50` | Max comments per story |
| `scrapeType` | string | `"stories"` | Output type: `stories`, `users`, or `both` |

### Output - Stories

```json
{
  "id": 12345678,
  "title": "Show HN: I built an AI-powered code reviewer",
  "url": "https://example.com/project",
  "text": null,
  "author": "techfounder",
  "score": 342,
  "commentCount": 156,
  "createdAt": "2025-12-15T10:30:00.000Z",
  "storyType": "show",
  "hnUrl": "https://news.ycombinator.com/item?id=12345678"
}
````
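
Once story items like the one above are downloaded, a small amount of post-processing is often useful. The sketch below parses `createdAt` and keeps only high-scoring stories. The helper names and the `min_score` threshold are illustrative assumptions, not part of the Actor.

```python
from datetime import datetime


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-12-15T10:30:00.000Z'."""
    # Replace the trailing 'Z' so older Python versions accept the string.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def top_stories(items, min_score=100):
    """Return stories with score >= min_score, highest score first."""
    kept = [s for s in items if (s.get("score") or 0) >= min_score]
    return sorted(kept, key=lambda s: s["score"], reverse=True)


stories = [
    {"id": 1, "title": "A", "score": 342, "createdAt": "2025-12-15T10:30:00.000Z"},
    {"id": 2, "title": "B", "score": 12, "createdAt": "2025-12-15T11:00:00.000Z"},
]
for story in top_stories(stories):
    print(story["title"], parse_created_at(story["createdAt"]).date())
```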

### Output - Users

```json
{
  "username": "techfounder",
  "karma": 15420,
  "about": "Building cool stuff. Previously at BigCorp.",
  "createdAt": "2015-03-10T08:00:00.000Z",
  "submittedCount": 892
}
```

### Output - Comments (when includeComments is true)

```json
{
  "id": 12345679,
  "author": "commenter1",
  "text": "This is a great project! I've been looking for something like this.",
  "createdAt": "2025-12-15T11:00:00.000Z",
  "parentId": 12345678
}
```
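
Comment records carry only a `parentId`, so rebuilding a thread takes one extra step. The sketch below nests a flat list of comments under their parents. It assumes comments arrive as a flat list and that `parentId` points either at the story or at another comment; the `replies` key and `build_comment_tree` are hypothetical names introduced here.

```python
from collections import defaultdict


def build_comment_tree(story_id, comments):
    """Nest flat comment records under their parents, starting at the story."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["parentId"]].append(comment)

    def attach(parent_id):
        # Copy each comment and add its replies recursively.
        return [
            {**comment, "replies": attach(comment["id"])}
            for comment in children.get(parent_id, [])
        ]

    return attach(story_id)


flat = [
    {"id": 12345679, "author": "commenter1", "parentId": 12345678},
    {"id": 12345680, "author": "commenter2", "parentId": 12345679},
    {"id": 12345681, "author": "commenter3", "parentId": 12345678},
]
tree = build_comment_tree(12345678, flat)
print(len(tree), "top-level comments")
```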

### Example Use Cases

#### Get Top 50 HN Stories

```json
{
  "category": "top",
  "maxItems": 50
}
```

#### Search for AI Startup Posts

```json
{
  "category": "search",
  "searchQuery": "AI startup launch",
  "maxItems": 100
}
```

#### Get Ask HN with Comments

```json
{
  "category": "ask",
  "maxItems": 20,
  "includeComments": true,
  "maxCommentsPerStory": 100
}
```

#### Get Stories + Author Profiles

```json
{
  "category": "best",
  "maxItems": 30,
  "scrapeType": "both"
}
```
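
With `scrapeType` set to `both`, stories and author profiles can be joined after the run. This page does not say how the two record types are distinguished in the dataset, so the sketch below assumes user records have a `username` key and story records have a `title` key. `authorKarma` is a hypothetical field added for illustration.

```python
def split_items(items):
    """Separate story records from user records (assumed key-based heuristic)."""
    stories = [item for item in items if "title" in item]
    users = {item["username"]: item for item in items if "username" in item}
    return stories, users


def attach_author_karma(stories, users):
    """Return copies of the stories with the author's karma, or None if unknown."""
    return [
        {**story, "authorKarma": users.get(story["author"], {}).get("karma")}
        for story in stories
    ]


items = [
    {"id": 12345678, "title": "Show HN: I built an AI-powered code reviewer",
     "author": "techfounder", "score": 342},
    {"username": "techfounder", "karma": 15420},
]
stories, users = split_items(items)
print(attach_author_karma(stories, users)[0]["authorKarma"])
```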

### API Usage Examples

#### JavaScript

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('cryptosignals/hackernews-scraper').call({
    category: 'top',
    maxItems: 50,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("cryptosignals/hackernews-scraper").call(run_input={
    "category": "top",
    "maxItems": 50,
})
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(items)
```
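
If you prefer to post-process results locally rather than use Apify's built-in exports, items from the Python example can be written to CSV with the standard library. This is a minimal sketch: the column list follows the story output shown earlier, and `stories_to_csv` is a hypothetical helper.

```python
import csv
import io

# Columns taken from the documented story output.
STORY_FIELDS = [
    "id", "title", "url", "author", "score",
    "commentCount", "createdAt", "storyType", "hnUrl",
]


def stories_to_csv(items) -> str:
    """Serialize story records to CSV text, ignoring any unexpected keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=STORY_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


sample = [{"id": 12345678, "title": "Example", "author": "techfounder",
           "score": 342, "text": None}]
print(stories_to_csv(sample))
```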

#### cURL

```bash
curl "https://api.apify.com/v2/acts/cryptosignals~hackernews-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{"category": "top", "maxItems": 50}'
```

### About the Data Source

This scraper uses the **official Hacker News Firebase API** (https://github.com/HackerNews/API) and the **Algolia HN Search API** for keyword search. Both are public, free APIs: the Firebase API is provided by Y Combinator, and HN Search is operated by Algolia. No scraping of the website is involved. All data comes from these endpoints, which keeps the scraper reliable and compliant.
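
For orientation, the sketch below builds the public endpoint URLs that the two APIs expose. Whether the Actor calls exactly these endpoints is an assumption; the mapping from `category` values to Firebase list names and the `tags=story` filter are illustrative guesses.

```python
from urllib.parse import quote, urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"

# Assumed mapping from this Actor's categories to Firebase story lists.
CATEGORY_ENDPOINTS = {
    "top": "topstories", "new": "newstories", "best": "beststories",
    "ask": "askstories", "show": "showstories", "jobs": "jobstories",
}


def story_list_url(category: str) -> str:
    """URL returning a JSON array of item IDs for a category."""
    return f"{FIREBASE_BASE}/{CATEGORY_ENDPOINTS[category]}.json"


def item_url(item_id: int) -> str:
    """URL for a single story, comment, or job item."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"


def user_url(username: str) -> str:
    """URL for a user profile (karma, about, submitted IDs)."""
    return f"{FIREBASE_BASE}/user/{quote(username)}.json"


def search_url(query: str, hits_per_page: int = 100) -> str:
    """URL for a keyword search over stories via Algolia HN Search."""
    params = {"query": query, "tags": "story", "hitsPerPage": hits_per_page}
    return f"{ALGOLIA_SEARCH}?{urlencode(params)}"


print(story_list_url("top"))
print(search_url("AI startup launch", 10))
```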

### FAQ

**Q: Is this scraper free to use?**
A: No. The Actor is paid per event: $5.00 per 1,000 results scraped. Apify platform usage is not billed separately (see Pricing).

**Q: How often is the data updated?**
A: The HN API provides real-time data. Stories and scores are current at the time of scraping.

**Q: Can I scrape historical data?**
A: Use the search feature to find older posts by keyword. The Firebase API only provides current story lists.

**Q: Why are some comments missing?**
A: Deleted or dead (flagged) comments are filtered out automatically.

**Q: What's the rate limit?**
A: The official HN API has no documented rate limit. The scraper uses concurrent requests with sensible batching.

**Q: Can I schedule this to run automatically?**
A: Yes! Use Apify's scheduling feature to run the scraper hourly, daily, or on any custom schedule.
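
The "concurrent requests with sensible batching" mentioned in the rate-limit answer could look roughly like the sketch below. It is not the Actor's actual implementation: the batch size, the worker count, and the `fetch_one` callback are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(ids, size):
    """Yield consecutive slices of `ids` with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(ids, fetch_one, batch_size=20, max_workers=10):
    """Fetch items batch by batch, running each batch concurrently.

    `fetch_one` is a caller-supplied function taking one ID; results keep
    the order of `ids` because Executor.map preserves input order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(ids, batch_size):
            results.extend(pool.map(fetch_one, batch))
    return results


# Stand-in fetcher so the example runs without network access.
fake_fetch = lambda item_id: {"id": item_id}
print(len(fetch_in_batches(list(range(45)), fake_fetch)))
```

Processing one batch at a time caps the number of in-flight requests, which is a simple way to stay polite toward an API with no documented limit.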

### Tags

hacker news, hn scraper, tech news, startup intelligence, developer tools, comment scraper, user profiles, tech trends, Y Combinator, hackernews api, hn data extraction

# Actor input Schema

## `category` (type: `string`):

Which HN category to scrape. Use 'search' to search by keyword.

## `searchQuery` (type: `string`):

Search query (only used when category is 'search'). Searches HN posts via Algolia.

## `maxItems` (type: `integer`):

Maximum number of stories/users to return.

## `includeComments` (type: `boolean`):

Fetch the full comment tree for each story. Warning: significantly increases run time.

## `maxCommentsPerStory` (type: `integer`):

Maximum number of comments to fetch per story (when includeComments is true).

## `scrapeType` (type: `string`):

What to scrape: stories only, user profiles of story authors, or both.

## Actor input object example

```json
{
  "category": "top",
  "maxItems": 100,
  "includeComments": false,
  "maxCommentsPerStory": 50,
  "scrapeType": "stories"
}
```
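
To catch mistakes before starting a run, the input can be checked against the documented constraints on the client side. The sketch below is one way to do that. `build_run_input` is a hypothetical helper, and the 1 to 500 limits are taken from the OpenAPI specification on this page.

```python
ALLOWED_CATEGORIES = ("top", "new", "best", "ask", "show", "jobs", "search")
ALLOWED_SCRAPE_TYPES = ("stories", "users", "both")


def build_run_input(category="top", search_query="", max_items=100,
                    include_comments=False, max_comments_per_story=50,
                    scrape_type="stories"):
    """Validate values against the documented limits and return the input dict."""
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"category must be one of {ALLOWED_CATEGORIES}")
    if category == "search" and not search_query.strip():
        raise ValueError("searchQuery is required when category is 'search'")
    if not 1 <= max_items <= 500:
        raise ValueError("maxItems must be between 1 and 500")
    if not 1 <= max_comments_per_story <= 500:
        raise ValueError("maxCommentsPerStory must be between 1 and 500")
    if scrape_type not in ALLOWED_SCRAPE_TYPES:
        raise ValueError(f"scrapeType must be one of {ALLOWED_SCRAPE_TYPES}")

    run_input = {
        "category": category,
        "maxItems": max_items,
        "includeComments": include_comments,
        "maxCommentsPerStory": max_comments_per_story,
        "scrapeType": scrape_type,
    }
    # searchQuery is only meaningful for the 'search' category.
    if category == "search":
        run_input["searchQuery"] = search_query
    return run_input


print(build_run_input(category="search", search_query="AI startup launch"))
```

The returned dictionary can be passed as `run_input` to the Python client call shown in the API section.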

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("cryptosignals/hackernews-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("cryptosignals/hackernews-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call cryptosignals/hackernews-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=cryptosignals/hackernews-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Scraper — Stories, Comments & Jobs",
        "description": "Scrape Hacker News stories, comments, and user profiles — extract title, URL, score, author, comment threads, and submission time. CSV/JSON output.",
        "version": "1.2",
        "x-build-id": "cbG6sqbP03MN2cKBe"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/cryptosignals~hackernews-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-cryptosignals-hackernews-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/cryptosignals~hackernews-scraper/runs": {
            "post": {
                "operationId": "runs-sync-cryptosignals-hackernews-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/cryptosignals~hackernews-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-cryptosignals-hackernews-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "category": {
                        "title": "Story Category",
                        "enum": [
                            "top",
                            "new",
                            "best",
                            "ask",
                            "show",
                            "jobs",
                            "search"
                        ],
                        "type": "string",
                        "description": "Which HN category to scrape. Use 'search' to search by keyword.",
                        "default": "top"
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search query (only used when category is 'search'). Searches HN posts via Algolia."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of stories/users to return.",
                        "default": 100
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Fetch the full comment tree for each story. Warning: significantly increases run time.",
                        "default": false
                    },
                    "maxCommentsPerStory": {
                        "title": "Max Comments Per Story",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of comments to fetch per story (when includeComments is true).",
                        "default": 50
                    },
                    "scrapeType": {
                        "title": "Scrape Type",
                        "enum": [
                            "stories",
                            "users",
                            "both"
                        ],
                        "type": "string",
                        "description": "What to scrape: stories only, user profiles of story authors, or both.",
                        "default": "stories"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
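
For projects without an Apify client, the `run-sync-get-dataset-items` path from the specification above can be called with the standard library. The sketch below only constructs the request, passing the token as the `token` query parameter as the specification describes; `build_sync_request` is a hypothetical helper, and sending the request is left to the caller.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "acts/cryptosignals~hackernews-scraper"


def build_sync_request(token: str, run_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {"category": "top", "maxItems": 50})
print(request.get_method(), request.full_url.split("?")[0])

# To send it (requires a valid token and network access):
# from urllib.request import urlopen
# with urlopen(request) as response:
#     items = json.load(response)
```

Because the call blocks until the run finishes, it suits short runs; for long runs with comments enabled, the asynchronous `/runs` path may be a better fit.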
