# Wiki Grabber (`shahabuddin38/wiki-grabber`) Actor

Find Wikipedia pages with citation-needed tags, dead links, broken link signals, and cleanup issues using keyword search. Great for SEO, link building, outreach, and research workflows.

- **URL**: https://apify.com/shahabuddin38/wiki-grabber.md
- **Developed by:** [Shahab Uddin](https://apify.com/shahabuddin38) (community)
- **Categories:** AI, Automation, Developer tools
- **Stats:** 3 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.35 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
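As a rough illustration of the result-based part of the price, the sketch below multiplies a result count by the listed starting rate. The helper name is hypothetical, and the figure leaves out Apify platform usage and any other per-event charges, so treat it as a lower bound rather than a quote.

```python
from decimal import Decimal

# Starting rate quoted above: $0.35 per 1,000 results.
RATE_PER_1000_RESULTS = Decimal("0.35")


def estimate_result_cost(result_count: int) -> Decimal:
    """Lower-bound cost in USD for result events only (hypothetical helper)."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return RATE_PER_1000_RESULTS * result_count / 1000


# Example: estimate the result charge for 2,500 results.
estimate = estimate_result_cost(2500)
```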

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## WikiGrabber

WikiGrabber is an Apify Actor and lightweight web app for finding Wikipedia pages with citation-needed tags, dead-link templates, broken-link signals, and other source-cleanup hints.

### What it does

- Searches English Wikipedia by keyword
- Parses page wikitext and rendered HTML
- Detects citation-needed, dead-link, and cleanup-style signals
- Extracts exact citation and dead-link locations from article sections
- Adds direct article, section, and section-edit links for faster action
- Scores results so higher-opportunity pages rise to the top
- Stores filtered results in an Apify dataset
- Lets you browse results in a browser
- Exports saved results as CSV

### Endpoints

- `GET /` serves the browser UI
- `GET /api/health` returns a simple health check
- `GET /api/search?keyword=SEO&limit=30&page=1` runs a keyword search and creates a request-safe dataset
- `GET /api/dataset?dataset=<datasetName-from-search>&page=2&limit=20` pages through saved dataset results
- `GET /api/export.csv?dataset=<datasetName-from-search>` exports a dataset as CSV
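The query strings for these endpoints can be assembled programmatically. The sketch below is a minimal illustration using only the Python standard library; the helper names are hypothetical, and the base URL assumes the local development address (`http://localhost:4321`), so replace it with your Standby URL when running on Apify.

```python
from urllib.parse import urlencode

# Assumption: the app is reachable at the local development address.
BASE_URL = "http://localhost:4321"


def search_url(keyword: str, limit: int = 30, page: int = 1, base_url: str = BASE_URL) -> str:
    """URL for GET /api/search (hypothetical helper)."""
    query = urlencode({"keyword": keyword, "limit": limit, "page": page})
    return f"{base_url}/api/search?{query}"


def dataset_url(dataset: str, page: int = 1, limit: int = 20, base_url: str = BASE_URL) -> str:
    """URL for GET /api/dataset; `dataset` is the name returned by a search."""
    query = urlencode({"dataset": dataset, "page": page, "limit": limit})
    return f"{base_url}/api/dataset?{query}"


def export_csv_url(dataset: str, base_url: str = BASE_URL) -> str:
    """URL for GET /api/export.csv for a saved dataset."""
    return f"{base_url}/api/export.csv?{urlencode({'dataset': dataset})}"
```

`urlencode` takes care of escaping, so a keyword such as `seo tool` is sent as `seo+tool`.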

### Advanced result workflow

- Filter result pages by `Show all`, `Missing Citations`, or `Dead Links`
- See exact issue rows with section title, line reference, and excerpt
- Open the exact Wikipedia section directly from the result card
- Jump straight into `action=edit&section=<n>` links to add a citation or replace a dead link
- Review mixed pages that contain both citation and dead-link opportunities
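For reference, a section-edit link of the `action=edit&section=<n>` kind can be built as shown below. This is a sketch of the standard MediaWiki URL form for English Wikipedia, not the Actor's own code; the function name is hypothetical, and how the Actor populates `actionLinks` is not documented here.

```python
from urllib.parse import urlencode

# Standard MediaWiki entry point for English Wikipedia.
WIKI_INDEX = "https://en.wikipedia.org/w/index.php"


def section_edit_url(title: str, section: int) -> str:
    """Build a link that opens the editor for one section of an article."""
    query = urlencode({
        "title": title.replace(" ", "_"),  # Wikipedia titles use underscores for spaces
        "action": "edit",
        "section": section,
    })
    return f"{WIKI_INDEX}?{query}"
```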

### Local development

```bash
npm install
npm start
````

By default the app starts on `http://localhost:4321`.

For a local one-off QA run that follows the same standard-run code path as Apify's automated test, put an `INPUT.json` file under your chosen `CRAWLEE_STORAGE_DIR`, then start the Actor with `WIKI_GRABBER_FORCE_STANDARD_MODE=1`.
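One way to prepare such a run is sketched below. It assumes Crawlee's default local storage layout (`key_value_stores/default/INPUT.json` inside the storage directory), which this README does not spell out, so verify the path against your setup. The helper name and the temporary directory are illustrative only.

```python
import json
import os
import tempfile
from pathlib import Path


def write_local_input(storage_dir: str, actor_input: dict) -> Path:
    """Write INPUT.json where a local run is assumed to look for it."""
    # Assumption: Crawlee's default layout under the storage directory.
    target = Path(storage_dir) / "key_value_stores" / "default" / "INPUT.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(actor_input, indent=2), encoding="utf-8")
    return target


storage_dir = tempfile.mkdtemp(prefix="wiki-grabber-")
input_path = write_local_input(storage_dir, {"keyword": "seo tool", "limit": 10})

# Environment to pass to `npm start` (for example via subprocess.run(..., env=env)).
env = {
    **os.environ,
    "CRAWLEE_STORAGE_DIR": storage_dir,
    "WIKI_GRABBER_FORCE_STANDARD_MODE": "1",
}
```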

### Deploy on Apify

```bash
npx apify login
npx apify push
```

### Important note about Apify run modes

This project supports both Apify run modes, but they behave differently:

- Standard Actor run
  The Actor does not keep the HTTP server alive on Apify. Instead, it treats the run as a one-off batch job. If you provide input like `{"keyword":"seo tool","limit":10}`, it will build the dataset, save output, and finish with `SUCCEEDED`. If a standard run starts without a keyword, the Actor falls back to the built-in QA keyword `seo tool` so automated tests and manual one-off runs still produce a non-empty default dataset.
- Standby mode
  The Actor behaves like a web server behind a stable URL, and Apify keeps standby runs available according to the standby configuration.

If you want a persistent app-like experience, use Standby mode instead of manually starting a normal Actor run from the Console.

The input schema now uses both `prefill` and `default` on the search keyword for maximum compatibility with Apify's QA flow, while operational settings such as `limit` keep a real `default` value for API, task, and scheduler runs.

### Apify QA checklist

- In Apify Console, use `Source > Input > Restore example input` and confirm it fills `keyword: "seo tool"` with `limit: 10`
- Start the Actor from that restored example input and verify the run finishes within Apify's 5-minute automated-test window
- Confirm the default dataset is non-empty and that fallback rows, when emitted, are clearly marked with `resultType: "fallback"`
- If Wikipedia is temporarily unavailable during the test window, expect a successful run with a diagnostic fallback row instead of an empty default dataset

### Standby behavior

- Repeated identical searches can be served from an in-memory cache while a Standby run stays warm
- Concurrent identical requests share the same in-flight search work instead of duplicating Wikipedia fetches
- Each generated dataset name is request-safe, so one user search does not drop or overwrite another user's dataset
- Add `refresh=true` to `/api/search` if you want to bypass the cache and force a new dataset build
- Wikipedia API calls automatically retry on transient timeout and `429`/`5xx` responses, and large revision batches fall back to smaller groups when needed
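The caching and in-flight sharing described above can be pictured with the following sketch. It is a simplified illustration in Python, not the Actor's implementation: the class name, the cache key (lower-cased keyword plus limit), and the `search_fn` callback are all assumptions, and retries and cache expiry are left out.

```python
import asyncio


class SearchCache:
    """In-memory cache that also shares in-flight work between identical requests."""

    def __init__(self, search_fn):
        self._search_fn = search_fn  # async callable: (keyword, limit) -> result
        self._done = {}              # finished results by key
        self._in_flight = {}         # running tasks by key

    async def get(self, keyword: str, limit: int, refresh: bool = False):
        key = (keyword.strip().lower(), limit)  # assumed cache key
        if not refresh and key in self._done:
            return self._done[key]              # warm cache hit
        if key in self._in_flight:
            return await self._in_flight[key]   # join the search already running
        task = asyncio.create_task(self._search_fn(keyword, limit))
        self._in_flight[key] = task
        try:
            result = await task
            self._done[key] = result
            return result
        finally:
            self._in_flight.pop(key, None)
```

With this shape, two concurrent calls for the same keyword and limit trigger one underlying search, a later call is answered from memory, and `refresh=True` forces a new search unless one is already running for that key.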

### Example use cases

- Wikipedia citation research
- Dead-link replacement prospecting
- Link-building opportunity discovery
- SEO outreach research
- Topic-based cleanup analysis
- CSV export for campaign workflows

### Output fields

Each result can include:

- `resultType`
- `keyword`
- `title`
- `note`
- `pageid`
- `url`
- `snippet`
- `wordcount`
- `timestamp`
- `citationNeededTemplates`
- `deadLinkTemplates`
- `brokenLinkSignals`
- `cleanupTemplates`
- `bareUrlCount`
- `refCount`
- `score`
- `issueCounts`
- `locations[]`
- `actionLinks`
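Once items are downloaded, they can be post-processed on the client side. The sketch below drops diagnostic fallback rows and orders the rest by `score`; it assumes `score` is numeric and that fallback rows carry `resultType: "fallback"` as described in the QA checklist. The function name is hypothetical.

```python
def rank_opportunities(items: list[dict]) -> list[dict]:
    """Drop fallback rows and sort the remaining items by score, highest first."""
    real_rows = [item for item in items if item.get("resultType") != "fallback"]
    # Assumption: `score` is a number; rows without one sort last.
    return sorted(real_rows, key=lambda item: item.get("score", 0), reverse=True)
```

For example, passing the list returned by `client.dataset(...).iterate_items()` (wrapped in `list(...)`) yields the highest-scoring pages first.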

# Actor input Schema

## `keyword` (type: `string`):

Keyword or short phrase to search on English Wikipedia. This field keeps both a prefilled example for Console testing and a real default for automated runs, tasks, and API calls.

## `query` (type: `string`):

Optional backward-compatible alias for keyword. Prefer using Keyword for new runs.

## `search` (type: `string`):

Optional backward-compatible alias for keyword. Prefer using Keyword for new runs.

## `limit` (type: `integer`):

Maximum number of Wikipedia search matches to inspect before ranking and filtering pages with citation or dead-link opportunities.

## `refresh` (type: `boolean`):

When enabled, forces a fresh search instead of reusing a warm in-memory Standby cache entry for the same keyword and limit.

## Actor input object example

```json
{
  "keyword": "seo tool",
  "query": "seo tool",
  "search": "seo tool",
  "limit": 10,
  "refresh": false
}
```
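If you assemble this input in your own code, a small normalization step can keep it within the documented bounds. The sketch below is an assumption-laden illustration: the alias precedence (`keyword`, then `query`, then `search`) and the clamping of `limit` are choices made here, not documented Actor behavior. The bounds (1 to 50), the default limit (10), and the `seo tool` fallback keyword come from the schema and README.

```python
DEFAULT_KEYWORD = "seo tool"  # built-in QA keyword mentioned in the README
DEFAULT_LIMIT = 10
MIN_LIMIT, MAX_LIMIT = 1, 50  # bounds from the input schema


def normalize_input(raw: dict) -> dict:
    """Resolve keyword aliases and clamp the limit (hypothetical helper)."""
    keyword = ""
    # Assumed precedence: the preferred field first, then the legacy aliases.
    for field in ("keyword", "query", "search"):
        value = raw.get(field)
        if isinstance(value, str) and value.strip():
            keyword = value.strip()
            break

    limit = raw.get("limit", DEFAULT_LIMIT)
    if isinstance(limit, bool) or not isinstance(limit, int):
        limit = DEFAULT_LIMIT
    limit = max(MIN_LIMIT, min(MAX_LIMIT, limit))

    return {
        "keyword": keyword or DEFAULT_KEYWORD,
        "limit": limit,
        "refresh": bool(raw.get("refresh", False)),
    }
```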

# Actor output Schema

## `runSummary` (type: `string`):

JSON summary stored under the OUTPUT record in the default key-value store. It includes the mode, keyword, dataset name, counts, cache details, QA keyword fallback messages, and diagnostic fallback metadata when needed.

## `lastResultsMetadata` (type: `string`):

Compact JSON metadata stored under the LAST_RESULTS record in the default key-value store after the latest standard run, including diagnostic fallback datasets when necessary.

## `resultsOverview` (type: `string`):

Filtered Wikipedia pages stored in the default dataset for standard runs, shown with the Overview dataset view selected.

## `issueLocations` (type: `string`):

Default dataset with the issue-locations view selected, expanding exact citation-needed and dead-link matches when available.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "keyword": "seo tool",
    "limit": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("shahabuddin38/wiki-grabber").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "keyword": "seo tool",
    "limit": 10,
}

# Run the Actor and wait for it to finish
run = client.actor("shahabuddin38/wiki-grabber").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "keyword": "seo tool",
  "limit": 10
}' |
apify call shahabuddin38/wiki-grabber --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=shahabuddin38/wiki-grabber",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Wiki Grabber",
        "description": "Find Wikipedia pages with citation-needed tags, dead links, broken link signals, and cleanup issues using keyword search. Great for SEO, link building, outreach, and research workflows.",
        "version": "0.11",
        "x-build-id": "HZibten0EJ15XqyKs"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/shahabuddin38~wiki-grabber/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-shahabuddin38-wiki-grabber",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/shahabuddin38~wiki-grabber/runs": {
            "post": {
                "operationId": "runs-sync-shahabuddin38-wiki-grabber",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/shahabuddin38~wiki-grabber/run-sync": {
            "post": {
                "operationId": "run-sync-shahabuddin38-wiki-grabber",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "keyword": {
                        "title": "Keyword",
                        "type": "string",
                        "description": "Keyword or short phrase to search on English Wikipedia. This field keeps both a prefilled example for Console testing and a real default for automated runs, tasks, and API calls.",
                        "default": "seo tool"
                    },
                    "query": {
                        "title": "Legacy query alias",
                        "type": "string",
                        "description": "Optional backward-compatible alias for keyword. Prefer using Keyword for new runs."
                    },
                    "search": {
                        "title": "Legacy search alias",
                        "type": "string",
                        "description": "Optional backward-compatible alias for keyword. Prefer using Keyword for new runs."
                    },
                    "limit": {
                        "title": "Search limit",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Maximum number of Wikipedia search matches to inspect before ranking and filtering pages with citation or dead-link opportunities.",
                        "default": 10
                    },
                    "refresh": {
                        "title": "Bypass cache",
                        "type": "boolean",
                        "description": "When enabled, forces a fresh search instead of reusing a warm in-memory Standby cache entry for the same keyword and limit.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
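For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be turned into a request as sketched below. This uses only the Python standard library; the helper name is hypothetical, and the request is built but not sent, so add your own error handling and timeouts before using it.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/shahabuddin38~wiki-grabber/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the specification."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_run_request("<YOUR_API_TOKEN>", {"keyword": "seo tool", "limit": 10})
# To send it:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```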
