# 🕰️ Wayback Machine Bulk Checker (`taroyamada/wayback-machine-checker`) Actor

Check bulk lists of URLs against the Internet Archive database to instantly verify cache availability. Automate historical web page discovery for large sites.

- **URL**: https://apify.com/taroyamada/wayback-machine-checker.md
- **Developed by:** [naoki anzai](https://apify.com/taroyamada) (community)
- **Categories:** SEO tools, Automation, Developer tools
- **Stats:** 7 total users, 2 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 📚 Wayback Machine Checker

Instantly check bulk lists of URLs against the Internet Archive to discover cached web pages and extract historical site data. Designed for technical SEO professionals and site administrators, this Wayback Machine checker automates the discovery of archived content at scale. Instead of loading the archive interface for every broken link, simply input your URL list and let the scraper query the underlying API to find exact snapshot matches.

Large-scale website migrations frequently result in orphaned pages and 404 errors. This tool is built to cross-reference your dead URLs with historical web cache data, enabling rapid recovery of lost link equity. Web developers and SEO teams use this scraper to map legacy site structures, audit domain history before acquisitions, and rescue deleted content that Google search previously indexed. 

The scraper efficiently processes bulk requests, returning highly structured results for your technical audits. For every requested URL, you will extract the exact timestamp of the most recent snapshot, the direct Wayback Machine URL for the cached HTML, and the HTTP status code of the original capture. Schedule this checker to run alongside your regular broken link crawlers to instantly identify which dead pages can be fully restored from the web archive.



> 📄 **Live sample output**: see [`docs/sample-output.json`](docs/sample-output.json) for a representative dataset captured from a real run of this Actor. Use it to validate the schema before subscribing.

### Store Quickstart

Start with the **Quickstart** template to verify 3 archived URLs. For bulk verification, use **Portfolio Archive Check** with up to 500 URLs. For content recovery, use **404 Recovery** after running Broken Link Checker.

### Key Features

- 📚 **Official Internet Archive API** — Uses archive.org/wayback/available endpoint
- 📅 **Closest-snapshot lookup** — Find archived version nearest to any date
- 🔍 **Availability check** — Know if a URL was ever archived
- 📊 **Snapshot count** — Total archived versions per URL
- ⚡ **Bulk processing** — Up to 500 URLs per run
- 🔑 **No API key needed** — Free, open Internet Archive service
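
Lists longer than the 500-URL per-run limit need to be split across several runs. The following is a minimal sketch of one way to do that on the caller's side; `batch_urls` and `MAX_URLS_PER_RUN` are hypothetical names used for illustration and are not part of the Actor.

```python
from typing import Iterator, List

MAX_URLS_PER_RUN = 500  # per-run limit stated in this README


def batch_urls(urls: List[str], size: int = MAX_URLS_PER_RUN) -> Iterator[List[str]]:
    """Yield de-duplicated, non-empty URLs in their original order, `size` at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys keeps the first occurrence of each URL and preserves order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


batches = list(batch_urls([f"https://example.com/page-{i}" for i in range(1200)]))
print([len(b) for b in batches])  # [500, 500, 200]
```

Each batch can then be passed as the `urls` field of a separate run.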

### Use Cases

| Who | Why |
|-----|-----|
| **Compliance teams** | Legal evidence preservation for regulated industries |
| **Journalists** | Verify historical versions of web pages that may have been edited |
| **SEO recovery** | Restore content from accidentally deleted pages |
| **Brand protection** | Track archived versions of competitor sites over time |
| **Academic research** | Cite archived web sources in publications |

### Input

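| Field | Type | Default | Description |
|-------|------|---------|-------------|
| urls | string[] | (required) | URLs to check in the Wayback Machine (max 500) |
| closest | string |  | Target date in YYYYMMDD format (optional); leave empty for the latest snapshot |
| checkAvailability | boolean | true | Return availability details |
| concurrency | integer | 3 | Concurrency (1 to 5); keep low to respect Internet Archive rate limits |
| maxChargeUsd | number | 1 | Safety cap for the run in USD (0 to 100) |
| delivery | string | dataset | Where to send results: `dataset` or `webhook` |
| webhookUrl | string |  | Webhook URL to POST results to when `delivery` is `webhook` |
| dryRun | boolean | false | Run without saving results (for testing) |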

#### Input Example

```json
{
  "urls": ["https://example.com/old-article", "https://deleted-site.com"],
  "closest": "20200101",
  "checkAvailability": true
}
````
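
The input schema further down describes `closest` as a YYYYMMDD string, so it can help to build the input from a `datetime.date` rather than typing the string by hand. This is a small sketch under that assumption; `build_input` is a hypothetical helper, not something the Actor or the Apify client provides.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def build_input(urls: List[str], closest: Optional[date] = None) -> Dict[str, Any]:
    """Build a run input with `urls` and, optionally, `closest` as YYYYMMDD."""
    if not urls:
        raise ValueError("urls is required")
    if len(urls) > 500:
        raise ValueError("at most 500 URLs per run")
    run_input: Dict[str, Any] = {"urls": list(urls)}
    if closest is not None:
        # The schema documents the target date as eight digits with no separators.
        run_input["closest"] = closest.strftime("%Y%m%d")
    return run_input


print(build_input(["https://example.com/old-article"], date(2020, 1, 1)))
```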

### Input Examples

#### Example: Single URL availability

```json
{
  "urls": [
    "https://example.com/old-page"
  ]
}
```

#### Example: Bulk archived snapshot history

```json
{
  "urls": [
    "https://example.com/",
    "https://example.com/blog"
  ],
  "includeCdx": true,
  "maxSnapshotsPerUrl": 50
}
```

#### Example: Domain change-detection digest

```json
{
  "urls": [
    "https://example.com/policy"
  ],
  "compareToLatestLive": true
}
```

### Output

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | URL queried |
| `archived` | boolean | Whether the URL has any snapshots in the Wayback Machine |
| `closestSnapshotUrl` | string | URL of the closest snapshot to the requested date |
| `closestSnapshotDate` | string | Date of the closest snapshot (YYYYMMDDhhmmss) |
| `totalSnapshots` | integer | Approximate total snapshots ever taken |
| `firstSnapshotDate` | string | Date of the earliest known snapshot |
| `lastSnapshotDate` | string | Date of the most recent snapshot |

#### Output Example

```json
{
  "url": "https://example.com/old-article",
  "available": true,
  "closestSnapshot": {
    "url": "https://web.archive.org/web/20200115000000/https://example.com/old-article",
    "timestamp": "20200115000000"
  },
  "archivedVersions": 23
}
```
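
Snapshot dates are 14-digit `YYYYMMDDhhmmss` strings, and the snapshot URL in the example embeds the same timestamp. The sketch below shows one way to turn such a string into a `datetime` and to rebuild a snapshot URL from its parts. Treating the timestamp as UTC is an assumption here, and both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_wayback_timestamp(value: str) -> datetime:
    """Parse a 14-digit YYYYMMDDhhmmss string; the UTC time zone is an assumption."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def snapshot_url(original_url: str, timestamp: str) -> str:
    """Rebuild a snapshot URL following the pattern shown in the output example."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


captured = parse_wayback_timestamp("20200115000000")
print(captured.isoformat())
print(snapshot_url("https://example.com/old-article", "20200115000000"))
```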

### API Usage

Run this Actor programmatically using the Apify API. Replace `YOUR_API_TOKEN` with your token from [Apify Console → Settings → Integrations](https://console.apify.com/account/integrations).

#### cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~wayback-machine-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "urls": ["https://example.com/old-article", "https://deleted-site.com"], "closest": "20200101", "checkAvailability": true }'
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/wayback-machine-checker").call(run_input={
  "urls": ["https://example.com/old-article", "https://deleted-site.com"],
  "closest": "20200101",
  "checkAvailability": True
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```

#### JavaScript / Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/wayback-machine-checker').call({
  "urls": ["https://example.com/old-article", "https://deleted-site.com"],
  "closest": "20200101",
  "checkAvailability": true
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

### Tips & Limitations

- Use `closest: "20200101"` (YYYYMMDD format) to find a specific historical snapshot.
- Great for verifying when a page was first published or last modified.
- Combine with Broken Link Checker to recover content from dead pages via archive links.
- Wayback Machine is free but rate-limits aggressive callers — keep concurrency low.

### FAQ

**How far back can I check?**

Internet Archive has snapshots back to 1996. Coverage depends on whether a URL was crawled.

**Why is my URL 'not available'?**

Either it was never archived, or Internet Archive excluded it (due to robots.txt or removal request).

**Is this the same as running `curl` to archive.org?**

Yes, but with bulk processing, error handling, and structured output for datasets.
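
For comparison, a single direct lookup against the Internet Archive availability endpoint could look roughly like the sketch below. It only builds the request URL and parses a response body, so it runs without network access. `SAMPLE_RESPONSE` is a hand-written illustration of the documented response shape, not captured output, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def availability_url(url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL for the archive.org/wayback/available endpoint."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # digits only, e.g. YYYYMMDD
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the closest snapshot entry, or None when nothing is available."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


# Illustrative response body (assumed shape, not real output).
SAMPLE_RESPONSE = """
{"url": "https://example.com/old-article",
 "archived_snapshots": {"closest": {
   "status": "200", "available": true,
   "url": "http://web.archive.org/web/20200115000000/https://example.com/old-article",
   "timestamp": "20200115000000"}}}
"""

print(availability_url("https://example.com/old-article", "20200101"))
print(closest_snapshot(json.loads(SAMPLE_RESPONSE)))
```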

**Can I archive new URLs?**

This Actor only reads from the archive. To save new pages, use archive.org's `/save/` endpoint.

**Why is `archived` false for my URL?**

The Internet Archive may not have crawled that URL yet, or robots.txt blocked it at the time.

**Can I trigger a new snapshot?**

Not via this Actor. Use the Wayback Machine 'Save Page Now' feature manually.

### Complete Your Website Health Audit

**Website Health Suite** — Build a comprehensive compliance and trust monitoring workflow:

**1. Link & URL Health**

- [🔗 Broken Link Checker](https://apify.com/taroyamada/broken-link-checker) — Find broken links across your entire site structure
- [🔗 Bulk URL Health Checker](https://apify.com/taroyamada/bulk-url-health-checker) — Validate HTTP status, redirects, SSL, and response times

**2. SEO & Metadata Quality**

- [🏷️ Meta Tag Analyzer](https://apify.com/taroyamada/meta-tag-analyzer) — Audit title tags, Open Graph, Twitter Cards, and hreflang
- [Schema.org Validator](https://apify.com/taroyamada/structured-data-validator) — Validate JSON-LD and Microdata with quality scoring

**3. Security & Email Deliverability**

- [DNS/DMARC Security Checker](https://apify.com/taroyamada/dns-dmarc-security-checker) — Audit SPF, DKIM, DMARC, and MX records

**4. Historical Data & Recovery** (you are here)

- [📚 Wayback Machine Checker](https://apify.com/taroyamada/wayback-machine-checker) — Find archived snapshots for content recovery

**Recommended workflow**: Run Broken Link Checker → Export 404 URLs → Use Wayback Machine Checker to find archived versions → Restore content → Validate with URL Health Checker.
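
The "find archived versions" step of this workflow can be sketched as a small post-processing function over the dataset items. This README shows two slightly different item shapes (the Output table and the Output Example), so the hypothetical `recovery_map` helper below accepts either; check a real run's output before relying on specific field names.

```python
from typing import Any, Dict, Iterable


def recovery_map(items: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Map each dead URL to its archived snapshot URL, skipping unarchived ones."""
    result: Dict[str, str] = {}
    for item in items:
        # Accept either `archived` (Output table) or `available` (Output Example).
        archived = item.get("archived", item.get("available", False))
        # Accept either a flat `closestSnapshotUrl` or a nested `closestSnapshot.url`.
        snapshot = item.get("closestSnapshotUrl") or (item.get("closestSnapshot") or {}).get("url")
        if archived and snapshot:
            result[item["url"]] = snapshot
    return result


items = [
    {"url": "https://example.com/a", "archived": True,
     "closestSnapshotUrl": "https://web.archive.org/web/20200115000000/https://example.com/a"},
    {"url": "https://example.com/b", "archived": False},
]
print(recovery_map(items))
```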

**Other Website Tools:**

- [Sitemap Analyzer](https://apify.com/taroyamada/sitemap-analyzer) — SEO sitemap audit
- [Site Governance Monitor](https://apify.com/taroyamada/site-governance-monitor) — Robots.txt and schema monitoring
- [Domain Trust Monitor](https://apify.com/taroyamada/domain-trust-monitor) — SSL expiry and security headers

### Cost

**Pay Per Event**:

- `actor-start`: $0.01 (flat fee per run)
- `dataset-item`: $0.003 per output item

**Example**: 1,000 items = $0.01 + (1,000 × $0.003) = **$3.01**

No subscription required — you only pay for what you use.
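
The same arithmetic can be scripted to budget a run ahead of time. The sketch below uses `Decimal` to avoid floating-point rounding. `items_within_cap` is a rough estimate that assumes a spending cap covers both the start fee and the per-item fee, which this README does not spell out; both function names are hypothetical.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.01")  # flat fee per run
PER_ITEM = Decimal("0.003")    # fee per output item


def estimate_cost(items: int) -> Decimal:
    """Cost in USD of a single run that produces `items` output items."""
    return ACTOR_START + PER_ITEM * items


def items_within_cap(cap_usd: Decimal) -> int:
    """Largest item count whose estimated cost stays at or below `cap_usd`."""
    if cap_usd < ACTOR_START:
        return 0
    return int((cap_usd - ACTOR_START) // PER_ITEM)


print(estimate_cost(1000))               # 3.010
print(items_within_cap(Decimal("1")))    # 330
```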

> 💾 **Save it for later**: click the bookmark icon at the top of the Apify Store page if you'd like to come back to it. Bookmarks help other engineers find this Actor via Apify's discovery surfaces.

### ⭐ Was Wayback Machine Bulk Checker useful for your URL archive history?

If this Actor saved you time, **[please leave a 5★ rating on Apify Store](https://apify.com/taroyamada/wayback-machine-checker/reviews)**. It takes 10 seconds, helps other engineers and analysts discover it, and keeps updates free.

Have a feature request, bug, or sample workflow you'd like to share? **[Open an issue](https://apify.com/taroyamada/wayback-machine-checker/issues)** — we read every one and use them to prioritise the next release.

# Actor input Schema

## `urls` (type: `array`):

URLs to check in the Wayback Machine (max 500).

## `closest` (type: `string`):

Find snapshot closest to this date (YYYYMMDD format). Leave empty for latest.

## `concurrency` (type: `integer`):

Keep low to respect Internet Archive rate limits.

## `maxChargeUsd` (type: `number`):

Safety cap for this run. Results beyond the cap are kept in output as no-charge limit\_reached rows.

## `delivery` (type: `string`):

Where to send results: dataset or webhook

## `webhookUrl` (type: `string`):

Webhook URL to POST results to (if delivery=webhook)

## `dryRun` (type: `boolean`):

Run without saving results (for testing)

## Actor input object example

```json
{
  "urls": [
    "https://example.com",
    "https://google.com"
  ],
  "concurrency": 3,
  "maxChargeUsd": 1,
  "delivery": "dataset",
  "dryRun": false
}
```
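
Before starting a run, an input object can be checked locally against the limits listed in the OpenAPI specification later in this document (`concurrency` 1 to 5, `maxChargeUsd` 0 to 100, `delivery` one of `dataset` or `webhook`). The following is a minimal sketch; `validate_input` is a hypothetical helper, and requiring `webhookUrl` when `delivery` is `webhook` is an assumption rather than something the schema enforces.

```python
import re
from typing import Any, Dict, List


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the input looks valid."""
    errors: List[str] = []
    urls = run_input.get("urls")
    if not isinstance(urls, list) or not urls:
        errors.append("urls: required, non-empty array of strings")
    elif len(urls) > 500:
        errors.append("urls: at most 500 per run")
    closest = run_input.get("closest")
    if closest is not None and not re.fullmatch(r"\d{8}", str(closest)):
        errors.append("closest: expected YYYYMMDD")
    concurrency = run_input.get("concurrency", 3)
    if isinstance(concurrency, bool) or not isinstance(concurrency, int) or not 1 <= concurrency <= 5:
        errors.append("concurrency: integer between 1 and 5")
    max_charge = run_input.get("maxChargeUsd", 1)
    if isinstance(max_charge, bool) or not isinstance(max_charge, (int, float)) or not 0 <= max_charge <= 100:
        errors.append("maxChargeUsd: number between 0 and 100")
    delivery = run_input.get("delivery", "dataset")
    if delivery not in ("dataset", "webhook"):
        errors.append("delivery: 'dataset' or 'webhook'")
    elif delivery == "webhook" and not run_input.get("webhookUrl"):
        # Assumption: a webhook delivery is only useful with a target URL.
        errors.append("webhookUrl: expected when delivery is 'webhook'")
    return errors


print(validate_input({"urls": ["https://example.com"], "concurrency": 3, "delivery": "dataset"}))
print(validate_input({"urls": [], "concurrency": 9}))
```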

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://example.com",
        "https://google.com"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("taroyamada/wayback-machine-checker").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": [
        "https://example.com",
        "https://google.com",
    ] }

# Run the Actor and wait for it to finish
run = client.actor("taroyamada/wayback-machine-checker").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://example.com",
    "https://google.com"
  ]
}' |
apify call taroyamada/wayback-machine-checker --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=taroyamada/wayback-machine-checker",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "🕰️ Wayback Machine Bulk Checker",
        "description": "Check bulk lists of URLs against the Internet Archive database to instantly verify cache availability. Automate historical web page discovery for large sites.",
        "version": "0.1",
        "x-build-id": "4dflAGd2OuOS9rUs3"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/taroyamada~wayback-machine-checker/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-taroyamada-wayback-machine-checker",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/taroyamada~wayback-machine-checker/runs": {
            "post": {
                "operationId": "runs-sync-taroyamada-wayback-machine-checker",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/taroyamada~wayback-machine-checker/run-sync": {
            "post": {
                "operationId": "run-sync-taroyamada-wayback-machine-checker",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "URLs to Check",
                        "type": "array",
                        "description": "URLs to check in the Wayback Machine (max 500).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "closest": {
                        "title": "Closest to Date",
                        "type": "string",
                        "description": "Find snapshot closest to this date (YYYYMMDD format). Leave empty for latest."
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 5,
                        "type": "integer",
                        "description": "Keep low to respect Internet Archive rate limits.",
                        "default": 3
                    },
                    "maxChargeUsd": {
                        "title": "Maximum charge (USD)",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "number",
                        "description": "Safety cap for this run. Results beyond the cap are kept in output as no-charge limit_reached rows.",
                        "default": 1
                    },
                    "delivery": {
                        "title": "Delivery Mode",
                        "enum": [
                            "dataset",
                            "webhook"
                        ],
                        "type": "string",
                        "description": "Where to send results: dataset or webhook",
                        "default": "dataset"
                    },
                    "webhookUrl": {
                        "title": "Webhook URL",
                        "type": "string",
                        "description": "Webhook URL to POST results to (if delivery=webhook)"
                    },
                    "dryRun": {
                        "title": "Dry Run",
                        "type": "boolean",
                        "description": "Run without saving results (for testing)",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
