# cambridge dictionary scraper (`dev_bodex/cambridge-dictionary-scraper`) Actor

This Cambridge Dictionary Scraper Apify Actor automates extracting word definitions, synonyms, examples, and translations from the Cambridge Dictionary. Built with Node.js and Puppeteer, it returns structured data, ideal for language processing, research, and educational use.

- **URL**: https://apify.com/dev\_bodex/cambridge-dictionary-scraper.md
- **Developed by:** [Eniola Bode](https://apify.com/dev_bodex) (community)
- **Categories:** Automation, SEO tools
- **Stats:** 4 total users, 2 monthly users, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

$10.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month. You also pay for Apify platform usage, which gets cheaper on higher Apify subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Cambridge Dictionary Scraper
This Apify Actor scrapes word definitions, phonetics, pronunciations, example sentences, and other linguistic data from the Cambridge Dictionary. It automates extracting dictionary entries for the words you specify, making it easy to collect language data for learning, research, or integration into apps.

## Features
- **Input Word List**: Accepts a list of words to search for on the Cambridge Dictionary website.
- **Rich Data Collection**: Scrapes various details about each word, including:
  - Word definitions
  - Word phonetics (UK/US)
  - Pronunciation audio links (UK/US)
  - Part of speech (noun, verb, adjective, etc.)
  - Example sentences
  - Synonyms and antonyms (if available)

- **Output Formats**: Results can be downloaded in JSON format for easy integration or analysis.

## How It Works
This Actor navigates the Cambridge Dictionary website and retrieves the requested data for each input word. It can also retrieve translations if the dictionary page for a word provides them.

## Input
The input is a JSON object with a single `words` field, passed as a string:

```javascript
{
  "words": "Iron"
}
````

- **words**: The words for which you want to retrieve dictionary entries.

## Output

The output is an array of objects where each object contains the information for one word. For example:

```javascript
[{
  "searchWord": "Iron",
  "result": {
    "UK Dictionary": {
      "partOfSpeech": "noun",
      "pronounce": {
        "UK": {
          "audioList": [
            "https://dictionary.cambridge.org/media/english/uk_pron/u/uki/ukiri/ukiridi009.mp3",
            "https://dictionary.cambridge.org/media/english/uk_pron_ogg/u/uki/ukiri/ukiridi009.ogg"
          ],
          "pron": "/aɪən/"
        },
        "US": {
          "audioList": [
            "https://dictionary.cambridge.org/media/english/us_pron/i/iro/iron_/iron.mp3",
            "https://dictionary.cambridge.org/media/english/us_pron_ogg/i/iro/iron_/iron.ogg"
          ],
          "pron": "/aɪrn/"
        }
      },
      "details": [
        {
          "defintion": "a chemical element that is a common greyish-coloured metal. It is strong, used in making steel, and exists in very small amounts in blood:",
          "example": [
            "Iron rusts easily.",
            "Liver is a particularly rich source of dietary iron.",
            "iron ore",
            "an iron deficiency"
          ]
        },
        {
          "defintion": "a piece of equipment for making clothes flat and smooth that has a handle and a flat base and is usually heated with electricity:",
          "example": [
            "a steam iron",
            "a travel iron"
          ]
        },
        {
          "defintion": "a stick that has an iron or steel part at the end that is used to hit the ball in golf:",
          "example": [
            "He'll probably use a 2 or 3 iron for the shot."
          ]
        },
        {
          "defintion": "chains tied around someone to prevent them from escaping or moving:",
          "example": [
            "It was common practice for the prisoners to be clapped in irons (= tied with chains)."
          ]
        },
        {
          "defintion": "to make clothes flat and smooth using an iron:",
          "example": [
            "It takes about five minutes to iron a shirt properly.",
            "Synonym\npress"
          ]
        },
        {
          "defintion": "very strong physically, mentally, or emotionally:",
          "example": [
            "I think you have to have an iron will to make some of these decisions."
          ]
        }
      ]
    },
    "AMERICAN DICTIONARY": {
      "partOfSpeech": "noun",
      "pronounce": {
        "US": {
          "audioList": [
            "https://dictionary.cambridge.org/media/english/us_pron/i/iro/iron_/iron.mp3",
            "https://dictionary.cambridge.org/media/english/us_pron_ogg/i/iro/iron_/iron.ogg"
          ],
          "pron": "/ˈɑɪ·ərn/"
        }
      },
      "details": [
        {
          "defintion": "a common, silver-colored, metal element that is magnetic and strong, is used in making steel, and is found in small amounts in blood and in all living things:",
          "example": [
            "Iron rusts easily.",
            "Liver is a rich source of dietary iron."
          ]
        },
        {
          "defintion": "a device with a handle and a flat metal base that can be heated and pressed against cloth to make the cloth smooth"
        },
        {
          "defintion": "to make cloth smooth using an iron:",
          "example": [
            "I have to iron this skirt.",
            "[ M ] Let me iron out the wrinkles in this tablecloth."
          ]
        },
        {
          "defintion": "made of or containing iron:",
          "example": [
            "iron ore",
            "an iron railing along the steps",
            "fig. Her success depended on physical strength and an iron will (= strong determination)."
          ]
        }
      ]
    },
    "BUSINESS ENGLISH": {
      "partOfSpeech": "noun",
      "pronounce": {
        "UK": {
          "audioList": [
            "https://dictionary.cambridge.org/media/english/uk_pron/u/uki/ukiri/ukiridi009.mp3",
            "https://dictionary.cambridge.org/media/english/uk_pron_ogg/u/uki/ukiri/ukiridi009.ogg"
          ],
          "pron": "/aɪən/"
        },
        "US": {
          "audioList": [
            "https://dictionary.cambridge.org/media/english/us_pron/i/iro/iron_/iron.mp3",
            "https://dictionary.cambridge.org/media/english/us_pron_ogg/i/iro/iron_/iron.ogg"
          ]
        }
      },
      "details": [
        {
          "defintion": "a common metal element used in making steel:",
          "example": [
            "Heavy industries, like iron and steel, can take advantage of the government's increased public-works spending."
          ]
        }
      ]
    }
  }
}]
```

**Key Scraped Data**

For each word, the scraper retrieves:

- **Word**: The original word.
- **Phonetics**: The UK and US pronunciation phonetic spellings.
- **Pronunciation**: Links to UK and US pronunciation audio files.
- **Definitions**: The meanings listed for the word.
- **Part of Speech**: Whether the word is a noun, verb, adjective, etc.
- **Example Sentences**: Sentences that show the word in context.
- **Synonyms and Antonyms**: Related words, if available.
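
The nested output can be awkward to work with directly, so here is a minimal sketch of flattening one item into a row per definition. It assumes items have the shape of the sample output above, including the key spelled `defintion` as it appears there, and that `example` may be missing for some definitions. The helper name `flatten_item` and the row field names are hypothetical, not part of the Actor.

```python
from typing import Any, Dict, List

# A trimmed item in the shape of the sample output above.
# "defintion" (sic) is the key name as it appears in that sample.
sample_item: Dict[str, Any] = {
    "searchWord": "Iron",
    "result": {
        "AMERICAN DICTIONARY": {
            "partOfSpeech": "noun",
            "details": [
                {
                    "defintion": "a device with a handle and a flat metal base that can be heated and pressed against cloth to make the cloth smooth"
                }
            ],
        },
        "BUSINESS ENGLISH": {
            "partOfSpeech": "noun",
            "details": [
                {
                    "defintion": "a common metal element used in making steel:",
                    "example": [
                        "Heavy industries, like iron and steel, can take advantage of the government's increased public-works spending."
                    ],
                }
            ],
        },
    },
}


def flatten_item(item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row per definition found in a single output item."""
    rows: List[Dict[str, Any]] = []
    for dictionary_name, entry in item.get("result", {}).items():
        for detail in entry.get("details", []):
            rows.append(
                {
                    "word": item.get("searchWord"),
                    "dictionary": dictionary_name,
                    "partOfSpeech": entry.get("partOfSpeech"),
                    # Fall back to empty values when a field is absent.
                    "definition": detail.get("defintion", ""),
                    "examples": detail.get("example", []),
                }
            )
    return rows


rows = flatten_item(sample_item)
for row in rows:
    print(row["dictionary"], "|", row["definition"], "|", len(row["examples"]), "example(s)")
```

Using `.get` with defaults keeps the helper from failing on entries that lack examples, such as the second American Dictionary definition in the sample.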

## Output Formats

JSON

## Usage

## On the Apify Platform

1. Go to the Cambridge Dictionary Scraper Actor on Apify.
2. Click **Try for Free**.
3. Provide the words in the input.
4. Run the Actor and wait for it to complete.
5. Download the results.

## From your terminal via Apify CLI

1. Install Apify CLI and log in with `apify login`.
2. Start a run with the following command (the Actor itself executes on the Apify platform):

```bash
echo '{"words": "Iron"}' | apify call dev_bodex/cambridge-dictionary-scraper --silent --output-dataset
```

## Input Example

```javascript
{
  "words": "Iron"
}
```

This input fetches the dictionary data for the word "Iron".

## Versioning

v1.0.0: Initial release with support for word scraping and detailed data extraction.

## Use Cases

- **Language Learning Apps**: Enrich your app with accurate word definitions, example sentences, and audio pronunciations.
- **Translation Platforms**: Retrieve word translations from the Cambridge Dictionary to enhance your platform.
- **Education Tools**: Create language exercises or vocabulary quizzes using the scraped data.
- **Research and Linguistic Analysis**: Collect large datasets of words, definitions, and usage examples for research purposes.

## Limitations

- Only words available on the Cambridge Dictionary can be scraped.
- Some words may not have translations or synonyms/antonyms available.
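
Because synonyms are not always present, downstream code may want to separate them from example sentences when they do appear. In the sample output above, the one synonym shown sits inside the `example` list as the string `"Synonym\npress"`. The sketch below assumes that `Synonym\n` prefix is how synonyms are generally emitted, which this README does not confirm; `split_synonyms` is a hypothetical helper name.

```python
from typing import List, Tuple


def split_synonyms(examples: List[str]) -> Tuple[List[str], List[str]]:
    """Separate plain example sentences from "Synonym\\n..." entries.

    Assumption: synonym entries start with "Synonym" followed by a newline,
    as in the single sample shown in this README.
    """
    sentences: List[str] = []
    synonyms: List[str] = []
    for text in examples:
        if text.startswith("Synonym\n"):
            # Everything after the first line is treated as a synonym.
            synonyms.extend(part.strip() for part in text.split("\n")[1:] if part.strip())
        else:
            sentences.append(text)
    return sentences, synonyms


sentences, synonyms = split_synonyms(
    [
        "It takes about five minutes to iron a shirt properly.",
        "Synonym\npress",
    ]
)
print(sentences)
print(synonyms)
```

An entry with no synonyms simply yields an empty second list, so callers do not need a separate check.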

# Actor input Schema

## `words` (type: `string`):

The word or words to look up in the Cambridge Dictionary.

## Actor input object example

```json
{
  "words": "Iron"
}
```
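
As a small illustration of the schema above, the following sketch builds the input object and checks it before a run is started. The schema only states that `words` is a required string; rejecting empty strings and trimming whitespace are extra assumptions made here, and `build_input` is a hypothetical helper.

```python
import json
from typing import Dict


def build_input(words: str) -> Dict[str, str]:
    """Build the Actor input object, validating it against the schema above."""
    if not isinstance(words, str):
        # The schema declares `words` as a string, not a list.
        raise TypeError("words must be a string")
    cleaned = words.strip()
    if not cleaned:
        # Assumption: an empty string is not a useful lookup.
        raise ValueError("words must not be empty")
    return {"words": cleaned}


print(json.dumps(build_input("Iron")))
```

The returned dictionary can be passed as `run_input` to the Python client call shown in the API section.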

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "words": "Iron"
};

// Run the Actor and wait for it to finish
const run = await client.actor("dev_bodex/cambridge-dictionary-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "words": "Iron" }

# Run the Actor and wait for it to finish
run = client.actor("dev_bodex/cambridge-dictionary-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "words": "Iron"
}' |
apify call dev_bodex/cambridge-dictionary-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=dev_bodex/cambridge-dictionary-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "cambridge dictionary scraper",
        "description": "This Cambridge Dictionary Scraper Apify Actor automates extracting word definitions, synonyms, examples, and translations from the Cambridge Dictionary. Built with Node.js and Puppeteer, it returns structured data, ideal for language processing, research, and educational use.",
        "version": "0.0",
        "x-build-id": "uOPbGdUsI36C95biq"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/dev_bodex~cambridge-dictionary-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-dev_bodex-cambridge-dictionary-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/dev_bodex~cambridge-dictionary-scraper/runs": {
            "post": {
                "operationId": "runs-sync-dev_bodex-cambridge-dictionary-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/dev_bodex~cambridge-dictionary-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-dev_bodex-cambridge-dictionary-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "words"
                ],
                "properties": {
                    "words": {
                        "title": "Words",
                        "type": "string",
                        "description": "The words to search the meaning."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
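
For projects that cannot use the client libraries, the `run-sync-get-dataset-items` path from the specification above can be called directly. This is a minimal sketch using only the Python standard library. The server URL, path, `token` query parameter, and JSON body come from the specification; the helper names, the default timeout, and the assumption that the response body is a JSON array of dataset items are illustrative choices.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, List

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/dev_bodex~cambridge-dictionary-scraper/run-sync-get-dataset-items"


def build_request(words: str, token: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps({"words": words}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(words: str, token: str, timeout: float = 300.0) -> List[Any]:
    """Send the request and parse the response.

    Assumption: the response body is JSON containing the dataset items.
    """
    request = build_request(words, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request is safe to do offline; sending it needs a real token.
request = build_request("Iron", "<YOUR_API_TOKEN>")
print(request.get_method(), request.full_url)
```

Calling `run_sync("Iron", token)` with a valid token would start a run and block until it finishes, so the timeout should be generous for longer runs.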
