# HEMA Scraper — Dutch & Belgian Discount Retail Products & Price (`studio-amba/hema-scraper`) Actor

Scrape products, prices, stock status, and categories from hema.com. Supports search queries, category browsing, and the Belgian (nl-be) and Dutch (nl-nl) storefronts.

- **URL**: https://apify.com/studio-amba/hema-scraper.md
- **Developed by:** [Studio Amba](https://apify.com/studio-amba) (community)
- **Categories:** E-commerce
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $4.00 / 1,000 result scrapeds

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## HEMA Scraper

Scrapes products from hema.com, the popular Dutch/Belgian variety store. Supports the Belgian (`nl-be`, `fr-be`) and Dutch (`nl-nl`) storefronts with a single configuration.

### What makes this scraper different

HEMA embeds product data in `data-gtmproduct` attributes on product tiles, which means the scraper can extract structured data (name, price, product ID, stock status, rating, category) directly from listing pages without visiting every product page. This makes it significantly faster.

Set `scrapeProductPages: true` if you need full descriptions, image galleries, and specs.

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `startUrls` | Array | No | HEMA category or product page URLs |
| `searchQuery` | String | No | Search term (e.g., "lamp", "handdoek") |
| `maxResults` | Integer | No | Max products (default: 100) |
| `country` | String | No | Storefront: `nl-be` (default), `nl-nl`, or `fr-be` |
| `scrapeProductPages` | Boolean | No | Visit detail pages for descriptions and specs (default: false) |
| `proxyConfiguration` | Object | No | Proxy settings |

### Output

| Field | Type | Example |
|-------|------|---------|
| `name` | String | `"handdoek 50x100 zware kwaliteit roze"` |
| `brand` | String | `"HEMA"` |
| `price` | Number | `7.00` |
| `originalPrice` | Number | `9.00` |
| `currency` | String | `"EUR"` |
| `productId` | String | `"60211304"` |
| `sku` | String | `"60211304"` |
| `inStock` | Boolean | `true` |
| `rating` | Number | `4.1` |
| `category` | String | `"Bad"` |
| `imageUrl` | String | Product image (600px) |
| `language` | String | `"nl"` |

With `scrapeProductPages: true`, you also get: `description`, `imageUrls`, `specs`.

### Pagination

HEMA uses `start=` and `sz=` URL parameters. The scraper reads the total item count from the page (e.g., "85 artikelen") and automatically generates next-page URLs in batches of 24.

### Cost

**Without** detail pages: ~$0.10 per 1,000 products (very fast, data from listing tiles).
**With** detail pages: ~$0.40 per 1,000 products.

### Notes

- Product page URLs end with `-DIGITS.html` (e.g., `/nl-be/handdoek-60211304.html`)
- The scraper auto-upgrades product tile images to 600px high-res versions
- GTM product data includes a category path like `"PARTY|Party|PARTYGOODS"` which is parsed into readable names

### Why use HEMA Scraper

- **Price monitoring** — Track prices, stock, and promotions across HEMA at scale
- **Competitive intelligence** — Compare your catalog against HEMA pricing and assortment
- **Market research** — Analyze category trends, new arrivals, and rating distributions
- **Lead generation** — Build product datasets for affiliate sites, comparison tools, or feeds
- **No login or cookies required** — Authenticated access not needed; works out of the box

### How to use HEMA Scraper

1. Open the **Input** tab and provide a search query, category URL, or product list
2. Adjust optional filters such as `maxResults` or proxy settings
3. Click **Start** and wait for the run to complete
4. Download results from the **Output** tab in JSON, CSV, Excel, XML, or HTML
5. Schedule recurring runs from the **Schedule** tab if you need ongoing data

### How to scrape HEMA data

This Actor automates the process of extracting structured product data from HEMA.
You can run it directly from the Apify console, the Apify API, or any of the
official SDKs (JavaScript, Python). The scraper handles pagination, retries, and
rate limiting so you can focus on the data, not the plumbing.

Typical workflows:

- **One-off export**: paste a category URL or keyword, set `maxResults`, and run
- **Scheduled monitoring**: set a daily cron in the Schedule tab to track prices over time
- **Programmatic integration**: trigger runs from your backend via the Apify API and
  pull the dataset when finished
- **Webhook automation**: receive a callback the moment a run completes and pipe
  the results into Zapier, Make, n8n, BigQuery, or Google Sheets

### Tips for best results

- **Start small** — run with `maxResults: 10` before launching large jobs
- **Use proxies** — residential proxies reduce blocking on protected sites
- **Throttle on big jobs** — keep `maxConcurrency` modest (5–10) for stability
- **Schedule runs** — daily runs are usually enough for price monitoring
- **Inspect the dataset schema** — the Storage tab shows the full output structure

### FAQ and support

**Is it legal to scrape HEMA?** This Actor extracts publicly available data.
Always review the website's Terms of Service before scraping at scale, and
respect rate limits.

**Why am I getting fewer results than expected?** Some categories have hidden
pagination or load more on scroll. Increase `maxResults` and verify your filters.

**Can I extract data for a single product?** Yes — provide the full product URL
in `startUrls` and the scraper will return one item.

**The site blocks me — what should I do?** Enable Apify residential proxies in
the input. Datacenter IPs are blocked by many e-commerce sites.

For issues, feature requests, or bug reports, open a ticket in the Issues tab on
the Actor page or contact support@apify.com. We monitor every actor and ship
fixes quickly when sites change.

# Actor input Schema

## `startUrls` (type: `array`):

HEMA category or product page URLs to scrape. Example: https://www.hema.com/nl-be/wonen/woondecoratie/verlichting
## `searchQuery` (type: `string`):

Search for products by keyword (e.g., 'lamp', 'handdoek', 'schoolspullen'). Overrides startUrls if provided.
## `maxResults` (type: `integer`):

Maximum number of products to return.
## `country` (type: `string`):

Which HEMA storefront to use. Affects prices, stock, and language.
## `scrapeProductPages` (type: `boolean`):

Visit individual product pages to get description, images, and specs. Slower but more data. If false, only extracts data from listing pages.
## `proxyConfiguration` (type: `object`):

Proxy settings. Usually not needed for hema.com.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.hema.com/nl-be/wonen/woondecoratie/verlichting"
    }
  ],
  "maxResults": 100,
  "country": "nl-be",
  "scrapeProductPages": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "NL"
  }
}
````

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.hema.com/nl-be/wonen/woondecoratie/verlichting"
        }
    ],
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ],
        "apifyProxyCountry": "NL"
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("studio-amba/hema-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://www.hema.com/nl-be/wonen/woondecoratie/verlichting" }],
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "NL",
    },
}

# Run the Actor and wait for it to finish
run = client.actor("studio-amba/hema-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.hema.com/nl-be/wonen/woondecoratie/verlichting"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "NL"
  }
}' |
apify call studio-amba/hema-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=studio-amba/hema-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "HEMA Scraper — Dutch & Belgian Discount Retail Products & Price",
        "description": "Scrape products, prices, stock status, and categories from hema.com. Supports search queries, category browsing, and the Belgian (nl-be) and Dutch (nl-nl) storefronts.",
        "version": "0.1",
        "x-build-id": "wKInpC3mMv1sIIjdM"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/studio-amba~hema-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-studio-amba-hema-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/studio-amba~hema-scraper/runs": {
            "post": {
                "operationId": "runs-sync-studio-amba-hema-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/studio-amba~hema-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-studio-amba-hema-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Category or Product URLs",
                        "type": "array",
                        "description": "HEMA category or product page URLs to scrape. Example: https://www.hema.com/nl-be/wonen/woondecoratie/verlichting",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search for products by keyword (e.g., 'lamp', 'handdoek', 'schoolspullen'). Overrides startUrls if provided."
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Maximum number of products to return.",
                        "default": 100
                    },
                    "country": {
                        "title": "Country",
                        "enum": [
                            "nl-be",
                            "nl-nl",
                            "fr-be"
                        ],
                        "type": "string",
                        "description": "Which HEMA storefront to use. Affects prices, stock, and language.",
                        "default": "nl-be"
                    },
                    "scrapeProductPages": {
                        "title": "Scrape Product Pages",
                        "type": "boolean",
                        "description": "Visit individual product pages to get description, images, and specs. Slower but more data. If false, only extracts data from listing pages.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings. Usually not needed for hema.com."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
