# Shopify Product Scraper - Products, Collections & Entire Stores (`novus/shopify-scraper`) Actor

An advanced Shopify data extraction tool built for professionals. Simply enter any store, collection, or product URL — the scraper automatically detects Shopify stores, fetches structured product data via the Shopify JSON API, and handles pagination for large catalogs.

- **URL**: https://apify.com/novus/shopify-scraper.md
- **Developed by:** [Novus](https://apify.com/novus) (community)
- **Categories:** E-commerce, Developer tools, Automation
- **Stats:** 9 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$10.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for Apify platform usage, which gets cheaper on higher Apify subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Shopify Scraper

A professional-grade Shopify store scraper that extracts comprehensive product data from any Shopify-based e-commerce site. Supports full store crawling, collection scraping, individual product extraction, and search functionality.

### Why Use This Scraper?

- **Market Research**: Analyze competitor pricing, product descriptions, and variants
- **Trend Monitoring**: Track new product launches and stock status changes
- **Data Aggregation**: Build comprehensive catalogs from multiple Shopify stores
- **Marketing Insights**: Understand how brands structure their product metadata and categories

### Key Features

🚀 **Store-Wide Crawling** — Automatically discovers and extracts all products from an entire store

🎯 **Precision Targeting** — Scrape specific collections or individual product URLs

🔍 **Search Support** — Search for products within a store using keywords

💱 **Price Normalization** — Prices stored as integers (cents) to avoid floating-point errors

📦 **Comprehensive Data** — Extracts titles, descriptions, variants, images, options, SKUs, barcodes, and stock status

🛡️ **Anti-Bot Resilience** — Built-in rate limiting, retry logic, and proxy rotation

🔄 **Auto-Detection** — Automatically verifies if a site is Shopify-powered

⚡ **Fast & Reliable** — Optimized extraction with automatic fallback mechanisms

### How It Works

1. **Automatic Shopify Detection** — The scraper automatically verifies each URL is a Shopify store
2. **Smart URL Classification** — Detects if URL is a store root, collection, product, or search page
3. **Data Extraction** — Extracts comprehensive product data with automatic pagination
4. **Deduplication** — Automatically removes duplicate products across collections
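
The deduplication step can be pictured with a short sketch. It assumes products follow the output shape documented in this README, where `source.id` identifies a product. The function name `dedupe_products` is hypothetical and is not part of the Actor, whose internal logic is not documented here.

```python
from typing import Iterable


def dedupe_products(products: Iterable[dict]) -> list:
    """Keep the first occurrence of each product, keyed by source.id (assumed key)."""
    seen = set()
    unique = []
    for product in products:
        product_id = str(product["source"]["id"])
        if product_id in seen:
            continue  # already collected, e.g. via another collection
        seen.add(product_id)
        unique.append(product)
    return unique


# Illustrative data only: the same product reached through two collections.
sample = [
    {"source": {"id": "1"}, "title": "A"},
    {"source": {"id": "2"}, "title": "B"},
    {"source": {"id": "1"}, "title": "A (seen again)"},
]
print([p["title"] for p in dedupe_products(sample)])
```

The same approach may be useful on your side when merging datasets from several runs.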

### Input Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `startUrls` | Array | ✅ | - | List of Shopify URLs (store, collection, or product) |
| `maxProducts` | Integer | No | 0 | Max products to scrape (0 = unlimited) |
| `searchQuery` | String | No | - | Search term for finding specific products |
| `includeOutOfStock` | Boolean | No | true | Include out-of-stock products |
| `currency` | String | No | Auto | Currency code (USD, EUR, GBP, etc.) |
| `proxy` | Object | No | Apify Proxy | Proxy configuration |
| `useHeadlessFallback` | Boolean | No | true | Enable fallback extraction method |
| `requestTimeout` | Integer | No | 30000 | Request timeout in milliseconds |
| `retryCount` | Integer | No | 3 | Number of retry attempts |

### Input Examples

#### Scrape Entire Store
```json
{
    "startUrls": [
        { "url": "https://www.allbirds.com" }
    ],
    "proxy": {
        "useApifyProxy": true
    }
}
````

#### Scrape Specific Collection

```json
{
    "startUrls": [
        { "url": "https://www.allbirds.com/collections/mens-shoes" }
    ],
    "maxProducts": 100
}
```

#### Scrape Single Product

```json
{
    "startUrls": [
        { "url": "https://www.allbirds.com/products/mens-wool-runners" }
    ]
}
```

#### Search Within Store

```json
{
    "startUrls": [
        { "url": "https://www.allbirds.com" }
    ],
    "searchQuery": "wool runners",
    "maxProducts": 20
}
```

#### Multiple URLs

```json
{
    "startUrls": [
        { "url": "https://www.allbirds.com/collections/mens" },
        { "url": "https://www.allbirds.com/collections/womens" }
    ],
    "maxProducts": 50
}
```

### Output Schema

Each product includes:

```json
{
    "source": {
        "id": "7654321098765",
        "handle": "mens-wool-runners",
        "url": "https://www.allbirds.com/products/mens-wool-runners",
        "retailer": "www.allbirds.com",
        "scrapedAt": "2025-12-13T10:30:00Z"
    },
    "title": "Men's Wool Runners",
    "description": "Our original wool shoe...",
    "descriptionHtml": "<p>Our original wool shoe...</p>",
    "vendor": "Allbirds",
    "productType": "Shoes",
    "tags": ["mens", "shoes", "wool"],
    "createdAt": "2025-01-15T00:00:00Z",
    "updatedAt": "2025-12-10T00:00:00Z",
    "publishedAt": "2025-01-15T08:00:00Z",
    "variants": [
        {
            "id": "42345678901234",
            "title": "8 / Natural Grey",
            "sku": "WR-M-NG-8",
            "barcode": "1234567890123",
            "price": 11000,
            "compareAtPrice": null,
            "currency": "USD",
            "available": true,
            "inventoryQuantity": null,
            "requiresShipping": true,
            "weight": 0.5,
            "weightUnit": "kg",
            "option1": "8",
            "option2": "Natural Grey",
            "option3": null
        }
    ],
    "images": [
        {
            "id": "12345678901234",
            "url": "https://cdn.shopify.com/s/files/...",
            "alt": "Men's Wool Runners",
            "width": 1200,
            "height": 1500,
            "position": 1
        }
    ],
    "options": [
        { "name": "Size", "position": 1, "values": ["7", "8", "9", "10", "11", "12"] },
        { "name": "Color", "position": 2, "values": ["Natural Grey", "Black", "Navy"] }
    ]
}
```
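
Because each product nests its variants, a flat table (one row per variant) is often easier to load into a spreadsheet or database. The following is a minimal sketch that assumes the field names shown above; `flatten_variants` and the output column names are hypothetical helpers, not something the Actor provides.

```python
def flatten_variants(product: dict) -> list:
    """Return one flat row per variant, carrying over a few product-level fields."""
    rows = []
    for variant in product.get("variants", []):
        rows.append({
            "productId": product["source"]["id"],
            "productTitle": product["title"],
            "variantId": variant["id"],
            "sku": variant.get("sku"),
            "priceCents": variant.get("price"),  # integer cents, per this README
            "available": variant.get("available"),
        })
    return rows


# Trimmed-down copy of the example product above.
product = {
    "source": {"id": "7654321098765"},
    "title": "Men's Wool Runners",
    "variants": [
        {"id": "42345678901234", "sku": "WR-M-NG-8", "price": 11000, "available": True}
    ],
}
rows = flatten_variants(product)
print(rows[0]["sku"], rows[0]["priceCents"])
```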

### Extracted Data Fields

| Product Fields | Variant Fields | Media & Options |
|---------------|----------------|-----------------|
| Title | SKU | All Images |
| Description (text & HTML) | Barcode | Image Dimensions |
| Vendor/Brand | Price (in cents) | Alt Text |
| Product Type | Compare-at Price | Options (Size, Color) |
| Tags | Availability | Option Values |
| Created/Updated Dates | Inventory Quantity | Positions |
| URL & Handle | Weight & Unit | |
| Retailer | Shipping Required | |

### URL Types Supported

| URL Pattern | Type | Example |
|-------------|------|---------|
| `domain.com` | Full Store | `https://www.allbirds.com` |
| `/collections/{handle}` | Collection | `https://www.allbirds.com/collections/mens` |
| `/products/{handle}` | Single Product | `https://www.allbirds.com/products/wool-runners` |
| `/search?q={query}` | Search Results | `https://www.allbirds.com/search?q=wool` |
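
To check how a URL is likely to be treated before starting a run, the patterns in the table can be approximated as follows. This is a sketch based only on the table above; the Actor's actual classification rules are not documented here, and `classify_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def classify_url(url: str) -> str:
    """Approximate the URL types from the table: store, collection, product, search."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return "store"
    if segments[0] == "search":
        return "search"
    # Check products first so /collections/x/products/y counts as a product.
    for keyword, label in (("products", "product"), ("collections", "collection")):
        if keyword in segments and segments.index(keyword) + 1 < len(segments):
            return label
    return "unknown"


for example in (
    "https://www.allbirds.com",
    "https://www.allbirds.com/collections/mens",
    "https://www.allbirds.com/products/wool-runners",
    "https://www.allbirds.com/search?q=wool",
):
    print(example, "->", classify_url(example))
```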

### Troubleshooting

| Issue | Possible Cause | Solution |
|-------|---------------|----------|
| 0 Results | Site is not Shopify-based | Check the run log; the scraper auto-detects non-Shopify sites and logs a warning |
| 403 / Access Denied | IP flagged | Enable `useApifyProxy` with residential proxies |
| Prices look 100× too high | Prices are returned as integers in cents | Divide by 100 (e.g., 2995 → $29.95) |
| Missing Products | Rate limiting | Increase `retryCount`, use proxies |

### Important Notes

- **Price Format**: All prices are integers in cents (e.g., 2995 = $29.95)
- **Variants**: Each product contains all variants in the `variants` array
- **Deduplication**: Products are automatically deduplicated by ID
- **Pagination**: Handles pagination automatically across large catalogs
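
For display or reporting, integer cents can be converted without reintroducing floating-point error by using `Decimal`. This is a minimal sketch that assumes a currency with two decimal places; `cents_to_amount` is a hypothetical helper name.

```python
from decimal import Decimal
from typing import Optional


def cents_to_amount(cents: Optional[int]) -> Optional[Decimal]:
    """Convert integer cents to a two-decimal amount; pass through null prices."""
    if cents is None:
        return None  # e.g. compareAtPrice may be null
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))


print(cents_to_amount(2995))   # 29.95
print(cents_to_amount(11000))  # 110.00
```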

### Cost Estimation

- **Speed**: Typically 500-2,000 products per minute
- **Compute Units**: ~0.1-0.2 CUs per 1,000 products
- **Proxy**: Residential proxies recommended for best results

*Actual costs vary based on store size and anti-bot measures.*
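
As a rough worked example, the ranges above can be turned into a back-of-the-envelope estimate. The sketch below only multiplies out the figures quoted in this section; it is not a billing calculation, and `estimate_run` is a hypothetical helper.

```python
def estimate_run(products: int) -> dict:
    """Return (low, high) estimates from the ranges quoted in this section."""
    thousands = products / 1000
    return {
        # ~0.1-0.2 compute units per 1,000 products
        "compute_units": (0.1 * thousands, 0.2 * thousands),
        # 500-2,000 products per minute, so the fastest case comes first
        "minutes": (products / 2000, products / 500),
    }


estimate = estimate_run(10_000)
print(estimate)
```

For a 10,000-product store this gives roughly 1 to 2 compute units and 5 to 20 minutes, before proxy costs.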

# Actor input Schema

## `startUrls` (type: `array`):

List of Shopify URLs to scrape. Can be store root URLs, collection URLs, product URLs, or search URLs.

## `maxProducts` (type: `integer`):

Maximum number of products to scrape. Set to 0 for unlimited.

## `searchQuery` (type: `string`):

Search query for product search. Only used when scraping search URLs or when you want to search within a store.

## `includeOutOfStock` (type: `boolean`):

Whether to include out-of-stock products in the results.

## `currency` (type: `string`):

Currency code for prices (e.g., USD, EUR, GBP). If not specified, uses the store's default currency.

## `proxy` (type: `object`):

Proxy settings for the scraper. Residential proxies are recommended for best results.

## `useHeadlessFallback` (type: `boolean`):

Enable fallback extraction method when primary method is blocked.

## `requestTimeout` (type: `integer`):

Timeout for HTTP requests in milliseconds.

## `retryCount` (type: `integer`):

Number of times to retry failed requests.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.allbirds.com"
    }
  ],
  "maxProducts": 0,
  "includeOutOfStock": true,
  "currency": "",
  "proxy": {
    "useApifyProxy": true
  },
  "useHeadlessFallback": true,
  "requestTimeout": 30000,
  "retryCount": 3
}
```
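
Validating input locally before starting a run can catch mistakes early. The sketch below checks values against the limits listed in the OpenAPI specification in this document (currency enum, `requestTimeout` 1000 to 120000, `retryCount` 0 to 10); `build_input` is a hypothetical helper, and the platform performs its own validation regardless.

```python
ALLOWED_CURRENCIES = {"", "USD", "EUR", "GBP", "CAD", "AUD", "JPY", "CNY", "INR", "BRL", "MXN"}


def build_input(urls, max_products=0, currency="", request_timeout=30000, retry_count=3):
    """Build an Actor input dict, raising ValueError on out-of-range values."""
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    if max_products < 0:
        raise ValueError("maxProducts must be 0 (unlimited) or positive")
    if currency not in ALLOWED_CURRENCIES:
        raise ValueError(f"unsupported currency: {currency!r}")
    if not 1000 <= request_timeout <= 120000:
        raise ValueError("requestTimeout must be between 1000 and 120000 ms")
    if not 0 <= retry_count <= 10:
        raise ValueError("retryCount must be between 0 and 10")
    return {
        "startUrls": [{"url": url} for url in urls],
        "maxProducts": max_products,
        "currency": currency,
        "requestTimeout": request_timeout,
        "retryCount": retry_count,
    }


run_input = build_input(["https://www.allbirds.com"], max_products=100)
print(run_input)
```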

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.allbirds.com"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("novus/shopify-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://www.allbirds.com" }] }

# Run the Actor and wait for it to finish
run = client.actor("novus/shopify-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.allbirds.com"
    }
  ]
}' |
apify call novus/shopify-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=novus/shopify-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Shopify Product Scraper - Products, Collections & Entire Stores",
        "description": "An advanced Shopify data extraction tool built for professionals. Simply enter any store, collection, or product URL — the scraper automatically detects Shopify stores, fetches structured product data via the Shopify JSON API, and handles pagination for large catalogs.",
        "version": "1.3",
        "x-build-id": "XQc2qirGNYGg2pqFy"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/novus~shopify-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-novus-shopify-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/novus~shopify-scraper/runs": {
            "post": {
                "operationId": "runs-sync-novus-shopify-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/novus~shopify-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-novus-shopify-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of Shopify URLs to scrape. Can be store root URLs, collection URLs, product URLs, or search URLs.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxProducts": {
                        "title": "Max Products",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of products to scrape. Set to 0 for unlimited.",
                        "default": 0
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search query for product search. Only used when scraping search URLs or when you want to search within a store."
                    },
                    "includeOutOfStock": {
                        "title": "Include Out of Stock Products",
                        "type": "boolean",
                        "description": "Whether to include out-of-stock products in the results.",
                        "default": true
                    },
                    "currency": {
                        "title": "Currency",
                        "enum": [
                            "",
                            "USD",
                            "EUR",
                            "GBP",
                            "CAD",
                            "AUD",
                            "JPY",
                            "CNY",
                            "INR",
                            "BRL",
                            "MXN"
                        ],
                        "type": "string",
                        "description": "Currency code for prices (e.g., USD, EUR, GBP). If not specified, uses the store's default currency.",
                        "default": ""
                    },
                    "proxy": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings for the scraper. Residential proxies are recommended for best results.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "useHeadlessFallback": {
                        "title": "Enable Fallback Extraction",
                        "type": "boolean",
                        "description": "Enable fallback extraction method when primary method is blocked.",
                        "default": true
                    },
                    "requestTimeout": {
                        "title": "Request Timeout (ms)",
                        "minimum": 1000,
                        "maximum": 120000,
                        "type": "integer",
                        "description": "Timeout for HTTP requests in milliseconds.",
                        "default": 30000
                    },
                    "retryCount": {
                        "title": "Retry Count",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of times to retry failed requests.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
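
For projects without a client library, the `run-sync-get-dataset-items` path from the specification above can be called with any HTTP client. The sketch below only prepares the request using Python's standard library and does not send it; `build_sync_request` is a hypothetical helper, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "novus~shopify-scraper"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to run-sync-get-dataset-items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request(
    "<YOUR_API_TOKEN>",
    {"startUrls": [{"url": "https://www.allbirds.com"}]},
)
print(request.get_method(), request.full_url)
# To execute it: urllib.request.urlopen(request), then parse the JSON response body.
```

Synchronous endpoints hold the connection open until the run finishes, so for large stores the asynchronous `/runs` endpoint followed by a dataset fetch may be the safer choice.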
