# Shopify Store Scraper | Metadata & Catalog Extractor (`taroyamada/shopify-store-intelligence`) Actor

Shopify store scraper that pulls public storefront metadata, product catalogs, collections, and vendor data directly from JSON endpoints. No browser, no auth. Returns structured tables ready for competitive catalog research.

- **URL**: https://apify.com/taroyamada/shopify-store-intelligence.md
- **Developed by:** [naoki anzai](https://apify.com/taroyamada) (community)
- **Categories:** E-commerce, Developer tools, SEO tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Shopify Store Leads & Catalog Intelligence

<!-- v37-internal-flow-boost:start -->
### After this run

Turn this Actor's output into a capped paid report with [Ad Landing Page Offer Intelligence & CRO Gap Report](https://apify.com/taroyamada/ad-landing-page-offer-intelligence?utm_source=apify_internal&utm_medium=readme_after_run&utm_campaign=v37_internal_flow&utm_content=shopify-store-intelligence__ad-landing-page-offer-intelligence).
Use it when paid media, CRO, and agency teams need to decide which public landing-page offer gaps to fix before increasing ad spend.

- First report: $3 / `landing_offer_report`; set `maxChargeUsd` to $3.
- Deeper report: $15 / `cro_gap_report_pack`; use it only when the first report needs competitor comparison or more action depth.
<!-- v37-internal-flow-boost:end -->

<!-- v30-traffic-cta:start -->
### Next report-style Actors

If you already have data from this Actor, these follow-on Actors turn public or user-provided inputs into decision-ready reports. They are optional, capped by `maxChargeUsd`, and do not make business outcome claims.

- [ATS Hiring Signal Report](https://apify.com/taroyamada/ats-hiring-signal-report) - turn target-company public hiring pages into expansion and account-priority signals.
- [SaaS Pricing Page Monitor](https://apify.com/taroyamada/saas-pricing-page-change-monitor) - monitor competitor public pricing pages after store intelligence.
- [Ad Landing Page Offer Intelligence](https://apify.com/taroyamada/ad-landing-page-offer-intelligence) - audit public landing pages for offer, proof, CTA, and friction.
- [CSV Local Business List Scoring](https://apify.com/taroyamada/csv-local-business-list-scoring-report) - score exported business lists before SEO cleanup.

<!-- v30-traffic-cta:end -->


Runtime: Node.js 20+.

Extract analyst-ready Shopify storefront intelligence from public merchant endpoints: normalized domain, store identity, currency, price range, sampled products, collections, merch rollups, endpoint warnings, and explicit pay-per-event billing fields.

This actor is built for ecommerce analysts, growth teams, marketplace operators, technical SEO teams, data engineers, and competitive intelligence workflows that need repeatable storefront facts without running a browser. It reads public Shopify surfaces such as the homepage, `/meta.json`, `/products.json`, `/collections.json`, and optional `/pages.json` / `/blogs.json`. It is not a search engine or discovery actor: provide the store URLs you want inspected.

### Store Quickstart

Start with dataset delivery so analysts can inspect rows before wiring automation:

- **Quickstart Baseline (2 Stores -> Catalog + Merch Signals)**: two public storefronts, low sampling limits, and the core analyst fields: `status`, `chargedEvent`, `isShopify`, `normalizedDomain`, `storeName`, `currency`, `priceRange`, `productCount`, `productsSample`, `signals`, and `errors`.
- **Recurring Baseline (Multi-Store Catalog Watch)**: schedule the same watchlist weekly or daily to compare catalog size, price range, vendor/tag rollups, endpoint availability, and warning counts over time.
- **Webhook Routed Check (Daily Store Updates)**: use only after dataset rows match your downstream BI, CRM, Slack, Make, n8n, or warehouse schema.
- **Content Expansion (Pages / Blogs When Public)**: enable when page/blog metadata matters for SEO, launch monitoring, policy copy checks, or content inventory.

The included `store-input.example.json` is the lowest-friction way to try the actor from Apify Store, and `sample-output.example.json` shows the published output contract, including one charged Shopify row and one no-charge blocked row.

### Analyst Workflow

1. Provide known Shopify or ecommerce storefront URLs. The actor does not discover stores from search terms.
2. Run in `delivery: "dataset"` with modest sampling limits.
3. Review `status`, `chargedEvent`, `signals`, `warnings`, and `errors` before routing rows downstream.
4. Filter charged Shopify rows with `chargedEvent` equal to `store_enriched` or `store_partial` for analyst review.
5. Keep no-charge rows such as `invalid_input`, `blocked`, and `not_store` as watchlist cleanup tasks.
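
Steps 3 to 5 can be sketched as a small routing function over dataset rows. This is a minimal sketch, not the actor's own code: it assumes rows shaped like the Output section below, and it also treats `timeout` and `error` as cleanup statuses (an assumption based on the PPE table). The function and queue names are hypothetical.

```python
from collections import defaultdict

# Charged events that the workflow above routes to analyst review.
ANALYST_EVENTS = {"store_enriched", "store_partial"}
# No-charge statuses kept as watchlist cleanup (timeout/error included as an assumption).
CLEANUP_STATUSES = {"invalid_input", "blocked", "not_store", "timeout", "error"}


def route_rows(rows):
    """Split dataset rows into 'analyst', 'cleanup', and 'other' queues."""
    queues = defaultdict(list)
    for row in rows:
        event = row.get("chargedEvent")
        if event in ANALYST_EVENTS:
            queues["analyst"].append(row)
        elif event is None and row.get("status") in CLEANUP_STATUSES:
            queues["cleanup"].append(row)
        else:
            # e.g. non_shopify_store_detected rows or statuses not listed above
            queues["other"].append(row)
    return dict(queues)


rows = [
    {"normalizedDomain": "example-shop.com", "status": "success", "chargedEvent": "store_enriched"},
    {"normalizedDomain": None, "status": "invalid_input", "chargedEvent": None},
    {"normalizedDomain": "other-store.com", "status": "not_shopify", "chargedEvent": "non_shopify_store_detected"},
]
for name, queue in route_rows(rows).items():
    print(name, [r["normalizedDomain"] for r in queue])
```

Keeping `non_shopify_store_detected` rows in a separate queue reflects that they are charged but are not Shopify catalog rows.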
6. Add webhook delivery only after analysts trust the dataset shape.

### Key Features

- Multi-store inspection for up to 50 storefront URLs per run.
- Public Shopify signal detection from homepage, meta, products, and collections endpoints.
- Analyst summary fields for domain, store name, currency, price range, product count, sampled products, status, billing event, signals, and errors.
- Catalog and collection samples from public Shopify JSON endpoints.
- Vendor, tag, and product-type rollups derived from sampled products.
- Restriction-aware output for blocked, non-JSON, unavailable, timeout, invalid, and non-store cases.
- Optional pages and blogs metadata when public endpoints expose it.
- Dataset-first and webhook-after delivery modes.

### Use Cases

| Who | Workflow | Value |
|-----|----------|-------|
| Ecommerce analysts | Track competitor catalog, price bands, and merchandising structure | Repeatable store summary rows for comparison over time |
| Data teams | Pipe normalized storefront fields into warehouses or dashboards | Stable row keys such as `normalizedDomain`, `storeName`, and `currency` |
| Technical SEO teams | Inspect public collections, products, pages, and blog metadata | Fast endpoint-level visibility without browser automation |
| Marketplace operators | Validate merchant storefronts and detect public Shopify evidence | Clear `isShopify`, `status`, and no-charge cleanup rows |
| RevOps / growth teams | Feed merchant intelligence into CRM or account scoring | Sampled products, signals, and errors ready for routing |

### Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `storeUrls` | string[] | required | Known storefront URLs to inspect. Custom domains and `*.myshopify.com` domains both work. Maximum 50 stores per run. |
| `productSampleLimit` | integer | `25` | Maximum public products to fetch per store from `/products.json`. Keep low for quickstarts and recurring monitoring. |
| `collectionSampleLimit` | integer | `25` | Maximum public collections to fetch per store from `/collections.json`. |
| `includeContentMetadata` | boolean | `false` | When true, also attempts `/pages.json` and `/blogs.json`. Leave false until content metadata is worth the extra sampling. |
| `contentSampleLimit` | integer | `10` | Maximum public pages or blogs to sample when content metadata is enabled. |
| `timeoutMs` | integer | `15000` | Per-request timeout in milliseconds. |
| `delivery` | string | `"dataset"` | `dataset` writes durable rows for review. `webhook` writes dataset rows first, then sends the full payload to `webhookUrl`. |
| `webhookUrl` | string | empty | Required only when `delivery` is `"webhook"`. |
| `dryRun` | boolean | `false` | Development mode. Skips dataset writes and webhook delivery, but still writes local `output/result.json`. |

#### Input Example

```json
{
  "storeUrls": [
    "https://colourpop.com",
    "https://allbirds.com"
  ],
  "productSampleLimit": 10,
  "collectionSampleLimit": 6,
  "includeContentMetadata": false,
  "contentSampleLimit": 5,
  "timeoutMs": 15000,
  "delivery": "dataset",
  "dryRun": false
}
````
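
For watchlists longer than the documented 50-URL limit, you can prepare several run inputs before calling the actor. The sketch below is illustrative only: `normalize_domain` approximates the documented `normalizedDomain` rule (lowercase, no leading `www.`) and assumes bare domains are HTTPS; `build_run_inputs` is a hypothetical helper that also enforces the rule that `webhookUrl` is required for webhook delivery.

```python
from urllib.parse import urlsplit

MAX_STORES_PER_RUN = 50  # documented per-run limit for storeUrls


def normalize_domain(url):
    """Approximate normalizedDomain: lowercase hostname without a leading 'www.'."""
    if "://" not in url:
        url = "https://" + url  # assumption: bare domains are treated as HTTPS
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_run_inputs(urls, delivery="dataset", webhook_url=None, **options):
    """Deduplicate URLs by domain and split them into inputs of at most 50 stores."""
    if delivery == "webhook" and not webhook_url:
        raise ValueError("webhookUrl is required when delivery is 'webhook'")
    seen, unique = set(), []
    for url in urls:
        domain = normalize_domain(url)
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(url)
    inputs = []
    for start in range(0, len(unique), MAX_STORES_PER_RUN):
        run_input = {"storeUrls": unique[start:start + MAX_STORES_PER_RUN], "delivery": delivery, **options}
        if webhook_url:
            run_input["webhookUrl"] = webhook_url
        inputs.append(run_input)
    return inputs


batches = build_run_inputs(["https://colourpop.com", "https://www.allbirds.com", "allbirds.com"],
                           productSampleLimit=10, collectionSampleLimit=6)
print(batches)
```

Each dictionary in the returned list can be passed as `run_input` to one actor run, as in the API examples below.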

### Input Examples

#### Example: Single store catalog snapshot

```json
{
  "storeUrls": [
    "https://allbirds.com"
  ],
  "productSampleLimit": 250,
  "collectionSampleLimit": 25
}
```

#### Example: Competitor catalog comparison

```json
{
  "storeUrls": [
    "https://brand1.myshopify.com",
    "https://brand2.myshopify.com"
  ],
  "productSampleLimit": 25,
  "collectionSampleLimit": 25
}
```

#### Example: Vendor / tag rollup audit

Vendor, tag, and product-type rollups are always derived from the sampled products, so raise `productSampleLimit` to widen the rollup base:

```json
{
  "storeUrls": [
    "https://multi-brand-store.com"
  ],
  "productSampleLimit": 500
}
```

### Output

The Apify dataset receives one row per input storefront after normalization and deduplication. Local `output/result.json` wraps the same rows in `{ "meta": ..., "results": [...] }`.

| Field | Type | Analyst meaning |
|-------|------|-----------------|
| `status` | string | Result classification: `success`, `partial`, `not_shopify`, `blocked`, `invalid_input`, `not_store`, `timeout`, or `error`. Use this before routing rows to analysts. |
| `chargedEvent` | string \| null | PPE billing event for the row. `store_enriched`, `store_partial`, and `non_shopify_store_detected` are charged; `null` means no-charge diagnostic output. |
| `isShopify` | boolean | True when Shopify evidence was detected from metadata, Shopify endpoints, theme scripts, or public Shopify JSON. |
| `normalizedDomain` | string \| null | Lowercase domain without leading `www.`. Use it as a stable join key across CRM, BI, and recurring runs. |
| `storeName` | string \| null | Best public store name from Shopify metadata, Open Graph, title, or hostname fallback. |
| `currency` | string \| null | Public currency hint from Shopify metadata when available. |
| `priceRange` | object | Minimum and maximum prices observed in sampled public products. Null values mean no public product prices were sampled. |
| `productCount` | integer | Number of public products sampled in this row. This is sample count, not full catalog size. |
| `productsSample` | object\[] | CSV/API-friendly alias for sampled public product records. Same source as `productSamples`. |
| `signals` | string\[] | Evidence used for classification and charging, such as `shopify_detected`, `shopify_products_json`, `shopify_collections_json`, or `ecommerce_cart_or_checkout`. |
| `errors` | object\[] | Structured endpoint or run problems important for automation. Includes type, endpoint, HTTP status, and message. |
| `inputUrl` | string | Original URL provided by the user. |
| `normalizedUrl` | string \| null | Origin URL used for endpoint requests. |
| `hostname` | string | Hostname from `normalizedUrl`. |
| `store` | object | Store profile fields such as name, canonical URL, myshopify domain, theme, locale, country, and Shopify detection. |
| `summary` | object | Counts, sample basis, and endpoint status map for homepage, meta, products, collections, pages, and blogs. |
| `collections` | object\[] | Sampled public collections. |
| `productSamples` | object\[] | Sampled public products with vendor, type, tags, variant count, availability, images, and product-level price range. |
| `rollups` | object | Vendor, tag, and product-type counts derived from sampled products only. |
| `content` | object | Optional page and blog samples when `includeContentMetadata` is enabled and endpoints are public. |
| `warnings` | object\[] | Endpoint restrictions, non-JSON responses, unavailable endpoints, and sample truncation notices. |
| `error` | string \| null | Run-level error message for failed rows. |
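
Because `normalizedDomain` is documented as a stable join key, recurring runs can be compared row by row. The sketch below is a hypothetical helper, not part of the actor: it assumes you have already downloaded the dataset items of two runs as lists of dicts. Remember that `productCount` and `priceRange` describe the sample, so a change may reflect sampling rather than the full catalog.

```python
def index_by_domain(rows):
    """Key rows by normalizedDomain, skipping rows without one (e.g. invalid_input)."""
    return {r["normalizedDomain"]: r for r in rows if r.get("normalizedDomain")}


def compare_runs(previous_rows, current_rows):
    """Report sample-count and price-range changes per domain between two runs."""
    prev, curr = index_by_domain(previous_rows), index_by_domain(current_rows)
    changed = {}
    for domain in sorted(prev.keys() & curr.keys()):
        before, after = prev[domain], curr[domain]
        delta = {}
        for field in ("productCount", "priceRange", "status"):
            if before.get(field) != after.get(field):
                delta[field] = (before.get(field), after.get(field))
        if delta:
            changed[domain] = delta
    return {
        "changed": changed,
        "added": sorted(curr.keys() - prev.keys()),
        "dropped": sorted(prev.keys() - curr.keys()),
    }


last_week = [{"normalizedDomain": "example-shop.com", "status": "success", "productCount": 2,
              "priceRange": {"min": 12.5, "max": 49.99}}]
this_week = [{"normalizedDomain": "example-shop.com", "status": "success", "productCount": 2,
              "priceRange": {"min": 10.0, "max": 49.99}}]
print(compare_runs(last_week, this_week))
```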

#### Output Example

```json
{
  "inputUrl": "https://example-shop.com",
  "normalizedUrl": "https://example-shop.com",
  "hostname": "example-shop.com",
  "status": "success",
  "chargedEvent": "store_enriched",
  "isShopify": true,
  "normalizedDomain": "example-shop.com",
  "storeName": "Example Shop",
  "currency": "USD",
  "priceRange": { "min": 12.5, "max": 49.99 },
  "productCount": 2,
  "productsSample": [
    {
      "title": "Sample Product",
      "url": "https://example-shop.com/products/sample-product",
      "vendor": "Example Shop",
      "productType": "Accessory",
      "priceRange": { "min": 12.5, "max": 19.99 }
    }
  ],
  "signals": ["shopify_detected", "shopify_products_json", "shopify_collections_json"],
  "errors": [],
  "store": {
    "name": "Example Shop",
    "currency": "USD",
    "myshopifyDomain": "example-shop.myshopify.com",
    "canonicalUrl": "https://example-shop.com/",
    "themeName": "Dawn",
    "shopifyDetected": true
  },
  "summary": {
    "productSampleCount": 2,
    "collectionSampleCount": 1,
    "vendorCount": 1,
    "tagCount": 3,
    "endpointStatuses": {
      "homepage": "ok",
      "meta": "ok",
      "products": "ok",
      "collections": "ok",
      "pages": "skipped",
      "blogs": "skipped"
    }
  },
  "warnings": [],
  "error": null
}
```
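
To make the "sampled products only" semantics of `rollups` and `priceRange` concrete, here is an illustrative recomputation from sampled product records. It is not the actor's implementation: it assumes each sampled product exposes `vendor`, `productType`, `tags`, and a `priceRange` with `min`/`max` keys, as suggested by the example above and the `productSamples` description. The function name is hypothetical.

```python
from collections import Counter


def summarize_sample(products):
    """Recompute vendor/type/tag counts and the overall price range from a product sample."""
    vendors = Counter(p["vendor"] for p in products if p.get("vendor"))
    product_types = Counter(p["productType"] for p in products if p.get("productType"))
    tags = Counter(tag for p in products for tag in p.get("tags") or [])
    mins = [p["priceRange"]["min"] for p in products if (p.get("priceRange") or {}).get("min") is not None]
    maxs = [p["priceRange"]["max"] for p in products if (p.get("priceRange") or {}).get("max") is not None]
    return {
        "vendors": dict(vendors),
        "productTypes": dict(product_types),
        "tags": dict(tags),
        # Null bounds mirror the documented meaning: no public prices were sampled.
        "priceRange": {"min": min(mins) if mins else None, "max": max(maxs) if maxs else None},
    }


sample = [
    {"vendor": "Example Shop", "productType": "Accessory", "tags": ["new", "gift"],
     "priceRange": {"min": 12.5, "max": 19.99}},
    {"vendor": "Example Shop", "productType": "Bag", "tags": ["new"],
     "priceRange": {"min": 30.0, "max": 49.99}},
]
print(summarize_sample(sample))
```

A larger `productSampleLimit` widens these counts, but they never describe products outside the sample.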

No-charge diagnostic rows are still written to the dataset, so problem URLs stay visible for cleanup instead of silently disappearing:

```json
{
  "inputUrl": "not a url",
  "normalizedUrl": null,
  "hostname": "",
  "status": "invalid_input",
  "chargedEvent": null,
  "isShopify": false,
  "normalizedDomain": null,
  "storeName": null,
  "currency": null,
  "priceRange": { "min": null, "max": null },
  "productCount": 0,
  "productsSample": [],
  "signals": [],
  "errors": [
    {
      "type": "invalid_input",
      "endpoint": null,
      "status": null,
      "message": "Unsupported protocol for store URL."
    }
  ]
}
```
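
The structured `errors` objects (type, endpoint, HTTP status, message) can be flattened into one cleanup task per problem. This is a minimal sketch with a hypothetical helper name; it assumes the error keys shown in the example above, where the error object's own `status` field holds the HTTP status.

```python
def cleanup_tasks(rows):
    """Flatten each row's structured `errors` list into one task per problem."""
    tasks = []
    for row in rows:
        for err in row.get("errors") or []:
            tasks.append({
                "inputUrl": row.get("inputUrl"),
                "rowStatus": row.get("status"),
                "chargedEvent": row.get("chargedEvent"),
                "errorType": err.get("type"),
                "endpoint": err.get("endpoint"),
                "httpStatus": err.get("status"),  # the error object's own `status` field
                "message": err.get("message"),
            })
    return tasks


diagnostic_row = {
    "inputUrl": "not a url",
    "status": "invalid_input",
    "chargedEvent": None,
    "errors": [{"type": "invalid_input", "endpoint": None, "status": None,
                "message": "Unsupported protocol for store URL."}],
}
for task in cleanup_tasks([diagnostic_row]):
    print(task["inputUrl"], task["errorType"], task["message"])
```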

### PPE Events And No-Charge Rules

This actor charges per result row using explicit pay-per-event (PPE) events. In production runs, a row is charged under the event named in its `chargedEvent` field; rows with `chargedEvent: null` are not charged.

| Result status | PPE event | Charged? | Meaning |
|---------------|-----------|----------|---------|
| `success` | `store_enriched` | Yes | Shopify evidence plus useful public catalog or store metadata were captured, and primary catalog endpoints were available. |
| `partial` | `store_partial` | Yes | Shopify evidence was found, but one or more important endpoints were restricted, unavailable, or incomplete. |
| `not_shopify` with ecommerce evidence | `non_shopify_store_detected` | Yes | The site looks like an ecommerce store but does not expose Shopify evidence. Useful for merchant classification. |
| `invalid_input` | `null` | No | The input could not be normalized into an HTTP(S) storefront URL. |
| `blocked` | `null` | No | Public endpoints were blocked, restricted, password-like, or non-JSON across primary surfaces. |
| `not_store` | `null` | No | The homepage loaded but no Shopify or ecommerce storefront evidence was found. |
| `timeout` | `null` | No | Endpoint requests timed out. |
| `error` | `null` | No | Unexpected run or fetch failure. |

Use `chargedEvent` rather than `status` alone for billing audits. `invalid_input`, `blocked`, `not_store`, `timeout`, and `error` rows are no-charge diagnostics and should be retained for watchlist cleanup.
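
A billing audit can tally charged events and flag rows whose `chargedEvent` does not match the table above. The mapping below is transcribed from that table; `not_shopify` allows `null` because it is charged only when ecommerce evidence exists. Statuses that are not in the table are skipped rather than guessed. The helper name is hypothetical.

```python
from collections import Counter

# Status -> allowed chargedEvent values, transcribed from the PPE table above.
EXPECTED_EVENTS = {
    "success": {"store_enriched"},
    "partial": {"store_partial"},
    "not_shopify": {"non_shopify_store_detected", None},
    "invalid_input": {None},
    "blocked": {None},
    "not_store": {None},
    "timeout": {None},
    "error": {None},
}


def audit_charges(rows):
    """Count charged events and return rows whose event contradicts the documented mapping."""
    tally = Counter(row["chargedEvent"] for row in rows if row.get("chargedEvent"))
    mismatches = [
        row for row in rows
        if row.get("status") in EXPECTED_EVENTS
        and row.get("chargedEvent") not in EXPECTED_EVENTS[row["status"]]
    ]
    return dict(tally), mismatches


rows = [
    {"status": "success", "chargedEvent": "store_enriched"},
    {"status": "blocked", "chargedEvent": None},
    {"status": "partial", "chargedEvent": "store_partial"},
]
tally, mismatches = audit_charges(rows)
print(tally, len(mismatches))
```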

### API Usage

Run this actor programmatically using the Apify API. Replace `YOUR_API_TOKEN` with your token from [Apify Console -> Settings -> Integrations](https://console.apify.com/account/integrations).

#### cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~shopify-store-intelligence/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "storeUrls": ["https://colourpop.com", "https://allbirds.com"], "productSampleLimit": 10, "collectionSampleLimit": 6, "includeContentMetadata": false, "contentSampleLimit": 5, "timeoutMs": 15000, "delivery": "dataset", "dryRun": false }'
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/shopify-store-intelligence").call(run_input={
    "storeUrls": ["https://colourpop.com", "https://allbirds.com"],
    "productSampleLimit": 10,
    "collectionSampleLimit": 6,
    "includeContentMetadata": False,
    "contentSampleLimit": 5,
    "timeoutMs": 15000,
    "delivery": "dataset",
    "dryRun": False,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["normalizedDomain"], item["status"], item["chargedEvent"], item["productCount"])
```

#### JavaScript / Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/shopify-store-intelligence').call({
  storeUrls: ['https://colourpop.com', 'https://allbirds.com'],
  productSampleLimit: 10,
  collectionSampleLimit: 6,
  includeContentMetadata: false,
  contentSampleLimit: 5,
  timeoutMs: 15000,
  delivery: 'dataset',
  dryRun: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.map((row) => ({
  normalizedDomain: row.normalizedDomain,
  status: row.status,
  chargedEvent: row.chargedEvent,
  productCount: row.productCount,
})));
```

### Tips And Limitations

- This is not a search or discovery actor; provide known storefront URLs.
- `productCount`, `priceRange`, `rollups`, and `productsSample` are based on sampled public products, not the full catalog.
- Some Shopify stores restrict public JSON endpoints; these become `partial`, `blocked`, or no-charge diagnostic rows depending on available evidence.
- `not_shopify` can be charged only when useful ecommerce evidence exists, because it helps classify non-Shopify merchant URLs.
- Use `delivery: "dataset"` first. Move to webhooks only after downstream tools accept the row shape.
- Use `dryRun: true` for local development or shape checks where dataset writes and webhooks should be skipped.
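
For a dry-run shape check, the local `output/result.json` wraps rows as `{ "meta": ..., "results": [...] }`. The sketch below checks that every row carries the core analyst fields listed in the Store Quickstart section. It assumes the file path relative to the actor's working directory when run locally; the helper names are hypothetical.

```python
import json
from pathlib import Path

# Core analyst fields listed in the Store Quickstart section.
CORE_FIELDS = (
    "status", "chargedEvent", "isShopify", "normalizedDomain", "storeName", "currency",
    "priceRange", "productCount", "productsSample", "signals", "errors",
)


def check_payload(payload):
    """Return (row index, missing fields) pairs for rows lacking any core field."""
    results = payload.get("results") if isinstance(payload, dict) else None
    if not isinstance(results, list):
        raise ValueError("expected an object with a top-level 'results' list")
    problems = []
    for index, row in enumerate(results):
        missing = [field for field in CORE_FIELDS if field not in row]
        if missing:
            problems.append((index, missing))
    return problems


def check_result_file(path="output/result.json"):
    """Load the local dry-run file and check its rows."""
    return check_payload(json.loads(Path(path).read_text(encoding="utf-8")))
```

An empty list means every row has the fields downstream tools are likely to key on.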

### FAQ

**Does this fetch every product and collection?**

No. The actor samples public `/products.json` and `/collections.json` results up to your configured limits.

**What happens on restricted stores?**

The actor emits explicit `warnings` and `errors`, and uses `partial`, `blocked`, or another diagnostic status depending on whether useful Shopify evidence was still available.

**Can non-Shopify stores be useful?**

Yes. If the homepage contains ecommerce evidence such as cart, checkout, product structured data, or platform hints, the row is classified as `not_shopify` and charged as `non_shopify_store_detected`.

**Can I route results to another tool?**

Yes. Keep dataset mode for inspection, then use webhook mode for Slack, Make, n8n, BI ingestion, CRM enrichment, or internal monitoring.
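
A minimal webhook receiver for testing could look like the following. This is a sketch under assumptions: it expects a JSON `POST` body with the same `meta`/`results` shape as the dataset payload, and it does no authentication, because no signing scheme is documented here. The handler, function names, and address are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_payload(body: bytes) -> dict:
    """Parse a body shaped like {"meta": ..., "results": [...]} and count rows by status."""
    payload = json.loads(body)
    counts = {}
    for row in payload.get("results", []):
        status = row.get("status", "unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts


class StoreWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            counts = summarize_payload(self.rfile.read(length))
        except (ValueError, AttributeError):  # bad JSON or unexpected shape
            self.send_response(400)
            self.end_headers()
            return
        print("received rows by status:", counts)
        self.send_response(204)
        self.end_headers()


def serve(host="127.0.0.1", port=8000):
    """Start a blocking local server; expose it over HTTPS before using it as webhookUrl."""
    HTTPServer((host, port), StoreWebhookHandler).serve_forever()
```

Call `serve()` to listen locally; in production you would typically forward the parsed rows to Slack, a queue, or a warehouse instead of printing them.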

### Related Actors

- [Website Content Extractor](https://apify.com/taroyamada/website-content-extractor) for cleaned text from policy, FAQ, pricing, help-center, or landing pages.
- [Contact Details Extractor](https://apify.com/taroyamada/contact-details-extractor) for public support, sales, or partnership contacts from the same merchant domain.
- [Domain Security Audit API](https://apify.com/taroyamada/domain-trust-monitor) for SSL, DMARC, expiry, and security-header checks.
- [AI Visibility Monitor](https://apify.com/taroyamada/ai-visibility-monitor-actor) for brand visibility checks beside storefront monitoring.

### Was this helpful?

If this actor saved you time, please leave a rating on Apify Store. Bug reports and feature requests belong on the actor Issues tab.

### Premium Report Pack

Use these premium report actors when a raw dataset is ready to become a buyer-facing audit, watch summary, or agency deliverable. In all three, `sourceDatasetId` is an advanced-only input; for a first run, use pasted input, URLs, or demo mode together with `reportTier`.

- [CSV Local Business List Scoring & SEO Gap Report](https://apify.com/taroyamada/csv-local-business-list-scoring-report) - Score pasted local business CSV lists and produce agency-ready lead/SEO gap reports.
- [SaaS Pricing Page Monitor & Competitor Price Change Alerts](https://apify.com/taroyamada/saas-pricing-page-change-monitor) - Turn public pricing pages into snapshots, competitor reports, and weekly pricing watch summaries.
- [Ad Landing Page Offer Intelligence & CRO Gap Report](https://apify.com/taroyamada/ad-landing-page-offer-intelligence) - Analyze user-provided landing pages and pasted ad copy for offer, CTA, proof, and CRO gaps.

Recommended flow from this actor: run the current extraction/check first, export the useful dataset or copy the relevant URLs, then choose `entry`, `premium`, or `bundle` in the report actor with `maxChargeUsd` as the safety cap.

### Related report Actors

Use these follow-on Actors when you want a capped, decision-ready report instead of more raw rows. They use public or user-provided inputs, respect `maxChargeUsd`, and do not promise rankings, revenue, conversion lifts, or sales outcomes.

- [SaaS Pricing Page Monitor](https://apify.com/taroyamada/saas-pricing-page-change-monitor) - watch public competitor pricing and packaging pages after store research.
- [Ad Landing Page Offer Intelligence](https://apify.com/taroyamada/ad-landing-page-offer-intelligence) - audit public landing pages for offer, proof, CTA, and friction.
- [CSV Local Business List Scoring](https://apify.com/taroyamada/csv-local-business-list-scoring-report) - score exported or user-provided business lists before cleanup.

### Related paid report workflows

If this Actor gave you raw rows or source context, these follow-on report Actors are designed for a small capped paid run. They help make a decision, not just collect more data.

- [SaaS Pricing Page Monitor & Competitor Price Change Alerts](https://apify.com/taroyamada/saas-pricing-page-change-monitor) - decide whether a public competitor pricing page changed in a way that affects packaging or sales messaging. Entry $3 / `pricing_snapshot_report`; premium $15 / `competitor_pricing_report`.
- [Ad Landing Page Offer Intelligence & CRO Gap Report](https://apify.com/taroyamada/ad-landing-page-offer-intelligence) - decide which public landing-page offer gaps to fix before increasing ad spend. Entry $3 / `landing_offer_report`; premium $15 / `cro_gap_report_pack`.
- [CSV Local Business List Scoring & SEO Gap Report](https://apify.com/taroyamada/csv-local-business-list-scoring-report) - prioritize which businesses in a list deserve outreach, cleanup, or SEO follow-up. Entry $3 / `lead_scoring_report`; premium $15 / `agency_lead_gap_report`.

Set `maxChargeUsd` equal to the price of the selected tier ($3 for entry, $15 for premium).

# Actor input Schema

## `storeUrls` (type: `array`):

One or more known storefront URLs to inspect. Custom domains and \*.myshopify.com domains both work. This actor does not search for stores; maximum 50 URLs per run.

## `productSampleLimit` (type: `integer`):

Maximum number of public products to fetch per store from /products.json. productCount, priceRange, productsSample, and merch rollups are based on this sample, not the full catalog.

## `collectionSampleLimit` (type: `integer`):

Maximum number of public collections to fetch per store from /collections.json.

## `includeContentMetadata` (type: `boolean`):

Attempt lightweight metadata fetches from /pages.json and /blogs.json when public storefront endpoints expose them. Leave off for the fastest analyst baseline.

## `contentSampleLimit` (type: `integer`):

Maximum public pages or blogs to sample per store when includeContentMetadata is enabled.

## `timeoutMs` (type: `integer`):

Per-request timeout in milliseconds. Timeout rows are no-charge diagnostics when useful storefront data cannot be captured.

## `delivery` (type: `string`):

Start with dataset for analyst inspection. Webhook writes dataset rows first, then posts the same meta/results payload to your endpoint.

## `webhookUrl` (type: `string`):

Required only when delivery is webhook. Leave empty for dataset-first Store quickstarts.

## `dryRun` (type: `boolean`):

If true, skips dataset writes and webhook delivery while still writing local output/result.json. Use for development shape checks only.

## `maxChargeUsd` (type: `number`):

Hard safety cap for custom PPE result events in this run. Rows that would exceed the cap are emitted as no-charge limit\_reached diagnostics.
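
Choosing a cap is simple arithmetic: the number of charged rows that fit under `maxChargeUsd` is the cap divided by the per-event price, rounded down. The prices below are placeholders, not this actor's real prices (check its Store pricing page), and the sketch assumes a single price for all charged events; with mixed event prices, budget for the most expensive one. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR


def charged_rows_within_cap(max_charge_usd, price_per_event_usd):
    """How many charged rows fit under the cap at one per-event price (rounded down)."""
    cap = Decimal(str(max_charge_usd))
    price = Decimal(str(price_per_event_usd))
    if price <= 0:
        raise ValueError("price_per_event_usd must be positive")
    return int((cap / price).to_integral_value(rounding=ROUND_FLOOR))


def cap_for_rows(row_count, price_per_event_usd):
    """Smallest cap that covers row_count charged rows at one per-event price."""
    return Decimal(str(price_per_event_usd)) * row_count


print(charged_rows_within_cap(2.5, 0.003))  # placeholder price
print(cap_for_rows(50, 0.01))               # placeholder price
```

`Decimal` avoids binary floating-point surprises when dividing currency amounts; rows beyond the computed count would come back as no-charge `limit_reached` diagnostics.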

## Actor input object example

```json
{
  "storeUrls": [
    "https://colourpop.com",
    "https://allbirds.com"
  ],
  "productSampleLimit": 25,
  "collectionSampleLimit": 25,
  "includeContentMetadata": false,
  "contentSampleLimit": 10,
  "timeoutMs": 15000,
  "delivery": "dataset",
  "dryRun": false,
  "maxChargeUsd": 2.5
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "storeUrls": [
        "https://colourpop.com",
        "https://allbirds.com"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("taroyamada/shopify-store-intelligence").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "storeUrls": [
        "https://colourpop.com",
        "https://allbirds.com",
    ]
}

# Run the Actor and wait for it to finish
run = client.actor("taroyamada/shopify-store-intelligence").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "storeUrls": [
    "https://colourpop.com",
    "https://allbirds.com"
  ]
}' |
apify call taroyamada/shopify-store-intelligence --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=taroyamada/shopify-store-intelligence",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Shopify Store Scraper | Metadata & Catalog Extractor",
        "description": "Shopify store scraper that pulls public storefront metadata, product catalogs, collections, and vendor data directly from JSON endpoints. No browser, no auth. Returns structured tables ready for competitive catalog research.",
        "version": "0.1",
        "x-build-id": "nE5iEeZeOektAKJo8"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/taroyamada~shopify-store-intelligence/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-taroyamada-shopify-store-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/taroyamada~shopify-store-intelligence/runs": {
            "post": {
                "operationId": "runs-sync-taroyamada-shopify-store-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/taroyamada~shopify-store-intelligence/run-sync": {
            "post": {
                "operationId": "run-sync-taroyamada-shopify-store-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "storeUrls"
                ],
                "properties": {
                    "storeUrls": {
                        "title": "Known Storefront URLs",
                        "minItems": 1,
                        "maxItems": 50,
                        "type": "array",
                        "description": "One or more known storefront URLs to inspect. Custom domains and *.myshopify.com domains both work. This Actor does not search for stores; at most 50 URLs per run.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "productSampleLimit": {
                        "title": "Product Sample Limit",
                        "minimum": 1,
                        "maximum": 250,
                        "type": "integer",
                        "description": "Maximum number of public products to fetch per store from /products.json. ProductCount, priceRange, productsSample, and merch rollups are based on this sample, not the full catalog.",
                        "default": 25
                    },
                    "collectionSampleLimit": {
                        "title": "Collection Sample Limit",
                        "minimum": 1,
                        "maximum": 250,
                        "type": "integer",
                        "description": "Maximum number of public collections to fetch per store from /collections.json.",
                        "default": 25
                    },
                    "includeContentMetadata": {
                        "title": "Include Pages / Blogs Metadata",
                        "type": "boolean",
                        "description": "Attempt lightweight metadata fetches from /pages.json and /blogs.json when public storefront endpoints expose them. Leave off for the fastest analyst baseline.",
                        "default": false
                    },
                    "contentSampleLimit": {
                        "title": "Content Sample Limit",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum public pages or blogs to sample per store when includeContentMetadata is enabled.",
                        "default": 10
                    },
                    "timeoutMs": {
                        "title": "Timeout (ms)",
                        "minimum": 1000,
                        "maximum": 30000,
                        "type": "integer",
                        "description": "Per-request timeout in milliseconds. Timeout rows are no-charge diagnostics when useful storefront data cannot be captured.",
                        "default": 15000
                    },
                    "delivery": {
                        "title": "Delivery Mode",
                        "enum": [
                            "dataset",
                            "webhook"
                        ],
                        "type": "string",
                        "description": "Start with dataset for analyst inspection. Webhook writes dataset rows first, then posts the same meta/results payload to your endpoint.",
                        "default": "dataset"
                    },
                    "webhookUrl": {
                        "title": "Webhook URL",
                        "type": "string",
                        "description": "Required only when delivery is webhook. Leave empty for dataset-first Store quickstarts."
                    },
                    "dryRun": {
                        "title": "Dry Run",
                        "type": "boolean",
                        "description": "If true, skips dataset writes and webhook delivery while still writing local output/result.json. Use for development shape checks only.",
                        "default": false
                    },
                    "maxChargeUsd": {
                        "title": "Max Charge USD",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "number",
                        "description": "Hard safety cap for custom PPE result events in this run. Rows that would exceed the cap are emitted as no-charge limit_reached diagnostics.",
                        "default": 2.5
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
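
The `inputSchema` above sets hard bounds: 1 to 50 `storeUrls`, sample limits of 1 to 250 (1 to 100 for content), a timeout of 1,000 to 30,000 ms, a `maxChargeUsd` cap of 0 to 100, and a `webhookUrl` that is required only when `delivery` is `webhook`. The Python sketch below shows one way to pre-check a payload against those bounds and build the POST request for the `/runs` or `/run-sync` path before spending a run. It assumes the standard `https://api.apify.com/v2` base URL and the `token` query parameter shown in the spec. `validate_input` and `build_run_request` are hypothetical helper names, and the Actor's own validation remains authoritative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: standard Apify API v2 base URL; Actor ID taken from the paths above
API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "taroyamada~shopify-store-intelligence"

# (min, max) bounds copied from inputSchema
INT_BOUNDS = {
    "productSampleLimit": (1, 250),
    "collectionSampleLimit": (1, 250),
    "contentSampleLimit": (1, 100),
    "timeoutMs": (1000, 30000),
}


def validate_input(payload: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means the checks passed)."""
    errors = []
    urls = payload.get("storeUrls")
    if not isinstance(urls, list) or not 1 <= len(urls) <= 50:
        errors.append("storeUrls must be a list of 1-50 URLs")
    elif not all(isinstance(u, str) for u in urls):
        errors.append("storeUrls items must be strings")
    for key, (lo, hi) in INT_BOUNDS.items():
        if key in payload:
            value = payload[key]
            # bool is a subclass of int in Python, so reject it explicitly
            if not isinstance(value, int) or isinstance(value, bool) or not lo <= value <= hi:
                errors.append(f"{key} must be an integer in [{lo}, {hi}]")
    delivery = payload.get("delivery", "dataset")
    if delivery not in ("dataset", "webhook"):
        errors.append("delivery must be 'dataset' or 'webhook'")
    if delivery == "webhook" and not payload.get("webhookUrl"):
        errors.append("webhookUrl is required when delivery is 'webhook'")
    cap = payload.get("maxChargeUsd")
    if cap is not None and (
        not isinstance(cap, (int, float)) or isinstance(cap, bool) or not 0 <= cap <= 100
    ):
        errors.append("maxChargeUsd must be a number in [0, 100]")
    return errors


def build_run_request(payload: dict, token: str, wait: bool = False) -> Request:
    """Hypothetical helper: build (but do not send) a POST to /runs, or /run-sync if wait=True."""
    problems = validate_input(payload)
    if problems:
        raise ValueError("; ".join(problems))
    path = "run-sync" if wait else "runs"
    url = f"{API_BASE}/acts/{ACTOR_ID}/{path}?{urlencode({'token': token})}"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would start the run. With `wait=True` the request targets `/run-sync`, which per the spec above waits for completion and returns the `OUTPUT` record from the key-value store instead of the run object.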

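For the asynchronous `/runs` path, the response follows `runsResponseSchema`, and the fields a caller usually needs are `data.id`, `data.status`, and `data.defaultDatasetId`. The following is a minimal Python sketch, assuming Apify's standard `GET /v2/actor-runs/{runId}` endpoint for polling and `GET /v2/datasets/{datasetId}/items` for downloading the rows the Actor wrote. `RunInfo` and `parse_run_response` are hypothetical names. A run is treated as finished once its status is one of Apify's terminal states (`SUCCEEDED`, `FAILED`, `TIMED-OUT`, `ABORTED`); `READY`, as in the schema example, means it has not finished yet.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode

# Assumption: standard Apify API v2 base URL
API_BASE = "https://api.apify.com/v2"

# Apify run statuses after which the run no longer changes
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


@dataclass
class RunInfo:
    """Hypothetical container for the runsResponseSchema fields used below."""
    run_id: str
    status: str
    dataset_id: str

    @property
    def finished(self) -> bool:
        return self.status in TERMINAL_STATUSES

    def run_url(self, token: str) -> str:
        # GET this URL to poll the run's current status
        return f"{API_BASE}/actor-runs/{self.run_id}?{urlencode({'token': token})}"

    def dataset_items_url(self, token: str) -> str:
        # GET this URL to download the rows from the run's default dataset
        query = urlencode({"token": token, "format": "json"})
        return f"{API_BASE}/datasets/{self.dataset_id}/items?{query}"


def parse_run_response(body: str) -> RunInfo:
    """Extract id, status, and defaultDatasetId from a runsResponseSchema-shaped body."""
    data = json.loads(body)["data"]
    return RunInfo(
        run_id=data["id"],
        status=data["status"],
        dataset_id=data["defaultDatasetId"],
    )
```

Polling `run_url` until `finished` is true, then fetching `dataset_items_url`, keeps the run-start call short; `/run-sync` trades that for a single blocking request.
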