# WCC Pinecone Integration (`tri_angle/wcc-pinecone-integration`) Actor

Crawl any website and store its content in your Pinecone vector database. Enhance the accuracy and reliability of your own AI Assistant with facts fetched from external sources or connect this integration to our Pinecone GPT Chatbot assistant available in Apify Store.

- **URL**: https://apify.com/tri\_angle/wcc-pinecone-integration.md
- **Developed by:** [Tri⟁angle](https://apify.com/tri_angle) (Apify)
- **Categories:** Automation, Integrations, AI
- **Stats:** 170 total users, 1 monthly users, 0.0% runs succeeded, 6 bookmarks
- **User rating**: 3.77 out of 5 stars

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

This Actor integrates the [Website Content Crawler](https://apify.com/apify/website-content-crawler) (WCC) with the Pinecone vector database. Its main goal is to scrape a specific website and store the scraped text in a Pinecone database in the form of embeddings. The Actor covers a common web data management use case and supports LLM RAG workflows out of the box. Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources (in our case, the user's vector database).

Additionally, you can connect your Pinecone database with OpenAI's GPT model using our [Pinecone GPT Chatbot](https://apify.com/tri_angle/pinecone-gpt-chatbot). This Actor provides you with an interactive chatbot application similar to the well-known ChatGPT. You can ask questions as if you were chatting with GPT, but thanks to the integration with the Pinecone vector database, the model has a richer and more up-to-date knowledge base.

### How it works

1. The Actor triggers WCC to crawl the website specified in the input (`url`).
2. When WCC finishes, the scraped text is encoded using [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings) and stored in the Pinecone database.
    - To save resources, the Actor makes sure that only new and updated pages are encoded and stored in Pinecone.

### How to use it

To run the Actor successfully, you need to provide the following fields:
- Website URL
- [OpenAI API key](https://platform.openai.com/account/api-keys) (required)
- [Pinecone API key](https://docs.pinecone.io/guides/projects/understanding-projects#api-keys) (required)
- Pinecone index name (required): the name of your Pinecone index (the Actor will create a new one if it doesn't exist). If you're using an existing index, make sure its dimension is set to `1536`; otherwise, the Actor will fail.

Other fields to tweak the Actor's settings:
- You can adjust WCC's settings in the **`Website Content Crawler settings`** and **`HTML processing`** sections.
- Document (text) processing can be configured in **`Document chunk settings`**.
- Use `Vector database query` to get relevant documents from the database.
    - Additionally, use the `No website crawling ...` flag to disable scraping and only query the database.
### Input example

```json
{
    "url": "https://apify.com/change-log/performance-api-updates-adaptive-playwright-crawler",
    "openaiApiKey": "YOUR_OPENAI_KEY",
    "pineconeApiKey": "YOUR_PINECONE_KEY",
    "cacheKeyValueStoreName": "website-content-vector-cache",
    "noCrawling": false,
    "pineconeIndexName": "your-pinecone-index-name",
    "query": "What is an Adaptive Playwright Crawler and how can I use it to crawl apify.com website? Include TypeScript code example demonstrating the usage of this adaptive crawler.",
    "chunkSize": 2000,
    "chunkOverlap": 200,
    "maxCrawlPages": 1,
    "maxCrawlDepth": 0
}
````
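To query an index that has already been populated, the same input can be reduced to the query-related fields with `noCrawling` enabled. The following is a minimal sketch in Python that only builds the input object; the helper name `build_query_only_input` is hypothetical, the field names are taken from the input example above, and `url` is kept because this README does not say whether it may be omitted in query-only mode.

```python
import json


def build_query_only_input(url, query, index_name, openai_key, pinecone_key, top_k=10):
    """Build an Actor input that skips crawling and only queries Pinecone.

    Hypothetical helper: it only assembles a dict using the documented field names.
    """
    return {
        "url": url,
        "openaiApiKey": openai_key,
        "pineconeApiKey": pinecone_key,
        "pineconeIndexName": index_name,
        "query": query,
        "noCrawling": True,  # disable scraping, only search the vector database
        "topKResults": top_k,  # number of documents to return
    }


if __name__ == "__main__":
    actor_input = build_query_only_input(
        url="https://apify.com/pricing",
        query="How does Apify pay as you go work?",
        index_name="your-pinecone-index-name",
        openai_key="YOUR_OPENAI_KEY",
        pinecone_key="YOUR_PINECONE_KEY",
    )
    print(json.dumps(actor_input, indent=4))
```

The resulting object can be passed as the run input in any of the client examples in the API section.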

### Output example

If you provide `query` in the input, the Actor outputs the documents from the Pinecone database that are relevant to your query, sorted from most to least relevant by their `score` value. Note that the following example merges data from 3 different runs and their corresponding start URLs:

- https://apify.com/change-log/performance-api-updates-adaptive-playwright-crawler
- https://apify.com/about
- https://apify.com/pricing

```json
[
  {
    "id": "0b2d4817d2698f166ed02d90d74e6156ffd8ee593ba5ef7cef287ce6deff900f",
    "score": 0.850724638,
    "values": [],
    "metadata": {
      "text": "As part of our continuous performance improvement initiative, we're happy to announce that we successfully improved the Apify API response time by 50% on average and the 90th-percentile startup time of Actors by about 20%. We will continue improving Apify in this direction.\nAPI updates\nUser limits endpoint now returns maxConcurrentActorJobs and activeActorJobCount properties enabling users to keep an eye on the concurrency limit.\nWe also added the missing endpoint /actor-builds/:build-id/log, allowing you to quickly access the log of certain builds without a need for an Actor run ID.\nAdaptive Playwright Crawler\nTry out Crawlee's new AdaptivePlaywrightCrawler class abstraction, which is an extension of PlaywrightCrawler that uses a more limited request handler interface so that it's able to switch to HTTP-only crawling when it detects that it may be possible. This way, you can achieve lower costs when crawling multiple websites.\n1const crawler = new AdaptivePlaywrightCrawler({ 2 renderingTypeDetectionRatio: 0.1, 3 async requestHandler({ querySelector, pushData, enqueueLinks, request, log }) { 4 // This function is called to extract data from a single web page 5 const $prices = await querySelector('span.price') 6 7 await pushData({ 8 url: request.url, 9 price: $prices.filter(':contains(\"$\")').first().text(), 10 }) 11 12 await enqueueLinks({ selector: '.pagination a' }) 13 }, 14}); 15 16await crawler.run([ 17 'http://www.example.com/page-1', 18 'http://www.example.com/page-2', 19]);",
      "url": "https://apify.com/change-log/performance-api-updates-adaptive-playwright-crawler"
    }
  },
  {
    "id": "5a8e1b8a36076db902e9d7c5063ae6eee6e6e5833358d13982a377c480e7b87c",
    "score": 0.818883061,
    "values": [],
    "metadata": {
      "text": "Founded in 2015\nApify was launched by Jan Čurn and Jakub Balada in 2015 from the Y Combinator Fellowship in Mountain View, California. The original idea was to make it easy for developers to build flexible and scalable web crawlers simply using front-end JavaScript, thanks to the back-then new headless browser technology.\nBuilt with ❤️ and 🍺 in Prague\nIn 2016, the team moved back to the Czech Republic, raised a seed investment, and started building a company around its product. Soon it became obvious that customers’ use cases need more than a simple JavaScript crawler, so we committed to building the most flexible full-stack platform for web scraping and browser automation.\nOur mission\nWe make the web more programmable, to let people automate mundane tasks on the web and spend their time on things that matter. We strive to keep the web open as a public good and a basic right for everyone, regardless of the way you want to use it, as its creators intended.\n2,500+Customers worldwide\n4 B+Web pages crawled monthly\n1,600+Ready-made Actors in Store\nBrand resources",
      "url": "https://apify.com/about"
    }
  },
  {
    "id": "c73be5748c2d04783587eebac174c37c31864c1dc66872aa7b7161cb3a1ed8ec",
    "score": 0.801438034,
    "values": [],
    "metadata": {
      "text": "Start URLs\nhttps://crawlee.dev\nGlob patterns\nhttps://crawlee.dev/*/*\nResults of successful run:\nMonthly Actor rental fee\n-$0\nOverall platform usage*\n-$0.036\n*Usage can differ for every run. The example above uses default settings for each Actor.\nPlatform usage breakdown:\nActor compute units\n$0.035\nRegistered users can check their daily usage chart in Apify Console \nWhat is the prepaid platform usage, and how much do I need?\nThe Apify platform has a number of services that are charged based on usages, such as Actors, proxies, data transfer, and storage. See pricing for the full list of platform services.",
      "url": "https://apify.com/pricing"
    }
  },
  {
    "id": "ce47020d694161e4397370e330e711f4b165c4ed63d74995ce5ef1786042631e",
    "score": 0.780740678,
    "values": [],
    "metadata": {
      "text": "The Apify platform has a number of services that are charged based on usages, such as Actors, proxies, data transfer, and storage. See pricing for the full list of platform services. \nEach subscription plan comes with a certain amount of prepaid platform usage that is used to pay for services. If your platform usage in a given billing cycle exceeds this prepaid amount, the excess usage will be added to your next invoice, and you'll get a notification. If you're on the free plan, your access to Apify's services will be blocked until the beginning of the next monthly cycle. \nNote that unused usage credits are not rolled over to the next billing cycle, and they expire at the end of the billing cycle. \nCan I try Apify for free?",
      "url": "https://apify.com/pricing"
    }
  },
  {
    "id": "fb0eeed4c9f6c222a6a629b07a4db428c89a0bb6484ff8e45f7275abf9702aab",
    "score": 0.777982414,
    "values": [],
    "metadata": {
      "text": "How does Apify pay as you go work?\nIf you're on one of Apify's paid plans, you can continue using the platform after reaching the limit by paying the rest as overage. That means you don't have to change your pricing plan to exceed the usage limit of your current plan. \nDoes Apify offer any discounts for charities and universities?\nApify offers a discount on its paid plans to students of accredited educational institutions. Students of those institutions are eligible for 30% off Starter, and Scale plans. If you have any questions, contact us. \nI would like to develop Actors. What should I do?\nApify Academy is a free course that shows you how to start developing Actors on the Apify platform. You can also find more information in the Apify documentation. \nAny other questions? Please contact us.",
      "url": "https://apify.com/pricing"
    }
  },
  {
    "id": "623beea25a535a6fa62f0db309ef7dd5cf6df5e4b1a1c52a7c1de0615072fcda",
    "score": 0.776461363,
    "values": [],
    "metadata": {
      "text": "Increased Actor RAM\n$2 / GB\nDatacenter proxy\nfrom $0.6 / IP address\nPersonal tech training\n$200 / hour\nPriority chat\n$100\nDo you want to build your own Actors?\nHere is a special offer for you: our Creator Plan! For just $1 per month, enjoy $500 worth of free usage and other benefits for 6 months, but please note that you will have access to only some Apify Store Actors.\nHow pricing works\nApify's pricing is all about how you use the platform - here's a breakdown for a typical $49/month plan as an example.\nMonthly prepaid usage $49 + pay as you go\nActor rentals\n$49\nSet your limit\nYour limit\nPay as you go\nMonthly Actor rental fee\n1st Actor usage*\n2nd Actor usage*\n*Each Actor run is different. The above pricing breakdown is just an example.\nRegistered users can check their daily usage chart in Apify Console \nStart URLs\nhttps://crawlee.dev\nGlob patterns\nhttps://crawlee.dev/*/*\nResults of successful run:\nMonthly Actor rental fee\n-$0\nOverall platform usage*\n-$0.036",
      "url": "https://apify.com/pricing"
    }
  }
]
```
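Because each result carries its source page in `metadata.url`, the output is easy to post-process. The sketch below, in Python, groups the returned text chunks by URL and drops weak matches; `group_by_url` and the `min_score` threshold are hypothetical names chosen for illustration, and the item shape is assumed to match the example above.

```python
from collections import defaultdict


def group_by_url(items, min_score=0.0):
    """Group result texts by source URL, best match first, skipping low scores."""
    grouped = defaultdict(list)
    # Sort explicitly so the helper does not rely on the order of the input list
    for item in sorted(items, key=lambda i: i["score"], reverse=True):
        if item["score"] >= min_score:
            grouped[item["metadata"]["url"]].append(item["metadata"]["text"])
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        {"id": "a", "score": 0.85, "values": [], "metadata": {"text": "Changelog", "url": "https://apify.com/change-log"}},
        {"id": "b", "score": 0.78, "values": [], "metadata": {"text": "Pricing FAQ", "url": "https://apify.com/pricing"}},
        {"id": "c", "score": 0.80, "values": [], "metadata": {"text": "Pricing table", "url": "https://apify.com/pricing"}},
    ]
    for url, texts in group_by_url(sample, min_score=0.79).items():
        print(url, texts)
```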

# Actor input Schema

## `url` (type: `string`):

The URL of the website to fetch web pages from. The URL can be a top-level domain like https://example.com, a subdirectory https://example.com/some-directory/, or a specific page https://example.com/some-directory/page.html.

## `query` (type: `string`):

Text query used to search for relevant documents in the vector database using similarity search. The query is converted into an embedding vector using the OpenAI embedding function and compared to the vectors of the documents stored in the vector database.
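To illustrate what comparing a query vector to stored vectors means, here is a small pure-Python sketch of a similarity search. It assumes cosine similarity, which this README does not confirm as the metric of the index, and it uses tiny hand-written vectors instead of real 1536-dimensional OpenAI embeddings; in the Actor itself the search is performed by Pinecone. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


if __name__ == "__main__":
    docs = {"pricing": [1.0, 0.0], "about": [0.0, 1.0], "changelog": [1.0, 1.0]}
    print(top_k([1.0, 0.0], docs, k=2))
```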

## `noCrawling` (type: `boolean`):

If enabled, the crawler will not be started and the Actor will only search the vector database for the given query.

## `openaiApiKey` (type: `string`):

OpenAI API key used to generate vector embeddings for the documents stored in the vector database and for the database query.

## `pineconeApiKey` (type: `string`):

Your Pinecone API key.

## `pineconeIndexName` (type: `string`):

The name of the Pinecone index where you want to store the vectors.

## `topKResults` (type: `integer`):

The number of top results to return from the vector database. The results will be sorted by similarity to the query vector.

## `cacheKeyValueStoreName` (type: `string`):

The name of the key-value store where the Actor will cache URLs of the fetched websites. If the website is already being crawled, the Actor will be aborted.

## `maxResults` (type: `integer`):

The maximum number of resulting web pages to store. The crawler will automatically finish after reaching this number. This setting is useful to prevent accidental crawler runaway. If both **Max pages** and **Max results** are defined, then the crawler will finish when the first limit is reached. Note that the crawler skips pages with the canonical URL of a page that has already been crawled, hence it might crawl more pages than there are results.

## `chunkSize` (type: `integer`):

The maximum size of each chunk in characters.

## `chunkOverlap` (type: `integer`):

The number of overlapping characters between consecutive chunks.
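The interaction between `chunkSize` and `chunkOverlap` can be sketched with a plain character-window splitter. This is an illustration only, assuming fixed-size windows; the Actor's actual splitter is not described here and may, for example, prefer paragraph or sentence boundaries. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=200):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With the values from the input example (`2000` and `200`), each new chunk starts 1800 characters after the previous one in this sketch.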

## `crawlerType` (type: `string`):

Select the crawling engine:

- **Headless web browser** - Useful for modern websites with anti-scraping protections and JavaScript rendering. It recognizes common blocking patterns like CAPTCHAs and automatically retries blocked requests through new sessions. However, running web browsers is more expensive as it requires more computing resources and is slower. It is recommended to use at least 8 GB of RAM.
- **Stealthy web browser** (default) - Another headless web browser with anti-blocking measures enabled. Try this if you encounter bot protection while scraping. For best performance, use with Apify Proxy residential IPs.
- **Adaptive switching between Chrome and raw HTTP client** - The crawler automatically switches between raw HTTP for static pages and Chrome browser (via Playwright) for dynamic pages, to get the maximum performance wherever possible.
- **Raw HTTP client** - High-performance crawling mode that uses raw HTTP requests to fetch the pages. It is faster and cheaper, but it might not work on all websites.

## `includeUrlGlobs` (type: `array`):

Glob patterns matching URLs of pages that will be included in crawling.

Setting this option will disable the default Start URLs based scoping and will allow you to customize the crawling scope yourself. Note that this affects only links found on pages, but not **Start URLs** - if you want to crawl a page, make sure to specify its URL in the **Start URLs** field.

For example, `https://{store,docs}.example.com/**` lets the crawler access all URLs starting with `https://store.example.com/` or `https://docs.example.com/`, and `https://example.com/**/*\?*foo=*` allows the crawler to access all URLs that contain the `foo` query parameter with any value.

Learn more about globs and test them [here](https://www.digitalocean.com/community/tools/glob?comments=true\&glob=https%3A%2F%2Fexample.com%2Fscrape_this%2F%2A%2A\&matches=false\&tests=https%3A%2F%2Fexample.com%2Ftools%2F\&tests=https%3A%2F%2Fexample.com%2Fscrape_this%2F\&tests=https%3A%2F%2Fexample.com%2Fscrape_this%2F123%3Ftest%3Dabc\&tests=https%3A%2F%2Fexample.com%2Fdont_scrape_this).

## `excludeUrlGlobs` (type: `array`):

Glob patterns matching URLs of pages that will be excluded from crawling. Note that this affects only links found on pages, but not **Start URLs**, which are always crawled.

For example, `https://{store,docs}.example.com/**` excludes all URLs starting with `https://store.example.com/` or `https://docs.example.com/`, and `https://example.com/**/*\?*foo=*` excludes all URLs that contain the `foo` query parameter with any value.

Learn more about globs and test them [here](https://www.digitalocean.com/community/tools/glob?comments=true\&glob=https%3A%2F%2Fexample.com%2Fdont_scrape_this%2F%2A%2A\&matches=false\&tests=https%3A%2F%2Fexample.com%2Ftools%2F\&tests=https%3A%2F%2Fexample.com%2Fdont_scrape_this%2F\&tests=https%3A%2F%2Fexample.com%2Fdont_scrape_this%2F123%3Ftest%3Dabc\&tests=https%3A%2F%2Fexample.com%2Fscrape_this).

## `ignoreCanonicalUrl` (type: `boolean`):

If enabled, the Actor will ignore the canonical URL reported by the page, and use the actual URL instead. You can use this feature for websites that report invalid canonical URLs, which causes the Actor to skip those pages in results.

## `maxCrawlDepth` (type: `integer`):

The maximum number of links starting from the start URL that the crawler will recursively follow. The start URLs have depth `0`, the pages linked directly from the start URLs have depth `1`, and so on.

This setting is useful to prevent accidental crawler runaway. By setting it to `0`, the Actor will only crawl the Start URLs.
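The depth rule can be illustrated with a breadth-first traversal over an in-memory link graph. This is a sketch of the depth semantics described above, not the crawler's implementation; the function `crawl_order` and the `links` mapping are hypothetical stand-ins for real page fetching and link extraction.

```python
from collections import deque


def crawl_order(links, start_urls, max_depth):
    """Return (url, depth) pairs visited by a breadth-first crawl limited to max_depth.

    links maps each URL to the list of URLs found on that page.
    """
    start = list(dict.fromkeys(start_urls))  # drop duplicate start URLs, keep order
    seen = set(start)
    queue = deque((url, 0) for url in start)  # start URLs have depth 0
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links from pages at the maximum depth
        for linked in links.get(url, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return visited


if __name__ == "__main__":
    site = {"/": ["/docs", "/pricing"], "/docs": ["/docs/api"], "/pricing": [], "/docs/api": []}
    print(crawl_order(site, ["/"], max_depth=1))
```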

## `maxCrawlPages` (type: `integer`):

The maximum number of pages to crawl. It includes the start URLs, pagination pages, pages with no content, etc. The crawler will automatically finish after reaching this number. This setting is useful to prevent accidental crawler runaway.

## `initialConcurrency` (type: `integer`):

The initial number of web browsers or HTTP clients running in parallel. The system scales the concurrency up and down based on the current CPU and memory load. If the value is set to 0 (default), the Actor uses the default setting for the specific crawler type.

Note that if you set this value too high, the Actor will run out of memory and crash. If too low, it will be slow at start before it scales the concurrency up.

## `maxConcurrency` (type: `integer`):

The maximum number of web browsers or HTTP clients running in parallel. This setting is useful to avoid overloading the target websites and to avoid getting blocked.

## `initialCookies` (type: `array`):

Cookies that will be pre-set to all pages the scraper opens. This is useful for pages that require login. The value is expected to be a JSON array of objects with `name` and `value` properties. For example: `[{"name": "cookieName", "value": "cookieValue"}]`.

You can use the [EditThisCookie](https://chrome.google.com/webstore/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg) browser extension to copy browser cookies in this format, and paste it here.
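If you only have a raw `Cookie` header string (for example, copied from the browser's developer tools), it can be converted into the expected array of `name`/`value` objects. A minimal Python sketch; the helper `cookie_header_to_initial_cookies` is hypothetical and it emits only the `name` and `value` properties mentioned above.

```python
import json


def cookie_header_to_initial_cookies(header):
    """Convert 'a=1; b=2' into [{'name': 'a', 'value': '1'}, {'name': 'b', 'value': '2'}]."""
    cookies = []
    for part in header.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty or malformed fragments
        name, value = part.split("=", 1)  # the value itself may contain '='
        cookies.append({"name": name.strip(), "value": value.strip()})
    return cookies


if __name__ == "__main__":
    print(json.dumps(cookie_header_to_initial_cookies("sessionId=abc123; theme=dark")))
```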

## `proxyConfiguration` (type: `object`):

Enables loading websites from IP addresses in specific geographies and helps circumvent blocking.

## `maxSessionRotations` (type: `integer`):

The maximum number of times the crawler will rotate the session (IP address + browser configuration) on anti-scraping measures like CAPTCHAs. If the crawler rotates the session more than this number and the page is still blocked, it will finish with an error.

## `maxRequestRetries` (type: `integer`):

The maximum number of times the crawler will retry the request on network, proxy or server errors. If the (n+1)-th request still fails, the crawler will mark this request as failed.
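In other words, a value of `n` allows up to `n + 1` attempts in total: the initial request plus `n` retries. The Python sketch below illustrates only this counting rule; `fetch_with_retries` and its `fetch` callable are hypothetical and unrelated to the crawler's internals.

```python
def fetch_with_retries(fetch, max_request_retries):
    """Call fetch() until it succeeds or the retry budget is used up.

    Returns (result, attempts). Raises RuntimeError after
    max_request_retries + 1 failed attempts.
    """
    attempts = 0
    last_error = None
    for _ in range(max_request_retries + 1):  # initial attempt + retries
        attempts += 1
        try:
            return fetch(), attempts
        except Exception as error:  # treat any failure as retryable in this sketch
            last_error = error
    raise RuntimeError(f"request failed after {attempts} attempts") from last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary network error")
        return "page content"

    print(fetch_with_retries(flaky, max_request_retries=5))
```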

## `requestTimeoutSecs` (type: `integer`):

Timeout (in seconds) for making the request and processing its response. Defaults to 60s.

## `minFileDownloadSpeedKBps` (type: `integer`):

The minimum viable file download speed in kilobytes per second. If the file download speed is lower than this value for a prolonged duration, the crawler will consider the file download as failing, abort it, and retry it (up to "Maximum number of retries" times). This is useful to avoid your crawls being stuck on slow file downloads.

## `dynamicContentWaitSecs` (type: `integer`):

The maximum time to wait for dynamic page content to load. By default, it is 10 seconds. The crawler will continue either if this time elapses, or if it detects the network became idle as there are no more requests for additional resources.

Note that this setting is ignored for the raw HTTP client, because it doesn't execute JavaScript or load any dynamic resources.

## `maxScrollHeightPixels` (type: `integer`):

The crawler will scroll down the page until all content is loaded (and network becomes idle), or until this maximum scrolling height is reached. Setting this value to `0` disables scrolling altogether.

Note that this setting is ignored for the raw HTTP client, because it doesn't execute JavaScript or load any dynamic resources.

## `removeElementsCssSelector` (type: `string`):

A CSS selector matching HTML elements that will be removed from the DOM, before converting it to text, Markdown, or saving as HTML. This is useful to skip irrelevant page content.

By default, the Actor removes common navigation elements, headers, footers, modals, scripts, and inline images. You can disable the removal by setting this value to some non-existent CSS selector like `dummy_keep_everything`.

## `removeCookieWarnings` (type: `boolean`):

If enabled, the Actor will try to remove cookie consent dialogs or modals, using the [I don't care about cookies](https://addons.mozilla.org/en-US/firefox/addon/i-dont-care-about-cookies/) browser extension, to improve the accuracy of the extracted text. Note that there is a small performance penalty if this feature is enabled.

This setting is ignored when using the raw HTTP crawler type.

## `clickElementsCssSelector` (type: `string`):

A CSS selector matching DOM elements that will be clicked. This is useful for expanding collapsed sections, in order to capture their text content.

## `htmlTransformer` (type: `string`):

Specify how to transform the HTML to extract meaningful content without any extra fluff, like navigation or modals. The HTML transformation happens after removing and clicking the DOM elements.

- **Readable text with fallback** - Extracts the main contents of the webpage, without navigation and other fluff, while carefully checking the content integrity.

- **Readable text** (default) - Extracts the main contents of the webpage, without navigation and other fluff.

- **Extractus** - Uses Extractus library.

- **None** - Only removes the HTML elements specified via 'Remove HTML elements' option.

You can examine output of all transformers by enabling the debug mode.

## `readableTextCharThreshold` (type: `integer`):

A configuration option for the "Readable text" HTML transformer. It contains the minimum number of characters an article must have in order to be considered relevant.

## `aggressivePrune` (type: `boolean`):

This is an **experimental feature**. If enabled, the crawler will prune content lines that are very similar to the ones already crawled on other pages, using the Count-Min Sketch algorithm. This is useful to strip repeating content in the scraped data like menus, headers, footers, etc. In some (not very likely) cases, it might remove relevant content from some pages.
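For readers unfamiliar with the Count-Min Sketch, the following Python sketch shows the general idea: a small fixed-size table of counters that can overestimate, but never underestimate, how often an item has been seen. It is a simplified illustration, not the crawler's implementation. In particular, it matches normalized lines exactly rather than "very similar" ones, and the names `CountMinSketch` and `prune_repeated_lines` as well as the `width` and `depth` values are assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: depth hash rows of width counters each."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # Derive one column per row by salting the hash with the row number
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # The minimum over rows limits the effect of hash collisions
        return min(self.table[row][col] for row, col in self._cells(item))


def prune_repeated_lines(pages, width=1024, depth=4):
    """Drop lines that (probably) already appeared on an earlier page."""
    sketch = CountMinSketch(width, depth)
    pruned_pages = []
    for page in pages:
        lines = page.splitlines()
        kept = [
            line for line in lines
            if not line.strip() or sketch.estimate(line.strip().lower()) == 0
        ]
        # Record this page's lines only after filtering it, so that
        # repeats within a single page are not removed
        for key in {line.strip().lower() for line in lines if line.strip()}:
            sketch.add(key)
        pruned_pages.append("\n".join(kept))
    return pruned_pages


if __name__ == "__main__":
    print(prune_repeated_lines(["Menu\nHello", "Menu\nWorld"]))
```

Hash collisions can make the sketch report an unseen line as seen, which corresponds to the caveat above that relevant content might occasionally be removed.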

## `debugMode` (type: `boolean`):

If enabled, the Actor will store the output of all types of HTML transformers, including the ones that are not used by default, and it will also store the HTML to Key-value Store with a link. All this data is stored under the `debug` field in the resulting Dataset.

## `debugLog` (type: `boolean`):

If enabled, the Actor log will include debug messages. Beware that this can be quite verbose.

## Actor input object example

```json
{
  "url": "https://docs.apify.com/",
  "noCrawling": false,
  "topKResults": 10,
  "cacheKeyValueStoreName": "website-content-vector-cache",
  "maxResults": 9999999,
  "chunkSize": 2000,
  "chunkOverlap": 200,
  "crawlerType": "playwright:adaptive",
  "includeUrlGlobs": [],
  "excludeUrlGlobs": [],
  "ignoreCanonicalUrl": false,
  "maxCrawlDepth": 20,
  "maxCrawlPages": 9999999,
  "initialConcurrency": 0,
  "maxConcurrency": 200,
  "initialCookies": [],
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "maxSessionRotations": 10,
  "maxRequestRetries": 5,
  "requestTimeoutSecs": 60,
  "minFileDownloadSpeedKBps": 128,
  "dynamicContentWaitSecs": 10,
  "maxScrollHeightPixels": 5000,
  "removeElementsCssSelector": "nav, footer, script, style, noscript, svg,\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]",
  "removeCookieWarnings": true,
  "clickElementsCssSelector": "[aria-expanded=\"false\"]",
  "htmlTransformer": "readableText",
  "readableTextCharThreshold": 100,
  "aggressivePrune": false,
  "debugMode": false,
  "debugLog": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://docs.apify.com/",
    "crawlerType": "playwright:adaptive",
    "includeUrlGlobs": [],
    "excludeUrlGlobs": [],
    "initialCookies": [],
    "proxyConfiguration": {
        "useApifyProxy": true
    },
    "removeElementsCssSelector": `nav, footer, script, style, noscript, svg,
[role="alert"],
[role="banner"],
[role="dialog"],
[role="alertdialog"],
[role="region"][aria-label*="skip" i],
[aria-modal="true"]`,
    "clickElementsCssSelector": "[aria-expanded=\"false\"]"
};

// Run the Actor and wait for it to finish
const run = await client.actor("tri_angle/wcc-pinecone-integration").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://docs.apify.com/",
    "crawlerType": "playwright:adaptive",
    "includeUrlGlobs": [],
    "excludeUrlGlobs": [],
    "initialCookies": [],
    "proxyConfiguration": { "useApifyProxy": True },
    "removeElementsCssSelector": """nav, footer, script, style, noscript, svg,
[role=\"alert\"],
[role=\"banner\"],
[role=\"dialog\"],
[role=\"alertdialog\"],
[role=\"region\"][aria-label*=\"skip\" i],
[aria-modal=\"true\"]""",
    "clickElementsCssSelector": "[aria-expanded=\"false\"]",
}

# Run the Actor and wait for it to finish
run = client.actor("tri_angle/wcc-pinecone-integration").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
# A quoted heredoc passes the JSON through verbatim, so the escapes below
# do not depend on how the shell's echo handles backslashes
cat <<'EOF' |
{
  "url": "https://docs.apify.com/",
  "crawlerType": "playwright:adaptive",
  "includeUrlGlobs": [],
  "excludeUrlGlobs": [],
  "initialCookies": [],
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "removeElementsCssSelector": "nav, footer, script, style, noscript, svg,\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]",
  "clickElementsCssSelector": "[aria-expanded=\"false\"]"
}
EOF
apify call tri_angle/wcc-pinecone-integration --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=tri_angle/wcc-pinecone-integration",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "WCC Pinecone Integration",
        "description": "Crawl any website and store its content in your Pinecone vector database. Enhance the accuracy and reliability of your own AI Assistant with facts fetched from external sources or connect this integration to our Pinecone GPT Chatbot assistant available in Apify Store.",
        "version": "0.0",
        "x-build-id": "BB5ECWSEdgh4C1nF5"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/tri_angle~wcc-pinecone-integration/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-tri_angle-wcc-pinecone-integration",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/tri_angle~wcc-pinecone-integration/runs": {
            "post": {
                "operationId": "runs-sync-tri_angle-wcc-pinecone-integration",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/tri_angle~wcc-pinecone-integration/run-sync": {
            "post": {
                "operationId": "run-sync-tri_angle-wcc-pinecone-integration",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "openaiApiKey",
                    "pineconeApiKey",
                    "pineconeIndexName"
                ],
                "properties": {
                    "url": {
                        "title": "Website URL",
                        "type": "string",
                        "description": "The URL of the website to fetch web pages from. The URL can be a top-level domain like https://example.com, a subdirectory https://example.com/some-directory/, or a specific page https://example.com/some-directory/page.html."
                    },
                    "query": {
                        "title": "Vector database query",
                        "type": "string",
                        "description": "Text query that will be used to search for relevant documents in the vector database using similarity search. This query will be converted into an embedding vector using an OpenAI embedding function and compared to the vectors of documents stored in the vector database."
                    },
                    "noCrawling": {
                        "title": "No website crawling and vector DB update (query only)",
                        "type": "boolean",
                        "description": "If enabled, the crawler will not be started and the Actor will only search the vector database for the given query.",
                        "default": false
                    },
                    "openaiApiKey": {
                        "title": "OpenAI API key",
                        "type": "string",
                        "description": "OpenAI API key to generate vector embeddings for documents that are stored to the vector database and also for the database query."
                    },
                    "pineconeApiKey": {
                        "title": "Pinecone API key",
                        "type": "string",
                        "description": "Your Pinecone API key."
                    },
                    "pineconeIndexName": {
                        "title": "Pinecone index name",
                        "pattern": "^[-a-z0-9]+$",
                        "type": "string",
                        "description": "The name of the Pinecone index where you want to store the vectors."
                    },
                    "topKResults": {
                        "title": "Top K results",
                        "minimum": 1,
                        "type": "integer",
                        "description": "The number of top results to return from the vector database. The results will be sorted by similarity to the query vector.",
                        "default": 10
                    },
                    "cacheKeyValueStoreName": {
                        "title": "Cache key-value store",
                        "type": "string",
                        "description": "The name of the key-value store where the Actor will cache URLs of the fetched websites. If the website is already being crawled, the Actor will be aborted.",
                        "default": "website-content-vector-cache"
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The maximum number of resulting web pages to store. The crawler will automatically finish after reaching this number. This setting is useful to prevent accidental crawler runaway. If both **Max pages** and **Max results** are defined, then the crawler will finish when the first limit is reached. Note that the crawler skips pages with the canonical URL of a page that has already been crawled, hence it might crawl more pages than there are results.",
                        "default": 9999999
                    },
                    "chunkSize": {
                        "title": "Chunk size",
                        "minimum": 1,
                        "type": "integer",
                        "description": "The maximum size of each chunk in characters.",
                        "default": 2000
                    },
                    "chunkOverlap": {
                        "title": "Chunk overlap",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The number of overlapping characters between consecutive chunks.",
                        "default": 200
                    },
                    "crawlerType": {
                        "title": "Crawler type",
                        "enum": [
                            "playwright:firefox",
                            "playwright:chrome",
                            "playwright:adaptive",
                            "cheerio",
                            "jsdom"
                        ],
                        "type": "string",
                        "description": "Select the crawling engine:\n- **Headless web browser** - Useful for modern websites with anti-scraping protections and JavaScript rendering. It recognizes common blocking patterns like CAPTCHAs and automatically retries blocked requests through new sessions. However, running web browsers is more expensive as it requires more computing resources and is slower. It is recommended to use at least 8 GB of RAM.\n- **Stealthy web browser** (default) - Another headless web browser with anti-blocking measures enabled. Try this if you encounter bot protection while scraping. For best performance, use with Apify Proxy residential IPs. \n- **Adaptive switching between Chrome and raw HTTP client** - The crawler automatically switches between raw HTTP for static pages and Chrome browser (via Playwright) for dynamic pages, to get the maximum performance wherever possible. \n- **Raw HTTP client** - High-performance crawling mode that uses raw HTTP requests to fetch the pages. It is faster and cheaper, but it might not work on all websites.",
                        "default": "playwright:firefox"
                    },
                    "includeUrlGlobs": {
                        "title": "Include URLs (globs)",
                        "type": "array",
                        "description": "Glob patterns matching URLs of pages that will be included in crawling. \n\nSetting this option will disable the default Start URLs based scoping and will allow you to customize the crawling scope yourself. Note that this affects only links found on pages, but not **Start URLs** - if you want to crawl a page, make sure to specify its URL in the **Start URLs** field. \n\nFor example `https://{store,docs}.example.com/**` lets the crawler access all URLs starting with `https://store.example.com/` or `https://docs.example.com/`, and `https://example.com/**/*\\?*foo=*` allows the crawler to access all URLs that contain the `foo` query parameter with any value.\n\nLearn more about globs and test them [here](https://www.digitalocean.com/community/tools/glob?comments=true&glob=https%3A%2F%2Fexample.com%2Fscrape_this%2F%2A%2A&matches=false&tests=https%3A%2F%2Fexample.com%2Ftools%2F&tests=https%3A%2F%2Fexample.com%2Fscrape_this%2F&tests=https%3A%2F%2Fexample.com%2Fscrape_this%2F123%3Ftest%3Dabc&tests=https%3A%2F%2Fexample.com%2Fdont_scrape_this).",
                        "default": [],
                        "items": {
                            "type": "object",
                            "required": [
                                "glob"
                            ],
                            "properties": {
                                "glob": {
                                    "type": "string",
                                    "title": "Glob of a web page"
                                }
                            }
                        }
                    },
                    "excludeUrlGlobs": {
                        "title": "Exclude URLs (globs)",
                        "type": "array",
                        "description": "Glob patterns matching URLs of pages that will be excluded from crawling. Note that this affects only links found on pages, but not **Start URLs**, which are always crawled. \n\nFor example `https://{store,docs}.example.com/**` excludes all URLs starting with `https://store.example.com/` or `https://docs.example.com/`, and `https://example.com/**/*\\?*foo=*` excludes all URLs that contain the `foo` query parameter with any value.\n\nLearn more about globs and test them [here](https://www.digitalocean.com/community/tools/glob?comments=true&glob=https%3A%2F%2Fexample.com%2Fdont_scrape_this%2F%2A%2A&matches=false&tests=https%3A%2F%2Fexample.com%2Ftools%2F&tests=https%3A%2F%2Fexample.com%2Fdont_scrape_this%2F&tests=https%3A%2F%2Fexample.com%2Fdont_scrape_this%2F123%3Ftest%3Dabc&tests=https%3A%2F%2Fexample.com%2Fscrape_this).",
                        "default": [],
                        "items": {
                            "type": "object",
                            "required": [
                                "glob"
                            ],
                            "properties": {
                                "glob": {
                                    "type": "string",
                                    "title": "Glob of a web page"
                                }
                            }
                        }
                    },
                    "ignoreCanonicalUrl": {
                        "title": "Ignore canonical URLs",
                        "type": "boolean",
                        "description": "If enabled, the Actor will ignore the canonical URL reported by the page, and use the actual URL instead. You can use this feature for websites that report invalid canonical URLs, which causes the Actor to skip those pages in results.",
                        "default": false
                    },
                    "maxCrawlDepth": {
                        "title": "Max crawling depth",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The maximum number of links starting from the start URL that the crawler will recursively follow. The start URLs have depth `0`, the pages linked directly from the start URLs have depth `1`, and so on.\n\nThis setting is useful to prevent accidental crawler runaway. By setting it to `0`, the Actor will only crawl the Start URLs.",
                        "default": 20
                    },
                    "maxCrawlPages": {
                        "title": "Max pages",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The maximum number of pages to crawl. It includes the start URLs, pagination pages, pages with no content, etc. The crawler will automatically finish after reaching this number. This setting is useful to prevent accidental crawler runaway.",
                        "default": 9999999
                    },
                    "initialConcurrency": {
                        "title": "Initial concurrency",
                        "minimum": 0,
                        "maximum": 999,
                        "type": "integer",
                        "description": "The initial number of web browsers or HTTP clients running in parallel. The system scales the concurrency up and down based on the current CPU and memory load. If the value is set to 0 (default), the Actor uses the default setting for the specific crawler type.\n\nNote that if you set this value too high, the Actor will run out of memory and crash. If too low, it will be slow at start before it scales the concurrency up.",
                        "default": 0
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 999,
                        "type": "integer",
                        "description": "The maximum number of web browsers or HTTP clients running in parallel. This setting is useful to avoid overloading the target websites and to avoid getting blocked.",
                        "default": 200
                    },
                    "initialCookies": {
                        "title": "Initial cookies",
                        "type": "array",
                        "description": "Cookies that will be pre-set to all pages the scraper opens. This is useful for pages that require login. The value is expected to be a JSON array of objects with `name` and `value` properties. For example: `[{\"name\": \"cookieName\", \"value\": \"cookieValue\"}]`.\n\nYou can use the [EditThisCookie](https://chrome.google.com/webstore/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg) browser extension to copy browser cookies in this format, and paste it here.",
                        "default": []
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Enables loading the websites from IP addresses in specific geographies and to circumvent blocking.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "maxSessionRotations": {
                        "title": "Maximum number of session rotations",
                        "minimum": 0,
                        "maximum": 20,
                        "type": "integer",
                        "description": "The maximum number of times the crawler will rotate the session (IP address + browser configuration) on anti-scraping measures like CAPTCHAs. If the crawler rotates the session more than this number and the page is still blocked, it will finish with an error.",
                        "default": 10
                    },
                    "maxRequestRetries": {
                        "title": "Maximum number of retries on network / server errors",
                        "minimum": 0,
                        "maximum": 20,
                        "type": "integer",
                        "description": "The maximum number of times the crawler will retry the request on network, proxy or server errors. If the (n+1)-th request still fails, the crawler will mark this request as failed.",
                        "default": 5
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout",
                        "minimum": 1,
                        "maximum": 600,
                        "type": "integer",
                        "description": "Timeout (in seconds) for making the request and processing its response. Defaults to 60s.",
                        "default": 60
                    },
                    "minFileDownloadSpeedKBps": {
                        "title": "Minimum file download speed (kilobytes per second)",
                        "type": "integer",
                        "description": "The minimum viable file download speed in kilobytes per second. If the file download speed is lower than this value for a prolonged duration, the crawler will consider the file download as failing, abort it, and retry it (up to \"Maximum number of retries\" times). This is useful to avoid your crawls being stuck on slow file downloads.",
                        "default": 128
                    },
                    "dynamicContentWaitSecs": {
                        "title": "Wait for dynamic content (seconds)",
                        "type": "integer",
                        "description": "The maximum time to wait for dynamic page content to load. By default, it is 10 seconds. The crawler will continue either if this time elapses, or if it detects the network became idle as there are no more requests for additional resources.\n\nNote that this setting is ignored for the raw HTTP client, because it doesn't execute JavaScript or load any dynamic resources.",
                        "default": 10
                    },
                    "maxScrollHeightPixels": {
                        "title": "Maximum scroll height (pixels)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The crawler will scroll down the page until all content is loaded (and network becomes idle), or until this maximum scrolling height is reached. Setting this value to `0` disables scrolling altogether.\n\nNote that this setting is ignored for the raw HTTP client, because it doesn't execute JavaScript or load any dynamic resources.",
                        "default": 5000
                    },
                    "removeElementsCssSelector": {
                        "title": "Remove HTML elements (CSS selector)",
                        "type": "string",
                        "description": "A CSS selector matching HTML elements that will be removed from the DOM, before converting it to text, Markdown, or saving as HTML. This is useful to skip irrelevant page content. \n\nBy default, the Actor removes common navigation elements, headers, footers, modals, scripts, and inline images. You can disable the removal by setting this value to some non-existent CSS selector like `dummy_keep_everything`.",
                        "default": "nav, footer, script, style, noscript, svg,\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]"
                    },
                    "removeCookieWarnings": {
                        "title": "Remove cookie warnings",
                        "type": "boolean",
                        "description": "If enabled, the Actor will try to remove cookie consent dialogs or modals, using the [I don't care about cookies](https://addons.mozilla.org/en-US/firefox/addon/i-dont-care-about-cookies/) browser extension, to improve the accuracy of the extracted text. Note that there is a small performance penalty if this feature is enabled.\n\nThis setting is ignored when using the raw HTTP crawler type.",
                        "default": true
                    },
                    "clickElementsCssSelector": {
                        "title": "Expand clickable elements",
                        "type": "string",
                        "description": "A CSS selector matching DOM elements that will be clicked. This is useful for expanding collapsed sections, in order to capture their text content.",
                        "default": "[aria-expanded=\"false\"]"
                    },
                    "htmlTransformer": {
                        "title": "HTML transformer",
                        "enum": [
                            "readableTextIfPossible",
                            "readableText",
                            "extractus",
                            "none"
                        ],
                        "type": "string",
                        "description": "Specify how to transform the HTML to extract meaningful content without any extra fluff, like navigation or modals. The HTML transformation happens after removing and clicking the DOM elements.\n\n- **Readable text with fallback** - Extracts the main contents of the webpage, without navigation and other fluff while carefully checking the content integrity.\n- **Readable text** (default) - Extracts the main contents of the webpage, without navigation and other fluff.\n- **Extractus** - Uses Extractus library.\n- **None** - Only removes the HTML elements specified via 'Remove HTML elements' option.\n\nYou can examine output of all transformers by enabling the debug mode.\n",
                        "default": "readableText"
                    },
                    "readableTextCharThreshold": {
                        "title": "Readable text extractor character threshold",
                        "type": "integer",
                        "description": "A configuration option for the \"Readable text\" HTML transformer. It sets the minimum number of characters an article must have in order to be considered relevant.",
                        "default": 100
                    },
                    "aggressivePrune": {
                        "title": "Remove duplicate text lines",
                        "type": "boolean",
                        "description": "This is an **experimental feature**. If enabled, the crawler will prune content lines that are very similar to the ones already crawled on other pages, using the Count-Min Sketch algorithm. This is useful to strip repeating content in the scraped data like menus, headers, footers, etc. In some (not very likely) cases, it might remove relevant content from some pages.",
                        "default": false
                    },
                    "debugMode": {
                        "title": "Debug mode (stores output of all HTML transformers)",
                        "type": "boolean",
                        "description": "If enabled, the Actor will store the output of all types of HTML transformers, including the ones that are not used by default, and it will also store the HTML to Key-value Store with a link. All this data is stored under the `debug` field in the resulting Dataset.",
                        "default": false
                    },
                    "debugLog": {
                        "title": "Debug log",
                        "type": "boolean",
                        "description": "If enabled, the Actor log will include debug messages. Beware that this can be quite verbose.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
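
The `usage`, `usageUsd`, and `usageTotalUsd` fields in the schema above describe how a run's cost is broken down. As a minimal sketch of reading them, the Python snippet below sums the `usageUsd` breakdown and compares it with `usageTotalUsd`. The helper name `summarize_usage_usd`, the tolerance value, and the hard-coded `sample_run` dictionary are assumptions made for illustration; the sample reuses the example values from the schema rather than data from a real run.

```python
from typing import Any, Dict, Mapping


def summarize_usage_usd(run: Mapping[str, Any], tolerance: float = 1e-9) -> Dict[str, Any]:
    """Summarize the cost fields of a run object shaped like the schema above.

    Hypothetical helper for illustration; it is not part of any Apify client.
    """
    # Per-resource costs in USD; treat a missing or null field as an empty breakdown.
    breakdown = run.get("usageUsd") or {}
    breakdown_total = sum(float(value) for value in breakdown.values())
    reported_total = float(run.get("usageTotalUsd") or 0.0)

    return {
        # Only the resources that actually cost something.
        "nonzero_items": {key: float(value) for key, value in breakdown.items() if value},
        "breakdown_total_usd": breakdown_total,
        "reported_total_usd": reported_total,
        # Costs are fractional, so compare with a small tolerance instead of ==.
        "matches_reported_total": abs(breakdown_total - reported_total) <= tolerance,
    }


# Sample data copied from the "example" values in the schema (not a real run).
sample_run = {
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "DATASET_READS": 0,
        "DATASET_WRITES": 0,
        "KEY_VALUE_STORE_READS": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
        "KEY_VALUE_STORE_LISTS": 0,
        "REQUEST_QUEUE_READS": 0,
        "REQUEST_QUEUE_WRITES": 0,
        "DATA_TRANSFER_INTERNAL_GBYTES": 0,
        "DATA_TRANSFER_EXTERNAL_GBYTES": 0,
        "PROXY_RESIDENTIAL_TRANSFER_GBYTES": 0,
        "PROXY_SERPS": 0,
    },
}

summary = summarize_usage_usd(sample_run)
print(summary["nonzero_items"])
print(summary["matches_reported_total"])
```

In a real integration, the run object would come from the API response described by this schema instead of a literal dictionary. Parsing every cost as a float avoids relying on whether a given field happens to arrive as a whole number.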