# Milvus Integration (`apify/milvus-integration`) Actor

This integration transfers data from Apify Actors to a Milvus/Zilliz database and is a good starting point for a question-answering, search, or RAG use case.

- **URL**: https://apify.com/apify/milvus-integration.md
- **Developed by:** [Apify](https://apify.com/apify) (Apify)
- **Categories:** AI, Integrations, Open source
- **Stats:** 10 total users, 1 monthly users, 0.0% runs succeeded, 2 bookmarks
- **User rating**: 4.50 out of 5 stars

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Milvus integration

[![Milvus integration](https://apify.com/actor-badge?actor=apify/milvus-integration)](https://apify.com/apify/milvus-integration)

The Apify Milvus integration transfers selected data from Apify Actors to a [Milvus](https://milvus.io/)/[Zilliz](https://zilliz.com) database.
It processes the data, optionally splits it into chunks, computes embeddings, and saves them to Milvus.

This integration supports incremental updates, updating only the data that has changed. 
This approach reduces unnecessary embedding computation and storage operations, making it suitable for search and retrieval-augmented generation (RAG) use cases.

💡 **Note**: This Actor is meant to be used together with other Actors' integration sections.
For instance, if you are using the [Website Content Crawler](https://apify.com/apify/website-content-crawler), you can activate Milvus integration to save web data as vectors to Milvus.

#### What is the Milvus/Zilliz vector database?

Milvus is an open-source vector database designed for similarity searches on large datasets of high-dimensional vectors.
Its emphasis on efficient vector similarity search enables the development of robust and scalable retrieval systems.
The Milvus database hosted at [Zilliz](https://zilliz.com/) demonstrates top performance in the [Vector Database Benchmark](https://github.com/zilliztech/VectorDBBench).

### 📋 How does the Apify-Milvus/Zilliz integration work?

The Apify Milvus integration computes text embeddings and stores them in Milvus.
It uses [LangChain](https://www.langchain.com/) to compute embeddings and interact with [Milvus](https://milvus.io/).

1. Retrieve a dataset as output from an Actor
2. _[Optional]_ Split text data into chunks using `langchain`'s `RecursiveCharacterTextSplitter`
(enable/disable using `performChunking` and specify `chunkSize`, `chunkOverlap`)
3. _[Optional]_ Update only changed data (select `dataUpdatesStrategy`)
4. Compute embeddings, e.g. using `OpenAI` or `Cohere` (specify `embeddings` and `embeddingsConfig`)
5. Save data into the database

![Apify-milvus-integration](https://raw.githubusercontent.com/apify/actor-vector-database-integrations/master/docs/Apify-milvus-integration-readme.png)

### ✅ Before you start

To use this integration, ensure you have:

- A new or existing `Milvus` database. You need to know `milvusUri`, `milvusToken`, and `milvusCollectionName`.
- If the collection does not exist, it will be created automatically.
- An account to compute embeddings using one of the providers, e.g., [OpenAI](https://platform.openai.com/docs/guides/embeddings) or [Cohere](https://docs.cohere.com/docs/cohere-embed).

#### Set up Milvus/Zilliz URI, token and collection name

You can run Milvus using Docker or try the managed Milvus service at [Zilliz](https://zilliz.com/).
For more details, please refer to the [Milvus documentation](https://milvus.io/docs).

You need the URI and token of your Milvus/Zilliz instance to set up the client.
- If you have a self-deployed Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md), use the server address and port as your URI, e.g., `http://localhost:19530`. If you enable the authentication feature on Milvus, use `<your_username>:<your_password>` as the token; otherwise, leave the token as an empty string.
- If you use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#cluster-details) in Zilliz Cloud.

Note that the collection does not need to exist beforehand. 
It will be automatically created when data is uploaded to the database.


### 👉 Examples

The configuration consists of three parts: Milvus, embeddings provider and data.

Ensure that the vector size of your embeddings aligns with the configuration of your Milvus index. 
For instance, if you're using the `text-embedding-3-small` model from `OpenAI`, it generates vectors of size `1536`. 
This means your Milvus index should also be configured to accommodate vectors of the same size, `1536` in this case.

For detailed input information, refer to the [Input page](https://apify.com/apify/milvus-integration/input-schema).

##### Database: Milvus
```json
{
  "milvusUri": "YOUR-MILVUS-URI",
  "milvusToken": "YOUR-MILVUS-TOKEN",
  "milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME"
}
````

Please refer to the instructions above on how to set up the Milvus/Zilliz `URI`, `token`, and `collection name`.

##### Embeddings provider: OpenAI

```json
{
  "embeddingsProvider": "OpenAI",
  "embeddingsApiKey": "YOUR-OPENAI-API-KEY",
  "embeddingsConfig": {"model":  "text-embedding-3-large"}
}
```

#### Save data from Website Content Crawler to Milvus

Data is transferred in the form of a dataset from [Website Content Crawler](https://apify.com/apify/website-content-crawler), which provides a dataset with the following output fields (truncated for brevity):

```json
{
  "url": "https://www.apify.com",
  "text": "Apify is a platform that enables developers to build, run, and share automation tasks.",
  "metadata": {"title": "Apify"}
}
```

This dataset is then processed by the Milvus integration.
In the integration settings you need to specify which fields you want to save to Milvus, e.g., `["text"]` and which of them should be used as metadata, e.g., `{"title": "metadata.title"}`.
Without any other configuration, the data is saved to Milvus as is.

```json
{
  "datasetFields": ["text"],
  "metadataDatasetFields": {"title": "metadata.title"}
}
```
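
The field selection above can be illustrated with a short sketch. It is a minimal illustration, not the Actor's actual implementation: the helper names `get_by_path` and `select_fields` are hypothetical, and joining several selected fields with a newline is an assumption.

```python
from typing import Any


def get_by_path(item: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as "metadata.title" in a nested dict."""
    value: Any = item
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # missing fields resolve to None in this sketch
        value = value[key]
    return value


def select_fields(
    item: dict[str, Any],
    dataset_fields: list[str],
    metadata_dataset_fields: dict[str, str],
) -> dict[str, Any]:
    """Build the text to embed and the metadata to store for one dataset item."""
    # Assumption: selected fields are joined with newlines into a single text.
    parts = [get_by_path(item, field) for field in dataset_fields]
    text = "\n".join(str(part) for part in parts if part is not None)
    metadata = {
        name: get_by_path(item, path)
        for name, path in metadata_dataset_fields.items()
    }
    return {"text": text, "metadata": metadata}


item = {
    "url": "https://www.apify.com",
    "text": "Apify is a platform that enables developers to build, run, and share automation tasks.",
    "metadata": {"title": "Apify"},
}
record = select_fields(item, ["text"], {"title": "metadata.title"})
print(record["metadata"])  # {'title': 'Apify'}
```

The key on the left side of `metadataDatasetFields` becomes the metadata name, and the value on the right is the path inside the dataset item.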

#### Create chunks from Website Content Crawler data and save them to the database

Assume that the text data from the [Website Content Crawler](https://apify.com/apify/website-content-crawler) is too long to embed in one piece.
Therefore, we need to divide the data into smaller pieces called chunks.
We can leverage LangChain's `RecursiveCharacterTextSplitter` to split the text into chunks and save them into a database.
The `chunkSize` parameter sets the maximum number of characters in each chunk, and `chunkOverlap` sets the number of characters shared by consecutive chunks.
The right values depend on your use case; proper chunking helps optimize retrieval and ensures accurate responses.

```json
{
  "datasetFields": ["text"],
  "metadataDatasetFields": {"title": "metadata.title"},
  "performChunking": true,
  "chunkSize": 1000,
  "chunkOverlap": 0
}
```
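
To see how `chunkSize` and `chunkOverlap` interact, consider the simplified sketch below. It uses plain fixed-size character windows, which is an assumption made for illustration; LangChain's `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraph and line breaks, so its chunk boundaries will differ. The function name `split_text` is hypothetical.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "abcdefghij"
print(split_text(text, chunk_size=4, chunk_overlap=0))  # ['abcd', 'efgh', 'ij']
print(split_text(text, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap produces more chunks, and therefore more embeddings to compute, in exchange for more shared context between neighbouring chunks.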

#### Configure update strategy

To control how the integration updates data in the database, use the `dataUpdatesStrategy` parameter. This parameter allows you to choose between different update strategies based on your use case, such as adding new data, upserting records, or incrementally updating records based on changes (deltas). Below are the available strategies and explanations for when to use each:

- **Add data (`add`)**:
  - Appends new data to the database without checking for duplicates or updating existing records.
  - Suitable for cases where deduplication or updates are unnecessary, and the data simply needs to be added.
  - For example, you might use this strategy to continually append data from independent crawls without regard for overlaps.

- **Upsert data (`upsert`)**:
  - Deletes existing records in the database if they match a key or identifier and inserts new records.
  - Ideal when you want to maintain accurate and up-to-date data while avoiding duplication.
  - For instance, this is useful in cases where unique items (such as user profiles or documents) need to be managed, ensuring the database reflects the latest changes.
  - Check the `dataUpdatesPrimaryDatasetFields` parameter to specify which fields are used to uniquely identify each dataset item.

- **Delta updates (`deltaUpdates`)**:
  - Incrementally updates records by identifying differences (deltas) between the new dataset and the existing database records.
  - Ensures only new or modified records are processed, leaving unchanged records untouched. This minimizes unnecessary database operations and improves efficiency.
  - This is the most efficient strategy when integrating data that evolves over time, such as website content or recurring crawls.
  - Check the `dataUpdatesPrimaryDatasetFields` parameter to specify which fields are used to uniquely identify each dataset item.
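
The difference between `add` and `upsert` can be shown on an in-memory list that stands in for the database. This is a sketch under stated assumptions: the function `write_items` and the way the key is built from `dataUpdatesPrimaryDatasetFields` are hypothetical, and `deltaUpdates` is not covered because it additionally relies on checksums.

```python
def write_items(db: list[dict], items: list[dict], strategy: str,
                primary_fields: list[str]) -> None:
    """Write items into db using the 'add' or 'upsert' strategy."""
    if strategy == "add":
        db.extend(items)  # no duplicate checks
    elif strategy == "upsert":
        def key(record: dict) -> tuple:
            # Assumed key: the values of the primary fields, in order.
            return tuple(record.get(field) for field in primary_fields)

        incoming = {key(item) for item in items}
        # Delete records that match an incoming key, then insert the new ones.
        db[:] = [record for record in db if key(record) not in incoming]
        db.extend(items)
    else:
        raise ValueError(f"Unsupported strategy in this sketch: {strategy}")


first = [{"url": "https://example.com/a", "text": "old"}]
second = [{"url": "https://example.com/a", "text": "new"}]

added: list[dict] = []
write_items(added, first, "add", ["url"])
write_items(added, second, "add", ["url"])
print(len(added))  # 2

upserted: list[dict] = []
write_items(upserted, first, "upsert", ["url"])
write_items(upserted, second, "upsert", ["url"])
print(upserted)  # [{'url': 'https://example.com/a', 'text': 'new'}]
```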

#### Incrementally update database from the Website Content Crawler

To incrementally update data from the [Website Content Crawler](https://apify.com/apify/website-content-crawler) to the database, configure the integration to update only the changed or new data.
This is controlled by the `dataUpdatesStrategy` setting.
This way, the integration minimizes unnecessary updates and ensures that only new or modified data is processed.

A checksum is computed for each dataset item (together with all metadata) and stored in the database alongside the vectors.
When the data is re-crawled, the checksum is recomputed and compared with the stored checksum.
If the checksum is different, the old data (including vectors) is deleted and new data is saved.
Otherwise, only the `last_seen_at` metadata field is updated to indicate when the data was last seen.
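
The checksum comparison described above can be sketched as follows. This is an illustration only: the use of SHA-256 over a canonical JSON serialization, the in-memory `store` dictionary standing in for the database, and the names `compute_checksum` and `apply_delta_update` are all assumptions, not the integration's actual code.

```python
import hashlib
import json
from datetime import datetime, timezone


def compute_checksum(text: str, metadata: dict) -> str:
    """Hash the text together with its metadata (assumed: SHA-256 of JSON)."""
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_delta_update(store: dict, item_id: str, text: str, metadata: dict) -> str:
    """Update the in-memory store for one item and report what happened."""
    now = datetime.now(timezone.utc).isoformat()
    checksum = compute_checksum(text, metadata)
    existing = store.get(item_id)
    if existing is not None and existing["checksum"] == checksum:
        existing["last_seen_at"] = now  # unchanged: only refresh the timestamp
        return "unchanged"
    # New or changed: replace the old record (embeddings would be recomputed here).
    store[item_id] = {"text": text, "metadata": metadata,
                      "checksum": checksum, "last_seen_at": now}
    return "added" if existing is None else "updated"


store: dict = {}
url = "https://www.apify.com"
print(apply_delta_update(store, url, "Apify is a platform.", {"title": "Apify"}))  # added
print(apply_delta_update(store, url, "Apify is a platform.", {"title": "Apify"}))  # unchanged
print(apply_delta_update(store, url, "Apify is a cloud platform.", {"title": "Apify"}))  # updated
```

Only the `added` and `updated` outcomes would require new embeddings, which is where the savings of this strategy come from.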

##### Provide unique identifier for each dataset item

To incrementally update the data, you need to be able to uniquely identify each dataset item.
The variable `dataUpdatesPrimaryDatasetFields` specifies which fields are used to uniquely identify each dataset item and helps track content changes across different crawls.
For instance, when working with the Website Content Crawler, you can use the URL as a unique identifier.

```json
{
  "dataUpdatesStrategy": "deltaUpdates",
  "dataUpdatesPrimaryDatasetFields": ["url"]
}
```

To fully maximize the potential of incremental data updates, it is recommended to start with an empty database.
While it is possible to use this feature with an existing database, records that were not originally saved using a prefix or metadata will not be updated.

#### Delete outdated (expired) data

The integration can delete data from the database that hasn't been crawled for a specified period, which is useful when data becomes outdated, such as when a page is removed from a website.

The deletion feature can be enabled or disabled using the `deleteExpiredObjects` setting.

For each crawl, the `last_seen_at` metadata field is created or updated.
This field records the most recent time the data object was crawled.
The `expiredObjectDeletionPeriodDays` setting controls the number of days since the last crawl after which a data object is considered expired.
If a database object has not been seen for more than the `expiredObjectDeletionPeriodDays`, it will be deleted automatically.

The specific value of `expiredObjectDeletionPeriodDays` depends on your use case.

- If a website is crawled daily, `expiredObjectDeletionPeriodDays` can be set to 7.
- If you crawl weekly, it can be set to 30.

To disable this feature, set `deleteExpiredObjects` to `false`.

```json
{
  "deleteExpiredObjects": true,
  "expiredObjectDeletionPeriodDays": 30
}
```

💡 If you are using multiple Actors to update the same database, ensure that all Actors crawl the data at the same frequency.
Otherwise, data crawled by one Actor might expire due to inconsistent crawling schedules.
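
The expiration rule can be expressed in a few lines. The sketch below is an illustration under stated assumptions: it keeps `last_seen_at` as a timezone-aware `datetime` in an in-memory dictionary, and the function name `delete_expired` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def delete_expired(store: dict, period_days: int, now: datetime) -> list[str]:
    """Remove records not seen for more than period_days; return their IDs."""
    cutoff = now - timedelta(days=period_days)
    expired = [item_id for item_id, record in store.items()
               if record["last_seen_at"] < cutoff]
    for item_id in expired:
        del store[item_id]
    return expired


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
store = {
    "https://example.com/fresh": {"last_seen_at": now - timedelta(days=2)},
    "https://example.com/stale": {"last_seen_at": now - timedelta(days=45)},
}
print(delete_expired(store, period_days=30, now=now))  # ['https://example.com/stale']
```

With a 30-day period, a page last seen 45 days ago is removed, while a page seen two days ago is kept.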

### 💾 Outputs

This integration will save the selected fields from your Actor to Milvus and store the chunked data in the Apify dataset.

### 🔢 Example configuration

##### Full Input Example for Website Content Crawler Actor with Milvus integration

```json
{
  "milvusUri": "YOUR-MILVUS-URI",
  "milvusToken": "YOUR-MILVUS-TOKEN",
  "milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME",
  "embeddingsApiKey": "YOUR-OPENAI-API-KEY",
  "embeddingsConfig": {
    "model": "text-embedding-3-small"
  },
  "embeddingsProvider": "OpenAI",
  "datasetFields": [
    "text"
  ],
  "dataUpdatesStrategy": "deltaUpdates",
  "dataUpdatesPrimaryDatasetFields": ["url"],
  "expiredObjectDeletionPeriodDays": 7,
  "performChunking": true,
  "chunkSize": 2000,
  "chunkOverlap": 200
}
```

##### Milvus

```json
{
  "milvusUri": "YOUR-MILVUS-URI",
  "milvusToken": "YOUR-MILVUS-TOKEN",
  "milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME"
}
```

##### Managed Milvus service at [Zilliz](https://zilliz.com/)

```json
{
  "milvusUri": "https://in03-***********.api.gcp-us-west1.zillizcloud.com",
  "milvusToken": "d46**********b4b",
  "milvusCollectionName": "YOUR-MILVUS-COLLECTION-NAME"
}
```

##### OpenAI embeddings

```json
{
  "embeddingsApiKey": "YOUR-OPENAI-API-KEY",
  "embeddingsProvider": "OpenAI",
  "embeddingsConfig": {"model":  "text-embedding-3-large"}
}
```

##### Cohere embeddings

```json
{
  "embeddingsApiKey": "YOUR-COHERE-API-KEY",
  "embeddingsProvider": "Cohere",
  "embeddingsConfig": {"model":  "embed-multilingual-v3.0"}
}
```

# Actor input Schema

## `milvusUri` (type: `string`):

The URI of the Milvus instance to connect to. You can include the username and password in the URI, for example: `https://username:password@****.serverless.gcp-us-west1.cloud.zilliz.com`.

## `milvusToken` (type: `string`):

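The Milvus token. For Zilliz Cloud, use the API key. For a self-hosted Milvus with authentication enabled, use `<your_username>:<your_password>`; otherwise, leave it as an empty string.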

## `milvusCollectionName` (type: `string`):

Name of the Milvus collection where the data will be stored. If the collection does not exist, it will be created automatically.

## `embeddingsProvider` (type: `string`):

Choose the embeddings provider to use for generating embeddings

## `embeddingsConfig` (type: `object`):

Configure the parameters for the LangChain embedding class. Key points to consider:

1. Typically, you only need to specify the model name. For example, for OpenAI, set the model name as {"model": "text-embedding-3-small"}.

2. It's required to ensure that the vector size of your embeddings matches the size of embeddings in the database.

3. Here are examples of embedding models:
   - [OpenAI](https://platform.openai.com/docs/guides/embeddings): `text-embedding-3-small`, `text-embedding-3-large`, etc.
   - [Cohere](https://docs.cohere.com/docs/cohere-embed): `embed-english-v3.0`, `embed-multilingual-light-v3.0`, etc.

4. For more details about other parameters, refer to the [LangChain documentation](https://python.langchain.com/docs/integrations/text_embedding/).
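
The vector size requirement in point 2 can be checked before a run. The sketch below is illustrative: the dimension table lists only the default output sizes of the two OpenAI models named above (1536 for `text-embedding-3-small`, 3072 for `text-embedding-3-large`), and the helper `check_vector_size` is hypothetical. For other models, consult the provider's documentation for the vector size.

```python
# Default output sizes; this sketch assumes no custom `dimensions` override.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_vector_size(embeddings_config: dict, collection_dim: int) -> None:
    """Raise ValueError if the model's vector size differs from the collection's."""
    model = embeddings_config.get("model")
    if model not in KNOWN_DIMENSIONS:
        raise ValueError(f"Unknown vector size for model {model!r}")
    expected = KNOWN_DIMENSIONS[model]
    if expected != collection_dim:
        raise ValueError(
            f"Model {model!r} produces {expected}-dimensional vectors, "
            f"but the collection expects {collection_dim}"
        )


# Passes silently because the sizes match.
check_vector_size({"model": "text-embedding-3-small"}, collection_dim=1536)
```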

## `embeddingsApiKey` (type: `string`):

Value of the API KEY for the embeddings provider (if required).

For example, for OpenAI it is `OPENAI_API_KEY`, and for Cohere it is `COHERE_API_KEY`.

## `datasetFields` (type: `array`):

This array specifies the dataset fields to be selected and stored in the vector store. Only the fields listed here will be included in the vector store.

For instance, when using the Website Content Crawler, you might choose to include fields such as `text`, `url`, and `metadata.title` in the vector store.

## `metadataDatasetFields` (type: `object`):

A mapping of dataset fields which should be selected from the dataset and stored as metadata in the vector store.

For example, when using the Website Content Crawler, you might want to store `url` in metadata. In this case, use the `metadataDatasetFields` parameter as follows: `{"url": "url"}`.

## `metadataObject` (type: `object`):

This object allows you to store custom metadata for every item in the vector store.

For example, if you want to store the `domain` as metadata, use the `metadataObject` like this: {"domain": "apify.com"}.

## `datasetId` (type: `string`):

Dataset ID (when running standalone without integration)

## `dataUpdatesStrategy` (type: `string`):

Choose the update strategy for the integration. The update strategy determines how the integration updates the data in the database.

The available options are:

- **Add data** (`add`):
  - Always adds new records to the database.
  - No checks for existing records or updates are performed.
  - Useful when appending data without concern for duplicates.

- **Upsert data** (`upsert`):
  - Updates existing records if they match a key or identifier.
  - Inserts new records into the database if they don't already exist.
  - Ideal for ensuring the database contains the most up-to-date data, avoiding duplicates.

- **Update changed data based on deltas** (`deltaUpdates`):
  - Performs incremental updates by identifying differences (deltas) between the new dataset and the existing records.
  - Only adds new records and updates those that have changed.
  - Unchanged records are left untouched.
  - Maximizes efficiency by reducing unnecessary updates.

Select the strategy that best fits your use case.

## `dataUpdatesPrimaryDatasetFields` (type: `array`):

This array contains fields that are used to uniquely identify dataset items, which helps to handle content changes across different runs.

For instance, in a web content crawling scenario, the `url` field could serve as a unique identifier for each item.

## `enableDeltaUpdates` (type: `boolean`):

When set to true, this setting enables incremental updates for objects in the database by comparing the changes (deltas) between the crawled dataset items and the existing objects, uniquely identified by the `deltaUpdatesPrimaryDatasetFields` field.

The integration will only add new objects and update those that have changed, reducing unnecessary updates. The `datasetFields`, `metadataDatasetFields`, and `metadataObject` fields are used to determine the changes.

## `deltaUpdatesPrimaryDatasetFields` (type: `array`):

This array contains fields that are used to uniquely identify dataset items, which helps to handle content changes across different runs.

For instance, in a web content crawling scenario, the `url` field could serve as a unique identifier for each item.

## `deleteExpiredObjects` (type: `boolean`):

When set to true, delete objects from the database that have not been crawled for a specified period.

## `expiredObjectDeletionPeriodDays` (type: `integer`):

This setting allows the integration to manage the deletion of objects from the database that have not been crawled for a specified period. It is typically used in subsequent runs after the initial crawl.

When the value is greater than 0, the integration checks if objects have been seen within the last X days (determined by the expiration period). If the objects are expired, they are deleted from the database. The specific value for `expiredObjectDeletionPeriodDays` depends on your use case and how frequently you crawl data.

For example, if you crawl data daily, you can set `expiredObjectDeletionPeriodDays` to 7 days. If you crawl data weekly, you can set `expiredObjectDeletionPeriodDays` to 30 days.

## `performChunking` (type: `boolean`):

When set to true, the text will be divided into smaller chunks based on the settings provided below. Proper chunking helps optimize retrieval and ensures accurate and efficient responses.

## `chunkSize` (type: `integer`):

Defines the maximum number of characters in each text chunk. Choosing the right size balances between detailed context and system performance. Optimal sizes ensure high relevancy and minimal response time.

## `chunkOverlap` (type: `integer`):

Specifies the number of overlapping characters between consecutive text chunks. Adjusting this helps maintain context across chunks, which is crucial for accuracy in retrieval-augmented generation systems.

## Actor input object example

```json
{
  "embeddingsProvider": "OpenAI",
  "datasetFields": [
    "text"
  ],
  "dataUpdatesStrategy": "deltaUpdates",
  "dataUpdatesPrimaryDatasetFields": [
    "url"
  ],
  "enableDeltaUpdates": true,
  "deltaUpdatesPrimaryDatasetFields": [
    "url"
  ],
  "deleteExpiredObjects": true,
  "expiredObjectDeletionPeriodDays": 30,
  "performChunking": true,
  "chunkSize": 2000,
  "chunkOverlap": 0
}
```
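
The example above omits the connection and API key fields, which the `required` list in the OpenAPI specification later in this document marks as mandatory. A small pre-flight check along the following lines may help catch an incomplete input before starting a run. The helper name `missing_required_fields` is hypothetical, and the check only tests for presence because an empty `milvusToken` is valid for a self-hosted Milvus without authentication.

```python
REQUIRED_FIELDS = [
    "milvusUri",
    "milvusToken",
    "milvusCollectionName",
    "embeddingsProvider",
    "embeddingsApiKey",
    "datasetFields",
]


def missing_required_fields(actor_input: dict) -> list[str]:
    """Return the required input fields that are absent from actor_input."""
    return [field for field in REQUIRED_FIELDS if field not in actor_input]


actor_input = {
    "embeddingsProvider": "OpenAI",
    "datasetFields": ["text"],
    "dataUpdatesStrategy": "deltaUpdates",
    "dataUpdatesPrimaryDatasetFields": ["url"],
}
print(missing_required_fields(actor_input))
# ['milvusUri', 'milvusToken', 'milvusCollectionName', 'embeddingsApiKey']
```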

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "datasetFields": [
        "text"
    ],
    "dataUpdatesStrategy": "deltaUpdates",
    "dataUpdatesPrimaryDatasetFields": [
        "url"
    ],
    "deltaUpdatesPrimaryDatasetFields": [
        "url"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("apify/milvus-integration").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "datasetFields": ["text"],
    "dataUpdatesStrategy": "deltaUpdates",
    "dataUpdatesPrimaryDatasetFields": ["url"],
    "deltaUpdatesPrimaryDatasetFields": ["url"],
}

# Run the Actor and wait for it to finish
run = client.actor("apify/milvus-integration").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "datasetFields": [
    "text"
  ],
  "dataUpdatesStrategy": "deltaUpdates",
  "dataUpdatesPrimaryDatasetFields": [
    "url"
  ],
  "deltaUpdatesPrimaryDatasetFields": [
    "url"
  ]
}' |
apify call apify/milvus-integration --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=apify/milvus-integration",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Milvus Integration",
        "description": "This integration transfers data from Apify Actors to a Milvus/Zilliz database and is a good starting point for a question-answering, search, or RAG use case.",
        "version": "0.1",
        "x-build-id": "NytD4kScCWRTUzNuf"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/apify~milvus-integration/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-apify-milvus-integration",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/apify~milvus-integration/runs": {
            "post": {
                "operationId": "runs-sync-apify-milvus-integration",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/apify~milvus-integration/run-sync": {
            "post": {
                "operationId": "run-sync-apify-milvus-integration",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "milvusUri",
                    "milvusToken",
                    "milvusCollectionName",
                    "embeddingsProvider",
                    "embeddingsApiKey",
                    "datasetFields"
                ],
                "properties": {
                    "milvusUri": {
                        "title": "Milvus URI",
                        "type": "string",
                        "description": "The URI of the Milvus instance to connect to. You can include the username and password in the URI, for example: `https://username:password@****.serverless.gcp-us-west1.cloud.zilliz.com`."
                    },
                    "milvusToken": {
                        "title": "Milvus Token",
                        "type": "string",
                        "description": "Milvus Token"
                    },
                    "milvusCollectionName": {
                        "title": "Milvus collection name",
                        "type": "string",
                        "description": "Name of the Milvus collection where the data will be stored, if the collection does not exist, it will be created automatically"
                    },
                    "embeddingsProvider": {
                        "title": "Embeddings provider (as defined in the langchain API)",
                        "enum": [
                            "OpenAI",
                            "Cohere"
                        ],
                        "type": "string",
                        "description": "Choose the embeddings provider to use for generating embeddings",
                        "default": "OpenAI"
                    },
                    "embeddingsConfig": {
                        "title": "Configuration for embeddings provider",
                        "type": "object",
                        "description": "Configure the parameters for the LangChain embedding class. Key points to consider:\n\n1. Typically, you only need to specify the model name. For example, for OpenAI, set the model name as {\"model\": \"text-embedding-3-small\"}.\n\n2. It's required to ensure that the vector size of your embeddings matches the size of embeddings in the database.\n\n3. Here are examples of embedding models:\n   - [OpenAI](https://platform.openai.com/docs/guides/embeddings): `text-embedding-3-small`, `text-embedding-3-large`, etc.\n   - [Cohere](https://docs.cohere.com/docs/cohere-embed): `embed-english-v3.0`, `embed-multilingual-light-v3.0`, etc.\n\n4. For more details about other parameters, refer to the [LangChain documentation](https://python.langchain.com/docs/integrations/text_embedding/)."
                    },
                    "embeddingsApiKey": {
                        "title": "Embeddings API KEY (whenever applicable, depends on provider)",
                        "type": "string",
                        "description": "Value of the API KEY for the embeddings provider (if required).\n\n For example for OpenAI it is OPENAI_API_KEY, for Cohere it is COHERE_API_KEY)"
                    },
                    "datasetFields": {
                        "title": "Dataset fields to select from the dataset results and store in the database",
                        "type": "array",
                        "description": "This array specifies the dataset fields to be selected and stored in the vector store. Only the fields listed here will be included in the vector store.\n\nFor instance, when using the Website Content Crawler, you might choose to include fields such as `text`, `url`, and `metadata.title` in the vector store.",
                        "default": [
                            "text"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "metadataDatasetFields": {
                        "title": "Dataset fields to select from the dataset and store as metadata in the database",
                        "type": "object",
                        "description": "A list of dataset fields which should be selected from the dataset and stored as metadata in the vector stores.\n\nFor example, when using the Website Content Crawler, you might want to store `url` in metadata. In this case, use `metadataDatasetFields parameter as follows {\"url\": \"url\"}`"
                    },
                    "metadataObject": {
                        "title": "Custom object to be stored as metadata in the vector store database",
                        "type": "object",
                        "description": "This object allows you to store custom metadata for every item in the vector store.\n\nFor example, if you want to store the `domain` as metadata, use the `metadataObject` like this: {\"domain\": \"apify.com\"}."
                    },
                    "datasetId": {
                        "title": "Dataset ID",
                        "type": "string",
                        "description": "Dataset ID (when running standalone without integration)"
                    },
                    "dataUpdatesStrategy": {
                        "title": "Update strategy (add, upsert, deltaUpdates (default))",
                        "enum": [
                            "add",
                            "upsert",
                            "deltaUpdates"
                        ],
                        "type": "string",
                        "description": "Choose the update strategy for the integration. The update strategy determines how the integration updates the data in the database.\n\nThe available options are:\n\n- **Add data** (`add`):\n  - Always adds new records to the database.\n  - No checks for existing records or updates are performed.\n  - Useful when appending data without concern for duplicates.\n\n- **Upsert data** (`upsert`):\n  - Updates existing records if they match a key or identifier.\n  - Inserts new records into the database if they don't already exist.\n  - Ideal for ensuring the database contains the most up-to-date data, avoiding duplicates.\n\n- **Update changed data based on deltas** (`deltaUpdates`):\n  - Performs incremental updates by identifying differences (deltas) between the new dataset and the existing records.\n  - Only adds new records and updates those that have changed.\n  - Unchanged records are left untouched.\n  - Maximizes efficiency by reducing unnecessary updates.\n\nSelect the strategy that best fits your use case.",
                        "default": "deltaUpdates"
                    },
                    "dataUpdatesPrimaryDatasetFields": {
                        "title": "Dataset fields to uniquely identify dataset items (only relevant when dataUpdatesStrategy is `upsert` or `deltaUpdates`)",
                        "type": "array",
                        "description": "This array contains fields that are used to uniquely identify dataset items, which helps to handle content changes across different runs.\n\nFor instance, in a web content crawling scenario, the `url` field could serve as a unique identifier for each item.",
                        "default": [
                            "url"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "enableDeltaUpdates": {
                        "title": "Enable incremental updates for objects based on deltas (deprecated)",
                        "type": "boolean",
                        "description": "When set to true, this setting enables incremental updates for objects in the database by comparing the changes (deltas) between the crawled dataset items and the existing objects, uniquely identified by the `datasetKeysToItemId` field.\n\n The integration will only add new objects and update those that have changed, reducing unnecessary updates. The `datasetFields`, `metadataDatasetFields`, and `metadataObject` fields are used to determine the changes.",
                        "default": true
                    },
                    "deleteExpiredObjects": {
                        "title": "Delete expired objects from the database",
                        "type": "boolean",
                        "description": "When set to true, delete objects from the database that have not been crawled for a specified period.",
                        "default": true
                    },
                    "expiredObjectDeletionPeriodDays": {
                        "title": "Delete expired objects from the database after a specified number of days",
                        "minimum": 0,
                        "type": "integer",
                        "description": "This setting allows the integration to manage the deletion of objects from the database that have not been crawled for a specified period. It is typically used in subsequent runs after the initial crawl.\n\nWhen the value is greater than 0, the integration checks if objects have been seen within the last X days (determined by the expiration period). If the objects are expired, they are deleted from the database. The specific value for `deletedExpiredObjectsDays` depends on your use case and how frequently you crawl data.\n\nFor example, if you crawl data daily, you can set `deletedExpiredObjectsDays` to 7 days. If you crawl data weekly, you can set `deletedExpiredObjectsDays` to 30 days.",
                        "default": 30
                    },
                    "performChunking": {
                        "title": "Enable text chunking",
                        "type": "boolean",
                        "description": "When set to true, the text will be divided into smaller chunks based on the settings provided below. Proper chunking helps optimize retrieval and ensures accurate and efficient responses.",
                        "default": true
                    },
                    "chunkSize": {
                        "title": "Maximum chunk size",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Defines the maximum number of characters in each text chunk. Choosing the right size balances between detailed context and system performance. Optimal sizes ensure high relevancy and minimal response time.",
                        "default": 2000
                    },
                    "chunkOverlap": {
                        "title": "Chunk overlap",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Specifies the number of overlapping characters between consecutive text chunks. Adjusting this helps maintain context across chunks, which is crucial for accuracy in retrieval-augmented generation systems.",
                        "default": 0
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
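
The input properties in the schema above come with defaults, enums, and minimum values. As a rough illustration, the sketch below assembles a run input from those defaults and checks the constraints locally before anything is sent to the platform. `build_run_input` is a hypothetical helper written for this example and is not part of any Apify library. It covers only the properties shown in this part of the schema, so connection settings defined earlier are left out, and the collection name used in the example is a placeholder.

```python
import copy
from typing import Any

# Enum values copied from the input schema above.
ALLOWED_PROVIDERS = {"OpenAI", "Cohere"}
ALLOWED_STRATEGIES = {"add", "upsert", "deltaUpdates"}

# Default values copied from the input schema above.
SCHEMA_DEFAULTS: dict[str, Any] = {
    "embeddingsProvider": "OpenAI",
    "datasetFields": ["text"],
    "dataUpdatesStrategy": "deltaUpdates",
    "dataUpdatesPrimaryDatasetFields": ["url"],
    "deleteExpiredObjects": True,
    "expiredObjectDeletionPeriodDays": 30,
    "performChunking": True,
    "chunkSize": 2000,
    "chunkOverlap": 0,
}

# Integer properties and their "minimum" values from the schema.
INTEGER_MINIMUMS = {
    "chunkSize": 1,
    "chunkOverlap": 0,
    "expiredObjectDeletionPeriodDays": 0,
}


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the schema defaults and validate."""
    run_input = copy.deepcopy(SCHEMA_DEFAULTS)
    run_input.update(overrides)

    if run_input["embeddingsProvider"] not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unsupported embeddingsProvider: {run_input['embeddingsProvider']!r}")
    if run_input["dataUpdatesStrategy"] not in ALLOWED_STRATEGIES:
        raise ValueError(f"Unsupported dataUpdatesStrategy: {run_input['dataUpdatesStrategy']!r}")

    for name, minimum in INTEGER_MINIMUMS.items():
        value = run_input[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < minimum:
            raise ValueError(f"{name} must be an integer >= {minimum}, got {value!r}")

    return run_input


if __name__ == "__main__":
    example = build_run_input(
        milvusCollectionName="my_collection",  # placeholder name
        embeddingsConfig={"model": "text-embedding-3-small"},
        datasetFields=["text", "url"],
        chunkSize=1000,
    )
    print(example["dataUpdatesStrategy"], example["chunkSize"])
```

A dictionary built this way could then be passed as `run_input` to the Python client, for example `ApifyClient(token).actor("apify/milvus-integration").call(run_input=...)`, once the remaining required fields and credentials are added.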

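The `chunkSize` and `chunkOverlap` properties are described above as the maximum number of characters per chunk and the number of characters shared by consecutive chunks. The sketch below shows one simple way to read those two numbers, using a plain sliding window over characters. It is an illustration only: `chunk_text` is a hypothetical function, and the Actor's actual splitter may behave differently, for instance by preferring to break at separators.

```python
def chunk_text(text: str, chunk_size: int = 2000, chunk_overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters. The defaults mirror
    the schema defaults (2000 and 0).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if chunk_overlap < 0 or chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
    # prints ['abcd', 'defg', 'ghij']
```

With an overlap of 1, the last character of each chunk is repeated as the first character of the next, which is how overlap carries context across chunk boundaries.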