# Self Learning Postgres DB (`ruv/self-learning-postgres-db`) Actor

Self-learning vector database with GNN-powered index optimization. Features: vector search, RAG queries, embeddings, clustering, deduplication, batch ops, and data import/export. Scales with Raft consensus.

- **URL**: https://apify.com/ruv/self-learning-postgres-db.md
- **Developed by:** [Reuven Cohen](https://apify.com/ruv) (community)
- **Categories:** Agents, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

from $0.30 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Self-Learning Postgres DB - Vector Database for AI Agents

A distributed vector database that **truly learns**. Store embeddings, query with semantic search, and let the index improve itself through TRM (Tiny Recursive Models), SONA (Self-Optimizing Neural Architecture), and Graph Neural Networks.

[![Apify Actor](https://img.shields.io/badge/Apify-Actor-blue)](https://apify.com/ruv/self-learning-postgres-db)
[![PostgreSQL 17](https://img.shields.io/badge/PostgreSQL-17.7-blue)](https://www.postgresql.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Version](https://img.shields.io/badge/version-2.1-green)](https://github.com/ruvnet/ruvector)

### Key AI Features

| Feature | Description |
|---------|-------------|
| **TRM** | 7M parameter recursive reasoning (83% on GSM8K) |
| **SONA** | 3-tier learning (Instant/Background/Deep) |
| **EWC++** | Anti-forgetting protection (λ=2000) |
| **GNN** | Graph Neural Network index optimization |
| **Trajectory Tracking** | Learn from query patterns |

---

### Features

**38 Operations** for complete vector database management:

- **Semantic Search** - Find documents by meaning, not just keywords
- **Batch Operations** - Insert and search thousands of documents efficiently
- **Hybrid Search** - Combine vector similarity with keyword matching
- **RAG Support** - Built-in Retrieval-Augmented Generation queries
- **Self-Learning** - GNN training for index optimization
- **Clustering** - K-means document clustering
- **Deduplication** - Find and remove duplicate content
- **Export/Import** - JSON and CSV data migration

**Zero Setup Required:**
- Embedded PostgreSQL with ruvector extension
- Local AI embeddings (no OpenAI API key needed)
- Automatic table and index creation

---

### Quick Start (30 Seconds)

#### Full Demo

```json
{
  "action": "full_workflow",
  "query": "How does machine learning work?",
  "documents": [
    {"content": "Machine learning is AI that learns patterns from data.", "metadata": {"category": "AI"}},
    {"content": "PostgreSQL is a powerful relational database.", "metadata": {"category": "Database"}},
    {"content": "Neural networks consist of layers of nodes.", "metadata": {"category": "AI"}},
    {"content": "Vector databases store embeddings for similarity search.", "metadata": {"category": "Database"}}
  ]
}
````

**Result:** Documents ranked by semantic relevance to your query.

***

### All 38 Actions

#### Document Operations

| Action | Description |
|--------|-------------|
| `insert` | Add documents with auto-generated embeddings |
| `batch_insert` | Efficiently insert large document sets |
| `get` | Retrieve single document by ID |
| `list` | List documents with filtering |
| `update` | Modify existing document content/metadata |
| `delete` | Remove documents by ID, IDs, or filter |
| `upsert` | Insert or update (smart merge) |

#### Search Operations

| Action | Description |
|--------|-------------|
| `search` | Semantic similarity search |
| `batch_search` | Multiple queries in one call |
| `hybrid_search` | Vector + BM25 keyword combined |
| `multi_query_search` | Aggregate results from multiple queries |
| `mmr_search` | Maximal Marginal Relevance (diverse results) |
| `graph_search` | Graph-based similarity traversal |
| `range_search` | All results within distance threshold |
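
For readers unfamiliar with the technique behind `mmr_search`, here is a minimal sketch of Maximal Marginal Relevance in plain Python. It illustrates the general algorithm only and is not taken from the Actor's source; the function names and the `lambda_mult` trade-off parameter are assumptions introduced for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, top_k, lambda_mult=0.5):
    """Return indices of documents chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalise similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Two near-identical documents and one different document.
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
picked = mmr([1.0, 0.0], docs, top_k=2, lambda_mult=0.3)
```

With a low `lambda_mult`, the second pick skips the near-duplicate at index 1 in favour of the more distinct document at index 2; with `lambda_mult=1.0` the result is a plain relevance ranking.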

#### Table Operations

| Action | Description |
|--------|-------------|
| `create_table` | Create new vector collection |
| `drop_table` | Delete collection |
| `list_tables` | Show all vector tables |
| `table_stats` | Collection statistics and metrics |
| `create_index` | Add HNSW or IVFFlat index |
| `reindex` | Rebuild indexes |

#### Self-Learning / GNN / SONA

| Action | Description |
|--------|-------------|
| `train_gnn` | Train Graph Neural Network on data |
| `optimize_index` | Auto-tune HNSW parameters |
| `analyze_patterns` | Analyze data distribution |
| `sona_learn` | Trigger TRM/SONA background learning cycle |
| `sona_status` | Check SONA learning status and capabilities |

#### Clustering & Deduplication

| Action | Description |
|--------|-------------|
| `cluster` | K-means document clustering |
| `find_duplicates` | Detect similar document pairs |
| `deduplicate` | Remove duplicate documents |

#### Data Operations

| Action | Description |
|--------|-------------|
| `export` | Export to JSON or CSV |
| `import` | Import from JSON data |

#### AI / RAG

| Action | Description |
|--------|-------------|
| `rag_query` | Build RAG context from search results |
| `summarize` | Document statistics and previews |

#### Utility

| Action | Description |
|--------|-------------|
| `ping` | Test database connection |
| `version` | Get version and feature info |
| `embedding_models` | List available models |
| `generate_embedding` | Create embeddings without storing |
| `similarity` | Compare similarity of two texts |

***

### Use Cases

#### 1. AI Agent Memory

```json
{
  "action": "insert",
  "tableName": "agent_memory",
  "documents": [
    {"content": "User prefers dark mode", "metadata": {"user_id": "123", "type": "preference"}},
    {"content": "User asked about Python tutorials", "metadata": {"user_id": "123", "type": "history"}}
  ]
}
```

Retrieve memories:

```json
{
  "action": "search",
  "tableName": "agent_memory",
  "query": "What does this user like?",
  "filter": "metadata->>'user_id' = '123'"
}
```
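
Because `filter` is passed as a SQL WHERE clause, values that originate from user data should be quoted before they are embedded in it. The sketch below builds the same filter string as the example above. `metadata_filter` is a hypothetical helper, not part of the Actor or the Apify client, and it assumes PostgreSQL's default string-literal rules, where a single quote is escaped by doubling it.

```python
import re


def metadata_filter(key, value):
    """Build a filter string like  metadata->>'user_id' = '123'.

    Hypothetical helper: the key is restricted to identifier characters and
    single quotes in the value are doubled (standard SQL string escaping).
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", key):
        raise ValueError(f"unsafe metadata key: {key!r}")
    escaped = str(value).replace("'", "''")
    return f"metadata->>'{key}' = '{escaped}'"


search_input = {
    "action": "search",
    "tableName": "agent_memory",
    "query": "What does this user like?",
    "filter": metadata_filter("user_id", "123"),
}
```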

#### 2. RAG Pipeline

```json
{
  "action": "rag_query",
  "query": "How do I return a product?",
  "topK": 5,
  "ragMaxTokens": 2000
}
```

Returns context ready to feed to your LLM.
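
As a rough illustration of how a token budget such as `ragMaxTokens` can bound the context handed to an LLM, the sketch below concatenates result texts until an approximate limit is reached. The result shape (dicts with a `content` key), the four-characters-per-token heuristic, and the sample texts are all assumptions made for this example; the Actor's actual output format and token counting are not documented here.

```python
def build_context(results, max_tokens=2000, chars_per_token=4):
    """Concatenate result texts until an approximate token budget is reached.

    Assumptions: each result is a dict with a "content" string, and one token
    is roughly four characters.
    """
    budget = max_tokens * chars_per_token
    parts = []
    used = 0
    for item in results:
        text = item["content"]
        if used + len(text) > budget:
            break
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)


# Made-up sample results, ordered by relevance.
results = [
    {"content": "Items can be returned within 30 days."},
    {"content": "Refunds are issued to the original payment method."},
]
context = build_context(results, max_tokens=2000)
prompt = (
    "Answer using only this context:\n\n"
    f"{context}\n\n"
    "Question: How do I return a product?"
)
```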

#### 3. Batch Document Processing

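```json
{
  "action": "batch_insert",
  "batchSize": 100,
  "documents": [
    {"content": "First document text", "metadata": {"source": "batch"}},
    {"content": "Second document text", "metadata": {"source": "batch"}}
  ]
}
```

The `documents` array can hold thousands of entries; only two are shown here.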
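
For large document sets it is usually easier to generate the input than to write it by hand. The following sketch builds a `batch_insert` input from an iterable of strings. The helper name and the `position` metadata key are hypothetical, chosen only for illustration.

```python
import json


def build_batch_insert(texts, batch_size=100, table_name="documents"):
    """Build a batch_insert input from an iterable of plain strings."""
    documents = [
        {"content": text, "metadata": {"position": i}}
        for i, text in enumerate(texts)
        if text.strip()  # skip empty entries
    ]
    return {
        "action": "batch_insert",
        "tableName": table_name,
        "batchSize": batch_size,
        "documents": documents,
    }


payload = build_batch_insert(f"Document number {n}" for n in range(2500))
body = json.dumps(payload)  # serialized Actor input
```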

#### 4. Find & Remove Duplicates

```json
{
  "action": "find_duplicates",
  "similarityThreshold": 0.95
}
```

Then:

```json
{
  "action": "deduplicate",
  "similarityThreshold": 0.95
}
```
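
To make the meaning of `similarityThreshold` concrete, here is a brute-force sketch of threshold-based duplicate detection over embeddings. It assumes cosine similarity as the measure and keeps the earlier document of each pair; how the Actor performs the comparison internally is not described in this README, so treat this as an illustration of the idea only.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicate_pairs(embeddings, threshold=0.95):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(embeddings), 2):
        sim = cosine(a, b)
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


def deduplicate(embeddings, threshold=0.95):
    """Return indices to keep, dropping the later document of each pair."""
    drop = {j for _, j, _ in find_duplicate_pairs(embeddings, threshold)}
    return [i for i in range(len(embeddings)) if i not in drop]


vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
pairs = find_duplicate_pairs(vectors)
kept = deduplicate(vectors)
```

The pairwise loop is quadratic in the number of documents, so it is only suitable for small sets.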

#### 5. Document Clustering

```json
{
  "action": "cluster",
  "numClusters": 10,
  "clusteringAlgorithm": "kmeans"
}
```
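
For context, k-means (the algorithm named by `clusteringAlgorithm`) alternates between assigning each vector to its nearest centroid and moving each centroid to the mean of its members. The sketch below is a textbook version in plain Python, where `k` plays the role of `numClusters`. The simple first-`k` initialization is an assumption made to keep the example deterministic and is not how the Actor is known to initialize clusters.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns a cluster index for each point.

    Centroids start at the first k points, so the result is deterministic but
    sensitive to input order (production code usually uses k-means++).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


points = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]
labels = kmeans(points, k=2)
```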

#### 6. Index Optimization

```json
{
  "action": "optimize_index",
  "enableLearning": true
}
```

#### 7. SONA Self-Learning

Check learning status:

```json
{
  "action": "sona_status"
}
```

Trigger learning cycle:

```json
{
  "action": "sona_learn",
  "ewcLambda": 2000,
  "patternThreshold": 0.7
}
```
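
To give a feel for what `ewcLambda` controls, the sketch below computes the standard Elastic Weight Consolidation penalty, `(lambda / 2) * sum(F_i * (theta_i - theta*_i)^2)`, where `F_i` is the Fisher information of weight `i` and `theta*` are the previously learned weights. This is the textbook EWC regulariser; the EWC++ variant used by SONA may differ in details that are not documented here, and the numbers are made up.

```python
def ewc_penalty(params, anchor_params, fisher, ewc_lambda=2000.0):
    """Standard EWC regulariser: (lambda / 2) * sum(F_i * (theta_i - theta*_i)^2)."""
    return 0.5 * ewc_lambda * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


# The first weight has high Fisher information (important to old knowledge),
# the second has low Fisher information.
fisher = [0.9, 0.001]
anchor = [1.0, 0.0]

important = ewc_penalty([1.1, 0.0], anchor, fisher)    # move the important weight
unimportant = ewc_penalty([1.0, 0.1], anchor, fisher)  # move the unimportant weight
```

Moving the important weight by 0.1 costs about 9.0, while the same move on the unimportant weight costs about 0.01. A larger `ewcLambda` scales both penalties up, which is why higher values preserve more learned knowledge.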

***

### Parameters Reference

#### Core Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `action` | string | `full_workflow` | Operation to perform |
| `connectionString` | string | embedded | PostgreSQL URL for persistence |
| `tableName` | string | `documents` | Table/collection name |

#### Search Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | - | Natural language search query |
| `queryVector` | array | - | Pre-computed embedding vector |
| `topK` | integer | 10 | Number of results |
| `distanceMetric` | string | `cosine` | `cosine`, `l2`, `inner_product`, `manhattan` |
| `filter` | string | - | SQL WHERE clause |
| `minScore` | number | 0 | Minimum similarity score (0-1) |
| `maxDistance` | number | - | Maximum distance threshold |

#### Embedding Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `embeddingModel` | string | `all-MiniLM-L6-v2` | AI embedding model |
| `generateEmbeddings` | boolean | true | Auto-generate embeddings |
| `dimensions` | integer | 384 | Vector dimensions |

#### Index Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `indexType` | string | `hnsw` | hnsw, ivfflat, none |
| `hnswM` | integer | 16 | HNSW max connections |
| `hnswEfConstruction` | integer | 64 | HNSW build quality |
| `hnswEfSearch` | integer | 100 | HNSW search quality |
| `ivfLists` | integer | 100 | IVFFlat partitions |

#### GNN Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enableLearning` | boolean | false | Enable self-learning |
| `learningRate` | number | 0.01 | GNN learning rate |
| `gnnLayers` | integer | 2 | GNN layer count |
| `trainEpochs` | integer | 10 | Training epochs |

#### SONA / TRM Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `sonaEnabled` | boolean | true | Enable TRM/SONA self-learning |
| `ewcLambda` | number | 2000 | EWC++ anti-forgetting strength |
| `patternThreshold` | number | 0.7 | Pattern recognition confidence |
| `maxTrajectories` | integer | 100 | Max trajectory steps to track |
| `sonaLearningTiers` | array | `["instant", "background"]` | Learning tiers to enable |

#### Clustering Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `numClusters` | integer | 10 | K-means clusters |
| `similarityThreshold` | number | 0.95 | Duplicate detection threshold |

***

### Embedding Models

| Model | Dimensions | Speed | Quality | Best For |
|-------|------------|-------|---------|----------|
| `all-MiniLM-L6-v2` | 384 | Fast | Good | Prototyping |
| `bge-small-en-v1.5` | 384 | Fast | Excellent | Production |
| `bge-base-en-v1.5` | 768 | Medium | Better | High accuracy |
| `nomic-embed-text-v1` | 768 | Medium | Good | Long documents (8K) |
| `gte-small` | 384 | Fast | Good | General use |
| `e5-small-v2` | 384 | Fast | Good | Multilingual |
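
Since `dimensions` has to agree with the chosen `embeddingModel`, a small client-side check can catch mismatches before a run starts. The sketch below copies the dimensions from the table above. The helper itself is hypothetical, and whether the Actor performs its own validation is not stated in this README.

```python
# Dimensions copied from the Embedding Models table.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "bge-small-en-v1.5": 384,
    "bge-base-en-v1.5": 768,
    "nomic-embed-text-v1": 768,
    "gte-small": 384,
    "e5-small-v2": 384,
}


def with_matching_dimensions(actor_input):
    """Return a copy of the input whose `dimensions` fits `embeddingModel`."""
    model = actor_input.get("embeddingModel", "all-MiniLM-L6-v2")
    if model not in MODEL_DIMENSIONS:
        raise ValueError(f"unknown embedding model: {model}")
    return {
        **actor_input,
        "embeddingModel": model,
        "dimensions": MODEL_DIMENSIONS[model],
    }


fixed = with_matching_dimensions(
    {"action": "insert", "embeddingModel": "bge-base-en-v1.5", "dimensions": 384}
)
```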

***

### Persistent Storage

#### Hybrid Persistence Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Actor Run                            │
│  ┌──────────────┐    ┌──────────────┐    ┌───────────┐ │
│  │ Key-Value    │───▶│ Embedded     │───▶│ Key-Value │ │
│  │ Store (load) │    │ PostgreSQL   │    │ (save)    │ │
│  └──────────────┘    └──────────────┘    └───────────┘ │
│       START              WORK               END         │
└─────────────────────────────────────────────────────────┘
```

**Flow:**

1. **On Start** → Load documents from Key-Value Store into embedded PostgreSQL
2. **During Run** → Full vector search capabilities (HNSW, cosine, etc.)
3. **On End** → Export documents back to Key-Value Store
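
The three steps above can be sketched as a load, work, save cycle. In this illustration a plain `dict` stands in for the Key-Value Store and a Python list stands in for the embedded PostgreSQL table; the record key `documents` and the JSON serialization are assumptions, not the Actor's documented storage format.

```python
import json


def run_with_persistence(kv_store, work, key="documents"):
    """Load -> work -> save, mirroring the three steps of the flow.

    `kv_store` is any dict-like object standing in for the Key-Value Store and
    `work` is a callable that reads or mutates the in-memory document list.
    """
    documents = json.loads(kv_store.get(key, "[]"))  # 1. on start: load
    result = work(documents)                         # 2. during run: query or modify
    kv_store[key] = json.dumps(documents)            # 3. on end: save
    return result


store = {}
# First "run" inserts a document, second "run" sees it again.
run_with_persistence(store, lambda docs: docs.append({"content": "User prefers dark mode"}))
count = run_with_persistence(store, lambda docs: len(docs))
```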

#### Storage Options Comparison

| Feature | External PostgreSQL | Apify Key-Value Store |
|---------|---------------------|----------------------|
| Setup required | Yes | No |
| Cost | Separate billing | Included in Apify |
| Max size | Unlimited | ~9GB per store |
| Cold start | Fast | Slower (load data) |
| Best for | Large/production | Small-medium datasets |

#### External PostgreSQL

For persistent storage with an external database:

```json
{
  "connectionString": "postgresql://user:password@host:5432/database",
  "action": "search",
  "query": "Your query"
}
```
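
Passwords that contain characters such as `@` or `/` must be percent-encoded inside a connection URL. The sketch below assembles `connectionString` with Python's standard library; `postgres_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import quote


def postgres_url(user, password, host, database, port=5432):
    """Build a postgresql:// URL, percent-encoding user, password and database."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


actor_input = {
    "connectionString": postgres_url("user", "p@ss/word", "host", "database"),
    "action": "search",
    "query": "Your query",
}
```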

**Supported:**

- PostgreSQL 14+ with ruvector extension
- PostgreSQL with pgvector (compatibility mode)
- Supabase, Neon, AWS RDS, etc.

***

### API Integration

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("your-api-token")
run = client.actor("ruv/self-learning-postgres-db").call(run_input={
    "action": "search",
    "query": "machine learning basics",
    "topK": 5
})
results = client.dataset(run["defaultDatasetId"]).list_items().items
```

#### JavaScript

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'your-api-token' });
const run = await client.actor('ruv/self-learning-postgres-db').call({
    action: 'search',
    query: 'machine learning basics',
    topK: 5
});
const { items: results } = await client.dataset(run.defaultDatasetId).listItems();
```

#### cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/ruv~self-learning-postgres-db/runs" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "search",
    "query": "machine learning",
    "topK": 10
  }'
```
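
The cURL call above starts a run and returns information about it without waiting for results. If you prefer one request that waits and returns the dataset items, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification can be called with only the Python standard library, as sketched below. The helper names and the 300-second timeout are choices made for this example.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "ruv~self-learning-postgres-db"


def build_sync_request(token, actor_input):
    """Prepare a POST that runs the Actor and returns dataset items in one call."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_sync(token, actor_input, timeout=300):
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_sync_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_sync_request(
    "YOUR_TOKEN", {"action": "search", "query": "machine learning", "topK": 10}
)
# items = run_sync("YOUR_TOKEN", {...})  # uncomment to actually run the Actor
```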

***

### Performance

Built on PostgreSQL 17.7 with AVX-512 SIMD acceleration:

| Dataset Size | Search Time | Accuracy |
|--------------|-------------|----------|
| 10,000 docs | ~0.3ms | 99%+ |
| 100,000 docs | ~0.5ms | 99%+ |
| 1,000,000 docs | ~1.2ms | 98%+ |

***

### Pricing (Apify Pay-per-event)

#### Core Events

| Event | Price | Description |
|-------|-------|-------------|
| Actor Start | $0.001 | Per GB memory used |
| Document Insert | $0.001 | Per document stored |
| Vector Search | $0.001 | Per search query |
| Result | $0.0005 | Per result returned |

#### Advanced Operations

| Event | Price | Description |
|-------|-------|-------------|
| Batch Operation | $0.002 | Per batch insert/search |
| RAG Query | $0.002 | Per RAG context build |
| GNN Training | $0.01 | Per training session |
| Clustering | $0.005 | Per cluster operation |
| Deduplication | $0.003 | Per dedupe run |
| Data Export | $0.002 | Per export |
| Data Import | $0.002 | Per import |
| Table Operation | $0.001 | Create/drop table |
| Index Operation | $0.002 | Create/optimize index |
| Similarity Check | $0.001 | Per comparison |
| Embedding Generation | $0.001 | Per embedding |

**Volume Discounts:**

- Bronze: 14% off results
- Silver: 26% off results
- Gold: 40% off results
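
As a worked example of the arithmetic, the sketch below applies each discount tier to the per-result price from the Core Events table. It assumes the discount applies to Result events only, as the list above states, and that other events are charged separately; the `none` tier label is introduced here for the undiscounted price.

```python
RESULT_PRICE = 0.0005  # USD per result, from the Core Events table
DISCOUNTS = {"none": 0.0, "bronze": 0.14, "silver": 0.26, "gold": 0.40}


def results_cost(num_results, tier="none"):
    """Cost in USD of `num_results` Result events after the tier discount."""
    return num_results * RESULT_PRICE * (1 - DISCOUNTS[tier])


# Price per 1,000 results for each tier, rounded to cents.
per_thousand = {tier: round(results_cost(1000, tier), 2) for tier in DISCOUNTS}
```

The Gold figure works out to $0.30 per 1,000 results, which matches the "from $0.30 / 1,000 results" price quoted at the top of this page.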

***

### Development

#### Local Testing

```bash
# Start ruvector-postgres
docker run -d --name ruvector-pg -e POSTGRES_PASSWORD=secret -p 5432:5432 ruvnet/ruvector-postgres:latest

# Run tests
DATABASE_URL="postgresql://postgres:secret@localhost:5432/postgres" npm test
```

#### Deployment

```bash
# Set your API token in root .env
echo "APIFY_API_TOKEN=your_token" >> ../../../.env

# Deploy
npm run deploy
```

***

### Links

- [GitHub Repository](https://github.com/ruvnet/ruvector)
- [Apify Store](https://apify.com/ruv/self-learning-postgres-db)
- [Docker Image](https://hub.docker.com/r/ruvnet/ruvector-postgres)
- [RuVector Documentation](https://github.com/ruvnet/ruvector/tree/main/crates/ruvector-postgres)

***

### Support

- [Open an Issue](https://github.com/ruvnet/ruvector/issues)
- [Apify Community](https://discord.gg/apify)

***

**Built with RuVector** - High-performance vector search with TRM/SONA self-learning for the AI era.

# Actor input Schema

## `action` (type: `string`):

The operation to perform on the vector database

## `connectionString` (type: `string`):

PostgreSQL connection URL. Leave empty for embedded database (non-persistent). For persistent storage, use your own PostgreSQL with ruvector/pgvector extension.

## `tableName` (type: `string`):

Name of the vector table (collection)

## `query` (type: `string`):

Natural language query for semantic search. The AI understands meaning, not just keywords.

## `queryVector` (type: `array`):

Pre-computed embedding vector (alternative to query text). Use with external embedding APIs.

## `documents` (type: `array`):

Documents to insert. Each should have 'content' and optional 'metadata' and 'embedding'.

## `topK` (type: `integer`):

Maximum number of results to return

## `distanceMetric` (type: `string`):

How to measure vector similarity
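
For reference, the four metric names accepted by this parameter (`cosine`, `l2`, `inner_product`, `manhattan`) correspond to the standard definitions sketched below. These are the textbook formulas; how the Actor converts a distance into the score compared against `minScore` is not documented here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; unlike the distances, larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 0.0], [0.0, 1.0]
metrics = {
    "cosine": cosine_distance(a, b),
    "l2": l2_distance(a, b),
    "inner_product": inner_product(a, b),
    "manhattan": manhattan_distance(a, b),
}
```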

## `filter` (type: `string`):

SQL WHERE clause for filtering. Example: metadata->>'category' = 'AI'

## `minScore` (type: `number`):

Minimum similarity score threshold (0-1)

## `maxDistance` (type: `number`):

Maximum distance threshold for range search

## `includeEmbeddings` (type: `boolean`):

Include embedding vectors in results (increases response size)

## `includeMetadata` (type: `boolean`):

Include metadata in results

## `embeddingModel` (type: `string`):

AI model for generating text embeddings. No API key needed - runs locally!

## `generateEmbeddings` (type: `boolean`):

Auto-generate embeddings for documents without them

## `dimensions` (type: `integer`):

Embedding dimensions (384 for MiniLM/BGE-small, 768 for larger models)

## `indexType` (type: `string`):

Vector index algorithm for faster search

## `hnswM` (type: `integer`):

Max connections per node. Higher = better recall, more memory

## `hnswEfConstruction` (type: `integer`):

Index build quality. Higher = better index, slower build

## `hnswEfSearch` (type: `integer`):

Search quality. Higher = better recall, slower search

## `ivfLists` (type: `integer`):

Number of IVF partitions for IVFFlat index

## `hybridWeight` (type: `number`):

Balance between vector (1.0) and keyword (0.0) search
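
One common way to realize such a weight is a linear blend of the two scores, sketched below. This assumes both scores are normalized to the 0-1 range and that the Actor combines them linearly, neither of which is confirmed by this documentation; the document IDs and scores are made up.

```python
def hybrid_score(vector_score, keyword_score, hybrid_weight=0.7):
    """Linear blend: 1.0 = vector only, 0.0 = keyword only."""
    if not 0.0 <= hybrid_weight <= 1.0:
        raise ValueError("hybrid_weight must be between 0 and 1")
    return hybrid_weight * vector_score + (1.0 - hybrid_weight) * keyword_score


# (vector_score, keyword_score) per document
candidates = {
    "doc_a": (0.90, 0.10),  # strong semantic match, weak keyword match
    "doc_b": (0.40, 0.95),  # weak semantic match, strong keyword match
}


def rank(weight):
    """Order document IDs by blended score, best first."""
    return sorted(
        candidates,
        key=lambda d: hybrid_score(*candidates[d], hybrid_weight=weight),
        reverse=True,
    )


vector_leaning = rank(0.7)
keyword_leaning = rank(0.2)
```

With a weight of 0.7 the semantically closer `doc_a` ranks first; lowering the weight to 0.2 puts the keyword-heavy `doc_b` ahead.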

## `batchSize` (type: `integer`):

Documents per batch for batch operations

## `documentId` (type: `integer`):

Single document ID for get/update/delete operations

## `documentIds` (type: `array`):

Multiple document IDs for batch delete

## `updates` (type: `object`):

Fields to update: {content, metadata, embedding}

## `enableLearning` (type: `boolean`):

Enable self-learning index optimization

## `learningRate` (type: `number`):

GNN training learning rate

## `gnnLayers` (type: `integer`):

Number of Graph Neural Network layers

## `trainEpochs` (type: `integer`):

Number of GNN training epochs

## `numClusters` (type: `integer`):

K-means cluster count

## `clusteringAlgorithm` (type: `string`):

Clustering method

## `similarityThreshold` (type: `number`):

Threshold for duplicate detection (0-1, higher = stricter)

## `exportFormat` (type: `string`):

Data export format

## `importData` (type: `array`):

Data to import (array of documents with content and optional metadata)

## `ragMaxTokens` (type: `integer`):

Maximum context tokens for RAG query

## `ragContext` (type: `string`):

Additional context to prepend to RAG results

## `sonaEnabled` (type: `boolean`):

Enable TRM/SONA self-learning with trajectory tracking and pattern recognition

## `ewcLambda` (type: `number`):

Elastic Weight Consolidation strength for anti-forgetting protection. Higher values preserve more learned knowledge.

## `patternThreshold` (type: `number`):

Minimum confidence threshold for pattern recognition (0-1)

## `maxTrajectories` (type: `integer`):

Maximum number of trajectory steps to track for learning

## `sonaLearningTiers` (type: `array`):

SONA learning tiers to enable (instant=real-time, background=async, deep=comprehensive)

## Actor input object example

```json
{
  "action": "full_workflow",
  "connectionString": "postgresql://user:password@host:5432/database",
  "tableName": "documents",
  "query": "How does machine learning work?",
  "documents": [
    {
      "content": "Machine learning is a type of AI that learns patterns from data to make predictions.",
      "metadata": {
        "category": "AI"
      }
    },
    {
      "content": "PostgreSQL is a powerful open-source relational database.",
      "metadata": {
        "category": "Database"
      }
    },
    {
      "content": "Neural networks are inspired by the human brain and consist of layers of nodes.",
      "metadata": {
        "category": "AI"
      }
    },
    {
      "content": "Vector databases store data as mathematical embeddings for similarity search.",
      "metadata": {
        "category": "Database"
      }
    }
  ],
  "topK": 10,
  "distanceMetric": "cosine",
  "filter": "metadata->>'category' = 'AI'",
  "minScore": 0,
  "includeEmbeddings": false,
  "includeMetadata": true,
  "embeddingModel": "all-MiniLM-L6-v2",
  "generateEmbeddings": true,
  "dimensions": 384,
  "indexType": "hnsw",
  "hnswM": 16,
  "hnswEfConstruction": 64,
  "hnswEfSearch": 100,
  "ivfLists": 100,
  "hybridWeight": 0.7,
  "batchSize": 100,
  "enableLearning": false,
  "learningRate": 0.01,
  "gnnLayers": 2,
  "trainEpochs": 10,
  "numClusters": 10,
  "clusteringAlgorithm": "kmeans",
  "similarityThreshold": 0.95,
  "exportFormat": "json",
  "ragMaxTokens": 2000,
  "sonaEnabled": true,
  "ewcLambda": 2000,
  "patternThreshold": 0.7,
  "maxTrajectories": 100,
  "sonaLearningTiers": [
    "instant",
    "background"
  ]
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "action": "full_workflow",
    "documents": [
        {
            "content": "Machine learning is a type of AI that learns patterns from data to make predictions.",
            "metadata": {
                "category": "AI"
            }
        },
        {
            "content": "PostgreSQL is a powerful open-source relational database.",
            "metadata": {
                "category": "Database"
            }
        },
        {
            "content": "Neural networks are inspired by the human brain and consist of layers of nodes.",
            "metadata": {
                "category": "AI"
            }
        },
        {
            "content": "Vector databases store data as mathematical embeddings for similarity search.",
            "metadata": {
                "category": "Database"
            }
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("ruv/self-learning-postgres-db").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "action": "full_workflow",
    "documents": [
        {
            "content": "Machine learning is a type of AI that learns patterns from data to make predictions.",
            "metadata": { "category": "AI" },
        },
        {
            "content": "PostgreSQL is a powerful open-source relational database.",
            "metadata": { "category": "Database" },
        },
        {
            "content": "Neural networks are inspired by the human brain and consist of layers of nodes.",
            "metadata": { "category": "AI" },
        },
        {
            "content": "Vector databases store data as mathematical embeddings for similarity search.",
            "metadata": { "category": "Database" },
        },
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("ruv/self-learning-postgres-db").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "action": "full_workflow",
  "documents": [
    {
      "content": "Machine learning is a type of AI that learns patterns from data to make predictions.",
      "metadata": {
        "category": "AI"
      }
    },
    {
      "content": "PostgreSQL is a powerful open-source relational database.",
      "metadata": {
        "category": "Database"
      }
    },
    {
      "content": "Neural networks are inspired by the human brain and consist of layers of nodes.",
      "metadata": {
        "category": "AI"
      }
    },
    {
      "content": "Vector databases store data as mathematical embeddings for similarity search.",
      "metadata": {
        "category": "Database"
      }
    }
  ]
}' |
apify call ruv/self-learning-postgres-db --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ruv/self-learning-postgres-db",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Self Learning Postgres DB",
        "description": "Self-learning vector database with GNN-powered index optimization. Features: vector search, RAG queries, embeddings, clustering, deduplication, batch ops, and data import/export. Scales with Raft consensus.",
        "version": "2.1",
        "x-build-id": "LfI2tK0RJa7uPN8NB"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ruv~self-learning-postgres-db/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ruv-self-learning-postgres-db",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ruv~self-learning-postgres-db/runs": {
            "post": {
                "operationId": "runs-sync-ruv-self-learning-postgres-db",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ruv~self-learning-postgres-db/run-sync": {
            "post": {
                "operationId": "run-sync-ruv-self-learning-postgres-db",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "action"
                ],
                "properties": {
                    "action": {
                        "title": "Action",
                        "enum": [
                            "full_workflow",
                            "search",
                            "insert",
                            "batch_insert",
                            "get",
                            "list",
                            "update",
                            "delete",
                            "upsert",
                            "hybrid_search",
                            "multi_query_search",
                            "mmr_search",
                            "graph_search",
                            "range_search",
                            "batch_search",
                            "create_table",
                            "drop_table",
                            "list_tables",
                            "table_stats",
                            "create_index",
                            "reindex",
                            "train_gnn",
                            "optimize_index",
                            "analyze_patterns",
                            "sona_learn",
                            "sona_status",
                            "cluster",
                            "find_duplicates",
                            "deduplicate",
                            "export",
                            "import",
                            "rag_query",
                            "summarize",
                            "ping",
                            "version",
                            "embedding_models",
                            "generate_embedding",
                            "similarity"
                        ],
                        "type": "string",
                        "description": "The operation to perform on the vector database",
                        "default": "full_workflow"
                    },
                    "connectionString": {
                        "title": "Database Connection",
                        "type": "string",
                        "description": "PostgreSQL connection URL. Leave empty to use the embedded, non-persistent database. For persistent storage, connect your own PostgreSQL instance with the ruvector/pgvector extension installed."
                    },
                    "tableName": {
                        "title": "Table/Collection Name",
                        "type": "string",
                        "description": "Name of the vector table (collection)",
                        "default": "documents"
                    },
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Natural language query for semantic search. The AI understands meaning, not just keywords."
                    },
                    "queryVector": {
                        "title": "Query Vector",
                        "type": "array",
                        "description": "Pre-computed embedding vector (alternative to query text). Use with external embedding APIs."
                    },
                    "documents": {
                        "title": "Documents",
                        "type": "array",
                        "description": "Documents to insert. Each should have a 'content' field and may include optional 'metadata' and 'embedding' fields."
                    },
                    "topK": {
                        "title": "Number of Results",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of results to return",
                        "default": 10
                    },
                    "distanceMetric": {
                        "title": "Distance Metric",
                        "enum": [
                            "cosine",
                            "l2",
                            "inner_product",
                            "manhattan"
                        ],
                        "type": "string",
                        "description": "How to measure vector similarity",
                        "default": "cosine"
                    },
                    "filter": {
                        "title": "Filter",
                        "type": "string",
                        "description": "SQL WHERE clause for filtering. Example: metadata->>'category' = 'AI'"
                    },
                    "minScore": {
                        "title": "Minimum Score",
                        "minimum": 0,
                        "maximum": 1,
                        "type": "number",
                        "description": "Minimum similarity score threshold (0-1)",
                        "default": 0
                    },
                    "maxDistance": {
                        "title": "Maximum Distance",
                        "type": "number",
                        "description": "Maximum distance threshold for range search"
                    },
                    "includeEmbeddings": {
                        "title": "Include Embeddings",
                        "type": "boolean",
                        "description": "Include embedding vectors in results (increases response size)",
                        "default": false
                    },
                    "includeMetadata": {
                        "title": "Include Metadata",
                        "type": "boolean",
                        "description": "Include metadata in results",
                        "default": true
                    },
                    "embeddingModel": {
                        "title": "Embedding Model",
                        "enum": [
                            "all-MiniLM-L6-v2",
                            "bge-small-en-v1.5",
                            "bge-base-en-v1.5",
                            "nomic-embed-text-v1",
                            "gte-small",
                            "e5-small-v2"
                        ],
                        "type": "string",
                        "description": "AI model for generating text embeddings. No API key needed - runs locally!",
                        "default": "all-MiniLM-L6-v2"
                    },
                    "generateEmbeddings": {
                        "title": "Generate Embeddings",
                        "type": "boolean",
                        "description": "Auto-generate embeddings for documents without them",
                        "default": true
                    },
                    "dimensions": {
                        "title": "Vector Dimensions",
                        "minimum": 64,
                        "maximum": 4096,
                        "type": "integer",
                        "description": "Embedding dimensions (384 for MiniLM/BGE-small, 768 for larger models)",
                        "default": 384
                    },
                    "indexType": {
                        "title": "Index Type",
                        "enum": [
                            "hnsw",
                            "ivfflat",
                            "none"
                        ],
                        "type": "string",
                        "description": "Vector index algorithm for faster search",
                        "default": "hnsw"
                    },
                    "hnswM": {
                        "title": "HNSW M Parameter",
                        "minimum": 4,
                        "maximum": 64,
                        "type": "integer",
                        "description": "Max connections per node. Higher = better recall, more memory",
                        "default": 16
                    },
                    "hnswEfConstruction": {
                        "title": "HNSW ef_construction",
                        "minimum": 16,
                        "maximum": 512,
                        "type": "integer",
                        "description": "Index build quality. Higher = better index, slower build",
                        "default": 64
                    },
                    "hnswEfSearch": {
                        "title": "HNSW ef_search",
                        "minimum": 16,
                        "maximum": 512,
                        "type": "integer",
                        "description": "Search quality. Higher = better recall, slower search",
                        "default": 100
                    },
                    "ivfLists": {
                        "title": "IVF Lists",
                        "minimum": 10,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Number of IVF partitions for IVFFlat index",
                        "default": 100
                    },
                    "hybridWeight": {
                        "title": "Hybrid Weight",
                        "minimum": 0,
                        "maximum": 1,
                        "type": "number",
                        "description": "Balance between vector (1.0) and keyword (0.0) search",
                        "default": 0.7
                    },
                    "batchSize": {
                        "title": "Batch Size",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Documents per batch for batch operations",
                        "default": 100
                    },
                    "documentId": {
                        "title": "Document ID",
                        "type": "integer",
                        "description": "Single document ID for get/update/delete operations"
                    },
                    "documentIds": {
                        "title": "Document IDs",
                        "type": "array",
                        "description": "Multiple document IDs for batch delete"
                    },
                    "updates": {
                        "title": "Updates",
                        "type": "object",
                        "description": "Fields to update: {content, metadata, embedding}"
                    },
                    "enableLearning": {
                        "title": "Enable Learning",
                        "type": "boolean",
                        "description": "Enable self-learning index optimization",
                        "default": false
                    },
                    "learningRate": {
                        "title": "Learning Rate",
                        "minimum": 0.0001,
                        "maximum": 1,
                        "type": "number",
                        "description": "GNN training learning rate",
                        "default": 0.01
                    },
                    "gnnLayers": {
                        "title": "GNN Layers",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of Graph Neural Network layers",
                        "default": 2
                    },
                    "trainEpochs": {
                        "title": "Training Epochs",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Number of GNN training epochs",
                        "default": 10
                    },
                    "numClusters": {
                        "title": "Number of Clusters",
                        "minimum": 2,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Number of clusters to produce (k for K-means)",
                        "default": 10
                    },
                    "clusteringAlgorithm": {
                        "title": "Clustering Algorithm",
                        "enum": [
                            "kmeans",
                            "hierarchical"
                        ],
                        "type": "string",
                        "description": "Clustering method",
                        "default": "kmeans"
                    },
                    "similarityThreshold": {
                        "title": "Similarity Threshold",
                        "minimum": 0.5,
                        "maximum": 1,
                        "type": "number",
                        "description": "Threshold for duplicate detection (0-1, higher = stricter)",
                        "default": 0.95
                    },
                    "exportFormat": {
                        "title": "Export Format",
                        "enum": [
                            "json",
                            "csv"
                        ],
                        "type": "string",
                        "description": "Data export format",
                        "default": "json"
                    },
                    "importData": {
                        "title": "Import Data",
                        "type": "array",
                        "description": "Data to import (array of documents with content and optional metadata)"
                    },
                    "ragMaxTokens": {
                        "title": "RAG Max Tokens",
                        "minimum": 100,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum context tokens for RAG query",
                        "default": 2000
                    },
                    "ragContext": {
                        "title": "RAG Context",
                        "type": "string",
                        "description": "Additional context to prepend to RAG results"
                    },
                    "sonaEnabled": {
                        "title": "Enable SONA Learning",
                        "type": "boolean",
                        "description": "Enable TRM/SONA self-learning with trajectory tracking and pattern recognition",
                        "default": true
                    },
                    "ewcLambda": {
                        "title": "EWC Lambda",
                        "minimum": 100,
                        "maximum": 10000,
                        "type": "number",
                        "description": "Elastic Weight Consolidation strength for anti-forgetting protection. Higher values preserve more learned knowledge.",
                        "default": 2000
                    },
                    "patternThreshold": {
                        "title": "Pattern Threshold",
                        "minimum": 0.1,
                        "maximum": 1,
                        "type": "number",
                        "description": "Minimum confidence threshold for pattern recognition (0-1)",
                        "default": 0.7
                    },
                    "maxTrajectories": {
                        "title": "Max Trajectories",
                        "minimum": 10,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of trajectory steps to track for learning",
                        "default": 100
                    },
                    "sonaLearningTiers": {
                        "title": "Learning Tiers",
                        "type": "array",
                        "description": "SONA learning tiers to enable (instant=real-time, background=async, deep=comprehensive)",
                        "default": [
                            "instant",
                            "background"
                        ]
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
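
The `run-sync` operation above takes the `inputSchema` object as a JSON body and the Apify token as a `token` query parameter. The following is a minimal Python sketch of how such a request could be assembled for the `search` action, using only the standard library. The base URL `https://api.apify.com/v2` is an assumption (the server entry is not shown in this part of the definition), and the helper names `build_search_input` and `build_run_sync_request` are hypothetical, introduced only for illustration. The bounds checks mirror the `minimum`/`maximum` values declared for `topK` and `minScore`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the Apify API base URL; the spec's "servers" entry is not shown here.
API_BASE = "https://api.apify.com/v2"
# Path taken from the OpenAPI definition above.
RUN_SYNC_PATH = "/acts/ruv~self-learning-postgres-db/run-sync"


def build_search_input(query, top_k=10, min_score=0.0, table_name="documents"):
    """Hypothetical helper: build an input object for the `search` action.

    Enforces the bounds the schema declares for topK (1-1000) and minScore (0-1).
    """
    if not 1 <= top_k <= 1000:
        raise ValueError("topK must be between 1 and 1000")
    if not 0 <= min_score <= 1:
        raise ValueError("minScore must be between 0 and 1")
    return {
        "action": "search",       # the only required property
        "tableName": table_name,  # schema default is "documents"
        "query": query,
        "topK": top_k,
        "minScore": min_score,
    }


def build_run_sync_request(token, actor_input):
    """Hypothetical helper: prepare, but do not send, the POST request."""
    url = f"{API_BASE}{RUN_SYNC_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

To actually run the Actor, pass the prepared request to `urllib.request.urlopen` with a real token. The definition only declares a `200` response with description `OK` for this endpoint, so the shape of the returned body is not specified here and should be inspected rather than assumed.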