# CogniGraph Weaver (`monumental_wardrobe/cognigraph-weaver`) Actor

A powerful Apify Actor that converts web content into interactive knowledge graphs using artificial intelligence. This Python-based web crawler and AI system extracts content from websites, analyzes it with LLMs, and generates comprehensive knowledge graphs with learning paths.

- **URL**: https://apify.com/monumental\_wardrobe/cognigraph-weaver.md
- **Developed by:** [Enrique Meza](https://apify.com/monumental_wardrobe) (community)
- **Categories:** Automation
- **Stats:** 2 total users, 1 monthly users, 0.0% runs succeeded, 1 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CogniGraph Weaver

A powerful Apify Actor that converts web content into interactive knowledge graphs using artificial intelligence. This Python-based web crawler and AI system extracts content from websites, analyzes it with LLMs, and generates comprehensive knowledge graphs with learning paths.

### 🚀 Features

- **Web Crawling**: Extracts content from multiple web pages using `requests` and `BeautifulSoup`
- **AI-Powered Knowledge Graph Generation**: Uses OpenAI or OpenRouter APIs to create structured knowledge graphs
- **Graph Analysis**: Analyzes graph topology, centrality, and connectivity
- **Learning Path Generation**: Automatically generates optimized learning paths through the knowledge graph
- **Multilingual Support**: Supports English, Spanish, French, German, and more
- **Markdown Documentation**: Generates comprehensive documentation in Markdown format
- **Validation & Error Handling**: Robust input validation and comprehensive error reporting

### 📊 Architecture

````

┌─────────────────────────────────────────────────────────────┐
│                     CogniGraph Weaver                       │
├─────────────────────────────────────────────────────────────┤
│  1. Input Validation  →  2. API Keys Config  →  3. Crawling │
│                                                             │
│  4. AI Graph Generation  →  5. Graph Analysis  →  6. Docs  │
│                                                             │
│                     7. Save Results                        │
└─────────────────────────────────────────────────────────────┘

````

#### Components

1. **CrawlerService** (`services/crawler_service.py`)
   - Web content extraction
   - HTML parsing with BeautifulSoup
   - Metadata collection
   - Content aggregation

2. **AIService** (`services/ai_service.py`)
   - OpenAI API integration
   - OpenRouter API integration
   - Knowledge graph generation
   - JSON response parsing

3. **GraphService** (`services/graph_service.py`)
   - Graph analysis (centrality, density, components)
   - Learning path generation using Dijkstra's algorithm
   - Graph traversal algorithms
   - Statistical analysis

4. **Utilities**
   - **Markdown Generator** (`utils/markdown_generator.py`): Documentation generation
   - **Validators** (`utils/validators.py`): Input, graph, and path validation
   - **Translations** (`utils/translations.py`): Multilingual support

### 🛠️ Installation

#### Prerequisites

- Python 3.11+
- pip
- Apify account and API token

#### Local Development

```bash
# Change into the cloned repository
cd cogni

# Install dependencies
pip install -r requirements.txt

# Create a .env file with your API keys
cp .env.example .env
# Edit .env and add your OpenAI/OpenRouter API keys

# Run locally
python3 main.py
````

#### Deployment to Apify

```bash
# Log in to Apify
apify login

# Deploy the Actor
apify push

# Run the Actor
apify call my-actor-2 --input='{"startUrls": [{"url": "https://example.com"}], "maxPages": 1}'
```

### 🔑 API Keys

The Actor requires at least one AI provider API key:

1. **OpenAI API Key** (for OpenAI provider)
   - Get from: https://platform.openai.com/api-keys
   - Environment variable: `OPENAI_API_KEY`

2. **OpenRouter API Key** (for OpenRouter provider)
   - Get from: https://openrouter.ai/keys
   - Environment variable: `OPENROUTER_API_KEY`
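
As an illustration of how a provider name could be mapped to its key, the sketch below reads the environment variables listed above and fails early with a readable message. The function name `resolve_api_key` and the error messages are assumptions for this example, not the Actor's actual code.

```python
import os

# Environment variable names taken from the list above
ENV_VAR_BY_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def resolve_api_key(provider, env=None):
    """Return the API key for `provider`, or raise a descriptive error."""
    env = os.environ if env is None else env
    if provider not in ENV_VAR_BY_PROVIDER:
        raise ValueError(f"Unknown AI provider: {provider!r}")
    var_name = ENV_VAR_BY_PROVIDER[provider]
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to use the {provider!r} provider")
    return key


# Passing a dict instead of os.environ keeps the example self-contained
print(resolve_api_key("openrouter", {"OPENROUTER_API_KEY": "sk-or-example"}))
```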

### 📥 Input Schema

```json
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "maxPages": 5,
  "outputLanguage": "en",
  "aiProvider": "openrouter",
  "openaiModel": "gpt-oss-20b",
  "useBrowser": false,
  "generatePng": true,
  "generateHtml": true,
  "generateSvg": false,
  "generatePlotly": false,
  "pngWidth": 1200,
  "pngHeight": 800,
  "pngDpi": 100
}
```

#### Parameters

- **startUrls** (array, required): List of URLs to crawl
  - `url` (string): The URL to fetch

- **maxPages** (integer, default: 5): Maximum number of pages to crawl

- **outputLanguage** (string, default: "en"): Language for output
  - Supported: "en", "es", "fr", "de", "pt", "it", "ja", "zh", "ko", "ar"

- **aiProvider** (string, default: "openrouter"): AI provider
  - Options: "openai", "openrouter"

- **openaiModel** (string, default: "gpt-oss-20b"): Model to use
  - Examples: "gpt-4", "gpt-3.5-turbo", "gpt-oss-20b"

- **useBrowser** (boolean, default: false): Whether to use browser rendering
  - Note: Currently not implemented, uses simple HTTP requests

#### Visualization Parameters

- **generatePng** (boolean, default: true): Generate PNG image of the knowledge graph
  - Creates a static visualization with node colors and relationships

- **generateHtml** (boolean, default: true): Generate interactive HTML visualization
  - Creates a fully interactive graph using PyVis with zoom, pan, and hover features

- **generateSvg** (boolean, default: false): Generate SVG vector image
  - Creates a scalable vector graphic suitable for high-resolution displays

- **generatePlotly** (boolean, default: false): Generate Plotly interactive chart
  - Creates an advanced interactive visualization with Plotly

- **pngWidth** (integer, default: 1200): PNG image width in pixels

- **pngHeight** (integer, default: 800): PNG image height in pixels

- **pngDpi** (integer, default: 100): PNG image DPI (dots per inch)
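
The parameters above can be assembled into a run input programmatically. The following is a minimal sketch: the defaults and allowed values are copied from the lists above, while `build_run_input` and its checks (including the lower bound on `maxPages`) are assumptions and may not match the Actor's own validator.

```python
# Defaults and allowed values copied from the parameter lists above
DEFAULTS = {
    "maxPages": 5,
    "outputLanguage": "en",
    "aiProvider": "openrouter",
    "openaiModel": "gpt-oss-20b",
    "useBrowser": False,
    "generatePng": True,
    "generateHtml": True,
    "generateSvg": False,
    "generatePlotly": False,
    "pngWidth": 1200,
    "pngHeight": 800,
    "pngDpi": 100,
}
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "pt", "it", "ja", "zh", "ko", "ar"}
SUPPORTED_PROVIDERS = {"openai", "openrouter"}


def build_run_input(urls, **overrides):
    """Build an input dict for the Actor, rejecting obviously bad values."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"Unknown parameters: {unknown}")
    run_input = {"startUrls": [{"url": url} for url in urls], **DEFAULTS, **overrides}
    if run_input["outputLanguage"] not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {run_input['outputLanguage']!r}")
    if run_input["aiProvider"] not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {run_input['aiProvider']!r}")
    if run_input["maxPages"] < 1:  # assumed lower bound, not documented
        raise ValueError("maxPages must be at least 1")
    return run_input


run_input = build_run_input(["https://example.com"], maxPages=2, outputLanguage="es")
print(run_input["maxPages"], run_input["outputLanguage"], run_input["aiProvider"])
```

The resulting dict can be passed as `run_input` to `client.actor("monumental_wardrobe/cognigraph-weaver").call(...)` from the Python client.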

### 📤 Understanding the Outputs

CogniGraph Weaver generates **6 kinds of output**, stored as typed items in the run's dataset. Each one serves a specific purpose and suits different use cases.

***

### 📊 Output Datasets Overview

| Dataset | Purpose | Best For |
|---------|---------|----------|
| **Knowledge Graph (JSON)** | Structured data | Developers, Data analysis, Visualization |
| **Knowledge Graph (Markdown)** | Human-readable report | End users, Documentation, Sharing |
| **Visualizations** | Graph images & interactive charts | Presentations, Reports, Exploration |
| **Learning Paths (JSON)** | Curated learning sequences | Educators, Students, Training |
| **Analysis (JSON)** | Graph statistics & insights | Researchers, Advanced analysis |
| **Metadata (JSON)** | Processing summary | Monitoring, Debugging, Overview |

***

### 📖 Detailed Output Breakdown

#### 1. Knowledge Graph (JSON) - For Developers & Data Scientists

**Why this matters:** This is the core structured data that represents the extracted knowledge in a machine-readable format.

**Use cases:**

- Import into graph databases (Neo4j, ArangoDB)
- Create interactive visualizations (D3.js, vis.js, Cytoscape)
- Build recommendation systems
- Perform advanced graph analysis
- Train ML models on knowledge structures

**Structure:**

```json
{
  "type": "knowledge_graph",
  "format": "json",
  "data": {
    "nodes": [
      {
        "id": "unique_identifier",
        "label": "Concept Name",
        "description": "Detailed explanation of the concept",
        "type": "concept|term|topic",
        "metadata": {
          "source": "which page it came from",
          "importance": 0.85
        }
      }
    ],
    "edges": [
      {
        "source": "node_id_1",
        "target": "node_id_2",
        "relationship": "is_a|part_of|relates_to|depends_on|causes",
        "weight": 0.9,
        "description": "How these concepts relate"
      }
    ]
  }
}
```

**Example use with Python:**

```python
from apify_client import ApifyClient

# Load the knowledge graph item from the run's dataset
client = ApifyClient('YOUR_API_TOKEN')
dataset = client.dataset('DATASET_ID')
knowledge_graph = next(item for item in dataset.iterate_items()
                      if item['type'] == 'knowledge_graph' and item['format'] == 'json')

# Count the extracted concepts and relationships
nodes = knowledge_graph['data']['nodes']
edges = knowledge_graph['data']['edges']
print(f"Found {len(nodes)} concepts with {len(edges)} relationships")
```
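
Once the `data` object is loaded, a common first step is to turn the edge list into an adjacency map. This is a minimal sketch that relies only on the field names shown in the structure above; `build_adjacency` and the sample data are illustrative, not part of the Actor.

```python
def build_adjacency(graph_data):
    """Map each node ID to a list of (target_id, relationship, weight) tuples."""
    # Start with every node so that isolated nodes appear too
    adjacency = {node["id"]: [] for node in graph_data["nodes"]}
    for edge in graph_data["edges"]:
        adjacency.setdefault(edge["source"], []).append(
            (edge["target"], edge["relationship"], edge.get("weight", 1.0))
        )
    return adjacency


# Tiny hand-written graph in the same shape as the "data" object above
graph_data = {
    "nodes": [
        {"id": "ml", "label": "Machine Learning"},
        {"id": "ai", "label": "Artificial Intelligence"},
        {"id": "sl", "label": "Supervised Learning"},
    ],
    "edges": [
        {"source": "ml", "target": "ai", "relationship": "is_a", "weight": 0.95},
        {"source": "sl", "target": "ml", "relationship": "part_of", "weight": 0.88},
    ],
}

adjacency = build_adjacency(graph_data)
for node_id, neighbors in adjacency.items():
    print(node_id, "->", neighbors)
```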

***

#### 2. Knowledge Graph (Markdown) - For End Users & Documentation

**Why this matters:** A human-readable, beautifully formatted document that tells the complete story of what was extracted and analyzed.

**Use cases:**

- Share findings with non-technical stakeholders
- Create documentation for projects
- Reference materials for learning
- Report generation
- Print or export to PDF

**What's included:**

```
## CogniGraph Weaver Analysis Report
Generated: 2025-11-06

### Executive Summary
- Graph contains 15 concepts with 23 relationships
- Density: 0.21 (moderately connected)
- Main topic: Machine Learning fundamentals

### Knowledge Graph Structure
#### Nodes (Concepts)
- **Machine Learning** (Central concept)
  Description: A field of computer science...
  Type: concept
  Connected to: 7 other concepts

- **Supervised Learning**
  Description: Learning from labeled data...
  Type: concept
  Connected to: 5 other concepts

#### Relationships
- Machine Learning → **is_a** → Artificial Intelligence (weight: 0.95)
- Supervised Learning → **part_of** → Machine Learning (weight: 0.88)
...

### Learning Paths
#### Path 1: Beginner Introduction (45 minutes)
1. Start with: Artificial Intelligence
2. Then: Machine Learning
3. Next: Supervised Learning
4. Finally: Neural Networks

#### Path 2: Core Concepts (30 minutes)
1. Start with: Data
2. Then: Algorithms
3. Next: Training
4. Finally: Models

### Graph Analysis
- Most connected concepts: Machine Learning (8 connections)
- Root concepts (prerequisites): Artificial Intelligence
- Leaf concepts (end results): Deep Learning, Reinforcement Learning
- Average path length: 3.2 concepts
```

**How to access:**

- Download directly from Apify console
- Parse from dataset and save to file
- Use in CI/CD to generate automatic reports

***

#### 3. Visualizations - For Presentations & Exploration

**Why this matters:** Visual representations make complex knowledge graphs easy to understand, share, and explore. Different visualization formats serve different needs - from static reports to interactive exploration.

**Use cases:**

- Create presentations with visual knowledge maps
- Generate reports with graph images
- Interactive exploration of knowledge structures
- Share findings with non-technical stakeholders
- Embed in websites or documentation
- Print high-resolution diagrams

**Available Formats:**

**PNG Image (Base64-encoded)**

- Static visualization of the entire knowledge graph
- Color-coded nodes by type (concepts, terms, topics)
- Edge weights shown by line thickness
- Perfect for: Reports, presentations, printing
- Example access:

```python
import base64

# `dataset` is the dataset client created in the earlier example
visualization = next(item for item in dataset.iterate_items()
                    if item['type'] == 'visualization' and item['format'] == 'png')
png_data = base64.b64decode(visualization['data'])
with open('knowledge_graph.png', 'wb') as f:
    f.write(png_data)
```

**Interactive HTML (PyVis)**

- Fully interactive graph visualization
- Zoom, pan, and drag to explore
- Hover over nodes to see descriptions
- Click and drag nodes to reorganize
- Perfect for: Web embedding, interactive reports, exploration
- Example access:

```python
visualization = next(item for item in dataset.iterate_items()
                    if item['type'] == 'visualization' and item['format'] == 'html')
with open('knowledge_graph.html', 'w', encoding='utf-8') as f:
    f.write(visualization['data'])
```

**SVG Vector Graphic**

- Scalable vector format for high-resolution displays
- Infinitely scalable without quality loss
- Perfect for: Print materials, high-DPI screens, professional diagrams
- Example access:

```python
visualization = next(item for item in dataset.iterate_items()
                    if item['type'] == 'visualization' and item['format'] == 'svg')
with open('knowledge_graph.svg', 'w', encoding='utf-8') as f:
    f.write(visualization['data'])
```

**Plotly Interactive Chart**

- Advanced interactive visualization
- Enhanced hover information and controls
- Statistical overlays and metrics
- Perfect for: Data analysis, research, technical exploration
- Example access:

```python
visualization = next(item for item in dataset.iterate_items()
                    if item['type'] == 'visualization' and item['format'] == 'plotly')
with open('knowledge_graph_plotly.html', 'w', encoding='utf-8') as f:
    f.write(visualization['data'])
```

**Visualization Metadata:**
Each visualization includes metadata with:

- `width` and `height`: Image dimensions in pixels
- `dpi`: Dots per inch (for PNG)
- `nodes`: Number of nodes in the graph
- `edges`: Number of edges in the graph

**Performance Tips:**

- PNG generation is fastest and most reliable
- HTML visualizations are best for interactive exploration
- SVG is best for print or high-resolution needs
- Plotly offers the most advanced features but may be slower for large graphs
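
The per-format access examples above can be folded into one helper. The sketch below assumes, as those examples do, that PNG data is Base64-encoded and that the other formats are stored as plain text; `save_visualization` and the file names are hypothetical.

```python
import base64
from pathlib import Path

# File extension per visualization format; Plotly output is written as HTML
EXTENSIONS = {"png": ".png", "html": ".html", "svg": ".svg", "plotly": ".html"}


def save_visualization(item, out_dir="."):
    """Write one visualization dataset item to disk and return its path."""
    fmt = item.get("format")
    if item.get("type") != "visualization" or fmt not in EXTENSIONS:
        raise ValueError(f"Not a supported visualization item: {fmt!r}")
    suffix = "_plotly" if fmt == "plotly" else ""
    path = Path(out_dir) / f"knowledge_graph{suffix}{EXTENSIONS[fmt]}"
    if fmt == "png":
        # PNG data is Base64-encoded text and must be decoded to bytes
        path.write_bytes(base64.b64decode(item["data"]))
    else:
        path.write_text(item["data"], encoding="utf-8")
    return path
```

With a dataset client at hand, usage could look like `for item in dataset.iterate_items(): ...`, calling `save_visualization(item)` for every item whose `type` is `visualization`.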

***

#### 4. Learning Paths (JSON) - For Educators & Students

**Why this matters:** This is the **most valuable output** for learning. Each path is a suggested order for studying concepts, derived from their relationships in the knowledge graph.

**Use cases:**

- Create curricula for courses
- Design training programs
- Build adaptive learning systems
- Personalize education paths
- Content recommendation engines

**Example:**

```json
{
  "type": "learning_paths",
  "format": "json",
  "data": [
    {
      "path_id": "path_beginner_1",
      "title": "Introduction to Artificial Intelligence",
      "description": "Start with the fundamentals and build up to complex topics",
      "nodes": [
        "artificial_intelligence",
        "machine_learning",
        "supervised_learning",
        "neural_networks"
      ],
      "difficulty": "beginner",
      "estimated_time": "1 hour 15 minutes",
      "step_count": 4,
      "prerequisites": [],
      "learning_objectives": [
        "Understand what AI is",
        "Learn the basics of ML",
        "Grasp supervised learning concepts",
        "Introduction to neural networks"
      ]
    },
    {
      "path_id": "path_intermediate_1",
      "title": "Deep Learning Specialization",
      "description": "Dive deeper into advanced neural network architectures",
      "nodes": [
        "neural_networks",
        "deep_learning",
        "cnns",
        "rnns"
      ],
      "difficulty": "intermediate",
      "estimated_time": "2 hours 30 minutes",
      "step_count": 4,
      "prerequisites": ["machine_learning"],
      "learning_objectives": [
        "Master deep learning concepts",
        "Understand CNNs for vision",
        "Learn RNNs for sequences",
        "Build real projects"
      ]
    }
  ]
}
```

**How to use:**

```python
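# For educators: print each learning path as an ordered reading list.
# `dataset` is the dataset client created in the earlier example.
items = list(dataset.iterate_items())
learning_paths = next(item for item in items if item['type'] == 'learning_paths')
graph = next(item for item in items
             if item['type'] == 'knowledge_graph' and item['format'] == 'json')

# Map node IDs to human-readable labels from the knowledge graph
labels = {node['id']: node['label'] for node in graph['data']['nodes']}

for path in learning_paths['data']:
    print(f"\n=== {path['title']} ===")
    print(f"Difficulty: {path['difficulty']}")
    print(f"Duration: {path['estimated_time']}")
    print("\nLearning sequence:")
    for i, node_id in enumerate(path['nodes'], 1):
        print(f"  {i}. {labels.get(node_id, node_id)}")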
```
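
The `prerequisites` field can drive simple sequencing logic. The sketch below assumes that prerequisites are node IDs (as in the example, where `machine_learning` also appears in the beginner path) and returns the paths a learner is ready for; `unlocked_paths` is a hypothetical helper.

```python
def unlocked_paths(paths, completed_nodes):
    """Return the paths whose prerequisites are all in `completed_nodes`."""
    done = set(completed_nodes)
    return [path for path in paths if set(path.get("prerequisites", [])) <= done]


# Trimmed-down versions of the two example paths above
paths = [
    {
        "path_id": "path_beginner_1",
        "nodes": ["artificial_intelligence", "machine_learning"],
        "prerequisites": [],
    },
    {
        "path_id": "path_intermediate_1",
        "nodes": ["neural_networks", "deep_learning"],
        "prerequisites": ["machine_learning"],
    },
]

# A new learner has completed nothing yet
print([p["path_id"] for p in unlocked_paths(paths, [])])

# After finishing the beginner path, the intermediate path opens up
print([p["path_id"] for p in unlocked_paths(paths, paths[0]["nodes"])])
```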

***

#### 5. Analysis (JSON) - For Researchers & Advanced Analysis

**Why this matters:** Provides statistical insights about the knowledge graph structure, helping you understand the domain complexity and learning landscape.

**Use cases:**

- Research on knowledge representation
- Curriculum design optimization
- Identifying knowledge gaps
- Graph quality assessment
- Comparative analysis across domains

**Example:**

```json
{
  "type": "analysis",
  "format": "json",
  "data": {
    "stats": {
      "num_nodes": 15,
      "num_edges": 23,
      "density": 0.21,
      "is_connected": true,
      "num_components": 1,
      "avg_degree": 3.07
    },
    "root_nodes": [
      {
        "id": "artificial_intelligence",
        "label": "Artificial Intelligence",
        "degree": 7
      }
    ],
    "leaf_nodes": [
      {
        "id": "deep_learning",
        "label": "Deep Learning",
        "degree": 4
      },
      {
        "id": "reinforcement_learning",
        "label": "Reinforcement Learning",
        "degree": 3
      }
    ],
    "most_connected": [
      {
        "id": "machine_learning",
        "label": "Machine Learning",
        "degree": 8
      },
      {
        "id": "neural_networks",
        "label": "Neural Networks",
        "degree": 6
      }
    ],
    "components": [
      {
        "id": 0,
        "nodes": ["ai", "ml", "dl", "nn", "supervised", ...],
        "size": 15
      }
    ]
  }
}
```

**Interpretation guide:**

- **Density**: How interconnected the domain is (0.0 = sparse, 1.0 = fully connected)
- **Root nodes**: Foundational concepts (no incoming edges) - these should be learned first
- **Leaf nodes**: End results (no outgoing edges) - these are advanced topics
- **Most connected**: Core concepts that link many ideas together
- **Components**: Separate knowledge clusters (ideally just 1 for cohesive learning)
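
The statistics in the interpretation guide can be reproduced from the node and edge lists. This sketch is an assumption about how they might be computed, not the code in `graph_service.py`: density and average degree use the undirected formulas (which reproduce the example's `avg_degree`, since 2 × 23 / 15 ≈ 3.07), while root and leaf nodes use edge direction. It also assumes every edge endpoint appears in the node list.

```python
def analyze_graph(nodes, edges):
    """Compute basic statistics for a graph given as node and edge dicts."""
    ids = [node["id"] for node in nodes]
    n, m = len(ids), len(edges)
    in_degree = dict.fromkeys(ids, 0)
    out_degree = dict.fromkeys(ids, 0)
    neighbors = {node_id: set() for node_id in ids}
    for edge in edges:
        source, target = edge["source"], edge["target"]
        out_degree[source] += 1
        in_degree[target] += 1
        neighbors[source].add(target)  # direction is ignored for connectivity
        neighbors[target].add(source)

    # Count connected components with an iterative depth-first search
    seen, num_components = set(), 0
    for start in ids:
        if start in seen:
            continue
        num_components += 1
        stack = [start]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(neighbors[current] - seen)

    return {
        "num_nodes": n,
        "num_edges": m,
        "density": round(2 * m / (n * (n - 1)), 2) if n > 1 else 0.0,
        "avg_degree": round(2 * m / n, 2) if n else 0.0,
        "num_components": num_components,
        "is_connected": num_components == 1,
        "root_nodes": [i for i in ids if in_degree[i] == 0],
        "leaf_nodes": [i for i in ids if out_degree[i] == 0],
    }


# Toy graph; edges here point from a prerequisite to the concept that builds on it
nodes = [{"id": "ai"}, {"id": "ml"}, {"id": "sl"}, {"id": "nn"}]
edges = [
    {"source": "ai", "target": "ml"},
    {"source": "ml", "target": "sl"},
    {"source": "ml", "target": "nn"},
]
result = analyze_graph(nodes, edges)
print(result)
```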

***

#### 6. Metadata (JSON) - For Monitoring & Debugging

**Why this matters:** Provides a summary of the entire process, useful for monitoring, debugging, and understanding what happened during execution.

**Use cases:**

- Monitor Actor health
- Debug failed runs
- Track processing statistics
- Generate performance reports
- Quality assurance

**Example:**

```json
{
  "type": "metadata",
  "format": "json",
  "data": {
    "status": "success",
    "message": "Knowledge graph generated successfully",
    "timestamp": 1731000000.0,
    "stats": {
      "pages_processed": 3,
      "nodes": 15,
      "edges": 23,
      "learning_paths": 3,
      "content_length": 45230,
      "execution_time_seconds": 18.5,
      "ai_provider": "openrouter",
      "model": "gpt-oss-20b"
    },
    "source_urls": [
      {"url": "https://example.com/article1"},
      {"url": "https://example.com/article2"},
      {"url": "https://example.com/article3"}
    ],
    "warnings": [],
    "errors": []
  }
}
```
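
For monitoring, the metadata item can be reduced to a pass/fail check. The sketch below uses only the fields shown in the example; treating any `status` other than `"success"` as a failure is an assumption, and `check_run` is a hypothetical helper.

```python
def check_run(metadata_item):
    """Raise if the run reported problems, otherwise return a one-line summary."""
    data = metadata_item["data"]
    problems = [str(error) for error in data.get("errors", [])]
    if data.get("status") != "success":  # assumed: anything else means failure
        problems.insert(0, f"status is {data.get('status')!r}: {data.get('message', '')}")
    if problems:
        raise RuntimeError("; ".join(problems))
    stats = data.get("stats", {})
    return (
        f"{stats.get('pages_processed', 0)} pages, "
        f"{stats.get('nodes', 0)} nodes, {stats.get('edges', 0)} edges, "
        f"{len(data.get('warnings', []))} warnings"
    )


metadata_item = {
    "type": "metadata",
    "format": "json",
    "data": {
        "status": "success",
        "message": "Knowledge graph generated successfully",
        "stats": {"pages_processed": 3, "nodes": 15, "edges": 23},
        "warnings": [],
        "errors": [],
    },
}
print(check_run(metadata_item))
```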

***

### 🎯 How to Use the Outputs

#### For End Users (Non-Technical)

1. **Start with the Markdown report** - It's the most readable
2. **Review the learning paths** - Choose one that matches your level
3. **Follow the recommended sequence** - Use it as a study guide
4. **Check the analysis section** - Understand the difficulty level

#### For Developers

1. **Parse the JSON outputs** for integration into your systems
2. **Use the Graph JSON** with visualization libraries (D3.js, vis.js, etc.)
3. **Implement learning path recommendations** in your app
4. **Store in databases** for further querying and analysis

#### For Educators & Trainers

1. **Use Learning Paths JSON** to design curricula
2. **Extract concepts** to create lesson plans
3. **Use difficulty levels** to segment audiences
4. **Track prerequisites** for proper sequencing

#### For Researchers

1. **Analyze the Graph structure** for domain insights
2. **Compare graphs** across different sources
3. **Study the relationship types** for knowledge representation
4. **Research learning path algorithms**

***

### 💡 Practical Example: Building a Learning Platform

Here's how you might use all outputs together:

```python
# 1. Get all dataset items (`dataset` is the dataset client from the earlier examples)
dataset_items = list(dataset.iterate_items())

# 2. Extract each component
# find_by_type, parse_time, and save_course are placeholder helpers you implement yourself
graph_json = find_by_type(dataset_items, 'knowledge_graph', 'json')
learning_paths = find_by_type(dataset_items, 'learning_paths', 'json')
analysis = find_by_type(dataset_items, 'analysis', 'json')
markdown_doc = find_by_type(dataset_items, 'knowledge_graph', 'markdown')

# 3. Build a learning platform
for path in learning_paths['data']:
    # Create course from learning path
    course = {
        'title': path['title'],
        'difficulty': path['difficulty'],
        'estimated_hours': parse_time(path['estimated_time']),
        'modules': []
    }

    # Get detailed info for each node from the graph
    for node_id in path['nodes']:
        node = next(n for n in graph_json['data']['nodes'] if n['id'] == node_id)
        course['modules'].append({
            'title': node['label'],
            'description': node['description'],
            'type': node['type']
        })

    # Save course to your platform
    save_course(course)
```
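
The example above calls `find_by_type` and `parse_time` without defining them. One possible implementation is sketched here; both are assumptions about what those helpers should do, and `parse_time` handles only strings shaped like the `estimated_time` values shown earlier. `save_course` depends on your platform and is left out.

```python
import re


def find_by_type(items, item_type, item_format):
    """Return the first dataset item with the given type and format, or None."""
    return next(
        (
            item
            for item in items
            if item.get("type") == item_type and item.get("format") == item_format
        ),
        None,
    )


def parse_time(text):
    """Convert a string such as '1 hour 15 minutes' to hours as a float."""
    hours = re.search(r"(\d+)\s*hours?", text)
    minutes = re.search(r"(\d+)\s*minutes?", text)
    total = (int(hours.group(1)) if hours else 0) + (
        int(minutes.group(1)) if minutes else 0
    ) / 60
    return round(total, 2)


items = [
    {"type": "knowledge_graph", "format": "markdown", "data": "# Report"},
    {"type": "knowledge_graph", "format": "json", "data": {"nodes": [], "edges": []}},
]
print(find_by_type(items, "knowledge_graph", "json")["data"])
print(parse_time("1 hour 15 minutes"), parse_time("2 hours 30 minutes"))
```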

***

### 🔗 Accessing Outputs

All outputs are available in the Apify dataset created when the run completes. You can:

1. **Download directly** from the Apify Console
2. **Access via API** using the Apify Python/Node.js client
3. **Export to various formats** (JSON, CSV, Excel)
4. **Integrate into workflows** using webhooks or API calls

**Example - Getting specific output:**

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')
dataset = client.dataset('DATASET_ID')

# Get only the learning paths
for item in dataset.iterate_items():
    if item['type'] == 'learning_paths':
        learning_paths = item['data']
        break
```

### 🧪 Testing

#### Run Locally

```bash
# Run with the test input file
python3 main.py

# Pretty-print a test input file to check that it parses as JSON
python3 -c "
import json
with open('test-input.json', 'r') as f:
    data = json.load(f)
print(json.dumps(data, indent=2))
"
```

#### Example Test Input

```json
{
  "startUrls": [
    {"url": "https://en.wikipedia.org/wiki/Machine_learning"},
    {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"}
  ],
  "maxPages": 2,
  "outputLanguage": "en",
  "aiProvider": "openrouter",
  "openaiModel": "gpt-oss-20b",
  "useBrowser": false
}
```

### 🔧 Configuration

#### Environment Variables

- `OPENAI_API_KEY`: OpenAI API key (for OpenAI provider)
- `OPENROUTER_API_KEY`: OpenRouter API key (for OpenRouter provider)
- `APIFY_TOKEN`: Apify API token (for deployment)

#### Custom Models

For OpenAI provider:

- "gpt-4"
- "gpt-4-turbo"
- "gpt-3.5-turbo"

For OpenRouter provider:

- "gpt-oss-20b"
- "claude-3-sonnet"
- "llama-3-70b"

### 📈 How It Works

1. **Input Validation**: Validates all required fields and parameters
2. **API Key Configuration**: Sets up OpenAI/OpenRouter API keys
3. **Web Crawling**: Fetches content from specified URLs
   - Extracts text using BeautifulSoup
   - Removes scripts and styles
   - Aggregates content with metadata
4. **AI Graph Generation**: Sends content to AI API
   - Prompts the model to extract concepts and relationships
   - Parses JSON response
   - Cleans and validates graph data
5. **Graph Analysis**: Analyzes graph structure
   - Calculates centrality (degree centrality)
   - Identifies root and leaf nodes
   - Finds connected components
   - Computes density and statistics
6. **Learning Path Generation**: Creates optimal learning paths
   - Uses Dijkstra's algorithm to find shortest paths between important nodes
   - Generates beginner and intermediate paths
   - Estimates learning time
7. **Documentation**: Creates comprehensive Markdown output
8. **Save Results**: Stores all data in Apify datasets
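
To illustrate step 6, here is a minimal sketch of Dijkstra's algorithm over the edge list. Converting each relationship `weight` into a cost of `1 - weight`, so that stronger relationships are cheaper to follow, is an assumption made for this example; the Actor's actual cost function is not documented here.

```python
import heapq


def shortest_path(edges, start, goal):
    """Return (path, total_cost) from start to goal, or (None, inf) if unreachable."""
    graph = {}
    for edge in edges:
        # Assumed cost: strong relationships (weight near 1) are cheap to follow
        cost = max(0.0, 1.0 - edge.get("weight", 0.0))
        graph.setdefault(edge["source"], []).append((edge["target"], cost))

    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return path, dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, cost in graph.get(node, []):
            new_dist = dist + cost
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor, path + [neighbor]))
    return None, float("inf")


edges = [
    {"source": "ai", "target": "ml", "weight": 0.9},
    {"source": "ml", "target": "nn", "weight": 0.8},
    {"source": "ai", "target": "nn", "weight": 0.3},
]
path, cost = shortest_path(edges, "ai", "nn")
print(path, round(cost, 2))
```

In this toy graph the two-step route through `ml` is cheaper than the weak direct edge, so it is the one returned.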

### 🏗️ Project Structure

```
cogni/
├── main.py                      # Main actor entry point
├── Dockerfile                   # Docker container configuration
├── requirements.txt             # Python dependencies
├── test-input-simple.json       # Sample test input
├── README.md                    # This file
│
├── services/                    # Core services
│   ├── crawler_service.py       # Web crawling
│   ├── ai_service.py           # AI integration
│   └── graph_service.py        # Graph analysis
│
├── utils/                       # Utilities
│   ├── markdown_generator.py   # Documentation generation
│   ├── validators.py           # Input/graph validation
│   └── translations.py         # Multilingual support
│
└── models/                      # Data models
    └── __init__.py             # Type definitions
```

### 🐛 Error Handling

The Actor includes comprehensive error handling:

- **Input Validation**: Validates required fields and types
- **API Errors**: Handles rate limits, authentication, and API errors
- **Network Errors**: Handles connection timeouts and failures
- **JSON Parsing**: Recovers from malformed AI responses
- **Graph Validation**: Ensures graph data integrity

All errors are logged and saved to Apify datasets for debugging.
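
Recovering from malformed AI responses usually means trying progressively looser parses. The sketch below shows one such strategy (plain JSON, then a fenced block, then the outermost braces); `parse_model_json` is a hypothetical function and the Actor's real recovery logic may differ.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence, built this way to avoid a literal fence here


def parse_model_json(text):
    """Best-effort extraction of a JSON value from a model response."""
    # 1. The response may already be plain JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    candidates = []
    # 2. JSON wrapped in a Markdown code fence, optionally tagged "json"
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    # 3. Everything between the first "{" and the last "}"
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("No valid JSON found in model response")


reply = 'Sure! Here is the graph: {"nodes": [], "edges": []} Hope this helps.'
print(parse_model_json(reply))
```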

### 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test locally
5. Deploy to Apify
6. Submit a pull request

### 📝 License

This project is licensed under the MIT License.

### 🆘 Support

For issues and questions:

1. Check the Apify Actor logs
2. Review error messages in the output datasets
3. Verify API keys are correctly configured
4. Test with simpler inputs first

### 🔄 Changelog

#### Version 1.0 (2025-11-06)

- Initial Python implementation
- Complete rewrite from TypeScript
- Improved error handling
- Enhanced validation
- Multilingual support
- Learning path generation
- Comprehensive documentation

***

**Built with ❤️ using Python, Apify, OpenAI, and OpenRouter**

## Actor input object example

```json
{}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("monumental_wardrobe/cognigraph-weaver").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("monumental_wardrobe/cognigraph-weaver").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call monumental_wardrobe/cognigraph-weaver --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=monumental_wardrobe/cognigraph-weaver",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CogniGraph Weaver",
        "description": "A powerful Apify Actor that converts web content into interactive knowledge graphs using artificial intelligence. This Python-based web crawler and AI system extracts content from websites, analyzes it with LLMs, and generates comprehensive knowledge graphs with learning paths.",
        "version": "0.0",
        "x-build-id": "oQCCP4eFSHQU5CcNi"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/monumental_wardrobe~cognigraph-weaver/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-monumental_wardrobe-cognigraph-weaver",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/monumental_wardrobe~cognigraph-weaver/runs": {
            "post": {
                "operationId": "runs-sync-monumental_wardrobe-cognigraph-weaver",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/monumental_wardrobe~cognigraph-weaver/run-sync": {
            "post": {
                "operationId": "run-sync-monumental_wardrobe-cognigraph-weaver",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns the OUTPUT record from the key-value store in the response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {}
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
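
The three paths in the definition above share the same shape: a `POST` to `https://api.apify.com/v2/acts/monumental_wardrobe~cognigraph-weaver/<endpoint>` with the Apify token as a `token` query parameter and the Actor input as a JSON body. The following is a minimal sketch of how such a request could be built with the Python standard library only. The helper names (`build_run_request`, `run_and_get_items`) are hypothetical, and because the `inputSchema` above declares no properties, the shape of the input dictionary is an assumption you need to check against the Actor's own input documentation. The definition only states `200 OK` for the synchronous endpoints, so parsing the response as JSON is also an assumption.

```python
import json
import urllib.parse
import urllib.request

# Values taken from the "servers" and "paths" sections of the definition above.
API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "monumental_wardrobe~cognigraph-weaver"
ENDPOINTS = {"runs", "run-sync", "run-sync-get-dataset-items"}


def build_run_request(endpoint: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the three documented paths.

    Hypothetical helper: the name and signature are illustrative, not part of any Apify library.
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    # The definition marks "token" as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/{endpoint}?{query}"
    # The request body is the Actor input, serialized as application/json.
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request to run-sync-get-dataset-items and decode the response.

    Assumes the response body is JSON; the definition above does not specify its schema.
    """
    request = build_run_request("run-sync-get-dataset-items", token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_and_get_items("<YOUR_API_TOKEN>", {})` would block until the run finishes, so for long crawls the asynchronous `runs` endpoint, whose response follows `runsResponseSchema`, is likely the safer choice. In a production integration, the official Python client mentioned earlier is generally preferable to hand-built requests.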