# AI Content Processor (`valid_headlamp/ai-content-processor`) Actor

Process text with this AI Actor. Using GPT-4o-mini, it handles summarization, sentiment analysis, named entity recognition (NER), and translation. It offers two modes: batch for bulk tasks and a Standby server for real-time API use, and is designed to streamline content automation workflows.

- **URL**: https://apify.com/valid_headlamp/ai-content-processor.md
- **Developed by:** [Rod G.](https://apify.com/valid_headlamp) (community)
- **Categories:** AI, Agents, Automation
- **Stats:** 3 total users, 0 monthly users, 0.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

from $0.29 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
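As a rough illustration of the headline rate, the sketch below estimates a lower bound on cost from a result count. It assumes every charged event corresponds to one result at the advertised starting price of $0.29 per 1,000; the actual event types and prices are defined on the Actor's pricing page, and `estimate_min_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Advertised starting rate from the pricing section: $0.29 per 1,000 results.
RATE_PER_1000_USD = Decimal("0.29")


def estimate_min_cost_usd(result_count: int) -> Decimal:
    """Return a lower-bound estimate, assuming one charged event per result."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return RATE_PER_1000_USD * result_count / 1000


if __name__ == "__main__":
    # 2,500 results at $0.29 per 1,000
    print(estimate_min_cost_usd(2500))  # prints 0.725
```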

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## AI Content Processor Actor

This Apify Actor provides a suite of AI-powered text processing and content generation tools. It is designed to be highly available, scalable, and easy to integrate, either via REST API or as a standalone batch processor.

### How It Works

The **AI Content Processor** leverages OpenAI's **GPT-4o-mini** model via LangChain to perform complex NLP tasks and content generation efficiently. It operates in two distinct modes:

#### 1. Run-Once Mode (Batch Processing)
In this mode, the Actor reads input configuration, processes the text according to the specified tasks, pushes the results to the Apify Dataset, and then exits. This is ideal for:
- Processing a single document or a batch of text from the Apify Console.
- Scheduled jobs (e.g., summarizing daily news).
- Integration with other Actors in a workflow.

**Workflow:**
1. **Input**: Receives `input_text` and `tasks` from the input configuration.
2. **Process**: The `AIContentProcessor` (powered by LangChain) executes each task sequentially or in parallel.
3. **Output**: Results are validated against schemas and pushed to the default Apify Dataset.
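The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the Actor's source: `run_task` is a hypothetical stand-in for the LangChain call, the `"error"` status value is an assumption (only `"success"` appears in this README), and `process_once` returns the records instead of pushing them to a dataset.

```python
import asyncio
import time
from typing import Any


async def run_task(task: str, text: str) -> Any:
    """Hypothetical stand-in for the model call; the real Actor uses LangChain."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{task} of {len(text)} characters"


async def timed_task(task: str, text: str) -> dict[str, Any]:
    """Run one task and wrap the outcome in a dataset-style record."""
    started = time.perf_counter()
    try:
        result, status = await run_task(task, text), "success"
    except Exception as exc:  # keep one failing task from aborting the rest
        result, status = str(exc), "error"  # "error" is an assumed status value
    elapsed_ms = int((time.perf_counter() - started) * 1000)
    return {"task": task, "result": result,
            "processing_time_ms": elapsed_ms, "status": status}


async def process_once(input_text: str, tasks: list[str],
                       parallel: bool = True) -> list[dict[str, Any]]:
    """Process every task, either concurrently or one after another."""
    if parallel:
        # gather() returns results in the same order as the tasks list
        return list(await asyncio.gather(*(timed_task(t, input_text) for t in tasks)))
    return [await timed_task(t, input_text) for t in tasks]


if __name__ == "__main__":
    records = asyncio.run(process_once("Text to process...", ["summarization", "sentiment"]))
    for record in records:
        print(record)
```

In an Actor built on the Apify Python SDK, the returned records would presumably be handed to `Actor.push_data` as the final step.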

#### 2. Standby Mode (Web Server)
In this mode, the Actor starts a high-performance **FastAPI** server and listens for HTTP requests. This keeps the container warm, allowing for instant responses without cold-start delays. This is ideal for:
- Real-time applications (e.g., a chatbot backend).
- High-volume processing where you want to avoid spinning up a new container for every request.
- Integrating via REST API with external systems.

**Workflow:**
1. **Start**: The Actor starts a web server on the port defined by `ACTOR_WEB_SERVER_PORT`.
2. **Request**: Clients send `POST /process` requests with a JSON body containing a batch of texts and tasks.
3. **Response**: The server processes the batch asynchronously and returns the results in the HTTP response, with no container start-up delay.
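To make the Standby workflow concrete, here is a minimal sketch that uses Python's built-in `http.server` in place of FastAPI so that it runs without third-party packages. Several details are assumptions rather than documented behavior: the fallback port `8000`, the `{"status": "ok"}` health payload, the response shape returned by `handle_process`, and the placeholder logic that echoes task names instead of calling a model.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Mapping, Optional


def resolve_port(env: Mapping[str, str]) -> int:
    """Read ACTOR_WEB_SERVER_PORT; the 8000 fallback is an assumption."""
    return int(env.get("ACTOR_WEB_SERVER_PORT", "8000"))


def handle_process(payload: dict) -> dict:
    """Placeholder: echo each request's tasks instead of calling a model."""
    results = []
    for item in payload.get("requests", []):
        results.append([{"task": task, "status": "success"}
                        for task in item.get("tasks", [])])
    return {"results": results}  # response shape invented for illustration


class StandbyHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/process":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self._send_json(400, {"error": "body must be a JSON object"})
            return
        self._send_json(200, handle_process(payload))

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this sketch


def make_server(port: Optional[int] = None) -> ThreadingHTTPServer:
    """Bind to the port from the environment unless one is given explicitly."""
    if port is None:
        port = resolve_port(os.environ)
    return ThreadingHTTPServer(("0.0.0.0", port), StandbyHandler)
```

Calling `make_server().serve_forever()` would start the sketch; the process then stays alive between requests, which is what avoids the cold start described above.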

---

### Features

#### Core NLP Processing
- **Summarization**: Extractive and abstractive summarization with style controls.
- **Translation**: Multi-language translation with quality assessment.
- **Classification**: Multi-label content categorization and tagging.
- **Named Entity Recognition (NER)**: Extracts entities and relationships.
- **Sentiment Analysis**: Returns a sentiment label, a score, and the detected emotion.
- **Note Generation**: Converts unstructured text into structured notes.
- **Formatting**: Standardizes and normalizes text formatting.

#### Content Generation
- **Email Drafting**: Generates professional email drafts.
- **Marketing Copy**: Creates variations of marketing copy.
- **Report Summaries**: Generates executive-level report summaries.
- **Content Normalization**: Unifies tone and style.

---

### Usage

#### Input Schema

The Actor accepts the following input:

```json
{
    "openai_api_key": "YOUR_OPENAI_API_KEY",
    "input_text": "Text to process...",
    "tasks": ["summarization", "sentiment"],
    "standby": false
}
````

- `openai_api_key`: Your OpenAI API Key (required).
- `input_text`: The raw text to process (for Run-Once mode).
- `tasks`: A list of tasks to perform.
- `standby`: If `true`, runs as a long-running web server (API mode).
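The field rules above can be checked before a run is started. The sketch below is one way to do that with a dataclass. The allowed task names and the defaults (`["summarization"]`, `false`) come from the Actor's input schema; the `ActorInput` and `parse_input` names are hypothetical, and the rule that `input_text` must be non-empty in Run-Once mode is an assumption, since the schema itself only marks `openai_api_key` as required.

```python
from dataclasses import dataclass, field

# Task names listed in the Actor's input schema.
ALLOWED_TASKS = frozenset({
    "summarization", "translation", "classification", "ner", "sentiment",
    "note_generation", "formatting", "email_draft", "marketing_copy",
    "report_summary", "normalize_content",
})


@dataclass
class ActorInput:
    openai_api_key: str
    input_text: str = ""
    tasks: list[str] = field(default_factory=lambda: ["summarization"])
    standby: bool = False

    def __post_init__(self) -> None:
        if not self.openai_api_key:
            raise ValueError("openai_api_key is required")
        unknown = sorted(set(self.tasks) - ALLOWED_TASKS)
        if unknown:
            raise ValueError(f"unknown tasks: {', '.join(unknown)}")
        # Assumption: Run-Once mode has nothing to do without input_text.
        if not self.standby and not self.input_text:
            raise ValueError("input_text is required in Run-Once mode")


def parse_input(raw: dict) -> ActorInput:
    """Build a validated ActorInput from a raw input dictionary."""
    return ActorInput(**raw)


if __name__ == "__main__":
    print(parse_input({"openai_api_key": "YOUR_OPENAI_API_KEY",
                       "input_text": "Text to process...",
                       "tasks": ["summarization", "sentiment"]}))
```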

#### Output (Run-Once Mode)

The results are stored in the default Apify Dataset.

```json
[
    {
        "task": "summarization",
        "result": "Summary text...",
        "processing_time_ms": 1200,
        "status": "success"
    },
    {
        "task": "sentiment",
        "result": {
            "sentiment": "positive",
            "score": 0.8,
            "emotion": "joy"
        },
        "processing_time_ms": 500,
        "status": "success"
    }
]
```
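Once the items are fetched from the dataset, a small amount of post-processing makes them easier to use. The sketch below indexes successful results by task name and lists tasks that did not succeed. `index_by_task` and `failed_tasks` are hypothetical helpers, and treating any status other than `"success"` as a failure is an assumption, since this README only shows successful records.

```python
from typing import Any

# Sample records copied from the output example above.
SAMPLE_ITEMS: list[dict[str, Any]] = [
    {"task": "summarization", "result": "Summary text...",
     "processing_time_ms": 1200, "status": "success"},
    {"task": "sentiment",
     "result": {"sentiment": "positive", "score": 0.8, "emotion": "joy"},
     "processing_time_ms": 500, "status": "success"},
]


def index_by_task(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Map each successful task name to its result."""
    return {item["task"]: item["result"]
            for item in items if item.get("status") == "success"}


def failed_tasks(items: list[dict[str, Any]]) -> list[str]:
    """List the names of tasks whose status is anything other than success."""
    return [item["task"] for item in items if item.get("status") != "success"]


if __name__ == "__main__":
    results = index_by_task(SAMPLE_ITEMS)
    print(results["sentiment"]["score"])  # prints 0.8
    print(failed_tasks(SAMPLE_ITEMS))     # prints []
```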

---

### API Integration (Standby Mode)

When running in Standby mode (`"standby": true`), the Actor exposes a REST API.

#### Endpoints

- `POST /process`: Process a batch of content.
- `GET /health`: Health check.

##### Request Body (`/process`)

```json
{
  "requests": [
    {
      "text": "Content to process",
      "tasks": ["ner", "classification"],
      "options": {
        "target_language": "Spanish"
      }
    }
  ]
}
```
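A client can assemble this body and the matching `POST /process` request with the standard library. The sketch below only builds the request and does not send it. `build_process_request` is a hypothetical helper, the `http://localhost:8000` base URL assumes a local `uvicorn` server on its default port, and the optional Bearer header reflects an assumption that a deployed Standby URL expects an Apify API token.

```python
import json
import urllib.request
from typing import Optional


def build_process_request(base_url: str, texts: list[str], tasks: list[str],
                          options: Optional[dict] = None,
                          api_token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /process request carrying one entry per input text."""
    entries = []
    for text in texts:
        entry = {"text": text, "tasks": tasks}
        if options:
            entry["options"] = options
        entries.append(entry)

    headers = {"Content-Type": "application/json"}
    if api_token:  # assumption: token sent as a Bearer header
        headers["Authorization"] = f"Bearer {api_token}"

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/process",
        data=json.dumps({"requests": entries}).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_process_request(
        "http://localhost:8000",
        ["Content to process"],
        ["ner", "classification"],
        options={"target_language": "Spanish"},
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```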

---

### Development

#### Local Setup

1. Clone the repository.
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Configure your environment:
   Create a `.env` file in the root directory:
   ```env
   OPENAI_API_KEY=your_api_key_here
   ```

#### Running Locally

**Run-Once Mode:**

```bash
# Mock Apify input (or rely on defaults/env vars)
export APIFY_DEFAULT_KEY_VALUE_STORE_ID="local"
python -m src.main
```

**Server Mode:**

```bash
uvicorn src.server:app --reload
```

### Deployment

This Actor is containerized and ready for deployment on the Apify Platform.

1. Push the code to Apify.
2. Build the Actor.
3. Run via API or Scheduler.

# Actor input Schema

## `openai_api_key` (type: `string`):

Your OpenAI API Key.

## `input_text` (type: `string`):

Raw text input to process.

## `tasks` (type: `array`):

List of tasks to perform. Available options: summarization, translation, classification, ner, sentiment, note\_generation, formatting, email\_draft, marketing\_copy, report\_summary, normalize\_content.

## `standby` (type: `boolean`):

Run as a web server (Standby Mode).

## Actor input object example

```json
{
  "input_text": "Apify is a platform for web scraping and data extraction. It enables you to automate anything you can do manually in a web browser. With Apify Actors, you can extract data from any website, process it, and store it in a format of your choice. Actors are serverless cloud programs that can do anything from scraping a single page to crawling an entire website. The headquarters is located in Prague, Czech Republic. I love using Apify because it saves me so much time and the community is great!",
  "tasks": [
    "summarization",
    "sentiment",
    "ner",
    "classification"
  ],
  "standby": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "input_text": "Apify is a platform for web scraping and data extraction. It enables you to automate anything you can do manually in a web browser. With Apify Actors, you can extract data from any website, process it, and store it in a format of your choice. Actors are serverless cloud programs that can do anything from scraping a single page to crawling an entire website. The headquarters is located in Prague, Czech Republic. I love using Apify because it saves me so much time and the community is great!",
    "tasks": [
        "summarization",
        "sentiment",
        "ner",
        "classification"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("valid_headlamp/ai-content-processor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "input_text": "Apify is a platform for web scraping and data extraction. It enables you to automate anything you can do manually in a web browser. With Apify Actors, you can extract data from any website, process it, and store it in a format of your choice. Actors are serverless cloud programs that can do anything from scraping a single page to crawling an entire website. The headquarters is located in Prague, Czech Republic. I love using Apify because it saves me so much time and the community is great!",
    "tasks": [
        "summarization",
        "sentiment",
        "ner",
        "classification",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("valid_headlamp/ai-content-processor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "input_text": "Apify is a platform for web scraping and data extraction. It enables you to automate anything you can do manually in a web browser. With Apify Actors, you can extract data from any website, process it, and store it in a format of your choice. Actors are serverless cloud programs that can do anything from scraping a single page to crawling an entire website. The headquarters is located in Prague, Czech Republic. I love using Apify because it saves me so much time and the community is great!",
  "tasks": [
    "summarization",
    "sentiment",
    "ner",
    "classification"
  ]
}' |
apify call valid_headlamp/ai-content-processor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=valid_headlamp/ai-content-processor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "AI Content Processor",
        "description": "Unlock powerful text processing with this AI actor. Using GPT-4o-mini, it handles summarization, sentiment, NER, and translation. Offers dual modes: batch for bulk tasks and standby server for real-time API use. Scalable and fast, it streamlines your content automation workflows with high precision.",
        "version": "0.1",
        "x-build-id": "0g0sdN8l2cUmV5j9S"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/valid_headlamp~ai-content-processor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-valid_headlamp-ai-content-processor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/valid_headlamp~ai-content-processor/runs": {
            "post": {
                "operationId": "runs-sync-valid_headlamp-ai-content-processor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/valid_headlamp~ai-content-processor/run-sync": {
            "post": {
                "operationId": "run-sync-valid_headlamp-ai-content-processor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "openai_api_key"
                ],
                "properties": {
                    "openai_api_key": {
                        "title": "OpenAI API Key",
                        "type": "string",
                        "description": "Your OpenAI API Key."
                    },
                    "input_text": {
                        "title": "Input Text",
                        "type": "string",
                        "description": "Raw text input to process."
                    },
                    "tasks": {
                        "title": "Tasks",
                        "type": "array",
                        "description": "List of tasks to perform. Available options: summarization, translation, classification, ner, sentiment, note_generation, formatting, email_draft, marketing_copy, report_summary, normalize_content.",
                        "default": [
                            "summarization"
                        ]
                    },
                    "standby": {
                        "title": "Standby Mode",
                        "type": "boolean",
                        "description": "Run as a web server (Standby Mode).",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
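The specification above can also be exercised without a client library. The sketch below builds the `run-sync-get-dataset-items` request using only the standard library. The server URL, path, `token` query parameter, and JSON body come from the spec; `build_run_sync_request` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.parse
import urllib.request

# Values taken from the "servers" and "paths" entries of the specification.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/valid_headlamp~ai-content-processor"


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_sync_request(
        "<YOUR_API_TOKEN>",
        {
            "openai_api_key": "YOUR_OPENAI_API_KEY",
            "input_text": "Text to process...",
            "tasks": ["summarization", "sentiment"],
        },
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```

Because the token travels in the query string, it is worth keeping such URLs out of logs.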
