# Ai-ML-scraper (`labrat011/ai-ml-scraper`) Actor

Search AI/ML models, research papers, and trending papers from HuggingFace Hub and arXiv. No API key required.

- **URL**: https://apify.com/labrat011/ai-ml-scraper.md
- **Developed by:** [mick\_](https://apify.com/labrat011) (community)
- **Categories:** AI, Other
- **Stats:** 5 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## AI/ML Intelligence Scraper

Search AI/ML models, research papers, and trending papers from HuggingFace Hub and arXiv -- structured, filterable, and ready for analysis. No API key required. MCP-ready for AI agent integration.

### What does it do?

AI/ML Intelligence Scraper pulls structured data from HuggingFace Hub and arXiv, the two largest open sources for machine learning models and AI research. You provide search filters and it returns clean, consistent JSON, ready for analysis, ML pipelines, or consumption by AI agents via MCP.

**Use cases:**

- **ML engineering** -- find models by task, framework, and popularity for integration into your pipeline
- **Research tracking** -- monitor new papers in specific AI subfields (NLP, computer vision, robotics, etc.)
- **Competitive intelligence** -- track trending models and papers to understand where the industry is moving
- **Investment research** -- identify emerging AI capabilities and technology trends from publication patterns
- **Content creation** -- aggregate trending AI research for newsletters, reports, and media coverage
- **Academic research** -- search arXiv by author, category, date range, and keyword
- **AI agent tooling** -- expose as an MCP tool so AI agents can search ML models, find research papers, and track AI trends in real time

### Features

- **3 modes:** search models (HuggingFace), search papers (arXiv), trending papers (HuggingFace Daily Papers)
- **No API key required** -- all data sources are public
- **No proxies needed** -- direct API access to public academic and ML infrastructure
- **Model search filters:** keyword, pipeline task (24 tasks), ML framework (14 libraries), sort by downloads/likes/trending
- **Paper search filters:** keyword, arXiv category (13 AI/ML categories), author name, date range (YYYY-MM-DD), sort by relevance/date
- **Trending papers** with HuggingFace community upvotes, AI-generated summaries, and AI keywords
- **Automatic pagination** through results (up to 10,000 records)
- **Rate limiting** built in (0.5-second interval between requests)
- **Retry logic** with exponential backoff on failures
- **State persistence** -- survives Apify Actor migrations mid-run
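
The rate limiting and retry behaviour listed above can be sketched roughly as follows. This is an illustration only, not the Actor's actual code: `fetch_with_retry`, `max_attempts`, and `base_delay` are hypothetical names and values, and only the 0.5-second interval comes from this README.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MIN_INTERVAL = 0.5  # seconds between requests, as stated in this README


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 4,      # hypothetical value
    base_delay: float = 1.0,    # hypothetical value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying with exponentially growing delays on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            result = fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        else:
            sleep(MIN_INTERVAL)  # fixed pause so the next request is spaced out
            return result
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the delays can be inspected in tests without actually waiting.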

### What data does it extract?

#### Models (HuggingFace Hub)

| Field | Description |
|-------|-------------|
| `type` | Always `"model"` |
| `modelId` | Full model ID (e.g. `meta-llama/Llama-3.1-8B`) |
| `author` | Model author/organization |
| `modelName` | Model name without author prefix |
| `pipelineTag` | Task type (text-generation, image-classification, etc.) |
| `library` | ML framework (transformers, diffusers, pytorch, etc.) |
| `downloads` | Recent download count |
| `downloadsAllTime` | All-time download count |
| `likes` | Community likes |
| `trending` | Trending score |
| `tags` | All model tags |
| `lastModified` | Last update timestamp |
| `createdAt` | Creation timestamp |
| `private` | Whether the model is private |
| `gated` | Whether the model requires access approval |
| `url` | Direct link to HuggingFace model page |

#### Papers (arXiv)

| Field | Description |
|-------|-------------|
| `type` | Always `"paper"` |
| `source` | `"arxiv"` |
| `arxivId` | arXiv paper ID (e.g. `2401.12345`) |
| `title` | Paper title |
| `summary` | Paper abstract |
| `authors` | Comma-separated author names |
| `authorList` | Array of author names |
| `publishedDate` | Publication date (ISO format) |
| `updatedDate` | Last updated date (ISO format) |
| `primaryCategory` | Primary arXiv category (e.g. `cs.CL`) |
| `categories` | All categories (comma-separated) |
| `categoryList` | Array of categories |
| `comment` | Author comment (often has page count, conference info) |
| `pdfUrl` | Direct link to PDF |
| `url` | Link to arXiv abstract page |

#### Trending Papers (HuggingFace Daily Papers)

| Field | Description |
|-------|-------------|
| `type` | Always `"paper"` |
| `source` | `"huggingface_daily"` |
| `arxivId` | arXiv paper ID |
| `title` | Paper title |
| `summary` | Paper abstract |
| `authors` | Comma-separated author names |
| `authorList` | Array of author names |
| `publishedDate` | Publication date |
| `upvotes` | HuggingFace community upvotes |
| `numComments` | Number of community comments |
| `aiSummary` | AI-generated summary (when available) |
| `aiKeywords` | AI-generated keywords (when available) |
| `submittedBy` | HuggingFace user who submitted the paper |
| `mediaUrl` | Media/thumbnail URL |
| `pdfUrl` | Direct link to PDF |
| `url` | Link to HuggingFace paper page |

***

### Input

Choose a scraping mode and provide your search filters.

#### Mode 1: Search Models

Search HuggingFace Hub for ML models by keyword, task, and framework.

```json
{
    "mode": "search_models",
    "query": "large language model",
    "sort": "downloads",
    "maxResults": 100
}
````

Filter by pipeline task and framework:

```json
{
    "mode": "search_models",
    "query": "stable diffusion",
    "pipelineTag": "text-to-image",
    "libraryFilter": "diffusers",
    "sort": "likes",
    "maxResults": 50
}
```

#### Mode 2: Search Papers

Search arXiv for AI/ML research papers.

```json
{
    "mode": "search_papers",
    "query": "transformer attention mechanism",
    "arxivCategory": "cs.CL",
    "sort": "submittedDate",
    "maxResults": 100
}
```

Search by author and date range:

```json
{
    "mode": "search_papers",
    "author": "Yann LeCun",
    "dateFrom": "2025-01-01",
    "dateTo": "2026-01-01",
    "sort": "submittedDate",
    "maxResults": 50
}
```

#### Mode 3: Trending Papers

Get today's trending AI papers from HuggingFace with community engagement data.

```json
{
    "mode": "trending_papers",
    "maxResults": 50
}
```

Filter trending papers by keyword:

```json
{
    "mode": "trending_papers",
    "query": "language model",
    "maxResults": 20
}
```

#### Search Filters

**Model filters (Mode 1):**

| Parameter | Description |
|-----------|-------------|
| `query` | Search keyword (required for model search) |
| `pipelineTag` | Filter by task: text-generation, text-classification, image-classification, text-to-image, automatic-speech-recognition, and 19 more |
| `libraryFilter` | Filter by framework: transformers, diffusers, pytorch, tensorflow, jax, onnx, gguf, spacy, keras, sklearn, and more |
| `sort` | Sort by: `downloads`, `likes`, or `trending` |

**Paper filters (Mode 2):**

| Parameter | Description |
|-----------|-------------|
| `query` | Search keyword (searches titles and abstracts) |
| `arxivCategory` | arXiv category: cs.AI, cs.LG, cs.CL, cs.CV, cs.NE, cs.RO, cs.IR, cs.MA, stat.ML, cs.SD, eess.AS, cs.HC, cs.CR |
| `author` | Author name |
| `dateFrom` | Filter papers from this date (YYYY-MM-DD) |
| `dateTo` | Filter papers up to this date (YYYY-MM-DD) |
| `sort` | Sort by: `relevance`, `submittedDate`, or `lastUpdatedDate` |

**Trending paper filters (Mode 3):**

| Parameter | Description |
|-----------|-------------|
| `query` | Optional keyword to filter trending papers by title/abstract |

**General settings:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `maxResults` | `100` | Maximum results to return (max 10,000). Free users are limited to 25 per run. |
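
The filter tables above imply a few constraints that are easy to check before starting a run. The helper below is a minimal sketch of such a check: `build_input` and `SORT_OPTIONS` are hypothetical names, and the rules encode only what this page documents (the three modes, the sort options per mode, and the 1 to 10,000 range for `maxResults`).

```python
from typing import Any, Dict

# Sort options per mode, taken from the filter tables above.
SORT_OPTIONS = {
    "search_models": {"downloads", "likes", "trending"},
    "search_papers": {"relevance", "submittedDate", "lastUpdatedDate"},
    "trending_papers": set(),  # no sort option is documented for this mode
}


def build_input(mode: str, max_results: int = 100, **filters: Any) -> Dict[str, Any]:
    """Return an input object, rejecting values this README rules out."""
    if mode not in SORT_OPTIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_results <= 10_000:
        raise ValueError("maxResults must be between 1 and 10,000")
    sort = filters.get("sort")
    if sort is not None and sort not in SORT_OPTIONS[mode]:
        raise ValueError(f"sort={sort!r} is not documented for mode {mode!r}")
    # Drop unset filters so the Actor can fall back to its own defaults.
    payload = {k: v for k, v in filters.items() if v not in (None, "")}
    return {"mode": mode, **payload, "maxResults": max_results}
```

For example, `build_input("search_papers", query="attention", sort="submittedDate")` produces an object in the same shape as the Mode 2 examples, while `build_input("search_models", sort="relevance")` raises a `ValueError`.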

***

### Output

Results are saved to the default dataset. Download them in JSON, CSV, Excel, or XML format from the Output tab.

#### Example: Model output

```json
{
    "type": "model",
    "modelId": "meta-llama/Llama-3.1-8B-Instruct",
    "author": "meta-llama",
    "modelName": "Llama-3.1-8B-Instruct",
    "pipelineTag": "text-generation",
    "library": "transformers",
    "downloads": 12500000,
    "downloadsAllTime": 45000000,
    "likes": 8500,
    "trending": 42,
    "tags": ["transformers", "pytorch", "safetensors", "llama", "text-generation"],
    "lastModified": "2026-01-15T10:30:00.000Z",
    "createdAt": "2025-07-23T00:00:00.000Z",
    "private": false,
    "gated": true,
    "url": "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct"
}
```

#### Example: arXiv paper output

```json
{
    "type": "paper",
    "source": "arxiv",
    "arxivId": "2401.12345",
    "title": "Attention Is All You Need: A Retrospective Analysis",
    "summary": "We revisit the transformer architecture and analyze its impact...",
    "authors": "Jane Smith, John Doe, Alice Johnson",
    "authorList": ["Jane Smith", "John Doe", "Alice Johnson"],
    "publishedDate": "2026-01-15T12:00:00Z",
    "updatedDate": "2026-01-20T08:00:00Z",
    "primaryCategory": "cs.CL",
    "categories": "cs.CL, cs.AI, cs.LG",
    "categoryList": ["cs.CL", "cs.AI", "cs.LG"],
    "comment": "15 pages, 8 figures. Accepted at ICML 2026",
    "pdfUrl": "https://arxiv.org/pdf/2401.12345",
    "url": "https://arxiv.org/abs/2401.12345"
}
```

#### Example: Trending paper output

```json
{
    "type": "paper",
    "source": "huggingface_daily",
    "arxivId": "2401.67890",
    "title": "Scaling Laws for Neural Machine Translation",
    "summary": "We present new scaling laws that predict performance of...",
    "authors": "Alice Researcher, Bob Scientist",
    "authorList": ["Alice Researcher", "Bob Scientist"],
    "publishedDate": "2026-02-14T00:00:00Z",
    "upvotes": 142,
    "numComments": 23,
    "aiSummary": "This paper establishes new scaling laws for NMT systems...",
    "aiKeywords": ["scaling laws", "machine translation", "large language models"],
    "submittedBy": "AkitoP",
    "mediaUrl": "https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2401.67890.png",
    "pdfUrl": "https://arxiv.org/pdf/2401.67890",
    "url": "https://huggingface.co/papers/2401.67890"
}
```
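
Because all three record shapes share the `type` and `source` fields, downstream code can route items without knowing which mode produced them. The sketch below shows one way to do that and to rank trending papers by `upvotes`; the function names are hypothetical, and the fallback from `aiSummary` to `summary` reflects the note that AI summaries are only present when available.

```python
from typing import Any, Dict, Iterable, List


def record_kind(item: Dict[str, Any]) -> str:
    """Classify a dataset item using the documented `type` and `source` fields."""
    if item.get("type") == "model":
        return "model"
    if item.get("type") == "paper" and item.get("source") == "huggingface_daily":
        return "trending_paper"
    if item.get("type") == "paper" and item.get("source") == "arxiv":
        return "arxiv_paper"
    return "unknown"


def top_trending(items: Iterable[Dict[str, Any]], n: int = 5) -> List[Dict[str, str]]:
    """Return the n most-upvoted trending papers with a short blurb."""
    trending = [i for i in items if record_kind(i) == "trending_paper"]
    trending.sort(key=lambda i: i.get("upvotes") or 0, reverse=True)
    return [
        {
            "title": i["title"],
            # aiSummary is not always present; fall back to the abstract.
            "blurb": i.get("aiSummary") or i.get("summary", ""),
            "url": i["url"],
        }
        for i in trending[:n]
    ]
```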

***

### Cost

This Actor uses **pay-per-event (PPE) pricing**. You pay only for the results you get.

- **$0.50 per 1,000 results** ($0.0005 per result)
- **No proxy costs** -- public APIs, no proxies needed
- **No API key costs** -- all data sources are free
- Free tier: **25 results per run** (no subscription required)

Requests to HuggingFace and arXiv are fast. A typical run fetching 100 items completes in under a minute.
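
The per-result price above translates into a simple upper-bound estimate for a run. The sketch below assumes results are the only charged event, which this page does not state explicitly, and uses hypothetical helper names; a run that returns fewer items than `maxResults` would cost less.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.0005")  # $0.50 per 1,000 results, from the list above
FREE_TIER_CAP = 25                    # results per run for free users


def results_cap(max_results: int, free_user: bool = False) -> int:
    """Largest number of results a single run can return."""
    return min(max_results, FREE_TIER_CAP) if free_user else max_results


def estimate_cost(results: int) -> Decimal:
    """Cost in USD for a given number of returned results."""
    return PRICE_PER_RESULT * results
```

Using `Decimal` avoids floating-point rounding when the figures are summed across many runs.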

***

### Technical details

- HuggingFace Hub API (`huggingface.co/api/models`) for model search -- returns JSON, offset pagination, 100 per page
- arXiv API (`export.arxiv.org/api/query`) for paper search -- returns Atom XML, offset pagination, 200 per page
- HuggingFace Daily Papers API (`huggingface.co/api/daily_papers`) for trending papers -- returns JSON, offset pagination
- Client-side date filtering for arXiv papers (arXiv API does not support date range natively)
- Rate limited to 1 request per 0.5 seconds
- Automatic retry with exponential backoff on failures
- Results pushed in batches of 25 for efficiency
- Actor state persisted across migrations
- No proxies, no browser, no cookies -- direct API access
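
To make the offset pagination concrete, here is a rough sketch of how paged requests to the arXiv endpoint could be generated. It is not the Actor's code: the function name is hypothetical, and the `ti:`/`abs:` query string is an assumption based on arXiv's public query syntax, since the exact query the Actor sends is not documented here. The page size of 200 and the endpoint come from the list above.

```python
from typing import Iterator, Optional
from urllib.parse import urlencode

ARXIV_ENDPOINT = "https://export.arxiv.org/api/query"
PAGE_SIZE = 200  # page size stated in the list above


def arxiv_page_urls(
    query: str,
    category: Optional[str] = None,
    sort_by: str = "submittedDate",
    max_results: int = 500,
) -> Iterator[str]:
    """Yield one request URL per page until max_results is covered."""
    # Assumed query shape: title-or-abstract match, AND-combined with a category.
    search = f'(ti:"{query}" OR abs:"{query}")'
    if category:
        search += f" AND cat:{category}"
    for start in range(0, max_results, PAGE_SIZE):
        params = {
            "search_query": search,
            "start": start,  # offset of the first result on this page
            "max_results": min(PAGE_SIZE, max_results - start),
            "sortBy": sort_by,
            "sortOrder": "descending",
        }
        yield f"{ARXIV_ENDPOINT}?{urlencode(params)}"
```

Requesting 450 results this way yields three URLs with `start` values of 0, 200, and 400, the last one asking for only the remaining 50 items.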

***

### Limitations

- arXiv date filtering is client-side: the API returns results ordered by relevance or date, and papers outside the specified date range are skipped. For large date ranges this is efficient, but for very narrow ranges you may need to increase `maxResults` to get enough matches.
- Maximum pagination depth is 10,000 results per run (arXiv hard limit).
- HuggingFace trending papers are a daily feed -- the total available on any given day is typically 20-50 papers.
- arXiv paper summaries (abstracts) can be long. They are included in full.
- HuggingFace AI summaries and keywords are not available for all daily papers.
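
The client-side date filtering described in the first limitation can be pictured with the sketch below. The function name is hypothetical, and treating both bounds as inclusive is an assumption; the field name `publishedDate` and the `YYYY-MM-DD` input format come from this README.

```python
from datetime import date
from typing import Any, Dict, Iterable, Iterator, Optional


def filter_by_date(
    papers: Iterable[Dict[str, Any]],
    date_from: Optional[str] = None,  # YYYY-MM-DD, like the `dateFrom` input
    date_to: Optional[str] = None,    # YYYY-MM-DD, like the `dateTo` input
) -> Iterator[Dict[str, Any]]:
    """Yield papers whose publishedDate falls inside the (assumed inclusive) range."""
    start = date.fromisoformat(date_from) if date_from else date.min
    end = date.fromisoformat(date_to) if date_to else date.max
    for paper in papers:
        # publishedDate looks like "2026-01-15T12:00:00Z"; keep only the date part.
        published = date.fromisoformat(paper["publishedDate"][:10])
        if start <= published <= end:
            yield paper
```

Since non-matching papers are skipped after they are fetched, a narrow range over a broad query discards most of each page, which is why a larger `maxResults` may be needed in that case.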

***

### FAQ

#### Do I need an API key?

No. All three data sources (HuggingFace Hub, arXiv, HuggingFace Daily Papers) are fully public APIs with no authentication required.

#### What are arXiv categories?

arXiv organizes papers into categories. The most relevant for AI/ML research:

- **cs.AI** -- Artificial Intelligence (general)
- **cs.LG** -- Machine Learning
- **cs.CL** -- Computation and Language (NLP, LLMs)
- **cs.CV** -- Computer Vision
- **cs.NE** -- Neural and Evolutionary Computing
- **cs.RO** -- Robotics
- **stat.ML** -- Machine Learning (from a statistics perspective)

#### What are pipeline tags?

HuggingFace categorizes models by the task they perform. Common examples: `text-generation` (LLMs), `text-to-image` (Stable Diffusion), `text-classification` (sentiment analysis), `automatic-speech-recognition` (Whisper), `feature-extraction` (embeddings).

#### Can I combine filters?

Yes. For model search, you can combine keyword + pipeline task + framework. For paper search, you can combine keyword + category + author + date range. All filters are AND-combined.

#### How current is the trending papers data?

HuggingFace Daily Papers updates throughout the day. The trending feed reflects papers that the HuggingFace community is currently engaging with.

#### Can I use this with the Apify API?

Yes. Call the Actor via the Apify API and retrieve results programmatically in JSON, CSV, or other formats. It works with the Apify Python and JavaScript clients.

***

### MCP Integration

This Actor works as an MCP tool through Apify's hosted MCP server. No custom server needed.

- **Endpoint:** `https://mcp.apify.com?tools=labrat011/ai-ml-scraper`
- **Auth:** `Authorization: Bearer <APIFY_TOKEN>`
- **Transport:** Streamable HTTP
- **Works with:** Claude Desktop, Cursor, VS Code, Windsurf, Warp, Gemini CLI

**Example MCP config (Claude Desktop / Cursor):**

```json
{
    "mcpServers": {
        "ai-ml-scraper": {
            "url": "https://mcp.apify.com?tools=labrat011/ai-ml-scraper",
            "headers": {
                "Authorization": "Bearer <APIFY_TOKEN>"
            }
        }
    }
}
```

AI agents can use this Actor to search HuggingFace models, find arXiv papers, track trending AI research, and monitor ML model releases -- all as a callable MCP tool.

***

### Feedback

Found a bug or have a feature request? Open an issue on the Actor's Issues tab in Apify Console.

# Actor input Schema

## `mode` (type: `string`):

What type of AI/ML data to search for.

## `query` (type: `string`):

Search keyword for models or papers. For arXiv, searches titles and abstracts.

## `sort` (type: `string`):

How to sort results. Options depend on the mode.

## `pipelineTag` (type: `string`):

Filter models by pipeline task. Leave empty for all tasks.

## `libraryFilter` (type: `string`):

Filter models by ML framework. Leave empty for all frameworks.

## `arxivCategory` (type: `string`):

Filter papers by arXiv category. Leave empty to search all AI/ML categories.

## `author` (type: `string`):

Filter arXiv papers by author name.

## `dateFrom` (type: `string`):

Filter papers published from this date (YYYY-MM-DD).

## `dateTo` (type: `string`):

Filter papers published up to this date (YYYY-MM-DD).

## `maxResults` (type: `integer`):

Maximum number of results to return. Free users are limited to 25 results per run.

## Actor input object example

```json
{
  "mode": "search_models",
  "query": "large language model",
  "sort": "downloads",
  "pipelineTag": "",
  "libraryFilter": "",
  "arxivCategory": "",
  "maxResults": 10
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "search_models",
    "query": "large language model",
    "maxResults": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("labrat011/ai-ml-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "search_models",
    "query": "large language model",
    "maxResults": 10,
}

# Run the Actor and wait for it to finish
run = client.actor("labrat011/ai-ml-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "search_models",
  "query": "large language model",
  "maxResults": 10
}' |
apify call labrat011/ai-ml-scraper --silent --output-dataset

```
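
## REST API example

Where no Apify client is available, the `run-sync-get-dataset-items` endpoint from the OpenAPI specification on this page can be called directly. The sketch below only builds the request with the Python standard library and does not send it; `build_run_request` is a hypothetical helper name, and the path, base URL, and `token` query parameter are taken from that specification.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/labrat011~ai-ml-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "<YOUR_API_TOKEN>",
    {"mode": "trending_papers", "maxResults": 20},
)
# Passing `request` to urllib.request.urlopen() would start the run and,
# per the endpoint summary, return the dataset items in the response.
```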

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=labrat011/ai-ml-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Ai-ML-scraper",
        "description": "Search AI/ML models, research papers, and trending papers from HuggingFace Hub and arXiv. No API key required.",
        "version": "0.0",
        "x-build-id": "fJhNUsObZt2eNhuwA"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/labrat011~ai-ml-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-labrat011-ai-ml-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/labrat011~ai-ml-scraper/runs": {
            "post": {
                "operationId": "runs-sync-labrat011-ai-ml-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/labrat011~ai-ml-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-labrat011-ai-ml-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Scraping mode",
                        "enum": [
                            "search_models",
                            "search_papers",
                            "trending_papers"
                        ],
                        "type": "string",
                        "description": "What type of AI/ML data to search for.",
                        "default": "search_models"
                    },
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Search keyword for models or papers. For arXiv, searches titles and abstracts."
                    },
                    "sort": {
                        "title": "Sort by",
                        "enum": [
                            "downloads",
                            "likes",
                            "trending",
                            "relevance",
                            "submittedDate",
                            "lastUpdatedDate"
                        ],
                        "type": "string",
                        "description": "How to sort results. Options depend on the mode.",
                        "default": "downloads"
                    },
                    "pipelineTag": {
                        "title": "Pipeline task (Models only)",
                        "enum": [
                            "",
                            "text-generation",
                            "text-classification",
                            "token-classification",
                            "question-answering",
                            "summarization",
                            "translation",
                            "fill-mask",
                            "text2text-generation",
                            "conversational",
                            "text-to-image",
                            "image-to-text",
                            "image-classification",
                            "object-detection",
                            "image-segmentation",
                            "automatic-speech-recognition",
                            "text-to-speech",
                            "audio-classification",
                            "feature-extraction",
                            "sentence-similarity",
                            "zero-shot-classification",
                            "reinforcement-learning",
                            "depth-estimation",
                            "image-to-image",
                            "video-classification"
                        ],
                        "type": "string",
                        "description": "Filter models by pipeline task. Leave empty for all tasks.",
                        "default": ""
                    },
                    "libraryFilter": {
                        "title": "Framework / Library (Models only)",
                        "enum": [
                            "",
                            "transformers",
                            "diffusers",
                            "pytorch",
                            "tensorflow",
                            "jax",
                            "onnx",
                            "safetensors",
                            "gguf",
                            "spacy",
                            "keras",
                            "sklearn",
                            "sentence-transformers",
                            "peft",
                            "adapter-transformers"
                        ],
                        "type": "string",
                        "description": "Filter models by ML framework. Leave empty for all frameworks.",
                        "default": ""
                    },
                    "arxivCategory": {
                        "title": "arXiv category",
                        "enum": [
                            "",
                            "cs.AI",
                            "cs.LG",
                            "cs.CL",
                            "cs.CV",
                            "cs.NE",
                            "cs.RO",
                            "cs.IR",
                            "cs.MA",
                            "stat.ML",
                            "cs.SD",
                            "eess.AS",
                            "cs.HC",
                            "cs.CR"
                        ],
                        "type": "string",
                        "description": "Filter papers by arXiv category. Leave empty to search all AI/ML categories.",
                        "default": ""
                    },
                    "author": {
                        "title": "Author name (Papers only)",
                        "type": "string",
                        "description": "Filter arXiv papers by author name."
                    },
                    "dateFrom": {
                        "title": "Date from (Papers only)",
                        "type": "string",
                        "description": "Filter papers published from this date (YYYY-MM-DD)."
                    },
                    "dateTo": {
                        "title": "Date to (Papers only)",
                        "type": "string",
                        "description": "Filter papers published up to this date (YYYY-MM-DD)."
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum number of results to return. Free users are limited to 25 results per run.",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
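
The schema above constrains a few input fields (`dateFrom` and `dateTo` as `YYYY-MM-DD` strings, `maxResults` between 1 and 10000 with a default of 100) and describes the shape of a run response under `data`. The following is a minimal sketch of how a caller might check those fields locally before starting a run and pull the commonly used fields out of a response. The helper names `validate_input` and `summarize_run` are hypothetical and not part of any Apify library, and the sample response is made up for illustration; only the field names and bounds come from the schema.

```python
import re
from datetime import date
from typing import Any

# Strict YYYY-MM-DD shape, as stated in the dateFrom/dateTo descriptions.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_input(actor_input: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the input with schema defaults applied.

    Hypothetical helper: checks only the fields shown in the schema excerpt
    (dateFrom, dateTo, maxResults). Raises ValueError on a violation.
    """
    checked = dict(actor_input)

    for key in ("dateFrom", "dateTo"):
        value = checked.get(key)
        if value is None:
            continue
        if not isinstance(value, str) or not DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{key} must be a YYYY-MM-DD string")
        # Rejects impossible calendar dates such as 2025-02-30.
        date.fromisoformat(value)

    # Schema: integer, minimum 1, maximum 10000, default 100.
    max_results = checked.setdefault("maxResults", 100)
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= 10000:
        raise ValueError("maxResults must be between 1 and 10000")

    return checked


def summarize_run(response: dict[str, Any]) -> dict[str, Any]:
    """Extract a few fields from a response shaped like runsResponseSchema.

    Missing fields come back as None rather than raising, because the
    schema does not mark any of them as required.
    """
    data = response.get("data", {})
    return {
        "id": data.get("id"),
        "status": data.get("status"),
        "defaultDatasetId": data.get("defaultDatasetId"),
        "usageTotalUsd": data.get("usageTotalUsd"),
    }


if __name__ == "__main__":
    print(validate_input({"dateFrom": "2025-01-08", "maxResults": 25}))
    # Illustrative response only; the IDs are placeholders.
    sample = {"data": {"id": "run-id", "status": "READY", "usageTotalUsd": 0.00005}}
    print(summarize_run(sample))
```

Results are written to the dataset identified by `defaultDatasetId`, so that field is usually the one to keep from the summary.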
