# WebSight API — AI-Native Web Intelligence (`george.the.developer/websight-api`) Actor

One API call for 7 types of web intelligence from any URL: clean markdown, tech stack, SEO audit, contacts, structured data, AI score, domain intel. Token-optimized for AI agents. Cached.

- **URL**: https://apify.com/george.the.developer/websight-api.md
- **Developed by:** [George Kioko](https://apify.com/george.the.developer) (community)
- **Categories:** Developer tools, Automation
- **Stats:** 6 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$5.00 per 1,000 pages analyzed

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

<h1 align="center">
  WebSight API
  <br/>
  <sub>AI-Native Web Intelligence</sub>
</h1>

<p align="center">
  <strong>One API call. Seven intelligence modules. Any URL.</strong>
  <br/>
  Extract clean content, tech stack, SEO scores, contacts, structured data, AI detection, and domain intel — all optimized for AI agents and LLM applications.
</p>

<p align="center">
  <a href="https://apify.com/george.the.developer/websight-api"><img src="https://img.shields.io/badge/Apify-Actor-00B3E6?style=for-the-badge&logo=apify&logoColor=white" alt="Apify Actor" /></a>
  <img src="https://img.shields.io/badge/Price-$0.005%2Fpage-10B981?style=for-the-badge" alt="Price" />
  <img src="https://img.shields.io/badge/Modules-7-8B5CF6?style=for-the-badge" alt="Modules" />
  <img src="https://img.shields.io/badge/Response-<2s-F59E0B?style=for-the-badge" alt="Speed" />
  <img src="https://img.shields.io/badge/Cache-1hr_Free-3B82F6?style=for-the-badge" alt="Cache" />
  <img src="https://img.shields.io/badge/Bulk-20_URLs-EF4444?style=for-the-badge" alt="Bulk" />
</p>

---

### The Problem

Building an AI agent that needs web intelligence? Today you need:

| Task | Separate API | Cost |
|------|-------------|------|
| Content extraction | Diffbot / Jina | $0.01+ |
| Tech stack detection | BuiltWith / Wappalyzer | $0.02+ |
| SEO analysis | Moz / Ahrefs API | $0.05+ |
| Contact extraction | Hunter.io | $0.01+ |
| Structured data | Custom parser | Dev time |
| AI content detection | GPTZero / Originality | $0.01+ |
| Domain intel | WHOIS API | $0.01+ |
| **Total per page** | **7 APIs, 7 integrations** | **$0.12+** |

**WebSight does ALL of this in one call for $0.005.**

---

### How It Works

```mermaid
flowchart LR
    A["URL Input"] --> B["Fetch & Parse"]
    B --> C1["Clean Content"]
    B --> C2["Tech Stack"]
    B --> C3["SEO Audit"]
    B --> C4["Contacts"]
    B --> C5["Structured Data"]
    B --> C6["AI Score"]
    B --> C7["Domain Intel"]
    C1 --> D["Combined JSON"]
    C2 --> D
    C3 --> D
    C4 --> D
    C5 --> D
    C6 --> D
    C7 --> D
    D --> E["1hr Cache"]
    E --> F["Response"]

    style A fill:#3B82F6,color:#fff,stroke:none
    style B fill:#8B5CF6,color:#fff,stroke:none
    style C1 fill:#10B981,color:#fff,stroke:none
    style C2 fill:#10B981,color:#fff,stroke:none
    style C3 fill:#10B981,color:#fff,stroke:none
    style C4 fill:#10B981,color:#fff,stroke:none
    style C5 fill:#10B981,color:#fff,stroke:none
    style C6 fill:#10B981,color:#fff,stroke:none
    style C7 fill:#10B981,color:#fff,stroke:none
    style D fill:#F59E0B,color:#fff,stroke:none
    style E fill:#EF4444,color:#fff,stroke:none
    style F fill:#3B82F6,color:#fff,stroke:none
````

All 7 modules execute **in parallel**, with an average response time under 2 seconds.
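
To make the fan-out concrete, here is a minimal Python sketch of running independent modules concurrently with `asyncio.gather` and merging their results under their module keys. The three analyzer functions are hypothetical stand-ins (the domain one only simulates a network lookup with `asyncio.sleep`); this illustrates the pattern, not WebSight's actual code.

```python
import asyncio


# Hypothetical stand-in modules. Each receives the already-fetched HTML and
# returns its own slice of the combined result.
async def analyze_tech(html: str) -> dict:
    found = "react" in html.lower()
    return {"detected": [{"name": "React", "category": "framework"}] if found else []}


async def analyze_seo(html: str) -> dict:
    return {"hasTitle": "<title>" in html.lower()}


async def analyze_domain(html: str) -> dict:
    await asyncio.sleep(0.05)  # simulates network I/O such as a DNS lookup
    return {"hasSPF": True}


MODULES = {"tech": analyze_tech, "seo": analyze_seo, "domain": analyze_domain}


async def analyze(html: str, selected: list) -> dict:
    """Run the selected modules concurrently and merge results by module key."""
    names = [name for name in selected if name in MODULES]
    # gather() runs the coroutines concurrently on one event loop: while one
    # awaits I/O, the others make progress. Results come back in input order.
    results = await asyncio.gather(*(MODULES[name](html) for name in names))
    return {"modules": dict(zip(names, results))}


if __name__ == "__main__":
    page = "<html><head><title>Demo</title></head><body><div id='react-root'></div></body></html>"
    print(asyncio.run(analyze(page, ["tech", "seo", "domain"])))
```

Because each module only reads the fetched page and none depends on another's output, a single concurrent fan-out followed by one merge step is enough.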

***

### Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        AI["AI Agents<br/>Claude, GPT, Gemini"]
        MCP["MCP Servers<br/>Cursor, Windsurf"]
        APP["Applications<br/>JS, Python, cURL"]
    end

    subgraph "WebSight API"
        LB["HTTP Server<br/>Standby Mode"]
        CACHE["In-Memory Cache<br/>1hr TTL, LRU"]
        SSRF["SSRF Protection<br/>Private IP Block"]

        subgraph "Parallel Module Engine"
            M1["Content Extractor<br/>Markdown / Text / HTML"]
            M2["Tech Detector<br/>40+ Technologies"]
            M3["SEO Analyzer<br/>Score 0-100"]
            M4["Contact Finder<br/>Emails, Phones, Social"]
            M5["Structured Parser<br/>OG, Twitter, JSON-LD"]
            M6["AI Scorer<br/>5-Factor Heuristic"]
            M7["Domain Intel<br/>DNS, SPF, DMARC, WHOIS"]
        end
    end

    subgraph "Infrastructure"
        APIFY["Apify Platform"]
        PPE["Pay-Per-Event<br/>$0.005/page"]
    end

    AI --> LB
    MCP --> LB
    APP --> LB
    LB --> CACHE
    LB --> SSRF
    SSRF --> M1 & M2 & M3 & M4 & M5 & M6 & M7
    LB --> APIFY
    APIFY --> PPE

    style AI fill:#8B5CF6,color:#fff,stroke:none
    style MCP fill:#8B5CF6,color:#fff,stroke:none
    style APP fill:#8B5CF6,color:#fff,stroke:none
    style LB fill:#3B82F6,color:#fff,stroke:none
    style CACHE fill:#F59E0B,color:#fff,stroke:none
    style SSRF fill:#EF4444,color:#fff,stroke:none
    style M1 fill:#10B981,color:#fff,stroke:none
    style M2 fill:#10B981,color:#fff,stroke:none
    style M3 fill:#10B981,color:#fff,stroke:none
    style M4 fill:#10B981,color:#fff,stroke:none
    style M5 fill:#10B981,color:#fff,stroke:none
    style M6 fill:#10B981,color:#fff,stroke:none
    style M7 fill:#10B981,color:#fff,stroke:none
    style APIFY fill:#00B3E6,color:#fff,stroke:none
    style PPE fill:#00B3E6,color:#fff,stroke:none
```
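
The SSRF layer in the diagram blocks requests to private addresses. A minimal sketch of such a check using Python's standard `ipaddress` and `socket` modules, assuming the policy is "allow only http(s) URLs whose host resolves exclusively to globally routable IPs". `is_safe_target` is a hypothetical helper, not WebSight's code; a production version should also connect to the already-checked IP (to resist DNS rebinding) and re-validate every redirect.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_target(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to global IPs."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:  # unresolvable host
        return False
    for _family, _type, _proto, _canon, sockaddr in infos:
        # Strip an IPv6 zone suffix such as "%eth0" before parsing.
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])
        # is_global is False for loopback, private (10/8, 192.168/16, ...),
        # link-local (169.254/16, incl. cloud metadata) and reserved ranges.
        if not ip.is_global:
            return False
    return True
```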

***

### Why WebSight

#### vs. Using 7 Separate APIs

| Feature | 7 Separate APIs | WebSight API |
|---------|:-:|:-:|
| API integrations to maintain | 7 | **1** |
| Authentication tokens | 7 | **1** |
| Cost per page | ~$0.12 | **$0.005** |
| API calls needed | 7 sequential | **1** (modules run in parallel) |
| Latency | 5-10s total | **<2s** |
| Caching | Build your own | **Built-in (free)** |
| Bulk support | Manual orchestration | **Native (20 URLs)** |

#### vs. Raw HTML for AI Agents

| Metric | Raw HTML | WebSight Markdown |
|--------|:-:|:-:|
| Tokens (typical page) | ~15,000 | **~2,000** |
| Token reduction | 0% | **~87%** |
| Noise (ads, nav, scripts) | Included | **Removed** |
| Format for LLMs | Noisy markup | **Clean Markdown** |
| Structured metadata | Must parse | **Pre-extracted JSON** |
| Cost at GPT-4 rates ($10/1M tokens) | $0.15/page | **$0.02/page** |

#### Token Savings Example

```
Raw HTML of stripe.com homepage:
  <html><head><meta charset="utf-8"><meta name="viewport"...
  <script>window.__NEXT_DATA__={...15KB of JSON...}</script>
  <nav class="TopNav">...<div class="cookie-banner">...
  ========================
  ~18,400 tokens ($0.184 at GPT-4)

WebSight markdown output:
  # Stripe: Financial Infrastructure for the Internet
  Millions of companies use Stripe to accept payments...
  ## Products
  - Payments: Accept online and in-person payments
  - Billing: Build and scale subscriptions...
  ========================
  ~1,800 tokens ($0.018 at GPT-4)

  Token savings: 90% | Cost savings: $0.166 per page
```
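
These figures follow from simple arithmetic at the $10 per 1M input tokens quoted in the table above. A small Python sketch that reproduces them; the rate is the README's figure and will differ for other models, so pass your own via `usd_per_million`.

```python
def llm_input_cost(tokens: int, usd_per_million: float = 10.0) -> float:
    """Cost in USD of sending `tokens` input tokens at a per-million-token rate."""
    return tokens * usd_per_million / 1_000_000


def token_reduction(raw_tokens: int, clean_tokens: int) -> float:
    """Fraction of tokens removed by cleaning (0.0 to 1.0)."""
    return 1 - clean_tokens / raw_tokens


if __name__ == "__main__":
    raw, clean = 18_400, 1_800  # token counts from the stripe.com example above
    print(f"raw:   ${llm_input_cost(raw):.3f}")             # raw:   $0.184
    print(f"clean: ${llm_input_cost(clean):.3f}")           # clean: $0.018
    print(f"saved: ${llm_input_cost(raw - clean):.3f}")     # saved: $0.166
    print(f"reduction: {token_reduction(raw, clean):.0%}")  # reduction: 90%
```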

***

### Quick Start

#### cURL

```bash
# Full analysis: all 7 modules
curl "https://george-the-developer--websight-api.apify.actor/analyze?url=https://stripe.com"

# Selective modules: only what you need
curl "https://george-the-developer--websight-api.apify.actor/analyze?url=https://stripe.com&modules=tech,seo,contacts"

# Bulk analysis: up to 20 URLs
curl -X POST "https://george-the-developer--websight-api.apify.actor/analyze/bulk" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://stripe.com", "https://shopify.com", "https://vercel.com"]}'
```

#### JavaScript / Node.js

```javascript
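// Using the Apify Standby endpoint.
// Top-level await needs an ES module (a .mjs file, or "type": "module" in package.json) on Node 18+.
const BASE = 'https://george-the-developer--websight-api.apify.actor';

// Single URL, full analysis. URLSearchParams percent-encodes the target URL.
const response = await fetch(`${BASE}/analyze?${new URLSearchParams({ url: 'https://stripe.com' })}`);
if (!response.ok) throw new Error(`WebSight request failed: HTTP ${response.status}`);
const data = await response.json();

console.log(data.modules.content.text);      // Clean markdown
console.log(data.modules.tech.detected);     // [{ name: "React", category: "framework" }, ...]
console.log(data.modules.seo.score);         // e.g. 82
console.log(data.modules.contacts.emails);   // e.g. ["sales@stripe.com", "support@stripe.com"]
console.log(data.modules.aiContent.verdict); // e.g. "likely_human"

// Selective modules: only the listed modules run, which saves processing time
const selectiveRes = await fetch(
  `${BASE}/analyze?${new URLSearchParams({ url: 'https://stripe.com', modules: 'seo,tech' })}`
);
const seoAndTech = await selectiveRes.json();
console.log(seoAndTech.modules.seo.score);

// Bulk analysis: up to 20 URLs per request
const bulk = await fetch(`${BASE}/analyze/bulk`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    urls: ['https://stripe.com', 'https://shopify.com'],
    modules: ['tech', 'seo', 'contacts']
  })
});
if (!bulk.ok) throw new Error(`WebSight bulk request failed: HTTP ${bulk.status}`);
const results = await bulk.json();
console.log(`Analyzed ${results.total} pages, charged for ${results.charged}`);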
```

#### Python

```python
import requests

BASE = "https://george-the-developer--websight-api.apify.actor"

# Single URL (requests percent-encodes the params for you)
resp = requests.get(f"{BASE}/analyze", params={"url": "https://stripe.com"}, timeout=30)
resp.raise_for_status()
data = resp.json()

# Access modules
print(data["modules"]["content"]["text"][:200])  # First 200 chars of markdown
print(data["modules"]["tech"]["detected"])        # Tech stack
print(data["modules"]["seo"]["score"])            # SEO score 0-100
print(data["modules"]["domain"]["dns"]["a"])      # IP addresses

# Bulk analysis (max 20 URLs per request)
bulk = requests.post(f"{BASE}/analyze/bulk", json={
    "urls": ["https://stripe.com", "https://shopify.com", "https://vercel.com"],
    "modules": ["tech", "contacts"],
}, timeout=120)
bulk.raise_for_status()
for result in bulk.json()["results"]:
    # A URL that failed to fetch may carry an "error" field instead of "modules"
    tech = result.get("modules", {}).get("tech", {}).get("detected", [])
    print(f"{result['url']}: {len(tech)} technologies found")
```

#### AI Agent Integration (LangChain / Claude Tools)

```javascript
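// Define as a tool for your AI agent. Tool formats vary by framework:
// `parameters` here is a JSON Schema object, the shape most function-calling
// APIs expect (Claude's tool definitions call this field `input_schema`).
const websightTool = {
  name: 'analyze_website',
  description: 'Extract comprehensive intelligence from any URL: content, tech stack, SEO, contacts, structured data, AI detection, domain info',
  parameters: {
    type: 'object',
    properties: {
      url: { type: 'string', description: 'URL to analyze' },
      modules: { type: 'string', description: 'Optional comma-separated subset of: content,tech,seo,contacts,structured,ai_score,domain' }
    },
    required: ['url']
  },
  execute: async ({ url, modules }) => {
    const params = new URLSearchParams({ url });
    if (modules) params.set('modules', modules);
    const res = await fetch(`https://george-the-developer--websight-api.apify.actor/analyze?${params}`);
    if (!res.ok) throw new Error(`WebSight request failed: HTTP ${res.status}`);
    return res.json();
  }
};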

// Your agent can now call:
// "Analyze stripe.com and tell me what tech stack they use"
// "Check if this blog post was written by AI"
// "Find contact information for this company"
```

#### MCP Server (Coming Soon)

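WebSight API is designed to work as an MCP tool for AI coding assistants such as Cursor, Windsurf, and Claude Desktop. A native MCP server configuration will be published soon; until then, treat the snippet below as a preview, since the package it references may not be available yet. To use WebSight through MCP today, see the Apify MCP server setup in the [API](#api) section.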

```json
{
  "mcpServers": {
    "websight": {
      "command": "npx",
      "args": ["@anthropic/websight-mcp"],
      "env": {
        "WEBSIGHT_API_URL": "https://george-the-developer--websight-api.apify.actor"
      }
    }
  }
}
```

***

### API Reference

#### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/analyze?url=<url>` | Full analysis (all modules) |
| `GET` | `/analyze?url=<url>&modules=tech,seo` | Selective module analysis |
| `POST` | `/analyze` | JSON body: `{ url, modules?, contentFormat?, maxContentLength? }` |
| `POST` | `/analyze/bulk` | JSON body: `{ urls: [...], modules? }` (max 20 URLs) |
| `GET` | `/` | Health check + API documentation |

#### Query Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string | *required* | URL to analyze |
| `modules` | string | all | Comma-separated module list |
| `format` | string | `markdown` | Content format: `markdown`, `text`, `html` |
| `maxLength` | integer | `10000` | Max content characters (500-100,000) |

#### Available Modules

| Module | Key | What It Returns |
|--------|-----|-----------------|
| **Clean Content** | `content` | Token-optimized markdown/text/HTML. Strips ads, nav, footers, cookie banners, scripts. 80-90% token reduction. |
| **Tech Stack** | `tech` | Array of detected technologies with categories. 40+ techs: CMS (WordPress, Shopify, Wix), Frameworks (React, Next.js, Vue, Angular, Svelte), Analytics (GA, GTM, Hotjar, Segment), Infrastructure (Cloudflare, Vercel, AWS), Payments (Stripe, PayPal), Security (reCAPTCHA, hCaptcha). |
| **SEO Audit** | `seo` | Score 0-100, title analysis (length + optimal range), meta description audit, heading structure (H1/H2), canonical URL, robots meta, viewport, favicon, image alt coverage, internal/external link counts, word count. |
| **Contacts** | `contacts` | Emails (filtered, deduplicated), phone numbers, social profiles: LinkedIn, Twitter/X, Facebook, Instagram, GitHub, YouTube. |
| **Structured Data** | `structured` | OpenGraph tags (title, image, description, type), Twitter Card data, JSON-LD schemas (parsed and normalized). |
| **AI Content Score** | `ai_score` | Score 0-100 with verdict (`ai_generated`, `likely_ai`, `mixed`, `likely_human`, `human_written`). 5-factor analysis: sentence uniformity, AI phrase density, vocabulary diversity (TTR), readability (Flesch-Kincaid), paragraph uniformity. Returned under `modules.aiContent` in the response. |
| **Domain Intel** | `domain` | DNS records (A, AAAA, MX, NS), SPF validation, DMARC check, WHOIS data (creation date, registrar, domain age in days). |
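
Because the target is itself a URL, it should be percent-encoded when passed as the `url` query parameter, especially if it has its own query string. A client-side Python sketch that builds a `GET /analyze` URL and checks arguments against the values in the tables above; `build_analyze_url` is a hypothetical helper, not an official client function.

```python
from urllib.parse import urlencode

BASE = "https://george-the-developer--websight-api.apify.actor"
MODULE_KEYS = {"content", "tech", "seo", "contacts", "structured", "ai_score", "domain"}
FORMATS = {"markdown", "text", "html"}


def build_analyze_url(target: str, modules=None, fmt: str = "markdown",
                      max_length: int = 10_000) -> str:
    """Build a GET /analyze URL, validating parameters against the documented values."""
    params = {"url": target}
    if modules:
        unknown = set(modules) - MODULE_KEYS
        if unknown:
            raise ValueError(f"unknown module(s): {sorted(unknown)}")
        params["modules"] = ",".join(modules)
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    if not 500 <= max_length <= 100_000:
        raise ValueError("maxLength must be between 500 and 100,000")
    # Only send non-default values to keep URLs short.
    if fmt != "markdown":
        params["format"] = fmt
    if max_length != 10_000:
        params["maxLength"] = max_length
    # urlencode percent-encodes the target, so its own "?a=1&b=2" cannot be
    # mistaken for WebSight's query parameters.
    return f"{BASE}/analyze?{urlencode(params)}"


if __name__ == "__main__":
    print(build_analyze_url("https://example.com/?a=1&b=2", modules=["tech", "seo"]))
```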

***

### Input Schema

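When running as a standard Apify Actor (non-API mode), supply either `url` or `urls` (max 20), not both; the example below lists every field for reference: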

```json
{
  "url": "https://stripe.com",
  "urls": ["https://stripe.com", "https://shopify.com"],
  "modules": ["content", "tech", "seo", "contacts", "structured", "ai_score", "domain"],
  "contentFormat": "markdown",
  "maxContentLength": 10000
}
```

***

### Output Example

Full response from `GET /analyze?url=https://stripe.com`:

```json
{
  "url": "https://stripe.com",
  "analyzedAt": "2026-03-22T14:30:00.000Z",
  "fetchInfo": {
    "statusCode": 200,
    "responseTimeMs": 847,
    "contentLength": 198432
  },
  "modules": {
    "content": {
      "format": "markdown",
      "text": "# Stripe: Financial Infrastructure for the Internet\n\nMillions of companies of all sizes use Stripe online and in person to accept payments, send payouts, automate financial processes, and ultimately grow revenue.\n\n## Products\n\n- Payments: Accept online and in-person payments\n- Billing: Build and scale your recurring business model\n- Connect: Set up multi-party payments for platforms\n- Terminal: Build in-person payment experiences\n...",
      "charCount": 4823
    },
    "tech": {
      "detected": [
        { "name": "React", "category": "framework" },
        { "name": "Next.js", "category": "framework" },
        { "name": "Cloudflare", "category": "infra" },
        { "name": "Google Analytics", "category": "analytics" },
        { "name": "Google Tag Manager", "category": "analytics" },
        { "name": "Segment", "category": "analytics" },
        { "name": "Stripe", "category": "payment" }
      ]
    },
    "seo": {
      "score": 82,
      "title": {
        "text": "Stripe | Financial Infrastructure for the Internet",
        "length": 51,
        "optimal": true
      },
      "metaDescription": {
        "text": "Stripe powers online and in-person payment processing and financial solutions for businesses of all sizes.",
        "length": 106,
        "optimal": true
      },
      "headings": {
        "h1": ["Financial Infrastructure for the Internet"],
        "h2": ["Products", "Use cases", "Integrations & custom solutions", "Get started"]
      },
      "canonical": "https://stripe.com",
      "robots": "",
      "viewport": true,
      "favicon": true,
      "images": { "total": 34, "withAlt": 30, "withoutAlt": 4 },
      "links": { "internal": 87, "external": 12 },
      "wordCount": 2341
    },
    "contacts": {
      "emails": ["sales@stripe.com", "support@stripe.com"],
      "phones": [],
      "socialProfiles": {
        "twitter": "https://twitter.com/stripe",
        "linkedin": "https://www.linkedin.com/company/stripe",
        "github": "https://github.com/stripe",
        "youtube": "https://www.youtube.com/stripe"
      }
    },
    "structured": {
      "openGraph": {
        "title": "Stripe | Financial Infrastructure for the Internet",
        "description": "Stripe powers online and in-person payment processing...",
        "image": "https://images.ctfassets.net/fzn2n1nzq965/og-image.png",
        "type": "website",
        "url": "https://stripe.com"
      },
      "twitterCard": {
        "card": "summary_large_image",
        "site": "@stripe",
        "title": "Stripe | Financial Infrastructure for the Internet"
      },
      "jsonLd": [
        {
          "@context": "https://schema.org",
          "@type": "Organization",
          "name": "Stripe",
          "url": "https://stripe.com"
        }
      ]
    },
    "aiContent": {
      "score": 28,
      "verdict": "likely_human",
      "factors": [
        { "name": "sentence_uniformity", "score": 30, "weight": 15 },
        { "name": "ai_phrase_density", "score": 10, "weight": 20 },
        { "name": "vocabulary_diversity", "score": 50, "weight": 10 },
        { "name": "readability", "score": 50, "weight": 8 },
        { "name": "paragraph_uniformity", "score": 20, "weight": 8 }
      ]
    },
    "domain": {
      "hostname": "stripe.com",
      "dns": {
        "a": ["3.18.12.63", "3.18.1.20"],
        "aaaa": [],
        "mx": [
          { "priority": 1, "exchange": "aspmx.l.google.com" },
          { "priority": 5, "exchange": "alt1.aspmx.l.google.com" }
        ],
        "ns": ["ns-423.awsdns-52.com", "ns-1408.awsdns-48.org"],
        "hasSPF": true,
        "hasDMARC": true
      },
      "age": {
        "created": "2009-09-16T04:00:00Z",
        "registrar": "SafeNames Ltd.",
        "daysOld": 6032
      }
    }
  },
  "processingTimeMs": 1843
}
```
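
The `aiContent` factors above are consistent with a weight-normalized mean: (30×15 + 10×20 + 50×10 + 50×8 + 20×8) / (15 + 20 + 10 + 8 + 8) = 1710 / 61 ≈ 28.03, which rounds to the reported `score` of 28. The README does not state the formula, so treat this Python sketch as an inference from the example, handy for sanity-checking results or re-weighting factors on your side.

```python
def weighted_score(factors: list) -> float:
    """Weight-normalized mean of factor scores: sum(score * weight) / sum(weight).

    Inferred from the example output; not a documented WebSight formula.
    """
    total_weight = sum(f["weight"] for f in factors)
    return sum(f["score"] * f["weight"] for f in factors) / total_weight


# Factors copied from the aiContent block of the example output above.
EXAMPLE_FACTORS = [
    {"name": "sentence_uniformity", "score": 30, "weight": 15},
    {"name": "ai_phrase_density", "score": 10, "weight": 20},
    {"name": "vocabulary_diversity", "score": 50, "weight": 10},
    {"name": "readability", "score": 50, "weight": 8},
    {"name": "paragraph_uniformity", "score": 20, "weight": 8},
]

if __name__ == "__main__":
    print(round(weighted_score(EXAMPLE_FACTORS)))  # 28
```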

***

### Use Cases

```mermaid
mindmap
  root((WebSight API))
    AI Agents
      Research assistants
      Content summarizers
      Competitive analysis bots
      Lead qualification agents
    Lead Generation
      Contact extraction at scale
      Tech stack targeting
      Company enrichment
    SEO Tools
      Site auditing
      Competitor benchmarking
      Content quality scoring
    Competitive Intelligence
      Tech stack monitoring
      Domain age verification
      Social presence mapping
    Content Analysis
      AI content detection
      Readability scoring
      Structured data validation
    Security & Trust
      Domain reputation checks
      SPF/DMARC verification
      DNS record auditing
```

***

### Pricing

**$0.005 per page analyzed** (half a cent).

Cached responses within 1 hour are **free** and return in <500ms.

| Volume | Cost | Effective Price |
|--------|------|-----------------|
| 100 pages | $0.50 | $0.005/page |
| 1,000 pages | $5.00 | $0.005/page |
| 10,000 pages | $50.00 | $0.005/page |
| 100,000 pages | $500.00 | $0.005/page |
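
Because cached repeats are free, actual spend depends on how many distinct URLs you fetch per hour rather than on raw request count. A rough Python estimator, assuming the 1-hour cache window starts at each charged fetch and is not extended by cache hits, and that URLs match only as identical strings (the README specifies neither detail; LRU eviction could also drop entries earlier):

```python
from datetime import datetime, timedelta

PRICE_PER_PAGE = 0.005  # USD per page analyzed, from the table above
CACHE_TTL = timedelta(hours=1)


def estimate_cost(calls: list) -> float:
    """Estimate USD spend for (url, timestamp) calls, treating cached repeats as free.

    Assumes the cache window starts at each charged fetch and is not extended
    by cache hits, and that URLs match only as identical strings.
    """
    last_charged = {}
    pages_charged = 0
    for url, ts in sorted(calls, key=lambda call: call[1]):
        prev = last_charged.get(url)
        if prev is None or ts - prev >= CACHE_TTL:
            pages_charged += 1
            last_charged[url] = ts
    return round(pages_charged * PRICE_PER_PAGE, 4)


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1, 12, 0)
    calls = [
        ("https://stripe.com", t0),
        ("https://stripe.com", t0 + timedelta(minutes=30)),  # cached: free
        ("https://shopify.com", t0 + timedelta(minutes=45)),
        ("https://stripe.com", t0 + timedelta(hours=2)),     # cache expired: charged
    ]
    print(estimate_cost(calls))  # 0.015
```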

**Cost comparison for analyzing 1,000 pages:**

| Solution | Cost for 1,000 pages |
|----------|:-:|
| BuiltWith + Hunter + Moz + GPTZero + ... | **$120+** |
| WebSight API (all 7 modules) | **$5.00** |
| Savings | **96%** |

> Free tier available on Apify. Start with $5 of free platform credits.

***

### Features at a Glance

| Feature | Details |
|---------|---------|
| **Modules** | 7 intelligence modules, all in one call |
| **Content formats** | Markdown (default), plain text, HTML |
| **Tech detection** | 40+ technologies across 7 categories |
| **SEO scoring** | 0-100 weighted score, 12 ranking factors |
| **AI detection** | 5-factor heuristic analysis, 5 confidence levels |
| **DNS analysis** | A, AAAA, MX, NS, SPF, DMARC records |
| **WHOIS** | Domain age, registrar, creation date |
| **Caching** | 1-hour TTL, LRU eviction, cached = free |
| **Bulk** | Up to 20 URLs per request |
| **SSRF protection** | Private IP blocking built-in |
| **CORS** | Enabled for all origins |
| **Timeout** | 15s per URL fetch, 300s Actor timeout |
| **Memory** | 2GB default allocation |
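
The caching row combines a 1-hour TTL with LRU eviction. A minimal Python sketch of that policy built on `collections.OrderedDict`; the entry cap `max_entries` is a hypothetical parameter (the README does not state a cache size), and this illustrates the policy rather than WebSight's actual cache.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """In-memory cache with per-entry expiry and least-recently-used eviction.

    get() returns None on a miss or an expired entry.
    """

    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```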

***

### FAQ

<details>
<summary><strong>What counts as a "page analyzed"?</strong></summary>

Each unique URL analyzed counts as one page. If you request the same URL within 1 hour, the cached result is returned for free. Bulk requests charge per URL analyzed (not per API call).

</details>

<details>
<summary><strong>Can I choose which modules to run?</strong></summary>

Yes. Pass `modules=tech,seo,contacts` as a query parameter (GET) or in the JSON body (POST). Only the selected modules execute, but the price is the same per page regardless of module count.

</details>

<details>
<summary><strong>How accurate is the AI content detection?</strong></summary>

The AI content scorer uses 5 heuristic factors: sentence uniformity, AI phrase density, vocabulary diversity (type-token ratio), readability (Flesch-Kincaid), and paragraph uniformity. It returns a score 0-100 and a verdict from `human_written` to `ai_generated`. It is a heuristic model, not a trained classifier — best used as a signal alongside other analysis, not as sole evidence.

</details>

<details>
<summary><strong>Does it handle JavaScript-rendered pages?</strong></summary>

The current version uses HTTP fetch with Cheerio parsing, which works for server-rendered and static pages. For heavy SPA/JavaScript-rendered pages, content extraction may be limited. Puppeteer-based rendering is available as a dependency and may be enabled in a future update.

</details>

<details>
<summary><strong>What about rate limits?</strong></summary>

There are no artificial rate limits. Throughput is bounded by Apify platform concurrency and the 2GB memory allocation. For high-volume batch jobs, use the bulk endpoint or run the Actor directly with the `urls` input field.
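
For lists longer than the 20-URL bulk limit, one approach is to split the list into batches and send one bulk request per batch. A Python sketch under that assumption; `chunk_urls` and `analyze_all` are hypothetical helpers, and the `session` parameter lets you pass a `requests.Session` (or a test double) without the helpers importing an HTTP library themselves.

```python
BULK_URL = "https://george-the-developer--websight-api.apify.actor/analyze/bulk"
MAX_BULK = 20  # documented per-request limit for /analyze/bulk


def chunk_urls(urls: list, size: int = MAX_BULK) -> list:
    """Split a URL list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def analyze_all(urls: list, session, modules=None) -> list:
    """POST each batch to /analyze/bulk and collect the per-URL results.

    `session` is any object with a requests-style post(url, json=..., timeout=...),
    for example requests.Session().
    """
    results = []
    for batch in chunk_urls(urls):
        body = {"urls": batch}
        if modules:
            body["modules"] = modules
        resp = session.post(BULK_URL, json=body, timeout=120)
        resp.raise_for_status()
        results.extend(resp.json()["results"])  # "results" key as in the Python example above
    return results
```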

</details>

<details>
<summary><strong>Is WHOIS data always available?</strong></summary>

WHOIS lookup is best-effort. Some TLDs or registrars block automated WHOIS queries. When unavailable, the `domain.age` field returns `null` but all DNS records (A, MX, NS, SPF, DMARC) are still resolved.
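
Code that reads WHOIS fields should therefore be ready for `null`. A minimal Python accessor, assuming the response layout in the Output Example above (`modules.domain.age.daysOld`); `domain_age_days` is a hypothetical helper.

```python
def domain_age_days(result: dict):
    """Return modules.domain.age.daysOld, or None when WHOIS data is missing."""
    domain = (result.get("modules") or {}).get("domain") or {}
    age = domain.get("age")  # null when the WHOIS lookup was blocked or failed
    return age.get("daysOld") if age else None
```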

</details>

<details>
<summary><strong>Can I use this with my own Apify proxy?</strong></summary>

Yes. When running the Actor directly (not via the Standby API), you can configure Apify Proxy through the `proxyConfiguration` input field. This helps with sites that block datacenter IPs.

</details>

<details>
<summary><strong>What happens if the target URL is down?</strong></summary>

The API returns a 502 status with `{ "url": "...", "error": "Failed to fetch: ...", "statusCode": 502 }`. You are not charged for failed fetches.
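
A minimal Python sketch of handling that failure mode on the client, assuming the error body shape quoted above; `parse_analyze_response` is a hypothetical helper.

```python
def parse_analyze_response(status_code: int, body: dict) -> dict:
    """Return a successful analysis body, or raise with the API's error message."""
    if status_code == 200 and "error" not in body:
        return body
    # Failure shape per this FAQ: {"url": ..., "error": "Failed to fetch: ...", "statusCode": 502}
    url = body.get("url", "<unknown url>")
    message = body.get("error", "no error message")
    raise RuntimeError(f"WebSight could not analyze {url} (HTTP {status_code}): {message}")
```

With `requests`, call it as `parse_analyze_response(resp.status_code, resp.json())`.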

</details>

***

### Links

- **Apify Store**: [george.the.developer/websight-api](https://apify.com/george.the.developer/websight-api)
- **API Endpoint**: `https://george-the-developer--websight-api.apify.actor`
- **Author**: [george.the.developer](https://apify.com/george.the.developer) on Apify

***

<p align="center">
  Built by <a href="https://apify.com/george.the.developer">george.the.developer</a>
  <br/>
  <sub>WebSight API — See the web the way AI sees it.</sub>
</p>

# Actor input schema

## `url` (type: `string`):

The URL to extract intelligence from.

## `urls` (type: `array`):

Multiple URLs to analyze (max 20). Use this OR the single url field.

## `modules` (type: `array`):

Which intelligence modules to include. Default: all. Options: content, tech, seo, contacts, structured, ai\_score, domain, screenshot

## `contentFormat` (type: `string`):

Format for extracted content: 'markdown' (token-optimized), 'text' (plain), 'html' (cleaned HTML)

## `maxContentLength` (type: `integer`):

Maximum characters of content to return. Useful for token budget control.

## `proxyConfiguration` (type: `object`):

Apify proxy configuration for accessing blocked sites.

## Actor input object example

```json
{
  "url": "https://openai.com",
  "modules": [
    "content",
    "tech",
    "seo",
    "contacts",
    "structured",
    "ai_score",
    "domain"
  ],
  "contentFormat": "markdown",
  "maxContentLength": 10000
}
```
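
Since `url` and `urls` are alternatives, a small input builder can catch mistakes before a run starts. This Python sketch assumes they are mutually exclusive, as the `urls` description states, and applies the 20-URL cap and the defaults shown in the example above; `build_actor_input` is a hypothetical helper, not part of `apify-client`.

```python
MAX_URLS = 20
MODULE_KEYS = ("content", "tech", "seo", "contacts", "structured", "ai_score", "domain")


def build_actor_input(url=None, urls=None, modules=MODULE_KEYS,
                      content_format="markdown", max_content_length=10_000) -> dict:
    """Return a WebSight Actor input dict with exactly one of `url` or `urls`."""
    if (url is None) == (urls is None):
        raise ValueError("pass exactly one of `url` or `urls`")
    if urls is not None and not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"`urls` must contain between 1 and {MAX_URLS} items")
    if content_format not in ("markdown", "text", "html"):
        raise ValueError("contentFormat must be 'markdown', 'text', or 'html'")
    run_input = {"url": url} if url is not None else {"urls": list(urls)}
    if modules:
        run_input["modules"] = list(modules)
    run_input["contentFormat"] = content_format
    run_input["maxContentLength"] = max_content_length
    return run_input


if __name__ == "__main__":
    print(build_actor_input(url="https://openai.com"))
```

Pass the result to the client, for example `client.actor("george.the.developer/websight-api").call(run_input=build_actor_input(url="https://openai.com"))`.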

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://openai.com",
    "modules": [
        "content",
        "tech",
        "seo",
        "contacts",
        "structured",
        "ai_score",
        "domain"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("george.the.developer/websight-api").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://openai.com",
    "modules": [
        "content",
        "tech",
        "seo",
        "contacts",
        "structured",
        "ai_score",
        "domain",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("george.the.developer/websight-api").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://openai.com",
  "modules": [
    "content",
    "tech",
    "seo",
    "contacts",
    "structured",
    "ai_score",
    "domain"
  ]
}' |
apify call george.the.developer/websight-api --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=george.the.developer/websight-api",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "WebSight API — AI-Native Web Intelligence",
        "description": "One API call for 7 types of web intelligence from any URL: clean markdown, tech stack, SEO audit, contacts, structured data, AI score, domain intel. Token-optimized for AI agents. Cached.",
        "version": "1.0",
        "x-build-id": "OoYlnTCzMEvo8NxXg"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/george.the.developer~websight-api/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-george.the.developer-websight-api",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/george.the.developer~websight-api/runs": {
            "post": {
                "operationId": "runs-sync-george.the.developer-websight-api",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/george.the.developer~websight-api/run-sync": {
            "post": {
                "operationId": "run-sync-george.the.developer-websight-api",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "url": {
                        "title": "URL to Analyze",
                        "type": "string",
                        "description": "The URL to extract intelligence from."
                    },
                    "urls": {
                        "title": "Bulk URLs",
                        "maxItems": 20,
                        "type": "array",
                        "description": "Multiple URLs to analyze (max 20). Use this OR the single url field.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "modules": {
                        "title": "Modules to Run",
                        "type": "array",
                        "description": "Which intelligence modules to include. Default: all. Options: content, tech, seo, contacts, structured, ai_score, domain, screenshot",
                        "items": {
                            "type": "string"
                        }
                    },
                    "contentFormat": {
                        "title": "Content Format",
                        "enum": [
                            "markdown",
                            "text",
                            "html"
                        ],
                        "type": "string",
                        "description": "Format for extracted content: 'markdown' (token-optimized), 'text' (plain), 'html' (cleaned HTML)",
                        "default": "markdown"
                    },
                    "maxContentLength": {
                        "title": "Max Content Length (chars)",
                        "minimum": 500,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Maximum characters of content to return. Useful for token budget control.",
                        "default": 10000
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Apify proxy configuration for accessing blocked sites."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
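
The `inputSchema` above leaves some rules implicit. For example, `url` and `urls` are described as alternatives, but the schema does not enforce that. The following minimal Python sketch builds a request body on the client and rejects values the schema describes as out of range. The helper name `build_websight_input` is hypothetical. Treating `url` and `urls` as mutually exclusive, and requiring a non-empty `urls` list, are assumptions drawn from the field descriptions rather than constraints the schema declares.

```python
from typing import Any, Optional, Sequence

# Values taken from the inputSchema descriptions and constraints.
ALLOWED_MODULES = {
    "content", "tech", "seo", "contacts",
    "structured", "ai_score", "domain", "screenshot",
}
CONTENT_FORMATS = {"markdown", "text", "html"}
MAX_BULK_URLS = 20  # "maxItems": 20 on "urls"
MIN_CONTENT_LEN, MAX_CONTENT_LEN = 500, 100_000


def build_websight_input(
    url: Optional[str] = None,
    urls: Optional[Sequence[str]] = None,
    modules: Optional[Sequence[str]] = None,
    content_format: str = "markdown",
    max_content_length: int = 10_000,
) -> dict[str, Any]:
    """Build a request body matching inputSchema, failing fast on bad values.

    Assumption (hypothetical helper): exactly one of `url` / `urls` must be
    set, based on the "Use this OR the single url field" description.
    """
    if (url is None) == (urls is None):
        raise ValueError("set exactly one of 'url' or 'urls'")
    # Assumption: an empty bulk list is treated as a caller error.
    if urls is not None and not 1 <= len(urls) <= MAX_BULK_URLS:
        raise ValueError(f"'urls' must contain 1-{MAX_BULK_URLS} items")
    if content_format not in CONTENT_FORMATS:
        raise ValueError(f"unsupported contentFormat: {content_format!r}")
    if not MIN_CONTENT_LEN <= max_content_length <= MAX_CONTENT_LEN:
        raise ValueError("maxContentLength must be between 500 and 100000")

    body: dict[str, Any] = {
        "contentFormat": content_format,
        "maxContentLength": max_content_length,
    }
    if url is not None:
        body["url"] = url
    if urls is not None:
        body["urls"] = list(urls)
    if modules is not None:
        unknown = set(modules) - ALLOWED_MODULES
        if unknown:
            raise ValueError(f"unknown modules: {sorted(unknown)}")
        body["modules"] = list(modules)
    # When modules is None the key is omitted, leaving the documented default (all).
    return body
```

Leaving `modules` unset omits the key entirely, so the Actor should fall back to its documented default of running every module.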

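The `/runs` endpoint starts the Actor and returns immediately with a run object shaped like `runsResponseSchema`. The `/run-sync` endpoint instead blocks until the run finishes. The sketch below uses only Python's standard library to call these endpoints and extract the commonly needed fields from the `/runs` response.

It makes several assumptions:

- The API base URL is the public Apify one, `https://api.apify.com/v2`. The spec's `servers` entry is not part of this excerpt.
- The token is passed as the required `token` query parameter.
- All helper names are hypothetical.

The official `apify-client` package wraps the same endpoints.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Assumption: public Apify API base; the path segments below come from the spec.
API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "george.the.developer~websight-api"


def endpoint_url(path: str, token: str) -> str:
    """Build e.g. .../acts/<actor>/runs?token=... for a path from the spec."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/acts/{ACTOR_ID}/{path}?{query}"


def build_request(path: str, token: str, body: dict[str, Any]) -> urllib.request.Request:
    """POST request carrying an inputSchema-shaped JSON body."""
    return urllib.request.Request(
        endpoint_url(path, token),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_run(response: dict[str, Any]) -> dict[str, Any]:
    """Pick fields from a runsResponseSchema body (run metadata lives under "data")."""
    run = response["data"]
    return {
        "run_id": run["id"],
        "status": run["status"],
        "dataset_id": run.get("defaultDatasetId"),
        "kv_store_id": run.get("defaultKeyValueStoreId"),
        "usage_usd": run.get("usageTotalUsd"),
    }


def start_run(token: str, body: dict[str, Any], timeout: float = 30.0) -> dict[str, Any]:
    """Start a run via /runs; returns as soon as the run is created, not when it ends."""
    with urllib.request.urlopen(build_request("runs", token, body), timeout=timeout) as resp:
        return summarize_run(json.load(resp))
```

For example, `start_run(token, {"url": "https://example.com"})` returns the run ID, its status, and the IDs of the default dataset and key-value store. You can use these IDs to fetch results once the run has finished.

Passing `"run-sync"` to `build_request` targets the blocking endpoint instead. This spec does not define a response schema for that endpoint, so parse its body accordingly.

Because the token travels in the query string, avoid logging full request URLs.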