# Html Lang Validator (`zerobreak/html-lang-validator`) Actor

HTML lang validator that checks any webpage for missing or invalid lang attributes, so developers and SEO teams can fix language tag errors across large sites without clicking through pages one by one.

- **URL**: https://apify.com/zerobreak/html-lang-validator.md
- **Developed by:** [ZeroBreak](https://apify.com/zerobreak) (community)
- **Categories:** Developer tools, SEO tools
- **Stats:** 3 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$4.99/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for Apify platform usage, which gets cheaper on higher Apify subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## HTML Lang Validator: Check and Fix Missing Lang Attributes on Any Website

HTML lang validator checks every URL you give it for a missing or invalid `lang` attribute on the `<html>` tag. The `lang` attribute tells browsers and screen readers what language a page is written in. Leave it out and they have to guess, which can cost you in accessibility audits and international search visibility. Most site audits catch broken links and slow pages but skip the lang check entirely.

Point the Actor at one URL or feed it a list of hundreds. It fetches each page, reads the `lang` attribute, and validates the value against BCP 47 rules. Results land in a dataset with the exact lang value found, HTTP status, page title, and a plain list of any issues detected.

### Use cases

- **SEO auditing**: find pages with missing lang attributes before they hurt international SEO performance
- **Accessibility compliance**: flag lang attribute errors that break screen reader language detection
- **Site migrations**: validate lang attributes across every page after a CMS or template change
- **Multilingual site QA**: confirm each language variant has the correct lang value set
- **Bulk content audits**: check hundreds of URLs without opening each page by hand

### Input

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string | - | Single URL to validate |
| `urls` | array | - | List of URLs to validate, one per line |
| `maxUrls` | integer | 100 | Maximum URLs to process per run (max: 1000) |
| `requestTimeoutSecs` | integer | 30 | Per-request timeout in seconds |
| `proxyConfiguration` | object | Datacenter (Anywhere) | Proxy type and location for requests. Supports Datacenter, Residential, Special, and custom proxies. Optional. |

#### Example input

```json
{
    "urls": [
        "https://apify.com",
        "https://apify.com/store",
        "https://apify.com/about"
    ],
    "maxUrls": 100,
    "requestTimeoutSecs": 30,
    "proxyConfiguration": { "useApifyProxy": true }
}
````
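
The limits in the parameter table can be checked before a run is started. The sketch below is illustrative only: `build_input` is a hypothetical helper (not part of the Actor or the Apify client), and the bounds it enforces (`maxUrls` from 1 to 1000, `requestTimeoutSecs` from 5 to 120) are taken from the OpenAPI schema later in this document.

```python
from typing import Any


def build_input(
    urls: list[str],
    max_urls: int = 100,
    request_timeout_secs: int = 30,
    use_apify_proxy: bool = True,
) -> dict[str, Any]:
    """Build an input object, rejecting values outside the documented bounds.

    Hypothetical helper: the Actor itself only needs the resulting JSON object.
    """
    if not 1 <= max_urls <= 1000:
        raise ValueError("maxUrls must be between 1 and 1000")
    if not 5 <= request_timeout_secs <= 120:
        raise ValueError("requestTimeoutSecs must be between 5 and 120")
    return {
        "urls": list(urls),
        "maxUrls": max_urls,
        "requestTimeoutSecs": request_timeout_secs,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


example_input = build_input(["https://apify.com", "https://apify.com/store"])
```

The returned dictionary has the same shape as the example input above, so it can be passed as `run_input` in the Python client example in the API section.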

### What data does this Actor collect?

Each result in the dataset contains:

```json
{
    "url": "https://apify.com",
    "httpStatus": 200,
    "pageTitle": "Apify: Full-stack web scraping and data extraction platform",
    "langValue": "en",
    "xmlLangValue": null,
    "isLangPresent": true,
    "isLangValid": true,
    "issues": [],
    "hasIssues": false,
    "error": null,
    "checkedAt": "2025-03-04T10:23:41.123456+00:00"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Final URL after any redirects |
| `httpStatus` | integer | HTTP response status code |
| `pageTitle` | string | Page title from the `<title>` tag |
| `langValue` | string | Value of the `lang` attribute. Null if missing. |
| `xmlLangValue` | string | Value of `xml:lang` (used in XHTML documents). Null if not present. |
| `isLangPresent` | boolean | True if a `lang` attribute exists on the `<html>` tag |
| `isLangValid` | boolean | True if the `lang` value passes BCP 47 validation |
| `issues` | array | List of validation issues found for this URL |
| `hasIssues` | boolean | True if any issues were detected |
| `error` | string | Error message if the page could not be fetched. Null otherwise. |
| `checkedAt` | string | ISO 8601 timestamp of when the check ran |
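
Once the dataset items are downloaded, the fields above are enough to triage a run. The following is a minimal sketch, assuming items shaped like the example result; `summarize` is a hypothetical helper, and the issue text and `error` text in the sample data are invented for illustration because the Actor's exact wording is not documented here.

```python
from collections import Counter
from typing import Any


def summarize(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Split results into fetch failures and pages with lang issues, and count issues."""
    fetch_failed = [item["url"] for item in items if item.get("error")]
    with_issues = [
        item["url"]
        for item in items
        if not item.get("error") and item.get("hasIssues")
    ]
    issue_counts = Counter(
        issue for item in items for issue in item.get("issues", [])
    )
    return {
        "total": len(items),
        "fetchFailed": fetch_failed,
        "withIssues": with_issues,
        "issueCounts": dict(issue_counts),
    }


# Sample data: the issue and error strings are made up for this example.
sample_items = [
    {"url": "https://example.com/", "hasIssues": False, "issues": [], "error": None},
    {"url": "https://example.com/fr", "hasIssues": True,
     "issues": ["lang attribute is missing"], "error": None},
    {"url": "https://example.com/gone", "hasIssues": False, "issues": [],
     "error": "HTTP 404"},
]
summary = summarize(sample_items)
```

Keeping fetch failures separate from lang issues avoids reporting an unreachable page as a markup problem.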

### How it works

1. Collects URLs from `url` and `urls` inputs, deduplicates them, and caps at `maxUrls`
2. Fetches each page with an async HTTP client using a realistic browser user-agent
3. Parses the HTML with BeautifulSoup and reads the `lang` attribute from the `<html>` tag
4. Validates the value against BCP 47 language tag rules
5. Flags issues: missing attribute, empty value, or invalid format
6. Pushes results to the dataset in real time as each URL finishes
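
Steps 1 and 3 can be approximated in a few lines. This is a sketch under stated assumptions, not the Actor's source: it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, and `collect_urls`, `HtmlLangParser`, and `read_lang` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


def collect_urls(
    url: Optional[str], urls: Optional[list[str]], max_urls: int = 100
) -> list[str]:
    """Merge `url` and `urls`, drop duplicates (first-seen order), cap at max_urls."""
    candidates = ([url] if url else []) + list(urls or [])
    cleaned = (u.strip() for u in candidates if u and u.strip())
    unique = list(dict.fromkeys(cleaned))  # dict keys preserve insertion order
    return unique[:max_urls]


class HtmlLangParser(HTMLParser):
    """Records `lang` and `xml:lang` from the first <html> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.seen_html = False
        self.lang: Optional[str] = None
        self.xml_lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            attr_map = dict(attrs)
            self.lang = attr_map.get("lang")
            self.xml_lang = attr_map.get("xml:lang")


def read_lang(html: str) -> tuple[Optional[str], Optional[str]]:
    """Return (lang, xml:lang) from the <html> tag; None where an attribute is absent."""
    parser = HtmlLangParser()
    parser.feed(html)
    return parser.lang, parser.xml_lang


queue = collect_urls(
    "https://apify.com", ["https://apify.com", "https://apify.com/store"]
)
lang, xml_lang = read_lang('<!DOCTYPE html><html lang="en-US"><head></head></html>')
```

Here `queue` holds two URLs because the duplicate is dropped, and `lang` is `"en-US"` while `xml_lang` is `None`.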

### Integrations

Connect HTML Lang Validator with other apps using [Apify integrations](https://apify.com/integrations). You can pipe results into Google Sheets, trigger Slack alerts via Make or Zapier, or connect to Airbyte for data warehouse ingestion. Use [webhooks](https://docs.apify.com/integrations/webhooks) to trigger downstream actions as soon as a run completes.

### FAQ

**What counts as a valid lang attribute?**
The Actor validates against BCP 47 language tags. Values like `en`, `en-US`, `fr`, `zh-Hant`, and `pt-BR` all pass. Values like `english`, `EN_US`, or an empty string fail.
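
To get a feel for which values pass, here is a simplified pattern that covers the common `language[-script][-region][-variant]` shape. It is an approximation written for this example, not the Actor's validator: it ignores extension and private-use subtags, extended language subtags, and grandfathered tags, and it does not check subtags against the IANA registry. The name `looks_like_bcp47` is hypothetical.

```python
import re

# Simplified shape check: language[-script][-region][-variant...]
_LANG_TAG = re.compile(
    r"""
    [A-Za-z]{2,3}                                   # primary language, e.g. en
    (?:-[A-Za-z]{4})?                               # optional script, e.g. Hant
    (?:-(?:[A-Za-z]{2}|[0-9]{3}))?                  # optional region, e.g. US or 419
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*  # optional variants, e.g. 1996
    """,
    re.VERBOSE,
)


def looks_like_bcp47(value: str) -> bool:
    """Return True if the whole value matches the simplified tag shape."""
    return _LANG_TAG.fullmatch(value) is not None


passing = [t for t in ["en", "en-US", "fr", "zh-Hant", "pt-BR"] if looks_like_bcp47(t)]
failing = [t for t in ["english", "EN_US", ""] if not looks_like_bcp47(t)]
```

With this pattern, all five values in `passing` match and all three values in `failing` are rejected, which lines up with the examples in the answer above.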

**What happens if a page returns a 404 or 5xx error?**
The Actor records the HTTP status and writes an error message to the result. It keeps processing the rest of the list rather than stopping.

**Can it handle XHTML pages that use xml:lang instead of lang?**
Yes. The Actor reads both `lang` and `xml:lang`. If only `xml:lang` is found, it flags the missing `lang` attribute and notes the `xml:lang` value in the result.
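
The decision described here could look roughly like the sketch below. Everything in it is an assumption made for illustration: `find_issues` is a hypothetical function, the message wording is invented (the Actor's real issue strings are not documented here), and the format check is passed in as a callable so that any validator can be plugged in.

```python
from typing import Callable, Optional


def find_issues(
    lang_value: Optional[str],
    xml_lang_value: Optional[str],
    is_valid_tag: Callable[[str], bool],
) -> list[str]:
    """Return human-readable issues for one page (message wording is illustrative)."""
    if lang_value is None:
        if xml_lang_value is not None:
            return [f"lang attribute is missing (xml:lang is '{xml_lang_value}')"]
        return ["lang attribute is missing"]
    if lang_value.strip() == "":
        return ["lang attribute is empty"]
    if not is_valid_tag(lang_value):
        return [f"lang value '{lang_value}' is not a valid language tag"]
    return []


def two_or_three_letters(value: str) -> bool:
    """Stand-in validator for the demo; a real check would apply BCP 47 rules."""
    return value.isascii() and value.isalpha() and 2 <= len(value) <= 3


xhtml_only = find_issues(None, "fr", two_or_three_letters)
clean_page = find_issues("en", None, two_or_three_letters)
```

In this sketch a page with only `xml:lang="fr"` yields one issue that mentions the `xml:lang` value, and a page with `lang="en"` yields an empty list.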

**How many URLs can it process in one run?**
Up to 1,000 per run. Use `maxUrls` to keep runs smaller during testing.
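
For lists longer than the per-run limit, one option is to split the list and start one run per batch. A minimal sketch, assuming you launch each batch yourself (for example with the Python client shown in the API section); `batch_urls` and `MAX_URLS_PER_RUN` are hypothetical names, and the `example.com` URLs are placeholders.

```python
from typing import Iterator

MAX_URLS_PER_RUN = 1000  # per-run ceiling documented for `maxUrls`


def batch_urls(
    urls: list[str], batch_size: int = MAX_URLS_PER_RUN
) -> Iterator[list[str]]:
    """Yield de-duplicated URLs in batches that fit into a single run."""
    if not 1 <= batch_size <= MAX_URLS_PER_RUN:
        raise ValueError(f"batch_size must be between 1 and {MAX_URLS_PER_RUN}")
    unique = list(dict.fromkeys(urls))  # keep first-seen order
    for start in range(0, len(unique), batch_size):
        yield unique[start:start + batch_size]


all_urls = [f"https://example.com/page-{i}" for i in range(2500)]
run_inputs = [
    {"urls": batch, "maxUrls": len(batch)} for batch in batch_urls(all_urls)
]
```

For 2,500 unique URLs this produces three input objects with 1,000, 1,000, and 500 URLs.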

**Does it follow redirects?**
Yes. The Actor follows HTTP redirects automatically and reports the final URL after all hops.

**Can I run this on a schedule?**
Yes. Set up a scheduled run in Apify Console to monitor your site's lang attribute compliance over time and catch regressions after deployments.
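
Catching regressions comes down to comparing the datasets of two runs. The sketch below assumes you have already downloaded the items of the previous and the current run; `find_regressions` is a hypothetical helper, the sample data is invented, and matching is done on the `url` field as reported in the results.

```python
from typing import Any


def find_regressions(
    previous: list[dict[str, Any]], current: list[dict[str, Any]]
) -> list[str]:
    """Return URLs that were clean in the previous run but report issues now."""
    previously_clean = {
        item["url"] for item in previous if not item.get("hasIssues")
    }
    return sorted(
        item["url"]
        for item in current
        if item.get("hasIssues") and item["url"] in previously_clean
    )


# Invented sample data for two consecutive runs.
last_week = [
    {"url": "https://example.com/", "hasIssues": False},
    {"url": "https://example.com/de", "hasIssues": True},
]
today = [
    {"url": "https://example.com/", "hasIssues": True},
    {"url": "https://example.com/de", "hasIssues": True},
]
regressions = find_regressions(last_week, today)
```

Only `https://example.com/` is reported, because `https://example.com/de` already had issues in the earlier run.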

***

Run the HTML lang validator on your site and get a full report on missing or invalid lang attributes. Export results as JSON or CSV, or push them directly to Google Sheets for review.

# Actor input Schema

## `url` (type: `string`):

A single URL to validate. The Actor checks the HTML lang attribute on this page.

## `urls` (type: `array`):

A list of URLs to validate. One URL per line. Use this to check multiple pages in a single run.

## `maxUrls` (type: `integer`):

Maximum number of URLs to process per run. Useful for capping costs on large lists. Maximum allowed: 1000.

## `requestTimeoutSecs` (type: `integer`):

How long to wait for each page to respond before giving up. Increase for slow sites.

## `proxyConfiguration` (type: `object`):

Select proxies to use for requests. Helps avoid IP blocking and rate limits. Datacenter proxies are fastest; Residential proxies are harder to detect.

## Actor input object example

```json
{
  "url": "https://apify.com",
  "urls": [
    "https://apify.com",
    "https://apify.com/store"
  ],
  "maxUrls": 100,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://apify.com",
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("zerobreak/html-lang-validator").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://apify.com",
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("zerobreak/html-lang-validator").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://apify.com",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call zerobreak/html-lang-validator --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=zerobreak/html-lang-validator",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Html Lang Validator",
        "description": "HTML lang validator that checks any webpage for missing or invalid lang attributes, so developers and SEO teams can fix language tag errors across large sites without clicking through pages one by one.",
        "version": "0.0",
        "x-build-id": "84p9GFgdOhefRbBKo"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/zerobreak~html-lang-validator/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-zerobreak-html-lang-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/zerobreak~html-lang-validator/runs": {
            "post": {
                "operationId": "runs-sync-zerobreak-html-lang-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/zerobreak~html-lang-validator/run-sync": {
            "post": {
                "operationId": "run-sync-zerobreak-html-lang-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "url": {
                        "title": "URL",
                        "type": "string",
                        "description": "A single URL to validate. The actor checks the HTML lang attribute on this page."
                    },
                    "urls": {
                        "title": "URLs",
                        "type": "array",
                        "description": "A list of URLs to validate. One URL per line. Use this to check multiple pages in a single run.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxUrls": {
                        "title": "Max URLs",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of URLs to process per run. Useful for capping costs on large lists. Maximum allowed: 1000.",
                        "default": 100
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout (seconds)",
                        "minimum": 5,
                        "maximum": 120,
                        "type": "integer",
                        "description": "How long to wait for each page to respond before giving up. Increase for slow sites.",
                        "default": 30
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies to use for requests. Helps avoid IP blocking and rate limits. Datacenter proxies are fastest; Residential proxies are harder to detect."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
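
If you prefer to call the REST API without a client library, the specification above is enough to assemble the request by hand. This is a sketch using only the Python standard library; `build_run_request` is a hypothetical helper. It targets the `run-sync-get-dataset-items` path and passes the token as the `token` query parameter, both as listed in the specification. Sending the request is left commented out because it needs network access and a valid token.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/zerobreak~html-lang-validator/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Assemble a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("<YOUR_API_TOKEN>", {"url": "https://apify.com"})

# To send it (requires network access and a real token):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The synchronous endpoint waits for the run to finish before it responds, so for long URL lists the asynchronous `/runs` endpoint from the same specification may be the better fit.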
