Website URL Extractor - Get All Site URLs
Pricing: from $0.20 / 1,000 results
Extract every URL from any website automatically — no code needed. This URL extractor crawls pages and parses XML sitemaps into one structured list with metadata (lastmod, priority, changefreq). Filter by keyword, cap results, and export to JSON, CSV, or Excel. Built for SEO audits & migrations.
Rating: 5.0 (1 rating)
Developer: Lofomachines (maintained by Community)
Actor stats: 5 bookmarked, 149 total users, 14 monthly active users, last modified 20 hours ago
Website URL Extractor - Get All URLs from Any Website
Website URL Extractor automatically extracts ALL URLs from any website — combining XML sitemap parsing with recursive crawling to return a complete, structured list of every page URL. Just add one or more start URLs and get back a clean URL list in minutes. Perfect for SEO audits, site migrations, content inventories, and feeding data pipelines.
👉 To try it, paste a website URL into startUrls and hit Start. That's it.
What does Website URL Extractor do?
This URL extractor discovers and collects every reachable URL on a website. It reads the site's XML sitemap when available and also crawls pages recursively, so you capture URLs that aren't listed in the sitemap. For each URL it returns any available metadata — last modified date (lastmod), priority, and change frequency (changefreq) — and lets you filter results by keyword. No coding required.
It works on any domain, handles large sites (50,000+ URLs), and runs on the Apify platform, so you also get scheduling, API access, integrations, proxy rotation, and monitoring out of the box.
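To make the sitemap side concrete, here is a minimal Python sketch (not the Actor's source) that turns a standard sitemaps.org <urlset> document into records with the same field names as this Actor's output. The function name parse_sitemap is hypothetical, and the sketch assumes an uncompressed sitemap; sitemap index files and .xml.gz sitemaps would need extra handling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list:
    """Turn a <urlset> sitemap into dicts shaped like the Actor's output items."""
    root = ET.fromstring(xml_text)
    records = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        record = {"url": loc.strip()}
        # Optional tags: copy them only when the sitemap actually provides them.
        for field in ("lastmod", "priority", "changefreq"):
            value = url_el.findtext(f"sm:{field}", namespaces=SITEMAP_NS)
            if value is not None:
                record[field] = value.strip()
        records.append(record)
    return records


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/blog/post-1</loc><lastmod>2026-05-14</lastmod></url>
      <url><loc>https://example.com/about</loc></url>
    </urlset>"""
    for record in parse_sitemap(sample):
        print(record)
```

Optional tags are copied only when present, which matches the output table's note that lastmod is available only when the sitemap provides it.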
Why use this URL extractor?
- 🔍 Complete URL discovery: sitemap parsing plus recursive crawling catches pages that appear in only one of the two, such as linked pages missing from the sitemap.
- 💨 Fast and scalable: extract URLs from whole sites with 50,000+ pages.
- 📦 Bulk multi-domain: pass several start URLs to extract URLs from multiple websites in one run.
- 🏷️ Rich metadata: capture lastmod, priority, and changefreq to track recently updated content.
- 🔎 Keyword URL filtering: return only URLs containing your keywords (e.g. /blog/, /product/).
- 🎯 Result control: use maxResults (with returnAll: false) to sample, or returnAll: true for a full export.
- 🛡️ Reliable: automatic retries, proxy support, and loop prevention.
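Loop prevention in a recursive crawler usually comes down to normalizing each discovered link and remembering what has already been queued. The sketch below shows that idea as a breadth-first crawl; it illustrates the general technique under stated assumptions and is not this Actor's implementation. get_links is a hypothetical callback standing in for "fetch the page and extract its href values", and the crawl is assumed to stay on the start URL's exact host.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url: str, get_links: Callable[[str], Iterable[str]],
          max_pages: int = 50_000) -> list:
    """Breadth-first crawl that never queues the same URL twice."""
    host = urlparse(start_url).netloc
    start, _ = urldefrag(start_url)
    seen = {start}          # every URL ever queued: this set is the loop guard
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments so /a and /a#top count once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host or absolute in seen:
                continue  # other host, or already queued
            seen.add(absolute)
            queue.append(absolute)
    return order
```

The exact-host check treats example.com and www.example.com as different sites; a real crawler needs an explicit policy for that, plus a cap like max_pages as a second safety net.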
What data can Website URL Extractor extract?
| Field | Description |
|---|---|
| url | The extracted page URL |
| lastmod | Last modified date (when available in the sitemap) |
| priority | Sitemap priority value of the URL |
| changefreq | How often the page is expected to change |
How do I extract all URLs from a website?
- Open the Actor and paste one or more website URLs into the startUrls field.
- (Optional) Set keywords to keep only matching URLs, e.g. ["blog"].
- (Optional) Keep returnAll: true (the default) for the full list, or set returnAll: false and use maxResults to cap output.
- Click Start and wait for the run to finish.
- Download your URL list as JSON, CSV, XML, or Excel from the dataset.
How much does it cost to extract URLs from a website?
This URL extractor is intentionally lightweight — it parses URL structures without rendering JavaScript unless strictly necessary, keeping platform usage low.
| Site size | Estimated cost |
|---|---|
| Small (< 1,000 URLs) | A few cents per run |
| Medium (~10,000 URLs) | Typically under $1.00 |
| Large (50,000+ URLs) | Scales efficiently; depends on site complexity |
💡 Tip: Keep Apify Proxy enabled to avoid IP blocking on larger sites.
Input
Website URL Extractor accepts a JSON input. Only startUrls is required — click the Input tab for all options.
Minimal input
{"startUrls": [{ "url": "https://example.com" }]}
Full input example
{"startUrls": [{ "url": "https://apify.com" },{ "url": "https://crawlee.dev" }],"proxyConfiguration": { "useApifyProxy": true },"returnAll": true,"maxResults": 1000,"keywords": ["blog", "article"]}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | ✅ Yes | [{ "url": "https://apify.com" }] | One or more websites to extract URLs from |
| proxyConfiguration | Object | ❌ No | { "useApifyProxy": false } | Proxy settings for reliable, unblocked access |
| returnAll | Boolean | ❌ No | true | Extract all URLs and ignore maxResults |
| maxResults | Integer | ❌ No | 1000 | Max URLs to return. Ignored when returnAll is true |
| keywords | Array | ❌ No | [] | Case-insensitive filter; returns only URLs containing all listed keywords |
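The rules in this table interact: keywords narrows the list, and maxResults only matters when returnAll is false. Here is a minimal sketch of that post-processing, assuming the keyword filter runs before the cap (the table does not say which comes first). DEFAULTS and apply_options are hypothetical names, and proxyConfiguration appears only so the defaults mirror the table.

```python
# Defaults copied from the parameter table (startUrls has no default: it is required).
DEFAULTS = {
    "proxyConfiguration": {"useApifyProxy": False},
    "returnAll": True,
    "maxResults": 1000,
    "keywords": [],
}


def apply_options(urls: list, run_input: dict) -> list:
    """Apply keywords, returnAll and maxResults as the table describes them."""
    if not run_input.get("startUrls"):
        raise ValueError("startUrls is required")
    options = {**DEFAULTS, **run_input}
    wanted = [k.lower() for k in options["keywords"]]
    # Case-insensitive, and a URL must contain every listed keyword to be kept.
    kept = [u for u in urls if all(k in u.lower() for k in wanted)]
    if options["returnAll"]:
        return kept  # maxResults is ignored when returnAll is true
    return kept[: options["maxResults"]]
```

With the full input example above (returnAll: true, keywords ["blog", "article"]), maxResults: 1000 has no effect and only URLs containing both words survive.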
Output
Results are stored in the Apify dataset and can be downloaded as JSON, CSV, XML, or Excel, or pulled via the Apify API.
[{"url": "https://example.com/blog/post-1","lastmod": "2026-05-14","priority": "0.8","changefreq": "weekly"},{"url": "https://example.com/products/item-42","lastmod": "2026-04-02","priority": "0.6","changefreq": "monthly"}]
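The dataset export already offers CSV, so this only matters if you pull the JSON items through the API and want a flat file locally. Because optional fields are absent when a sitemap omits them, a converter should fill the gaps rather than fail. A sketch using Python's csv module; items_to_csv is a hypothetical helper.

```python
import csv
import io
import json

FIELDS = ["url", "lastmod", "priority", "changefreq"]


def items_to_csv(items_json: str) -> str:
    """Flatten the dataset's JSON items into CSV text with a fixed column order."""
    items = json.loads(items_json)
    buffer = io.StringIO()
    # restval="" fills columns a record lacks (e.g. no lastmod in the sitemap);
    # extrasaction="ignore" skips any keys beyond the four documented fields.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()
```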
Use cases
- SEO audit: extract all URLs to map site architecture and find orphan pages.
- Site migration: build a complete page inventory before a redesign or replatform.
- Content inventory: list every page to plan content audits and updates.
- Change monitoring: track lastmod dates to spot recently updated pages.
- Data pipelines: feed the URL list into downstream tools such as HTML scrapers or Google Sheets.
- Targeted scraping: use keyword filters to scope extraction to a section like /blog/.
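For change monitoring, a short filter over the dataset items is enough. This sketch assumes lastmod values follow the W3C date or datetime format used in sitemaps (e.g. 2026-05-14 or 2026-05-14T10:00:00+00:00) and compares only the date part; records with no lastmod, or with coarser values such as 2026-05, are skipped. recently_updated is a hypothetical name.

```python
from datetime import date


def recently_updated(items: list, since: date) -> list:
    """Return URLs whose sitemap lastmod falls on or after `since`."""
    fresh = []
    for item in items:
        lastmod = item.get("lastmod")
        if not lastmod:
            continue  # no date in the sitemap, nothing to compare
        try:
            # Bare date or full datetime: the first 10 characters are YYYY-MM-DD.
            changed = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # coarser formats like "2026-05" are skipped in this sketch
        if changed >= since:
            fresh.append(item["url"])
    return fresh
```

Run against the sample output above with since=date(2026, 5, 1), only https://example.com/blog/post-1 qualifies.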
Run Website URL Extractor with the Apify API or AI agents
You can run this URL extractor programmatically via the Apify API in any language, schedule recurring runs, or call it from an AI agent through the Apify MCP server — making it easy for assistants like Claude or ChatGPT to extract all URLs from a website on demand.
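As one example of an API call, Apify's run-sync-get-dataset-items endpoint starts a run and returns its dataset items in the response. The sketch below only builds the POST request with Python's standard library. The Actor ID is a placeholder (Apify IDs take the form username~actor-name), YOUR_APIFY_TOKEN stands for your API token, and build_run_request is a hypothetical helper.

```python
import json
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> Request:
    """Prepare (but do not send) a synchronous run that returns dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),  # the Actor input as JSON body
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


# To actually send it (network access and a valid token required), pass the
# request to urllib.request.urlopen and json.load the response body.
req = build_run_request("username~actor-name", "YOUR_APIFY_TOKEN",
                        {"startUrls": [{"url": "https://example.com"}]})
```

Synchronous endpoints are meant for runs that finish quickly; for large sites, starting a run asynchronously and reading its dataset afterwards is usually the safer pattern.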
Build a complete website intelligence pipeline
This URL extractor is step one. Once you have the full URL list, power up your analysis:
| Actor | What it adds |
|---|---|
| GEO Audit — AI Search Optimization Checker | Score each page's visibility in ChatGPT, Perplexity, and Gemini |
| Website Tech Profiler | Detect the tech stack behind any URL: CMS, frameworks, analytics, CDN |
| Organization Registered Domain & Subdomain Scraper | Discover all domains owned by an organization before extraction |
| Website API & Endpoint Analyzer | Uncover hidden API endpoints behind any extracted URL |
FAQ
How do I get a list of all pages on a website?
Add the website to startUrls and run the Actor with returnAll: true. It parses the sitemap and crawls the site to return every reachable page URL.
Can I extract URLs from an XML sitemap?
Yes. Website URL Extractor reads XML sitemaps automatically and pulls lastmod, priority, and changefreq where available, then supplements them with recursive crawling.
Is it legal to extract URLs from a website?
This Actor only collects publicly available page URLs and does not extract private user data. URLs are public information, but you remain responsible for complying with the target site's terms and applicable laws.
Why am I getting fewer URLs than expected?
The site's sitemap may be incomplete, some URLs may be blocked by robots.txt, or maxResults may be limiting output. Set returnAll: true for the full list. JavaScript-heavy single-page apps (SPAs) may also expose fewer URLs.
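To see whether robots.txt explains missing URLs, you can check the URLs you expected against the site's rules with Python's built-in urllib.robotparser. This is a diagnostic sketch: it assumes you have already downloaded robots.txt as text, and blocked_urls is a hypothetical helper, not part of the Actor.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list, user_agent: str = "*") -> list:
    """List the URLs that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network call
    return [u for u in urls if not parser.can_fetch(user_agent, u)]
```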
What if no URLs are returned?
Check that the startUrls are public and not behind a login, and enable proxyConfiguration with useApifyProxy: true if the site is blocking requests.
Support
Found a bug or need a feature? Open an issue on the Issues tab — feedback is welcome, and custom solutions based on this Actor are available on request.