Website URL Extractor - Get All Site URLs

Pricing: from $0.20 / 1,000 results

Extract every URL from any website automatically — no code needed. This URL extractor crawls pages and parses XML sitemaps into one structured list with metadata (lastmod, priority, changefreq). Filter by keyword, cap results, and export to JSON, CSV, or Excel. Built for SEO audits & migrations.

Rating: 5.0 (1 rating)

Developer: Lofomachines (maintained by Community)

Actor stats: 5 bookmarks, 149 total users, 14 monthly active users; last modified 20 hours ago

Website URL Extractor - Get All URLs from Any Website

Website URL Extractor automatically extracts the URLs of any website by combining XML sitemap parsing with recursive crawling, and returns a structured list of every page URL it can reach. Add one or more start URLs and get back a clean URL list in minutes. Perfect for SEO audits, site migrations, content inventories, and feeding data pipelines.

👉 To try it, paste a website URL into startUrls and hit Start. That's it.

What does Website URL Extractor do?

This URL extractor discovers and collects every reachable URL on a website. It reads the site's XML sitemap when available and also crawls pages recursively, so you capture URLs that aren't listed in the sitemap. For each URL it returns any available metadata — last modified date (lastmod), priority, and change frequency (changefreq) — and lets you filter results by keyword. No coding required.
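
To show what the sitemap half of the job involves, here is a minimal sketch that turns a standard sitemaps.org <urlset> document into records with the same four fields. It is not the Actor's actual code. It assumes the sitemap has already been downloaded as a string, and it does not handle sitemap index files or gzip-compressed sitemaps. The function name parse_sitemap and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[dict]:
    """Return one record per <url> entry, keeping only the fields that are present."""
    root = ET.fromstring(xml_text)
    records = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # <loc> is mandatory, so skip malformed entries
        record = {"url": loc}
        for field in ("lastmod", "priority", "changefreq"):
            value = url_el.findtext(f"sm:{field}", namespaces=SITEMAP_NS)
            if value is not None:  # optional tags: only copy them when present
                record[field] = value.strip()
        records.append(record)
    return records


# Hypothetical sample: one fully annotated entry and one bare entry.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/post-1</loc>
    <lastmod>2026-05-14</lastmod>
    <priority>0.8</priority>
    <changefreq>weekly</changefreq>
  </url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for rec in parse_sitemap(SAMPLE):
        print(rec)
```

Optional fields are omitted rather than set to null, which mirrors the "when available" wording for lastmod, priority, and changefreq.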

It works on any domain, handles large sites (50,000+ URLs), and runs on the Apify platform, so you also get scheduling, API access, integrations, proxy rotation, and monitoring out of the box.

Why use this URL extractor?

  • 🔍 Complete URL discovery: sitemap parsing finds pages that nothing links to (orphan pages), and recursive crawling finds linked pages that are missing from the sitemap.
  • 💨 Fast and scalable — extract URLs from whole sites with 50,000+ pages efficiently.
  • 📦 Bulk multi-domain — pass several start URLs to scrape URLs from multiple websites in one run.
  • 🏷️ Rich metadata — capture lastmod, priority, and changefreq to track recently updated content.
  • 🔎 Keyword URL filtering — return only URLs containing your keywords (e.g. /blog/, /product/).
  • 🎯 Result control — use maxResults to sample or returnAll for a full export.
  • 🛡️ Reliable — automatic retries, proxy support, and loop prevention.
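
The sketch below illustrates why the two discovery methods complement each other and how a visited set provides loop prevention. It is a hypothetical model, not the Actor's crawler: the in-memory LINKS dictionary stands in for fetching pages and extracting their links, paths stand in for full URLs, and URL normalization and domain scoping are left out.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl over an in-memory link graph.

    The `seen` set stops the crawl from revisiting pages, so cyclic links
    (for example a page linking back to the home page) cannot loop forever.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def classify(sitemap_urls: set[str], crawled_urls: set[str]) -> dict[str, set[str]]:
    """Compare what the sitemap lists with what the crawl reached."""
    return {
        "all": sitemap_urls | crawled_urls,
        "crawl_only": crawled_urls - sitemap_urls,  # linked, but missing from the sitemap
        "orphans": sitemap_urls - crawled_urls,     # in the sitemap, but nothing links to them
    }


# Hypothetical site: "/" and "/blog" link to each other (a cycle).
LINKS = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/"],
    "/blog/post-1": ["/blog"],
}
SITEMAP = {"/", "/blog", "/landing-2024"}

if __name__ == "__main__":
    crawled = crawl("/", LINKS)
    for name, urls in classify(SITEMAP, crawled).items():
        print(name, sorted(urls))
```

Here "/landing-2024" is only reachable through the sitemap, while "/about" and "/blog/post-1" are only reachable through crawling; the union of both sets is the complete list.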

What data can Website URL Extractor extract?

| Field | Description |
|---|---|
| url | The extracted page URL |
| lastmod | Last modified date (when available in the sitemap) |
| priority | Sitemap priority value of the URL (when available) |
| changefreq | How often the page is expected to change, per the sitemap (when available) |

How do I extract all URLs from a website?

  1. Open the Actor and paste one or more website URLs into the startUrls field.
  2. (Optional) Set keywords to keep only matching URLs, e.g. ["blog"].
  3. (Optional) Keep returnAll: true (the default) for the full list, or set returnAll: false and use maxResults to cap output.
  4. Click Start and wait for the run to finish.
  5. Download your URL list as JSON, CSV, XML, or Excel from the dataset.

How much does it cost to extract URLs from a website?

This URL extractor is intentionally lightweight — it parses URL structures without rendering JavaScript unless strictly necessary, keeping platform usage low.

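| Site size | Estimated cost |
|---|---|
| Small (< 1,000 URLs) | Under $0.20 in result fees; a few cents for a few hundred URLs |
| Medium (~10,000 URLs) | From about $2.00 (10,000 results at $0.20 per 1,000) |
| Large (50,000+ URLs) | From about $10.00 per 50,000 URLs; scales with the number of URLs returned |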
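
Because pricing is listed per result, the result fee of a run can be approximated as (number of URLs / 1,000) × price. The sketch below assumes every extracted URL counts as one dataset result and uses the listed starting price of $0.20 per 1,000 results; your actual charge depends on your Apify plan. The function name estimate_result_fee is hypothetical.

```python
from decimal import Decimal

# Assumption: the listed starting price, with one result per extracted URL.
PRICE_PER_1000 = Decimal("0.20")


def estimate_result_fee(url_count: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Rough result fee in USD for `url_count` results, rounded to cents."""
    return (Decimal(url_count) * price_per_1000 / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for n in (500, 10_000, 50_000):
        print(f"{n:>6} URLs -> ${estimate_result_fee(n)}")
```

Decimal is used instead of float so that cent amounts are not distorted by binary rounding.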

💡 Tip: Keep Apify Proxy enabled to avoid IP blocking on larger sites.

Input

Website URL Extractor accepts a JSON input. Only startUrls is required — click the Input tab for all options.

Minimal input

{
  "startUrls": [{ "url": "https://example.com" }]
}

Full input example

{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://crawlee.dev" }
  ],
  "proxyConfiguration": { "useApifyProxy": true },
  "returnAll": true,
  "maxResults": 1000,
  "keywords": ["blog", "article"]
}

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | ✅ Yes | [{ "url": "https://apify.com" }] | One or more websites to extract URLs from |
| proxyConfiguration | Object | ❌ No | { "useApifyProxy": false } | Proxy settings for reliable, unblocked access |
| returnAll | Boolean | ❌ No | true | Extract all URLs and ignore maxResults |
| maxResults | Integer | ❌ No | 1000 | Max URLs to return. Ignored when returnAll is true |
| keywords | Array | ❌ No | [] | Case-insensitive filter; returns only URLs containing all listed keywords |
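
The interaction between keywords, returnAll, and maxResults can be modeled in a few lines. This is a hypothetical re-implementation of the documented semantics for illustration only: matching is case-insensitive, a URL must contain all keywords, and maxResults applies only when returnAll is false. Applying the keyword filter before the cap is an assumption; the function name apply_output_options is hypothetical.

```python
def apply_output_options(urls, keywords=(), return_all=True, max_results=1000):
    """Filter and cap a URL list the way the parameter table describes.

    Assumption: keywords are applied first, then the maxResults cap.
    """
    needles = [k.lower() for k in keywords]
    # Case-insensitive substring match; every keyword must appear in the URL.
    matched = [u for u in urls if all(k in u.lower() for k in needles)]
    # returnAll wins: maxResults is ignored when it is true.
    return matched if return_all else matched[:max_results]


# Hypothetical URLs for demonstration.
URLS = [
    "https://example.com/Blog/article-1",
    "https://example.com/blog/",
    "https://example.com/article/x",
    "https://example.com/about",
]

if __name__ == "__main__":
    print(apply_output_options(URLS, keywords=["blog", "article"]))
    print(apply_output_options(URLS, return_all=False, max_results=2))
```

Under the "all keywords" rule, the full input example above (keywords ["blog", "article"]) keeps only URLs that contain both words.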

Output

Results are stored in the Apify dataset and can be downloaded as JSON, CSV, XML, or Excel, or pulled via the Apify API.

[
  {
    "url": "https://example.com/blog/post-1",
    "lastmod": "2026-05-14",
    "priority": "0.8",
    "changefreq": "weekly"
  },
  {
    "url": "https://example.com/products/item-42",
    "lastmod": "2026-04-02",
    "priority": "0.6",
    "changefreq": "monthly"
  }
]
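
When consuming the dataset, note that priority arrives as a string and the metadata fields may be absent. The sketch below loads items in the output shape shown above, plus one hypothetical URL with no sitemap metadata, and sorts them by priority. The function name by_priority is hypothetical.

```python
import json

# The two documented sample items plus a hypothetical URL without metadata.
ITEMS_JSON = """[
  {"url": "https://example.com/blog/post-1", "lastmod": "2026-05-14", "priority": "0.8", "changefreq": "weekly"},
  {"url": "https://example.com/products/item-42", "lastmod": "2026-04-02", "priority": "0.6", "changefreq": "monthly"},
  {"url": "https://example.com/contact"}
]"""


def by_priority(items: list[dict]) -> list[dict]:
    """Sort highest sitemap priority first; items without a priority go last."""
    def key(item):
        p = item.get("priority")
        # Tuple key: (missing?, negated priority). Strings are converted to float.
        return (p is None, -float(p) if p is not None else 0.0)
    return sorted(items, key=key)


items = json.loads(ITEMS_JSON)

if __name__ == "__main__":
    for item in by_priority(items):
        print(item.get("priority", "-"), item["url"])
```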

Use cases

  • SEO audit — extract all URLs to map site architecture and find orphan pages.
  • Site migration — build a complete page inventory before a redesign or replatform.
  • Content inventory — list every page to plan content audits and updates.
  • Change monitoring — track lastmod dates to spot recently updated pages.
  • Data pipelines: feed the URL list into downstream tools, such as HTML scrapers or Google Sheets.
  • Targeted scraping — use keyword filters to scope extraction to a section like /blog/.
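
For change monitoring, one approach is to keep only URLs whose lastmod falls on or after a cutoff date. This is a minimal sketch under stated assumptions: lastmod values are W3C dates or datetimes, only the YYYY-MM-DD part is compared (time zones are ignored), and entries without lastmod are skipped. The function name updated_since and the "/news" and "/contact" items are hypothetical.

```python
from datetime import date


def updated_since(items: list[dict], since: date) -> list[str]:
    """URLs whose sitemap lastmod is on or after `since` (date part only)."""
    recent = []
    for item in items:
        lastmod = item.get("lastmod")
        # "2026-05-20T08:30:00+00:00"[:10] -> "2026-05-20"
        if lastmod and date.fromisoformat(lastmod[:10]) >= since:
            recent.append(item["url"])
    return recent


ITEMS = [
    {"url": "https://example.com/blog/post-1", "lastmod": "2026-05-14", "priority": "0.8", "changefreq": "weekly"},
    {"url": "https://example.com/products/item-42", "lastmod": "2026-04-02", "priority": "0.6", "changefreq": "monthly"},
    {"url": "https://example.com/news", "lastmod": "2026-05-20T08:30:00+00:00"},  # hypothetical
    {"url": "https://example.com/contact"},  # hypothetical, no lastmod
]

if __name__ == "__main__":
    print(updated_since(ITEMS, date(2026, 5, 1)))
```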

Run Website URL Extractor with the Apify API or AI agents

You can run this URL extractor programmatically via the Apify API in any language, schedule recurring runs, or call it from an AI agent through the Apify MCP server — making it easy for assistants like Claude or ChatGPT to extract all URLs from a website on demand.
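
As a sketch of the API route, the standard-library code below builds a request to Apify's run-sync-get-dataset-items endpoint, which starts an Actor, waits for it to finish, and returns its dataset items. The Actor ID and token are placeholders: copy the real values from the Actor's API tab and your Apify console. The helper names are hypothetical. The synchronous endpoint only waits a limited time, so very large crawls are better started as asynchronous runs.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST that runs the Actor synchronously and returns its dataset items."""
    # Actor IDs can take the form "username~actor-name"; keep "~" unescaped.
    path = f"/acts/{urllib.parse.quote(actor_id, safe='~')}/run-sync-get-dataset-items"
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(request: urllib.request.Request, timeout: float = 300) -> list[dict]:
    """Send the request and decode the JSON array of dataset items."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_run_request(
        "ACTOR_ID",          # hypothetical placeholder for this Actor's ID
        "YOUR_APIFY_TOKEN",  # placeholder: your personal API token
        {"startUrls": [{"url": "https://example.com"}], "keywords": ["blog"]},
    )
    print(req.get_method(), req.full_url)
    # items = run_actor(req)  # uncomment to perform the network call
```

Sending the token in an Authorization header keeps it out of URLs that may end up in logs.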

Build a complete website intelligence pipeline

This URL extractor is step one. Once you have the full URL list, power up your analysis:

| Actor | What it adds |
|---|---|
| GEO Audit — AI Search Optimization Checker | Score each page's visibility in ChatGPT, Perplexity, and Gemini |
| Website Tech Profiler | Detect the tech stack behind any URL: CMS, frameworks, analytics, CDN |
| Organization Registered Domain & Subdomain Scraper | Discover all domains owned by an organization before extraction |
| Website API & Endpoint Analyzer | Uncover hidden API endpoints behind any extracted URL |

FAQ

How do I get a list of all pages on a website?

Add the website to startUrls and run the Actor with returnAll: true. It parses the sitemap and crawls the site to return every reachable page URL.

Can I extract URLs from an XML sitemap?

Yes. Website URL Extractor reads XML sitemaps automatically and pulls lastmod, priority, and changefreq where available, then supplements them with recursive crawling.

Is it legal to extract URLs from a website?

This Actor only collects publicly available page URLs and does not extract private user data. URLs are public information, but you remain responsible for complying with the target site's terms and applicable laws.

Why am I getting fewer URLs than expected?

The site's sitemap may be incomplete, some URLs may be blocked by robots.txt, or maxResults may be limiting output. Set returnAll: true for the full list. JavaScript-heavy single-page apps (SPAs) may also expose fewer URLs.
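
To check whether robots.txt explains missing URLs, you can test them against the site's rules with Python's standard urllib.robotparser. This is a diagnostic sketch, not a description of the Actor's behavior: which user agent the Actor presents, and exactly how it applies robots.txt, are not documented here. It assumes you have copied the site's robots.txt text. The function name blocked_urls and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a fetch time: can_fetch() treats rules as unread until one is set.
    parser.modified()
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical robots.txt that hides everything under /private.
ROBOTS = """User-agent: *
Disallow: /private
"""

if __name__ == "__main__":
    print(blocked_urls(ROBOTS, [
        "https://example.com/private/report",
        "https://example.com/blog/post-1",
    ]))
```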

What if no URLs are returned?

Check that the startUrls are public and not behind a login, and enable proxyConfiguration with useApifyProxy: true if the site is blocking requests.

Support

Found a bug or need a feature? Open an issue on the Issues tab — feedback is welcome, and custom solutions based on this Actor are available on request.