Pricing

from $0.20 / 1,000 results

Website URL Extractor - Get All Site URLs

Extract every reachable URL from a website automatically, no code needed. This URL extractor crawls pages and parses XML sitemaps into one structured list with metadata (lastmod, priority, changefreq). Filter by keyword, cap results, and export to JSON, CSV, XML, or Excel. Built for SEO audits and migrations.

Rating

5.0

(1)

Developer

Lofomachines

Actor stats

Total users

149

Last modified

20 hours ago

Website URL Extractor - Get All URLs from Any Website

Website URL Extractor automatically extracts all reachable URLs from a website by combining XML sitemap parsing with recursive crawling, and returns a structured list of every page URL it finds. Just add one or more start URLs and get back a clean URL list in minutes. Perfect for SEO audits, site migrations, content inventories, and feeding data pipelines.

👉 To try it, paste a website URL into startUrls and hit Start. That's it.

What does Website URL Extractor do?

This URL extractor discovers and collects every reachable URL on a website. It reads the site's XML sitemap when available and also crawls pages recursively, so you capture URLs that aren't listed in the sitemap. For each URL it returns any available metadata — last modified date (lastmod), priority, and change frequency (changefreq) — and lets you filter results by keyword. No coding required.
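To make the sitemap side concrete, here is a minimal Python sketch of pulling url, lastmod, priority, and changefreq out of a standard sitemaps.org <urlset> file using only the standard library. It is an illustration under assumptions, not this Actor's code: it handles a single plain sitemap (no <sitemapindex> files, no gzip), and the function name parse_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declared by sitemaps that follow the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list:
    """Return one dict per <url> entry; optional fields are included only when present."""
    root = ET.fromstring(xml_text)
    records = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        record = {"url": loc}
        for field in ("lastmod", "priority", "changefreq"):
            value = url_el.findtext(f"sm:{field}", namespaces=SITEMAP_NS)
            if value is not None:
                record[field] = value.strip()
        records.append(record)
    return records


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/post-1</loc>
    <lastmod>2026-05-14</lastmod>
    <priority>0.8</priority>
    <changefreq>weekly</changefreq>
  </url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

for record in parse_sitemap(sample):
    print(record)
```

Values are kept as strings (for example "0.8"), which matches the shape of the output records shown further down.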

It works on any domain, handles large sites (50,000+ URLs), and runs on the Apify platform, so you also get scheduling, API access, integrations, proxy rotation, and monitoring out of the box.

Why use this URL extractor?

🔍 Complete URL discovery: sitemap parsing plus recursive crawling finds pages listed in the sitemap (including orphan pages that nothing links to) as well as linked pages the sitemap misses.
💨 Fast and scalable: extract URLs from whole sites with 50,000+ pages efficiently.
📦 Bulk multi-domain: pass several start URLs to extract URLs from multiple websites in one run.
🏷️ Rich metadata: capture lastmod, priority, and changefreq to track recently updated content.
🔎 Keyword URL filtering: return only URLs containing all of your keywords (e.g. /blog/, /product/).
🎯 Result control: set returnAll: false with maxResults to sample, or keep returnAll: true for a full export.
🛡️ Reliable: automatic retries, proxy support, and loop prevention.
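Recursive crawling with loop prevention usually comes down to a breadth-first traversal with a "seen" set. The sketch below is not this Actor's implementation: it assumes a hypothetical get_links(url) callback standing in for downloading and parsing HTML, keeps only same-host links, and drops #fragments so one page is not queued twice. It also shows why a crawler on its own only reaches pages that something links to, which is where the sitemap pass helps.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_pages=50_000):
    """Breadth-first crawl limited to start_url's host.

    get_links(url) is a hypothetical callback that returns the raw href values on a page;
    a real crawler would download and parse the HTML there.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}  # loop prevention: each URL is queued at most once
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links against the current page and drop any #fragment
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# A tiny in-memory "site" with cycles (home <-> blog) and one external link
site = {
    "https://example.com/": ["/blog/", "/about#team", "https://other.org/x"],
    "https://example.com/blog/": ["/", "post-1"],
    "https://example.com/about": ["/"],
    "https://example.com/blog/post-1": ["/blog/"],
}
print(crawl("https://example.com/", lambda u: site.get(u, [])))
```

Because every URL enters the seen set before it is queued, the home/blog cycle is visited once, and the external link is skipped.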

What data can Website URL Extractor extract?

Field	Description
`url`	The extracted page URL
`lastmod`	Last modified date (when available in the sitemap)
`priority`	Sitemap priority of the URL, from 0.0 to 1.0 (when available in the sitemap)
`changefreq`	Sitemap hint for how often the page changes, e.g. weekly (when available in the sitemap)

How do I extract all URLs from a website?

Open the Actor and paste one or more website URLs into the startUrls field.
(Optional) Set keywords to keep only matching URLs, e.g. ["blog"].
(Optional) returnAll is true by default and returns the full list; to cap output, set returnAll: false and set maxResults.
Click Start and wait for the run to finish.
Download your URL list as JSON, CSV, XML, or Excel from the dataset.

How much does it cost to extract URLs from a website?

This URL extractor is intentionally lightweight: it reads sitemaps and the links in page HTML without rendering JavaScript unless strictly necessary, which keeps platform usage low.

Site size	Estimated cost at $0.20 / 1,000 results
Small (< 1,000 URLs)	Under $0.20 per run
Medium (~10,000 URLs)	About $2.00 per run
Large (50,000+ URLs)	About $10.00 per 50,000 URLs; grows linearly with the number of results
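Under pay-per-result pricing the estimate is simple arithmetic: number of results ÷ 1,000 × price. The sketch below assumes billing is purely per result at the listed starting rate of $0.20 per 1,000 results; your actual rate may differ by plan, and estimate_cost is a hypothetical helper.

```python
# Assumption: the listed starting rate; your actual per-result price may differ by plan.
PRICE_PER_1000_RESULTS_USD = 0.20


def estimate_cost(num_results, price_per_1000=PRICE_PER_1000_RESULTS_USD):
    """Estimated charge in USD if billing is purely per result (linear in result count)."""
    return round(num_results / 1000 * price_per_1000, 2)


for size in (500, 10_000, 50_000):
    print(f"{size:>6} URLs -> ${estimate_cost(size):.2f}")
```

With these assumptions, 10,000 URLs come to about $2.00 and 50,000 URLs to about $10.00.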

💡 Tip: Keep Apify Proxy enabled to avoid IP blocking on larger sites.

Input

Website URL Extractor accepts a JSON input. Only startUrls is required — click the Input tab for all options.

Minimal input

{
  "startUrls": [{ "url": "https://example.com" }]
}

Full input example

{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://crawlee.dev" }
  ],
  "proxyConfiguration": { "useApifyProxy": true },
  "returnAll": true,
  "maxResults": 1000,
  "keywords": ["blog", "article"]
}

Parameter	Type	Required	Default	Description
`startUrls`	Array	✅ Yes	`[{ "url": "https://apify.com" }]`	One or more websites to extract URLs from
`proxyConfiguration`	Object	❌ No	`{ "useApifyProxy": false }`	Proxy settings for reliable, unblocked access
`returnAll`	Boolean	❌ No	`true`	Extract all URLs and ignore `maxResults`
`maxResults`	Integer	❌ No	`1000`	Max URLs to return. Ignored when `returnAll` is `true`
`keywords`	Array	❌ No	`[]`	Case-insensitive filter — returns only URLs containing all listed keywords
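Read together, the rows for keywords, returnAll, and maxResults imply a simple selection rule. Here is a minimal sketch, assuming keywords are applied before the maxResults cap (the order is not documented here); select_urls is hypothetical, and its snake_case arguments stand in for the input fields keywords, returnAll, and maxResults.

```python
def select_urls(urls, keywords=(), return_all=True, max_results=1000):
    """Apply keyword filtering, then the result cap (order assumed, not documented).

    keywords   -> case-insensitive; a URL must contain every keyword to be kept
    return_all -> when True, max_results is ignored (mirrors returnAll)
    """
    wanted = [k.lower() for k in keywords]
    kept = [u for u in urls if all(k in u.lower() for k in wanted)]
    return kept if return_all else kept[:max_results]


urls = [
    "https://example.com/blog/article-1",
    "https://example.com/blog/",
    "https://example.com/Blog/Article-2",
    "https://example.com/products/item-42",
]
print(select_urls(urls, keywords=["blog", "article"]))
print(select_urls(urls, return_all=False, max_results=2))
```

This is also why the full input example above returns every matching URL: with returnAll set to true, its maxResults value of 1000 has no effect.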

Output

Results are stored in the Apify dataset and can be downloaded as JSON, CSV, XML, or Excel, or pulled via the Apify API.

[
  {
    "url": "https://example.com/blog/post-1",
    "lastmod": "2026-05-14",
    "priority": "0.8",
    "changefreq": "weekly"
  },
  {
    "url": "https://example.com/products/item-42",
    "lastmod": "2026-04-02",
    "priority": "0.6",
    "changefreq": "monthly"
  }
]
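For change monitoring, the exported JSON can be filtered on lastmod. A minimal sketch, assuming the dataset was downloaded as a JSON array like the one above and that lastmod, when present, starts with a YYYY-MM-DD date; recently_updated is a hypothetical helper.

```python
import json
from datetime import date


def recently_updated(dataset_json, since):
    """URLs whose lastmod falls on or after `since`, newest first.

    Assumes lastmod starts with a YYYY-MM-DD date (a full timestamp is truncated to its date).
    """
    dated = []
    for item in json.loads(dataset_json):
        lastmod = item.get("lastmod")
        if not lastmod:
            continue  # skip URLs without sitemap metadata
        day = date.fromisoformat(lastmod[:10])
        if day >= since:
            dated.append((day, item["url"]))
    dated.sort(reverse=True)
    return [url for _day, url in dated]


output = """[
  {"url": "https://example.com/blog/post-1", "lastmod": "2026-05-14", "priority": "0.8", "changefreq": "weekly"},
  {"url": "https://example.com/products/item-42", "lastmod": "2026-04-02", "priority": "0.6", "changefreq": "monthly"}
]"""
print(recently_updated(output, since=date(2026, 5, 1)))
```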

Use cases

SEO audit: extract all URLs to map site architecture and, combined with internal-link data, find orphan pages.
Site migration: build a complete page inventory before a redesign or replatform.
Content inventory: list every page to plan content audits and updates.
Change monitoring: track lastmod dates to spot recently updated pages.
Data pipelines: feed the URL list into downstream scrapers and tools (HTML scraper, Google Sheets, etc.).
Targeted scraping: use keyword filters to scope extraction to a section like /blog/.

Run Website URL Extractor with the Apify API or AI agents

You can run this URL extractor programmatically via the Apify API in any language, schedule recurring runs, or call it from an AI agent through the Apify MCP server — making it easy for assistants like Claude or ChatGPT to extract all URLs from a website on demand.
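As a starting point for API use, here is a hedged Python sketch built on the official apify-client package (ApifyClient, actor(...).call, dataset(...).iterate_items). The Actor ID is a placeholder you must replace, the APIFY_TOKEN environment variable is our own choice, and build_run_input and extract_urls are hypothetical helpers that only use the input fields documented above.

```python
import os

# Hypothetical placeholder: copy the real Actor ID from this Actor's page in Apify Console.
ACTOR_ID = "YOUR_USERNAME/YOUR_ACTOR_NAME"


def build_run_input(urls, keywords=None, max_results=None):
    """Build input using the documented fields; a max_results value also sets returnAll to false."""
    run_input = {"startUrls": [{"url": u} for u in urls]}
    if keywords:
        run_input["keywords"] = list(keywords)
    if max_results is not None:
        run_input["returnAll"] = False  # otherwise maxResults would be ignored
        run_input["maxResults"] = max_results
    return run_input


def extract_urls(urls, **options):
    """Start a run, wait for it to finish, and return all dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # token read from an environment variable
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(urls, **options))
    if run is None:
        raise RuntimeError("No run object was returned for the Actor call")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, extract_urls(["https://example.com"], keywords=["blog"], max_results=500) would request at most 500 URLs containing "blog". The client import sits inside the function so the input builder can be used without the package installed.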

Build a complete website intelligence pipeline

This URL extractor is step one. Once you have the full URL list, power up your analysis:

Actor	What it adds
GEO Audit — AI Search Optimization Checker	Score each page's visibility in ChatGPT, Perplexity, and Gemini
Website Tech Profiler	Detect the tech stack behind any URL — CMS, frameworks, analytics, CDN
Organization Registered Domain & Subdomain Scraper	Discover all domains owned by an organization before extraction
Website API & Endpoint Analyzer	Uncover hidden API endpoints behind any extracted URL

FAQ

How do I get a list of all pages on a website?

Add the website to startUrls and run the Actor with returnAll: true. It parses the sitemap and crawls the site to return every reachable page URL.

Can I extract URLs from an XML sitemap?

Yes. Website URL Extractor reads XML sitemaps automatically and pulls lastmod, priority, and changefreq where available, then supplements them with recursive crawling.

Is it legal to extract URLs from a website?

This Actor only collects publicly available page URLs and does not extract private user data. URLs are public information, but you remain responsible for complying with the target site's terms and applicable laws.

Why am I getting fewer URLs than expected?

The site's sitemap may be incomplete, some URLs may be blocked by robots.txt, or maxResults may be limiting output. Set returnAll: true for the full list. JavaScript-heavy single-page apps (SPAs) may also expose fewer URLs.
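If you suspect robots.txt, you can check which of your expected URLs a site's rules disallow. A minimal sketch with Python's standard urllib.robotparser, assuming the rules for the generic "*" user agent are the ones that matter (which user agent this Actor sends is not stated here); blocked_urls is a hypothetical helper, and you would paste in the site's actual robots.txt text.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


rules = """User-agent: *
Disallow: /private/
Disallow: /search
"""
print(blocked_urls(rules, [
    "https://example.com/blog/post-1",
    "https://example.com/private/report",
    "https://example.com/search?q=shoes",
]))
```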

What if no URLs are returned?

Check that the startUrls are public and not behind a login, and enable proxyConfiguration with useApifyProxy: true if the site is blocking requests.

Support

Found a bug or need a feature? Open an issue on the Issues tab — feedback is welcome, and custom solutions based on this Actor are available on request.

Link Extractor

automation-lab/link-extractor

This actor extracts all hyperlinks from web pages. For each link, it captures the anchor text, href, rel attributes (nofollow, ugc, sponsored), target attribute, and classifies links as internal or external. It also detects the link's location in the page (nav, header, footer, main content,...

Stas Persiianenko

Actor Builder

handleco-app/actor-builder

handleco-app

Google Search Scraper - Most Comprehensive

kaix/google-serp-scraper

🔥 ~$1/1K pages 🔥 Scrape Google into 40+ structured sections: organic results, knowledge panel with cast/ratings/streaming prices, finance with earnings, shopping with filters, AI overview, PAA with answers, perspectives, video key moments, hotels, local pack, flights, lyrics, sports, ads.

Kai

FindLaw Scraper

jungle_synthesizer/findlaw-scraper

Scrape attorney and law firm data from FindLaw Lawyer Directory to generate high-quality, targeted legal industry leads

BowTiedRaccoon

180

Website Tech Profiler

lofomachines/website-tech-profiler

Advanced technology stack scraper and Wappalyzer alternative. Detect frontend frameworks (React, Vue, Angular), backend technologies, CDN, hosting providers, analytics, advertising scripts, API endpoints, and more. Complete techstack analysis for competitive research and lead generation.

Lofomachines

5.0

Airbnb Availability Calendar

agenscrape/airbnb-availability-calendar

Scrape Airbnb listing availability calendars with optional pricing. Get day-by-day availability status, nightly rates, taxes, and booking requirements for any property.

Agenscrape

117

2.0

Website Scraper Search Email, Phone, & Social Media

scraping_solutions/website-scraper-search-email-phone-social-media

Automatically extracts emails, social media links, and phone numbers from any website. Perfect for quickly gathering contact details and online presence data of businesses or professionals.

Scraping Solutions

105

TikTok Ad Library & Creative Scraper - Spy on Competitor Ads

whoareyouanas/tiktok-ad-scraper

Scrape TikTok Ad Library + Creative Center at scale - competitor ads, video URLs, hooks, days-running, EU/EEA regions - into clean CSV/JSON. Track ads before they vanish. No $99/mo PipiAds subscription needed.