Pricing

Pay per event

GitHub Repo Scraper

Fetch full GitHub repository metadata for one or many repos in one call — stars, forks, languages, topics, license, default branch, latest release, contributor count — export to JSON or CSV. A GitHub repo API wrapper; optional token for higher rate limits.

Developer

DevilScrapes

Last modified

14 days ago

🎯 What this scrapes

GitHub exposes public repository data through its REST API, but turning a list of repos into a reliable dataset is messier than it looks: secondary-rate-limit errors kick in at burst speeds, the languages and release endpoints are separate calls, and unauthenticated requests cap out at 60 per hour. This GitHub repo scraper fans out requests in parallel, handles the retry dance automatically, and delivers one richly typed row per repository, covering everything from stargazers_count through latest_release_tag to scraped_at.
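To make the fan-out concrete, here is a minimal Python sketch of the GitHub REST URLs a single repository needs. The three paths are GitHub's public endpoints; `API_ROOT` and `endpoints_for` are hypothetical names used for illustration, and how the Actor issues these calls internally is an assumption.

```python
from __future__ import annotations

from urllib.parse import quote

API_ROOT = "https://api.github.com"  # public GitHub REST API root


def endpoints_for(
    slug: str,
    include_languages: bool = True,
    include_latest_release: bool = True,
) -> list[str]:
    """Return the REST URLs needed for one `owner/repo` slug (hypothetical helper)."""
    owner, name = slug.split("/", 1)
    base = f"{API_ROOT}/repos/{quote(owner)}/{quote(name)}"
    urls = [base]  # core metadata: stars, forks, topics, license, ...
    if include_languages:
        urls.append(f"{base}/languages")  # language -> bytes map
    if include_latest_release:
        urls.append(f"{base}/releases/latest")  # latest published release
    return urls
```

With both options on, each repo costs three requests, which is why the 60-per-hour unauthenticated limit runs out quickly.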

Give it a list of owner/repo slugs or full GitHub URLs. It writes clean, Pydantic-validated rows straight into your Apify dataset. Use it for competitor benchmarking, OSS health checks, DevRel dashboards, AI/RAG corpus building, or any workflow that needs bulk GitHub repository data on demand.

🔥 What we handle for you

🛡️ Browser fingerprint rotation — curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so requests look like a browser, not a Python script.
🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP whenever the target pushes back.
🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per request, Retry-After headers honoured.
🧱 Rate-limit-aware pacing — when GitHub's secondary rate limit kicks in, we slow down and wait rather than hammering until banned.
🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable field names, JSON / CSV / Excel export straight from Apify Console.
💰 Pay-Per-Event pricing — you pay only for results that land in your dataset. No data, no charge.
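The retry rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Actor's actual code: `is_retryable`, `backoff_delay`, the 1-second base, the 60-second cap, and the 10% jitter are all hypothetical choices, and only the numeric-seconds form of Retry-After is handled.

```python
from __future__ import annotations

import random

MAX_ATTEMPTS = 5  # "up to 5 attempts per request"


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx response are worth retrying."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(
    attempt: int,
    retry_after: str | None = None,
    base: float = 1.0,  # assumed base delay in seconds
    cap: float = 60.0,  # assumed upper bound in seconds
) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    # Honour a numeric Retry-After header when the server sends one.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Otherwise double the delay on every attempt, up to the cap.
    delay = min(cap, base * 2 ** (attempt - 1))
    # A little jitter keeps parallel workers from retrying in lockstep.
    return delay + random.uniform(0, delay * 0.1)
```

A caller would loop up to `MAX_ATTEMPTS` times, sleeping for `backoff_delay(attempt, response_headers.get("Retry-After"))` whenever `is_retryable(status)` is true.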

💡 Use cases

Competitor OSS benchmarking — track stars and forks across rival projects week-over-week and pipe deltas to Slack or a BI tool.
Dependency health monitoring — feed your stack's transitive repo list and flag anything archived, disabled, or unmaintained.
RAG corpus building: pull descriptions, topics, and language breakdowns for a curated set of repos to seed a vector store.
Hiring and M&A research — quantify the open-source surface area of a target company or candidate's personal GitHub activity.
Newsletter automation — ingest a curated list weekly, diff the star counts, surface the fastest movers.
DevRel dashboards — track your own org's repos alongside ecosystem repos in one unified dataset.
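For the week-over-week use cases, a diff between two exported datasets is usually enough. The sketch below assumes you have loaded two runs as lists of rows using the `full_name` and `stargazers_count` fields; `star_deltas` is a hypothetical helper name.

```python
def star_deltas(previous, current):
    """Return (full_name, star change) pairs, fastest movers first."""
    before = {row["full_name"]: row["stargazers_count"] for row in previous}
    deltas = []
    for row in current:
        name = row["full_name"]
        if name in before:  # repos new to the list have no baseline yet
            deltas.append((name, row["stargazers_count"] - before[name]))
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

The resulting list can be posted to Slack or loaded into a BI tool as is.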

⚙️ How to use it

Click Try for free at the top of the page — no credit card required.
Paste one or more owner/repo slugs (or full GitHub URLs) into the repos field.
Optionally add a GitHub Personal Access Token to lift the rate limit from 60 req/hr to 5 000 req/hr.
Click Start. Output streams into the run's dataset in real time.
Export from Storage → Dataset as JSON, CSV, or Excel — or pull via the Apify API.
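If you prefer the API route in the last step, a minimal sketch using only the Python standard library might look like this. It assumes the Apify API's dataset items endpoint (`/v2/datasets/{datasetId}/items`) with its `format` parameter; the function names are hypothetical, and the dataset ID and API token are placeholders you copy from your own run and account.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def dataset_items_url(dataset_id, fmt="json"):
    """Build the items URL for one dataset (fmt can be json, csv, xlsx, ...)."""
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


def fetch_items(dataset_id, api_token):
    """Download all rows of a dataset as a list of dicts (performs a network call)."""
    request = Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {api_token}"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

The official `apify-client` package wraps the same endpoint if you would rather not build URLs by hand.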

📥 Input

Field	Type	Required	Default	Notes
`repos`	`array`	yes	`["apify/apify-sdk-python", "apify/crawlee-python"]`	List of repos as `owner/repo` slugs or full GitHub URLs.
`githubToken`	`string`	no	(none)	Personal access token. Unauthenticated = 60 req/hr; with token = 5 000 req/hr. For public repos a token with no scopes is sufficient; the classic `public_repo` scope also works but grants write access.
`includeLanguages`	`boolean`	no	`true`	Adds a `languages` map (language → bytes) per repo. One extra API call per repo.
`includeLatestRelease`	`boolean`	no	`true`	Adds `latest_release_tag` and `latest_release_published_at`. One extra API call per repo.
`concurrency`	`integer`	no	`6`	Parallel API requests. Up to 8 with a token; 2–3 without.
`proxyConfiguration`	`object`	no	`{"useApifyProxy": false}`	Apify Proxy settings. Proxy is optional for the GitHub REST API — enable it if your network routing requires it.

Example input

{
  "repos": [
    "apify/apify-sdk-python",
    "apify/crawlee-python"
  ],
  "githubToken": "",
  "includeLanguages": true,
  "includeLatestRelease": true,
  "concurrency": 4,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
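Because `repos` accepts both slugs and full URLs, it can help to normalise and de-duplicate a list before submitting it. The sketch below is one way to do that on your side; `normalize_repo` and `unique_slugs` are hypothetical helpers, and the Actor's own parsing rules may differ.

```python
from urllib.parse import urlparse


def normalize_repo(value: str) -> str:
    """Turn an `owner/repo` slug or a GitHub URL into an `owner/repo` slug."""
    text = value.strip()
    if "://" in text:
        parsed = urlparse(text)
        if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
            raise ValueError(f"not a GitHub URL: {value!r}")
        text = parsed.path
    parts = [part for part in text.strip("/").split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"expected owner/repo, got: {value!r}")
    owner, name = parts[0], parts[1]  # ignore trailing paths such as /tree/main
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return f"{owner}/{name}"


def unique_slugs(values):
    """Normalise every entry and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(normalize_repo(value) for value in values))
```

The output of `unique_slugs` can be used directly as the `repos` array in the input JSON.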

📤 Output

Every row is one dataset item representing one GitHub repository.

Field	Type	Notes
`owner`	`string`	Owner or organisation login.
`name`	`string`	Repository name (without owner prefix).
`full_name`	`string`	Full slug — `owner/name`.
`html_url`	`string`	Canonical GitHub URL.
`description`	`string | null`	Repository tagline.
`fork`	`boolean`	`true` if this is a fork.
`archived`	`boolean`	`true` if the repo is archived (read-only).
`disabled`	`boolean`	`true` if the repo is disabled.
`stargazers_count`	`integer`	Star count at scrape time.
`forks_count`	`integer`	Fork count.
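`watchers_count`	`integer`	Watcher count as returned by GitHub's REST API, where this field mirrors `stargazers_count`; true watchers are `subscribers_count`.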
`open_issues_count`	`integer`	Open issues + open PRs combined.
`size_kb`	`integer`	Repository size in kilobytes.
`language`	`string | null`	Primary language (GitHub's classification).
`languages`	`object | null`	Map of language → bytes. Populated when `includeLanguages=true`.
`topics`	`array`	Repository topics / tags.
`license`	`string | null`	SPDX identifier (e.g. `MIT`, `Apache-2.0`).
`default_branch`	`string`	Default branch name (usually `main`).
`homepage`	`string | null`	User-supplied homepage URL.
`created_at`	`string`	Repo creation timestamp (ISO-8601 UTC).
`updated_at`	`string`	Last metadata update timestamp (ISO-8601 UTC).
`pushed_at`	`string`	Last commit push timestamp (ISO-8601 UTC).
`latest_release_tag`	`string | null`	Tag of the latest GitHub release. Populated when `includeLatestRelease=true`.
`latest_release_published_at`	`string | null`	Publish timestamp of the latest release (ISO-8601 UTC).
`scraped_at`	`string`	When this row was recorded (ISO-8601 UTC).

Example output

{
  "owner": "apify",
  "name": "apify-sdk-python",
  "full_name": "apify/apify-sdk-python",
  "html_url": "https://github.com/apify/apify-sdk-python",
  "description": "The Apify SDK for Python.",
  "fork": false,
  "archived": false,
  "disabled": false,
  "stargazers_count": 415,
  "forks_count": 41,
  "watchers_count": 415,
  "open_issues_count": 12,
  "size_kb": 2048,
  "language": "Python",
  "languages": {
    "Python": 198432,
    "Shell": 1024
  },
  "topics": ["apify", "scraping", "sdk"],
  "license": "Apache-2.0",
  "default_branch": "main",
  "homepage": "https://docs.apify.com/sdk/python",
  "created_at": "2022-08-01T10:00:00Z",
  "updated_at": "2026-05-30T14:22:00Z",
  "pushed_at": "2026-05-29T08:11:00Z",
  "latest_release_tag": "v3.4.0",
  "latest_release_published_at": "2026-05-20T12:00:00Z",
  "scraped_at": "2026-06-01T09:00:00Z"
}
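The ISO-8601 timestamps make freshness checks straightforward. As a minimal sketch for the dependency-health use case, the helpers below flag rows that are archived, disabled, or idle for too long; `parse_ts`, `days_since_push`, `needs_attention`, and the 365-day threshold are hypothetical choices, not part of the Actor.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a timestamp such as 2026-06-01T09:00:00Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_since_push(row):
    """Whole days between the last push and the moment the row was scraped."""
    return (parse_ts(row["scraped_at"]) - parse_ts(row["pushed_at"])).days


def needs_attention(row, max_idle_days=365):
    """True if the repo is archived, disabled, or has not been pushed to recently."""
    return (
        row.get("archived", False)
        or row.get("disabled", False)
        or days_since_push(row) > max_idle_days
    )
```

For the example row above, `days_since_push` returns 3 and `needs_attention` is false.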

💰 Pricing

Pay-Per-Event — you only pay when these events fire:

Event	USD	What triggers it
`actor-start`	$0.005	One-off warm-up charge per run
`result`	$0.002	Per dataset row written

Example: 1 000 repos at the rates above ≈ $2.00. No subscription, no minimum, no card required to start — Apify gives every new account $5 of free credit.
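The arithmetic behind that example is simple enough to script. The sketch below uses the two event prices from the table; `estimate_cost` and `rows_within_budget` are hypothetical helper names, and `Decimal` is used so the cents stay exact.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # one-off charge per run
RESULT_USD = Decimal("0.002")  # charge per dataset row


def estimate_cost(rows, runs=1):
    """Total USD for `rows` dataset rows spread over `runs` runs."""
    return runs * ACTOR_START_USD + rows * RESULT_USD


def rows_within_budget(budget_usd, runs=1):
    """How many rows a budget covers after paying the start charges."""
    remaining = Decimal(str(budget_usd)) - runs * ACTOR_START_USD
    return max(0, int(remaining // RESULT_USD))
```

One run of 1 000 repos comes to $2.005, and the $5 of free credit covers 2 497 rows in a single run.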

🚧 Limitations

Private repos require a token with the matching scopes — this Actor only processes repos the token can read. Do not reuse production tokens here.
README content, raw code, commit graphs, and pull request history are outside the scope of this Actor. Use GitHub's search API or a dedicated commits scraper for those.
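Large orgs with thousands of repos will hit the 5 000 req/hr authenticated ceiling on long runs. With both optional calls enabled each repo costs three requests, so the ceiling works out to roughly 1 600 repos per hour. Plan batches or spread runs over multiple hours.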
GitHub caches some counts (stars, forks) for a few minutes. Compare runs at least 5 minutes apart to catch real movement.
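To plan batches around the rate limit, a rough request budget helps. This sketch assumes one request per repo for core metadata plus one for each enabled option, as the input table describes, and it ignores retries; the function names are hypothetical.

```python
import math


def requests_per_repo(include_languages=True, include_latest_release=True):
    """Core metadata call plus one call per enabled option."""
    return 1 + int(include_languages) + int(include_latest_release)


def repos_per_hour(hourly_limit, include_languages=True, include_latest_release=True):
    """Repos that fit inside one hour of rate-limit budget."""
    return hourly_limit // requests_per_repo(include_languages, include_latest_release)


def hours_needed(repo_count, hourly_limit=5000,
                 include_languages=True, include_latest_release=True):
    """Hours of rate-limit budget a list of repos consumes, rounded up."""
    total = repo_count * requests_per_repo(include_languages, include_latest_release)
    return math.ceil(total / hourly_limit)
```

With the defaults, a token covers about 1 666 repos per hour and an unauthenticated run about 20.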

❓ FAQ

Do I need a GitHub token?

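For small batches, no. The unauthenticated GitHub REST API allows 60 requests per hour, and with the default options each repo costs three requests (metadata, languages, latest release), so that covers about 20 repos per hour, or up to 60 with includeLanguages and includeLatestRelease turned off. Provide a Personal Access Token to raise that ceiling to 5 000 requests per hour. For public repos the token needs no scopes.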

Is this a GitHub REST API alternative or replacement?

Neither — it is a wrapper that handles authentication, pagination, sub-resource fetching, secondary-rate-limit retries, and structured output so you do not have to write that code yourself. The GitHub API still powers the requests; we handle the operational layer on top of it.

Can I use this to fetch GitHub repo metadata for hundreds of repos at once?

Yes. This is the primary use case — pass a list of hundreds of owner/repo slugs, set your token, and the Actor fans them out in parallel while respecting GitHub's rate limits. Output lands in a clean dataset ready to export or query.

How is this different from calling the GitHub REST API myself?

Writing a reliable GitHub repository scraper yourself means handling secondary rate limits, separate language and release endpoints, pagination, token rotation, and structured output validation. That is 1–2 dev-weeks of plumbing. This Actor handles all of it for about $2 per 1 000 repos.

What if a repo doesn't exist or has been deleted?

The Actor logs a warning for that slug and continues — the rest of your list still processes normally. You get a partial dataset with a clear log entry for every skipped repo.
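If you would rather detect skipped repos from the data than from the log, you can compare your input list against the rows that came back. A minimal sketch, assuming slugs are matched case-insensitively against `full_name` (`skipped_slugs` is a hypothetical helper; a renamed repo may come back under its new name and show up here as skipped):

```python
def skipped_slugs(requested, rows):
    """Return requested `owner/repo` slugs that have no matching dataset row."""
    returned = {row["full_name"].lower() for row in rows}
    return [slug for slug in requested if slug.lower() not in returned]
```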

Can I scrape private repositories?

You can — if you supply a token that has access to those repos. This Actor was designed for bulk public-data extraction; do not use production tokens here.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new output field? Open an issue on the Actor's Issues tab in Apify Console — we read every report and ship fixes weekly.

Github Scraper

fortuitous_pirate/github-scraper

Extract GitHub repository data including trending repos, search results, and contributor lists. Get stars, forks, language, topics, license, and activity dates. No authentication required for public data — optional GitHub token for higher rate limits.

Fortuitous Pirate

GitHub Stars Scraper

lulzasaur/github-stars-scraper

Scrape GitHub repository data. Search by keyword or language, fetch specific repos. Extract star counts, forks, topics, licenses, and full repo metadata.

lulz bot

GitHub Repository & Trending Scraper

rupom888/github-repository-scraper

Search GitHub repos, scrape user profiles with repos, get repo details with contributors, or track GitHub trending. Uses public API - optional token for higher rate limits.

Syed Rupom

GitHub Repository Intelligence

crawlerbros/github-repo-intelligence

Fetch rich metadata (stars, forks, README, languages, topics, license) from GitHub repositories. Search by query or provide direct URLs. Optional GitHub token for 80x higher rate limit.

Crawler Bros

GitHub Repo Stats. Stars, Forks, Languages, Contributors

seemuapps/github-repo-stats-scraper

Get stars, forks, issues, language breakdown, license, last commit, and contributor counts for any GitHub repository. Bulk-process a list of repos in one run.

Andrew

GitHub Repository Scraper - Stars, Topics, Trending

logiover/github-repository-scraper

Scrape GitHub repos by search query and export stars, topics, forks & license to CSV/JSON. GitHub data export without an API key - trending repos scraper.

Logiover

GitHub Repository Scraper

skystone_labs/github-repo-scraper

Extract GitHub repository metadata using GitHub API and scraping. Get repo info, stars, forks, language, topics, and README content. Perfect for research, analysis, and building datasets.

Skystone

GitHub Repository Scraper

cloud9_ai/github-scraper

Scrape GitHub repositories, users, and trending projects via REST API. Extract repo names, stars, forks, languages, descriptions, and contributor data.

cloud9

GitHub Trending Scraper — Repos & Developers

diverse_venture/github-trending-scraper

Scrape trending GitHub repositories filtered by language and period (daily/weekly/monthly), or top developers by location. Returns full repo metadata: stars, forks, topics, language, license. Uses public GitHub API — auth optional for higher rate limits.

Chak Man Fung

GitHub Repo Search — Stars, Language & Topics

ryanclinton/github-repo-search

Search and scrape GitHub repositories by keyword, language, stars, forks, or topic. Extract structured repo metadata including owner, license, topics, and activity timestamps. Sort by stars, forks, or recently updated. Export to JSON, CSV, or API. No token required.