GitHub Scraper
Pricing: from $1.50 / 1,000 items returned
Search GitHub repositories or users via the public REST API and get clean, structured rows: stars, forks, issues, language, topics, license, dates for repos; name, bio, company, location, followers for users. No key needed; add a token for higher limits.
Rating: 5.0 (1 rating)
Developer: Dami's Studio (maintained by Community)
Search GitHub repositories or users via the public GitHub REST API and get back clean, structured rows. No API key is required, but adding a free GitHub token raises the rate limit from 60 to 5000 requests/hr, which matters for larger jobs.
What you get
Repositories (type: repositories):
fullName, name, owner, url, description, stars, forks, openIssues, language, topics[], license (SPDX id), homepage, defaultBranch, createdAt, updatedAt, pushedAt.
Users / organizations (type: users) — each result is enriched with profile details:
login, url, type, id, name, bio, company, location, blog, followers, publicRepos, createdAt.
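To make the field lists above concrete, here is a minimal Python sketch of how one item from GitHub's repository search response, and one `GET /users/{username}` profile (the enrichment step), could be flattened into these rows. It assumes the actor maps `url` to GitHub's `html_url`, `owner` to `owner.login`, and reports empty strings (GitHub returns `""` for an unset blog, for example) as null. `repo_row` and `user_row` are hypothetical helper names, not the actor's code.

```python
from typing import Any, Optional


def _blank_to_none(value: Any) -> Any:
    # Assumption: empty strings are reported as null, matching the nullable-fields note.
    return None if value == "" else value


def repo_row(item: dict) -> dict:
    """Flatten one item from GitHub's /search/repositories response (hypothetical helper)."""
    license_info: Optional[dict] = item.get("license")
    return {
        "ok": True,
        "fullName": item["full_name"],
        "name": item["name"],
        "owner": item["owner"]["login"],
        "url": item["html_url"],  # assumption: the browser URL, not the API URL
        "description": item.get("description"),
        "stars": item.get("stargazers_count", 0),
        "forks": item.get("forks_count", 0),
        "openIssues": item.get("open_issues_count", 0),
        "language": item.get("language"),
        "topics": item.get("topics", []),
        "license": license_info.get("spdx_id") if license_info else None,
        "homepage": _blank_to_none(item.get("homepage")),
        "defaultBranch": item.get("default_branch"),
        "createdAt": item.get("created_at"),
        "updatedAt": item.get("updated_at"),
        "pushedAt": item.get("pushed_at"),
    }


def user_row(profile: dict) -> dict:
    """Flatten one GET /users/{username} profile (hypothetical helper)."""
    return {
        "ok": True,
        "login": profile["login"],
        "url": profile["html_url"],
        "type": profile.get("type"),  # "User" or "Organization"
        "id": profile["id"],
        "name": _blank_to_none(profile.get("name")),
        "bio": _blank_to_none(profile.get("bio")),
        "company": _blank_to_none(profile.get("company")),
        "location": _blank_to_none(profile.get("location")),
        "blog": _blank_to_none(profile.get("blog")),
        "followers": profile.get("followers", 0),
        "publicRepos": profile.get("public_repos", 0),
        "createdAt": profile.get("created_at"),
    }
```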
Every successful row also carries ok: true. Diagnostic rows (no results, bad input, rate limit, network) carry ok: false plus an errorCode and error message, and are never charged.
Nullable fields: GitHub only returns what a repo/user actually sets, so optional fields are null when absent — e.g. repo description, language, license, homepage; user name, bio, company, location, blog. These nulls are normal and still count as complete rows.
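Because diagnostic rows share the dataset with real results, downstream code should filter on `ok` rather than on whether fields are null. A minimal sketch, assuming rows arrive as plain dicts (for example, read from the run's dataset); `split_rows` is a hypothetical helper name.

```python
from typing import Iterable


def split_rows(rows: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate result rows (ok: true) from diagnostic rows (ok: false)."""
    results: list[dict] = []
    diagnostics: list[dict] = []
    for row in rows:
        # Filter on the ok flag only: a result whose language or license is None
        # is still a complete row, not an error.
        (results if row.get("ok") is True else diagnostics).append(row)
    return results, diagnostics
```

Since diagnostic rows are never charged, `len(results)` is the number of rows that count toward billing.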
Input
| Field | Notes |
|---|---|
| query | GitHub search syntax. Repo examples: language:python stars:>1000 machine learning; topic:cli. User example: location:berlin followers:>500. |
| type | repositories (default) or users. |
| sort | stars (default), forks, updated, or best-match. Applies to repository searches. |
| maxItems | Default 100, max 1000 (GitHub's Search API cap). |
| githubToken | Optional but recommended: raises limits to 5000 req/hr and 30 searches/min. No scopes needed for public data. Kept private. |
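If you generate inputs programmatically, a small client-side check can catch mistakes before a run starts. This Python sketch assumes the field names and allowed values in the table above are exhaustive; `build_input` and its clamp-to-1000 behaviour are choices made here for illustration, not the actor's documented validation.

```python
from typing import Optional

VALID_TYPES = {"repositories", "users"}
VALID_SORTS = {"stars", "forks", "updated", "best-match"}
SEARCH_API_CAP = 1000  # GitHub's Search API returns at most 1000 results per query


def build_input(
    query: str,
    search_type: str = "repositories",
    sort: str = "stars",
    max_items: int = 100,
    github_token: Optional[str] = None,
) -> dict:
    """Assemble and sanity-check a run input mirroring the Input table (hypothetical helper)."""
    query = query.strip()
    if not query:
        # The actor would answer with a BAD_INPUT row; failing early saves a run.
        raise ValueError("query must be a non-empty GitHub search string")
    if search_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    run_input = {
        "query": query,
        "type": search_type,
        "sort": sort,
        # Clamp rather than reject: anything above the cap cannot be returned anyway.
        "maxItems": min(max_items, SEARCH_API_CAP),
    }
    if github_token:
        run_input["githubToken"] = github_token
    return run_input
```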
Output
One dataset row per repository or user, deduplicated by fullName / login. Queries with no matches return a single NO_RESULTS row and are not charged. An empty/missing query returns a single BAD_INPUT row (also not charged) instead of failing the run.
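The actor deduplicates within a single run. If you merge rows from several runs, for example the split queries described under Notes, you would need to repeat that step yourself. A minimal sketch, assuming exact string matches on fullName / login; `dedupe` is a hypothetical helper name.

```python
from typing import Iterable


def dedupe(rows: Iterable[dict], search_type: str = "repositories") -> list[dict]:
    """Keep the first row per fullName (repos) or login (users), preserving order."""
    key = "fullName" if search_type == "repositories" else "login"
    seen: set[str] = set()
    unique: list[dict] = []
    for row in rows:
        ident = row.get(key)
        if ident is None:
            # Diagnostic rows carry no fullName/login; keep them untouched.
            unique.append(row)
            continue
        if ident not in seen:
            seen.add(ident)
            unique.append(row)
    return unique
```

GitHub owner and repository names are case-insensitive, so lowercasing the key before comparing may be safer if your sources disagree on case.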
Rate limits
GitHub allows 60 requests/hr unauthenticated (10 searches/min) and 5000 requests/hr with a token (30 searches/min). The Search API also caps results at 1000 per query. If you hit a rate limit, the actor does not fail silently: it returns a RATE_LIMITED row with the reset time and a suggestion to add a githubToken. User searches are the most exposed, because the profile-enrichment step makes one extra request per result, so a tokenless user search can exhaust the 60/hr budget. If that happens mid-enrichment, the actor still surfaces a RATE_LIMITED row rather than silently returning zero rows.
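The arithmetic behind the user-search warning can be sketched as follows. It assumes the actor fetches search results 100 per page (GitHub's maximum page size), and that profile lookups draw on the hourly core limit while search pages draw on the separate per-minute search limit, which is how GitHub's REST API accounts for them. The actor's actual paging may differ, and `budget_check` is a hypothetical helper.

```python
import math

PER_PAGE = 100  # assumption: search results fetched 100 per page


def budget_check(max_items: int, search_type: str, has_token: bool) -> dict:
    """Estimate whether a job fits in a single rate-limit window."""
    search_calls = math.ceil(max_items / PER_PAGE)
    # User searches add one profile request per result (the enrichment step).
    core_calls = max_items if search_type == "users" else 0
    search_limit = 30 if has_token else 10  # searches per minute
    core_limit = 5000 if has_token else 60  # requests per hour
    return {
        "searchCalls": search_calls,
        "coreCalls": core_calls,
        "fits": search_calls <= search_limit and core_calls <= core_limit,
    }
```

Under these assumptions, a 100-item user search without a token needs 1 search request plus 100 profile requests, which exceeds the 60/hr budget; with a token it fits easily.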
Troubleshooting
- Got a RATE_LIMITED row: add a free githubToken (60 → 5000 req/hr) and re-run. User searches benefit most, because each result triggers a profile-detail request. The row includes rateLimitResetsAt so you know when to retry.
- Got a NO_RESULTS row: the query ran but matched nothing; broaden it or check GitHub search qualifiers.
- Got a BAD_INPUT row: query was empty; provide a search string.
- These diagnostic rows have ok: false and are never charged.
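The troubleshooting steps can be automated when you process results in code. This sketch turns a diagnostic row into a next step. The format of rateLimitResetsAt is not documented here, so the sketch accepts either an ISO 8601 string or Unix epoch seconds (an assumption). The errorCode for network failures is not named above, so unknown codes fall back to the row's error message. `advise` and `seconds_until_reset` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Optional, Union

# Suggested fixes, paraphrasing the troubleshooting list above.
ACTIONS = {
    "RATE_LIMITED": "add a githubToken and re-run after the reset time",
    "NO_RESULTS": "broaden the query or check GitHub search qualifiers",
    "BAD_INPUT": "provide a non-empty query",
}


def seconds_until_reset(value: Union[str, int, float], now: Optional[datetime] = None) -> float:
    """Seconds until rateLimitResetsAt; accepts ISO 8601 text or epoch seconds (format assumed)."""
    now = now or datetime.now(timezone.utc)
    if isinstance(value, (int, float)):
        reset = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        reset = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if reset.tzinfo is None:
            reset = reset.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return max(0.0, (reset - now).total_seconds())


def advise(row: dict, now: Optional[datetime] = None) -> str:
    """Turn a diagnostic row (ok: false) into a suggested next step."""
    code = row.get("errorCode")
    message = ACTIONS.get(code, row.get("error") or "see the row's error message")
    reset_at = row.get("rateLimitResetsAt")
    if code == "RATE_LIMITED" and reset_at is not None:
        message += f" (resets in ~{seconds_until_reset(reset_at, now):.0f}s)"
    return message
```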
Example
```json
{
  "query": "language:python stars:>5000 web framework",
  "type": "repositories",
  "sort": "stars",
  "maxItems": 100
}
```
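To run this input from code rather than the Apify console, here is a minimal sketch using the official apify-client Python package. ACTOR_ID is a placeholder; copy the real ID from the actor's page. The sketch needs an Apify API token, which is separate from the optional githubToken.

```python
from typing import Optional

# The example input above, as a Python dict.
EXAMPLE_INPUT = {
    "query": "language:python stars:>5000 web framework",
    "type": "repositories",
    "sort": "stars",
    "maxItems": 100,
}

# Placeholder: replace with this actor's ID as shown on its Apify page.
ACTOR_ID = "<username>/<actor-name>"


def run_actor(apify_token: str, run_input: dict) -> list:
    """Start a run, wait for it to finish, and return all dataset rows."""
    # Imported here so the rest of the file works without the package installed
    # (pip install apify-client).
    from apify_client import ApifyClient

    client = ApifyClient(apify_token)
    run: Optional[dict] = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not start or finish")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_actor(token, EXAMPLE_INPUT)` blocks until the run finishes. Diagnostic rows come back in the same list, so check each row's ok flag before using it.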
Notes
To pull more than 1000 results, split the job by a qualifier — e.g. star bands (stars:1000..5000, stars:5001..20000) or creation date windows (created:2022-01-01..2022-12-31).
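A minimal sketch of generating those split queries, assuming you start one actor run per query and merge the results afterwards. The helper names are hypothetical.

```python
from typing import Sequence


def star_band_queries(base_query: str, edges: Sequence[int]) -> list[str]:
    """[1000, 5000, 20000] -> stars:1000..5000 and stars:5001..20000 (non-overlapping bands)."""
    queries = []
    low = edges[0]
    for high in edges[1:]:
        queries.append(f"{base_query} stars:{low}..{high}")
        low = high + 1  # next band starts just above this one, so the ranges never overlap
    return queries


def yearly_created_queries(base_query: str, first_year: int, last_year: int) -> list[str]:
    """One created:YYYY-01-01..YYYY-12-31 window per year, inclusive."""
    return [
        f"{base_query} created:{year}-01-01..{year}-12-31"
        for year in range(first_year, last_year + 1)
    ]
```

Each generated query is still subject to the 1000-result cap, so a band that returns a full 1000 rows probably needs to be split further. Star counts can also change between runs, so a repository may land in two bands; deduplicate the merged rows by fullName.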