GitHub Repositories Search Scraper
Pricing
from $0.12 / 1,000 repository results
Search public GitHub repositories by query, language, topics, stars, forks, and activity. Export repo URLs, owners, topics, licenses, timestamps, and optional owner details.
Developer
Hanna Nosova
Maintained by Community
Search and export public GitHub repositories by keyword, language, topic, stars, forks, and recent activity.
Use this actor when you need a repeatable CSV, JSON, Excel, or API export of public repositories for developer relations, market research, open-source intelligence, investment research, competitive tracking, or technical lead generation.
What does GitHub Repositories Search Scraper do?
GitHub Repositories Search Scraper turns GitHub repository searches into structured datasets.
It accepts one or more search queries and returns public repository records with useful discovery and scoring fields.
You can search by product name, company name, technology, framework, keyword, or topic.
You can optionally filter by programming language.
You can sort by best match, stars, forks, or recently updated repositories.
You can export results from Apify as JSON, CSV, Excel, XML, RSS, or through the Dataset API.
Who is it for?
This actor is useful for teams that need public open-source repository data in a clean table.
- 🧑‍💻 Developer-relations teams can find projects that mention a platform, SDK, or integration.
- 📈 Venture and market researchers can map active open-source ecosystems.
- 🧲 Lead-generation teams can discover companies, maintainers, and projects by technology.
- 📰 Newsletter writers can build lists of repositories around a trend or topic.
- 🛡️ Open-source intelligence teams can monitor public repositories related to vendors, packages, or keywords.
- 🏢 Competitive-intelligence teams can track new projects around a competitor or market category.
Why use this actor?
The actor gives you a repeatable workflow instead of copying results from a browser.
It saves repository URLs, owners, stars, forks, topics, licenses, timestamps, and other fields in one dataset.
It also supports multiple queries in a single run.
That makes it easier to compare markets, keywords, languages, and topics.
Typical use cases
- Find public repositories mentioning your company, product, or API.
- Build a list of popular repositories in a programming language.
- Monitor repositories around a topic such as `web-scraping`, `agent`, or `mcp`.
- Export GitHub repository URLs for enrichment in another system.
- Discover high-star projects in a technical category.
- Track recently updated repositories for trend research.
- Create market maps for open-source ecosystems.
GitHub repository search examples
Try focused searches such as:
- `language:python AI agents` to find Python AI-agent repositories.
- `topic:web-scraping` to export public scraping-related repositories.
- Competitor or product names to monitor open-source references.
- SDK or package names with `sort:updated` to find recently active projects.
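The qualifiers in these examples can also be assembled in code before a run. The sketch below is a hypothetical helper (`build_query` is not part of the actor) that joins free-text keywords with GitHub's `language:` and `topic:` search qualifiers. It assumes the actor passes qualifiers inside a query string through to GitHub search, as the examples above suggest.

```python
def build_query(keywords, language=None, topics=()):
    """Join free-text keywords with GitHub search qualifiers.

    Hypothetical helper for illustration; not part of the actor.
    """
    keywords = keywords.strip()
    parts = [keywords] if keywords else []
    if language:
        parts.append(f"language:{language}")
    # Each topic gets its own topic: qualifier.
    parts.extend(f"topic:{topic}" for topic in topics)
    return " ".join(parts)


print(build_query("AI agents", language="python"))
print(build_query("", topics=["web-scraping"]))
```

The resulting strings could then be placed in the `queries` input list.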
What data can you extract?
The actor returns one dataset row per repository.
| Field | Description |
|---|---|
| `query` | The input query that produced the repository |
| `fullName` | Owner and repository name |
| `htmlUrl` | Public GitHub repository URL |
| `description` | Repository description |
| `ownerLogin` | Repository owner login |
| `ownerType` | Owner type, such as User or Organization |
| `stars` | Stargazer count |
| `forks` | Fork count |
| `watchers` | Watcher count |
| `openIssues` | Open issue count |
| `language` | Primary language |
| `topics` | Public GitHub topics |
| `licenseName` | Repository license name when available |
| `createdAt` | Repository creation date |
| `updatedAt` | Repository update date |
| `pushedAt` | Last push date |
| `archived` | Whether the repository is archived |
| `fork` | Whether the repository is a fork |
| `homepage` | Public homepage URL when set |
| `cloneUrl` | HTTPS clone URL |
Optional owner details
Turn on includeOwnerDetails when you need more public profile data about repository owners.
This can return fields such as owner name, company, blog, location, public repo count, followers, following, and public profile timestamps.
Owner details use additional GitHub requests.
For large enriched runs, add a GitHub token to increase API limits.
How much does it cost to scrape GitHub repositories?
The actor uses pay-per-event pricing.
You pay a small start fee for each run and a per-repository fee for each saved result.
Current pricing is $0.005 per run plus $0.0002 per saved repository on the BRONZE tier, with tier discounts on higher Apify plans.
Exact live pricing is shown on the Apify actor page before you start a run.
Use a low maxResults value for your first run, review the dataset, and then scale up.
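As a rough worked example of the pricing above, the sketch below estimates the cost of a run from the start fee and the BRONZE per-repository price. The function name `estimate_cost` is hypothetical, and the figures are the ones quoted in this section. Live pricing on the actor page takes precedence.

```python
from decimal import Decimal

START_FEE = Decimal("0.005")         # per run, as quoted above
PER_REPO_BRONZE = Decimal("0.0002")  # per saved repository, BRONZE tier


def estimate_cost(saved_results, runs=1):
    """Estimated USD cost at BRONZE pricing; other tiers will differ."""
    return START_FEE * runs + PER_REPO_BRONZE * saved_results


for n in (25, 1000):
    print(n, "results:", estimate_cost(n), "USD")
```

Under these assumptions a 25-result test run costs about $0.01 and a 1,000-result run about $0.205.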
Input configuration
The main input fields are:
- `queries`: required list of search terms.
- `language`: optional programming language filter.
- `topics`: optional list of GitHub topics to require.
- `sort`: best match, stars, forks, or updated.
- `order`: descending or ascending.
- `maxResults`: maximum repositories to save.
- `includeOwnerDetails`: fetch extra owner profile fields.
- `githubToken`: optional secret token for higher GitHub API limits.
Example input
{
  "queries": ["apify", "web scraping"],
  "language": "javascript",
  "topics": ["automation"],
  "sort": "stars",
  "order": "desc",
  "maxResults": 50,
  "includeOwnerDetails": false
}
Example output
{
  "query": "apify",
  "name": "API-mega-list",
  "fullName": "cporter202/API-mega-list",
  "htmlUrl": "https://github.com/cporter202/API-mega-list",
  "description": "A public API collection and developer resource list.",
  "ownerLogin": "cporter202",
  "ownerType": "User",
  "stars": 6733,
  "forks": 1294,
  "watchers": 6733,
  "openIssues": 15,
  "language": "JavaScript",
  "topics": ["api", "automation", "web-scraping"],
  "licenseName": null,
  "updatedAt": "2026-06-22T01:14:16Z",
  "pushedAt": "2026-01-27T18:40:57Z",
  "archived": false,
  "fork": false,
  "homepage": null,
  "cloneUrl": "https://github.com/cporter202/API-mega-list.git"
}
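Rows in this shape are easy to post-process. The sketch below shows one possible cleanup step that uses the `archived`, `fork`, and `stars` fields to keep only active original repositories and rank them. The helper name `active_originals` and every sample row except the first are made up for illustration.

```python
def active_originals(rows):
    """Drop archived repositories and forks, then rank the rest by stars."""
    kept = [r for r in rows if not r.get("archived") and not r.get("fork")]
    return sorted(kept, key=lambda r: r.get("stars") or 0, reverse=True)


# The first row reuses values from the example output; the others are made up.
rows = [
    {"fullName": "cporter202/API-mega-list", "stars": 6733, "archived": False, "fork": False},
    {"fullName": "example-org/old-tool", "stars": 9000, "archived": True, "fork": False},
    {"fullName": "example-user/some-fork", "stars": 12, "archived": False, "fork": True},
    {"fullName": "example-org/new-tool", "stars": 150, "archived": False, "fork": False},
]

for repo in active_originals(rows):
    print(repo["fullName"], repo["stars"])
```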
How to run it
- Open the actor on Apify.
- Add one or more repository search queries.
- Optionally choose a language and topics.
- Set the maximum number of repositories.
- Start the run.
- Download the dataset or use the API.
Search tips
Use broad queries for discovery.
Use language filters when you need a specific developer ecosystem.
Use topic filters when you need a cleaner topical list.
Sort by stars for popular projects.
Sort by updated for recent activity.
Use multiple queries when comparing markets or competitors.
Start with 25 results before scaling to hundreds.
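When a run uses several queries, the `query` field on each row makes side-by-side comparison straightforward. The following sketch groups rows by query and reports a count and median star value for each. `summarize_by_query` is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict
from statistics import median


def summarize_by_query(rows):
    """Return {query: {"count": n, "median_stars": m}} for dataset rows."""
    stars_by_query = defaultdict(list)
    for row in rows:
        stars_by_query[row["query"]].append(row.get("stars") or 0)
    return {
        query: {"count": len(stars), "median_stars": median(stars)}
        for query, stars in stars_by_query.items()
    }


# Made-up rows standing in for a two-query run.
sample = [
    {"query": "apify", "stars": 10},
    {"query": "apify", "stars": 30},
    {"query": "web scraping", "stars": 5},
]
print(summarize_by_query(sample))
```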
Handling GitHub limits
Small public searches work without a token.
GitHub applies rate limits to anonymous API traffic.
If you run large searches or owner enrichment, provide a GitHub token in the secret input field.
The token is used only for GitHub API authentication during your run.
If a rate limit is reached, the actor reports a clear error and suggests a smaller run or a GitHub token.
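For scripts of your own that call the GitHub REST API directly, the rate-limit state is exposed in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers. The sketch below computes how long to wait from those headers. It does not describe the actor's internal handling, which is not documented here, and `seconds_until_reset` is a hypothetical name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before retrying, or 0 if requests remain."""
    now = time.time() if now is None else now
    # Header names are case-insensitive; normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0.0
    # x-ratelimit-reset is a UTC epoch timestamp in seconds.
    reset_at = int(lowered.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)


# Made-up header values for illustration.
example_headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(example_headers, now=1700000000))
```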
Integrations
You can connect the dataset to downstream tools.
- 📊 Send CSV exports to spreadsheets for market analysis.
- 🧩 Use Make or Zapier to trigger follow-up enrichment.
- 🗃️ Load JSON results into a warehouse or database.
- 📬 Feed repository lists into developer-relations workflows.
- 🔎 Combine repository URLs with other Apify actors for broader research.
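If you prefer to build a spreadsheet-ready file yourself rather than use the platform's CSV export, a minimal sketch with Python's standard `csv` module follows. The helper name `rows_to_csv`, the `;` separator for topics, and the sample row are illustrative choices, not actor behavior.

```python
import csv
import io


def rows_to_csv(rows, fields):
    """Serialise dataset rows to CSV text, keeping only the chosen fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        # Lists such as topics do not fit one CSV cell; join them with ";".
        if isinstance(flat.get("topics"), list):
            flat["topics"] = ";".join(flat["topics"])
        writer.writerow(flat)
    return buffer.getvalue()


# Made-up row for illustration.
sample = [{"fullName": "example-org/tool", "stars": 42, "topics": ["api", "automation"]}]
print(rows_to_csv(sample, ["fullName", "stars", "topics"]))
```

Missing or null fields are written as empty cells, which matches how the dataset leaves unknown values empty.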
API usage
Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('fetch_cat/github-repositories-search-scraper').call({
    queries: ['apify'],
    language: 'javascript',
    maxResults: 25,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
Python
from apify_client import ApifyClient
import os

client = ApifyClient(os.environ['APIFY_TOKEN'])

run = client.actor('fetch_cat/github-repositories-search-scraper').call(run_input={
    'queries': ['apify'],
    'language': 'javascript',
    'maxResults': 25,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
cURL
curl -X POST 'https://api.apify.com/v2/acts/fetch_cat~github-repositories-search-scraper/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"queries":["apify"],"language":"javascript","maxResults":25}'
MCP usage
Use this actor from MCP-compatible tools through Apify MCP Server.
MCP URL:
https://mcp.apify.com/?tools=fetch_cat/github-repositories-search-scraper
Claude Code setup:
claude mcp add apify-github-repositories "https://mcp.apify.com/?tools=fetch_cat/github-repositories-search-scraper"
Claude Desktop JSON config:
{
  "mcpServers": {
    "apify-github-repositories": {
      "url": "https://mcp.apify.com/?tools=fetch_cat/github-repositories-search-scraper"
    }
  }
}
Example prompts:
- "Find 25 JavaScript repositories about Apify and summarize the top owners."
- "Export popular Python repositories about web scraping to a table."
- "Search GitHub repositories for MCP tools and rank them by stars."
Data quality notes
Repository counts and metadata come from public GitHub repository records.
Stars, forks, issues, and timestamps can change over time.
Some repositories do not have a license, homepage, topics, or recent push timestamp.
Owner email is often unavailable because many GitHub users do not publish an email on their profile.
FAQ and troubleshooting
Why did my run return fewer repositories than requested?
The source may have fewer matching public repositories for your exact query, language, and topic filters.
Try removing a topic filter or using a broader query.
Why did I hit a rate limit?
Anonymous GitHub API traffic has stricter limits.
Use a smaller maxResults value or add a GitHub token for higher limits.
Do I need a GitHub token?
Not for small anonymous public searches. For large runs or owner enrichment, provide a GitHub token because GitHub API limits are stricter without authentication.
Why are some fields empty?
Some public repositories do not provide homepage, license, topics, or owner profile details.
The actor keeps those fields empty instead of guessing.
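To see how often fields come back empty in a given dataset, a small audit like the one below can help. It is a sketch under the assumption that empty values appear as `null`, an empty string, or an empty list. `empty_field_counts` is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter


def empty_field_counts(rows):
    """Count, per field, how many rows have no value for it."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            # Treat None, "" and [] as empty; 0 and False are real values.
            if value is None or value == "" or value == []:
                counts[key] += 1
    return dict(counts)


# Made-up rows for illustration.
sample = [
    {"fullName": "example-org/tool", "licenseName": None, "homepage": None, "topics": []},
    {"fullName": "example-user/lib", "licenseName": "MIT License", "homepage": "", "topics": ["api"]},
]
print(empty_field_counts(sample))
```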
Legality and responsible use
This actor extracts publicly available GitHub repository data.
Use the data responsibly and follow GitHub's terms, Apify's terms, and applicable laws.
Do not use exported data for spam, harassment, credential collection, or other abusive activity.
If you use owner details for outreach, comply with privacy and anti-spam rules in your jurisdiction.
Related scrapers
Explore related actors from the same Apify account:
- https://apify.com/fetch_cat/website-content-crawler-lite
- https://apify.com/fetch_cat/web-page-to-markdown-extractor
- https://apify.com/fetch_cat/google-news-scraper
- https://apify.com/fetch_cat/reddit-scraper
Changelog
- 0.1: Initial version for public GitHub repository search exports.
Need help?
If a query behaves differently than expected, start with a small maxResults value and inspect the dataset.
Then adjust language, topic, sort, and order settings.
For larger enriched runs, use a GitHub token and keep your first test small.