GitHub Repositories Search Scraper

Pricing: from $0.12 / 1,000 repository results

Search public GitHub repositories by query, language, topics, stars, forks, and activity. Export repo URLs, owners, topics, licenses, timestamps, and optional owner details.

Rating: 0.0 (0 ratings)

Developer: Hanna Nosova

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 20 hours ago


Search and export public GitHub repositories by keyword, language, topic, stars, forks, and recent activity.

Use this actor when you need a repeatable CSV, JSON, Excel, or API export of public repositories for developer relations, market research, open-source intelligence, investment research, competitive tracking, or technical lead generation.

What does GitHub Repositories Search Scraper do?

GitHub Repositories Search Scraper turns GitHub repository searches into structured datasets.

It accepts one or more search queries and returns public repository records with fields for discovery and ranking, such as stars, forks, topics, and activity timestamps.

You can search by product name, company name, technology, framework, keyword, or topic.

You can optionally filter by programming language.

You can sort by best match, stars, forks, or recently updated repositories.

You can export results from Apify as JSON, CSV, Excel, XML, RSS, or through the Dataset API.

Who is it for?

This actor is useful for teams that need public open-source repository data in a clean table.

  • 🧑‍💻 Developer-relations teams can find projects that mention a platform, SDK, or integration.
  • 📈 Venture and market researchers can map active open-source ecosystems.
  • 🧲 Lead-generation teams can discover companies, maintainers, and projects by technology.
  • 📰 Newsletter writers can build lists of repositories around a trend or topic.
  • 🛡️ Open-source intelligence teams can monitor public repositories related to vendors, packages, or keywords.
  • 🏢 Competitive-intelligence teams can track new projects around a competitor or market category.

Why use this actor?

The actor gives you a repeatable workflow instead of copying results from a browser.

It saves repository URLs, owners, stars, forks, topics, licenses, timestamps, and other fields in one dataset.

It also supports multiple queries in a single run.

That makes it easier to compare markets, keywords, languages, and topics.
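To compare queries after a multi-query run, you can group the exported rows by their query field. The following is a minimal Python sketch, assuming the dataset items have already been downloaded as a list of dicts containing the query and stars output fields; summarize_by_query is a hypothetical helper name, not part of the actor, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import median


def summarize_by_query(items):
    """Group rows by 'query' and compute simple star statistics per query.

    Hypothetical helper; rows are assumed to be downloaded dataset items.
    """
    stars_by_query = defaultdict(list)
    for item in items:
        # Missing or null star counts are treated as 0.
        stars_by_query[item["query"]].append(item.get("stars") or 0)
    return {
        query: {
            "repositories": len(stars),
            "totalStars": sum(stars),
            "medianStars": median(stars),
        }
        for query, stars in stars_by_query.items()
    }


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"query": "apify", "fullName": "a/one", "stars": 120},
        {"query": "apify", "fullName": "b/two", "stars": 30},
        {"query": "web scraping", "fullName": "c/three", "stars": 900},
    ]
    for query, stats in summarize_by_query(sample).items():
        print(query, stats)
```

Median stars is often a fairer comparison than totals, because one very popular repository can dominate the sum for a query.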

Typical use cases

  • Find public repositories mentioning your company, product, or API.
  • Build a list of popular repositories in a programming language.
  • Monitor repositories around a topic such as web-scraping, agent, or mcp.
  • Export GitHub repository URLs for enrichment in another system.
  • Discover high-star projects in a technical category.
  • Track recently updated repositories for trend research.
  • Create market maps for open-source ecosystems.

GitHub repository search examples

Try focused searches such as:

  • "language:python AI agents" to find Python AI-agent repositories.
  • "topic:web-scraping" to export public scraping-related repositories.
  • Competitor or product names to monitor open-source references.
  • SDK or package names with sort set to updated to find recently active projects.
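The first two examples use GitHub's search qualifiers (language:, topic:), which can be combined with free text and with others such as stars:. A minimal Python sketch of composing one query string; build_search_query is a hypothetical helper, and whether the actor forwards every qualifier unchanged inside queries is an assumption worth checking with a small run.

```python
from urllib.parse import urlencode


def build_search_query(terms, language=None, topics=(), min_stars=None):
    """Join free-text terms and GitHub search qualifiers into a single q string.

    Hypothetical helper for illustration; the qualifier syntax is GitHub's own.
    """
    parts = [terms.strip()] if terms.strip() else []
    if language:
        parts.append(f"language:{language}")
    for topic in topics:
        parts.append(f"topic:{topic}")
    if min_stars is not None:
        parts.append(f"stars:>={min_stars}")
    return " ".join(parts)


q = build_search_query("AI agents", language="python", topics=["llm"], min_stars=100)
print(q)  # AI agents language:python topic:llm stars:>=100
# URL-encoded form for GitHub's GET /search/repositories endpoint.
print("https://api.github.com/search/repositories?" + urlencode({"q": q, "sort": "updated"}))
```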

What data can you extract?

The actor returns one dataset row per repository.

Field | Description
--- | ---
query | The input query that produced the repository
name | Repository name
fullName | Owner and repository name, in owner/repo form
htmlUrl | Public GitHub repository URL
description | Repository description
ownerLogin | Repository owner login
ownerType | Owner type, such as User or Organization
stars | Stargazer count
forks | Fork count
watchers | Watcher count (GitHub's API reports this with the same value as stars)
openIssues | Open issue count
language | Primary language
topics | Public GitHub topics
licenseName | Repository license name, when available
createdAt | Repository creation date
updatedAt | Repository update date
pushedAt | Last push date
archived | Whether the repository is archived
fork | Whether the repository is a fork
homepage | Public homepage URL, when set
cloneUrl | HTTPS clone URL

Optional owner details

Turn on includeOwnerDetails when you need more public profile data about repository owners.

This can return fields such as owner name, company, blog, location, public repo count, followers, following, and public profile timestamps.

Owner details use additional GitHub requests.

For large enriched runs, add a GitHub token to increase API limits.
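GitHub's REST API exposes public profile data at GET https://api.github.com/users/{username}. To illustrate what one owner lookup involves, here is a minimal standard-library Python sketch. It assumes the actor's owner fields correspond to the REST fields name, company, blog, location, public_repos, followers, following, created_at, and updated_at; this page does not confirm the actor's exact output keys, and both helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical helpers; field names are GitHub REST API names,
# not necessarily the actor's output keys.
OWNER_FIELDS = [
    "name", "company", "blog", "location", "public_repos",
    "followers", "following", "created_at", "updated_at",
]


def fetch_owner_profile(login, token=None):
    """Fetch one public profile from GET https://api.github.com/users/{login}."""
    request = urllib.request.Request(
        f"https://api.github.com/users/{login}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # An authenticated request gets the higher REST API rate limit.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def pick_owner_fields(profile):
    """Keep only the listed profile fields; missing values become None."""
    return {key: profile.get(key) for key in OWNER_FIELDS}
```

For example, pick_owner_fields(fetch_owner_profile("octocat", token=os.environ.get("GITHUB_TOKEN"))) performs one request (after import os). Each lookup counts against GitHub's REST API rate limit, which is why large enriched runs benefit from a token.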

How much does it cost to scrape GitHub repositories?

The actor uses pay-per-event pricing.

You pay a small start fee for each run and a per-repository fee for each saved result.

Current pricing is $0.005 per run start plus $0.0002 per saved repository on the Bronze tier, with lower per-repository prices on higher Apify plans.

Exact live pricing is shown on the Apify actor page before you start a run.

Use a low maxResults value for your first run, review the dataset, and then scale up.
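As a worked example of the Bronze-tier arithmetic quoted above (start fee per run plus a fee per saved repository), a minimal Python sketch; estimate_cost is a hypothetical helper, and the live price on the actor page takes precedence over these defaults.

```python
def estimate_cost(repositories, runs=1, start_fee=0.005, per_repository=0.0002):
    """Estimate USD cost: runs * start fee + repositories * per-repository fee.

    Defaults are the Bronze-tier prices quoted in this README (hypothetical helper).
    """
    return runs * start_fee + repositories * per_repository


print(round(estimate_cost(25), 4))    # 0.01
print(round(estimate_cost(1000), 4))  # 0.205
```

So a 25-result test run costs about one cent, and 1,000 repositories in a single run cost about $0.205 at Bronze prices.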

Input configuration

The main input fields are:

  • queries — required list of search terms.
  • language — optional programming language filter.
  • topics — optional list of GitHub topics to require.
  • sort — best match, stars, forks, or updated.
  • order — descending or ascending.
  • maxResults — maximum repositories to save.
  • includeOwnerDetails — fetch extra owner profile fields.
  • githubToken — optional secret token for higher GitHub API limits.
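If you generate inputs from a script, a small builder can catch mistakes before a run starts. This is a minimal Python sketch using the field names listed above; build_run_input is hypothetical, "asc" is assumed by analogy with the "desc" value in the example input, and best match is represented by omitting sort because its exact enum string is not documented here. githubToken is deliberately left out so the secret is not hard-coded.

```python
ALLOWED_SORTS = {"stars", "forks", "updated"}  # None = keep the actor's default (best match)
ALLOWED_ORDERS = {"desc", "asc"}  # "asc" assumed by analogy with "desc"


def build_run_input(queries, language=None, topics=None, sort=None, order="desc",
                    max_results=25, include_owner_details=False):
    """Assemble and sanity-check an actor input dict (hypothetical helper)."""
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("queries must contain at least one non-empty search term")
    if sort is not None and sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")
    if order not in ALLOWED_ORDERS:
        raise ValueError(f"unsupported order: {order!r}")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    run_input = {
        "queries": cleaned,
        "order": order,
        "maxResults": max_results,
        "includeOwnerDetails": include_owner_details,
    }
    # Optional fields are only sent when set.
    if language:
        run_input["language"] = language
    if topics:
        run_input["topics"] = list(topics)
    if sort:
        run_input["sort"] = sort
    return run_input
```

Calling build_run_input(["apify", "web scraping"], language="javascript", topics=["automation"], sort="stars", max_results=50) produces the same dict as the example input.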

Example input

{
    "queries": ["apify", "web scraping"],
    "language": "javascript",
    "topics": ["automation"],
    "sort": "stars",
    "order": "desc",
    "maxResults": 50,
    "includeOwnerDetails": false
}

Example output

{
    "query": "apify",
    "name": "API-mega-list",
    "fullName": "cporter202/API-mega-list",
    "htmlUrl": "https://github.com/cporter202/API-mega-list",
    "description": "A public API collection and developer resource list.",
    "ownerLogin": "cporter202",
    "ownerType": "User",
    "stars": 6733,
    "forks": 1294,
    "watchers": 6733,
    "openIssues": 15,
    "language": "JavaScript",
    "topics": ["api", "automation", "web-scraping"],
    "licenseName": null,
    "updatedAt": "2026-06-22T01:14:16Z",
    "pushedAt": "2026-01-27T18:40:57Z",
    "archived": false,
    "fork": false,
    "homepage": null,
    "cloneUrl": "https://github.com/cporter202/API-mega-list.git"
}
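Rows shaped like the example above can be ranked with ordinary list processing. A minimal Python sketch, assuming the items are already downloaded; top_repositories and top_owners are hypothetical helpers that drop forks and archived repositories, sort by stars, and count repositories per owner.

```python
from collections import Counter


def top_repositories(items, limit=10, skip_forks=True, skip_archived=True):
    """Return the most-starred rows, optionally excluding forks and archived repos."""
    kept = [
        item for item in items
        if not (skip_forks and item.get("fork"))
        and not (skip_archived and item.get("archived"))
    ]
    # Sorts the filtered copy; the input list keeps its original order.
    kept.sort(key=lambda item: item.get("stars") or 0, reverse=True)
    return kept[:limit]


def top_owners(items, limit=5):
    """Count repositories per ownerLogin and return the most frequent owners."""
    return Counter(item["ownerLogin"] for item in items).most_common(limit)
```

Filtering forks first keeps copies of a popular project from crowding the top of the list.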

How to run it

  1. Open the actor on Apify.
  2. Add one or more repository search queries.
  3. Optionally choose a language and topics.
  4. Set the maximum number of repositories.
  5. Start the run.
  6. Download the dataset or use the API.

Search tips

Use broad queries for discovery.

Use language filters when you need a specific developer ecosystem.

Use topic filters when you need a cleaner topical list.

Sort by stars for popular projects.

Sort by updated for recent activity.

Use multiple queries when comparing markets or competitors.

Start with 25 results before scaling to hundreds.

Handling GitHub limits

Small public searches work without a token.

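GitHub applies stricter rate limits to anonymous API traffic. Its documentation lists 60 REST API requests per hour without authentication versus 5,000 with a token, and separate per-minute limits for search (10 requests per minute without a token, 30 with one).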

If you run large searches or owner enrichment, provide a GitHub token in the secret input field.

The token is used only for GitHub API authentication during your run.

If a rate limit is reached, the actor reports a clear error and suggests retrying with a smaller maxResults value or a GitHub token.
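GitHub reports the remaining quota in the x-ratelimit-remaining and x-ratelimit-reset response headers (the reset value is a Unix timestamp in seconds) and may send retry-after for secondary limits. If you call the GitHub API yourself, for example when enriching exported URLs, a minimal Python sketch for deciding how long to wait could look like this. It is not the actor's internal logic, and seconds_until_reset is a hypothetical helper.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before retrying, based on GitHub headers.

    Returns 0 when requests remain. Header names are matched case-insensitively.
    """
    lower = {key.lower(): value for key, value in headers.items()}
    # Secondary rate limits: GitHub may tell you exactly how long to wait.
    if "retry-after" in lower:
        return max(0, int(lower["retry-after"]))
    remaining = int(lower.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0
    # Primary limit exhausted: wait until the reset timestamp.
    reset_at = int(lower.get("x-ratelimit-reset", "0"))
    now = time.time() if now is None else now
    return max(0, int(reset_at - now))
```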

Integrations

You can connect the dataset to downstream tools.

  • 📊 Send CSV exports to spreadsheets for market analysis.
  • 🧩 Use Make or Zapier to trigger follow-up enrichment.
  • 🗃️ Load JSON results into a warehouse or database.
  • 📬 Feed repository lists into developer-relations workflows.
  • 🔎 Combine repository URLs with other Apify actors for broader research.
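For the database route, a minimal sketch using Python's built-in sqlite3 module, assuming the JSON items were already downloaded with one of the API clients. The table layout and load_repositories are hypothetical; topics are stored as JSON text because SQLite has no list type.

```python
import json
import sqlite3


def load_repositories(connection, items):
    """Create a repositories table if needed and upsert rows keyed by fullName.

    Returns the number of input rows processed.
    """
    connection.execute(
        """CREATE TABLE IF NOT EXISTS repositories (
               full_name TEXT PRIMARY KEY,
               html_url TEXT,
               query TEXT,
               stars INTEGER,
               forks INTEGER,
               language TEXT,
               topics TEXT,
               updated_at TEXT
           )"""
    )
    rows = [
        (
            item["fullName"],
            item.get("htmlUrl"),
            item.get("query"),
            item.get("stars"),
            item.get("forks"),
            item.get("language"),
            json.dumps(item.get("topics") or []),  # topic list stored as JSON text
            item.get("updatedAt"),
        )
        for item in items
    ]
    # A repository matched by several queries keeps only the last row written.
    connection.executemany(
        "INSERT OR REPLACE INTO repositories VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    connection.commit()
    return len(rows)
```

Keying on fullName deduplicates repositories across queries; use a composite key of query and fullName instead if you need to keep every match.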

API usage

Node.js

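// Run as an ES module (for example, save as run.mjs) so top-level await works.
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
// Start the actor and wait for the run to finish.
const run = await client.actor('fetch_cat/github-repositories-search-scraper').call({
    queries: ['apify'],
    language: 'javascript',
    maxResults: 25,
});
// Read the saved repositories from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);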

Python

import os
from apify_client import ApifyClient
client = ApifyClient(os.environ['APIFY_TOKEN'])
# Start the actor and wait for the run to finish.
run = client.actor('fetch_cat/github-repositories-search-scraper').call(run_input={
    'queries': ['apify'],
    'language': 'javascript',
    'maxResults': 25,
})
# Read the saved repositories from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

cURL

curl -X POST 'https://api.apify.com/v2/acts/fetch_cat~github-repositories-search-scraper/runs?token=YOUR_APIFY_TOKEN' \
-H 'Content-Type: application/json' \
-d '{"queries":["apify"],"language":"javascript","maxResults":25}'

MCP usage

Use this actor from MCP-compatible tools through Apify MCP Server.

MCP URL:

https://mcp.apify.com/?tools=fetch_cat/github-repositories-search-scraper

Claude Code setup:

$ claude mcp add --transport http apify-github-repositories "https://mcp.apify.com/?tools=fetch_cat/github-repositories-search-scraper"

Claude Desktop JSON config:

{
    "mcpServers": {
        "apify-github-repositories": {
            "url": "https://mcp.apify.com/?tools=fetch_cat/github-repositories-search-scraper"
        }
    }
}

Example prompts:

  • "Find 25 JavaScript repositories about Apify and summarize the top owners."
  • "Export popular Python repositories about web scraping to a table."
  • "Search GitHub repositories for MCP tools and rank them by stars."

Data quality notes

Repository counts and metadata come from public GitHub repository records.

Stars, forks, issues, and timestamps can change over time.

Some repositories do not have a license, homepage, topics, or recent push timestamp.

Owner email is often unavailable because many GitHub users do not publish an email on their profile.

FAQ and troubleshooting

Why did my run return fewer repositories than requested?

The source may have fewer matching public repositories for your exact query, language, and topic filters. GitHub's search also returns at most 1,000 results per search query.

Try removing a topic filter or using a broader query.

Why did I hit a rate limit?

Anonymous GitHub API traffic has stricter limits.

Use a smaller maxResults value or add a GitHub token for higher limits.

Do I need a GitHub token?

Not for small public searches. For large runs or owner enrichment, provide a GitHub token, because GitHub's API limits are stricter without authentication.

Why are some fields empty?

Some public repositories do not provide homepage, license, topics, or owner profile details.

The actor keeps those fields empty instead of guessing.

Legality and responsible use

This actor extracts publicly available GitHub repository data.

Use the data responsibly and follow GitHub's terms, Apify's terms, and applicable laws.

Do not use exported data for spam, harassment, credential collection, or other abusive activity.

If you use owner details for outreach, comply with privacy and anti-spam rules in your jurisdiction.


Changelog

  • 0.1 — Initial version for public GitHub repository search exports.

Need help?

If a query behaves differently than expected, start with a small maxResults value and inspect the dataset.

Then adjust language, topic, sort, and order settings.

For larger enriched runs, use a GitHub token and keep your first test small.