# Github Profile Scraper (`vulnv/github-profile-scraper`) Actor

Scrapes GitHub user profiles including bio, repositories, followers, contributions, and more. Accepts a list of usernames and extracts comprehensive profile data.

- **URL**: https://apify.com/vulnv/github-profile-scraper.md
- **Developed by:** [VulnV](https://apify.com/vulnv) (community)
- **Categories:** Lead generation, Other
- **Stats:** 34 total users, 1 monthly user, 100.0% of runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

$20.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for Apify platform usage, which gets cheaper on higher Apify subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## 🚀 GitHub Profile Scraper ⚡ Extract Developer Profiles at Scale

### Overview
The **GitHub Profile Scraper** is a powerful Apify Actor designed to extract comprehensive data from GitHub user profiles efficiently. Perfect for recruitment, developer research, competitive analysis, or building developer databases — this scraper provides detailed insights into GitHub users' professional profiles, repositories, and contributions.

✅ Bulk username processing | ✅ Comprehensive profile data | ✅ Email extraction (when public) | ✅ Repository analysis | ✅ Contribution tracking

---

#### **Complete Profile Data Extraction**
- **Basic Information** — Name, username, bio, location, website
- **Contact Details** — Email addresses (when publicly visible)
- **Professional Details** — Company, Twitter/X handle
- **Network Statistics** — Followers, following counts
- **Repository Data** — Public repositories count, pinned repositories with details
- **Activity Metrics** — Contribution counts and contribution graph data
- **Social Links** — Website, social media profiles
- **Starred Repositories** — List of starred projects (when accessible)

#### **Key Features**
- **Bulk Processing** — Process multiple GitHub usernames in one run
- **Smart Email Detection** — Extracts emails using multiple methods including `itemprop="email"` elements (only for publicly visible emails)
- **Proxy Support** — Built-in Apify proxy integration for reliable scraping
- **Error Handling** — Robust error handling with detailed status reporting
- **Clean JSON Output** — Structured, ready-to-use data format
- **Username Validation** — Automatic username cleaning and validation with GitHub format requirements
- **Format Flexibility** — Accepts various username formats and automatically normalizes them
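The `itemprop="email"` lookup mentioned above can be illustrated with Python's standard-library `html.parser`. This is a minimal sketch, not the Actor's code (which uses BeautifulSoup): it assumes, hypothetically, that a public email appears as a `mailto:` link inside an element carrying `itemprop="email"`, and GitHub's real markup may differ. The class name `EmailExtractor` is illustrative.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Tags that never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EmailExtractor(HTMLParser):
    """Hypothetical helper: collect mailto: addresses nested inside itemprop="email" elements."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []
        self._depth = 0  # > 0 while the parser is inside an itemprop="email" element

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        if tag not in VOID_TAGS:
            if self._depth:
                self._depth += 1
            elif attributes.get("itemprop") == "email":
                self._depth = 1
        href = attributes.get("href") or ""
        if self._depth and href.startswith("mailto:"):
            email = href[len("mailto:"):]
            if email not in self.emails:  # keep each address once
                self.emails.append(email)

    def handle_endtag(self, tag: str) -> None:
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1
```

Usage: `parser = EmailExtractor(); parser.feed(profile_html)`, then read `parser.emails`. A `mailto:` link outside an `itemprop="email"` element is ignored, which matches the "only publicly visible emails" behavior described above.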

---

### 🧾 Input Configuration

Provide an array of GitHub usernames in the Actor input, for example:

```json
{
  "usernames": [
    "johndeveloper",
    "jane-coder", 
    "techexpert123",
    "@another-user",
    "https://github.com/some-developer"
  ],
  "max_threads": 5,
  "proxy_configuration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
````

**Note:** The scraper automatically normalizes different username formats and validates them against GitHub's requirements. Invalid usernames will be skipped with warning messages.
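A minimal sketch of that normalization, assuming the input formats and username rules listed under Input Parameters below. `normalize_username` is a hypothetical helper for pre-cleaning your own lists, not the Actor's implementation.

```python
import re
from typing import Optional

# 1-39 characters: alphanumerics and single hyphens, no leading or trailing hyphen.
# Each hyphen must be followed by an alphanumeric, which also rules out "--".
_USERNAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")


def normalize_username(raw: str) -> Optional[str]:
    """Strip URL and @ prefixes, then return the username or None if it is invalid."""
    value = raw.strip()
    # Accept "https://github.com/user", "github.com/user", and similar forms
    value = re.sub(r"^(?:https?://)?(?:www\.)?github\.com/", "", value, flags=re.IGNORECASE)
    # Drop a leading "@" and anything after the first path segment or query string
    value = value.lstrip("@").split("/")[0].split("?")[0]
    return value if _USERNAME_RE.fullmatch(value) else None
```

For example, `normalize_username("https://github.com/some-developer")` returns `"some-developer"`, while `normalize_username("user--name")` returns `None`.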

#### **Input Parameters**

1. **Usernames** (required):
   - Array of GitHub usernames to scrape
   - **Supported formats**: `username`, `@username`, `github.com/username`, `https://github.com/username`
   - **Username requirements**: Must follow GitHub's username rules (alphanumeric characters and hyphens, no consecutive hyphens, cannot start/end with hyphen, max 39 characters)
   - Invalid usernames will be automatically filtered out with warnings

2. **Max Threads** (optional):
   - Number of concurrent threads for scraping (1-20)
   - Default: 5
   - Higher values speed up processing but increase the chance of rate limiting

3. **Proxy Configuration** (recommended):
   - Enable Apify Proxy to reduce the risk of rate limiting
   - **Recommended for bulk scraping operations**
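To picture what `max_threads` controls, here is a sketch of a bounded thread pool built on `concurrent.futures`. `fetch_profile` is a hypothetical callable standing in for the per-profile scraping step, and the failure item mirrors the `error` status described under Error Handling; the Actor's real internals may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List


def scrape_all(
    usernames: List[str],
    fetch_profile: Callable[[str], Dict[str, Any]],
    max_threads: int = 5,
) -> List[Dict[str, Any]]:
    """Run fetch_profile for every username with at most max_threads calls in flight."""
    results: List[Dict[str, Any]] = []
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(fetch_profile, name): name for name in usernames}
        # Results arrive in completion order, not input order
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as error:
                # Record the failure as an item instead of aborting the whole batch
                results.append({"username": futures[future], "status": "error", "message": str(error)})
    return results
```

Raising `max_threads` lets more requests run at once, which is exactly why it can also trigger rate limiting sooner.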

***

### 📤 Output Format

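Each successfully scraped profile produces one dataset item with structured data such as the following (counts such as `followers` and `stars` are returned as strings):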

```json
{
  "username": "johndeveloper",
  "status": "success",
  "name": "John Developer",
  "bio": "Full-stack developer passionate about open source",
  "location": "San Francisco, CA",
  "email": "john@example.com",
  "website": "https://johndeveloper.dev",
  "twitter": "john_codes",
  "followers": "1234",
  "following": "456",
  "repos_count": "42",
  "contribs": "567 contributions in the last year",
  "pinnedrepos": [
    {
      "name": "awesome-project",
      "url": "https://github.com/johndeveloper/awesome-project",
      "desc": "An innovative web application framework",
      "lang": "JavaScript",
      "stars": "2,500",
      "forks": "320"
    }
  ],
  "repos": [
    {
      "url": "https://github.com/johndeveloper/web-framework",
      "name": "web-framework",
      "desc": "Modern web development framework",
      "stars": "1850",
      "forks": "210",
      "languages": [
        {"lang": "JavaScript", "percent": "78.2%"},
        {"lang": "TypeScript", "percent": "18.5%"}
      ]
    }
  ],
  "starred_repos_list": [
    {
      "url": "https://github.com/example-org/popular-tool",
      "name": "popular-tool"
    }
  ],
  "contrib_matrix": [
    {
      "date": "2024-01-01",
      "count": "3",
      "level": "1"
    }
  ]
}
```
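Because counts arrive as strings with varying formatting (`"2,500"` next to `"1850"` above), downstream code may want to convert them to integers. A minimal sketch; handling of `k`/`m` suffixes such as `1.2k` is an assumption and is not shown in the sample output.

```python
import re
from typing import Optional

_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_count(value: Optional[str]) -> Optional[int]:
    """Turn strings like "2,500", "1850" or "1.2k" into integers; None if unparseable."""
    if value is None:
        return None
    text = value.strip().lower().replace(",", "")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", text)
    if match is None:
        return None
    number, suffix = match.groups()
    return round(float(number) * _MULTIPLIERS[suffix])


def parse_contributions(value: Optional[str]) -> Optional[int]:
    """Extract the leading number from text like "567 contributions in the last year"."""
    if value is None:
        return None
    match = re.match(r"\s*(\d[\d,]*)", value)
    return int(match.group(1).replace(",", "")) if match else None
```

For example, `parse_count(item["followers"])` and `parse_contributions(item["contribs"])` give numeric fields you can sort or filter on.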

#### **Error Handling**

Failed profiles return structured error information:

```json
{
  "username": "nonexistent-user",
  "status": "not_found",
  "message": "User not found"
}
```

**Common Error Cases:**

- `not_found` — User doesn't exist or profile is private
- `error` — Network issues or scraping errors
- Invalid usernames are filtered out before processing with warning logs
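Since successes and failures land in the same dataset, a small grouping step makes it easy to retry `error` items or report `not_found` ones. A sketch; `split_by_status` is a hypothetical helper, and the `"unknown"` bucket for items without a `status` is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def split_by_status(items: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group dataset items by their "status" field, e.g. "success", "not_found", "error"."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in items:
        groups[item.get("status", "unknown")].append(item)
    return dict(groups)
```

With the Python client from the API section, you could pass `client.dataset(run["defaultDatasetId"]).iterate_items()` and re-run only the usernames in `groups.get("error", [])`.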

***

### 💼 Common Use Cases

#### **Recruitment & Talent Sourcing**

- Research developer profiles and technical expertise
- Analyze contribution patterns and project involvement
- Build comprehensive talent pipelines with GitHub activity data
- Assess coding skills through repository analysis

#### **Developer Research & Analysis**

- Study open source community members and contributors
- Analyze technology trends through developer profiles
- Research competitor team structures and technical expertise
- Track developer career progression and project involvement

#### **Lead Generation & Business Development**

- Extract contact information for developer outreach
- Build databases of potential customers in tech sectors
- Identify decision-makers in technology companies
- Enrich existing contact databases with GitHub profiles

#### **Community Building & Networking**

- Find developers with specific skills or interests
- Build communities around particular technologies
- Identify potential collaborators for open source projects
- Research conference speakers and industry experts

***

### 📊 Output & Export Options

#### **Dataset Storage**

- All extracted data stored in Apify dataset
- Each profile becomes one dataset item
- Status tracking for successful and failed extractions

#### **Export Formats**

- **JSON** — Raw structured data for API integration
- **CSV** — Spreadsheet-compatible format for analysis
- **Excel** — Formatted spreadsheet with profile data
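Nested fields such as `pinnedrepos` or `contrib_matrix` have no natural spreadsheet column. If you build your own CSV from the dataset items rather than using the platform export, one option is to keep each nested value as JSON text in a single cell, sketched below with the standard `csv` module. The helper name and column choice are assumptions.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable, List


def items_to_csv(items: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Write the chosen columns to CSV text; lists and dicts become JSON strings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column)
            # Nested values (e.g. pinnedrepos) stay as JSON text in one cell;
            # missing values (None) are written as empty cells by the csv module
            row[column] = json.dumps(value) if isinstance(value, (list, dict)) else value
        writer.writerow(row)
    return buffer.getvalue()
```

Error items (for example `not_found`) simply get empty cells for the profile columns.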

#### **Data Processing**

- Clean, validated usernames
- Structured error reporting
- Comprehensive logging for troubleshooting

***

### ⚡ Quick Start Guide

1. **Configure Input**:
   - Add GitHub usernames to the `usernames` array
   - Set desired `max_threads` (recommended: 5-10)
   - Enable proxy configuration for reliable scraping

2. **Run the Actor**:
   - Execute through Apify Console or API
   - Monitor progress through real-time logs
   - Review extracted data in the dataset

3. **Export Results**:
   - Download data in your preferred format
   - Integrate with your existing tools and workflows

***

### 🛡️ Privacy & Compliance

- **Public Data Only** — Extracts only publicly visible profile information
- **Respects Privacy Settings** — Email extraction only works for publicly visible emails
- **Rate Limiting** — Built-in delays and proxy support to respect GitHub's terms
- **Error Handling** — Graceful handling of private or restricted profiles

***

### 🔧 Technical Details

#### **Built With**

- **Python & BeautifulSoup** — Efficient HTML parsing and data extraction
- **Apify SDK** — Robust actor framework with built-in storage and proxy support
- **Multi-threading** — Concurrent processing for improved performance
- **Request Handling** — Smart retry mechanisms and error recovery
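This README does not document the exact retry policy. A common approach is exponential backoff with jitter, sketched here as a generic helper; `retry_with_backoff`, its defaults, and the retried exception types are illustrative assumptions, not the Actor's implementation.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... plus up to base_delay of random jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

The injectable `sleep` parameter keeps the helper easy to test without real waiting; the jitter spreads retries from concurrent threads so they do not hit GitHub at the same moment.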

#### **Performance**

- Process hundreds of profiles per run
- Configurable concurrency for optimal speed
- Proxy rotation for reliable access
- Comprehensive error logging and recovery

***

### 📈 Example Results

#### **Successful Profile Extraction**

```json
{
  "username": "jane-coder",
  "status": "success",
  "name": "Jane Smith",
  "bio": "Frontend developer specializing in React and TypeScript. Open source enthusiast.",
  "location": "Austin, TX",
  "email": null,
  "website": "https://jane-codes.dev",
  "followers": "3456",
  "following": "234",
  "repos_count": "87",
  "pinnedrepos": [
    {
      "name": "react-toolkit",
      "desc": "Comprehensive React development toolkit",
      "stars": "8500",
      "lang": "TypeScript"
    }
  ]
}
```

***

### 💡 Tips for Best Results

- **Enable Proxies** — Use Apify proxy configuration for reliable large-scale scraping
- **Username Format** — Ensure usernames follow GitHub's format rules:
  - Only alphanumeric characters and hyphens allowed
  - Cannot start or end with a hyphen
  - No consecutive hyphens (e.g., `user--name` is invalid)
  - Maximum 39 characters
  - Invalid usernames will be skipped with warnings
- **Monitor Rate Limits** — Use appropriate thread counts to avoid GitHub rate limiting
- **Handle Private Profiles** — Some data may not be available for users with privacy settings
- **Email Availability** — Email extraction only works for publicly visible emails (most users keep emails private)

***

### 🆘 Support & Feedback

For questions, feature requests, or technical support:

- Visit the [Apify Community Forum](https://forum.apify.com)
- Contact us through the Apify platform
- Submit issues for improvements and bug reports

***

### 🌟 Explore More Actors

✨ **Need more scraping solutions?** Discover additional Actors for web automation and data extraction in 🌐 [our full range of tools on Apify](https://apify.com/vulnv).

📧 For inquiries or custom development, reach out at apify@vulnv.com.

# Actor input schema

## `usernames` (type: `array`):

List of GitHub usernames to scrape profiles for

## `max_threads` (type: `integer`):

Maximum number of concurrent threads for scraping (default: 5)

## `proxy_configuration` (type: `object`):

Select proxies to be used by your scraper.

## Actor input object example

```json
{
  "usernames": [
    "aws"
  ],
  "max_threads": 5,
  "proxy_configuration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```
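The OpenAPI `inputSchema` in the API section requires `usernames` and bounds `max_threads` to 1-20 with a default of 5. A client-side pre-check mirroring those constraints can catch mistakes before you start a run. `validate_input` is a hypothetical helper; the platform performs its own schema validation regardless.

```python
from typing import Any, Dict


def validate_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Check an input object against the documented schema and fill in the default."""
    usernames = raw.get("usernames")
    if not isinstance(usernames, list) or not all(isinstance(u, str) for u in usernames):
        raise ValueError("usernames must be an array of strings")
    max_threads = raw.get("max_threads", 5)
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(max_threads, bool) or not isinstance(max_threads, int) or not 1 <= max_threads <= 20:
        raise ValueError("max_threads must be an integer between 1 and 20")
    proxy = raw.get("proxy_configuration")
    if proxy is not None and not isinstance(proxy, dict):
        raise ValueError("proxy_configuration must be an object")
    return {"usernames": usernames, "max_threads": max_threads, "proxy_configuration": proxy}
```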

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "usernames": [
        "aws"
    ],
    "proxy_configuration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("vulnv/github-profile-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "usernames": ["aws"],
    "proxy_configuration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("vulnv/github-profile-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "usernames": [
    "aws"
  ],
  "proxy_configuration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call vulnv/github-profile-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=vulnv/github-profile-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Github Profile Scraper",
        "description": "Scrapes GitHub user profiles including bio, repositories, followers, contributions, and more. Accepts a list of usernames and extracts comprehensive profile data.",
        "version": "1.0",
        "x-build-id": "KZPdl18CHoToEQF7N"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/vulnv~github-profile-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-vulnv-github-profile-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/vulnv~github-profile-scraper/runs": {
            "post": {
                "operationId": "runs-sync-vulnv-github-profile-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/vulnv~github-profile-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-vulnv-github-profile-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "usernames"
                ],
                "properties": {
                    "usernames": {
                        "title": "GitHub Usernames",
                        "type": "array",
                        "description": "List of GitHub usernames to scrape profiles for",
                        "items": {
                            "type": "string"
                        }
                    },
                    "max_threads": {
                        "title": "Maximum Threads",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of concurrent threads for scraping (default: 5)",
                        "default": 5
                    },
                    "proxy_configuration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies to be used by your scraper."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
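For stacks without an official client, the `run-sync-get-dataset-items` path from the specification above can be called over plain HTTP. Below is a standard-library Python sketch; the helper names are hypothetical. The spec lists the token as a `token` query parameter, but Apify's API also accepts it in an `Authorization: Bearer` header, which keeps it out of URLs and logs. Because this endpoint waits for the run to finish, large username lists may be better served by the asynchronous `/runs` endpoint.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "vulnv~github-profile-scraper"


def build_run_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST to run-sync-get-dataset-items with the input as the JSON body."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_and_get_items(token: str, run_input: Dict[str, Any], timeout: float = 300) -> List[Dict[str, Any]]:
    """Run the Actor synchronously and return the dataset items from the response body."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `items = run_and_get_items("<YOUR_API_TOKEN>", {"usernames": ["aws"]})`.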
