DEV.to Scraper — Articles by Tag & Author
Pricing
Pay per event
Pull articles from DEV.to by tag, author, or feed via the official dev.to v1 API: title, body (Markdown), author, tags, reading time, reactions, comment count, and publish timestamp. Export to JSON or CSV. The DEV.to API itself is free, and no key is needed for read access.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
DEV.to is one of the largest developer blogging platforms: hundreds of thousands of articles, millions of reactions, and a v1 API that pages in batches of 30. This Actor wraps that API with bulk fan-out. Scrape by tag (python, webdev, ai), by author username, or from the global latest or top feed, and get one clean, Pydantic-validated row per article with the full Markdown body included.
Whether you need a corpus for fine-tuning a code assistant, a daily digest of trending posts, or a competitive read on a DevRel team's publishing cadence, the DEV.to scraper handles the pagination, retry logic, and rate-limit pacing so you don't have to.
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so requests look like a live browser, not a Python script.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block or rate-limit response.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per request, `Retry-After` headers honoured automatically.
- 🧱 Rate-limit-aware pacing: when the target pushes back we slow down and surface partial results rather than silently dropping them.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you pay only for results that land in your dataset. No data means no charge (beyond the small actor-start warm-up fee).
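The retry behaviour listed above can be sketched roughly as follows. This is an illustration, not the Actor's actual code: the names `is_retryable`, `backoff_delay`, and `fetch_with_retries`, the 1-second base delay, the 60-second cap, and the jitter are all assumptions. Only the retryable statuses, the 5-attempt limit, and the `Retry-After` handling come from the list.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

MAX_ATTEMPTS = 5  # "up to 5 attempts per request" from the feature list

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx status are treated as transient."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after a failed attempt (attempt is 1-based).

    A numeric Retry-After value wins. Otherwise the delay doubles per
    attempt, is capped, and gets jitter. base and cap are assumed values.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return random.uniform(ceiling / 2, ceiling)


def fetch_with_retries(send: Callable[[], Response],
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until the status is not retryable or attempts run out."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = send()
        if not is_retryable(status) or attempt == MAX_ATTEMPTS:
            return status, headers, body
        # Header lookup is case-sensitive here; real HTTP clients normalise it.
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise AssertionError("unreachable")
```

Passing `send` and `sleep` as callables keeps the sketch independent of any HTTP library and makes the schedule easy to test without waiting.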
💡 Use cases
- RAG corpus seeding: pull all articles for `tag=ai` or `tag=python` to build a retrieval-augmented assistant grounded in real developer tutorials.
- Trending tag dashboards: schedule daily runs on `mode=top` and diff the output to watch which posts gain traction over time.
- Author monitoring: mirror a writer's full catalogue and alert your team the moment they publish something new.
- Newsletter assembly: pull the top 10 articles from a tag each week, render them to Markdown, and pipe straight into your send queue.
- Competitor DevRel benchmarking: compare engagement (reactions, comments, reading time) across multiple author accounts to see whose content strategy is landing.
- AI training data: bulk-export articles tagged `tutorial` or `showdev` for supervised fine-tuning datasets; body Markdown preserves code blocks cleanly.
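For the trending-dashboard use case, diffing two daily snapshots might look like the sketch below. The function name `reaction_deltas` is hypothetical; the `id`, `title`, and `positive_reactions_count` keys are the ones documented in the Output section, and the snapshots are assumed to be lists of dataset rows you have already exported.

```python
from typing import Dict, Iterable, List, Tuple


def reaction_deltas(previous: Iterable[dict],
                    current: Iterable[dict]) -> List[Tuple[int, str, int]]:
    """Return (id, title, gained_reactions), largest gain first.

    Articles that are missing from the previous snapshot count from zero.
    """
    before: Dict[int, int] = {
        row["id"]: row["positive_reactions_count"] for row in previous
    }
    deltas = [
        (row["id"], row["title"],
         row["positive_reactions_count"] - before.get(row["id"], 0))
        for row in current
    ]
    return sorted(deltas, key=lambda item: item[2], reverse=True)
```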
⚙️ How to use it
- Click Try for free at the top of the page. No credit card is required.
- Choose a `mode`: `tag` (default), `username`, `latest`, or `top`.
- Fill in `tag` or `username` depending on your mode. Most other fields have production-ready defaults.
- Set `maxResults` to cap output size and cost.
- Click Start. Results stream into the run's dataset in real time.
- Export from Storage → Dataset as JSON, CSV, or Excel, or fetch programmatically via the Apify API.
Scheduling tip: use the Actor's built-in scheduler (Apify Console → Schedules) to run nightly tag sweeps and accumulate a growing dataset without touching a line of code.
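Fetching results programmatically could look something like this sketch, which assumes the `apify-client` Python package and an `APIFY_TOKEN` environment variable. The helper names `build_run_input` and `run_and_fetch` are hypothetical, and the Actor ID is a placeholder because this page does not state it; the run-input keys are the field names from the Input section.

```python
import os
from typing import Iterator, Optional


def build_run_input(mode: str = "tag", value: Optional[str] = "python",
                    max_results: int = 30, include_body: bool = True) -> dict:
    """Assemble run input using the field names from the Input table."""
    run_input = {"mode": mode, "maxResults": max_results,
                 "includeBody": include_body}
    if mode in ("tag", "username"):
        run_input[mode] = value  # the field name matches the mode name
    return run_input


def run_and_fetch(actor_id: str, run_input: dict) -> Iterator[dict]:
    """Start a run, wait for it to finish, and yield its dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not complete")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (placeholder Actor ID, replace with the real one from the Console):
# for row in run_and_fetch("<username>/<actor-name>", build_run_input(max_results=5)):
#     print(row["title"])
```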
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `mode` | string | no | `tag` | Which DEV.to feed to read: `tag`, `username`, `latest`, or `top`. |
| `tag` | string | no | `python` | DEV.to tag, lowercase, no `#`. Examples: `python`, `webdev`, `typescript`. Active when `mode=tag`. |
| `username` | string | no | (none) | DEV.to username (no `@`). Active when `mode=username`. |
| `includeBody` | boolean | no | `true` | Fetch the full article body in Markdown. One extra API call per article; disable to cut cost when you only need metadata. |
| `maxResults` | integer | no | `30` | Maximum articles to return across all pages. |
| `concurrency` | integer | no | `4` | Parallel article-body fetches (1–16). |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Apify Proxy configuration. We handle retry and rotation automatically. |
Example input
    {
        "mode": "tag",
        "tag": "python",
        "includeBody": false,
        "maxResults": 3,
        "concurrency": 4,
        "proxyConfiguration": {"useApifyProxy": false}
    }
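If you build inputs in your own code, a pre-flight check along these lines can catch mistakes before a run is charged. It is a sketch using a standard-library dataclass as a stand-in, not the Actor's real input model: the class name `ScraperInput` and the lower bound on `maxResults` are assumptions, while the allowed modes, the `#` / `@` rules, and the 1–16 concurrency range come from the Input table.

```python
from dataclasses import dataclass, field
from typing import Optional

MODES = ("tag", "username", "latest", "top")


@dataclass
class ScraperInput:
    """Hypothetical mirror of the documented input fields and defaults."""
    mode: str = "tag"
    tag: Optional[str] = "python"
    username: Optional[str] = None
    includeBody: bool = True
    maxResults: int = 30
    concurrency: int = 4
    proxyConfiguration: dict = field(
        default_factory=lambda: {"useApifyProxy": False})

    def __post_init__(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}, got {self.mode!r}")
        if self.mode == "tag":
            tag = self.tag or ""
            if not tag or tag != tag.lower() or tag.startswith("#"):
                raise ValueError("tag must be lowercase with no leading '#'")
        if self.mode == "username":
            if not self.username or self.username.startswith("@"):
                raise ValueError("username is required, without a leading '@'")
        if not 1 <= self.concurrency <= 16:
            raise ValueError("concurrency must be between 1 and 16")
        if self.maxResults < 1:  # assumed lower bound, not documented
            raise ValueError("maxResults must be at least 1")
```

Because the attribute names match the JSON keys, the example input above can be passed straight in with `ScraperInput(**data)` after parsing.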
📤 Output
Every run produces one dataset row per article.
| Field | Type | Notes |
|---|---|---|
| `id` | integer | DEV.to article ID. |
| `slug` | string | URL slug. |
| `title` | string | Article title. |
| `description` | string or null | Short description (first sentence-ish). |
| `url` | string | Canonical article URL. |
| `cover_image` | string or null | Cover image URL. |
| `tags` | array | Article tags. |
| `author_username` | string | Author DEV.to username. |
| `author_name` | string | Author display name. |
| `reading_time_minutes` | integer or null | Estimated reading time. |
| `positive_reactions_count` | integer | Sum of positive reactions. |
| `comments_count` | integer | Comment count. |
| `body_markdown` | string or null | Full article body in Markdown (when `includeBody=true`). |
| `published_at` | string | Publish timestamp (ISO-8601). |
| `edited_at` | string or null | Last-edited timestamp (ISO-8601). |
| `scraped_at` | string | When this row was recorded (ISO-8601). |
Example output
    {
        "id": 1234567,
        "slug": "fast-python-async-tricks-1abc",
        "title": "Fast Python async tricks every dev should know",
        "description": "Five patterns that cut your async boilerplate in half.",
        "url": "https://dev.to/me/fast-python-async-tricks-1abc",
        "author_username": "me",
        "author_name": "Ada Dev",
        "tags": ["python", "async", "webdev"],
        "positive_reactions_count": 142,
        "comments_count": 8,
        "body_markdown": "## Introduction\n\nAsync Python is...",
        "published_at": "2025-11-12T09:00:00Z",
        "edited_at": null,
        "scraped_at": "2026-06-01T07:32:11Z"
    }
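If you post-process the JSON rows yourself rather than using the Console export, flattening them to CSV might look like this. It is a sketch: the function name `rows_to_csv`, the column selection, and the choice of `;` to join tags are assumptions made for illustration.

```python
import csv
import io
from typing import Iterable, List

COLUMNS: List[str] = [
    "id", "title", "url", "author_username", "tags",
    "positive_reactions_count", "comments_count", "published_at",
]


def rows_to_csv(rows: Iterable[dict], columns: List[str] = COLUMNS) -> str:
    """Serialise dataset rows to CSV text with a header line.

    The tags array is joined with ';' and null values become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        flat["tags"] = ";".join(row.get("tags") or [])
        writer.writerow(
            {key: "" if flat.get(key) is None else flat[key] for key in columns}
        )
    return buffer.getvalue()
```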
💰 Pricing
Pay-Per-Event — you are charged only when these events fire:
| Event | USD | What it covers |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.002 | Per dataset item written |
Example: 1,000 articles cost about $2.00 all-in (1,000 × $0.002 per result, plus the $0.005 start fee). No subscription, no minimum spend. New Apify accounts get $5 of free credit, enough for roughly 2,500 articles before you need to add a payment method.
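The arithmetic behind that example can be checked with a few lines. This is a sketch that uses only the two event prices from the table; the function names are hypothetical, and it assumes every returned article is billed as exactly one `result` event.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.002")       # charged per dataset item written


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Total USD for a number of results spread over a number of runs."""
    return ACTOR_START_USD * runs + RESULT_USD * results


def max_results_for_budget(budget_usd: Decimal, runs: int = 1) -> int:
    """Largest result count that fits the budget after start fees."""
    remaining = budget_usd - ACTOR_START_USD * runs
    return max(0, int(remaining // RESULT_USD))


print(estimate_cost(1000))                   # 2.005
print(max_results_for_budget(Decimal("5")))  # 2497
```

`Decimal` is used instead of floats so that fractions of a cent do not pick up rounding noise.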
🚧 Limitations
- Body fetch cost: enabling `includeBody=true` fires one extra API call per article, which counts toward both run time and result cost. Disable it for metadata-only sweeps.
- API pagination cap: DEV.to's v1 API stops returning results after 1,000 articles per tag/mode. Use narrower tags or multiple sequential runs to exceed that limit.
- Images and code-block formatting: body Markdown is delivered as-is from the API. Images are URLs, not downloaded files. Code blocks are fenced Markdown, not rendered HTML.
- Cached API responses: if DEV.to's API returns a stale version of a recently edited article, that is what lands in your dataset. Re-run to refresh.
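To get a feel for how the pagination cap and the body-fetch cost interact, here is a rough planning sketch. It assumes one API call per listing page of 30 articles (the batch size mentioned in the overview) plus one call per article body, and that enough articles exist to fill the request; the function names are hypothetical.

```python
import math
from typing import Tuple

PER_PAGE = 30    # batch size mentioned in the overview
API_CAP = 1000   # per tag/mode cap from the limitations list


def plan_pages(max_results: int, per_page: int = PER_PAGE,
               cap: int = API_CAP) -> Tuple[int, int]:
    """Return (reachable_results, listing_pages_to_request)."""
    reachable = max(0, min(max_results, cap))
    return reachable, math.ceil(reachable / per_page)


def api_calls(max_results: int, include_body: bool) -> int:
    """Estimated API calls: listing pages plus one call per body fetch."""
    reachable, pages = plan_pages(max_results)
    return pages + (reachable if include_body else 0)


print(plan_pages(5000))       # (1000, 34)
print(api_calls(100, True))   # 104
print(api_calls(100, False))  # 4
```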
❓ FAQ
Do I need a DEV.to API key?
No. The DEV.to v1 API allows unauthenticated GET requests for public articles. The Actor handles authentication transparently where needed for higher rate-limit tiers.
What is the difference between `mode=latest` and `mode=top`?
`latest` returns articles in reverse-chronological order, which is useful for real-time monitoring. `top` returns all-time highest-reaction articles, which is useful for corpus building and quality filtering.
Can I scrape a specific DEV.to tag feed in bulk — for example all javascript articles?
Yes. Set `mode=tag`, `tag=javascript`, and `maxResults` to your desired cap. The Actor pages through the API automatically. Note the 1,000-article API cap per tag (see Limitations).
How do I scrape DEV.to articles from multiple tags in one run?
The current version accepts a single tag per run. For multi-tag sweeps, use the Apify API to launch parallel runs — one per tag — and merge the datasets downstream.
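Merging the per-tag datasets downstream might look like the sketch below. The function name `merge_datasets` is hypothetical. It deduplicates on `id`, because an article with several tags can show up in more than one run, and it assumes every row carries `scraped_at` and `published_at` in the same ISO-8601 UTC format, so plain string comparison orders them correctly.

```python
from typing import Dict, Iterable, List


def merge_datasets(*datasets: Iterable[dict]) -> List[dict]:
    """Merge rows from several runs, keeping one row per article id.

    When an article appears more than once, the most recently scraped
    row wins. The result is sorted newest-published first.
    """
    merged: Dict[int, dict] = {}
    for dataset in datasets:
        for row in dataset:
            seen = merged.get(row["id"])
            if seen is None or row["scraped_at"] > seen["scraped_at"]:
                merged[row["id"]] = row
    return sorted(merged.values(),
                  key=lambda row: row["published_at"], reverse=True)
```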
What is the DEV.to articles API, and how does this Actor relate to it?
DEV.to exposes a REST v1 API, documented at https://developers.forem.com/api/v1. This Actor wraps it, handling pagination, retries, backoff, and clean row output, so you write zero client code.
Can I use the output for an AI / RAG dataset?
Yes — the body_markdown field preserves code blocks, headings, and list formatting, making it directly ingestible by most embedding pipelines. Check DEV.to's Terms of Service for your specific use case.
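As one possible pre-processing step for an embedding pipeline, the sketch below splits `body_markdown` into sections at headings while leaving fenced code blocks intact. The function name is hypothetical and the heading detection is deliberately simple: any line starting with `#` outside a fence starts a new section.

```python
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker


def split_markdown_sections(body_markdown: str) -> List[str]:
    """Split Markdown at headings, never inside a fenced code block."""
    sections: List[List[str]] = [[]]
    in_fence = False
    for line in body_markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
        elif line.startswith("#") and not in_fence and sections[-1]:
            sections.append([])      # a heading opens a new section
        sections[-1].append(line)
    return [
        "\n".join(lines).strip()
        for lines in sections
        if any(item.strip() for item in lines)
    ]
```

Keeping each code block in the same chunk as the heading that introduces it tends to give retrieval results that are usable on their own.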
What happens if the Actor hits a rate limit mid-run?
We back off, wait for the Retry-After window, and resume. You will see a partial-progress message in the run log. If the target stops responding entirely, the Actor exits with a clear status message rather than returning a silent empty dataset.
💬 Your feedback
Spotted a bug, need an extra field, or want multi-tag support in one run? Open an issue on the Actor's Issues tab in Apify Console — we read every report and ship fixes weekly.