DEV.to Scraper — Articles by Tag & Author

Pricing

Pay per event

Pull articles from DEV.to by tag, author, or feed via the official dev.to v1 API — title, body (Markdown), author, tags, reading time, reactions, comment count, publish timestamp — export to JSON or CSV. Free, no key needed for read access.

Developer: DevilScrapes (maintained by Community)

🎯 What this scrapes

DEV.to is one of the internet's largest developer blogging platforms: hundreds of thousands of articles, millions of reactions, and a v1 API that returns 30 articles per page by default. This Actor wraps that API with bulk fan-out: scrape by tag (python, webdev, ai), by author username, or from the global latest or top feed, and get one clean, Pydantic-validated row per article with the full Markdown body included.

Whether you need a corpus for fine-tuning a code assistant, a daily digest of trending posts, or a competitive read on a DevRel team's publishing cadence, the DEV.to scraper handles the pagination, retry logic, and rate-limit pacing so you don't have to.
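For orientation, the pagination the Actor takes care of can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the Actor's actual code. It assumes the public endpoint https://dev.to/api/articles with its tag, page, and per_page query parameters and the v1 Accept header described in the Forem API docs; the function names and the User-Agent string are invented for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List

API_URL = "https://dev.to/api/articles"  # assumed public articles endpoint
PAGE_SIZE = 30  # page size mentioned in the description above


def build_page_url(tag: str, page: int, per_page: int = PAGE_SIZE) -> str:
    """Build the URL for one page of a tag feed."""
    query = urllib.parse.urlencode({"tag": tag, "page": page, "per_page": per_page})
    return f"{API_URL}?{query}"


def fetch_json(url: str) -> List[dict]:
    """Fetch one page of articles; no API key is sent."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.forem.api-v1+json",
            "User-Agent": "devto-pagination-sketch/0.1",  # hypothetical value
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_articles(
    tag: str,
    max_results: int,
    fetch: Callable[[str], List[dict]] = fetch_json,
) -> Iterator[dict]:
    """Yield up to max_results articles, requesting pages until one is empty."""
    page, yielded = 1, 0
    while yielded < max_results:
        batch = fetch(build_page_url(tag, page))
        if not batch:
            return  # the feed has no more pages
        for article in batch:
            yield article
            yielded += 1
            if yielded >= max_results:
                return
        page += 1
```

Passing fetch as a parameter keeps the paging logic testable without network access; a real client would also need the retry and pacing behaviour described in the next section.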

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so requests look like a live browser, not a Python script.
  • 🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block or rate-limit response.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per request, Retry-After headers honoured automatically.
  • 🧱 Rate-limit-aware pacing — when the target pushes back we slow down and surface partial results rather than silently dropping them.
  • 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
  • 💰 Pay-Per-Event pricing — you pay only for results that land in your dataset. No data means no charge (beyond the small actor-start warm-up fee).
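The retry behaviour listed above (up to 5 attempts on 408 / 429 / 5xx, honouring Retry-After) could look roughly like the sketch below. It is an illustration under assumptions, not the Actor's implementation: the base delay of 1 second, the 60 second cap, and all function names are hypothetical, and only the numeric-seconds form of Retry-After is handled.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

MAX_ATTEMPTS = 5  # "up to 5 attempts per request"
Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def is_retryable(status: int) -> bool:
    """408, 429, and any 5xx status are worth retrying."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,   # assumed base delay in seconds
    cap: float = 60.0,   # assumed upper bound in seconds
) -> float:
    """Prefer the server's Retry-After value, else double the delay each attempt."""
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    return min(cap, base * 2 ** attempt)


def request_with_retries(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(MAX_ATTEMPTS):
        status, headers, body = send()
        if not is_retryable(status):
            return status, body
        if attempt < MAX_ATTEMPTS - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body  # last retryable response after exhausting attempts
```

Production clients usually add random jitter to the delay so that parallel workers do not retry in lockstep; it is left out here to keep the delays easy to follow.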

💡 Use cases

  • RAG corpus seeding — pull all articles for tag=ai or tag=python to build a retrieval-augmented assistant grounded in real developer tutorials.
  • Trending tag dashboards — schedule daily runs on mode=top and diff the output to watch which posts gain traction over time.
  • Author monitoring — mirror a writer's full catalogue and alert your team the moment they publish something new.
  • Newsletter assembly — pull the top 10 articles from a tag each week, render them to Markdown, and pipe straight into your send queue.
  • Competitor DevRel benchmarking — compare engagement (reactions, comments, reading time) across multiple author accounts to see whose content strategy is landing.
  • AI training data — bulk-export articles tagged tutorial or showdev for supervised fine-tuning datasets; body Markdown preserves code blocks cleanly.
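As a sketch of the "diff the output" idea from the trending-dashboard use case: given two exported datasets from consecutive runs, the function below ranks articles by reactions gained. The field names come from the Output section; the function name and the choice to treat unseen articles as starting from zero are assumptions made for this example.

```python
from typing import Iterable, List, Tuple


def reaction_gains(
    previous: Iterable[dict], current: Iterable[dict]
) -> List[Tuple[int, str, int]]:
    """Return (id, title, gained) tuples sorted by reactions gained, highest first."""
    before = {row["id"]: row["positive_reactions_count"] for row in previous}
    gains = []
    for row in current:
        # Articles missing from the previous snapshot count from a baseline of 0.
        gained = row["positive_reactions_count"] - before.get(row["id"], 0)
        gains.append((row["id"], row["title"], gained))
    return sorted(gains, key=lambda item: item[2], reverse=True)


yesterday = [
    {"id": 1, "title": "A", "positive_reactions_count": 100},
    {"id": 2, "title": "B", "positive_reactions_count": 10},
]
today = [
    {"id": 1, "title": "A", "positive_reactions_count": 105},
    {"id": 2, "title": "B", "positive_reactions_count": 60},
    {"id": 3, "title": "C", "positive_reactions_count": 7},
]
for article_id, title, gained in reaction_gains(yesterday, today):
    print(article_id, title, gained)
```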

⚙️ How to use it

  1. Click Try for free at the top of the page — no credit card required.
  2. Choose a mode: tag (default), username, latest, or top.
  3. Fill in tag or username depending on your mode. Most other fields have production-ready defaults.
  4. Set maxResults to cap output size and cost.
  5. Click Start. Results stream into the run's dataset in real time.
  6. Export from Storage → Dataset as JSON, CSV, or Excel — or fetch programmatically via the Apify API.

Scheduling tip: use the Actor's built-in scheduler (Apify Console → Schedules) to run nightly tag sweeps and accumulate a growing dataset without touching a line of code.
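For the programmatic route in step 6, a minimal standard-library sketch follows. It assumes the Apify API v2 dataset-items endpoint (GET /v2/datasets/{datasetId}/items with format, offset, and limit parameters) and bearer-token authentication; the helper names are made up here, and the dataset ID and token are placeholders you would take from your own run and account.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

APIFY_API = "https://api.apify.com/v2"


def dataset_items_url(
    dataset_id: str, fmt: str = "json", limit: Optional[int] = None, offset: int = 0
) -> str:
    """Build the URL that returns a run's dataset items in the given format."""
    params = {"format": fmt, "offset": offset}
    if limit is not None:
        params["limit"] = limit
    path = urllib.parse.quote(dataset_id, safe="")
    return f"{APIFY_API}/datasets/{path}/items?{urllib.parse.urlencode(params)}"


def fetch_dataset_items(
    dataset_id: str, token: str, limit: Optional[int] = None
) -> List[dict]:
    """Download dataset rows as a list of dicts (JSON format only)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, limit=limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling fetch_dataset_items("YOUR_DATASET_ID", "YOUR_APIFY_TOKEN", limit=100) would return the first 100 rows; switching fmt to "csv" in dataset_items_url gives a download link for the CSV export instead.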

📥 Input

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| mode | string | no | tag | Which DEV.to feed to read: tag, username, latest, or top. |
| tag | string | no | python | DEV.to tag, lowercase, no #. Examples: python, webdev, typescript. Active when mode=tag. |
| username | string | no | (none) | DEV.to username (no @). Active when mode=username. |
| includeBody | boolean | no | true | Fetch the full article body in Markdown. One extra API call per article; disable to cut cost when you only need metadata. |
| maxResults | integer | no | 30 | Maximum articles to return across all pages. |
| concurrency | integer | no | 4 | Parallel article-body fetches (1–16). |
| proxyConfiguration | object | no | {"useApifyProxy": false} | Apify Proxy configuration. We handle retry and rotation automatically. |

Example input

{
  "mode": "tag",
  "tag": "python",
  "includeBody": false,
  "maxResults": 3,
  "concurrency": 4,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
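If you generate inputs from code, a small builder can enforce the constraints from the table above before a run is started. This is a hypothetical helper, not part of the Actor: the field names, defaults, and the 1–16 concurrency range come from the input table, while the function itself and its error messages are illustrative.

```python
from typing import Any, Dict, Optional

VALID_MODES = {"tag", "username", "latest", "top"}


def build_input(
    mode: str = "tag",
    tag: str = "python",
    username: Optional[str] = None,
    include_body: bool = True,
    max_results: int = 30,
    concurrency: int = 4,
) -> Dict[str, Any]:
    """Return a run-input dict, raising ValueError on invalid combinations."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 1 <= concurrency <= 16:
        raise ValueError("concurrency must be between 1 and 16")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")

    run_input: Dict[str, Any] = {
        "mode": mode,
        "includeBody": include_body,
        "maxResults": max_results,
        "concurrency": concurrency,
        "proxyConfiguration": {"useApifyProxy": False},
    }
    if mode == "tag":
        # Tags are lowercase and written without the leading "#".
        cleaned = tag.strip().lstrip("#").lower()
        if not cleaned:
            raise ValueError("tag is required when mode='tag'")
        run_input["tag"] = cleaned
    elif mode == "username":
        # Usernames are written without the leading "@".
        cleaned = (username or "").strip().lstrip("@")
        if not cleaned:
            raise ValueError("username is required when mode='username'")
        run_input["username"] = cleaned
    return run_input
```

For instance, build_input(tag="#Python", include_body=False, max_results=3) produces the same dictionary as the example input shown above.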

📤 Output

Every run produces one dataset row per article.

| Field | Type | Notes |
|---|---|---|
| id | integer | DEV.to article ID. |
| slug | string | URL slug. |
| title | string | Article title. |
| description | string \| null | Short description (first sentence-ish). |
| url | string | Canonical article URL. |
| cover_image | string \| null | Cover image URL. |
| tags | array | Article tags. |
| author_username | string | Author DEV.to username. |
| author_name | string | Author display name. |
| reading_time_minutes | integer \| null | Estimated reading time. |
| positive_reactions_count | integer | Sum of positive reactions. |
| comments_count | integer | Comment count. |
| body_markdown | string \| null | Full article body in Markdown (when includeBody=true). |
| published_at | string | Publish timestamp (ISO-8601). |
| edited_at | string \| null | Last-edited timestamp (ISO-8601). |
| scraped_at | string | When this row was recorded (ISO-8601). |

Example output

{
  "id": 1234567,
  "slug": "fast-python-async-tricks-1abc",
  "title": "Fast Python async tricks every dev should know",
  "description": "Five patterns that cut your async boilerplate in half.",
  "url": "https://dev.to/me/fast-python-async-tricks-1abc",
  "author_username": "me",
  "author_name": "Ada Dev",
  "tags": ["python", "async", "webdev"],
  "positive_reactions_count": 142,
  "comments_count": 8,
  "body_markdown": "## Introduction\n\nAsync Python is...",
  "published_at": "2025-11-12T09:00:00Z",
  "edited_at": null,
  "scraped_at": "2026-06-01T07:32:11Z"
}
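To show how a row like the one above might be consumed downstream, here is a short sketch that parses the ISO-8601 timestamps and derives a simple engagement rate. The function names and the reactions-per-day metric are assumptions for illustration; only the field names come from the output table.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2025-11-12T09:00:00Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def engagement_summary(row: dict) -> dict:
    """Compute article age at scrape time and reactions gained per day."""
    published = parse_timestamp(row["published_at"])
    scraped = parse_timestamp(row["scraped_at"])
    age_days = max((scraped - published).days, 1)  # avoid dividing by zero
    return {
        "id": row["id"],
        "age_days": age_days,
        "reactions_per_day": row["positive_reactions_count"] / age_days,
    }


example_row = {
    "id": 1234567,
    "positive_reactions_count": 142,
    "published_at": "2025-11-12T09:00:00Z",
    "scraped_at": "2026-06-01T07:32:11Z",
}
print(engagement_summary(example_row))
```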

💰 Pricing

Pay-Per-Event — you are charged only when these events fire:

| Event | USD | What it covers |
|---|---|---|
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.002 | Per dataset item written |

Example: 1,000 articles cost 1,000 × $0.002 + $0.005 ≈ $2.00 all-in. No subscription, no minimum spend. New Apify accounts get $5 of free credit, enough for roughly 2,500 articles before you need to add a payment method.
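The arithmetic behind these figures can be checked with a short calculation. The two prices are taken from the event table above; the function names are hypothetical, and Decimal is used only to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.005")  # charged once per run
PER_RESULT_FEE = Decimal("0.002")   # charged per dataset item written


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Total USD cost for a number of results spread over a number of runs."""
    return ACTOR_START_FEE * runs + PER_RESULT_FEE * results


def results_within_budget(budget_usd: str, runs: int = 1) -> int:
    """How many results a budget covers after paying the start fee(s)."""
    remaining = Decimal(budget_usd) - ACTOR_START_FEE * runs
    return max(0, int(remaining // PER_RESULT_FEE))


print(estimate_cost(1000))         # cost of 1,000 articles in one run
print(results_within_budget("5"))  # articles covered by $5 in one run
```

Splitting the same volume across more runs only adds one start fee per run, so the per-result price dominates the total for any sizeable sweep.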

🚧 Limitations

  • Body fetch cost: enabling includeBody=true fires one extra API call per article, which counts toward both run time and result cost. Disable it for metadata-only sweeps.
  • API pagination cap: DEV.to's v1 API stops returning results after 1 000 articles per tag/mode. Use narrower tags or multiple sequential runs to exceed that limit.
  • Images and code-block formatting: body Markdown is delivered as-is from the API. Images are URLs, not downloaded files. Code blocks are fenced Markdown, not rendered HTML.
  • Cached API responses: if DEV.to's API returns a stale version of a recently edited article, that is what lands in your dataset. Re-run to refresh.

❓ FAQ

Do I need a DEV.to API key?

No. The DEV.to v1 API allows unauthenticated GET requests for public articles. The Actor handles authentication transparently where needed for higher rate-limit tiers.

What is the difference between mode=latest and mode=top?

latest returns articles in reverse-chronological order — useful for real-time monitoring. top returns all-time highest-reaction articles — useful for corpus building and quality filtering.

Can I scrape a specific DEV.to tag feed in bulk — for example all javascript articles?

Yes. Set mode=tag, tag=javascript, and maxResults to your desired cap. The Actor pages through the API automatically. Note the 1 000-article API cap per tag (see Limitations).

How do I scrape dev.to articles from multiple tags in one run?

The current version accepts a single tag per run. For multi-tag sweeps, use the Apify API to launch parallel runs — one per tag — and merge the datasets downstream.
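The downstream merge step could be as simple as the sketch below, which combines per-tag datasets and removes articles that appear under more than one tag. Deduplicating on id and keeping the most recently scraped copy are assumptions made for this example, as is the function name; the string comparison relies on all timestamps sharing the same UTC ISO-8601 format shown in the Output section.

```python
from typing import Dict, Iterable, List


def merge_datasets(*datasets: Iterable[dict]) -> List[dict]:
    """Merge rows from several runs, keeping one row per article ID."""
    merged: Dict[int, dict] = {}
    for rows in datasets:
        for row in rows:
            existing = merged.get(row["id"])
            # Same-format UTC ISO-8601 strings sort chronologically as text.
            if existing is None or row["scraped_at"] > existing["scraped_at"]:
                merged[row["id"]] = row
    # Newest articles first.
    return sorted(merged.values(), key=lambda r: r["published_at"], reverse=True)


python_rows = [
    {"id": 1, "published_at": "2025-11-12T09:00:00Z", "scraped_at": "2026-06-01T07:00:00Z"},
    {"id": 2, "published_at": "2025-12-01T09:00:00Z", "scraped_at": "2026-06-01T07:00:00Z"},
]
webdev_rows = [
    {"id": 2, "published_at": "2025-12-01T09:00:00Z", "scraped_at": "2026-06-01T08:00:00Z"},
    {"id": 3, "published_at": "2025-10-05T09:00:00Z", "scraped_at": "2026-06-01T08:00:00Z"},
]
combined = merge_datasets(python_rows, webdev_rows)
print([row["id"] for row in combined])
```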

What is the dev.to articles API and how does this Actor relate to it?

DEV.to exposes a REST v1 API, served from https://dev.to/api and documented at https://developers.forem.com/api/v1. This Actor wraps it, handling pagination, retries, backoff, and clean row output, so you write zero client code.

Can I use the output for an AI / RAG dataset?

Yes — the body_markdown field preserves code blocks, headings, and list formatting, making it directly ingestible by most embedding pipelines. Check DEV.to's Terms of Service for your specific use case.

What happens if the Actor hits a rate limit mid-run?

We back off, wait for the Retry-After window, and resume. You will see a partial-progress message in the run log. If the target stops responding entirely, the Actor exits with a clear status message rather than returning a silent empty dataset.

💬 Your feedback

Spotted a bug, need an extra field, or want multi-tag support in one run? Open an issue on the Actor's Issues tab in Apify Console — we read every report and ship fixes weekly.