Bluesky Scraper — Feed Posts

Export posts from any public Bluesky custom or algorithm feed via the AT Protocol API — feed metadata and engagement counts — to JSON or CSV. No login needed; we page through and retry so the whole feed lands.

Pricing: Pay per event
Rating: 0.0 (0)
Developer: DevilScrapes (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 0 monthly active users
Last modified: 15 days ago

Bluesky Scraper — Export Posts from Any Custom or Algorithmic Feed

We do the dirty work so your dataset stays clean. 😈

$2.05 / 1,000 posts — pay only for results that land. No credit card required to try.

Export posts from any public Bluesky custom or algorithm feed — including "Discover", "What's Hot", and community-built feed generators. We handle the AT Protocol cursor pagination, rate-limit back-off, and fingerprint rotation so you get clean, denormalised rows ready for spreadsheets, BI tools, or SQL — no AT Protocol knowledge required.

🎯 What this scrapes

Two operating modes, controlled by which input field you set:

  1. Single-feed mode — provide a feed URI (AT URI or bsky.app/profile/.../feed/... web URL) and the Actor exports every post in that feed up to the per-feed cap.
  2. Creator-discovery mode — provide a creator's Bluesky handle (e.g. bsky.app) and the Actor calls app.bsky.feed.getActorFeeds to enumerate every feed that creator publishes, then scrapes each one in turn.

For each post you receive the post body, engagement counts (likes, reposts, replies, quotes), author handle and DID, post CID, and indexing timestamp — plus the parent feed's display name, creator handle, and description denormalised onto every row so a CSV export is entirely self-contained.

Field                  Type            Description
feed_uri               string          AT URI of the feed generator
feed_display_name      string          Human-readable feed name (e.g. Discover)
feed_creator_handle    string          Bluesky handle of the feed creator
feed_description       string | null   Feed description text set by the creator
post_uri               string          AT URI of the post
post_cid               string          Content identifier (CID) of the post record
post_indexed_at        string          ISO 8601 datetime the post was indexed by the AppView
post_text              string          Body text of the post
post_lang              string | null   Primary language code (e.g. en), if present
post_reply_count       integer         Number of replies
post_repost_count      integer         Number of reposts
post_like_count        integer         Number of likes
post_quote_count       integer         Number of quote posts
author_did             string          Decentralized identifier of the post author
author_handle          string          Bluesky handle of the post author
author_display_name    string | null   Display name of the post author
scraped_at             string          ISO 8601 UTC datetime this row was written
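
To make the denormalisation concrete, here is a minimal sketch of how one feed item and its parent feed's metadata could be merged into a single flat row. It is an illustration, not the Actor's source: flatten_post is a hypothetical helper, and the input key names (displayName, creator, record, langs, replyCount and so on) are assumptions about the shape of the AppView's feed responses.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def flatten_post(feed_meta: dict[str, Any], item: dict[str, Any],
                 now: Optional[datetime] = None) -> dict[str, Any]:
    """Merge one feed item with its feed's metadata into a flat row.

    The input key names are assumptions about the API response shape.
    """
    post = item["post"]
    author = post.get("author", {})
    record = post.get("record", {})
    langs = record.get("langs") or []
    scraped = (now or datetime.now(timezone.utc)).isoformat()
    return {
        # Feed metadata is repeated on every row so a CSV needs no joins.
        "feed_uri": feed_meta["uri"],
        "feed_display_name": feed_meta["displayName"],
        "feed_creator_handle": feed_meta["creator"]["handle"],
        "feed_description": feed_meta.get("description"),
        "post_uri": post["uri"],
        "post_cid": post["cid"],
        "post_indexed_at": post["indexedAt"],
        "post_text": record.get("text", ""),
        # Optional fields become None rather than dropping the row.
        "post_lang": langs[0] if langs else None,
        "post_reply_count": post.get("replyCount", 0),
        "post_repost_count": post.get("repostCount", 0),
        "post_like_count": post.get("likeCount", 0),
        "post_quote_count": post.get("quoteCount", 0),
        "author_did": author.get("did"),
        "author_handle": author.get("handle"),
        "author_display_name": author.get("displayName"),
        "scraped_at": scraped,
    }
```

Passing a fixed now value keeps the scraped_at column reproducible when testing.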

🔥 Features

  • What we handle for you: fingerprint rotation (curl-cffi browser impersonation — Chrome / Firefox profiles), exponential back-off on 408 / 429 / 503 with Retry-After honoured, up to 5 retries per failed request, proxy routing via Apify Proxy on demand, and Pydantic v2-validated rows. You get the clean CSV; we get the error logs.
  • Two operating modes: single feed URI or discover-all-feeds-by-creator via getActorFeeds.
  • Accepts either AT URIs (at://did:plc:.../app.bsky.feed.generator/whats-hot) or bsky.app web URLs — the Actor rewrites web URLs to AT URI form automatically.
  • Denormalised output — feed metadata (name, description, creator handle) on every post row, no joins needed for downstream analytics or CSV exports.
  • Cursor-based pagination with a client-side per-feed cap so you only pay for what you need.
  • Pure HTTP scraping with browser fingerprint impersonation — no browser automation overhead.
  • Pydantic v2 input validation with XOR guard: exactly one of feedUri or creatorHandle must be set.
  • Pairs with the companion bluesky-starter-pack Actor as the Bluesky Intel Suite.
  • Pay only for results that land — no data written means no result-row charges (only the small actor-start warm-up fee).
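
As a rough illustration of the retry policy described in the first bullet, the sketch below computes how long to wait before the next attempt. The status codes and the 5-retry limit come from the list above; the base delay, the 60-second cap, and the jitter are assumptions chosen for the example, and retry_delay and should_retry are hypothetical names rather than the Actor's own functions.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {408, 429, 503}  # from the feature list above
MAX_RETRIES = 5                       # from the feature list above


def should_retry(status: int, attempt: int) -> bool:
    """True if this status is retryable and retries remain (attempt is 0-based)."""
    return status in RETRYABLE_STATUSES and attempt < MAX_RETRIES


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0,
                jitter: bool = True) -> float:
    """Seconds to sleep before retry number `attempt` (0-based).

    A numeric Retry-After header wins; otherwise use capped exponential
    back-off. base, cap and jitter are illustrative assumptions.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form of Retry-After: fall back to back-off
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Randomise the wait so parallel clients do not retry in lockstep.
        delay = random.uniform(0, delay)
    return delay
```

With jitter disabled the delays double on each attempt (1 s, 2 s, 4 s, ...) until they reach the cap.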

💡 Use cases

  • Algorithm research — sample what posts the "Discover" / "What's Hot" algorithmic feeds actually surface across days or weeks; analyse topic drift and amplification patterns.
  • Newsroom monitoring — subscribe to curated topic feeds for breaking-news posts on specific beats, then pipe to Slack or a Google Sheet via Apify integrations.
  • Marketing intelligence — see which posts are amplified by community feeds in your niche; measure which content formats dominate each feed's engagement distribution.
  • Creator analytics — pull every post a niche feed generator surfaces and rank by like / repost / quote ratios to benchmark your own posts against feed peers.
  • Dataset bootstrap — collect labelled training data from topic-curated feeds for downstream NLP or sentiment models without manual tagging of raw timelines.
  • Competitive monitoring — track community-curated feeds that aggregate competitor announcements, support complaints, or product mentions.
  • Academic social-media research — Bluesky's public AT Protocol data is significantly more research-accessible than the current Twitter / X API; this Actor is a low-cost entry point for longitudinal feed studies.

⚙️ How to use it

  1. Open the Actor input form.
  2. Paste a feed AT URI or bsky.app web URL into Feed URI or URL (single-feed mode), or type a Bluesky handle into Creator handle (discovery mode). Setting both or neither is an error: the Actor fails fast before making any network call.
  3. Adjust Max posts per feed (default 100, maximum 5000).
  4. In discovery mode, adjust Max feeds to cap how many of the creator's feeds are scraped (default 5, maximum 50).
  5. Click Start and watch the run log. Results stream into the default dataset in real time and can be downloaded as JSON, CSV, Excel, or XML via the Export button.

Finding a feed URI

Every Bluesky feed has a bsky.app URL in the form https://bsky.app/profile/<creator>/feed/<rkey>. Examples:

  • https://bsky.app/profile/bsky.app/feed/whats-hot — Bluesky's "Discover" feed
  • https://bsky.app/profile/bsky.app/feed/with-friends — "With Friends"

Paste the full URL into the Feed URI or URL field and this Actor converts it to AT URI form internally. You can also paste the raw AT URI directly if you have it.
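
For readers who want to do the conversion themselves, the following is a minimal sketch of the two steps involved: splitting the web URL into creator and rkey, then assembling the AT URI once the creator's DID is known. Resolving a handle to its DID needs a network lookup and is deliberately left out; parse_feed_url and to_at_uri are hypothetical helper names, not part of the Actor.

```python
from urllib.parse import urlparse

COLLECTION = "app.bsky.feed.generator"


def parse_feed_url(url: str) -> tuple[str, str]:
    """Return (creator, rkey) from bsky.app/profile/<creator>/feed/<rkey>."""
    # Accept URLs pasted without a scheme.
    parsed = urlparse(url if "://" in url else f"https://{url}")
    parts = [p for p in parsed.path.split("/") if p]
    if (parsed.netloc != "bsky.app" or len(parts) != 4
            or parts[0] != "profile" or parts[2] != "feed"):
        raise ValueError(f"not a bsky.app feed URL: {url!r}")
    return parts[1], parts[3]


def to_at_uri(did: str, rkey: str) -> str:
    """Build the feed generator AT URI from an already-resolved DID."""
    if not did.startswith("did:"):
        raise ValueError(f"expected a DID, got {did!r}")
    return f"at://{did}/{COLLECTION}/{rkey}"
```

For example, parse_feed_url("https://bsky.app/profile/bsky.app/feed/whats-hot") yields ("bsky.app", "whats-hot"), and combining that rkey with the DID did:plc:z72i7hdynmk6r22z27h6tvur gives the AT URI used in the input examples on this page.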

📥 Input

Field             Type      Required   Default   Description
feedUri           string    one-of     -         AT URI or bsky.app/profile/<handle>/feed/<rkey> URL of one feed
creatorHandle     string    one-of     -         Bluesky handle or DID; drives getActorFeeds discovery
maxPostsPerFeed   integer   no         100       Max post rows emitted per feed (1–5000)
maxFeeds          integer   no         5         Max feeds processed in discovery mode (1–50)
useProxy          boolean   no         false     Route requests through Apify Proxy

Exactly one of feedUri and creatorHandle must be set. Setting both, or neither, causes the Actor to exit immediately with a clear error message.
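
The exactly-one rule and the numeric ranges can be sketched in plain Python as follows. The Actor itself validates with Pydantic v2; this standard-library dataclass is only an illustration of the same checks, and the ActorInput class name and error messages are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActorInput:
    """Illustrative stand-in for the Actor's input model (not its real code)."""
    feedUri: Optional[str] = None
    creatorHandle: Optional[str] = None
    maxPostsPerFeed: int = 100
    maxFeeds: int = 5
    useProxy: bool = False

    def __post_init__(self) -> None:
        has_feed = bool(self.feedUri and self.feedUri.strip())
        has_creator = bool(self.creatorHandle and self.creatorHandle.strip())
        # XOR guard: both set or both missing is rejected before any request.
        if has_feed == has_creator:
            raise ValueError("Set exactly one of feedUri or creatorHandle")
        if not 1 <= self.maxPostsPerFeed <= 5000:
            raise ValueError("maxPostsPerFeed must be between 1 and 5000")
        if not 1 <= self.maxFeeds <= 50:
            raise ValueError("maxFeeds must be between 1 and 50")
```

Validating at construction time mirrors the fail-fast behaviour: a bad input raises before a single network call is made.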

Single-feed mode example

{
  "feedUri": "at://did:plc:z72i7hdynmk6r22z27h6tvur/app.bsky.feed.generator/whats-hot",
  "maxPostsPerFeed": 100,
  "useProxy": false
}

Creator-discovery mode example

{
  "creatorHandle": "bsky.app",
  "maxPostsPerFeed": 50,
  "maxFeeds": 10,
  "useProxy": false
}

📤 Output

One row per post. Feed metadata is denormalised onto every row so a flat CSV is self-contained.

{
  "feed_uri": "at://did:plc:z72i7hdynmk6r22z27h6tvur/app.bsky.feed.generator/whats-hot",
  "feed_display_name": "Discover",
  "feed_creator_handle": "bsky.app",
  "feed_description": "Trending content from your personal network",
  "post_uri": "at://did:plc:sj5wj7libgr7omqiotenxadx/app.bsky.feed.post/3mlxmr4jyfs2s",
  "post_cid": "bafyreidgimgd7v3g3pazsp5oq7ur6bvedpnwohul26mss7cbffg6bdqjkm",
  "post_indexed_at": "2026-05-16T10:20:40.467Z",
  "post_text": "If you never read the book or saw the movie, you missed one of the greatest Pulitzer Prize winning sagas ever written.",
  "post_lang": "en",
  "post_reply_count": 89,
  "post_repost_count": 414,
  "post_like_count": 1288,
  "post_quote_count": 27,
  "author_did": "did:plc:sj5wj7libgr7omqiotenxadx",
  "author_handle": "louiseplease.bsky.social",
  "author_display_name": "Louise",
  "scraped_at": "2026-05-16T12:00:00+00:00"
}

Optional fields (feed_description, post_lang, author_display_name) are emitted as null when the API does not return them. Rows are never dropped for missing optional fields.

Export formats

After a run completes, click Export in the Apify Console to download:

  • JSON — full fidelity, all fields, as a JSON array (choose JSONL for newline-delimited output)
  • CSV — flat, one row per post, all columns including denormalised feed metadata
  • Excel (.xlsx), via the Apify dataset converter
  • XML — structured per-item

All formats are available via the Apify API: GET /datasets/{id}/items?format=csv&clean=true.
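
A small sketch of building that export URL in Python is shown below. The base URL https://api.apify.com/v2 is the public Apify API root; dataset_items_url is a hypothetical helper name, and private datasets may additionally need an API token, which is not handled here.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build the dataset items export URL for the given format."""
    query = urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

The returned string can be passed to any HTTP client, or pasted into a spreadsheet import function, to download the rows in the chosen format.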

💰 Pricing

Pay-Per-Event (PPE) — you pay only for what lands in your dataset:

Event         Price (USD)   When
actor-start   $0.05         Once per run, at boot
result-row    $0.002        Per post row written to the dataset

Example costs

Posts scraped   Actor starts   Total cost
100             1              $0.25
500             1              $1.05
1,000           1              $2.05
5,000           1              $10.05

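With the maximum feed count and the default per-feed cap (50 feeds × 100 posts = 5,000 rows), a single run costs about $10.05. Raising both caps to their maximums (50 feeds × 5,000 posts = 250,000 rows) puts the ceiling for one run at $500.05. Zero rows written means you pay only the $0.05 start fee.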
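
The arithmetic behind these figures is one start fee per run plus a fixed fee per row. A minimal sketch, using the two prices from the event table above (estimate_cost is a hypothetical helper, and Decimal is used only to avoid floating-point rounding in currency values):

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.05")   # charged once per run
RESULT_ROW_FEE = Decimal("0.002")   # charged per post row written


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """Estimated charge in USD for `rows` posts spread over `runs` runs."""
    if rows < 0 or runs < 1:
        raise ValueError("rows must be >= 0 and runs must be >= 1")
    return ACTOR_START_FEE * runs + RESULT_ROW_FEE * rows
```

For example, estimate_cost(1000) gives 2.05 USD, matching the 1,000-post row in the table, and splitting the same 1,000 posts across two runs adds one more start fee for a total of 2.10 USD.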

This rate is consistent with the companion Actor bluesky-starter-pack so the Bluesky Intel Suite has uniform pricing across both tools.

🚧 Limitations

  • Private or access-restricted feeds are not exposed by the public AppView API — only feeds whose data is visible at public.api.bsky.app can be scraped.
  • Global feed discovery by keyword is not supported — Bluesky's getPopularFeedGenerators endpoint returns MethodNotImplemented on the public AppView. Use creator-discovery mode (creatorHandle) to enumerate one creator's feeds at a time.
  • Post images, embeds, and quoted-post bodies are not extracted — only the plain-text body (post_text) is captured. Image ALT text, external link cards, and quoted-post content are outside the current schema.
  • Reply thread expansion is out of scope — only the top-level post row is emitted. Threaded context (parent / root posts) would require additional getPostThread calls and is not wired in this version.
  • The maxPostsPerFeed cap is client-side — the Actor paginates until it has collected the cap or the API cursor is exhausted. If a feed has fewer posts than the cap, fewer rows are returned. This is expected behaviour, not a failure.
  • Storage retention on FREE Apify plans — run-scoped storage is retained for 7 days. Export your dataset immediately after the run or upgrade to a paid plan for longer retention.
  • Rate limiting — the Actor retries on 429 with exponential back-off and honours Retry-After, but very large scrapes (tens of thousands of rows) may require splitting into multiple runs to stay within per-window limits.
  • AT Protocol is pre-1.0 — Bluesky's endpoint surface has changed without warning in the past. We monitor the AT Protocol Discord #breaking-changes channel and ship patches within days of validated breakage reports.
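
The client-side cap described above can be pictured as a loop that follows the API cursor until either the cap is reached or the cursor runs out. The sketch below is an illustration under assumptions, not the Actor's code: fetch_page is a hypothetical callable standing in for one HTTP request, the response keys feed and cursor are assumed, and the page size of 100 is an assumed per-request limit.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]
# fetch_page(cursor, limit) performs one request and returns one page.
FetchPage = Callable[[Optional[str], int], Page]


def iter_feed_posts(fetch_page: FetchPage, max_posts: int,
                    page_size: int = 100) -> Iterator[dict[str, Any]]:
    """Yield feed items until max_posts is reached or the cursor is exhausted."""
    cursor: Optional[str] = None
    emitted = 0
    while emitted < max_posts:
        # Never request more than is still needed to reach the cap.
        limit = min(page_size, max_posts - emitted)
        page = fetch_page(cursor, limit)
        items = page.get("feed", [])
        for item in items:
            yield item
            emitted += 1
            if emitted >= max_posts:
                return
        cursor = page.get("cursor")
        if not cursor or not items:
            return  # feed exhausted: fewer rows than the cap is expected
```

Injecting fetch_page keeps the loop testable without network access and lets retry logic live inside the callable.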

Tips for best results

  • Use AT URIs when possible. The Actor resolves bsky.app web URLs on the fly (one extra getProfile API call), which adds latency. Pasting the AT URI directly skips this step.
  • Cap maxPostsPerFeed to what you actually need. Feeds like "Discover" can have hundreds of posts; a lower cap keeps cost and runtime predictable.
  • Prefer creator-discovery mode for bulk collection. If you want posts from all feeds by a creator like bsky.app, use creatorHandle: "bsky.app" rather than multiple single-feed runs — the Actor handles pagination for each feed sequentially.
  • Schedule recurring runs to track feed evolution. Set up an Apify Schedule to run this Actor daily or weekly on a specific feed. Use a named dataset (via Apify API datasetName parameter at run time) to accumulate rows across runs.
  • Use the CSV export for spreadsheet workflows. Because feed metadata is denormalised onto every row, no pivot or VLOOKUP is needed — the CSV is immediately usable in Google Sheets or Excel.
  • Combine with bluesky-starter-pack. If you want both the posts from a community feed and the member list of the Starter Pack that drives that community, run both Actors and join on author_handle.

Integrations

This Actor works natively with the Apify platform's built-in connectors:

  • Apify API — trigger runs programmatically, poll for status, and fetch dataset items via REST. Full OpenAPI spec at https://docs.apify.com/api/v2.
  • Webhooks — configure a webhook to POST the run result to your endpoint as soon as the Actor finishes.
  • Apify Schedules — run this Actor on a cron schedule (e.g. daily at 08:00 UTC) to keep a feed dataset fresh.
  • Make (formerly Integromat) — use the Apify Make module to trigger runs and route results to Google Sheets, Airtable, Slack, or anywhere Make connects.
  • Zapier — Apify's Zapier integration triggers on run completion and passes dataset items downstream.
  • n8n — use the HTTP Request node with the Apify REST API for fully self-hosted automation pipelines.

❓ FAQ

Do I need a Bluesky account to use this Actor?

No. The AT Protocol public AppView at public.api.bsky.app/xrpc/ is unauthenticated by design, so every endpoint this Actor calls is open without a login or API key. You run the Actor; we handle the protocol plumbing.

What is a feed URI?

An AT URI like at://did:plc:z72i7hdynmk6r22z27h6tvur/app.bsky.feed.generator/whats-hot. The segment after at:// is a DID — a decentralized identifier. The collection is always app.bsky.feed.generator. The final segment is the rkey that identifies the specific feed. You can also paste a bsky.app web URL — the Actor converts it automatically.
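
The three segments can be pulled apart with a few lines of Python. This is a minimal sketch for the three-part form shown above only; AtUri and parse_at_uri are hypothetical names, and the first segment is called authority here because it may be a DID or a handle.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # usually a DID such as did:plc:...
    collection: str  # app.bsky.feed.generator for feeds
    rkey: str        # record key that identifies the specific feed


def parse_at_uri(uri: str) -> AtUri:
    """Split at://<authority>/<collection>/<rkey> into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority/collection/rkey in {uri!r}")
    return AtUri(*parts)
```

Applied to the example URI above, the parts are did:plc:z72i7hdynmk6r22z27h6tvur, app.bsky.feed.generator, and whats-hot.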

How do I scrape all feeds published by a single creator?

Set the creatorHandle input to the creator's Bluesky handle (e.g. bsky.app) and leave feedUri blank. The Actor calls app.bsky.feed.getActorFeeds and scrapes each feed in turn, up to the maxFeeds cap.

Can I export Bluesky's built-in feeds like "Discover" or "What's Hot"?

Yes. Those are published by the bsky.app account. Use the feed URI at://did:plc:z72i7hdynmk6r22z27h6tvur/app.bsky.feed.generator/whats-hot for "Discover" / "What's Hot", or paste the bsky.app/profile/bsky.app/feed/whats-hot URL. You can also use creator-discovery mode with creatorHandle: "bsky.app" to get all feeds that account publishes.

Is this a good Twitter / X API alternative in 2026?

For public feed data, yes — AT Protocol is explicitly designed as a research-accessible open protocol, and the data model is significantly more predictable than Twitter's current API. That said, we recommend framing your project as "Bluesky-native" rather than "Twitter replacement" — the audiences and content patterns differ.

Is exporting public Bluesky feeds legal?

The AT Protocol is an open, federated protocol. public.api.bsky.app is explicitly unauthenticated and publicly accessible. The Bluesky Terms of Service permit accessing public data programmatically as long as you do not impersonate users or violate AT Protocol data-portability principles. Always verify the current Terms of Service at bsky.social/about/support/tos and your local jurisdiction's data-protection rules before using scraped data commercially.

How do I export to Google Sheets?

After the run finishes, click Export → CSV in the Apify Console and import the file into Google Sheets. Alternatively, use the Apify API URL shown in the run's Output tab to import data directly via =IMPORTDATA("...") in Sheets.

What happens if a feed is empty or the URI is wrong?

The Actor exits with a non-zero status code and a clear status message: "No posts emitted — feed may be empty, private, or the URI invalid." The dataset will have zero rows and the actor-start fee is the only charge.

  • Bluesky Starter Pack Scraper — companion Actor in the Bluesky Intel Suite; exports full member lists from any public Bluesky Starter Pack. Pair with this Actor to cross-reference feed posts with community membership data.

💬 Your feedback

Found a bug, hit a rate limit, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.