rss2masto

v1.1.4 Latest
Published: May 16, 2026 | License: MIT | Imports: 27 | Imported by: 0

README

rss2masto

A Go library for publishing RSS/Atom feed items as Mastodon posts. Designed to handle hundreds or thousands of feeds concurrently, with built-in scheduling, deduplication via Redis, and ETag-based conditional fetching.

Features

  • Concurrent processing of any number of RSS/Atom feeds using goroutines
  • Per-feed scheduler with configurable check interval
  • Deduplication via Redis — each item is posted exactly once
  • ETag / If-None-Match support — unchanged feeds skip parsing entirely
  • HTML sanitization and automatic post truncation to instance character limit
  • Hashtag generation from feed item categories or URL patterns
  • Text replacement rules per feed
  • Post visibility control (public, unlisted, private)
  • Automatic language detection from feed metadata
  • Follower count tracking per Mastodon account
  • Optional state persistence to feed.yaml

Requirements

  • Go 1.25+
  • Redis (used for deduplication and caching) — optional.
    If Redis is unavailable, an in-process TinyLFU cache is used as a fallback. Deduplication will not persist across restarts in that case.

Installation

go get -u github.com/glaydus/rss2masto

Or add the import and run go mod tidy:

import "github.com/glaydus/rss2masto"

Quick start

package main

import (
    "log"
    "time"

    "github.com/glaydus/rss2masto"
)

func main() {
    fm, err := rss2masto.NewFeedsMonitor()
    if err != nil {
        log.Fatalln(err)
    }

    // Run once
    fm.Start()

    // Or run on a ticker (e.g. every minute)
    ticker := time.NewTicker(time.Minute)
    for range ticker.C {
        fm.Start()
    }
}

NewFeedsMonitor reads feed.yaml from the current directory. Each call to Start processes all feeds whose scheduler counter has reached its configured interval.

Configuration — feed.yaml

instance:
  url: https://mastodon.example        # Mastodon instance base URL (HTTPS required)
  lang: en                             # default post language (ISO 639-1)
  timezone: Europe/Warsaw              # timezone for timestamps (IANA format)
  limit:                               # max characters per post; auto-detected from instance if empty
  save: false                          # persist last_run timestamps back to feed.yaml after each run

  feed:
    - name: My Tech Blog               # display name (used as log prefix and idempotency key prefix)
      url: https://example.com/rss     # RSS or Atom feed URL (single URL or a list of fallback URLs)
      token: <MASTODON_API_TOKEN>      # Mastodon access token for this account
      interval: 10                     # check every N scheduler ticks (e.g. 10 = every 10 minutes if ticker is 1 min)
      visibility: public               # public | unlisted | private
      prefix: Tech                     # optional hashtag prefix added to every generated tag
      hashtag:                         # static hashtag always added to every post from this feed
      hashlink:                        # regex with one capture group — extracts hashtag from item URL
      replace_from:                    # regex applied to post description
      replace_to:                      # replacement string (used with replace_from)
      replace_link:                    # regex applied to item link — all matches are removed from the URL

    - name: Another Feed
      url: https://another.example/feed.xml
      token: <ANOTHER_TOKEN>
      interval: 30
      visibility: unlisted

    - name: Feed With Fallbacks
      url:                             # list of URLs — first is primary, rest are fallbacks tried in order
        - https://primary.example/feed.xml
        - https://mirror.example/feed.xml
      token: <YET_ANOTHER_TOKEN>
      interval: 60
      visibility: public

Field reference

| Field | Required | Default | Description |
|---|---|---|---|
| instance.url | yes | | Mastodon instance base URL |
| instance.lang | no | en | Fallback post language |
| instance.timezone | no | UTC | Timezone for display timestamps |
| instance.limit | no | auto | Max post characters; fetched from instance API if not set |
| instance.save | no | false | Write updated last_run values back to feed.yaml |
| feed.name | no | derived from URL host | Feed identifier used in logs and idempotency keys |
| feed.url | yes | | RSS/Atom feed endpoint: a single URL string or a YAML list of URLs; the first is primary, the rest are fallbacks tried in order |
| feed.token | yes | | Mastodon API access token |
| feed.interval | no | 10 | Scheduler ticks between checks |
| feed.visibility | no | private | Mastodon post visibility |
| feed.prefix | no | | Prefix added to each generated hashtag |
| feed.hashtag | no | | Static hashtag always included in every post from this feed |
| feed.hashlink | no | | Regex to extract a hashtag from the item link |
| feed.replace_from | no | | Regex pattern applied to post description |
| feed.replace_to | no | | Replacement string for replace_from matches |
| feed.replace_link | no | | Regex applied to item link; all matches are removed from the URL before posting |
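
To make the three regex-based fields more concrete, here is a minimal sketch of how hashlink, replace_from/replace_to, and replace_link could behave, written with the standard regexp package. The helper names (extractHashtag, replaceDescription, cleanLink) and the sample patterns are hypothetical and are not part of the rss2masto API; the library's actual matching logic may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractHashtag mimics a hashlink rule: it returns the first capture group
// of pattern matched against link, or "" when there is no match.
// Hypothetical helper, not part of rss2masto.
func extractHashtag(pattern, link string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(link)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

// replaceDescription mimics a replace_from / replace_to pair.
func replaceDescription(from, to, text string) string {
	return regexp.MustCompile(from).ReplaceAllString(text, to)
}

// cleanLink mimics replace_link: every match of pattern is removed from link.
func cleanLink(pattern, link string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(link, "")
}

func main() {
	link := "https://example.com/sport/hokej-na-lodzie/article-123?utm_source=rss"

	// hashlink: one capture group selects the URL segment used as the raw hashtag.
	fmt.Println(extractHashtag(`example\.com/sport/([^/]+)/`, link))

	// replace_from / replace_to: strip a trailing teaser from the description.
	fmt.Println(replaceDescription(`\s*Read more.*$`, "", "Big news today. Read more on our site"))

	// replace_link: drop tracking parameters from the item link.
	fmt.Println(cleanLink(`\?utm_[^#]*`, link))
}
```

In this sketch the raw segment hokej-na-lodzie is returned unchanged; turning it into a usable hashtag is the job of the hash dictionary.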

Redis

Redis is used for two purposes:

  1. Deduplication — an idempotency key (<feed_prefix>:<item_hash>) is stored after each successful post. Items already in Redis are skipped on subsequent runs.
  2. Caching — a local TinyLFU cache (backed by go-redis/cache) reduces Redis round-trips for hot keys.

The same idempotency key is also sent to the Mastodon API as the Idempotency-Key request header on every post. This provides a second layer of duplicate protection — if the same request is submitted more than once within 1 hour (e.g. due to a retry), the Mastodon instance will return the original status instead of creating a duplicate.
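
The deduplication flow described above can be sketched as follows. This is an illustration only: the choice of SHA-256 over the item GUID, the helper names (idempotencyKey, seenStore), and the in-memory map standing in for Redis are all assumptions, not the library's confirmed implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// idempotencyKey builds a key of the form "<feed_prefix>:<item_hash>".
// Hashing the GUID with SHA-256 is an assumption made for this sketch.
func idempotencyKey(feedPrefix, guid string) string {
	sum := sha256.Sum256([]byte(guid))
	return feedPrefix + ":" + hex.EncodeToString(sum[:])
}

// seenStore is an in-memory stand-in for Redis (hypothetical type).
type seenStore map[string]struct{}

// seen reports whether key was stored by an earlier successful post.
func (s seenStore) seen(key string) bool {
	_, ok := s[key]
	return ok
}

// remember stores key; call it only after the post succeeded.
func (s seenStore) remember(key string) {
	s[key] = struct{}{}
}

func main() {
	store := seenStore{}
	for _, guid := range []string{"item-1", "item-2", "item-1"} {
		key := idempotencyKey("My Tech Blog", guid)
		if store.seen(key) {
			fmt.Println("skip:", guid)
			continue
		}
		// A real client would send the status here and pass key
		// in the Idempotency-Key request header.
		fmt.Println("post:", guid)
		store.remember(key)
	}
}
```

Because the key is stored only after a successful post, a failed attempt is retried on the next run, and the Idempotency-Key header covers the case where the post succeeded but the key could not be stored.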

The Redis connection is configured via the REDIS_HOST environment variable:

export REDIS_HOST=localhost:6379

The value is passed directly to redis.ParseURL, so full Redis URLs are also accepted:

export REDIS_HOST=redis://:password@localhost:6379/0

Hash dictionary

When hashtags are extracted from item links via hashlink, the raw URL segment is looked up in an optional dictionary before being used as a hashtag. This lets you map slugs that would otherwise be unusable (contain hyphens, lack diacritics, etc.) to proper hashtag forms.

The dictionary is loaded from ./hashdict.txt at startup. The path can be changed by setting rss2masto.HashDictFile before calling NewFeedsMonitor. The dictionary can also be reloaded at runtime without restarting by calling rss2masto.ReloadHashDict(data).

File format

# lines starting with # are comments
krakow=Kraków
hokej-na-lodzie=HokejNaLodzie
zuzel=Żużel

Each line is key=value. The key is the raw string extracted from the URL; the value is the hashtag that will be used instead. If a key is not found in the dictionary, the original string is used (with title-casing applied).
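
As an illustration of the format, the following sketch parses key=value lines, skips comments, and falls back to the original string when a key is missing. The function names (parseHashDict, lookup) are hypothetical, and the fallback here only upper-cases the first letter, which is a simplification of whatever title-casing the library applies.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// parseHashDict reads key=value lines; blank lines and lines starting
// with '#' are ignored. Hypothetical helper, not part of rss2masto.
func parseHashDict(data string) map[string]string {
	dict := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a key=value line
		}
		dict[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return dict
}

// lookup returns the dictionary value for raw, or raw with its first
// letter upper-cased when the key is not present (simplified title-casing).
func lookup(dict map[string]string, raw string) string {
	if v, ok := dict[raw]; ok {
		return v
	}
	r, size := utf8.DecodeRuneInString(raw)
	if size == 0 {
		return raw
	}
	return string(unicode.ToUpper(r)) + raw[size:]
}

func main() {
	dict := parseHashDict("# comment\nkrakow=Kraków\nhokej-na-lodzie=HokejNaLodzie\n")
	fmt.Println(lookup(dict, "krakow"))
	fmt.Println(lookup(dict, "hokej-na-lodzie"))
	fmt.Println(lookup(dict, "sport"))
}
```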

Runtime reload

data, err := os.ReadFile("hashdict.txt")
if err == nil {
    rss2masto.ReloadHashDict(data)
}

Passing nil re-reads the file at HashDictFile:

rss2masto.ReloadHashDict(nil)

Scheduler

Start() is designed to be called repeatedly on a fixed ticker. Each feed has an interval field that acts as a divisor — a feed with interval: 10 is only processed on every 10th call to Start(). This lets you run a single tight ticker (e.g. every minute) while checking different feeds at different frequencies without managing multiple goroutines externally.

ticker: 1 minute
feed A interval: 5  → checked every  5 minutes
feed B interval: 15 → checked every 15 minutes
feed C interval: 60 → checked every  1 hour

Start() is safe to call concurrently — a built-in atomic guard prevents overlapping runs.
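
The tick-counter idea and the overlap guard can be sketched with the standard library alone. The types below (feed, monitor) and the start method are hypothetical stand-ins and only model the scheduling decision; they make no claim about how the library stores its counters internally.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// feed holds only what the scheduling decision needs (hypothetical type).
type feed struct {
	name     string
	interval int64 // process on every interval-th tick
	counter  int64 // ticks since the feed was last processed
}

// monitor models the guard plus the per-feed counters (hypothetical type).
type monitor struct {
	running atomic.Bool
	feeds   []*feed
}

// start returns the names of the feeds that are due on this tick.
// If another run is still in progress it returns immediately.
func (m *monitor) start() []string {
	if !m.running.CompareAndSwap(false, true) {
		return nil // overlapping call: do nothing
	}
	defer m.running.Store(false)

	var due []string
	for _, f := range m.feeds {
		f.counter++
		if f.counter >= f.interval {
			f.counter = 0
			due = append(due, f.name)
		}
	}
	return due
}

func main() {
	m := &monitor{feeds: []*feed{
		{name: "A", interval: 2},
		{name: "B", interval: 3},
	}}
	for tick := 1; tick <= 6; tick++ {
		fmt.Println("tick", tick, "due:", m.start())
	}
}
```

With these intervals, feed A is due on ticks 2, 4 and 6 and feed B on ticks 3 and 6, which mirrors the divisor behaviour described above.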

Scaling

The library is designed to handle large numbers of feeds efficiently:

  • All feeds within a single Start() call are processed in parallel via goroutines.
  • HTTP fetching uses fasthttp with connection pooling and DNS caching.
  • ETag support means unchanged feeds generate zero parsing overhead.
  • Redis connection pool is pre-configured for high concurrency (20 connections, 5 idle minimum).

There is no hard limit on the number of feeds. Practical limits depend on available Redis connections, network bandwidth, and Mastodon API rate limits per token.
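
To show what ETag-based conditional fetching means in practice, here is a small sketch using net/http and an in-process test server. The library itself uses fasthttp, so this is a stand-in for the technique, not the library's code; fetch and newTestServer are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch performs a conditional GET. When the server answers 304 Not Modified
// it returns changed == false and the caller can skip parsing entirely.
func fetch(client *http.Client, url, etag string) (body []byte, newETag string, changed bool, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, etag, false, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, etag, false, err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		return nil, etag, false, nil // unchanged: nothing to parse
	}
	body, err = io.ReadAll(resp.Body)
	if err != nil {
		return nil, etag, false, err
	}
	return body, resp.Header.Get("ETag"), true, nil
}

// newTestServer serves a fixed document with a fixed ETag and honours
// If-None-Match, so the example runs without network access.
func newTestServer() *httptest.Server {
	const tag = `"v1"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("ETag", tag)
		io.WriteString(w, "<rss></rss>")
	}))
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	_, etag, changed, err := fetch(srv.Client(), srv.URL, "")
	fmt.Println("first fetch:", changed, etag, err)

	_, _, changed, err = fetch(srv.Client(), srv.URL, etag)
	fmt.Println("second fetch:", changed, err)
}
```

The second request carries the stored ETag, so the server replies with 304 and no body is read or parsed.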

Environment variables

| Variable | Description |
|---|---|
| REDIS_HOST | Redis address or URL (required in production) |

License

MIT

Documentation

Constants

const DefaultCharacterLimit = 500 // default mastodon max character limit
const DefaultCheckInterval = 10 // default check feed interval in minutes
const DefaultUserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36"

Variables

var HashDictFile = "./hashdict.txt"

HashDictFile is the path to the external hash dictionary file used for hashtag translation. It maps raw strings extracted from item links (via HashLink regex) to their desired hashtag forms. Each line must be in the format: key=value (e.g. "krakow=Kraków" or "hokej-na-lodzie=HokejNaLodzie"). Lines starting with '#' are treated as comments and ignored. The file is loaded once at startup via init() and can be reloaded at runtime with ReloadHashDict.

Functions

func ReloadHashDict added in v1.1.2

func ReloadHashDict(data []byte)

ReloadHashDict replaces the hash dictionary with the provided data. The data format is the same as the file format: key=value lines, with '#' as comment prefix. If data is nil, the file at HashDictFile is read. Calling ReloadHashDict is safe for concurrent use — reads are always lock-free. Typical use: reload after updating hashdict.txt at runtime without restarting.
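
One common way to get lock-free reads with safe replacement is to publish an immutable map through an atomic pointer. The sketch below shows that pattern under the assumption that the dictionary is swapped as a whole; it is not taken from the library source, and reload and translate are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dict points at the current, never-mutated dictionary snapshot.
var dict atomic.Pointer[map[string]string]

// reload copies entries into a fresh map and publishes it atomically.
// Readers holding the old snapshot keep using it safely.
func reload(entries map[string]string) {
	fresh := make(map[string]string, len(entries))
	for k, v := range entries {
		fresh[k] = v
	}
	dict.Store(&fresh)
}

// translate reads from the current snapshot without taking a lock.
func translate(key string) (string, bool) {
	m := dict.Load()
	if m == nil {
		return "", false // nothing loaded yet
	}
	v, ok := (*m)[key]
	return v, ok
}

func main() {
	reload(map[string]string{"krakow": "Kraków"})
	fmt.Println(translate("krakow"))

	reload(map[string]string{"zuzel": "Żużel"})
	fmt.Println(translate("krakow"))
	fmt.Println(translate("zuzel"))
}
```

Because a published map is never modified, concurrent readers need no synchronization beyond the atomic load.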

func ViewHashDict added in v1.1.2

func ViewHashDict() []byte

ViewHashDict returns a copy of the current dictionary, in file format. The result is sorted by key.

Types

type CacheClient added in v1.0.6

type CacheClient struct {
	// contains filtered or unexported fields
}

CacheClient is a wrapper around the redis client and the cache library

var (
	Cache *CacheClient = newCache()
)

Cache is the global cache client

func (*CacheClient) Close added in v1.1.0

func (c *CacheClient) Close()

Close closes the redis connection

func (*CacheClient) Delete added in v1.1.0

func (c *CacheClient) Delete(key string) error

Delete deletes a value from the cache for the given key

func (*CacheClient) Exists added in v1.0.6

func (c *CacheClient) Exists(key string) bool

Exists checks if a key exists in redis

func (*CacheClient) Get added in v1.0.6

func (c *CacheClient) Get(key string) (string, error)

Get gets a value from redis

func (*CacheClient) GetBytes added in v1.0.6

func (c *CacheClient) GetBytes(key string) ([]byte, error)

GetBytes gets a value from redis as bytes

func (*CacheClient) GetEx added in v1.0.6

func (c *CacheClient) GetEx(key string, expiration time.Duration) (string, error)

GetEx gets a value from redis with an expiration

func (*CacheClient) GetKeys added in v1.0.6

func (c *CacheClient) GetKeys(keyPattern string, count ...int64) ([]string, error)

GetKeys gets all keys matching a pattern

func (*CacheClient) Load added in v1.1.0

func (c *CacheClient) Load(key string, value any) error

Load retrieves a value from the cache for the given key and stores it in the value interface

func (*CacheClient) MGet added in v1.0.10

func (c *CacheClient) MGet(keys []string) ([]any, error)

MGet gets multiple values from redis

func (*CacheClient) PoolStats added in v1.1.0

func (c *CacheClient) PoolStats() *redis.PoolStats

PoolStats returns the redis connection pool statistics

func (*CacheClient) Save added in v1.1.0

func (c *CacheClient) Save(key string, value any) error

Save saves a value in the cache with the given key and a 2 year TTL

func (*CacheClient) Set added in v1.0.6

func (c *CacheClient) Set(key string, value any, expiration time.Duration) error

Set sets a key-value pair in redis

func (*CacheClient) Stats added in v1.1.0

func (c *CacheClient) Stats() *cache.Stats

Stats returns the cache statistics

func (*CacheClient) Store added in v1.1.0

func (c *CacheClient) Store(key string, value any) error

Store saves a value in the cache with the given key and a 7 day TTL

func (*CacheClient) ValueExists added in v1.1.0

func (c *CacheClient) ValueExists(key string) bool

ValueExists checks if a value exists in the cache for the given key

func (*CacheClient) ZAdd added in v1.0.10

func (c *CacheClient) ZAdd(key string, members []redis.Z) error

ZAdd adds members to a sorted set stored at key, creating the sorted set if it doesn't exist

func (*CacheClient) ZRange added in v1.0.11

func (c *CacheClient) ZRange(key string, start, stop int64) ([]string, error)

ZRange returns the elements of the sorted set stored at key in the range from start to stop (inclusive)

func (*CacheClient) ZRevRange added in v1.0.10

func (c *CacheClient) ZRevRange(key string, start, stop int64) ([]string, error)

ZRevRange returns the elements of the sorted set stored at key in reverse order with a score between start and stop (inclusive)

type Feed

type Feed struct {
	Name        string       `yaml:"name"`
	URLs        FeedURLs     `yaml:"url"`
	Token       string       `yaml:"token"`
	Prefix      string       `yaml:"prefix,omitempty"`
	Visibility  string       `yaml:"visibility,omitempty"`
	HashLink    string       `yaml:"hashlink,omitempty"`
	HashTag     string       `yaml:"hashtag,omitempty"`
	ReplaceFrom string       `yaml:"replace_from,omitempty"`
	ReplaceTo   string       `yaml:"replace_to,omitempty"`
	ReplaceLink string       `yaml:"replace_link,omitempty"`
	Interval    int64        `yaml:"interval,omitempty"`
	LastRun     int64        `yaml:"last_run,omitempty"`
	Count       int64        `yaml:"-"`
	Id          int64        `yaml:"-"`
	Language    string       `yaml:"-"`
	SendTime    time.Time    `yaml:"-"`
	Followers   atomic.Int64 `yaml:"-"`
	// contains filtered or unexported fields
}

Feed holds the configuration and state for a single RSS feed. The struct includes fields for:

  • Name: feed identifier/name
  • URLs: RSS feed endpoint(s); the first is primary, the rest are fallbacks
  • Token: Mastodon API access token
  • Prefix: optional text to prepend to posts
  • Visibility: post visibility level (public, unlisted, private)
  • HashLink: regex to extract a hashtag from the item link
  • HashTag: static hashtag always added to every post from this feed
  • ReplaceFrom/ReplaceTo: regex-based text replacement applied to post description
  • ReplaceLink: regex applied to item link; all matches are removed from the URL before posting
  • Interval: check interval in scheduler ticks
  • LastRun: Unix timestamp of last processed item
  • Count: number of items posted
  • Id: Mastodon account ID
  • Language: default language for posts, taken from the Mastodon profile
  • SendTime: time when the last post was sent
  • Followers: follower count, stored atomically for concurrent access
  • shedCounter: scheduler tick counter used to decide when the feed is due (unexported)
  • etag: HTTP ETag for conditional requests (unexported)

func NewTestFeed added in v1.1.0

func NewTestFeed(name, url string) *Feed

NewTestFeed creates a new feed with default values. It is intended for testing and development only.

func (*Feed) ETag added in v1.1.0

func (f *Feed) ETag() []byte

ETag returns the current ETag for the feed.

func (*Feed) EmptyEtag added in v1.1.4

func (f *Feed) EmptyEtag()

EmptyEtag initialises the etag to an empty slice. Must be called after URLs are set.

func (*Feed) SetETag added in v1.1.4

func (f *Feed) SetETag(etag []byte)

SetETag stores a new ETag for the feed.

func (*Feed) URL added in v1.1.0

func (f *Feed) URL() string

URL returns the primary (first) feed URL for convenience.

type FeedURLs added in v1.1.4

type FeedURLs []string

FeedURLs holds one or more RSS feed URLs with YAML unmarshaling support for both a single string ("url: https://...") and a list ("url:\n - https://..."). The first URL is the primary; subsequent URLs are used as fallbacks in order.

func (FeedURLs) MarshalYAML added in v1.1.4

func (u FeedURLs) MarshalYAML() (any, error)

MarshalYAML implements yaml.Marshaler so that a single-element FeedURLs is serialised back as a plain scalar (preserving the original YAML format).

func (*FeedURLs) UnmarshalYAML added in v1.1.4

func (u *FeedURLs) UnmarshalYAML(value *yaml.Node) error

UnmarshalYAML implements yaml.Unmarshaler so that both scalar and sequence YAML values are accepted for the "url" field.

type FeedsMonitor

type FeedsMonitor struct {
	// Instance holds the Mastodon instance configuration and list of feeds to monitor
	// The struct includes fields for:
	// - URL: Mastodon instance URL
	// - Lang: default language for posts
	// - Limit: maximum characters per post
	// - TimeZone: timezone for date formatting
	// - Save: whether to save state to disk
	// - Monit: last monitoring run timestamp
	// - Feeds: list of feeds to monitor
	Instance struct {
		URL      string  `yaml:"url"`
		Lang     string  `yaml:"lang"`
		Limit    int     `yaml:"limit"`
		TimeZone string  `yaml:"timezone"`
		Save     bool    `yaml:"save,omitempty"`
		Monit    int64   `yaml:"last_monit,omitempty"`
		Feeds    []*Feed `yaml:"feed"`
	} `yaml:"instance"`

	Parser *Parser
	// contains filtered or unexported fields
}

FeedsMonitor holds the configuration and state for monitoring multiple RSS feeds

func NewFeedsMonitor

func NewFeedsMonitor() (*FeedsMonitor, error)

NewFeedsMonitor creates and initializes a new FeedsMonitor instance by:

  • Loading and parsing the feed configuration from the YAML file
  • Setting up monitoring timestamps and intervals
  • Configuring timezone and language settings
  • Setting character limits and feed IDs
  • Initializing default values for all feeds

func (*FeedsMonitor) FeedIndex added in v1.0.6

func (fm *FeedsMonitor) FeedIndex(name string) int

FeedIndex returns the index of the feed with the given name prefix, or -1 if not found

func (*FeedsMonitor) GetFeed added in v1.1.0

func (fm *FeedsMonitor) GetFeed(f *Feed)

GetFeed retrieves and processes items from a feed. For each item in the feed, it:

  • Checks if the item is within time limits
  • Generates an idempotency key based on the item GUID
  • Skips the item if it was already processed
  • Sanitizes title and description
  • Applies replacement rules if configured
  • Constructs a message with title, description, hashtags and link
  • Sends the post to the Mastodon instance
  • Updates counters and timestamps

func (*FeedsMonitor) GetFromInstance added in v1.1.0

func (fm *FeedsMonitor) GetFromInstance(endpoint string, token ...string) ([]byte, error)

GetFromInstance performs a GET request to the specified endpoint on the Mastodon instance. An optional token can be passed for authentication.

func (*FeedsMonitor) LastCheck added in v1.0.7

func (fm *FeedsMonitor) LastCheck() int64

LastCheck returns the Unix timestamp of the last check

func (*FeedsMonitor) LastCheckStr added in v1.0.7

func (fm *FeedsMonitor) LastCheckStr() string

LastCheckStr returns the formatted date/time string of the last check

func (*FeedsMonitor) LastMonit added in v1.0.6

func (fm *FeedsMonitor) LastMonit() int64

LastMonit returns the Unix timestamp of the last monitoring run

func (*FeedsMonitor) Location added in v1.0.6

func (fm *FeedsMonitor) Location() *time.Location

Location returns the timezone location used for time formatting

func (*FeedsMonitor) PostToInstance added in v1.1.0

func (fm *FeedsMonitor) PostToInstance(req *fasthttp.Request) error

PostToInstance performs a POST request to the Mastodon instance's API endpoint for creating statuses.

func (*FeedsMonitor) SaveFeedsData

func (fm *FeedsMonitor) SaveFeedsData() error

SaveFeedsData saves the current feed monitoring state to the config file

func (*FeedsMonitor) Start

func (fm *FeedsMonitor) Start()

Start processes all feeds in parallel using goroutines. For each feed with a valid URL and token, it:

  • Increments the scheduler counter (shedCounter)
  • When shedCounter reaches the interval, resets the counter and processes the feed
  • Updates the last check timestamp
  • Saves feed data if configured

func (*FeedsMonitor) UpdateFollowers added in v1.0.6

func (fm *FeedsMonitor) UpdateFollowers()

UpdateFollowers concurrently updates the follower counts for all feeds

type MastodonPost added in v1.1.0

type MastodonPost struct {
	Status     string `json:"status"`
	Visibility string `json:"visibility"`
	Language   string `json:"language,omitempty"`
}

MastodonPost holds the data needed to post to Mastodon. The struct is marshaled to JSON as the request body for the Mastodon API.
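
For reference, this is how the struct shown above serializes with encoding/json. The struct is redeclared locally so the example is self-contained, and encodePost is a hypothetical helper rather than a library function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MastodonPost is copied from the type definition above so that
// this example compiles on its own.
type MastodonPost struct {
	Status     string `json:"status"`
	Visibility string `json:"visibility"`
	Language   string `json:"language,omitempty"`
}

// encodePost marshals a post to its JSON request body (hypothetical helper).
func encodePost(p MastodonPost) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	// Language is empty, so omitempty drops it from the output.
	body, err := encodePost(MastodonPost{Status: "Hello #Go", Visibility: "unlisted"})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(body) // {"status":"Hello #Go","visibility":"unlisted"}

	body, _ = encodePost(MastodonPost{Status: "Cześć", Visibility: "public", Language: "pl"})
	fmt.Println(body) // {"status":"Cześć","visibility":"public","language":"pl"}
}
```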

type Parser added in v1.1.0

type Parser struct {
	Client httpClient
	// contains filtered or unexported fields
}

Parser wraps gofeed.Parser with an HTTP client and sync.Pool for efficient reuse

func NewParser added in v1.1.0

func NewParser(c httpClient) *Parser

NewParser creates a new RSS parser with an optional custom HTTP client. If no client is provided, a default fasthttp.Client is created with:

  • 1MB max response size
  • 15s read/write timeouts
  • 4096 concurrency limit
  • 1-hour DNS cache duration

func (*Parser) FetchAndParse added in v1.1.0

func (p *Parser) FetchAndParse(f *Feed) *gofeed.Feed

FetchAndParse fetches and parses a feed, trying each URL in order. The first URL is the primary; subsequent URLs are used as fallbacks. Returns a parsed feed or nil if all URLs fail.
