walrusds

Version: v0.1.0
Published: Jun 9, 2026 License: MIT Imports: 19 Imported by: 0

README

Walrus Datastore for Kubo (walrusds)

An implementation of the IPFS/Kubo datastore interface backed by Walrus (a Sui-based decentralized blob store), using a shared Postgres database as the durable key -> blobId index.

It is derived from go-ds-s3 and keeps the same plugin shape, so it installs the same way.

NOTE: Plugins only work on Linux and macOS at the moment. See https://github.com/golang/go/issues/19282

Why this design

Walrus is content-addressed (blobs are addressed by a content-derived blob ID, not an arbitrary key), exposes no list/query API, and treats blobs as immutable with a finite, epoch-based lifetime. A datastore therefore needs an external index. We keep that index in Postgres so it is:

  • Shared — separate upload and retrieval nodes point at the same database and see the same mapping.
  • Durable — it does not live on the node's local disk, so disk failure does not lose data.
  • Recoverable — enable Postgres Point-in-Time Recovery (PITR) for accidental-delete protection.

Has, GetSize, and Query are answered entirely from Postgres (no Walrus round-trip); only Get fetches bytes from the Walrus aggregator.

Kubo ──ds.Datastore──▶ walrusds ──┬── bytes ───────────▶ Walrus publisher / aggregator
                                   └── key→blobId+meta ─▶ Postgres (walrus_index table)

Index schema

Created automatically on first start:

CREATE TABLE walrus_index (
  key         TEXT PRIMARY KEY,   -- ds.Key string, e.g. "/blocks/CIQ..."
  blob_id     TEXT NOT NULL,      -- Walrus blob ID (shared across packed blocks)
  blob_offset BIGINT NOT NULL DEFAULT 0, -- byte offset of this block within the blob
  size        BIGINT NOT NULL,    -- block length; the block is blob[offset : offset+size]
  deletable   BOOLEAN NOT NULL DEFAULT FALSE,
  end_epoch   BIGINT NOT NULL DEFAULT 0,
  expires_at  TIMESTAMPTZ,        -- used by the renewal worker
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

Block packing. To amortize Walrus's per-blob cost (Sui gas + WAL minimums + erasure-coding overhead), Batch.Commit concatenates many IPFS blocks into one Walrus blob (a "packfile") up to packTargetSizeBytes (default 64 MiB), so several key rows share a blob_id and are distinguished by blob_offset/size. Get reads only its slice via an HTTP Range request (falling back to a cached whole-blob slice if the aggregator ignores Range). Existing repos upgrade transparently: the blob_offset column is added with default 0, so legacy one-blob-per-block rows keep working unchanged.
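The packing step described above can be sketched as a greedy concatenation that records each block's offset and size. This is a minimal illustration, not the plugin's actual code: the block, entry, and pack types and the packBlocks function are hypothetical names, and the real Batch.Commit may order or split blocks differently.

```go
package main

import "fmt"

// block is one IPFS block queued in a batch (hypothetical type for this sketch).
type block struct {
	Key  string
	Data []byte
}

// entry mirrors the blob_offset and size columns for one key inside a pack.
type entry struct {
	Key    string
	Offset int64
	Size   int64
}

// pack is one concatenated payload destined for a single Walrus blob.
type pack struct {
	Payload []byte
	Entries []entry
}

// packBlocks concatenates blocks in order and starts a new pack whenever
// adding the next block would exceed target. A block larger than target
// ends up in a pack of its own.
func packBlocks(blocks []block, target int64) []pack {
	var packs []pack
	var cur pack
	for _, b := range blocks {
		size := int64(len(b.Data))
		if len(cur.Entries) > 0 && int64(len(cur.Payload))+size > target {
			packs = append(packs, cur)
			cur = pack{}
		}
		cur.Entries = append(cur.Entries, entry{Key: b.Key, Offset: int64(len(cur.Payload)), Size: size})
		cur.Payload = append(cur.Payload, b.Data...)
	}
	if len(cur.Entries) > 0 {
		packs = append(packs, cur)
	}
	return packs
}

func main() {
	blocks := []block{
		{Key: "/blocks/A", Data: []byte("aaaa")},
		{Key: "/blocks/B", Data: []byte("bbb")},
		{Key: "/blocks/C", Data: []byte("cccccc")},
	}
	for i, p := range packBlocks(blocks, 8) {
		for _, e := range p.Entries {
			// Each block is recovered as payload[offset : offset+size].
			fmt.Printf("pack %d: %s offset=%d size=%d data=%q\n",
				i, e.Key, e.Offset, e.Size, p.Payload[e.Offset:e.Offset+e.Size])
		}
	}
}
```

With a target of 8 bytes, the first two blocks share one pack and the third starts a new one, so the index would hold three rows pointing at two blob IDs.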

packTargetSizeBytes is a ceiling, not a floor: a small file uploads immediately as a smaller blob and never waits to fill. The plugin can only pack the blocks delivered in one Batch.Commit, which Kubo bounds by Import.BatchMaxSize (default ~8 MiB on Kubo <0.33; configurable on v0.33+, where it is additionally divided by runtime.NumCPU()). To get large packs, set Import.BatchMaxSize ≈ packTargetSizeBytes × NumCPU and raise Import.BatchMaxNodes. This pairs well with raising the IPFS chunk size to 1 MiB (the maximum interoperable block size; clean on Kubo v0.40+). To measure the realized packing ratio, run SELECT count(*), count(DISTINCT blob_id) FROM walrus_index; and compare the two counts.
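As a worked example of the sizing rule above, the following sketch computes a candidate Import.BatchMaxSize for the machine it runs on. The helper name batchMaxSizeFor is hypothetical, and the result is only meaningful if it runs on the same host (and CPU count) as the Kubo daemon.

```go
package main

import (
	"fmt"
	"runtime"
)

// batchMaxSizeFor returns packTarget multiplied by the CPU count, so that
// roughly packTarget bytes remain per batch after the division by
// runtime.NumCPU() described above.
func batchMaxSizeFor(packTarget int64, numCPU int) int64 {
	if numCPU < 1 {
		numCPU = 1
	}
	return packTarget * int64(numCPU)
}

func main() {
	const packTarget = 64 << 20 // packTargetSizeBytes default (64 MiB)
	fmt.Println(batchMaxSizeFor(packTarget, runtime.NumCPU()))
}
```

On an 8-core host this gives 536870912 bytes (512 MiB).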

Building and Installing

This plugin is not pinned to a single Kubo release. The go.mod carries a baseline version, but the build is retargeted to whatever Kubo you point it at — so pick the KUBO_VERSION you need and the tooling aligns the dependency graph to match.

Build the plugin with the exact Go version used to build your Kubo binary, against the matching Kubo version. Substitute the tag you want for ${KUBO_VERSION} below (and use the Go toolchain that Kubo's own go.mod requires for that tag — newer Kubo lines need newer Go):

ARG KUBO_VERSION=v0.30.0

RUN git clone https://github.com/ipfs/kubo && \
    cd kubo && \
    git checkout ${KUBO_VERSION} && \
    go get github.com/lighthouse-web3/go-ds-s3-walrus/plugin@latest

RUN cd kubo && \
    echo "\nwalrusds github.com/lighthouse-web3/go-ds-s3-walrus/plugin 0" >> plugin/loader/preload_list && \
    go mod edit -require=github.com/lighthouse-web3/go-ds-s3-walrus@v0.0.0 && \
    go mod tidy && \
    make build && \
    cp cmd/ipfs/ipfs /usr/local/bin/ipfs

Notes:

  • The preload name token (walrusds) is just a label; the import path is what matters.
  • Kubo's go.mod must both require the plugin module and resolve it (through go get or a replace directive); the explicit go mod edit -require=...@v0.0.0 followed by go mod tidy avoids the "is replaced but not required" build error.
  • Pure Go (Postgres driver lib/pq); no CGO required.

To build/install the .so locally instead: make install (drops walrusplugin.so into $IPFS_PATH/plugins/go-ds-s3-walrus.so). Retarget the Kubo version with the IPFS_VERSION variable, which rewrites go.mod/go.sum to that release via set-target.sh:

make install IPFS_VERSION=v0.30.0       # build against a published Kubo tag
make install IPFS_VERSION=/path/to/kubo # build against a local Kubo checkout

Provisioning Postgres

CREATE DATABASE walrusidx;
CREATE USER ipfs WITH PASSWORD 'CHANGE_ME';
GRANT ALL PRIVILEGES ON DATABASE walrusidx TO ipfs;
-- the walrus_index table is created automatically on first run

Recommended:

  • Turn on PITR / continuous backups — this is what protects you from accidental deletes.
  • Use TLS (sslmode=require) if Postgres is reachable over a network.

Connection string (standard database/sql + lib/pq):

postgres://ipfs:CHANGE_ME@db-host:5432/walrusidx?sslmode=require
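If the password contains characters such as @, / or :, it has to be percent-encoded inside the URL. One way to avoid hand-escaping is to assemble the string with Go's net/url package, as in this small sketch; the postgresURL helper is a hypothetical name and is not part of the plugin.

```go
package main

import (
	"fmt"
	"net/url"
)

// postgresURL assembles a lib/pq-style connection URL and lets net/url
// escape the user name and password where needed.
func postgresURL(user, password, hostPort, db, sslmode string) string {
	q := url.Values{}
	q.Set("sslmode", sslmode)
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     hostPort,
		Path:     "/" + db,
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(postgresURL("ipfs", "CHANGE_ME", "db-host:5432", "walrusidx", "require"))
}
```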

The postgresURL (with the password) is not written into the repo datastore_spec; only publisherURL, aggregatorURL, and table are used as the disk fingerprint.

Configuration

In $IPFS_PATH/config, set the /blocks mount to walrusds:

{
  "Datastore": {
    "Spec": {
      "mounts": [
        {
          "child": {
            "type": "walrusds",
            "publisherURL": "https://publisher.walrus-testnet.walrus.space",
            "aggregatorURL": "https://aggregator.walrus-testnet.walrus.space",
            "postgresURL": "postgres://ipfs:CHANGE_ME@127.0.0.1:5432/walrusidx?sslmode=disable",
            "table": "walrus_index",
            "epochs": 53,
            "deletable": false,
            "workers": 16
          },
          "mountpoint": "/blocks",
          "prefix": "walrus.datastore",
          "type": "measure"
        },
        {
          "child": { "type": "levelds", "path": "datastore", "compression": "none" },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    }
  }
}

Matching $IPFS_PATH/datastore_spec (brand-new repo only; do not do this on a repo with existing data):

{"mounts":[{"aggregatorURL":"https://aggregator.walrus-testnet.walrus.space","mountpoint":"/blocks","publisherURL":"https://publisher.walrus-testnet.walrus.space","table":"walrus_index"},{"mountpoint":"/","path":"datastore","type":"levelds"}],"type":"mount"}

Multiple nodes (e.g. an upload node and a retrieval node) share data when all of them are configured with the same postgresURL, publisherURL, and aggregatorURL.

Setting the config from the CLI / Dockerfile

If you configure the node from a Dockerfile or script (like the S3 plugin's ipfs config --json Datastore.Spec ... approach), use these two commands after ipfs init. They assume the following build-time variables are available:

ARG WALRUS_PUBLISHER_URL=https://publisher.walrus-mainnet.walrus.space
ARG WALRUS_AGGREGATOR_URL=https://aggregator.walrus-mainnet.walrus.space
ARG WALRUS_POSTGRES_URL=postgres://ipfs:CHANGE_ME@db-host:5432/walrusidx?sslmode=require
ENV IPFS_PATH=/data/ipfs

Set Datastore.Spec (the live config):

RUN ipfs config --json Datastore.Spec "{\"mounts\":[{\"child\":{\"type\":\"walrusds\",\"publisherURL\":\"${WALRUS_PUBLISHER_URL}\",\"aggregatorURL\":\"${WALRUS_AGGREGATOR_URL}\",\"postgresURL\":\"${WALRUS_POSTGRES_URL}\",\"table\":\"walrus_index\",\"epochs\":53},\"mountpoint\":\"/blocks\",\"prefix\":\"walrus.datastore\",\"type\":\"measure\"},{\"child\":{\"compression\":\"none\",\"path\":\"datastore\",\"type\":\"levelds\"},\"mountpoint\":\"/\",\"prefix\":\"leveldb.datastore\",\"type\":\"measure\"}],\"type\":\"mount\"}"

Overwrite datastore_spec (the on-disk fingerprint). Only on a brand-new repo with no data — overwriting this on a populated repo orphans existing blocks:

RUN echo "{\"mounts\":[{\"aggregatorURL\":\"${WALRUS_AGGREGATOR_URL}\",\"mountpoint\":\"/blocks\",\"publisherURL\":\"${WALRUS_PUBLISHER_URL}\",\"table\":\"walrus_index\"},{\"mountpoint\":\"/\",\"path\":\"datastore\",\"type\":\"levelds\"}],\"type\":\"mount\"}" > $IPFS_PATH/datastore_spec

Critical: the datastore_spec entry for /blocks must contain exactly the datastore's DiskSpec keys — publisherURL, aggregatorURL, and table — and must not include postgresURL (it carries credentials and is deliberately excluded from the fingerprint). If the spec doesn't match what the plugin computes, Kubo refuses to start with a "datastore configuration does not match what is on disk" error.

Key order matters. Kubo computes the expected spec by JSON-marshaling a Go map, which always emits keys in alphabetical order, and compares it against the raw bytes of this file. So every object's keys must be alphabetized: the /blocks mount is aggregatorURL, mountpoint, publisherURL, table (note mountpoint is injected by the mount wrapper and sorts in), and the root mount is mountpoint, path, type. The simplest way to avoid mistakes is to let ipfs init generate datastore_spec for you and only hand-write it when scripting a brand-new repo (as above), copying the exact string Kubo reports as the expected value in any mismatch error.
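Because encoding/json sorts map keys when marshaling, one way to produce a correctly ordered datastore_spec for a brand-new repo is to build it as nested Go maps and marshal it. The sketch below is an illustration under that assumption; datastoreSpec is a hypothetical helper, and the string Kubo reports in a mismatch error remains the authoritative expected value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// datastoreSpec builds the on-disk fingerprint from the DiskSpec keys only.
// postgresURL is deliberately absent because it is not part of the fingerprint.
func datastoreSpec(publisherURL, aggregatorURL, table string) (string, error) {
	spec := map[string]interface{}{
		"type": "mount",
		"mounts": []interface{}{
			map[string]interface{}{
				"mountpoint":    "/blocks",
				"publisherURL":  publisherURL,
				"aggregatorURL": aggregatorURL,
				"table":         table,
			},
			map[string]interface{}{
				"mountpoint": "/",
				"path":       "datastore",
				"type":       "levelds",
			},
		},
	}
	// json.Marshal emits map keys in sorted order, whatever order they
	// were written in above.
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := datastoreSpec(
		"https://publisher.walrus-testnet.walrus.space",
		"https://aggregator.walrus-testnet.walrus.space",
		"walrus_index",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Note that fmt.Println appends a newline; when writing the file, emit the string exactly as Kubo expects it.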

Notes:

  • Run these after ipfs init and with IPFS_PATH set.
  • Build-time ARG/ENV values are baked into the image. To avoid baking the Postgres password (and to keep epochs/endpoints flexible), prefer injecting these at container start instead — e.g. an entrypoint script that runs the same ipfs config command using runtime environment variables before ipfs daemon.

Config keys

| Key | Required | Default | Description |
|---|---|---|---|
| publisherURL | yes | | Walrus publisher (write) base URL(s). Comma-separated for failover. |
| aggregatorURL | yes | | Walrus aggregator (read) base URL(s). Comma-separated for failover. |
| postgresURL | yes | | database/sql connection string for the shared index. |
| table | no | walrus_index | Index table name. |
| epochs | no | 1 | Storage epochs to purchase per blob. Set this high (see below). |
| deletable | no | false | Register blobs as deletable on Walrus. |
| workers | no | 100 | Concurrency for Batch().Commit() (packfile uploads run in parallel). |
| packTargetSizeBytes | no | 67108864 (64 MiB) | Ceiling for a packed Walrus blob: blocks in one Batch.Commit are concatenated up to this size and stored as one blob (a smaller commit uploads immediately as a smaller blob; it never waits to fill). Packs >10 MiB require a self-hosted publisher/aggregator (public services cap requests near 10 MiB). The realized pack size is also bounded by Kubo's Import.BatchMaxSize (see above). |
| blobCacheBytes | no | 268435456 (256 MiB) | In-memory LRU budget for whole blobs, used to serve range reads of packed blocks. Per-entry cap is ¼ of this, so the default keeps a 64 MiB pack cacheable. A negative value disables the cache. |
| requestTimeoutSeconds | no | 60 | Per-attempt Walrus HTTP timeout. |
| maxRetries | no | 3 | Retries per Walrus request (exponential backoff). |
| epochDurationSeconds | no | 0 | Wall-clock length of one Walrus epoch. Enables the renewal worker when set. |
| renewIntervalSeconds | no | 0 | How often to scan for expiring blobs. Enables renewal when set. |
| renewLeadSeconds | no | one epoch | How far ahead of expiry to renew. |
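The publisherURL and aggregatorURL keys accept a comma-separated list. A plausible way to turn such a value into an ordered failover list is sketched below; splitURLs is a hypothetical helper, and the plugin's own parsing may differ in details such as whitespace handling.

```go
package main

import (
	"fmt"
	"strings"
)

// splitURLs splits a comma-separated config value into individual base URLs,
// trimming surrounding whitespace and skipping empty items.
func splitURLs(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part != "" {
			out = append(out, part)
		}
	}
	return out
}

func main() {
	// The example.com hosts are placeholders, not real Walrus endpoints.
	fmt.Println(splitURLs("https://pub-1.example.com, https://pub-2.example.com,"))
}
```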

Durability: epochs and renewal (read this)

Walrus blobs are paid for a finite number of epochs and are deleted when they expire — if that happens the IPFS block is gone even though the Postgres row survives. Stay durable by either:

  1. Buying a long lifetime up front: set epochs high enough for your retention window (a mainnet epoch is about 14 days, so epochs: 53 covers roughly 2 years).
  2. Enabling the renewal worker: set both epochDurationSeconds (the network's epoch length, e.g. 1209600 for ~14 days) and renewIntervalSeconds (e.g. 86400). The worker finds blobs nearing expires_at, re-uploads their bytes for a fresh window, and updates the index. HTTP-only (no Sui key required).

The default epochs: 1 expires quickly — fine for testing, not for production.
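The epoch count for a given retention window is a rounded-up division. The sketch below shows the arithmetic with a hypothetical helper, epochsFor; it assumes a fixed epoch length and ignores how far into the current epoch an upload lands, so treat the result as a lower bound and add a margin.

```go
package main

import (
	"fmt"
	"time"
)

// epochsFor returns the smallest number of epochs whose combined length
// covers the retention window. It returns 1 for non-positive inputs.
func epochsFor(retention, epochDuration time.Duration) int {
	if retention <= 0 || epochDuration <= 0 {
		return 1
	}
	return int((retention + epochDuration - 1) / epochDuration)
}

func main() {
	const day = 24 * time.Hour
	epoch := 14 * day // 1209600 seconds, the mainnet figure quoted above

	// Two years of 365 days is 730 days; 730 / 14 rounds up to 53 epochs.
	fmt.Println(epochsFor(2*365*day, epoch))
}
```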

Limitations

  • Delete removes the Postgres row only; it does not delete the blob on Walrus (on-chain deletion needs a Sui key, out of scope). Unreferenced blobs simply expire. With packing, a deleted block's bytes also remain inside its shared blob until the whole blob expires — reclaiming that space would need a future compaction/GC pass.
  • Block-packing batches blocks written through Batch().Commit() (e.g. ipfs add). A single Put outside a batch still writes one blob per block, since a lone Put must be durable on return.
  • Read efficiency on packed blobs depends on the aggregator honoring HTTP Range; otherwise the whole blob is fetched once and cached (blobCacheBytes).
  • Query does not support Orders or Filters (same as the S3 datastore).

License

MIT

Documentation

Overview

Package walrusds implements a Kubo (IPFS) datastore backed by Walrus, a decentralized storage network built on Sui, using a shared Postgres database as the durable key -> blob index.

Walrus is content-addressed (blobs are addressed by a content-derived blob ID, not by an arbitrary key), exposes no list/query API, and treats blobs as immutable with a finite, epoch-based lifetime. To present it as an IPFS datastore we keep the bytes on Walrus and the mapping

ds.Key -> { blobId, size, deletable, endEpoch, expiresAt }

in Postgres. Postgres is the source of truth for what the node "knows": it is shared across upload and retrieval nodes, survives local disk loss, and supports point-in-time recovery. Has/GetSize/Query are answered purely from Postgres so they never incur a Walrus round-trip; only Get touches the Walrus aggregator.

Index

Constants

This section is empty.

Variables

var ErrBlobNotFound = errors.New("walrusds: blob not found")

ErrBlobNotFound is returned by the Walrus client when the aggregator has no blob for the requested blob ID.

Functions

This section is empty.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is a small, context-aware HTTP client for the Walrus publisher (write) and aggregator (read) HTTP APIs. It supports multiple endpoints for failover and retries transient failures with exponential backoff.

We deliberately implement this directly instead of depending on a third-party SDK so that every request honours the caller's context (cancellation/timeouts) and so retry/failover behaviour is under our control.
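The failover-and-retry behaviour described above can be sketched as follows. This is not the package's implementation: tryEndpoints, backoff, and demo are hypothetical names, the example.com hosts are placeholders, and the sketch treats every error as retryable, whereas a real client would presumably return immediately on permanent errors such as a 404.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// backoff returns the delay before retry round attempt (0-based):
// base, 2*base, 4*base, and so on.
func backoff(attempt int, base time.Duration) time.Duration {
	return base << uint(attempt)
}

// tryEndpoints calls fn against each endpoint in order and returns on the
// first success. If a whole round fails, it waits with exponential backoff
// and tries again, for up to maxRetries additional rounds. It stops early
// when ctx is cancelled.
func tryEndpoints(ctx context.Context, endpoints []string, maxRetries int, base time.Duration,
	fn func(ctx context.Context, endpoint string) error) error {
	if len(endpoints) == 0 {
		return errors.New("no endpoints configured")
	}
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		for _, ep := range endpoints {
			if err := ctx.Err(); err != nil {
				return err
			}
			if lastErr = fn(ctx, ep); lastErr == nil {
				return nil
			}
		}
		if attempt == maxRetries {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff(attempt, base)):
		}
	}
	return fmt.Errorf("all endpoints failed after %d rounds: %w", maxRetries+1, lastErr)
}

// demo fails the first three calls and succeeds on the fourth.
func demo() (int, error) {
	calls := 0
	err := tryEndpoints(context.Background(),
		[]string{"https://a.example.com", "https://b.example.com"}, 3, 10*time.Millisecond,
		func(_ context.Context, ep string) error {
			calls++
			if calls < 4 {
				return errors.New("transient failure from " + ep)
			}
			return nil
		})
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err)
}
```

Passing the caller's context into every attempt and into the backoff wait is what lets a cancelled request stop promptly instead of sleeping through the remaining retries.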

func NewClient

func NewClient(conf ClientConfig) *Client

NewClient builds a Walrus client from the supplied configuration.

func (*Client) Read

func (c *Client) Read(ctx context.Context, blobID string) ([]byte, error)

Read fetches the bytes of the blob identified by blobID from a Walrus aggregator. It returns ErrBlobNotFound if the aggregator returns 404.

func (*Client) ReadRange

func (c *Client) ReadRange(ctx context.Context, blobID string, offset, length int64) ([]byte, error)

ReadRange fetches the bytes [offset, offset+length) of the blob identified by blobID. It first issues an HTTP Range request so only the requested block travels over the wire. If the aggregator ignores the Range header and returns the whole blob (HTTP 200), the full body is cached and sliced locally, so packed blocks remain cheap to read even without range support.
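The range-then-fallback logic can be illustrated with the standard net/http package. The sketch below is an approximation under stated assumptions: readRange and sliceBody are hypothetical names, the caller supplies the full blob URL, and the whole-blob cache mentioned above is omitted.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
)

// sliceBody turns an aggregator response into the requested block.
// 206 means the body already is the range; 200 means the Range header was
// ignored and the body is the whole blob, so it is sliced locally.
func sliceBody(status int, body []byte, offset, length int64) ([]byte, error) {
	switch status {
	case http.StatusPartialContent:
		if int64(len(body)) != length {
			return nil, fmt.Errorf("range body is %d bytes, want %d", len(body), length)
		}
		return body, nil
	case http.StatusOK:
		if offset < 0 || length < 0 || offset+length > int64(len(body)) {
			return nil, fmt.Errorf("range [%d,%d) is outside a blob of %d bytes", offset, offset+length, len(body))
		}
		return body[offset : offset+length], nil
	default:
		return nil, fmt.Errorf("unexpected status %d", status)
	}
}

// readRange requests bytes [offset, offset+length) of the blob at blobURL.
func readRange(ctx context.Context, hc *http.Client, blobURL string, offset, length int64) ([]byte, error) {
	if offset < 0 || length <= 0 {
		return nil, fmt.Errorf("invalid range: offset=%d length=%d", offset, length)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, blobURL, nil)
	if err != nil {
		return nil, err
	}
	// HTTP byte ranges are inclusive on both ends.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+length-1))
	resp, err := hc.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return sliceBody(resp.StatusCode, body, offset, length)
}

func main() {
	blob := []byte("aaaabbbcccccc")
	// Both response shapes yield the same block without touching the network.
	ranged, _ := sliceBody(http.StatusPartialContent, blob[4:7], 4, 3)
	whole, _ := sliceBody(http.StatusOK, blob, 4, 3)
	fmt.Printf("%q %q\n", ranged, whole)
}
```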

func (*Client) Store

func (c *Client) Store(ctx context.Context, value []byte, epochs int, deletable bool) (StoreResult, error)

Store uploads value to a Walrus publisher, keeping it alive for the given number of epochs. When deletable is true the blob is registered as deletable so it can later be removed on-chain.

type ClientConfig

type ClientConfig struct {
	// PublisherURLs are Walrus publisher (write) base URLs. At least one is
	// required for Put to work.
	PublisherURLs []string
	// AggregatorURLs are Walrus aggregator (read) base URLs. At least one is
	// required for Get to work.
	AggregatorURLs []string
	// RequestTimeout bounds a single HTTP attempt. Zero means no per-attempt
	// timeout beyond the caller's context.
	RequestTimeout time.Duration
	// MaxRetries is the number of additional attempts (per endpoint set) on
	// transient failures. Zero means a single attempt.
	MaxRetries int
	// BlobCacheBytes is the byte budget for the in-memory LRU of whole blobs
	// used to satisfy range reads of packed blocks. Zero disables the cache.
	BlobCacheBytes int64
}

ClientConfig configures a Walrus Client.

type Config

type Config struct {
	// PublisherURLs are Walrus publisher (write) base URLs (comma-separated
	// values are split by the plugin). At least one is required.
	PublisherURLs []string
	// AggregatorURLs are Walrus aggregator (read) base URLs. At least one is
	// required.
	AggregatorURLs []string

	// PostgresURL is the database/sql connection string for the shared index,
	// e.g. "postgres://user:pass@host:5432/db?sslmode=require".
	PostgresURL string
	// Table is the index table name. Defaults to "walrus_index".
	Table string

	// Epochs is how many storage epochs new blobs are paid for. Defaults to 1.
	Epochs int
	// Deletable registers blobs as deletable on Walrus. Defaults to false.
	Deletable bool
	// Workers is the Batch.Commit() concurrency. Defaults to defaultWorkers.
	Workers int

	// PackTargetSize is the target size (in bytes) of a packed Walrus blob.
	// During Batch.Commit, blocks are concatenated into packfiles up to this
	// size and uploaded as a single blob, amortizing the per-blob Walrus cost
	// (Sui gas + WAL minimums) across many IPFS blocks. A block larger than
	// this gets its own blob. Defaults to defaultPackTargetSize.
	PackTargetSize int64
	// BlobCacheBytes is the byte budget for the in-memory LRU of whole blobs
	// used to serve range reads of packed blocks. Defaults to
	// defaultBlobCacheBytes; a negative value disables the cache.
	BlobCacheBytes int64

	// RequestTimeout bounds a single Walrus HTTP attempt. Defaults to 60s.
	RequestTimeout time.Duration
	// MaxRetries is the number of retries per Walrus request. Defaults to 3.
	MaxRetries int

	// EpochDuration is the wall-clock length of one Walrus storage epoch. When
	// non-zero (together with RenewInterval) it enables the renewal worker,
	// which re-uploads blobs before their paid storage expires. Operators set
	// this to match the target network (e.g. ~14 days on mainnet).
	EpochDuration time.Duration
	// RenewInterval is how often the renewal worker scans for expiring blobs.
	// Zero disables renewal.
	RenewInterval time.Duration
	// RenewLead is how far ahead of expiry a blob is renewed. Defaults to one
	// EpochDuration when zero and renewal is enabled.
	RenewLead time.Duration
}

Config holds everything needed to construct a WalrusDatastore.

type Index

type Index interface {
	Put(ctx context.Context, key string, rec Record) error
	PutMany(ctx context.Context, recs []KeyRecord) error
	Get(ctx context.Context, key string) (Record, error)
	Delete(ctx context.Context, key string) error
	DeleteMany(ctx context.Context, keys []string) error
	List(ctx context.Context, prefix string, limit, offset int) ([]ListItem, error)
	DueForRenewal(ctx context.Context, before time.Time, limit int) ([]RenewItem, error)
	UpdateBlobAfterRenewal(ctx context.Context, oldBlobID, newBlobID string, endEpoch uint64, expiresAt sql.NullTime) error
	Close() error
}

Index is the durable key -> blob mapping. It is intentionally an interface so the backend (Postgres today, DynamoDB/SQLite later) can be swapped without touching the datastore logic.

type KeyRecord

type KeyRecord struct {
	Key string
	Rec Record
}

KeyRecord pairs a datastore key with its Record for bulk insertion.

type ListItem

type ListItem struct {
	Key  string
	Size int64
}

ListItem is a single entry returned by a prefix listing.

type Record

type Record struct {
	BlobID    string
	Offset    int64
	Size      int64
	Deletable bool
	EndEpoch  uint64
	ExpiresAt sql.NullTime
}

Record is the metadata we persist in the shared index for each IPFS key. It is everything needed to (a) locate the block's bytes within a Walrus blob, (b) answer Has/GetSize without touching Walrus, and (c) drive epoch renewal.

With block-packing, several keys can share a single Walrus blob: each row records the BlobID plus the byte range [Offset, Offset+Size) of the block inside that blob. Unpacked blocks (and all legacy rows) simply have Offset == 0 and Size == the whole blob length.

type RenewItem

type RenewItem struct {
	BlobID string
}

RenewItem identifies a Walrus blob whose paid storage is approaching expiry. Renewal operates per blob (not per key) so a packed blob holding many blocks is re-uploaded exactly once.

type StoreResult

type StoreResult struct {
	BlobID   string
	EndEpoch uint64
}

StoreResult is the subset of a Walrus publisher "store" response that we care about: the resulting blob ID and the epoch at which the blob's paid storage ends.

type WalrusDatastore

type WalrusDatastore struct {
	// contains filtered or unexported fields
}

WalrusDatastore stores values on Walrus and keeps the durable key -> blob mapping in Postgres.

func NewWalrusDatastore

func NewWalrusDatastore(conf Config) (*WalrusDatastore, error)

NewWalrusDatastore validates the configuration, connects to Postgres (creating the index table if needed), prepares the Walrus client and, if configured, starts the background renewal worker.

func (*WalrusDatastore) Batch

func (w *WalrusDatastore) Batch(_ context.Context) (ds.Batch, error)

Batch buffers Put/Delete operations and applies them concurrently on Commit.

func (*WalrusDatastore) Close

func (w *WalrusDatastore) Close() error

Close stops the renewal worker and closes the Postgres connection.

func (*WalrusDatastore) Delete

func (w *WalrusDatastore) Delete(ctx context.Context, k ds.Key) error

Delete removes the index entry for k. It does not delete the underlying Walrus blob: on-chain deletion requires a Sui key and is out of scope for this datastore. The blob becomes unreferenced and eventually expires. Delete is idempotent.

func (*WalrusDatastore) Get

func (w *WalrusDatastore) Get(ctx context.Context, k ds.Key) ([]byte, error)

Get resolves the blob ID for k in Postgres and fetches the bytes from the Walrus aggregator. Returns ds.ErrNotFound when k is unknown to the index.

func (*WalrusDatastore) GetSize

func (w *WalrusDatastore) GetSize(ctx context.Context, k ds.Key) (int, error)

GetSize returns the stored size for k, answered entirely from Postgres.

func (*WalrusDatastore) Has

func (w *WalrusDatastore) Has(ctx context.Context, k ds.Key) (bool, error)

Has reports whether k is present, answered entirely from Postgres.

func (*WalrusDatastore) Put

func (w *WalrusDatastore) Put(ctx context.Context, k ds.Key, value []byte) error

Put uploads value to Walrus and records the resulting blob ID and metadata in Postgres. The Walrus upload happens first: if the index write then fails, the blob exists but is unreferenced (a recoverable leak), which is strictly safer than an index row pointing at a blob that was never stored.
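The upload-first ordering can be shown with two small interfaces and fakes. Everything in this sketch (uploader, indexer, put, the fakes, and demo) is hypothetical and only mirrors the ordering argument above; it is not the package's real code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// uploader and indexer are hypothetical stand-ins for the Walrus client and
// the Postgres index.
type uploader interface {
	Store(ctx context.Context, value []byte) (blobID string, err error)
}

type indexer interface {
	Put(ctx context.Context, key, blobID string, size int64) error
}

// put uploads first and indexes second, so a failure can leave an
// unreferenced blob but never an index row without a blob behind it.
func put(ctx context.Context, up uploader, idx indexer, key string, value []byte) error {
	blobID, err := up.Store(ctx, value)
	if err != nil {
		return fmt.Errorf("walrus upload: %w", err) // nothing was indexed
	}
	if err := idx.Put(ctx, key, blobID, int64(len(value))); err != nil {
		return fmt.Errorf("index write for blob %s: %w", blobID, err)
	}
	return nil
}

type fakeUploader struct{ uploads int }

func (f *fakeUploader) Store(_ context.Context, _ []byte) (string, error) {
	f.uploads++
	return fmt.Sprintf("blob-%d", f.uploads), nil
}

type fakeIndex struct {
	fail bool
	rows map[string]string
}

func (f *fakeIndex) Put(_ context.Context, key, blobID string, _ int64) error {
	if f.fail {
		return errors.New("index unavailable")
	}
	f.rows[key] = blobID
	return nil
}

// demo simulates an index outage during a Put.
func demo() (rows, uploads int, err error) {
	up := &fakeUploader{}
	idx := &fakeIndex{fail: true, rows: map[string]string{}}
	err = put(context.Background(), up, idx, "/blocks/A", []byte("hello"))
	return len(idx.rows), up.uploads, err
}

func main() {
	rows, uploads, err := demo()
	fmt.Println(rows, uploads, err)
}
```

In the simulated outage one blob is uploaded and zero rows are written, which is the recoverable-leak case rather than a dangling reference.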

func (*WalrusDatastore) Query

func (w *WalrusDatastore) Query(ctx context.Context, q dsq.Query) (dsq.Results, error)

Query enumerates keys from Postgres. Orders and Filters are unsupported, matching the S3 datastore. When KeysOnly is false each value is fetched lazily from Walrus.

func (*WalrusDatastore) Sync

func (w *WalrusDatastore) Sync(ctx context.Context, prefix ds.Key) error

Directories

Path Synopsis
main command
