sqlitezstd

Version: v0.0.0-...-568da00
Published: Jun 5, 2026 License: MIT Imports: 17 Imported by: 1

README

SQLiteZSTD: Read-Only Access to Compressed SQLite Files

Important: A new version of this extension, written in C, is now available. The C version can be used across different platforms, languages, and runtimes. It is not publicly available; it is provided under a perpetual license for a one-time fee, with support included. The original Go version will remain freely available. For more information about the C extension, please email jtarchie@gmail.com.

Description

SQLiteZSTD provides read-only access to SQLite databases compressed in the Zstandard (zstd) seekable format. It is implemented in Go as a SQLite3 Virtual File System (VFS).

Please note, SQLiteZSTD is specifically designed for reading data and does not support write operations.

Features

  1. Read-only access to Zstd-compressed SQLite databases.
  2. Interface through SQLite3 VFS.
  3. The compressed database is seekable, so a query reads only the frames it needs instead of decompressing the whole file.

Usage

Your database needs to be compressed in the seekable Zstd format. I recommend using this CLI tool:

go get -a github.com/SaveTheRbtz/zstd-seekable-format-go/...
go run github.com/SaveTheRbtz/zstd-seekable-format-go/cmd/zstdseek \
    -f <dbPath> \
    -o <dbPath>.zst

Choosing a frame size

The -c min:avg:max option controls the size of each zstd frame (in KiB). The frame is the unit of random access: to read any byte, the whole frame containing it must be fetched and decompressed. SQLiteZSTD caches recently used frames (see Configuration), so frame size is the main tuning knob:

  • Too large — more bytes fetched/decompressed per read, and frames whose compressed size exceeds 128 MiB are rejected outright by the reader.
  • Too small — worse compression ratio and more per-frame overhead.

A frame size on the order of tens to a few hundred KiB (for example -c 16:32:64) is a reasonable starting point; align it to your read locality and measure.
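
To make the cost of a large frame concrete, the sketch below works out which frames a read touches and how many uncompressed bytes must be produced to serve it. It assumes, for simplicity, that every frame holds the same number of uncompressed bytes (real zstdseek output varies between the min and max passed to -c), and it assumes a 4096-byte SQLite page. The function names are illustrative and not part of SQLiteZSTD.

```go
package main

import "fmt"

// framesTouched returns the first and last frame index covered by a read of
// n bytes at offset off, assuming fixed-size frames of frameSize
// uncompressed bytes (a simplification; real frames vary in size).
func framesTouched(off, n, frameSize int64) (first, last int64) {
	first = off / frameSize
	last = (off + n - 1) / frameSize
	return first, last
}

// bytesDecompressed is the number of uncompressed bytes that must be produced
// to serve the read, because whole frames are the unit of random access.
func bytesDecompressed(off, n, frameSize int64) int64 {
	first, last := framesTouched(off, n, frameSize)
	return (last - first + 1) * frameSize
}

func main() {
	const pageSize = 4096 // assumed SQLite page size, for illustration only
	for _, frameKiB := range []int64{32, 1024} {
		frameSize := frameKiB * 1024
		fmt.Printf("frame=%d KiB: one %d-byte page read decompresses %d bytes\n",
			frameKiB, pageSize, bytesDecompressed(8*pageSize, pageSize, frameSize))
	}
}
```

Under these assumptions a single page read costs one full frame, so the decompression work per cache miss grows linearly with the frame size.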

Below is an example of how to use SQLiteZSTD in a Go program:

package main

import (
    "context"
    "database/sql"
    "fmt"

    _ "github.com/jtarchie/sqlitezstd" // registers the "zstd" VFS on import
    _ "github.com/mattn/go-sqlite3"    // assumed "sqlite3" driver; harmless if already linked
)

func main() {
    ctx := context.Background()

    db, err := sql.Open("sqlite3", "<path-to-your-file>?vfs=zstd")
    if err != nil {
        panic(fmt.Sprintf("Failed to open database: %s", err))
    }
    defer db.Close()

    // Pin one connection from the pool so the PRAGMA and the queries share it.
    conn, err := db.Conn(ctx)
    if err != nil {
        panic(fmt.Sprintf("Failed to get connection: %s", err))
    }
    defer conn.Close()

    // PRAGMAs are not persisted across `database/sql` pooled connections;
    // this ensures the setting applies to the connection you query on.
    _, err = conn.ExecContext(ctx, `PRAGMA temp_store = memory;`)
    if err != nil {
        panic(fmt.Sprintf("Failed to set PRAGMA: %s", err))
    }

    // Use conn for subsequent operations to ensure PRAGMA is applied
}

In this Go code example:

  • sql.Open() takes the path to the compressed SQLite database with the query string ?vfs=zstd appended, which selects the zstd VFS.
  • PRAGMA temp_store = memory ensures the read-only VFS is not asked to create temporary files on disk (which it cannot do).

Connections and concurrency

The VFS is safe to use from multiple connections concurrently — the tests and benchmarks open many connections against one database. Each connection allocates its own decompression reader and frame cache, so the trade-off of a larger connection pool is memory and duplicated decompression, not correctness. Tune db.SetMaxOpenConns(...) to balance parallelism against memory; there is no requirement to limit it to a single connection.
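
As a rough way to reason about the memory side of that trade-off, the sketch below computes an upper bound on frame-cache memory as connections × cached frames × decoded frame size. It assumes every cache slot is full and that decoded frames are at most 64 KiB (as with -c 16:32:64); the 64 frames per connection is DefaultFrameCacheSize. The helper name is hypothetical, and actual usage depends on the workload.

```go
package main

import "fmt"

// cacheUpperBound estimates the worst-case bytes held in frame caches when
// every connection has filled its cache with frames of maxFrameBytes.
func cacheUpperBound(conns, framesPerConn int, maxFrameBytes int64) int64 {
	return int64(conns) * int64(framesPerConn) * maxFrameBytes
}

func main() {
	const framesPerConn = 64   // DefaultFrameCacheSize
	const maxFrame = 64 * 1024 // assumption: decoded frames of at most 64 KiB
	for _, conns := range []int{1, 4, 16} {
		mib := float64(cacheUpperBound(conns, framesPerConn, maxFrame)) / (1 << 20)
		fmt.Printf("%2d connections: up to %.0f MiB of cached frames\n", conns, mib)
	}
}
```

The bound scales linearly with the value passed to db.SetMaxOpenConns(...), which is why a smaller pool or a smaller frame cache are the two levers for reducing memory.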

Reading over HTTP(S)

Pass an http:// or https:// URL as the filename to read a remote database without downloading it in full — only the bytes needed for each query are fetched using HTTP range requests:

db, err := sql.Open("sqlite3", "https://example.com/data.sqlite.zst?vfs=zstd")

The server must support HTTP range requests (responding 206 Partial Content with a Content-Range header); a server that ignores Range is rejected rather than silently returning wrong data. Each opened connection makes one small request to determine the file size, then fetches frames on demand. Frames are cached per connection, so repeated reads do not re-hit the network.
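
Before pointing the VFS at a server, it can help to check the range-request requirement yourself. The following standard-library sketch sends a Range header and accepts the response only if it is 206 with a Content-Range header. It is a standalone probe, not the library's own validation code; fetchRange and the two test servers are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchRange requests bytes [first, last] of url and returns them. It returns
// an error unless the server answers 206 Partial Content with a Content-Range
// header.
func fetchRange(url string, first, last int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", first, last))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent || resp.Header.Get("Content-Range") == "" {
		return nil, fmt.Errorf("server ignored Range: status %d", resp.StatusCode)
	}
	return io.ReadAll(resp.Body)
}

// newRangeServer serves data with Range support via http.ServeContent.
func newRangeServer(data []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "data.sqlite.zst", time.Time{}, bytes.NewReader(data))
	}))
}

// newPlainServer ignores Range and always returns the whole body with 200.
func newPlainServer(data []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write(data)
	}))
}

func main() {
	data := []byte("0123456789abcdefghij")

	good := newRangeServer(data)
	defer good.Close()
	got, err := fetchRange(good.URL, 4, 7)
	fmt.Printf("range-capable server: %q, err=%v\n", got, err)

	bad := newPlainServer(data)
	defer bad.Close()
	_, err = fetchRange(bad.URL, 4, 7)
	fmt.Println("range-ignoring server:", err)
}
```

Running fetchRange against your real URL with a small range is a quick way to find out whether a CDN or proxy in front of the file strips Range support.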

For authenticated buckets, supply a signing transport with WithRoundTripper/WithHTTPClient; the library still wraps it with timeout, retry, and range-validation.
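
A signing transport is just an http.RoundTripper that decorates each request. The sketch below uses a simple bearer token as a stand-in for real request signing and verifies, against a local test server, that the header arrives. The bearerTransport type, the token, and the "zstd-auth" VFS name are hypothetical; only Register and WithRoundTripper come from this package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// bearerTransport is a hypothetical http.RoundTripper that adds a bearer token
// to every request before delegating to base.
type bearerTransport struct {
	base  http.RoundTripper
	token string
}

func (t bearerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	clone := req.Clone(req.Context())
	clone.Header.Set("Authorization", "Bearer "+t.token)
	return t.base.RoundTrip(clone)
}

// newBearerTransport wraps http.DefaultTransport with the given token.
func newBearerTransport(token string) http.RoundTripper {
	return bearerTransport{base: http.DefaultTransport, token: token}
}

// seenAuth issues one GET through rt to a test server that echoes back the
// Authorization header it received.
func seenAuth(rt http.RoundTripper) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("Authorization"))
	}))
	defer srv.Close()

	resp, err := (&http.Client{Transport: rt}).Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	rt := newBearerTransport("example-token")
	auth, err := seenAuth(rt)
	fmt.Println(auth, err)
	// With the library, rt would be supplied at registration time, e.g.
	// sqlitezstd.Register("zstd-auth", sqlitezstd.WithRoundTripper(rt)).
}
```

Cloning the request before setting the header keeps the transport safe to reuse when the same request is retried.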

Build tags

Databases that use SQLite extensions such as FTS5 or R*Tree require building your binary with the matching mattn/go-sqlite3 build tag, e.g.:

go build -tags fts5 ./...

Configuration

Importing the package registers a zstd VFS with sensible defaults. To tune the frame-cache size, HTTP timeout, retry count, transport (WithRoundTripper/WithHTTPClient), or logger, register your own named VFS and reference it via ?vfs=<name>:

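import (
    "database/sql"
    "log"
    "time"

    sqlitezstd "github.com/jtarchie/sqlitezstd"
)

err := sqlitezstd.Register("zstd-tuned",
    sqlitezstd.WithFrameCacheSize(128),         // decoded frames cached per opened file
    sqlitezstd.WithHTTPTimeout(10*time.Second), // dial and response-header timeout
    sqlitezstd.WithHTTPRetries(5),              // retries for transient HTTP failures
)
if err != nil {
    log.Fatalf("register zstd-tuned VFS: %v", err)
}
// ...
db, err := sql.Open("sqlite3", "https://example.com/data.sqlite.zst?vfs=zstd-tuned")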

Performance

The benchmark below compares reads from an uncompressed and a compressed SQLite database. Each database is populated with 10k records; the benchmarks then retrieve a MAX value (without an index) and run FTS5 queries with the porter and trigram tokenizers. The last line reads the compressed database over HTTP.

BenchmarkReadUncompressedSQLite-4              	  159717	      7459 ns/op	     473 B/op	      15 allocs/op
BenchmarkReadUncompressedSQLiteFTS5Porter-4    	    2478	    471685 ns/op	     450 B/op	      15 allocs/op
BenchmarkReadUncompressedSQLiteFTS5Trigram-4   	     100	  10449792 ns/op	     542 B/op	      16 allocs/op
BenchmarkReadCompressedSQLite-4                	  266703	      3877 ns/op	    2635 B/op	      15 allocs/op
BenchmarkReadCompressedSQLiteFTS5Porter-4      	    2335	    487430 ns/op	   33992 B/op	      16 allocs/op
BenchmarkReadCompressedSQLiteFTS5Trigram-4     	      48	  21235303 ns/op	45970431 B/op	     148 allocs/op
BenchmarkReadCompressedHTTPSQLite-4            	  284820	      4341 ns/op	    3312 B/op	      15 allocs/op

Documentation

Overview

Package sqlitezstd provides a read-only SQLite VFS for opening Zstandard "seekable" compressed SQLite database files, either from the local filesystem or over HTTP(S) using range requests.

Importing the package for its side effects registers a VFS named "zstd":

import _ "github.com/jtarchie/sqlitezstd"

db, err := sql.Open("sqlite3", "path/to/db.sqlite.zst?vfs=zstd")

The source database must first be compressed into the zstd seekable format (see the README). The VFS is strictly read-only: writes, journals, and WAL files are rejected.

For HTTP(S) sources, pass an http:// or https:// URL as the filename. The server must support HTTP range requests (responding 206 with a Content-Range header).

Use Register to register the VFS under a different name with tuned Options (frame-cache size, HTTP timeout, retries, transport, logger).

Index

Constants

const (
	// DefaultFrameCacheSize is the number of decoded zstd frames cached per
	// opened file.
	DefaultFrameCacheSize = 64
	// DefaultHTTPTimeout bounds dialing and waiting for response headers on the
	// HTTP(S) path so a hung server cannot block a query indefinitely.
	DefaultHTTPTimeout = 30 * time.Second
	// DefaultHTTPMaxRetries is how many times a transient HTTP failure (network
	// error, 5xx, 429) is retried before the read fails. Reads are idempotent
	// against an immutable source, so retrying is safe.
	DefaultHTTPMaxRetries = 3
)

Default option values. These are applied by Register and by the default "zstd" VFS registered in init().

Variables

This section is empty.

Functions

func Init deprecated

func Init() error

Deprecated: Init is a no-op retained for backward compatibility. The "zstd" VFS is registered automatically when the package is imported.

func Register

func Register(name string, opts ...Option) error

Register registers a zstd VFS under the given name with the supplied options. Open a database against it with the "?vfs=<name>" query parameter. The default "zstd" VFS (registered automatically on import) uses default options.

Types

type Option

type Option func(*Options)

Option mutates an Options. See WithFrameCacheSize, WithHTTPClient, WithHTTPRetries, WithHTTPTimeout, WithLogger, and WithRoundTripper.
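
The signature above is the functional-options pattern. The sketch below illustrates the pattern with hypothetical stand-in types (settings, setting) rather than the package's unexported fields, and follows the documented rules that WithFrameCacheSize ignores values <= 0 and WithHTTPRetries ignores negative values. It is not the library's implementation.

```go
package main

import "fmt"

// settings and setting are hypothetical stand-ins for Options and Option.
type settings struct {
	frameCacheSize int
	httpRetries    int
}

type setting func(*settings)

// withFrameCacheSize ignores values <= 0, keeping whatever was set before.
func withFrameCacheSize(frames int) setting {
	return func(s *settings) {
		if frames > 0 {
			s.frameCacheSize = frames
		}
	}
}

// withHTTPRetries ignores negative values; 0 is accepted and disables retries.
func withHTTPRetries(n int) setting {
	return func(s *settings) {
		if n >= 0 {
			s.httpRetries = n
		}
	}
}

// build starts from the documented defaults and applies each option in order.
func build(opts ...setting) settings {
	s := settings{frameCacheSize: 64, httpRetries: 3}
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", build())
	fmt.Printf("%+v\n", build(withFrameCacheSize(128), withHTTPRetries(0)))
	fmt.Printf("%+v\n", build(withFrameCacheSize(-1))) // ignored, default kept
}
```

Starting from defaults and letting each option validate its own argument is what makes a zero-argument call behave the same as the default registration.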

func WithFrameCacheSize

func WithFrameCacheSize(frames int) Option

WithFrameCacheSize sets the number of decoded zstd frames cached per opened file. Values <= 0 are ignored (the default is kept).

func WithHTTPClient

func WithHTTPClient(c *http.Client) Option

WithHTTPClient is a convenience that uses the client's Transport as the base round-tripper (see WithRoundTripper). A nil client is ignored.

func WithHTTPRetries

func WithHTTPRetries(n int) Option

WithHTTPRetries sets the number of retries for transient HTTP failures. Negative values are ignored; 0 disables retries.
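
To illustrate what a retry count means in practice, here is a minimal retry loop over the transient conditions the package documents (network error, 5xx, 429). It assumes that n retries means up to n+1 attempts in total and omits any backoff; both are assumptions of this sketch, and transientStatus and withRetries are hypothetical names, not the library's code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// transientStatus reports whether an HTTP status is treated as retryable:
// any 5xx or 429 Too Many Requests.
func transientStatus(code int) bool {
	return (code >= 500 && code <= 599) || code == http.StatusTooManyRequests
}

// withRetries calls attempt once and then up to retries more times while it
// keeps failing transiently. attempt returns an HTTP status and an error; a
// non-nil error stands in for a network failure.
func withRetries(retries int, attempt func() (int, error)) (int, error) {
	var status int
	var err error
	for try := 0; try <= retries; try++ {
		status, err = attempt()
		if err == nil && !transientStatus(status) {
			return status, nil
		}
	}
	if err == nil {
		err = fmt.Errorf("giving up after status %d", status)
	}
	return status, err
}

func main() {
	calls := 0
	flaky := func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusServiceUnavailable, nil
		}
		return http.StatusPartialContent, nil
	}
	status, err := withRetries(3, flaky)
	fmt.Println(status, err, calls)

	// With 0 retries the first failure is final.
	_, err = withRetries(0, func() (int, error) { return 0, errors.New("connection reset") })
	fmt.Println(err)
}
```

Retrying is only safe here because, as the constants' documentation notes, reads are idempotent against an immutable source.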

func WithHTTPTimeout

func WithHTTPTimeout(d time.Duration) Option

WithHTTPTimeout sets the dial and response-header timeout for the HTTP(S) path. Values <= 0 are ignored.

func WithLogger

func WithLogger(l *slog.Logger) Option

WithLogger sets the logger used to report otherwise-opaque open/read failures (the sqlite3vfs interface can only return fixed sentinel errors, so the real cause is logged). A nil logger is ignored.

func WithRoundTripper

func WithRoundTripper(rt http.RoundTripper) Option

WithRoundTripper sets the base http.RoundTripper used for the HTTP(S) path. The library still wraps it with retry and Range-response validation, so a caller can supply, for example, a request-signing transport for authenticated buckets without losing those protections. A nil transport is ignored.

type Options

type Options struct {
	// contains filtered or unexported fields
}

Options configures a registered VFS. Construct it with Option values passed to Register; the zero value is not valid (use Register, which fills in defaults).

type ZstdFile

type ZstdFile struct {
	// contains filtered or unexported fields
}

ZstdFile is a read-only sqlite3vfs.File backed by a zstd-seekable reader.

func (*ZstdFile) CheckReservedLock

func (z *ZstdFile) CheckReservedLock() (bool, error)

func (*ZstdFile) Close

func (z *ZstdFile) Close() error

func (*ZstdFile) DeviceCharacteristics

func (z *ZstdFile) DeviceCharacteristics() sqlite3vfs.DeviceCharacteristic

func (*ZstdFile) FileSize

func (z *ZstdFile) FileSize() (int64, error)

func (*ZstdFile) Lock

func (z *ZstdFile) Lock(elock sqlite3vfs.LockType) error

func (*ZstdFile) ReadAt

func (z *ZstdFile) ReadAt(p []byte, off int64) (int, error)

func (*ZstdFile) SectorSize

func (z *ZstdFile) SectorSize() int64

func (*ZstdFile) Sync

func (z *ZstdFile) Sync(flag sqlite3vfs.SyncType) error

func (*ZstdFile) Truncate

func (z *ZstdFile) Truncate(size int64) error

func (*ZstdFile) Unlock

func (z *ZstdFile) Unlock(elock sqlite3vfs.LockType) error

func (*ZstdFile) WriteAt

func (z *ZstdFile) WriteAt(p []byte, off int64) (int, error)

type ZstdVFS

type ZstdVFS struct {
	// contains filtered or unexported fields
}

ZstdVFS is a read-only sqlite3vfs.VFS for zstd-seekable compressed databases.

func (*ZstdVFS) Access

func (z *ZstdVFS) Access(name string, flags sqlite3vfs.AccessFlag) (bool, error)

func (*ZstdVFS) Delete

func (z *ZstdVFS) Delete(name string, dirSync bool) error

func (*ZstdVFS) FullPathname

func (z *ZstdVFS) FullPathname(name string) string

func (*ZstdVFS) Open
