clickhousestats

Version: v0.0.1
Published: Nov 14, 2025 License: MIT Imports: 19 Imported by: 0

Documentation

Overview

Package clickhousestats implements statistics logging to the ClickHouse database using async_insert queries.

Data schema

This library supports any struct-based type or a pointer to it for representing a statistics record.

A special field named `_` or `_struct` can be used to specify the table name that will be used for logging records of that type. The type of this field is ignored, and the struct tag format is compatible with the clickhouse-gate-client library:

type StatsRecord struct {
    _ struct{} `clickhousegate:"table_name, everything else is ignored"`
}

Any other non-exported fields (i.e. starting with a lowercase letter) are ignored completely.

If no table name is specified using such a tag, then the type name converted to snake_case is used as the table name. The conversion is done using the github.com/iancoleman/strcase library:

type StatsRecord struct {
    // no tag: the table name is stats_record
}

If the struct type name can't be determined (e.g. when the struct type isn't a defined type), the ErrNoTable error will be returned.
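The table name lookup described above can be sketched with the standard reflect package. This is a minimal illustration, not the library's code: tableName, StatsRecord, and Untagged are hypothetical names, and the snake_case conversion via strcase is left out so the sketch depends on the standard library only.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// StatsRecord carries the table name in the tag of a blank marker field.
type StatsRecord struct {
	_    struct{} `clickhousegate:"table_name, everything else is ignored"`
	Name string
}

// Untagged has no marker field, so the sketch falls back to the type name.
type Untagged struct {
	Name string
}

// tableName is a hypothetical helper (not part of clickhousestats).
// It reports the table name and whether it came from a struct tag.
func tableName(v any) (name string, fromTag bool, err error) {
	t := reflect.TypeOf(v)
	if t != nil && t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t == nil || t.Kind() != reflect.Struct {
		return "", false, fmt.Errorf("not a struct or struct pointer: %T", v)
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.Name != "_" && f.Name != "_struct" {
			continue
		}
		// Only the part before the first comma is the table name.
		tag, _, _ := strings.Cut(f.Tag.Get("clickhousegate"), ",")
		if tag = strings.TrimSpace(tag); tag != "" {
			return tag, true, nil
		}
	}
	if t.Name() == "" {
		// Anonymous struct types have no name to fall back to.
		return "", false, fmt.Errorf("table name isn't specified for type %s", t)
	}
	// The library would convert this name to snake_case; omitted here.
	return t.Name(), false, nil
}

func main() {
	fmt.Println(tableName(&StatsRecord{}))
	fmt.Println(tableName(Untagged{}))
	fmt.Println(tableName(struct{ A int }{}))
}
```

The last call uses an anonymous struct type, which is the situation where the library returns ErrNoTable.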

All exported struct fields must correspond to similarly named table columns. The table may have more columns than the struct. To match names, both are converted to lowercase and underscores are removed. The common convention is CamelCase struct fields and snake_case table column names, but this isn't enforced. Currently, you can't customize a column name (e.g. using a struct tag).
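The matching rule can be illustrated with a small helper. The function name normalize is hypothetical; the sketch only shows the comparison described above (lowercase, underscores removed) and is not the library's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize is a hypothetical helper: it lowercases a name and removes
// underscores, so struct field names and column names compare equal.
func normalize(name string) string {
	return strings.ReplaceAll(strings.ToLower(name), "_", "")
}

func main() {
	fmt.Println(normalize("CreatedAt") == normalize("created_at")) // true
	fmt.Println(normalize("UserID") == normalize("user_id"))       // true
	fmt.Println(normalize("UserID") == normalize("User_Id"))       // true
}
```

As the last line shows, two distinct field names can normalize to the same key, so a struct should avoid fields that differ only in case or underscores.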

The following types are fully supported and tested:

  • All "simple" types like ints, floats, and string.
  • time.Time for Date, DateTime, or DateTime64 column type. Subsecond resolution (3, 6, or 9 decimal places for fractional seconds) is preserved.
  • UUID type (using both popular libraries: github.com/gofrs/uuid/v5 and github.com/google/uuid).
  • github.com/shopspring/decimal.Decimal type (although you can use floats with a Decimal column, it is highly discouraged).
  • Any type (including the `any` type, i.e. an empty interface) for a JSON column type.
  • Nullable columns are supported using pointers.

JSON fields are marshaled using encoding/json from the standard library.
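Because encoding/json is used, the usual json struct tags control how a JSON column value is written. A minimal sketch of what gets produced for a struct and for a map; the encode helper is hypothetical and only wraps json.Marshal:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JSONData is a payload type for a JSON column.
type JSONData struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// encode is a hypothetical helper that returns the marshaled JSON as a string.
func encode(v any) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	s, err := encode(JSONData{ID: 1, Name: "example"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"id":1,"name":"example"}

	// Map keys are emitted in sorted order by encoding/json.
	s, err = encode(map[string]any{"b": 1, "a": 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"a":2,"b":1}
}
```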

No special declaration is required for LowCardinality types or Enums.

clickhouse-gate-client compatibility

This library aims to be compatible with clickhouse-gate-client but isn't an exact "drop-in replacement". Consider the following points to achieve compatibility:

  • Both `_` and `_struct` fields are supported to specify the `clickhousegate` struct tag. The table name is used, and all the flags (like validate_names) are ignored.
  • This library uses the table schema from the actual database and doesn't require a "category" schema in OnlineConf.
  • The configuration is entirely different from that used by clickhouse-gate-client, despite being compatible with the clickhouse-gate configuration.
  • This library doesn't provide some hack wrapper types like JSON or TimeMs. You can use any type for a JSON column, and time.Time for Date/DateTime/DateTime64 column types of any precision.
  • Unlike clickhouse-gate-client, this library doesn't require any special field order.
  • Some features (like column name customization) are intentionally not implemented, to allow migration from and to clickhouse-gate-client.

Configuration

The following configuration settings are used by this library:

  • /clickhouse/native-host (required) - the ClickHouse native protocol endpoint.
  • /clickhouse/base (default: "default") - the database name.
  • /clickhouse/user (default: no user) - the user name.
  • /clickhouse/pass (default: no password) - the password.
  • /clickhouse/settings (default: empty) - a JSON or YAML map containing ClickHouse session settings: https://clickhouse.com/docs/operations/settings/settings. Settings with "async_insert" in the name are especially important for this library's performance.
  • /clickhouse/debug (default: false) - github.com/ClickHouse/clickhouse-go/v2 debug mode
  • /clickhouse/max-open-conns (default: 10) - maximum connections in a pool, also used as the worker goroutine count.
  • /clickhouse/max-idle-conns (default: 3) - maximum idle connections in a pool.
  • /busy-percentage (default: 70) - when the worker goroutine busy percentage is less than or equal to this value, the "wait" INSERT mode is used (the query status is checked and the record is logged as lost on INSERT failure). Otherwise, the "fire-and-forget" mode is used.
  • /send-timeout (default: 50ms). It is used when all worker goroutines are busy (even the "fire-and-forget" ones). If this timeout is reached, the record is logged as lost. You can use negative values to wait forever, e.g. for cron jobs or other non-interactive cases, along with the pool size of 1.
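The /busy-percentage rule can be sketched as a pure function. Everything here is an assumption made for illustration: the name insertMode, the way busy workers are counted, and the integer comparison are not taken from the library.

```go
package main

import "fmt"

// insertMode is a hypothetical helper. It returns "wait" while the share of
// busy workers is at or below the threshold (in percent), and
// "fire-and-forget" once the threshold is exceeded.
func insertMode(busy, total, threshold int) string {
	// Compare busy/total <= threshold/100 without integer division.
	if busy*100 <= threshold*total {
		return "wait"
	}
	return "fire-and-forget"
}

func main() {
	// With the defaults: 10 workers and a 70% threshold.
	fmt.Println(insertMode(7, 10, 70)) // wait
	fmt.Println(insertMode(8, 10, 70)) // fire-and-forget
}
```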

When any configuration value whose key begins with /clickhouse/ changes, the database connection pool is re-initialized. Other values are checked dynamically. So you can change *any* configuration parameter on the fly without an application restart.

Although you can use any type implementing the OnlineConf interface, including Module, the typical configuration is:

# in onlineconf
/project/application/stats/clickhouse -> /infrastructure/clickhouse/project
/project/application/stats/send-timeout: 1m  # specified directly

# in your app's main function
stats, err := clickhousestats.Connect(ctx, oconfRoot.Subtree("/project/application/stats"))

Lost data

A data record is considered "lost" in the following cases:

  • when all the worker goroutines are busy and /send-timeout is reached,
  • when the async_insert query has returned an error,
  • when sending new data after calling Client.Close.

"Lost" records are logged using log/slog with the following keys:

  • message key: "clickhousestats: lost record",
  • "err" key - ErrShutdown, ErrBusy, or some clickhouse-go error,
  • "record" key - the data itself.
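Until a recovery package exists, lost records can be picked out of structured logs by their message. The sketch below assumes that the application logs through slog's JSON handler and that lost records are written at the Error level; neither is stated by this package. The names lostLine and decodeLost are hypothetical, and errBusy stands in for ErrBusy.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
)

const lostMsg = "clickhousestats: lost record"

// errBusy stands in for the library's ErrBusy.
var errBusy = errors.New("channel is busy")

type record struct {
	Name string
}

// lostLine produces one JSON log line shaped like the entries described
// above. The log level is an assumption.
func lostLine(err error, rec any) []byte {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Error(lostMsg, "err", err, "record", rec)
	return buf.Bytes()
}

// decodeLost extracts the error text and the raw record from a JSON log
// line. It reports false for lines that are not lost-record entries.
func decodeLost(line []byte) (errText string, rec json.RawMessage, ok bool) {
	var entry struct {
		Msg    string          `json:"msg"`
		Err    string          `json:"err"`
		Record json.RawMessage `json:"record"`
	}
	if json.Unmarshal(line, &entry) != nil || entry.Msg != lostMsg {
		return "", nil, false
	}
	return entry.Err, entry.Record, true
}

func main() {
	line := lostLine(errBusy, record{Name: "example"})
	errText, rec, ok := decodeLost(line)
	fmt.Println(ok, errText, string(rec))
}
```

The raw record can then be unmarshaled back into the original struct type and sent again.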

TODO implement a package for recovering "lost" data from the logs.

Metrics

Passing the WithMetrics option to Connect enables metrics collection with the supplied advmetricsset.Set. The following metrics are collected:

Connection pool metrics (labels: "addr" and "database"):

  • clickhousestats_connections - current count of connections (gauge)
  • clickhousestats_connection_duration_seconds - time spent while establishing new connections (histogram)
  • clickhousestats_connection_errors - count of errors during establishing new connections (counter)

Record delivery metrics (label: "table"):

  • clickhousestats_records_total - total count of records sent (counter)
  • clickhousestats_records_nowait - count of records sent in the "fire-and-forget" mode (i.e. when the /busy-percentage is exceeded) (counter)
  • clickhousestats_records_lost - count of records that weren't delivered to the database and were logged with the "clickhousestats: lost record" message (counter)

Examples

Schema declaration in the DB:

CREATE TABLE stats_data (
	id       UUID,
	name     String,
	optional Nullable(String),
	custom   JSON
)

Schema declaration in Go:

type StatsData struct {
	_ struct{} `clickhousegate:"stats_data"`
	ID       uuid.UUID
	Name     string
	Optional *string
	Custom   JSONData
}

type JSONData struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

Constants

const (
	DefaultDatabase       = "default"             // default for /clickhouse/base
	DefaultMaxOpenConns   = 10                    // default for /clickhouse/max-open-conns
	DefaultMaxIdleConns   = 3                     // default for /clickhouse/max-idle-conns
	DefaultBusyPercentage = 70                    // default for /busy-percentage
	DefaultSendTimeout    = 50 * time.Millisecond // default for /send-timeout
)

Configuration defaults.

Variables

var (
	ErrShutdown       = errors.New("shutting down")
	ErrBusy           = errors.New("channel is busy")
	ErrNotStruct      = errors.New("unsupported type - not a struct or struct pointer")
	ErrNoTable        = errors.New("table name isn't specified for type")
	ErrEmptyStruct    = errors.New("struct doesn't have any exported field")
	ErrColumnConflict = errors.New("conflicting column names")
	ErrMissingField   = errors.New("struct field is missing from the table")
	ErrNonNullable    = errors.New("struct field is a pointer, but the corresponding table column isn't Nullable")
	ErrTimePrecision  = errors.New("unsupported DateTime64 precision (use 3, 6, or 9")
)

Errors that may be returned by the library's public API. Errors that are only logged and never returned are not included in this list.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client represents the ClickHouse client and its corresponding goroutine pool.

func Connect

func Connect(ctx context.Context, oconf OnlineConf, opts ...Option) (*Client, error)

Connect creates a Client instance, a clickhouse-go connection pool, and a goroutine pool for processing stats records issued by Client.Send. The ClickHouse server address and pool size are configured by the supplied oconf parameter.

When the supplied context is canceled, all pending data is sent to the ClickHouse server, the goroutine pool is stopped, and the ClickHouse connection pool is closed. You can use the Client.Close method to wait for the pending data to be flushed.

Available options:

  • WithMetrics - enables metrics collection.

func (*Client) Close

func (c *Client) Close()

Close flushes the pending data to the ClickHouse server and stops the goroutine pool and the clickhouse connection pool. The application should not exit before this method returns to prevent data loss.
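The flush-then-stop behavior can be pictured as a worker pool that drains its queue before returning from Close. This is a generic sketch of the pattern, not the library's code; the pool type, its fields, and the run helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a hypothetical stand-in for the client's goroutine pool.
type pool struct {
	ch   chan string
	wg   sync.WaitGroup
	sent atomic.Int64
}

func newPool(workers int) *pool {
	p := &pool{ch: make(chan string, 16)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			// The loop ends only when the channel is closed and empty,
			// so every queued record is processed first.
			for range p.ch {
				p.sent.Add(1) // a real worker would run the INSERT here
			}
		}()
	}
	return p
}

// Close stops accepting records and waits until the queue is drained.
func (p *pool) Close() {
	close(p.ch)
	p.wg.Wait()
}

// run queues n records, closes the pool, and reports how many were processed.
func run(n int) int64 {
	p := newPool(3)
	for i := 0; i < n; i++ {
		p.ch <- fmt.Sprintf("record-%d", i)
	}
	p.Close()
	return p.sent.Load()
}

func main() {
	fmt.Println(run(100)) // 100
}
```

Exiting before Close returns would abandon whatever is still in the queue, which is why the application should wait for it.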

func (*Client) GetConn

func (c *Client) GetConn() (driver.Conn, error)

GetConn returns the underlying clickhouse connection pool. Use it with care: in particular, never perform synchronous INSERT queries or SELECT * queries on it.

func (*Client) Send

func (c *Client) Send(_ context.Context, data any) error

Send sends the record specified by the data parameter to the ClickHouse server. Send returns immediately if possible (before the actual data delivery). If all worker goroutines are busy, Send waits for up to /send-timeout (default: 50ms) and then logs the record as lost if the delivery is still not possible.

The supplied context is used for logging purposes only, and cancelling this context will not cancel the pending data delivery.

Send returns an error in the following cases:

  • when the Client is closed (ErrShutdown)
  • when the data schema is incorrect or doesn't match the corresponding table schema.

When the record delivery fails for some reason, the error isn't returned (see "Lost data"), because the data insertion is performed asynchronously.
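The waiting behavior of Send can be sketched as a channel send guarded by a timer. The enqueue and tryTwice functions are hypothetical, and errBusy stands in for ErrBusy. Unlike this sketch, which returns the error so the result is visible, the library logs the record as lost instead of returning the error to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBusy stands in for the library's ErrBusy.
var errBusy = errors.New("channel is busy")

// enqueue is a hypothetical helper. It hands rec to a worker immediately if
// possible, otherwise waits up to timeout. A negative timeout waits forever.
func enqueue(ch chan<- any, rec any, timeout time.Duration) error {
	select {
	case ch <- rec: // fast path: a worker or buffer slot is free
		return nil
	default:
	}
	if timeout < 0 {
		ch <- rec // block until a worker becomes available
		return nil
	}
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case ch <- rec:
		return nil
	case <-timer.C:
		return errBusy // the caller would log the record as lost
	}
}

// tryTwice sends two records into a queue of size one with no consumer.
func tryTwice() (first, second error) {
	ch := make(chan any, 1)
	first = enqueue(ch, "first", 50*time.Millisecond)
	second = enqueue(ch, "second", 50*time.Millisecond)
	return first, second
}

func main() {
	first, second := tryTwice()
	fmt.Println(first)  // <nil>
	fmt.Println(second) // channel is busy
}
```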

TODO context isn't used right now but surely will be.

This method is compatible with the clickhouse-gate-client library method with the same name.

func (*Client) SendUnsafe

func (c *Client) SendUnsafe(ctx context.Context, data any) error

SendUnsafe calls Client.Send and returns. This method is for clickhouse-gate-client library compatibility only.

type OnlineConf

type OnlineConf interface {
	Path(string) string
	GetInt(path string, dfl int) int
	GetBool(path string, dfl bool) bool
	GetString(path, dfl string) string
	GetStringsErr(path string, dfl []string) ([]string, error)
	GetDuration(path string, dfl time.Duration) time.Duration
	GetStruct(path string, dataPtr any) (bool, error)
	SubscribeSubtree(path string) (chan struct{}, error)
	UnsubscribeChanSubtree(path string, ch chan<- struct{})
}

OnlineConf dependency. You can use Module, Subtree or any other compatible type.

type Option

type Option func(*options)

Option for Connect function.

func WithMetrics

func WithMetrics(ms advmetricsset.Set) Option

WithMetrics option enables metrics collection.

type UnixMicroer

type UnixMicroer interface {
	UnixMicro() int64
}

UnixMicroer is used for DateTime64(6) columns.

time.Time implements this interface, so you can use it to represent all the supported time resolutions.

type UnixMillier

type UnixMillier interface {
	UnixMilli() int64
}

UnixMillier is used for DateTime64(3) (or simply DateTime64) columns.

time.Time implements this interface, so you can use it to represent all the supported time resolutions.

type UnixNanoer

type UnixNanoer interface {
	UnixNano() int64
}

UnixNanoer is used for DateTime64(9) columns.

time.Time implements this interface, so you can use it to represent all the supported time resolutions.
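The three interfaces above map directly onto methods of time.Time. The sketch below redeclares them locally so that it compiles on its own, and uses compile-time assertions to confirm that time.Time satisfies each one; the precisions helper is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Local copies of the interfaces documented above.
type UnixMillier interface{ UnixMilli() int64 }
type UnixMicroer interface{ UnixMicro() int64 }
type UnixNanoer interface{ UnixNano() int64 }

// Compile-time checks: time.Time implements all three interfaces.
var (
	_ UnixMillier = time.Time{}
	_ UnixMicroer = time.Time{}
	_ UnixNanoer  = time.Time{}
)

// precisions returns the fractional second of a fixed timestamp at the
// resolutions used by DateTime64(3), DateTime64(6), and DateTime64(9).
func precisions() (ms, us, ns int64) {
	t := time.Date(2025, time.November, 14, 12, 0, 0, 123456789, time.UTC)
	return t.UnixMilli() % 1_000, t.UnixMicro() % 1_000_000, t.UnixNano() % 1_000_000_000
}

func main() {
	fmt.Println(precisions()) // 123 123456 123456789
}
```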
