arboreal

v0.0.0-...-0458b4b

This package is not in the latest version of its module.

Published: Feb 10, 2026 | License: Apache-2.0 | Imports: 5 | Imported by: 0

README

arboreal

(chiefly of animals) living in trees.

Pure Go library for running inference on gradient boosted decision trees.

Usage

go get github.com/stillmatic/arboreal

Regression example

schema, _ := arboreal.NewGBDTFromXGBoostJSON("testdata/regression.json")
inpArr := []float32{0.1, 0.2, 0.3, 0.4, 0.5}
inpVec := arboreal.SparseVectorFromArray(inpArr)
res, _ := schema.Predict(inpVec)

Optimized classification example

schema, _ := arboreal.NewGBDTFromXGBoostJSON("testdata/mortgage_xgb.json")
// Dense input with one slot per feature; set missing features to NaN.
vec := make([]float32, 44)
// PredictDense is the fast path: SoA trees and no map lookups.
res, _ := schema.PredictDense(vec)

Why?

Go is a great language for backend development, especially for web-facing apps. However, it is not a great language for machine learning. Machine learning and data science deal with messy data and benefit from the comparative flexibility of Python. Building models in Python is easy, but serving them is not particularly quick: Python's GIL and weak concurrency model make it difficult to serve many requests at high throughput.

This library aims to solve that problem. It is a pure Go implementation of gradient boosted decision trees, possibly the most popular machine learning model type within enterprise applications. It is optimized for serving inference requests, and is fast enough to be used in a production web server.

Documentation

Overview

Package arboreal is a pure Go package for XGBoost model inference.

Functions

func Softmax

func Softmax(ys []float32) []float32

Softmax applies the softmax function to a slice of scores.
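As an illustration of what a softmax over a slice of scores computes, here is a minimal sketch. The function softmax below is a hypothetical stand-in written for this example, not the library's implementation; subtracting the maximum score before exponentiating is a common stability trick and is an assumption here.

```go
package main

import (
	"fmt"
	"math"
)

// softmax is a hypothetical stand-in for arboreal.Softmax, not the library's code.
// It subtracts the maximum score before exponentiating so large scores do not overflow.
func softmax(ys []float32) []float32 {
	out := make([]float32, len(ys))
	if len(ys) == 0 {
		return out
	}
	maxY := ys[0]
	for _, y := range ys[1:] {
		if y > maxY {
			maxY = y
		}
	}
	exps := make([]float64, len(ys))
	var sum float64
	for i, y := range ys {
		exps[i] = math.Exp(float64(y - maxY))
		sum += exps[i]
	}
	// Normalize so the outputs sum to 1.
	for i := range out {
		out[i] = float32(exps[i] / sum)
	}
	return out
}

func main() {
	probs := softmax([]float32{1, 2, 3})
	fmt.Println(probs) // three increasing probabilities that sum to about 1
}
```

The output keeps the ordering of the input scores, so the largest score always receives the largest probability.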

Types

type GBLinear

type GBLinear struct {
	Name  string `json:"name"`
	Model struct {
		Weights []float32 `json:"weights"`
	} `json:"model"`
}

GBLinear represents a linear booster model.

func (*GBLinear) GetName

func (m *GBLinear) GetName() string

func (*GBLinear) Predict

func (m *GBLinear) Predict(features SparseVector) ([]float32, error)

type GBTModelAoS

type GBTModelAoS struct {
	Trees []TreeAoS
}

func OptimizedGBTModelAoS

func OptimizedGBTModelAoS(in *model) *GBTModelAoS

OptimizedGBTModelAoS converts a parsed model to the compact AoS representation.

func (*GBTModelAoS) GetName

func (m *GBTModelAoS) GetName() string

func (*GBTModelAoS) Predict

func (m *GBTModelAoS) Predict(features SparseVector) ([]float32, error)

type GBTModelOptimized

type GBTModelOptimized struct {
	Trees []TreeOptimized
}

GBTModelOptimized holds the ensemble of optimized trees.

func OptimizedGBTModel

func OptimizedGBTModel(in *model) *GBTModelOptimized

OptimizedGBTModel converts a parsed model to the SoA optimized representation.

func (*GBTModelOptimized) GetName

func (m *GBTModelOptimized) GetName() string

func (*GBTModelOptimized) Predict

func (m *GBTModelOptimized) Predict(features SparseVector) ([]float32, error)

func (*GBTModelOptimized) PredictDense

func (m *GBTModelOptimized) PredictDense(features []float32) []float32

PredictDense predicts using a dense feature vector. Missing values are represented as NaN. This is the fast path — no map lookups.
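Callers that hold features in an index-to-value map need to expand them before using the dense path. The sketch below shows one way to do that; toDense is a hypothetical helper that is not part of the package, and the feature count n is assumed to be known by the caller.

```go
package main

import (
	"fmt"
	"math"
)

// toDense is a hypothetical helper, not part of arboreal.
// It expands an index->value map into a slice of length n and
// marks every absent feature as NaN, the documented missing-value marker.
func toDense(sparse map[int]float32, n int) []float32 {
	nan := float32(math.NaN())
	dense := make([]float32, n)
	for i := range dense {
		dense[i] = nan
	}
	for i, v := range sparse {
		// Ignore indices outside the expected feature range.
		if i >= 0 && i < n {
			dense[i] = v
		}
	}
	return dense
}

func main() {
	dense := toDense(map[int]float32{0: 0.1, 3: 0.4}, 5)
	fmt.Println(dense) // indices 1, 2 and 4 hold NaN
}
```

Filling with NaN rather than zero matters because a zero is a real feature value, while NaN tells the tree to follow its default direction.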

type GBTree

type GBTree struct {
	Model model  `json:"model"`
	Name  string `json:"name"`
}

GBTree is the deserialization target for gbtree gradient boosters. It is converted to GBTModelOptimized at load time.

type GradientBooster

type GradientBooster interface {
	GetName() string
	Predict(features SparseVector) ([]float32, error)
}

GradientBooster is the interface for tree ensemble and linear boosters.
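Because both booster kinds share this interface, calling code can be written once against it. The following sketch uses local copies of SparseVector and GradientBooster so that it compiles on its own; constBooster and describe are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// Local copies of the documented types, so the sketch is self-contained.
type SparseVector map[int]float32

type GradientBooster interface {
	GetName() string
	Predict(features SparseVector) ([]float32, error)
}

// constBooster is a hypothetical booster that always returns the same score.
type constBooster struct{ score float32 }

func (c *constBooster) GetName() string { return "const" }

func (c *constBooster) Predict(SparseVector) ([]float32, error) {
	return []float32{c.score}, nil
}

// describe works with any booster through the interface.
func describe(b GradientBooster, x SparseVector) (string, error) {
	out, err := b.Predict(x)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s -> %v", b.GetName(), out), nil
}

func main() {
	s, err := describe(&constBooster{score: 0.5}, SparseVector{0: 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // const -> [0.5]
}
```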

type NodeAoS

type NodeAoS struct {
	SplitCondition float32 // split threshold or leaf value
	LeftChild      int32   // left child index (-1 for leaf)
	SplitIndex     int32   // feature index for this split
	RightChild     int32   // right child index (categorical only)
	Category       int32   // categorical split value
	DefaultLeft    bool    // missing value → go left
	IsLeaf         bool    // precomputed: leftChild == -1 && rightChild == -1
	SplitType      uint8   // 0=numerical, 1=categorical

}

NodeAoS is a compact 24-byte node struct. Fields are ordered largest-first to minimize padding. With the field order shown, the hot fields SplitCondition, LeftChild, and SplitIndex occupy the first 12 bytes, while DefaultLeft and IsLeaf sit at byte offsets 20 and 21.

type Objective

type Objective struct {
	Name   string
	Params map[string]string
}

Objective holds the parsed objective function metadata.

type SparseVector

type SparseVector map[int]float32

func SparseVectorFromArray

func SparseVectorFromArray(arr []float32) SparseVector

type TreeAoS

type TreeAoS struct {
	Nodes          []NodeAoS // contiguous value slice, not pointers
	HasCategorical bool
}

func OptimizedTreeAoS

func OptimizedTreeAoS(in *tree) TreeAoS

OptimizedTreeAoS converts a single parsed tree to compact AoS layout.

func (*TreeAoS) Predict

func (t *TreeAoS) Predict(features SparseVector) float32

Predict dispatches to the numerical or categorical path.

func (*TreeAoS) PredictDense

func (t *TreeAoS) PredictDense(features []float32) float32

PredictDense dispatches using a dense []float32 slice. Missing values are represented as NaN.

type TreeOptimized

type TreeOptimized struct {
	LeftChild      []int32
	RightChild     []int32
	SplitIndex     []int32
	SplitCondition []float32
	DefaultLeft    []bool
	SplitType      []uint8 // 0=numerical, 1=categorical
	Category       []int32
	HasCategorical bool
}

TreeOptimized uses a struct-of-arrays layout for cache-friendly tree traversal. Each field is a contiguous slice indexed by node ID. Numerical-only trees use the fast path (predictNumerical); trees with any categorical splits use predictMixed.

func OptimizedTree

func OptimizedTree(in *tree) TreeOptimized

OptimizedTree converts a single parsed tree to SoA layout.

func (*TreeOptimized) Predict

func (t *TreeOptimized) Predict(features SparseVector) float32

Predict dispatches to the appropriate traversal method using SparseVector input.

func (*TreeOptimized) PredictDense

func (t *TreeOptimized) PredictDense(features []float32) float32

PredictDense dispatches using a dense []float32 slice. Missing values are represented as NaN.

type XGBoostSchema

type XGBoostSchema struct {
	Learner *learner `json:"learner"`
	Version []int    `json:"version"`

	// AoS model for benchmarking comparison (built alongside SoA at load time)
	ModelAoS *GBTModelAoS
	// contains filtered or unexported fields
}

XGBoostSchema is the top-level model representation. After loading, perScore and postProcess are resolved from the objective so the prediction hot path has no switch statements.
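The idea of resolving the objective once at load time can be sketched as follows. This is an illustration under assumptions, not the package's code: resolveTransform and transform are hypothetical names, and the two objective strings are standard XGBoost objective names chosen as examples rather than a list of what the package supports.

```go
package main

import (
	"fmt"
	"math"
)

// transform maps a raw margin to the final prediction.
type transform func(float32) float32

func identity(x float32) float32 { return x }

func sigmoid(x float32) float32 {
	return float32(1 / (1 + math.Exp(-float64(x))))
}

// resolveTransform is a hypothetical helper, not arboreal's code.
// The switch runs once at load time and yields a function value.
func resolveTransform(objective string) (transform, error) {
	switch objective {
	case "reg:squarederror":
		return identity, nil
	case "binary:logistic":
		return sigmoid, nil
	default:
		return nil, fmt.Errorf("unsupported objective %q", objective)
	}
}

func main() {
	f, err := resolveTransform("binary:logistic")
	if err != nil {
		panic(err)
	}
	// The prediction loop calls f directly, with no per-call switch.
	for _, margin := range []float32{-2, 0, 2} {
		fmt.Println(f(margin))
	}
}
```

Storing the resolved function on the model keeps branching on the objective name out of the per-request path.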

func NewGBDTFromXGBoostJSON

func NewGBDTFromXGBoostJSON(filename string) (*XGBoostSchema, error)

NewGBDTFromXGBoostJSON loads an XGBoost model from a JSON file and resolves the objective into function pointers for fast prediction. It builds both the SoA and AoS tree representations so the two can be benchmarked against each other.

func (*XGBoostSchema) Predict

func (m *XGBoostSchema) Predict(features SparseVector) ([]float32, error)

Predict runs inference through the full XGBoost pipeline.

func (*XGBoostSchema) PredictDense

func (m *XGBoostSchema) PredictDense(features []float32) ([]float32, error)

PredictDense runs inference using a dense feature vector (SoA trees, no map overhead). Missing features should be set to NaN. Uses a fused loop: tree prediction, per-class accumulation, and objective transform happen in one pass with no intermediate []float32 allocation for tree results.

func (*XGBoostSchema) PredictDenseAoS

func (m *XGBoostSchema) PredictDenseAoS(features []float32) ([]float32, error)

PredictDenseAoS runs inference using the compact AoS tree layout. It uses the same fused loop as PredictDense, but traverses AoS nodes, so all fields of a node sit next to each other in memory.
