smallset

package module
v0.4.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 2, 2026 License: MIT Imports: 4 Imported by: 4

README

smallset

smallset is a generic sorted set implementation for Go, backed by a slice. It is designed to be highly memory-efficient and offers fast range access, making it an excellent choice for small sets (under 1000 elements)

  • smallset.Ordered[T cmp.Ordered]: The fastest implementation, which works for any standard Go cmp.Ordered type (e.g., int, string, float64).

  • smallset.Custom[T any]: A flexible implementation that works for any type (T any), including structs or non-comparable types. It requires a user-provided comparison function to define order and uniqueness.

Installation

go get github.com/pippellia-btc/smallset

Features

  • Go Generics: Provides an Ordered set for cmp.Ordered types and a Custom set for any other type.

  • Flexible: The Custom set allows you to create sets of complex structs, defining order and uniqueness with a cmp(a, b) int function. This enables set operations on types that standard Go maps cannot support.

  • Sorted Data: Elements are always maintained in ascending order within the internal slice.

  • Fast Accessors: Provides O(1) performance for Min(), Max(), MinK(), and MaxK().

  • Memory Efficiency: Avoids the memory overhead of map keys and pointers associated with traditional hash maps, leading to better cache locality for small data sets.

  • Efficient Set Operations: Binary operations like Intersect, Union, and Difference are performed using the efficient two-pointer merge algorithm, which is often much faster than map-based iteration for sorted data.

Performance Trade-offs

Operation smallset map[T]struct{}
Lookup (Contains) $O(\log N)$ (Binary Search) $O(1)$ average (Hash Lookup)
Range Access (MinK/MaxK) $O(1)$ $O(N)$ (Requires map iteration or conversion)
Mutation (Add/Remove) $O(N)$ (Slice shift dominates) $O(1)$ average
Set Operations (Intersect, Union, Difference) $O(N)$ (Two-Pointer Merge) $O(N)$ (Iteration + $O(1)$ lookups)
Why use smallset over a map?

The most common advice is to use hash maps for their guaranteed $O(1)$ complexity. However, for small $N$, the map's large constant factor is so dominant that it outweighs the slice's slightly worse asymptotic complexity. The reasons are:

  • Cache Locality: A Go slice stores all its elements contiguously in memory. When performing set operations the CPU efficiently loads sequential elements into the cache with minimal latency. Map-based sets, however, store key-value pairs scattered across memory via hash buckets. This forces the CPU to constantly jump through pointers, resulting in more cache-misses.

  • Hashing Overhead: Every single map operation requires calculating a hash for the key. This cryptographic overhead is a constant tax that the slice-based set bypasses entirely, replacing it with simpler, faster value comparison.

The Sweet Spot: smallset is best used for sets where the size remains small (< 1000) or in scenarios where additions/removals are infrequent, but lookups, iteration, and ordered access are common.

Benchmarks

We tested smallset against github.com/deckarep/golang-set, the most used map-based set implementation. See the results here.

TLDR: smallset.Ordered offers comparable performance for insertions and deletions up to 1000 elements, while outperforming the map based sets by 10-30x for heavy set-like operations like Intersect, Difference, SymetricDifference...

Usage Examples

Using smallset.Ordered (fastest for cmp.Ordered types)

package main

import (
	"fmt"
	"github.com/pippellia-btc/smallset"
)

func main() {
	// 1. Create a New set with initial capacity
	set := smallset.New[int](5)

	// 2. Add elements (returns true if added, false if duplicate)
	set.Add(20) // true
	set.Add(10) // true
	set.Add(20) // false (duplicate)
	fmt.Println("Set size:", set.Size()) // 2

	// 3. Contains and Remove
	set.Contains(10) 	// true
	set.Remove(10)		// true (actually removed)
	set.Remove(69)		// false (not in the set)
	set.Contains(10) 	// false
}

Min/Max and TopK (smallset.Ordered)

package main

import (
	"fmt"
	"github.com/pippellia-btc/smallset"
)

func main() {
	set := smallset.NewFrom(50, 10, 30, 20, 40) // Internally sorted: [10, 20, 30, 40, 50]

	// MinK (returns k smallest elements)
	mins := set.MinK(2)
	fmt.Println("2 Smallest:", mins) // [10 20]

	// MaxK (returns k largest elements)
	maxs := set.MaxK(3)
	fmt.Println("3 Largest:", maxs) // [30 40 50]

	// MinK with k > size (returns all elements)
	all := set.MinK(10)
	fmt.Println("All elements:", all) // [10 20 30 40 50]
}

Using smallset.Custom (flexible, for any type)

package main

import (
    "cmp"
    "fmt"
    "github.com/pippellia-btc/smallset"
)

// A custom struct, which is not even comparable due to the slice field.
type Person struct {
    ID   	int
    Hobbies []string
}

func CmpPerson(a, b Person) int {
    // We can use the standard cmp.Compare for the underlying field
    return cmp.Compare(a.ID, b.ID)
}

func main() {
    // 1. Create a new set, providing the comparison function.
    set := smallset.NewCustom[Person](CmpPerson, 5)

    // 2. Add elements (returns true if added, false if duplicate)
    set.Add(Person{ID: 10, Name: "Alice"})	// true
    set.Add(Person{ID: 5, Name: "Bob"})		// true
	set.Add(Person{ID: 10, Name: "A."})		// false (same ID)
    set.Size() // 2

	// 3. Contains and Remove
	set.Contains(Person{ID: 10}) 	// true
	set.Remove(Person{ID: 10})		// true (actually removed)
	set.Remove(Person{ID: 69})		// false (not in the set)
	set.Contains(10) 				// false
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Custom added in v0.2.0

type Custom[T any] struct {
	// contains filtered or unexported fields
}

Custom is a slice-based set sorted in ascending order, as determined by the cmp function provided in the contructor. If T is an ordered type, you should use Ordered for better performance.

It's more performant that a map based approach for small collections (< 1000). The capacity of the set can dynamically grow, but the performance would start to deteriorate. Not safe for concurrent use.

func CustomFrom added in v0.4.3

func CustomFrom[T any](cmp func(a, b T) int, items ...T) *Custom[T]

CustomFrom returns an initialized set that contains the provided elements, sorted by the provided compare function cmp.

The cmp function allows two elements, a and b, to be compared, following a similar convention to that of the slices package. - cmp(a, b) < 0 if a < b - cmp(a, b) > 0 if a > b - cmp(a, b) == 0 if a = b (duplicates)

It panics if cmp is nil.

func IntersectCustom added in v0.4.0

func IntersectCustom[T any](compare func(a, b T) int, sets ...*Custom[T]) *Custom[T]

IntersectCustom efficiently finds the common elements present in *all* provided Custom sets. The 'cmp' function defines the ordering for the resulting set, and *must* be the same as the comparison functions of all sets. It works by iteratively intersecting sets from the smallest to the biggest. It sorts the sets slice in place.

func MergeCustom added in v0.4.0

func MergeCustom[T any](compare func(a, b T) int, sets ...*Custom[T]) *Custom[T]

MergeCustom efficiently combines multiple Custom sets into a single new set with the specified comparison function cmp. This is significantly more efficient than chaining s1.Union(s2).Union(s3)... as it performs only a single sort and compact operation on the combined data.

func NewCustom added in v0.2.0

func NewCustom[T any](cmp func(a, b T) int, capacity int) *Custom[T]

NewCustom returns an initialized set with the provided compare function and capacity.

The cmp function allows two elements, a and b, to be compared, following a similar convention to that of the slices package. - cmp(a, b) < 0 if a < b - cmp(a, b) > 0 if a > b - cmp(a, b) == 0 if a = b (duplicates)

It panics if the cmp function is nil or capacity is <= 0.

func (*Custom[T]) Add added in v0.2.0

func (s *Custom[T]) Add(e T) bool

Add an element and returns whether is was added (true), or was already present (false).

func (*Custom[T]) Ascend added in v0.2.0

func (s *Custom[T]) Ascend() iter.Seq2[int, T]

Ascend returns an iterator over the set in ascending order.

func (*Custom[T]) At added in v0.2.0

func (s *Custom[T]) At(i int) T

At returns the element at index i or panics if out of range.

func (*Custom[T]) BetweenAsc added in v0.2.0

func (s *Custom[T]) BetweenAsc(min, max T) iter.Seq2[int, T]

BetweenAsc iterates CustomFrom min (inclusive) to max (exclusive) in ascending order. If min or max are not present in the set, iteration starts/ends at the position where they would appear in the sorted slice. Panics if max < min.

func (*Custom[T]) BetweenDesc added in v0.2.0

func (s *Custom[T]) BetweenDesc(max, min T) iter.Seq2[int, T]

BetweenDesc iterates CustomFrom max (inclusive) down to min (exclusive) in descending order. If min or max are not present in the set, iteration starts/ends at the position where they would appear in the sorted slice. Panics if max < min.

func (*Custom[T]) Capacity added in v0.4.2

func (s *Custom[T]) Capacity() int

Capacity returns the capacity of the underlying slice.

func (*Custom[T]) Clear added in v0.2.0

func (s *Custom[T]) Clear()

Clear removes all elements from the set.

It zeroes out the elements to prevent memory leaks (releasing references) and resets the length to 0. The underlying array capacity is preserved to minimize allocations during future insertions.

func (*Custom[T]) Clone added in v0.2.0

func (s *Custom[T]) Clone() *Custom[T]

Clone returns a clone of the set, that shares the cmp comparator function.

func (*Custom[T]) Contains added in v0.2.0

func (s *Custom[T]) Contains(e T) bool

Contains returns whether the element is in the set. Operation is O(log(N))

func (*Custom[T]) Descend added in v0.2.0

func (s *Custom[T]) Descend() iter.Seq2[int, T]

Descend returns an iterator over the set in descending order.

func (*Custom[T]) Difference added in v0.2.0

func (s *Custom[T]) Difference(other *Custom[T]) *Custom[T]

Difference returns the difference between this set and other. The returned set will contain all elements of this set that are not elements of other. O(N+M) complexity. s1 and s2 must use the same (or equivalent) comparison functions.

func (*Custom[T]) Find added in v0.2.0

func (s *Custom[T]) Find(e T) (int, bool)

Find returns the index of an element, or the position where target would appear in the sort order. It also returns a bool saying whether the target is really found in the slice.

func (*Custom[T]) Intersect added in v0.2.0

func (s *Custom[T]) Intersect(other *Custom[T]) *Custom[T]

Intersect returns the intersection of two sets, returning a NewCustom set containing only the common elements. O(N+M) complexity. s1 and s2 must use the same (or equivalent) comparison functions.

func (*Custom[T]) IsEmpty added in v0.2.0

func (s *Custom[T]) IsEmpty() bool

IsEmpty returns whether the set has no elements.

func (*Custom[T]) IsEqual added in v0.2.0

func (s *Custom[T]) IsEqual(other *Custom[T]) bool

IsEqual returns whether the two sets have the same elements.

func (*Custom[T]) Items added in v0.2.0

func (s *Custom[T]) Items() []T

Items returns a copy of the internal slice of the set.

func (*Custom[T]) Max added in v0.2.0

func (s *Custom[T]) Max() T

Max returns the biggest element in the sets. It panics if the set is empty.

func (*Custom[T]) MaxK added in v0.2.0

func (s *Custom[T]) MaxK(k int) []T

MaxK returns the k biggest elements in s, sorted in ascending order. O(k) complexity. It panics if k is negative. If k is bigger than the set size, it returns all the items.

func (*Custom[T]) Min added in v0.2.0

func (s *Custom[T]) Min() T

Min returns the smallest element in the set. It panics if the set is empty.

func (*Custom[T]) MinK added in v0.2.0

func (s *Custom[T]) MinK(k int) []T

MinK returns the k smallest elements in s, sorted in ascending order. O(k) complexity. It panics if k is negative. If k is bigger than the set size, it returns all the items.

func (*Custom[T]) Partition added in v0.2.0

func (s1 *Custom[T]) Partition(s2 *Custom[T]) (d12, inter, d21 *Custom[T])

Partition returns three NewCustom sets: - d12: elements in s1 not in s2 - inter: elements in both sets - d21: elements in s2 not in s1 O(N+M) complexity.

s1 and s2 must use the same (or equivalent) comparison functions.

func (*Custom[T]) Remove added in v0.2.0

func (s *Custom[T]) Remove(e T) bool

Remove an element if present, and returns whether is was removed (true), or was never present (false).

func (*Custom[T]) RemoveBefore added in v0.3.0

func (s *Custom[T]) RemoveBefore(max T) int

RemoveBefore removes all elements e such that e < max. Returns num removed.

func (*Custom[T]) RemoveBetween added in v0.3.0

func (s *Custom[T]) RemoveBetween(min, max T) int

RemoveBetween removes all elements e such that min <= e < max. Returns num removed.

func (*Custom[T]) RemoveFrom added in v0.3.0

func (s *Custom[T]) RemoveFrom(min T) int

RemoveFrom removed all elements e such that e >= min. Returns num removed.

func (*Custom[T]) Size added in v0.2.0

func (s *Custom[T]) Size() int

Size returns the number of elements in the set.

func (*Custom[T]) SymmetricDifference added in v0.2.0

func (s *Custom[T]) SymmetricDifference(other *Custom[T]) *Custom[T]

SymmetricDifference returns a NewCustom set with all elements which are in either this set or the other set but not in both. O(N+M) complexity. s1 and s2 must use the same (or equivalent) comparison functions.

func (*Custom[T]) Union added in v0.2.0

func (s *Custom[T]) Union(other *Custom[T]) *Custom[T]

Union returns a NewCustom set with all elements in both sets. O(N+M) complexity. s1 and s2 must use the same (or equivalent) comparison functions.

type Ordered added in v0.2.0

type Ordered[T cmp.Ordered] struct {
	// contains filtered or unexported fields
}

Ordered is a slice-based set sorted in ascending order. It's more performant that a map based approach for small collections (< 1000) of ordered types. The capacity of the set can dynamically grow, but the performance would start to deteriorate. Not safe for concurrent use.

func From

func From[T cmp.Ordered](items ...T) *Ordered[T]

From returns an initialized set that contains the provided elements.

func Intersect added in v0.4.0

func Intersect[T cmp.Ordered](sets ...*Ordered[T]) *Ordered[T]

Intersect efficiently finds the common elements present in *all* provided Ordered sets. It works by iteratively intersecting sets from the smallest to the biggest. It sorts the sets slice in place.

func Merge added in v0.4.0

func Merge[T cmp.Ordered](sets ...*Ordered[T]) *Ordered[T]

Merge efficiently combines multiple Ordered sets into a single new set. This is significantly more efficient than chaining s1.Union(s2).Union(s3)... as it performs only a single sort and compact operation on the combined data.

func New

func New[T cmp.Ordered](capacity int) *Ordered[T]

New returns an initialized set with the provided capacity. It panics if the capacity is <= 0.

func (*Ordered[T]) Add added in v0.2.0

func (s *Ordered[T]) Add(e T) bool

Add an element and returns whether is was added (true), or was already present (false).

func (*Ordered[T]) Ascend added in v0.2.0

func (s *Ordered[T]) Ascend() iter.Seq2[int, T]

Ascend returns an iterator over the set in ascending order.

func (*Ordered[T]) At added in v0.2.0

func (s *Ordered[T]) At(i int) T

At returns the element at index i or panics if out of range.

func (*Ordered[T]) BetweenAsc added in v0.2.0

func (s *Ordered[T]) BetweenAsc(min, max T) iter.Seq2[int, T]

BetweenAsc iterates From min (inclusive) to max (exclusive) in ascending order. If min or max are not present in the set, iteration starts/ends at the position where they would appear in the sorted slice. Panics if max < min.

func (*Ordered[T]) BetweenDesc added in v0.2.0

func (s *Ordered[T]) BetweenDesc(max, min T) iter.Seq2[int, T]

BetweenDesc iterates From max (inclusive) down to min (exclusive) in descending order. If min or max are not present in the set, iteration starts/ends at the position where they would appear in the sorted slice. Panics if max < min.

func (*Ordered[T]) Capacity added in v0.4.2

func (s *Ordered[T]) Capacity() int

Capacity returns the capacity of the underlying slice.

func (*Ordered[T]) Clear added in v0.2.0

func (s *Ordered[T]) Clear()

Clear removes all elements from the set.

It zeroes out the elements to prevent memory leaks (releasing references) and resets the length to 0. The underlying array capacity is preserved to minimize allocations during future insertions.

func (*Ordered[T]) Clone added in v0.2.0

func (s *Ordered[T]) Clone() *Ordered[T]

Clone returns a clone of the set.

func (*Ordered[T]) Contains added in v0.2.0

func (s *Ordered[T]) Contains(e T) bool

Contains returns whether the element is in the set. Operation is O(log(N))

func (*Ordered[T]) Descend added in v0.2.0

func (s *Ordered[T]) Descend() iter.Seq2[int, T]

Descend returns an iterator over the set in descending order.

func (*Ordered[T]) Difference added in v0.2.0

func (s *Ordered[T]) Difference(other *Ordered[T]) *Ordered[T]

Difference returns the difference between this set and other. The returned set will contain all elements of this set that are not elements of other. O(N+M) complexity.

func (*Ordered[T]) Find added in v0.2.0

func (s *Ordered[T]) Find(e T) (int, bool)

Find returns the index of an element, or the position where target would appear in the sort order. It also returns a bool saying whether the target is really found in the slice.

func (*Ordered[T]) Intersect added in v0.2.0

func (s *Ordered[T]) Intersect(other *Ordered[T]) *Ordered[T]

Intersect returns the intersection of two sets, returning a New set containing only the common elements. O(N+M) complexity.

func (*Ordered[T]) IsEmpty added in v0.2.0

func (s *Ordered[T]) IsEmpty() bool

IsEmpty returns whether the set has no elements.

func (*Ordered[T]) IsEqual added in v0.2.0

func (s *Ordered[T]) IsEqual(other *Ordered[T]) bool

IsEqual returns whether the two sets have the same elements.

func (*Ordered[T]) Items added in v0.2.0

func (s *Ordered[T]) Items() []T

Items returns a copy of the internal slice of the set.

func (*Ordered[T]) Max added in v0.2.0

func (s *Ordered[T]) Max() T

Max returns the biggest element in the sets. It panics if the set is empty.

func (*Ordered[T]) MaxK added in v0.2.0

func (s *Ordered[T]) MaxK(k int) []T

MaxK returns the k biggest elements in s, sorted in ascending order. O(k) complexity. It panics if k is negative. If k is bigger than the set size, it returns all the items.

func (*Ordered[T]) Min added in v0.2.0

func (s *Ordered[T]) Min() T

Min returns the smallest element in the set. It panics if the set is empty.

func (*Ordered[T]) MinK added in v0.2.0

func (s *Ordered[T]) MinK(k int) []T

MinK returns the k smallest elements in s, sorted in ascending order. O(k) complexity. It panics if k is negative. If k is bigger than the set size, it returns all the items.

func (*Ordered[T]) Partition added in v0.2.0

func (s1 *Ordered[T]) Partition(s2 *Ordered[T]) (d12, inter, d21 *Ordered[T])

Partition returns three New sets: - d12: elements in s1 not in s2 - inter: elements in both sets - d21: elements in s2 not in s1 O(N+M) complexity.

func (*Ordered[T]) Remove added in v0.2.0

func (s *Ordered[T]) Remove(e T) bool

Remove an element if present, and returns whether is was removed (true), or was never present (false).

func (*Ordered[T]) RemoveBefore added in v0.3.0

func (s *Ordered[T]) RemoveBefore(max T) int

RemoveBefore removes all elements e such that e < max. Returns num removed.

func (*Ordered[T]) RemoveBetween added in v0.3.0

func (s *Ordered[T]) RemoveBetween(min, max T) int

RemoveBetween removes all elements e such that min <= e < max. Returns num removed.

func (*Ordered[T]) RemoveFrom added in v0.3.0

func (s *Ordered[T]) RemoveFrom(min T) int

RemoveFrom removed all elements e such that e >= min. Returns num removed.

func (*Ordered[T]) Size added in v0.2.0

func (s *Ordered[T]) Size() int

Size returns the number of elements in the set.

func (*Ordered[T]) SymmetricDifference added in v0.2.0

func (s *Ordered[T]) SymmetricDifference(other *Ordered[T]) *Ordered[T]

SymmetricDifference returns a New set with all elements which are in either this set or the other set but not in both. O(N+M) complexity.

func (*Ordered[T]) Union added in v0.2.0

func (s *Ordered[T]) Union(other *Ordered[T]) *Ordered[T]

Union returns a New set with all elements in both sets. O(N+M) complexity.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL