eightsetmap

package module
v0.0.0-...-b51f204 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 28, 2021 License: MIT Imports: 15 Imported by: 0

README

eightsetmap

a hyper-specialized data structure that supports a map from 64-bit integers onto sorted sets of 64-bit integers without being constrained to available memory.

It's like a map[uint64]map[uint64]struct{} but the maps are sorted, or like a map[uint64][]uint64 but the slice has no duplicates.

The data is stored on disk in a way that makes direct access reasonably efficient, and allows for standard set operations like union, intersection, and difference. Manipulation of the data is supported, but the data structure and operations are optimized primarly for heavy read access patterns, and essentially minimal writes. Essentially write once, then access a lot.

Simply keeping the metadata for 1 billion 64bit keys (and matching 64bit offsets to their sets) would take over 16Gb of memory (and that doesn't even include the sets themselves). So this framework uses a shift value to truncate keys for an intermediate lookup. This is a simple bitwise shift right on the key, so each increment cuts the key storage requirements approximately in half. When using a shift>0, the storage layer will do an initial lookup to see if a key in the right range is even present, and then do a fast lookup (at most reading 2^shift total key-offsets) on the disk for the actual key.

Documentation

Overview

Package eightsetmap contains a hyper-specialized map[uint64][]uint64 wrapper that implements an out-of-core storage method. So if you have a machine with 16G of RAM, you can still access 64G of data.

Index

Constants

View Source
const (
	// Magic header uint32 = 'j8sm' defines the file type.
	MAGIC uint32 = 0x6d73386a
)

Variables

View Source
var (
	// DefaultCapacity of sets within the backing file
	DefaultCapacity uint32 = 32

	// FillFactor is the fill cutoff to bump to the next capacity size when saving.
	// e.g. if FillFactor out of DefaultCapacity slots are used in the last bucket,
	// add more capacity.
	FillFactor uint32 = 24
)
View Source
var (
	// DefaultCacheSize is the number of keys to keep in a LRU cache for each map.
	DefaultCacheSize = 65535
)

Functions

func DefaultPacker

func DefaultPacker(key uint64, valsize uint32) (count int, extra interface{})

DefaultPacker tells the serialization code to leave some padding in the file so that minimal updates can be performed in-place.

func Difference

func Difference(m Map, k1, k2 uint64) []uint64

Difference returns the set of values for k1 after removing any values also found in k2's set.

func Intersect

func Intersect(m Map, k1, k2 uint64) []uint64

Intersect returns the set of values associated to both k1 and k2.

func MultiIntersect

func MultiIntersect(m Map, keys ...uint64) []uint64

MultiIntersect returns the set of values associated to all of the given keys. Note that if any keys are missing, or any pair has no intersection then the result is empty.

NB this implementation is not fully optimized yet

func MultiUnion

func MultiUnion(m Map, keys ...uint64) []uint64

MultiUnion returns the set of unique values associated to any of the given keys.

NB this implementation is not fully optimized yet

func TightPacker

func TightPacker(key uint64, valsize uint32) (count int, extra interface{})

TightPacker does not reserve any extra space in the disk storage format.

func Union

func Union(m Map, k1, k2 uint64) []uint64

Union returns the set of unique values associated to either k1 or k2.

Types

type Map

type Map interface {
	// Get returns a slice of values for the given key.
	Get(key uint64) ([]uint64, bool)

	// GetSet returns a set of values for the given key.
	GetSet(key uint64) (map[uint64]struct{}, bool)

	// GetWithExtra returns a slice of values for the given key, and calls the "extra" func
	// for any additional data stored within the lookup table.
	GetWithExtra(key uint64, extra func(n int, r io.Reader)) ([]uint64, bool)

	// EachKey calls eachFunc for every key in the map until a non-nil error is returned.
	EachKey(eachFunc func(uint64) error) error

	// GetSize gets the size of the set of values for the given key
	GetSize(key uint64) (uint32, bool)

	// GetCapacity gets the capacity reserved for the set of values for the given key
	GetCapacity(key uint64) (uint32, bool)
}

func MMap

func MMap(mp Map) Map

func New

func New(filename string) Map

New returns a new Map backed by the (possibly empty) data in filename.

func NewShifted

func NewShifted(filename string, shift uint64) Map

NewShifted returns a Map with shifting enabled to reduce core memory usage. A shift is a power of 2 factor, so shift=1 means that memory usage is approximately cut in half, but that lookups will take additional disk seeks.

type MutableKey

type MutableKey struct {
	*MutableMap
	// contains filtered or unexported fields
}

MutableKey represents a key that is open for writing changes to the set of values. Once open, the calling code must call Sync to add changes to the MutableMap's buffered changes.

func (*MutableKey) Clear

func (k *MutableKey) Clear()

Clear empties the set of values for the key.

func (*MutableKey) Discard

func (k *MutableKey) Discard()

Discard frees up internal references to this key to release memory.

func (*MutableKey) Put

func (k *MutableKey) Put(val uint64)

Put adds a value to the key's set.

func (*MutableKey) PutSet

func (k *MutableKey) PutSet(vals map[uint64]struct{})

PutSet adds a set of values to the key's set.

func (*MutableKey) PutSlice

func (k *MutableKey) PutSlice(vals []uint64)

PutSlice adds a slice of values to the key's set.

func (*MutableKey) Remove

func (k *MutableKey) Remove(val uint64)

Remove a value from the key's set.

func (*MutableKey) RemoveSet

func (k *MutableKey) RemoveSet(vals map[uint64]struct{})

RemoveSet removes a set of values from the key's set.

func (*MutableKey) RemoveSlice

func (k *MutableKey) RemoveSlice(vals []uint64)

RemoveSlice removes a slice of values from the key's set.

func (*MutableKey) Sync

func (k *MutableKey) Sync()

Sync prepares the key's new data for writing to disk by copying updates to the linked MutableMap. MutableKey may continue to be used but will not reflect other changes outside of it own scope (since OpenKey).

type MutableMap

type MutableMap struct {
	Map *stdMap
	// contains filtered or unexported fields
}

MutableMap represents a Map that can be written to.

func Mutate

func Mutate(m Map, autosync bool) *MutableMap

Mutate creates a mutable reference to the map. To write any changes to disk, you must call Commit first. Mutated keys will not be visible to the parent Map until after a reload.

If autosync is true, then mutated keys are automatically Sync()ed when Commit is called. If false, then you must Sync() mutated keys manually to pull them into a Commit.

func (*MutableMap) Commit

func (m *MutableMap) Commit(packed bool) error

Commit writes the changed entries to disk. If packed is true, then no empty room is left for later expansion. The MutableMap can be immediately reused after a successful commit.

If packed is false, then a much faster in-place commit is possible (using the additional space reserved from the previous un-packed commit). If an in-place commit is not possible then a standard full commit will be used.

Note if autosync is enabled and there are no changes, nothing will be done.

func (*MutableMap) CommitWithPacker

func (m *MutableMap) CommitWithPacker(packer PackerFunc) error

CommitWithPacker allows the usage of custom data embedded into the lookup table. Maps saved using custom packers should not be modified unless you know what you are doing.

Note if autosync is enabled and there are no changes, nothing will be done.

func (*MutableMap) Get

func (m *MutableMap) Get(key uint64) ([]uint64, bool)

Get returns a slice of values for the given key. If there is a newly written, uncommitted key then it will be returned.

func (*MutableMap) GetSet

func (m *MutableMap) GetSet(key uint64) (map[uint64]struct{}, bool)

GetSet returns a set of values for the given key. If there is a newly written, uncommitted key then it will be returned.

func (*MutableMap) OpenKey

func (m *MutableMap) OpenKey(key uint64) *MutableKey

OpenKey prepares a key for writing. You must call Sync to mark data for later commit to disk.

func (*MutableMap) SetOutputFilename

func (m *MutableMap) SetOutputFilename(fn string)

SetOutputFilename tells this MutableMap to commit to a different filename. Use the empty string "" to save to the default filename (the source Map's filename).

type PackerFunc

type PackerFunc func(key uint64, valsize uint32) (count int, extra interface{})

PackerFunc is a function that tells the serialization code how to pack additional data into the file. Additional data MUST by 8-byte aligned, and returned in the 'extra' return value. The count must be the number of 8-byte chunks found.

Note that extra must be a data type serializable by `encoding/binary`, but it does not necessarily need to be 64bit types.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL