bufarrow

package module
v0.5.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 4, 2025 License: Apache-2.0 Imports: 23 Imported by: 2

README

bufarrow 🦬

Go Reference

Go library to build Apache Arrow records from Protocol Buffers

Features

  • generate Arrow and Parquet schemas from Protobuf structs
  • augment Protobuf data with custom fields
  • build Arrow records from Protobuf
  • build Arrow records with normalized flattened Protobuf

🚀 Install

Using bufarrow is easy. First, use go get to install the latest version of the library.

go get -u github.com/loicalleyne/bufarrow@latest

💡 Usage

You can import bufarrow using:

import "github.com/loicalleyne/bufarrow"

💫 Show your support

Give a ⭐️ if this project helped you! Feedback and PRs welcome.

License

Bufarrow is released under the Apache 2.0 license. See LICENCE.txt

Documentation

Index

Constants

This section is empty.

Variables

View Source
var ErrMxDepth = errors.New("max depth reached, either the message is deeply nested or a circular dependency was introduced")
View Source
var ErrPathNotFound = errors.New("path not found")

Functions

This section is empty.

Types

type Cardinality added in v0.5.1

type Cardinality protoreflect.Cardinality

Cardinality determines whether a field is optional, required, or repeated.

const (
	Optional Cardinality = 1 // appears zero or one times
	Required Cardinality = 2 // appears exactly one time; invalid with Proto3
	Repeated Cardinality = 3 // appears zero or more times
)

Constants as defined by the google.protobuf.Cardinality enumeration.

func (*Cardinality) Get added in v0.5.1

type CustomField added in v0.5.1

type CustomField struct {
	// Name must not conflict with existing proto.Message field names.
	Name string `toml:"name"`
	// Supported types:
	// 		BOOL	bool
	// 		BYTES	[]byte
	//		STRING	string
	//		INT64	int64
	//		FLOAT64	float64
	Type FieldType `toml:"type"`
	// FieldCardinality is a type alias of protoreflect.Cardinality.
	// Cardinality determines whether a field is optional, required, or repeated.
	// const (
	// 	Optional Cardinality = 1 // appears zero or one times
	// 	Required Cardinality = 2 // appears exactly one time; invalid with Proto3
	// 	Repeated Cardinality = 3 // appears zero or more times
	// )
	// Constants as defined by the google.protobuf.Cardinality enumeration.
	FieldCardinality Cardinality `toml:"field_cardinality"`
	// IsPacked reports whether repeated primitive numeric kinds should be
	// serialized using a packed encoding.
	// If true, then it implies Cardinality is Repeated.
	IsPacked bool `toml:"is_packed"`
}

type FieldType added in v0.5.1

type FieldType fieldType
const (
	BOOL    FieldType = "bool"
	BYTES   FieldType = "[]byte"
	STRING  FieldType = "string"
	INT64   FieldType = "int64"
	FLOAT64 FieldType = "float64"
)

type Opt added in v0.5.0

type Opt struct {
	// contains filtered or unexported fields
}

type Option added in v0.5.0

type Option func(config)

func WithCustomFields added in v0.5.1

func WithCustomFields(c []CustomField) Option

WithCustomFields adds user-defined fields to the message schema which can be populated with AppendWithCustom().

func WithNormalizer added in v0.5.0

func WithNormalizer(fields, aliases []string, failOnRangeError bool) Option

WithNormalizer configures the scalars to add to a flat Arrow Record suitable for efficient aggregation. Fields should be specified by their path (field names separated by a period ie. 'field1.field2.field3'). The Arrow field types of the selected fields will be used to build the new schema. If coaslescing data between multiple fields of the same type, specify only one of the paths. List fields should have an index to retrieve specified, otherwise defaults to all elements; ranges are not yet implemented. Current functionality is limited to valitating the fields/aliases match in `New()“, and `NormalizerBuilder()` returning an `*arrow.RecordBuilder` to be used externally to append data, and NewNormalizerRecord() to get an `arrow.Record` from the normalizer RecordBuilder. Future development may include Append methods that accept protopath operations to normalize protobuf messages in-flight internally to the package. failOnRangeError indicates whether to fail on a list[start:end] where end > len(list). TODO

type Schema

type Schema[T proto.Message] struct {
	// contains filtered or unexported fields
}

func New

func New[T proto.Message](mem memory.Allocator, opts ...Option) (schema *Schema[T], err error)

New returns a new bufarrow.Schema. Options include WithNormalizer and WithCustomFields. WithNormalizer creates a separate Arrow record whilst WithCustomFields expands the schema of the proto.Message used as the type parameter.

func (*Schema[T]) Append

func (s *Schema[T]) Append(value T)

Append appends protobuf value to the schema builder. This method is not safe for concurrent use.

func (*Schema[T]) AppendWithCustom added in v0.5.1

func (s *Schema[T]) AppendWithCustom(value T, c ...any) error

AppendWithCustom appends protobuf value and custom field values to the schema builder. This method is not safe for concurrent use. The number of custom field values must match the number of custom fields. Supported types:

bool
[]byte
string
int64
float64

func (*Schema[T]) Clone added in v0.4.0

func (s *Schema[T]) Clone(mem memory.Allocator) (schema *Schema[T], err error)

Clone returns an identical bufarrow.Schema. Use in concurrency scenarios as Schema methods are not concurrency safe.

func (*Schema[T]) FieldNames added in v0.3.0

func (s *Schema[T]) FieldNames() []string

FieldNames returns top-level field names

func (*Schema[T]) NewNormalizerRecord added in v0.5.0

func (s *Schema[T]) NewNormalizerRecord() arrow.Record

NewNormalizerRecord returns buffered builder value as an arrow.Record. The builder is reset and can be reused to build new records.

func (*Schema[T]) NewRecord

func (s *Schema[T]) NewRecord() arrow.Record

NewRecord returns buffered builder value as an arrow.Record. The builder is reset and can be reused to build new records.

func (*Schema[T]) NormalizerBuilder added in v0.5.0

func (s *Schema[T]) NormalizerBuilder() *array.RecordBuilder

NormalizerBuilder returns the Normalizer's Arrow array.RecordBuilder, to be used to append normalized data.

func (*Schema[T]) Parquet

func (s *Schema[T]) Parquet() *schema.Schema

Parquet returns schema as parquet schema

func (*Schema[T]) Proto

func (s *Schema[T]) Proto(r arrow.Record, rows []int) []T

Proto decodes rows and returns them as proto messages.

func (*Schema[T]) ReadParquet

func (s *Schema[T]) ReadParquet(ctx context.Context, r parquet.ReaderAtSeeker, columns []int) (arrow.Record, error)

ReadParquet specified columns from parquet source r and returns an Arrow record. The returned record must be released by the caller.

func (*Schema[T]) Release

func (s *Schema[T]) Release()

Release releases the reference on the message builder

func (*Schema[T]) Schema

func (s *Schema[T]) Schema() *arrow.Schema

Schema returns schema as arrow schema

func (*Schema[T]) WriteParquet

func (s *Schema[T]) WriteParquet(w io.Writer) error

WriteParquet writes Parquet to an io.Writer

func (*Schema[T]) WriteParquetRecords

func (s *Schema[T]) WriteParquetRecords(w io.Writer, records ...arrow.Record) error

WriteParquetRecords write one or many Arrow records to parquet

Directories

Path Synopsis
gen

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL