go-sentex
Pure Go sentence embeddings. Zero CGo, zero external system dependencies,
one go get.
go-sentex wraps the sentence-transformers/all-MiniLM-L6-v2 transformer
behind a tiny API: give it a string, get back a 384-dimensional unit-norm
vector suitable for semantic search, RAG, clustering, or deduplication.
Why it exists
Every other sentence-embedding option in the Go ecosystem requires CGo
(ONNX Runtime, fastembed-go, all-minilm-l6-v2-go), a Python sidecar,
or settles for word-vector approximations. go-sentex fills the gap: a
single dependency that builds with CGO_ENABLED=0, cross-compiles
cleanly, and needs no C toolchain on the host.
Install
go get github.com/edgetools/go-sentex
No system libraries. No apt install. Builds with CGO_ENABLED=0.
Quick start
    package main

    import (
    	"fmt"
    	"log"

    	"github.com/edgetools/go-sentex"
    )

    func main() {
    	// Load the model, downloading it first if the HF cache is cold.
    	model, err := sentex.LoadModel()
    	if err != nil {
    		log.Fatal(err)
    	}

    	// Embed a single string into one 384-dimensional vector.
    	vec, err := model.Embed("deployment strategy")
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println(len(vec), "dims, first value:", vec[0])

    	// Embed several strings at once; results keep the input order.
    	vecs, err := model.EmbedBatch([]string{"text one", "text two"})
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println(len(vecs), "vectors of", model.Dimensions(), "dims")
    }
LoadModel loads model.onnx (~86MB) and tokenizer.json (~700KB) from
the local HuggingFace Hub cache. If the files are already there — for
example because you've used this model before from Python's
sentence-transformers — no download happens. Otherwise they are fetched
from HuggingFace Hub once and reused on every subsequent call.
API
- sentex.LoadModel() (*Model, error): loads the model, downloading it first if needed.
- model.Embed(text string) ([]float32, error): returns one vector of length 384.
- model.EmbedBatch(texts []string) ([][]float32, error): returns one vector per
  input, preserving order. Empty strings yield all-zero vectors.
- model.Dimensions() int: always returns 384.
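Because empty strings come back as all-zero vectors, callers that index or compare embeddings may want to skip those entries first. The following is a minimal sketch of one way to do that; `isZero` and `nonEmpty` are hypothetical helpers, not part of go-sentex, and the short hand-written vectors stand in for real `EmbedBatch` output.

```go
package main

import "fmt"

// isZero reports whether every component of v is exactly zero.
func isZero(v []float32) bool {
	for _, x := range v {
		if x != 0 {
			return false
		}
	}
	return true
}

// nonEmpty returns the indices of the vectors that are not all-zero.
// (Hypothetical helper for illustration; not part of go-sentex.)
func nonEmpty(vecs [][]float32) []int {
	var keep []int
	for i, v := range vecs {
		if !isZero(v) {
			keep = append(keep, i)
		}
	}
	return keep
}

func main() {
	// Stand-in for EmbedBatch output where the second input was "".
	vecs := [][]float32{{0.6, 0.8}, {0, 0}, {1, 0}}
	fmt.Println(nonEmpty(vecs))
}
```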
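Output vectors are L2-normalized, so cosine similarity reduces to a dot
product. The exception is the all-zero vector returned for an empty
string, whose dot product with any other vector is 0.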
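As an illustration of the dot-product shortcut, the sketch below ranks a few vectors against a query by similarity. `dot` and `rank` are hypothetical helpers written for this example, not part of go-sentex, and the 2-D vectors stand in for real 384-dimensional embeddings.

```go
package main

import (
	"fmt"
	"sort"
)

// dot returns the dot product of two equal-length vectors. For unit-norm
// inputs this equals their cosine similarity.
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// rank returns the indices of docs ordered from most to least similar
// to query. (Hypothetical helper for illustration.)
func rank(query []float32, docs [][]float32) []int {
	order := make([]int, len(docs))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(i, j int) bool {
		return dot(query, docs[order[i]]) > dot(query, docs[order[j]])
	})
	return order
}

func main() {
	// Hand-written 2-D unit vectors stand in for real embeddings.
	query := []float32{1, 0}
	docs := [][]float32{
		{0, 1},     // orthogonal to the query
		{0.6, 0.8}, // partially aligned
		{1, 0},     // same direction as the query
	}
	fmt.Println(rank(query, docs)) // prints [2 1 0]
}
```

A brute-force scan like this is usually enough for a few thousand vectors. Larger collections would typically need an index, which is outside the scope of this sketch.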
Model
| Property | Value |
|---|---|
| Model | sentence-transformers/all-MiniLM-L6-v2 |
| Format | ONNX (full precision) |
| Output | 384-dimensional []float32, unit norm |
| Max input | 256 tokens (longer inputs truncated) |
| Download size | ~87MB, only if not already in the HF cache |
Cache location
Model files are stored in the standard HuggingFace Hub cache layout, so
if you've already pulled this model with Python's sentence-transformers
or huggingface_hub, go-sentex picks it up with zero download.
HF_HOME is respected if set. Otherwise the base directory comes from
os.UserCacheDir:
| OS | Default path |
|---|---|
| Linux | ~/.cache/huggingface |
| macOS | ~/Library/Caches/huggingface |
| Windows | %LocalAppData%\huggingface |
Model files land under
<cache>/hub/models--sentence-transformers--all-MiniLM-L6-v2/.
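The lookup described above can be sketched with the standard library alone. This is an approximation of the rule as stated in this section, not the library's actual code; `cacheBase` and `hubRepoDir` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// hubRepoDir builds the Hub cache directory for a model repo ID such as
// "sentence-transformers/all-MiniLM-L6-v2" under the given base directory.
// (Hypothetical helper; mirrors the layout described in this section.)
func hubRepoDir(base, repoID string) string {
	folder := "models--" + strings.ReplaceAll(repoID, "/", "--")
	return filepath.Join(base, "hub", folder)
}

// cacheBase returns HF_HOME if it is set, and otherwise
// <os.UserCacheDir()>/huggingface.
func cacheBase() (string, error) {
	if home := os.Getenv("HF_HOME"); home != "" {
		return home, nil
	}
	dir, err := os.UserCacheDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(dir, "huggingface"), nil
}

func main() {
	base, err := cacheBase()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate cache:", err)
		os.Exit(1)
	}
	fmt.Println(hubRepoDir(base, "sentence-transformers/all-MiniLM-L6-v2"))
}
```

Running this prints the directory to inspect when checking whether the model is already cached, or to pre-populate for offline use.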
Requirements
- Go 1.25 or newer (see go.mod).
- Network access the first time the model is fetched into the HF cache.
  No network is needed once the cache is populated (including when
  another tool like Python's sentence-transformers populated it).
- Works with CGO_ENABLED=0.
Limitations
- Inputs longer than 256 tokens are truncated by the tokenizer.
- The SimpleGo inference backend is ~5× slower than XLA, which is
irrelevant for typical query-scale embedding but worth knowing if you
need to embed millions of documents in-process.
- If the HF cache is cold, the first LoadModel call downloads ~87MB
  over the network before returning.
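For documents that exceed the 256-token limit, a common workaround (not something go-sentex does itself) is to split the text into chunks, embed each chunk, and combine the results into one unit vector. The sketch below shows the splitting and combining steps only. `chunkWords` and `meanUnit` are hypothetical helpers, and whitespace-separated words are only a rough proxy for tokens, so the chunk size should leave a comfortable margin below 256.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// chunkWords splits text into pieces of at most n whitespace-separated
// words. It returns nil when n is not positive or text has no words.
func chunkWords(text string, n int) []string {
	if n <= 0 {
		return nil
	}
	words := strings.Fields(text)
	var chunks []string
	for start := 0; start < len(words); start += n {
		end := start + n
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
	}
	return chunks
}

// meanUnit sums equal-length vectors and rescales the result to unit
// norm, which gives the same direction as their mean. It returns nil
// for no input and leaves an all-zero sum unscaled.
func meanUnit(vecs [][]float32) []float32 {
	if len(vecs) == 0 {
		return nil
	}
	out := make([]float32, len(vecs[0]))
	for _, v := range vecs {
		for i, x := range v {
			out[i] += x
		}
	}
	var sq float64
	for _, x := range out {
		sq += float64(x) * float64(x)
	}
	if sq == 0 {
		return out
	}
	norm := float32(math.Sqrt(sq))
	for i := range out {
		out[i] /= norm
	}
	return out
}

func main() {
	fmt.Printf("%q\n", chunkWords("a b c d e", 2))
	// Hand-written vectors stand in for the embeddings of each chunk.
	fmt.Println(meanUnit([][]float32{{1, 0}, {0, 1}}))
}
```

In real use the chunks would be passed to model.EmbedBatch and the returned vectors to `meanUnit`. Averaging discards ordering between chunks, so for retrieval it may work better to index each chunk vector separately.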
Architecture
See DESIGN.md for the inference pipeline, concurrency model,
cache layout, and choice of underlying libraries.
License
MIT. See LICENSE.