Releases: drittich/SemanticSlicer

v1.8.0

02 Feb 19:01

New public APIs added:

  • SemanticSlicer.Slicer.SplitDocumentChunksRaw() - Split content directly without preprocessing
    • Treats content exactly as provided (no normalization/HTML stripping/whitespace collapsing/trimming)
    • Offsets are relative to the exact input string
    • Supports overlap, metadata, and chunk headers
  • SemanticSlicer.TextUtilities - Public preprocessing utilities
    • NormalizeLineEndings(string) - Converts all line endings to Unix-style
    • CollapseWhitespace(string) - Limits consecutive spaces/newlines to max 2
  • SemanticSlicer.Slicer.CountTokens() - Count tokens using configured encoder
  • SemanticSlicer.Slicer.PrepareContentForChunking() - Get preprocessed content separately
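
The two preprocessing utilities listed above can be illustrated with a short sketch. This is written in Python for brevity and is not the library's C# implementation. The function names mirror the release notes, but the exact rules are assumptions: in particular, the sketch assumes runs of spaces and runs of newlines are each capped at 2 independently.

```python
import re


def normalize_line_endings(text: str) -> str:
    """Convert Windows (\\r\\n) and bare \\r line endings to Unix-style \\n."""
    # Replace \r\n first so a Windows ending does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def collapse_whitespace(text: str, max_run: int = 2) -> str:
    """Cap runs of spaces and runs of newlines at max_run characters.

    Assumption: spaces and newlines are handled as separate runs; the
    library may treat tabs or mixed whitespace differently.
    """
    text = re.sub(r" {%d,}" % (max_run + 1), " " * max_run, text)
    text = re.sub(r"\n{%d,}" % (max_run + 1), "\n" * max_run, text)
    return text


if __name__ == "__main__":
    raw = "Title\r\n\r\n\r\n\r\nBody    text"
    print(repr(collapse_whitespace(normalize_line_endings(raw))))
```

Running both steps in this order matters: line endings are normalized first so that the newline cap sees a single kind of line break.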

v1.7.0

10 Dec 20:22

  • Add support for custom tokenizers

v1.6.0

05 Dec 20:29

  • Adds optional chunk overlap parameter to improve context retention between chunks
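
As a rough illustration of what chunk overlap means, the sketch below splits a token list into fixed-size windows where each window repeats the last few tokens of the previous one. It is a generic sliding-window example in Python, not the library's algorithm or API; the function name `chunk_with_overlap` and its parameters are hypothetical.

```python
from typing import List, Sequence


def chunk_with_overlap(tokens: Sequence[str], max_tokens: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of max_tokens that share `overlap` tokens.

    Hypothetical helper for illustration only; the library's real
    splitting is separator-aware and may compute overlap differently.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    step = max_tokens - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        # Stop once a window has reached the end of the input.
        if start + max_tokens >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_overlap(list("abcdefghij"), max_tokens=4, overlap=1):
        print("".join(chunk))
```

With an overlap of 1, the last token of each chunk reappears as the first token of the next, which is how overlap carries context across chunk boundaries.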

v1.5.0

01 Dec 17:31

  • The library can still be used directly as a NuGet package
  • Can now also be run as a command-line executable/daemon, service, or REST API
  • Supports Mac/Linux/Windows
  • See the README.md for details

v1.4.2

19 Jun 12:52

  • Performance improvements

v1.4.0

09 Nov 17:25

  • Update Tiktoken to v2

v1.3.4

09 Nov 15:25

  • Update README