
Search and LINQ

Tom Laird-McConnell edited this page May 11, 2026 · 2 revisions

LottaDB stores two representations of every object: one in table storage (the source of truth) and one in a Lucene index (for fast search). GetManyAsync() queries table storage and Search() queries the Lucene index.

Search()

Full-text search against the Lucene index. Returns an IQueryable<T> that supports most LINQ operators (see Supported LINQ operators below). Each returned object is annotated with its ETag and key via Object Metadata.

// Free-text query
var freeText = db.Search<Note>("lucene").ToList();

// LINQ predicate
var byPredicate = db.Search<Note>(n => n.Content.Contains("lucene")).ToList();

// Lucene query syntax with field qualifiers
var byFields = db.Search<Note>("AuthorId:alice AND Content:lucene").ToList();

GetManyAsync (Table Storage)

Filters on [Queryable] properties are executed by table storage server-side.

// All actors
var all = db.GetManyAsync<Actor>().ToList();

// Server-side filter (AuthorId is [Queryable])
var aliceNotes = db.GetManyAsync<Note>(n => n.AuthorId == "alice")
    .ToList();

// Polymorphic query -- returns Person and Employee
var people = db.GetManyAsync<Person>().ToList();

Search (Lucene)

Filters on [Queryable] properties are executed against the Lucene index, supporting full-text search.

// Full-text search
var matches = db.Search<Note>()
    .Where(n => n.Content.Contains("lucene"))
    .ToList();

// Exact match on a NotAnalyzed field
var aliceNotes = db.Search<Note>()
    .Where(n => n.AuthorId == "alice")
    .ToList();

// Free-text query
var freeText = db.Search<Note>("foo bar").ToList();

// Lucene query syntax; the unqualified term "bar" searches the default field
var parsed = db.Search<Note>("Title:foo AND bar").ToList();
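Why the exact-match example relies on a NotAnalyzed field: an analyzed field is split into (typically lowercased) tokens at index time, while a term query is not analyzed, so an equality match on the raw value finds nothing. A NotAnalyzed field is indexed as one term and matches exactly. The sketch below is a toy model in plain C#, not Lucene or LottaDB: `Analyze` is a hypothetical stand-in for an analyzer that only lowercases and splits on non-alphanumeric characters, and `IndexTerms`/`TermMatches` are illustrative helpers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy analyzer (hypothetical; real Lucene analyzers do far more):
// lowercase the text and split it on every non-alphanumeric character.
static string[] Analyze(string text) =>
    new string(text.Select(c => char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ').ToArray())
        .Split(' ', StringSplitOptions.RemoveEmptyEntries);

// The terms a field value contributes to the index: several tokens if analyzed,
// or the untouched value as a single term if NotAnalyzed.
static HashSet<string> IndexTerms(string value, bool analyze) =>
    analyze ? new HashSet<string>(Analyze(value)) : new HashSet<string> { value };

// A term query is not analyzed, so it matches only an identical indexed term.
static bool TermMatches(HashSet<string> indexedTerms, string term) => indexedTerms.Contains(term);

var analyzedTerms = IndexTerms("Alice-Smith", analyze: true); // {"alice", "smith"}
var rawTerms = IndexTerms("Alice-Smith", analyze: false);     // {"Alice-Smith"}

Console.WriteLine(TermMatches(analyzedTerms, "Alice-Smith")); // False: the value was split and lowercased
Console.WriteLine(TermMatches(rawTerms, "Alice-Smith"));      // True: stored as one exact term
Console.WriteLine(TermMatches(analyzedTerms, "alice"));       // True: matches a single token
```

This is why identifiers such as AuthorId are good candidates for NotAnalyzed fields, while prose such as Content benefits from analysis.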

Supported LINQ operators

Search<T>() returns an IQueryable<T> that translates standard LINQ operators into Lucene queries. Most LINQ operators are supported; any that are not throw an exception with a clear message when the query is translated.

| LINQ | Lucene equivalent |
| --- | --- |
| `Where(d => d.Field == value)` | `TermQuery` |
| `Where(d => d.Field != value)` | Boolean `MUST_NOT` |
| `Where(d => d.Field.StartsWith("foo"))` | `PrefixQuery` |
| `Where(d => d.Field.EndsWith("foo"))` | `WildcardQuery` (`*foo`) |
| `Where(d => d.Field.Contains("foo"))` | `WildcardQuery` (`*foo*`) |
| `Where(d => d.Numeric > 5)`, etc. | `NumericRangeQuery` |
| `Where(d => d.Field == null)` | Negated existence query |
| `&&`, `\|\|`, `!` | Boolean `MUST` / `SHOULD` / `MUST_NOT` |
| `OrderBy`, `OrderByDescending`, `ThenBy`, `ThenByDescending` | Multi-field `Sort` |
| `Skip(n).Take(m)` | `IndexSearcher.Search(query, n + m)` window |
| `First` / `FirstOrDefault` / `Single` / `SingleOrDefault` | `Take(1)` |
| `Any()` / `Any(predicate)` | `TotalHits > 0` |
| `Count()` / `LongCount()` | `TotalHits` |
| `Min` / `Max` | Sort ascending/descending + `Take(1)` |
| `Where(d => d.Field.Query("text*"))` | Parsed Lucene query on a specific field |
| `Where(d => d.Query("text*"))` | Parsed Lucene query on the default search property |
| `Where(d => d.Field.Similar("text"))` | `VectorQuery` on the field (KNN or cosine similarity) |
| `Where(d => d.Similar("text"))` | `VectorQuery` on the default search property |
| `Select(d => new { ... })` | Document projection (reads only the fields you reference) |
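To give a feel for how a LINQ provider arrives at mappings like these, here is a heavily simplified sketch that walks a predicate's expression tree and emits Lucene query-parser text. It is not LottaDB's translator: it handles only `==`, `!=`, `StartsWith`, `&&` and `||`, it produces query text rather than building `TermQuery`/`PrefixQuery`/`BooleanQuery` objects, and `Filter`, `ToLucene`, `Field`, `Value` and `Escape` are hypothetical helpers. An anonymous type stands in for a document class.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: lets the compiler infer an anonymous "document" type
// so the predicate is captured as a typed expression tree.
static Expression<Func<T, bool>> Filter<T>(T shape, Expression<Func<T, bool>> predicate) => predicate;

// Walks the tree and emits Lucene query-parser text for a few node types only.
static string ToLucene(Expression e) => e switch
{
    LambdaExpression l => ToLucene(l.Body),
    // && : both clauses MUST match
    BinaryExpression { NodeType: ExpressionType.AndAlso } b => $"(+{ToLucene(b.Left)} +{ToLucene(b.Right)})",
    // || : SHOULD clauses, at least one must match
    BinaryExpression { NodeType: ExpressionType.OrElse } b => $"({ToLucene(b.Left)} {ToLucene(b.Right)})",
    // == : a single term
    BinaryExpression { NodeType: ExpressionType.Equal } b => $"{Field(b.Left)}:{Value(b.Right)}",
    // != : MUST_NOT; a purely negative query matches nothing, so pair it with *:*
    BinaryExpression { NodeType: ExpressionType.NotEqual } b => $"(*:* -{Field(b.Left)}:{Value(b.Right)})",
    // StartsWith : prefix query
    MethodCallExpression { Method: { Name: "StartsWith" } } m => $"{Field(m.Object)}:{Value(m.Arguments[0])}*",
    _ => throw new NotSupportedException($"This sketch cannot translate: {e}")
};

// Only direct properties of the lambda parameter count as document fields.
static string Field(Expression e) => e is MemberExpression { Expression: ParameterExpression } m
    ? m.Member.Name
    : throw new NotSupportedException($"Expected a document property, got: {e}");

// Evaluates constants and captured variables, then escapes query-parser syntax characters.
static string Value(Expression e)
{
    var value = Expression.Lambda(e).Compile().DynamicInvoke();
    if (value is null) throw new NotSupportedException("null comparisons are not handled in this sketch");
    return Escape(value.ToString());
}

static string Escape(string s) =>
    string.Concat(s.Select(c => "\\+-!(){}[]^\"~*?:/&|".Contains(c) ? "\\" + c : c.ToString()));

var shape = new { AuthorId = "", Title = "" };
var author = "alice"; // captured variable, evaluated at translation time
var query = ToLucene(Filter(shape, d => d.AuthorId == author && (d.Title.StartsWith("foo") || d.Title != "draft")));

Console.WriteLine(query); // (+AuthorId:alice +(Title:foo* (*:* -Title:draft)))
```

Anything outside the handled node types falls through to the `NotSupportedException` arm, which mirrors the "throw at translation time" behavior described above. Producing query objects directly, as the table implies, avoids round-tripping through the query parser and its escaping rules; text is used here only because it is easy to read.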

Collection Contains ("IN" queries)

The LINQ collection.Contains(field) pattern translates to an efficient TermsFilter -- a single-pass filter that matches documents whose field value appears in the collection. This is the Lucene equivalent of SQL's IN operator.

var allowedCategories = new[] { "tech", "science", "health" };

var articles = db.Search<Article>()
    .Where(a => allowedCategories.Contains(a.Category))
    .ToList();

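This produces ConstantScoreQuery(TermsFilter([Category:tech, Category:science, Category:health])). Chaining || equality checks instead would build a BooleanQuery with one SHOULD clause per value, which is much less efficient for large collections and can hit BooleanQuery's maximum clause count (1,024 by default). Contains works with arrays, lists, and any IEnumerable<T>, including captured variables.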

You can combine it with other predicates:

var results = db.Search<Article>()
    .Where(a => allowedCategories.Contains(a.Category) && a.WordCount > 500)
    .ToList();

An empty collection matches nothing (returns zero results).
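The matching semantics of the IN pattern can be modeled in memory: turn the allowed values into a set once, then test each document's field value against it. This is a plain LINQ-to-objects sketch with hypothetical sample data and a hypothetical `WhereIn` helper; it does not call Lucene and only mirrors the behavior described above, including the empty-collection case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample documents standing in for indexed Article entries.
var articles = new List<(string Title, string Category, int WordCount)>
{
    ("Qubits explained", "science", 1200),
    ("New GPU launch", "tech", 300),
    ("Sourdough basics", "food", 800),
    ("Sleep and memory", "health", 950),
};

// Models "allowed.Contains(doc.Key)": the allowed values become a hash set once,
// so each document costs a single lookup instead of one == test per value.
// An empty `allowed` produces an empty set, which matches nothing.
static List<T> WhereIn<T>(IEnumerable<T> docs, Func<T, string> key, IEnumerable<string> allowed)
{
    var terms = new HashSet<string>(allowed);
    return docs.Where(d => terms.Contains(key(d))).ToList();
}

var allowedCategories = new[] { "tech", "science", "health" };

var inCategories = WhereIn(articles, a => a.Category, allowedCategories);
var longReads = inCategories.Where(a => a.WordCount > 500).ToList(); // combined with another predicate
var none = WhereIn(articles, a => a.Category, Array.Empty<string>());

Console.WriteLine(string.Join(", ", inCategories.Select(a => a.Title))); // Qubits explained, New GPU launch, Sleep and memory
Console.WriteLine(string.Join(", ", longReads.Select(a => a.Title)));    // Qubits explained, Sleep and memory
Console.WriteLine(none.Count);                                           // 0
```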

Joins

LINQ join syntax works across document types. The library materializes both sides via separate Lucene searches and joins them in memory. A semi-join optimization uses TermsFilter to push the outer key values into the inner query, so only matching inner documents are fetched.
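A rough LINQ-to-objects model of that strategy: materialize the outer side first, collect its distinct join keys, use them to restrict the inner side (the role TermsFilter plays in the real query), then hash-join the two lists in memory. The in-memory lists stand in for the two Lucene searches; the sample data and the `SemiJoin` helper are hypothetical, and this is not LottaDB's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the results of the two Lucene searches (hypothetical sample data).
var notes = new List<(string Content, string AuthorId)>
{
    ("Lucene tips", "alice"),
    ("Table storage notes", "bob"),
    ("More Lucene", "alice"),
};
var actors = new List<(string Username, string DisplayName)>
{
    ("alice", "Alice A."),
    ("bob", "Bob B."),
    ("carol", "Carol C."), // no note references carol, so the semi-join never fetches her
};

static List<TResult> SemiJoin<TOuter, TInner, TResult>(
    IEnumerable<TOuter> outer, IEnumerable<TInner> inner,
    Func<TOuter, string> outerKey, Func<TInner, string> innerKey,
    Func<TOuter, TInner, TResult> result)
{
    // 1. Materialize the outer side (outer Where clauses would already be applied here).
    var outerList = outer.ToList();
    // 2. Collect its distinct keys; in the real query these feed a TermsFilter.
    var keys = new HashSet<string>(outerList.Select(outerKey));
    // 3. Keep only inner rows whose key appears on the outer side.
    var innerList = inner.Where(i => keys.Contains(innerKey(i))).ToList();
    // 4. In-memory hash join; Enumerable.Join builds a lookup over the inner side.
    return outerList.Join(innerList, outerKey, innerKey, result).ToList();
}

var rows = SemiJoin(notes, actors, n => n.AuthorId, a => a.Username,
    (n, a) => (n.Content, a.DisplayName));

// Prints: "Lucene tips by Alice A.", "Table storage notes by Bob B.", "More Lucene by Alice A."
foreach (var (content, displayName) in rows)
    Console.WriteLine($"{content} by {displayName}");
```

The payoff of step 3 grows with the size of the inner side: only rows that can possibly join are fetched, instead of the whole inner result set.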

// Single join
var results = (
    from note in db.Search<Note>()
    join actor in db.Search<Actor>() on note.AuthorId equals actor.Username
    select new { note.Content, actor.DisplayName }
).ToList();

Multiple joins chain naturally:

var results = (
    from article in db.Search<Article>()
    join author in db.Search<Actor>() on article.AuthorId equals author.Username
    join category in db.Search<Category>() on article.CategoryId equals category.Id
    select new { article.Title, author.DisplayName, category.Label }
).ToList();

Where clauses on the outer side are pushed into Lucene before the join:

var results = (
    from note in db.Search<Note>().Where(n => n.Content.Contains("lucene"))
    join actor in db.Search<Actor>() on note.AuthorId equals actor.Username
    select new { note.Content, actor.DisplayName }
).ToList();

Method syntax also works:

var results = db.Search<Note>()
    .Join(
        db.Search<Actor>(),
        note => note.AuthorId,
        actor => actor.Username,
        (note, actor) => new { note.Content, actor.DisplayName })
    .ToList();
