lucene_parser

Version: v0.5.2
Published: May 25, 2024 License: MIT Imports: 6 Imported by: 3

Introduction

This package parses Lucene queries as used by ES (Elasticsearch). It is a pure Go package. Its lexer and grammar do not follow the standard Lucene parser, because the package is mainly intended for converting Lucene queries into other domain-specific languages (DSLs), such as LuceneToSQL or LuceneToEQL (used in ES). If you want to parse standard Lucene queries, use the standard sub-package in this repository.

Features

  • 1、supports phrase term queries, for instance x:"foo bar".
  • 2、supports regexp term queries, for instance x:/\d+\\.?\d+/.
  • 3、supports joining sub-queries with boolean operators (i.e. AND, OR, NOT, &&, ||, !), for instance x:1 AND y:2, x:1 || y:2. Lower-case boolean operators (i.e. and, or, not) are also supported.
  • 4、supports bound range queries, for instance x:[1 TO 2], x:[1 TO 2}.
  • 5、supports one-sided range queries, for instance x:>1, x:>=1, x:<1, x:<=1.
  • 6、supports the boost modifier, for instance x:1^2, x:"dsada 8908"^3.
  • 7、supports fuzzy queries with the default fuzziness or a specific fuzziness, for instance x:foo~1.0, x:foo~.
  • 8、supports proximity queries, for instance x:"foo bar"~2.
  • 9、supports term group queries, for instance x:(foo OR bar), x:(>1 && <2).
  • 10、supports applying the not operator to a single term (e.g. not x:y); this differs from the definition of NOT in standard Lucene syntax.
  • 11、supports omitting the AND operator when it is immediately followed by a NOT operator (i.e. you can write x:y and not x2:y2 as x:y not x2:y2).
  • 12、supports prefix operators ("+", "-", "!") ahead of a field term, for instance -foo:bar +foo1:bar1 foo2:bar2 !foo3:bar3.
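
The bracket styles in feature 4 follow the usual Lucene convention: square brackets include the endpoint and curly braces exclude it, so x:[1 TO 2} means 1 <= x < 2. The sketch below illustrates that convention with the standard library only. The Bound type and parseBound function are hypothetical names invented for this illustration; they are not part of this package, and the sketch handles only the simple "low TO high" shape.

```go
package main

import (
	"fmt"
	"strings"
)

// Bound is a hypothetical holder for a bound range; it is not a type from lucene_parser.
type Bound struct {
	Low, High   string
	IncludeLow  bool
	IncludeHigh bool
}

// parseBound splits a bound range such as "[1 TO 2}" into its parts.
// It only understands "<open><low> TO <high><close>" with unquoted values.
func parseBound(s string) (Bound, error) {
	s = strings.TrimSpace(s)
	if len(s) < 2 {
		return Bound{}, fmt.Errorf("range too short: %q", s)
	}
	open, end := s[0], s[len(s)-1]
	if (open != '[' && open != '{') || (end != ']' && end != '}') {
		return Bound{}, fmt.Errorf("missing range brackets: %q", s)
	}
	parts := strings.Fields(s[1 : len(s)-1])
	if len(parts) != 3 || parts[1] != "TO" {
		return Bound{}, fmt.Errorf("expected \"low TO high\": %q", s)
	}
	return Bound{
		Low:         parts[0],
		High:        parts[2],
		IncludeLow:  open == '[', // '[' includes the lower endpoint
		IncludeHigh: end == ']',  // '}' excludes the upper endpoint
	}, nil
}

func main() {
	b, err := parseBound("[1 TO 2}")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", b)
}
```

For real queries, use the parser itself; this only shows how the mixed bracket form in the feature list is meant to be read.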

Limitations

  • 1、only Lucene queries with a field name are supported; queries without a field name are not (i.e. this project can't parse queries like foo OR bar or foo AND bar, but can parse foo:bar and foo:(bar1 AND bar2)).
  • 2、prefix operators and boolean operators cannot be used together; that is, a query that contains both boolean operators (AND/OR/NOT/&&/||/!) and prefix operators (+/-) cannot be parsed.
  • 3、fuzziness expressed as a similarity (a float between 0 and 1) is not supported; only fuzziness expressed as a maximum edit distance is supported (i.e. the Levenshtein edit distance: the number of one-character changes needed to turn one string into another).
  • 4、treating a space as an OR operator is not supported (e.g. x1:y1 x2:y2). (I don't know how to handle an expression that includes both an or token and a space token (e.g. x y or z). If you have a good idea, please contact me.)

Note

  • 1、If no fuzziness is specified in a fuzzy query, the Fuzziness function of the term returns -1, which allows the user to customize the default fuzziness or the parameters of AUTO fuzziness. For example, when -1 is returned, you can specify the minimum and maximum term lengths of the AUTO parameter according to the fuzziness definition.

  • 2、According to the definition of fuzziness, a specific fuzziness must be an integer. If you input a float fuzziness, it is rounded to the nearest integer. For example, the query x:foo~1.2 gives fuzziness 1, and the query x:foo~1.6 gives fuzziness 2.

  • 3、If you input the boost symbol without a value, the Boost function of the term returns 1.0, for instance for the query foo:bar^.
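
The first two notes can be sketched as a small helper on the caller's side. The functions roundFuzziness and resolveFuzziness below are hypothetical and not part of this package. The thresholds assume the Elasticsearch AUTO default (AUTO:3,6), and math.Round is used only as a stand-in for the rounding described in note 2; how the package treats an exact .5 value is not stated here.

```go
package main

import (
	"fmt"
	"math"
)

// roundFuzziness mirrors the rounding from note 2 (1.2 -> 1, 1.6 -> 2).
// Assumption: round-to-nearest, as implemented by math.Round.
func roundFuzziness(f float64) int {
	return int(math.Round(f))
}

// resolveFuzziness is a hypothetical caller-side helper.
// parsed is the value reported for the term (-1 when the query gave no fuzziness);
// termLen is the length of the term in characters.
// When parsed is -1 it applies the Elasticsearch AUTO:3,6 rule.
func resolveFuzziness(parsed, termLen int) int {
	if parsed >= 0 {
		return parsed // an explicit fuzziness wins
	}
	switch {
	case termLen < 3:
		return 0 // very short terms must match exactly
	case termLen < 6:
		return 1 // one edit allowed
	default:
		return 2 // two edits allowed
	}
}

func main() {
	fmt.Println(roundFuzziness(1.2), roundFuzziness(1.6))
	fmt.Println(resolveFuzziness(-1, len("foo")))
}
```

Returning -1 rather than a fixed default leaves this choice to the caller, which is why the sketch takes the term length as a separate argument.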

Usage

basic lucene parser

package main

import (
    "fmt"
    "github.com/zhuliquan/lucene_parser"
)

func main() {
    if lucene, err := lucene_parser.ParseLucene("x:foo AND y:bar"); err != nil {
        panic(err)
    } else {
        fmt.Println(lucene)
    }
}

prefix operator lucene parser

You can also parse Lucene queries with prefix operators by using the prefix package, as below:

package main

import (
    "fmt"
    "github.com/zhuliquan/lucene_parser/prefix"
)

func main() {
    if lucene, err := prefix.ParseLucene("x:foo AND y:bar"); err != nil {
        panic(err)
    } else {
        fmt.Println(lucene)
    }
}

standard syntax lucene parser

You can also parse Lucene queries that follow the standard syntax by using the standard package, as below:

package main

import (
    "fmt"
    "github.com/zhuliquan/lucene_parser/standard"
)

func main() {
    if lucene, err := standard.ParseLucene("foo^10 bar AND yacc"); err != nil {
        panic(err)
    } else {
        fmt.Println(lucene)
    }
}

EBNF of Lucene

The parser converts a Lucene query string into an AST according to the following EBNF.

(* lucene expression *)
lucene = or_query, { or_sym_query } ;
or_sym_query  = or_symbol, or_query ;
or_query      = and_query, { and_sym_query } ;
and_sym_query =  ( and_symbol | whitespace, not_symbol ), and_query ;
and_query     = [ not_symbol ], ( '(', [ whitespace ], lucene, [ whitespace ], ')' | ( field, term) ) ;

(* field and term *)
field_char       = identifier | '-' | number | dot ;
field            = field_char, { field_char }, ':' ;
term = range_term | fuzzy_term | regexp_term | term_group ;

(* term group *)
term_group = '(', logic_term_group, ')', [ boost_modifier ] ;
logic_term_group   = or_term_group, { or_sym_term_group } ;
or_sym_term_group  = ( or_symbol | whitespace, not_symbol ), or_term_group ;
or_term_group      = and_term_group, { and_sym_term_group } ;
and_sym_term_group = and_symbol, and_term_group ;
and_term_group     = [ not_symbol ], ( '(', [whitespace] , logic_term_group, [whitespace], ')'  | group_elem );
group_elem = simple_term | phrase_term | single_range_term | double_range_term ;

(* compound term *)
range_term = ( double_range_term | single_range_term ), [ boost_modifier ] ;
fuzzy_term = ( simple_term | phrase_term ), [ fuzzy_modifier | boost_modifier ] ;

(* simple term *)
double_range_term = ('[' | '{' ), [whitespace], range_value, whitespace, 'TO', whitespace, range_value, [whitespace], ( ']' | '}' ) ;
single_range_term = [ ('>' | '<'), ['='] ], range_value ;
range_value       = phrase_term | (identifier | number | '.' | '+' | '-' | '|' | '/' | ':') { (identifier | '+' | '-' | dot) } | '*' ;
simple_term      = (identifier | number | '+' | '-'), { simple_term_char } ;
phrase_term      = quote, phrase_term_char, {phrase_term_char}, quote ;
regexp_term      = '/', regexp_term_char, { regexp_term_char }, '/' ;
phrase_term_char = ( -quote | '\\', quote ) ;
simple_term_char = identifier | number | dot | '?' | '*' | '-' | '+' | '|' | '/' ;
regexp_term_char = ( -'/' | '\\', '/') ;

(* bool operator *)
and_symbol = whitespace , (( '&', '&' ) | 'AND' | 'and' ), whitespace ;
or_symbol  = whitespace , (( '|', '|' ) | 'OR' | 'or' ), whitespace ;
not_symbol = ( '!' , [whitespace] ) | ( ( 'NOT' | 'not' ), whitespace ) ;

(* modifier *)
fuzzy_modifier = '~', [ float ] ;
boost_modifier = '^', [ float ] ;

(* basic element *)
identifier = ident_char , { ident_char } ;
number     = digit , { digit } ;
float      = digit , { digit }, [ dot, digit, { digit } ] ;
escape     = '-' | '+' | '!' | '&' | '|' | '?' | '*' | '\\' | '(' | ')' | '[' | ']' | '{' | '}' | '/' | '<' | '>' | '=' | '~' | '^'  | ':' ;
compare    = ('<' | '>'),[ '=' ] ;
ident_char = ( -( escape | digit | dot | whitespace_char | quote ) | '\\' , (escape | whitespace_char) ) ;
digit      = '0' ... '9' ;
whitespace = whitespace_char , { whitespace_char };
whitespace_char = '\t' | '\r' | '\f' | ' ' ;
quote      = '"' ;
eol        = '\n' ;
dot        = '.' ;
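
The first two rules of the grammar encode operator precedence: a lucene expression is a sequence of or_query parts joined by or symbols, and each or_query is a sequence of and_query parts joined by and symbols, so AND binds tighter than OR. The sketch below illustrates only that layering. The function groupByPrecedence is hypothetical and heavily simplified: it assumes clauses contain no whitespace, and it ignores parentheses, NOT, phrases, and validation, none of which the real parser ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// groupByPrecedence splits a flat query into OR-separated groups of AND-joined clauses,
// mirroring "lucene = or_query, { or_sym_query }" and
// "or_query = and_query, { and_sym_query }".
// Simplification: tokens are split on whitespace and nothing is validated.
func groupByPrecedence(query string) [][]string {
	var groups [][]string
	current := []string{}
	for _, tok := range strings.Fields(query) {
		switch tok {
		case "OR", "or", "||":
			// An or symbol closes the current group and starts a new one.
			groups = append(groups, current)
			current = []string{}
		case "AND", "and", "&&":
			// An and symbol keeps the next clause in the current group.
		default:
			current = append(current, tok)
		}
	}
	return append(groups, current)
}

func main() {
	fmt.Println(groupByPrecedence("x:1 AND y:2 OR z:3"))
}
```

For the input above the clauses x:1 and y:2 end up in one group and z:3 in another, which is the same shape as the Lucene / OrQuery nesting in the AST types.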

Documentation

Index

Constants

This section is empty.

Variables

var LuceneParser *participle.Parser

Functions

This section is empty.

Types

type AnSQuery

type AnSQuery struct {
	AndSymbol *op.AndSymbol `parser:"( @@ " json:"and_symbol"`
	NotSymbol *op.NotSymbol `parser:"| WHITESPACE+ @@)" json:"not_symbol"`
	AndQuery  *AndQuery     `parser:"@@" json:"and_query"`
}

AnSQuery (and symbol query): an AndQuery that is prefixed with an and symbol ('AND' / 'and' / '&&').

func (*AnSQuery) GetQueryType

func (q *AnSQuery) GetQueryType() QueryType

func (*AnSQuery) String

func (q *AnSQuery) String() string

type AndQuery

type AndQuery struct {
	NotSymbol  *op.NotSymbol `parser:"  @@?" json:"not_symbol"`
	ParenQuery *ParenQuery   `parser:"( @@ " json:"paren_query"`
	FieldQuery *FieldQuery   `parser:"| @@)" json:"field_query"`
}

AndQuery: consists of an optional not symbol followed by either a paren query or a field query.

func (*AndQuery) GetQueryType

func (q *AndQuery) GetQueryType() QueryType

func (*AndQuery) String

func (q *AndQuery) String() string

type FieldQuery

type FieldQuery struct {
	Field *tm.Field `parser:"@@ COLON" json:"field"`
	Term  *tm.Term  `parser:"@@" json:"term"`
}

FieldQuery: consists of a field and a term.

func (*FieldQuery) GetQueryType

func (q *FieldQuery) GetQueryType() QueryType

func (*FieldQuery) String

func (q *FieldQuery) String() string

type Lucene

type Lucene struct {
	OrQuery *OrQuery   `parser:"@@" json:"or_query"`
	OSQuery []*OSQuery `parser:"@@*" json:"or_sym_query"`
}

Lucene: consists of an or query followed by zero or more or symbol queries.

func ParseLucene

func ParseLucene(queryString string) (*Lucene, error)

ParseLucene: parses a query string into a Lucene struct.

func TermGroupToLucene added in v0.3.0

func TermGroupToLucene(field *term.Field, termGroup *term.TermGroup) *Lucene

func (*Lucene) GetQueryType

func (q *Lucene) GetQueryType() QueryType

func (*Lucene) String

func (q *Lucene) String() string

type OSQuery

type OSQuery struct {
	OrSymbol *op.OrSymbol `parser:"@@" json:"or_symbol"`
	OrQuery  *OrQuery     `parser:"@@" json:"or_query"`
}

OSQuery (or symbol query): an OrQuery that is prefixed with an or symbol.

func (*OSQuery) GetQueryType

func (q *OSQuery) GetQueryType() QueryType

func (*OSQuery) String

func (q *OSQuery) String() string

type OrQuery

type OrQuery struct {
	AndQuery *AndQuery   `parser:"@@" json:"and_query"`
	AnSQuery []*AnSQuery `parser:"@@*" json:"and_sym_query" `
}

OrQuery: consists of an and query followed by zero or more and symbol queries.

func (*OrQuery) GetQueryType

func (q *OrQuery) GetQueryType() QueryType

func (*OrQuery) String

func (q *OrQuery) String() string

type ParenQuery

type ParenQuery struct {
	SubQuery *Lucene `parser:"LPAREN WHITESPACE* @@ WHITESPACE* RPAREN" json:"sub_query"`
}

ParenQuery: a Lucene query surrounded by parentheses.

func (*ParenQuery) GetQueryType

func (q *ParenQuery) GetQueryType() QueryType

func (*ParenQuery) String

func (q *ParenQuery) String() string

type Query

type Query interface {
	String() string
	GetQueryType() QueryType
}

type QueryType

type QueryType uint32
const (
	LUCENE_QUERY QueryType = iota
	OR_QUERY
	OS_QUERY
	AND_QUERY
	ANS_QUERY
	NOT_QUERY
	FIELD_QUERY
	PAREN_QUERY
)
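
When debugging an AST it can help to print a readable name for the value returned by GetQueryType. The sketch below is self-contained: QueryType and its constants are a local copy of the declarations listed above, and queryTypeName is a hypothetical helper that does not exist in this package. The label strings are arbitrary choices for illustration.

```go
package main

import "fmt"

// QueryType and the constants below are a local copy of the package's declarations,
// repeated here so the sketch compiles without importing the package.
type QueryType uint32

const (
	LUCENE_QUERY QueryType = iota
	OR_QUERY
	OS_QUERY
	AND_QUERY
	ANS_QUERY
	NOT_QUERY
	FIELD_QUERY
	PAREN_QUERY
)

// queryTypeName is a hypothetical helper that maps a QueryType to a label.
// The labels are illustrative, not names defined by the package.
func queryTypeName(t QueryType) string {
	names := [...]string{
		"lucene", "or", "or_symbol", "and", "and_symbol", "not", "field", "paren",
	}
	if int(t) < len(names) {
		return names[t]
	}
	return fmt.Sprintf("unknown(%d)", t)
}

func main() {
	fmt.Println(queryTypeName(FIELD_QUERY))
	fmt.Println(queryTypeName(QueryType(42)))
}
```

With the real package, the argument would come from calling GetQueryType on any value that implements the Query interface.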
