glob

Version: v1.0.1-0...-5871cbd

Warning: this package is not in the latest version of its module.
Published: Oct 25, 2019 License: BSD-3-Clause Imports: 3 Imported by: 11

README

Welcome To Glob!

Introduction

Glob is a Golang package that adds support for UNIX shell-like pattern matching for strings, commonly known as 'globbing'.

It is released under the 3-clause New BSD license. See LICENSE.md for details.

import glob "github.com/ganbarodigital/go_glob"

g := glob.NewGlob("*.go")
matched, err := g.Match("parser.go")
if err == nil {
    fmt.Println(matched) // prints true
}


Why Use Glob?

Golang already has filepath.Match(). Why do we need another globbing package?

Who Is Glob For?

Glob is for anyone who wants globbing support to be as close to UNIX shell behaviour as possible.

We built Glob for our Scriptish and ShellExpand projects, after we realised that Golang's filepath.Match() couldn't do everything we needed.

What Are The Differences Between This Package And Golang's filepath.Match()?

There are two important differences:

  • longest match vs shortest match behaviour of the * wildcard
  • being able to match prefix or suffix vs matching whole string

In UNIX shell scripts, you can choose whether * matches as few characters as possible, or whether it matches as many characters as possible.

PARAM1="1234abc1234xyz"
echo ${PARAM1#12*4}  # removes the shortest matching prefix: prints abc1234xyz
echo ${PARAM1##12*4} # removes the longest matching prefix: prints xyz

Golang's filepath.Match() always matches as few characters as possible. It isn't possible to tailor its behaviour to suit.

Also, Golang's filepath.Match() only supports matching the whole string. While you can use the * wildcard to simulate prefix/suffix matching, filepath.Match() can't tell you how long the matching prefix or suffix is.

If you only need to know that your input string matches your globbing pattern, then filepath.Match() will be faster and a better choice.
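To make the limitation concrete, here is a brute-force sketch that uses only the standard library. prefixMatchLengths is a hypothetical helper, not part of Glob or of Go: it calls filepath.Match() once per candidate prefix, because that is the only way to recover a prefix length from a yes/no answer. It assumes ASCII input, and keep in mind that filepath.Match's * never matches a path separator.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// prefixMatchLengths is a hypothetical helper (not part of Glob). It calls
// filepath.Match on every prefix of input, and reports the lengths of the
// shortest and longest prefixes that match pattern. It slices input by
// byte, so it assumes ASCII input.
func prefixMatchLengths(pattern, input string) (shortest, longest int, found bool, err error) {
	for end := 0; end <= len(input); end++ {
		matched, matchErr := filepath.Match(pattern, input[:end])
		if matchErr != nil {
			return 0, 0, false, matchErr
		}
		if matched {
			if !found {
				shortest, found = end, true
			}
			longest = end
		}
	}
	return shortest, longest, found, nil
}

func main() {
	// filepath.Match only answers "does the whole string match?"
	whole, _ := filepath.Match("ab*c", "abXcYc")
	fmt.Println(whole) // true

	// Prefix lengths need one filepath.Match call per candidate prefix.
	shortest, longest, found, err := prefixMatchLengths("ab*c", "abXcYcZ")
	if err != nil {
		panic(err)
	}
	fmt.Println(shortest, longest, found) // 4 6 true
}
```

Because it makes one filepath.Match() call per prefix, this approach does quadratic work in the worst case; it only illustrates the gap that Glob's prefix and suffix methods fill.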

How Does It Work?

Getting Started

Import Glob into your Golang code:


import glob "github.com/ganbarodigital/go_glob"

Create a Glob struct, by calling glob.NewGlob() with your globbing pattern:

myGlob := glob.NewGlob("abc*.go")

Once you have your Glob struct, use its methods to glob your strings:

// equivalent of calling `filepath.Match()`
success, err := myGlob.Match(myInput)

// find matching prefix
pos, success, err := myGlob.MatchShortestPrefix(myInput)
if err == nil && success {
    prefix := myInput[:pos]
    fmt.Println(prefix)
}

What Is A Glob Pattern?

A glob pattern (or just pattern for short) can be used as a filter, and/or as a search term.

Historically, it was used in UNIX shells to find a list of matching filenames using a simple set of wildcards. (This is known as pathname expansion today.) As UNIX shells became more powerful, they added the ability to manipulate the contents of strings. Instead of inventing new syntax, they took the existing pathname expansion support, and reused (most!) of it against arbitrary strings too.

It's such an integral part of using UNIX systems that many UNIX services and daemons have added their own support for globbing over the years.

What Does A Glob Pattern Look Like?

A glob pattern is made from:

  • ? is a wildcard that matches exactly one character
  • * is a wildcard that matches zero or more characters. Sometimes it is greedy (matches as many characters as possible), and sometimes it is ungreedy (matches as few characters as possible). It all depends on which match method you call.
  • [...] matches any one of the characters inside the [ and ]
  • [^...] matches any one character that is not inside the [ and ]
  • [lo-hi] matches any one of the characters in the range lo to hi
  • \ escapes the following character. Use this to tell Glob to treat characters like * as literal characters, not as wildcards.

Any other characters in the pattern are treated as a requirement to match exactly that character.
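Go's standard library path.Match() accepts the same basic elements, so it offers a quick way to experiment with them. It is only a stand-in for Glob here, with one notable difference: its * and ? never match '/', whereas shell globbing against arbitrary strings has no such restriction. globMatches is a hypothetical wrapper.

```go
package main

import (
	"fmt"
	"path"
)

// globMatches is a hypothetical wrapper around the standard library's
// path.Match (not Glob). It panics on a malformed pattern.
func globMatches(pattern, name string) bool {
	ok, err := path.Match(pattern, name)
	if err != nil {
		panic(err)
	}
	return ok
}

func main() {
	fmt.Println(globMatches("?at", "cat"))        // true:  ? matches exactly one character
	fmt.Println(globMatches("*.go", "parser.go")) // true:  * matches zero or more characters
	fmt.Println(globMatches("[bc]at", "bat"))     // true:  [...] matches one listed character
	fmt.Println(globMatches("[^bc]at", "bat"))    // false: [^...] excludes the listed characters
	fmt.Println(globMatches("[a-c]at", "cat"))    // true:  [lo-hi] matches a range
	fmt.Println(globMatches(`\*.go`, "*.go"))     // true:  \ makes * a literal character
	fmt.Println(globMatches("*", "a/b"))          // false: path.Match's * never matches '/'
}
```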

What About Extended Globbing, Globstars, and GLOBIGNORE?

Extended globbing adds support for pattern lists and alternates. It's not supported in this release. We'd like to support it in the future, but no promises!

Globstars are the ** and **/ wildcards. They're used in pathname expansion to match all files, all directories, and sub-directories. Because Glob currently only deals with arbitrary strings, it doesn't make sense to implement globstar support at the moment.

GLOBIGNORE is a shell variable used in pathname expansion as a second filter against filepaths that have matched the globbing pattern. Because Glob currently only deals with arbitrary strings, it doesn't make sense to implement GLOBIGNORE support at the moment.

What Happens When A Match Method Is Called?

This information is mostly to help you if you run into bugs in the Glob package. Try not to rely on it to make your code work. A future version of Glob may implement globbing in a different way.

Whenever you call any of the match methods, here's what happens:

  • we convert the glob pattern into a compiled Golang regex. The regex will be different for each of the match methods.
  • we use the regex to discover if the pattern matches your input string
  • where necessary, we do some additional work to find out which string slice index to return back to you
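The steps above can be sketched with the standard regexp package. The translator globToRegexp and its compile wrapper are hypothetical: they are not Glob's code, and Glob's generated regexes may look quite different. In this sketch a lazy .*? stands in for the shortest-match *, a greedy .* for the longest-match *, and optional ^ / $ anchors loosely mirror GlobAnchorPrefix and GlobAnchorSuffix.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp is a hypothetical glob-to-regex translator (not Glob's own
// code). It supports * ? [...] [^...] [lo-hi] and \ escapes, but not a
// literal ']' inside a bracket expression. longest picks greedy (true) or
// lazy (false) behaviour for *.
func globToRegexp(pattern string, longest bool) (string, error) {
	var sb strings.Builder
	runes := []rune(pattern)
	for i := 0; i < len(runes); i++ {
		switch runes[i] {
		case '*':
			if longest {
				sb.WriteString(`(?s:.*)`) // as many characters as possible
			} else {
				sb.WriteString(`(?s:.*?)`) // as few characters as possible
			}
		case '?':
			sb.WriteString(`(?s:.)`) // exactly one character
		case '\\':
			i++ // treat the next character literally
			if i == len(runes) {
				return "", errors.New("pattern ends with a backslash")
			}
			sb.WriteString(regexp.QuoteMeta(string(runes[i])))
		case '[':
			end := i + 1
			for end < len(runes) && runes[end] != ']' {
				end++
			}
			if end == len(runes) {
				return "", errors.New("unclosed [ in pattern") // e.g. "abc["
			}
			sb.WriteByte('[')
			for j, r := range runes[i+1 : end] {
				if (j == 0 && r == '^') || r == '-' {
					sb.WriteRune(r) // keep negation and ranges as-is
				} else {
					sb.WriteString(regexp.QuoteMeta(string(r)))
				}
			}
			sb.WriteByte(']')
			i = end
		default:
			sb.WriteString(regexp.QuoteMeta(string(runes[i])))
		}
	}
	return sb.String(), nil
}

// compile translates pattern and adds the requested anchors.
func compile(pattern string, longest, anchorStart, anchorEnd bool) (*regexp.Regexp, error) {
	body, err := globToRegexp(pattern, longest)
	if err != nil {
		return nil, err
	}
	expr := "(?:" + body + ")"
	if anchorStart {
		expr = "^" + expr
	}
	if anchorEnd {
		expr += "$"
	}
	return regexp.Compile(expr)
}

func main() {
	input := "path/to/folder"
	for _, longest := range []bool{false, true} {
		re, err := compile("*/", longest, true, false)
		if err != nil {
			panic(err)
		}
		loc := re.FindStringIndex(input) // non-nil: input contains '/'
		fmt.Println(input[:loc[1]])
	}
	// prints "path/" (lazy *), then "path/to/" (greedy *)
}
```

Lazy and greedy quantifiers are one way to express shortest and longest matching in Go's regexp syntax; whether Glob uses exactly these constructs is an assumption.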

If we have already compiled a Golang regex for your glob and match method, we reuse it instead of compiling it again. This improves performance whenever you call the same match method many times, for example when globbing against a list of filenames.
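A compiled-regex cache along those lines might look like the sketch below. regexCache and cacheKey are hypothetical names, not Glob's internals. The key combines the pattern with the match method, because each method needs its own regex, and a mutex makes the cache safe to share between goroutines. The regex source in main is hand-written for the demo rather than translated from the glob.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// cacheKey identifies one compiled regex: the same glob pattern needs a
// different regex for each match method. Hypothetical type.
type cacheKey struct {
	pattern string
	method  string
}

// regexCache compiles each (pattern, method) pair at most once and is safe
// for concurrent use. Hypothetical type.
type regexCache struct {
	mu       sync.Mutex
	cache    map[cacheKey]*regexp.Regexp
	compiles int // number of real compilations, for demonstration
}

// get returns the cached regex for key, calling build to produce the regex
// source only on a cache miss.
func (c *regexCache) get(key cacheKey, build func() string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.cache[key]; ok {
		return re, nil
	}
	re, err := regexp.Compile(build())
	if err != nil {
		return nil, err // compile errors are not cached in this sketch
	}
	if c.cache == nil {
		c.cache = make(map[cacheKey]*regexp.Regexp)
	}
	c.cache[key] = re
	c.compiles++
	return re, nil
}

func main() {
	var c regexCache
	key := cacheKey{pattern: "*.go", method: "Match"}
	build := func() string { return `^(?s:.*)\.go$` } // hand-translated "*.go"
	for _, name := range []string{"a.go", "b.txt", "c.go"} {
		re, err := c.get(key, build)
		if err != nil {
			panic(err)
		}
		fmt.Println(name, re.MatchString(name))
	}
	fmt.Println("compilations:", c.compiles) // compilations: 1
}
```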

Golang's regex engine uses what's called leftmost-match semantics. Most of the time, that's exactly the behaviour you want ... unless you're after the shortest suffix that matches your pattern. That's where we have to do some additional processing of the regex result to find the shortest match of your pattern.

How Are Errors Handled?

Errors can only occur:

  • if you use a pattern that is somehow invalid, for example abc[
  • if you use a pattern that isn't correctly understood (yet) by Glob

All of the match methods return an error as their last return value.
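Whatever the pattern, check the error before trusting the match result. The sketch below shows that order using the standard library's path.Match() as a stand-in for Glob, because in current Go releases it also reports a malformed pattern such as abc[ as an error (path.ErrBadPattern). matchOrExplain is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// matchOrExplain is a hypothetical helper. It checks the error first, and
// only trusts the match result when the error is nil. path.Match stands in
// for Glob here.
func matchOrExplain(pattern, input string) string {
	matched, err := path.Match(pattern, input)
	switch {
	case errors.Is(err, path.ErrBadPattern):
		return "invalid pattern: " + pattern
	case err != nil:
		return "unexpected error: " + err.Error()
	case !matched:
		return "no match"
	default:
		return "match"
	}
}

func main() {
	fmt.Println(matchOrExplain("abc[", "abc"))    // invalid pattern: abc[
	fmt.Println(matchOrExplain("abc*", "abcdef")) // match
	fmt.Println(matchOrExplain("abc?", "abc"))    // no match
}
```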

What Do I Do If I Find A Valid Pattern That Glob Errors On / Returns The Wrong Result For?

We've got a comprehensive test suite, which is kept up to date. Even so, there could be glob patterns that should work, but don't.

  • It could be that our glob2regex process doesn't correctly translate the pattern (e.g. missing escaping)
  • It could be that the resulting regex doesn't behave the way the glob pattern does in a real UNIX shell

When you run into a problem, here's what to do:

  1. create a small example bash shell script that demonstrates the correct behaviour
  2. please open an issue here on GitHub
  3. add your example shell script, and details of what Glob is doing, to the issue

We're aiming for 100% compatibility with UNIX shell globbing behaviour when applied to arbitrary strings.

We can't accept requests to make Glob behave differently to how globbing works within a UNIX shell.

Creating A Glob

NewGlob()

To create a glob, call glob.NewGlob() with your globbing pattern:


myGlob := glob.NewGlob(myPattern)

This gives you a Glob that you can reuse as many times as you want.

Match Methods

Use one of the following match methods to perform the actual globbing.

Match()
func (g *Glob) Match(input string) (bool, error)

Match() determines if the whole input string matches the given glob pattern.

Returns:

  • true if the whole input string matches the Glob pattern, false otherwise
  • an error if the given Glob pattern cannot be compiled into a regex

Example:

myGlob := glob.NewGlob("*.go")
success, err := myGlob.Match("match.go")

MatchShortestPrefix()
func (g *Glob) MatchShortestPrefix(input string) (int, bool, error)

MatchShortestPrefix() finds the shortest prefix of input that matches the glob pattern. It treats '*' as matching the minimum number of characters.

Returns:

  • the end of the prefix that matches the glob pattern, suitable for you to use in a string slice
  • true if the pattern matched; false otherwise
  • an error if the given Glob pattern cannot be compiled into a regex

Example:

input := "path/to/folder"

myGlob := glob.NewGlob("*/")
pos, success, err := myGlob.MatchShortestPrefix(input)
if err != nil {
    // ... handle err first (e.g. return it)
}
if !success {
    // input string did not match pattern (e.g. return early)
}

// if we get here, the pattern did match
//
// in this example, the prefix will be 'path/'
prefix := input[:pos]

MatchLongestPrefix()
func (g *Glob) MatchLongestPrefix(input string) (int, bool, error)

MatchLongestPrefix() finds the longest prefix of input that matches the glob pattern. It treats '*' as matching the maximum number of characters.

Returns:

  • the end of the prefix that matches the glob pattern, suitable for you to use in a string slice
  • true if the pattern matched; false otherwise
  • an error if the given Glob pattern cannot be compiled into a regex

Example:

input := "path/to/folder"

myGlob := glob.NewGlob("*/")
pos, success, err := myGlob.MatchLongestPrefix(input)
if err != nil {
    // ... handle err first (e.g. return it)
}
if !success {
    // input string did not match pattern (e.g. return early)
}

// if we get here, the pattern did match
//
// in this example, the prefix will be 'path/to/'
prefix := input[:pos]

MatchShortestSuffix()
func (g *Glob) MatchShortestSuffix(input string) (int, bool, error)

MatchShortestSuffix() finds the shortest suffix of input that matches the glob pattern. It treats '*' as matching the minimum number of characters.

Returns:

  • the start of the suffix that matches the glob pattern, suitable for you to use in a string slice
  • true if the pattern matched; false otherwise
  • an error if the given Glob pattern cannot be compiled into a regex

BE AWARE that the returned position can be equal to len(input). This happens when the pattern legitimately matches an empty suffix. Slicing with input[pos:] is still safe in that case, and gives you an empty string.

Example:

input := "path/to/folder"

myGlob := glob.NewGlob("/*")
pos, success, err := myGlob.MatchShortestSuffix(input)
if err != nil {
    // ... handle err first (e.g. return it)
}
if !success {
    // input string did not match pattern (e.g. return early)
}

// if we get here, the pattern did match
//
// in this example, the suffix will be '/folder'
//
// input[pos:] is safe even when pos == len(input): it gives ""
suffix := input[pos:]

MatchLongestSuffix()
func (g *Glob) MatchLongestSuffix(input string) (int, bool, error)

MatchLongestSuffix() finds the longest suffix of input that matches the glob pattern. It treats '*' as matching the maximum number of characters.

Returns:

  • the start of the suffix that matches the glob pattern, suitable for you to use in a string slice
  • true if the pattern matched; false otherwise
  • an error if the given Glob pattern cannot be compiled into a regex

BE AWARE that the returned position can be equal to len(input). This happens when the pattern legitimately matches an empty suffix. Slicing with input[pos:] is still safe in that case, and gives you an empty string.

Example:

input := "path/to/folder"

myGlob := glob.NewGlob("/*")
pos, success, err := myGlob.MatchLongestSuffix(input)
if err != nil {
    // ... handle err first (e.g. return it)
}
if !success {
    // input string did not match pattern (e.g. return early)
}

// if we get here, the pattern did match
//
// in this example, the suffix will be '/to/folder'
//
// input[pos:] is safe even when pos == len(input): it gives ""
suffix := input[pos:]

Other Methods

Pattern()

Use Pattern() to get the original pattern out of a prepared Glob (e.g. for logging / debugging purposes):

myGlob := glob.NewGlob("/*")
fmt.Printf("glob pattern is: %s\n", myGlob.Pattern())

Documentation


Constants

const (
	// GlobShortestMatch makes wildcards match the minimum number of
	// characters possible
	GlobShortestMatch = 0
	// GlobLongestMatch makes wildcards match the maximum number of
	// characters possible
	GlobLongestMatch = 1
	// GlobAnchorPrefix makes the glob pattern match from the start
	// of your input string
	GlobAnchorPrefix = 2
	// GlobAnchorSuffix makes the glob pattern match up to the end
	// of your input string
	GlobAnchorSuffix = 4
)
const GlobMatchWholeString = GlobAnchorPrefix + GlobAnchorSuffix

GlobMatchWholeString makes the glob pattern apply to all of your input string
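Apart from GlobShortestMatch, which is zero, these constants are distinct powers of two, so they behave as bit flags: GlobMatchWholeString is simply 2 + 4 = 6. The sketch below copies the constants so that it compiles on its own, and decodes a flags value bit by bit; describeFlags is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// These constants are copied from the package documentation above, so
// that this sketch compiles on its own.
const (
	GlobShortestMatch = 0
	GlobLongestMatch  = 1
	GlobAnchorPrefix  = 2
	GlobAnchorSuffix  = 4
)

const GlobMatchWholeString = GlobAnchorPrefix + GlobAnchorSuffix

// describeFlags is a hypothetical helper that decodes a flags value.
// GlobShortestMatch is zero, so "shortest" is simply the absence of the
// GlobLongestMatch bit.
func describeFlags(flags int) string {
	parts := []string{"shortest"}
	if flags&GlobLongestMatch != 0 {
		parts[0] = "longest"
	}
	if flags&GlobAnchorPrefix != 0 {
		parts = append(parts, "anchored at start")
	}
	if flags&GlobAnchorSuffix != 0 {
		parts = append(parts, "anchored at end")
	}
	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println(GlobMatchWholeString)                               // 6
	fmt.Println(describeFlags(GlobLongestMatch | GlobAnchorPrefix)) // longest, anchored at start
	fmt.Println(describeFlags(GlobShortestMatch | GlobMatchWholeString))
	// shortest, anchored at start, anchored at end
}
```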


Functions

func Match

func Match(input, pattern string) (bool, error)

Match determines if the whole input string matches the given glob pattern.

Pattern can be built from:

  • * matches zero or more characters
  • ? matches exactly one character
  • [...] matches any one character within the brackets

Any other character matches itself.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

func MatchLongestPrefix

func MatchLongestPrefix(input, pattern string) (int, bool, error)

MatchLongestPrefix returns the prefix of input that matches the glob pattern. It treats '*' as matching the maximum number of characters.

Pattern can be built from:

  • * matches zero or more characters
  • ? matches exactly one character
  • [...] matches any one character within the brackets

Any other character matches itself.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

Returns:

  • the length of the prefix that matches, or zero otherwise
  • `true` if the input has a prefix that matched the pattern

func MatchLongestSuffix

func MatchLongestSuffix(input, pattern string) (int, bool, error)

MatchLongestSuffix returns the suffix of input that matches the glob pattern. It treats '*' as matching the maximum number of characters.

Pattern can be built from:

  • * matches zero or more characters
  • ? matches exactly one character
  • [...] matches any one character within the brackets

Any other character matches itself.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

Returns:

  • the start of the suffix that matches (this can be len(input)), or zero otherwise
  • `true` if the input has a suffix that matched the pattern

func MatchPrefix

func MatchPrefix(input, pattern string, flags int) (int, bool, error)

MatchPrefix returns the prefix of input that matches the glob pattern.

Pattern can be built from:

  • * matches zero or more characters
  • ? matches exactly one character
  • [...] matches any one character within the brackets

Any other character matches itself.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

flags can be:

  • GlobShortestMatch (default)
  • GlobLongestMatch

Returns:

  • the length of the prefix that matches, or zero otherwise
  • `true` if the input has a prefix that matched the pattern

func MatchShortestPrefix

func MatchShortestPrefix(input, pattern string) (int, bool, error)

MatchShortestPrefix returns the prefix of input that matches the glob pattern. It treats '*' as matching the minimum number of characters.

Pattern can be built from:

  • * matches zero or more characters
  • ? matches exactly one character
  • [...] matches any one character within the brackets

Any other character matches itself.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

Returns:

  • the length of the prefix that matches, or zero otherwise
  • `true` if the input has a prefix that matched the pattern

func MatchShortestSuffix

func MatchShortestSuffix(input, pattern string) (int, bool, error)

MatchShortestSuffix returns the suffix of input that matches the glob pattern. It treats '*' as matching the minimum number of characters.

Pattern can be built from:

  • * matches zero or more characters
  • ? matches exactly one character
  • [...] matches any one character within the brackets

Any other character matches itself.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

It is computationally more expensive than the other MatchXXX() functions, due to Golang's leftmost-match mechanics (which we have to compensate for).

Returns:

  • the start of the suffix that matches (this can be len(input)), or zero otherwise
  • `true` if the input has a suffix that matched the pattern

func MatchSuffix

func MatchSuffix(input, pattern string, flags int) (int, bool, error)

MatchSuffix returns the start of the suffix of input that matches the glob pattern.

Pattern can be built from:

  • * matches zero or more characters
  • ? matches exactly one character
  • [...] matches any one character within the brackets

Any other character matches itself.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

flags can be:

  • GlobShortestMatch (default)
  • GlobLongestMatch

Returns:

  • the start of the suffix that matches (this can be len(input)), or zero otherwise
  • `true` if the input has a suffix that matched the pattern

Types

type Glob

type Glob struct {
	// contains filtered or unexported fields
}

Glob is a compiled Glob expression, which can safely be reused.

Call `NewGlob()` to create your Glob structure

func NewGlob

func NewGlob(pattern string, options ...func(*Glob)) *Glob

NewGlob turns your pattern into a reusable Glob

func (*Glob) Match

func (g *Glob) Match(input string) (bool, error)

Match determines if the whole input string matches the given glob pattern.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

func (*Glob) MatchLongestPrefix

func (g *Glob) MatchLongestPrefix(input string) (int, bool, error)

MatchLongestPrefix returns the prefix of input that matches the glob pattern. It treats '*' as matching the maximum number of characters.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

Returns:

  • the length of the prefix that matches, or zero otherwise
  • `true` if the input has a prefix that matched the pattern

func (*Glob) MatchLongestSuffix

func (g *Glob) MatchLongestSuffix(input string) (int, bool, error)

MatchLongestSuffix returns the suffix of input that matches the glob pattern. It treats '*' as matching the maximum number of characters.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

Returns:

  • the start of the suffix that matches (this can be len(input)), or zero otherwise
  • `true` if the input has a suffix that matched the pattern

func (*Glob) MatchShortestPrefix

func (g *Glob) MatchShortestPrefix(input string) (int, bool, error)

MatchShortestPrefix returns the prefix of input that matches the glob pattern. It treats '*' as matching the minimum number of characters.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

Returns:

  • the length of the prefix that matches, or zero otherwise
  • `true` if the input has a prefix that matched the pattern

func (*Glob) MatchShortestSuffix

func (g *Glob) MatchShortestSuffix(input string) (int, bool, error)

MatchShortestSuffix returns the suffix of input that matches the glob pattern. It treats '*' as matching the minimum number of characters.

Intent is to be 100% compatible with UNIX shell globbing. Please open a GitHub issue if you find any test cases that show up compatibility problems.

It is computationally more expensive than the other MatchXXX() functions, due to Golang's leftmost-match mechanics (which we have to compensate for).

Returns:

  • the start of the suffix that matches (this can be len(input)), or zero otherwise
  • `true` if the input has a suffix that matched the pattern

func (Glob) Pattern

func (g Glob) Pattern() string

Pattern returns a copy of the original glob pattern that was compiled into the given Glob
