Documentation ¶
Overview ¶
Package hre provides HRE, the regular expression dialect used by Hangulize. HRE targets a very narrow use case.
The HRE syntax is based on RE2, but it changes how assertions behave. For example, in HRE ^ matches at the beginning of every word, not only at the beginning of the string.
RE2 does not support lookaround because there is no known efficient algorithm to implement it without backtracking. HRE nevertheless provides a simplified lookaround. The syntax {...} denotes a positive lookaround and {~...} denotes a negative lookaround. A lookaround may only be placed at the leftmost or rightmost position of a pattern.
"foo{bar}" "{~bar}foo"
The time complexity of the negative lookbehind is O(n²), while the other assertions run in O(n). The negative lookbehind should therefore not be used on very long strings.
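To illustrate where a quadratic cost can come from, here is a minimal sketch that emulates a leading negative lookbehind such as "{~bar}foo" with the standard regexp package. It is not the hre implementation; the helper name findWithNegativeLookbehind is hypothetical. Each candidate match re-scans the text before it, so the work can grow as O(n²) with the input length.

```go
package main

import (
	"fmt"
	"regexp"
)

// findWithNegativeLookbehind returns the [start, end) locations of every match
// of body that is NOT immediately preceded by a match of behind.
// Hypothetical helper for illustration; it is not part of the hre API.
func findWithNegativeLookbehind(word, behind, body string) [][]int {
	bodyRe := regexp.MustCompile(body)
	// Anchor the lookbehind expression at the end of the prefix.
	behindRe := regexp.MustCompile(`(?:` + behind + `)$`)

	var locs [][]int
	for _, loc := range bodyRe.FindAllStringIndex(word, -1) {
		// The prefix is scanned again for every candidate match.
		// This repeated scan is what makes the approach quadratic.
		if behindRe.MatchString(word[:loc[0]]) {
			continue
		}
		locs = append(locs, loc)
	}
	return locs
}

func main() {
	// Only the "foo" occurrences that do not follow "bar" are reported.
	fmt.Println(findWithNegativeLookbehind("barfoo foo bazfoo", "bar", "foo"))
	// Output: [[7 10] [14 17]]
}
```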
HRE also provides macros and variables.
    macros := map[string]string{
        "@": "<vowels>",
    }
    vars := map[string][]string{
        "abc":    []string{"a", "b", "c"},
        "vowels": []string{"a", "e", "i", "o", "u"},
    }
    p, err := NewPattern("<abc>@", macros, vars)
    // p matches, for example, "ai", "be" or "ci".
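One way to picture macros and variables is as a textual expansion into a plain RE2 expression. The sketch below is an assumption about how such an expansion could work, not the package's actual code: the function expand is hypothetical, macros are applied by plain string replacement, and each <name> becomes a non-capturing alternation of the variable's values.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// varRef matches a variable reference such as <abc>.
var varRef = regexp.MustCompile(`<([^<>]+)>`)

// expand is a hypothetical helper. It replaces macros first and then turns
// every <name> into an alternation of the values of that variable.
// With more than one macro the result may depend on map iteration order;
// the sketch ignores that because the example uses a single macro.
func expand(expr string, macros map[string]string, vars map[string][]string) string {
	for src, dst := range macros {
		expr = strings.ReplaceAll(expr, src, dst)
	}
	return varRef.ReplaceAllStringFunc(expr, func(m string) string {
		vals, ok := vars[m[1:len(m)-1]]
		if !ok {
			return m // leave unknown variables untouched
		}
		quoted := make([]string, len(vals))
		for i, v := range vals {
			quoted[i] = regexp.QuoteMeta(v)
		}
		return "(?:" + strings.Join(quoted, "|") + ")"
	})
}

func main() {
	macros := map[string]string{"@": "<vowels>"}
	vars := map[string][]string{
		"abc":    {"a", "b", "c"},
		"vowels": {"a", "e", "i", "o", "u"},
	}
	src := expand("<abc>@", macros, vars)
	fmt.Println(src)
	// Output: (?:a|b|c)(?:a|e|i|o|u)

	re := regexp.MustCompile("^" + src + "$")
	fmt.Println(re.MatchString("ai"), re.MatchString("be"), re.MatchString("xa"))
	// Output: true true false
}
```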
Index ¶
- Constants
- func RegexpMaxWidth(re *syntax.Regexp) int
- type Pattern
- func (p *Pattern) Explain() string
- func (p *Pattern) Find(word string, n int) [][]int
- func (p *Pattern) Letters() []string
- func (p *Pattern) NegativeLookaroundWidths() (negAWidth int, negBWidth int)
- func (p *Pattern) Replace(word string, rpat *RPattern, n int) string
- func (p *Pattern) String() string
- type RPattern
Examples ¶
Constants ¶
const Version = "0.2.2"
Version is the version number of the HRE package. The version follows Semantic Versioning 2.0.0.
Variables ¶
This section is empty.
Functions ¶
func RegexpMaxWidth ¶ added in v0.2.0
RegexpMaxWidth calculates the maximum width of a parsed Regexp pattern.
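The idea of a maximum width can be sketched with the standard regexp/syntax package by walking the parsed tree. This is an independent illustration rather than the package's implementation. The names maxWidth and maxWidthOf are hypothetical, the width is counted in runes, and -1 is assumed to mean "unbounded".

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// maxWidth returns the largest number of runes re can match,
// or -1 when the width is unbounded. Hypothetical sketch.
func maxWidth(re *syntax.Regexp) int {
	switch re.Op {
	case syntax.OpNoMatch, syntax.OpEmptyMatch,
		syntax.OpBeginLine, syntax.OpEndLine,
		syntax.OpBeginText, syntax.OpEndText,
		syntax.OpWordBoundary, syntax.OpNoWordBoundary:
		return 0 // zero-width operators
	case syntax.OpLiteral:
		return len(re.Rune)
	case syntax.OpCharClass, syntax.OpAnyChar, syntax.OpAnyCharNotNL:
		return 1
	case syntax.OpCapture, syntax.OpQuest:
		return maxWidth(re.Sub[0])
	case syntax.OpStar, syntax.OpPlus:
		if maxWidth(re.Sub[0]) == 0 {
			return 0 // repeating a zero-width expression stays zero-width
		}
		return -1
	case syntax.OpRepeat:
		w := maxWidth(re.Sub[0])
		if w == 0 {
			return 0
		}
		if w < 0 || re.Max < 0 {
			return -1 // unbounded operand or open-ended {n,}
		}
		return w * re.Max
	case syntax.OpConcat:
		total := 0
		for _, sub := range re.Sub {
			w := maxWidth(sub)
			if w < 0 {
				return -1
			}
			total += w
		}
		return total
	case syntax.OpAlternate:
		best := 0
		for _, sub := range re.Sub {
			w := maxWidth(sub)
			if w < 0 {
				return -1
			}
			if w > best {
				best = w
			}
		}
		return best
	}
	return -1 // unknown operator: be conservative
}

// maxWidthOf parses expr with Perl-compatible flags and measures it.
func maxWidthOf(expr string) (int, error) {
	re, err := syntax.Parse(expr, syntax.Perl)
	if err != nil {
		return 0, err
	}
	return maxWidth(re), nil
}

func main() {
	for _, expr := range []string{"ab|cde", "a{2,3}b", "he(l+o)"} {
		w, err := maxWidthOf(expr)
		fmt.Println(expr, w, err)
	}
}
```

A bounded result such as 3 for "ab|cde" means a matcher never needs to inspect more than three runes for that expression, while -1 signals that no such limit exists.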
Types ¶
type Pattern ¶
type Pattern struct {
// contains filtered or unexported fields
}
Pattern represents an HRE (Hangulize-specific Regular Expression) pattern.
The transcription logic consists of several rewriting rules. Each rule pairs a Pattern with an RPattern: a sub-word matched by the Pattern is rewritten by the RPattern.
    rewrite:
        "'"        -> ""
        "^gli$"    -> "li"
        "^glia$"   -> "g.lia"
        "^glioma$" -> "g.lioma"
        "^gli{@}"  -> "li"
        "{@}gli"   -> "li"
        "gn{@}"    -> "nJ"
        "gn"       -> "n"
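The effect of an ordered rule list can be approximated with the standard regexp package. The sketch below covers only three of the rules above and rests on several assumptions: rules are applied one after another in the listed order, @ stands for a vowel, and the lookahead in "gn{@}" is emulated by capturing the vowel and writing it back. The rule type and the rewrite function are hypothetical and are not part of the hre API.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a compiled pattern with its replacement text.
type rule struct {
	pattern     *regexp.Regexp
	replacement string
}

// rules approximates three of the documented rules with plain RE2.
// The more specific "gn before a vowel" rule must run before the
// generic "gn" rule, mirroring the order in the rule list.
var rules = []rule{
	{regexp.MustCompile(`'`), ""},
	// Emulates "gn{@}" -> "nJ": the captured vowel is written back unchanged.
	{regexp.MustCompile(`gn([aeiou])`), "nJ${1}"},
	{regexp.MustCompile(`gn`), "n"},
}

// rewrite applies every rule to the word in order.
func rewrite(word string) string {
	for _, r := range rules {
		word = r.pattern.ReplaceAllString(word, r.replacement)
	}
	return word
}

func main() {
	fmt.Println(rewrite("lasagna")) // lasanJa
	fmt.Println(rewrite("signs"))   // sins
}
```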
Some expressions in Pattern have special meaning:
"^" // start of chunk "^^" // start of string "$" // end of chunk "$$" // end of string "{...}" // zero-width match "{~...}" // zero-width negative match "<var>" // one of var values (defined in spec)
func NewPattern ¶
func NewPattern(
    expr string,
    macros map[string]string,
    vars map[string][]string,
) (*Pattern, error)
NewPattern compiles an HRE pattern from an expression.
func (*Pattern) Explain ¶
Explain shows the HRE expression with the underlying standard regexp patterns.
func (*Pattern) Find ¶
Find searches for up to n matches in the word. If n is -1, it returns all matches. The result is a slice of submatch locations.
Example ¶
    p, _ := NewPattern("^he(l+o){,}", nil, nil)
    fmt.Println(p.Find("hello, helo, hellllo", -1))
Output: [[0 5 2 5] [7 11 9 11]]
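The output above can be read as pairs of start and end offsets, first for the whole match and then for each capture group. The sketch below assumes this layout follows the convention of regexp.FindAllStringSubmatchIndex in the standard library, including -1 for a group that did not participate; the helper submatches is hypothetical. It slices the example word using the locations printed above.

```go
package main

import "fmt"

// submatches slices word according to one entry of submatch locations.
// Even indexes are start offsets and odd indexes are end offsets.
// Hypothetical helper; a negative start is treated as "group not matched".
func submatches(word string, loc []int) []string {
	var out []string
	for i := 0; i+1 < len(loc); i += 2 {
		if loc[i] < 0 {
			out = append(out, "")
			continue
		}
		out = append(out, word[loc[i]:loc[i+1]])
	}
	return out
}

func main() {
	word := "hello, helo, hellllo"
	// Locations copied from the example output of Find.
	locs := [][]int{{0, 5, 2, 5}, {7, 11, 9, 11}}
	for _, loc := range locs {
		fmt.Println(submatches(word, loc))
	}
	// Output:
	// [hello llo]
	// [helo lo]
}
```

The third word, "hellllo", is absent from the result because no comma follows it, so the trailing {,} lookahead fails there.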
func (*Pattern) Letters ¶
Letters returns the set of natural letters used in the expression in ascending order.
Example ¶
    p, _ := NewPattern("^hello{,}", nil, nil)
    fmt.Println(p.Letters())
Output: [, e h l o]
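The result above is consistent with dropping pattern metacharacters, removing duplicates, and sorting what remains. The sketch below reproduces that for plain expressions only. The set of metacharacters in metaChars is an assumption, the function letters is hypothetical, and variables or macros are not expanded as the real method may do.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metaChars is an assumed set of characters that carry syntactic meaning
// and therefore do not count as letters in this sketch.
const metaChars = `^$${}~<>()[]*+?|.\`

// letters returns the distinct non-meta characters of expr in ascending order.
// Hypothetical helper; it does not parse the expression.
func letters(expr string) []string {
	seen := make(map[rune]bool)
	var out []string
	for _, r := range expr {
		if strings.ContainsRune(metaChars, r) || seen[r] {
			continue
		}
		seen[r] = true
		out = append(out, string(r))
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(letters("^hello{,}"))
	// Output: [, e h l o]
}
```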
func (*Pattern) NegativeLookaroundWidths ¶ added in v0.2.2
NegativeLookaroundWidths returns the potential widths of negative lookahead and negative lookbehind.
A width of -1 means unlimited. An unlimited negative lookaround width makes matching take polynomial time. Otherwise, matching takes only linear time.
type RPattern ¶
type RPattern struct {
// contains filtered or unexported fields
}
RPattern is a dynamic replacement pattern.
Some expressions in RPattern have special meaning:
"{}" // zero-width space "<var>" // ...
"R" in the name means "replacement" or "right-side".
func NewRPattern ¶
NewRPattern parses the given expression and creates an RPattern.
func (*RPattern) Interpolate ¶
Interpolate determines the final replacement based on the matched Pattern.