Documentation
¶
Overview ¶
Package grammar implements data structures and algorithms for formal grammars.
In the theory of languages, a formal grammar is a set of rules or productions that define the structure of strings within a formal language. A formal grammar defines which strings in a formal language's alphabet are syntactically valid. It specifies the structure of the strings but does not address their meaning or usage in a given context.
Index ¶
- Constants
- Variables
- func WriteString(w io.Writer, s String[Symbol]) (n int, err error)
- func WriteSymbol(w io.Writer, s Symbol) (n int, err error)
- type CFG
- func (g *CFG) AddNewNonTerminal(prefix NonTerminal, suffixes ...string) NonTerminal
- func (g *CFG) ChomskyNormalForm() *CFG
- func (g *CFG) Clone() *CFG
- func (g *CFG) ComputeFIRST() FIRST
- func (g *CFG) ComputeFOLLOW(first FIRST) FOLLOW
- func (g *CFG) EliminateCycles() *CFG
- func (g *CFG) EliminateEmptyProductions() *CFG
- func (g *CFG) EliminateLeftRecursion() *CFG
- func (g *CFG) EliminateSingleProductions() *CFG
- func (g *CFG) EliminateUnreachableProductions() *CFG
- func (g *CFG) Equal(rhs *CFG) bool
- func (g *CFG) IsCNF() error
- func (g *CFG) IsLL1() error
- func (g *CFG) LeftFactor() *CFG
- func (g *CFG) NullableNonTerminals() set.Set[NonTerminal]
- func (g *CFG) OrderNonTerminals() (String[NonTerminal], String[NonTerminal], String[NonTerminal])
- func (g *CFG) OrderTerminals() String[Terminal]
- func (g *CFG) String() string
- func (g *CFG) Symbols() set.Set[Symbol]
- func (g *CFG) Verify() error
- type CNFError
- type FIRST
- type FOLLOW
- type LL1Error
- type NonTerminal
- type Production
- type Productions
- func (p *Productions) Add(ps ...*Production)
- func (p *Productions) All() iter.Seq[*Production]
- func (p *Productions) AllByHead() iter.Seq2[NonTerminal, set.Set[*Production]]
- func (p *Productions) AllMatch(pred generic.Predicate1[*Production]) bool
- func (p *Productions) AnyMatch(pred generic.Predicate1[*Production]) bool
- func (p *Productions) Clone() *Productions
- func (p *Productions) Equal(rhs *Productions) bool
- func (p *Productions) Get(head NonTerminal) set.Set[*Production]
- func (p *Productions) Remove(ps ...*Production)
- func (p *Productions) RemoveAll(heads ...NonTerminal)
- func (p *Productions) SelectMatch(pred generic.Predicate1[*Production]) *Productions
- func (p *Productions) String() string
- type String
- func (s String[T]) AnyMatch(pred generic.Predicate1[T]) bool
- func (s String[T]) Append(syms ...T) String[T]
- func (s String[T]) Concat(ss ...String[T]) String[T]
- func (s String[T]) ContainsSymbol(symbol T) bool
- func (s String[T]) Equal(rhs String[T]) bool
- func (s String[T]) HasPrefix(prefix String[T]) bool
- func (s String[T]) HasSuffix(prefix String[T]) bool
- func (s String[Symbol]) NonTerminals() String[NonTerminal]
- func (s String[T]) Prepend(syms ...T) String[T]
- func (s String[T]) String() string
- func (s String[Symbol]) Terminals() String[Terminal]
- type Symbol
- type Terminal
- type TerminalsAndEmpty
- type TerminalsAndEndmarker
Constants ¶
const Endmarker = Terminal("\uEEEE")
Endmarker is a special symbol that is used to indicate the end of a string. This special symbol assumed not to be a symbol of any grammar and it is taken from a Private Use Area (PUA) in Unicode.
The endmarker is not a formal part of the grammar itself but is introduced during parsing to simplify the handling of end-of-input scenarios, especially in parsing algorithms like LL(1) or LR(1).
For more information and details, see "Compilers: Principles, Techniques, and Tools (2nd Edition)".
Variables ¶
var ( CmpProduction = cmpProduction HashProduction = hashFuncForProduction() EqProduction = func(lhs, rhs *Production) bool { return lhs.Equal(rhs) } EqProductionSet = func(lhs, rhs set.Set[*Production]) bool { return lhs.Equal(rhs) } )
var ( CmpString = cmpString HashString = hashFuncForString() EqString = func(lhs, rhs String[Symbol]) bool { return lhs.Equal(rhs) } )
var ( EqSymbol = func(lhs, rhs Symbol) bool { return lhs.Equal(rhs) } CmpSymbol = compareFuncForSymbol() HashSymbol = hashFuncForSymbol() EqTerminal = generic.NewEqualFunc[Terminal]() CmpTerminal = generic.NewCompareFunc[Terminal]() HashTerminal = hash.HashFuncForString[Terminal](nil) EqNonTerminal = generic.NewEqualFunc[NonTerminal]() CmpNonTerminal = generic.NewCompareFunc[NonTerminal]() HashNonTerminal = hash.HashFuncForString[NonTerminal](nil) )
var E = String[Symbol]{}
E is the empty string ε.
var EqTerminalsAndEmpty = func(lhs, rhs *TerminalsAndEmpty) bool {
return lhs.Terminals.Equal(rhs.Terminals) && lhs.IncludesEmpty == rhs.IncludesEmpty
}
var EqTerminalsAndEndmarker = func(lhs, rhs *TerminalsAndEndmarker) bool {
return lhs.Terminals.Equal(rhs.Terminals) && lhs.IncludesEndmarker == rhs.IncludesEndmarker
}
Functions ¶
func WriteString ¶
WriteString writes a string of symbols to the provided io.Writer. It returns the number of bytes written and any error encountered.
Types ¶
type CFG ¶
type CFG struct { Terminals set.Set[Terminal] NonTerminals set.Set[NonTerminal] Productions *Productions Start NonTerminal }
CFG represents a context-free grammar in formal language theory.
Context-free grammars can express a wide range of programming language constructs while remaining computationally efficient to parse. They are used in computer science and linguistics to describe the syntax of languages.
A context-free grammar G = (V, Σ, R, S) is defined by four sets:
V is a set of terminal symbols from which strings are formed. Terminal symbols are also referred to as tokens.
Σ is a set of non-terminals symbols that denote sets of strings. Non-terminal symbols are sometimes called syntactic variables. Non-terminals impose a hierarchical structure on the language.
R = V × (V ∪ Σ)* is a set of productions, where each production consists of a non-terminal (head), an arrow, and a sequence of terminals and/or non-terminals (body).
S ∈ V is one of the non-terminal symbols designated as the start symbol. The set of strings denoted by the start symbol is the language generated by the grammar.
Context-free languages are a superset of regular languages and they are more expressive.
func NewCFG ¶
func NewCFG(terms []Terminal, nonTerms []NonTerminal, prods []*Production, start NonTerminal) *CFG
NewCFG creates a new context-free grammar.
func (*CFG) AddNewNonTerminal ¶ added in v0.10.1
func (g *CFG) AddNewNonTerminal(prefix NonTerminal, suffixes ...string) NonTerminal
AddNewNonTerminal generates and adds a new non-terminal symbol to the grammar. It does so by appending each of the provided suffixes to the given prefix, in order, until it finds a non-terminal that does not already exist in the set of non-terminals.
If all generated non-terminals already exist, the function panics.
func (*CFG) ChomskyNormalForm ¶
ChomskyNormalForm converts a context-free grammar into an equivalent grammar in Chomsky Normal Form.
A context-free grammar G is in Chomsky Normal Form (CNF) if all of its production rules are of the form
- A → BC, or
- A → a, or
- S → ε
where A, B, and C are non-terminal symbols, a is a terminal symbol, and S is the start symbol. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G).
func (*CFG) Clone ¶
Clone returns a deep copy of a context-free grammar, ensuring the clone is independent of the original.
func (*CFG) ComputeFIRST ¶
ComputeFIRST returns the FIRST function for a context-free grammar. The returned function memoizes the FIRST(X) for all grammar symbols (terminals and non-terminals). It will compute FIRST(α) based on the pre-computed FIRST(X) and memoize the result.
FIRST(α), where α is any string of grammar symbols (terminals and non-terminals), is the set of terminals that begin strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
func (*CFG) ComputeFOLLOW ¶
ComputeFOLLOW returns the FOLLOW function for a context-free grammar. The returned function memoizes the FOLLOW(A) for all non-terminals A.
FOLLOW(A), for non-terminal A, is the set of terminals 𝑎 that can appear immediately to the right of A in some sentential form; that is; the set of terminals 𝑎 such that there exists a derivation of the form S ⇒* αAaβ for some α and β strings of grammar symbols (terminals and non-terminals).
func (*CFG) EliminateCycles ¶
EliminateCycles converts a context-free grammar into an equivalent cycle-free grammar.
A grammar is cyclic if it has derivations of one or more steps in which A ⇒* A for some non-terminal A.
func (*CFG) EliminateEmptyProductions ¶
EliminateEmptyProductions converts a context-free grammar into an equivalent ε-free grammar.
An empty production (ε-production) is any production of the form A → ε.
func (*CFG) EliminateLeftRecursion ¶
EliminateLeftRecursion converts a context-free grammar into an equivalent grammar with no left recursion.
A grammar is left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string. For top-down parsers, left recursion causes the parser to loop forever. Many bottom-up parsers also will not accept left-recursive grammars.
Note that the resulting non-left-recursive grammar may have ε-productions.
func (*CFG) EliminateSingleProductions ¶
EliminateSingleProductions converts a context-free grammar into an equivalent single-production-free grammar.
A single production a.k.a. unit production is a production rule whose body is a single non-terminal symbol (A → B).
func (*CFG) EliminateUnreachableProductions ¶
EliminateUnreachableProductions converts a context-free grammar into an equivalent grammar with all unreachable productions and their associated non-terminal symbols removed.
An unreachable production refers to a production rule in a grammar that cannot be used to derive any string starting from the start symbol.
The function also removes unreachable terminals, which are terminals that do not appear in any reachable production.
func (*CFG) Equal ¶ added in v0.10.3
Equal determines whether or not two context-free grammars are the same.
func (*CFG) IsCNF ¶
IsCNF checks if a context-free grammar is in Chomsky Normal Form (CNF).
A context-free grammar G is in Chomsky Normal Form (CNF) if all of its production rules are of the form
- A → BC, or
- A → a, or
- S → ε
where A, B, and C are non-terminal symbols, a is a terminal symbol, and S is the start symbol. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G).
The method returns nil if the grammar is in CNF, or an error detailing which production rules do not conform to CNF.
The returned error is always an instance of errors.MultiError, which contains instances of CNFError, if the CFG is not in Chomsky Normal Form (CNF).
func (*CFG) IsLL1 ¶ added in v0.10.1
IsLL1 checks if a context-free grammar (CFG) is an LL(1) grammar.
LL(1) grammars are a subset of context-free grammars used in predictive parsing, a top-down parsing technique that uses recursive descent without backtracking.
- The first "L" indicates that the grammar is parsed from Left to right.
- The second "L" indicates that it produces a left-most derivation of the input string.
- The "1" means that the parser makes its decisions based on one input symbol of lookahead at each step.
LL(1) grammars are expressive enough to cover most programming constructs. No left-recursive or ambiguous grammar can be LL(1).
The method returns nil if the grammar is LL(1). It returns an error otherwise, providing details about what violates the LL(1) constraints.
The returned error is always an instance of errors.MultiError, which contains instances of LL1Error, if the CFG is not LL(1).
func (*CFG) LeftFactor ¶
LeftFactor converts a context-free grammar into an equivalent left-factored grammar.
Left factoring is a grammar transformation for producing a grammar suitable predictive for top-down parsing. When the choice between two alternative A-productions is not clear, we may be able to rewrite the productions to defer the decision until enough of the input has been seen that we can make the right choice.
For example, if we have the two productions
𝑠𝑡𝑚𝑡 → 𝐢𝐟 𝑒𝑥𝑝𝑟 𝐭𝐡𝐞𝐧 𝑠𝑡𝑚𝑡 𝐞𝐥𝐬𝐞 𝑠𝑡𝑚𝑡 | 𝐢𝐟 𝑒𝑥𝑝𝑟 𝐭𝐡𝐞𝐧 𝑠𝑡𝑚𝑡
on seeing the input 𝐢𝐟, we cannot immediately tell which productions to choose to expand 𝑠𝑡𝑚𝑡.
Note that the resulting left-factored grammar may have ε-productions and/or single productions.
func (*CFG) NullableNonTerminals ¶
func (g *CFG) NullableNonTerminals() set.Set[NonTerminal]
NullableNonTerminals finds all non-terminals in a context-free grammar that can derive the empty string ε in one or more steps (A ⇒* ε for some non-terminal A).
func (*CFG) OrderNonTerminals ¶ added in v0.10.1
func (g *CFG) OrderNonTerminals() (String[NonTerminal], String[NonTerminal], String[NonTerminal])
OrderNonTerminals orders the unordered set of grammar non-terminals in a deterministic way.
The goal of this function is to ensure a consistent and deterministic order for any given set of non-terminals.
func (*CFG) OrderTerminals ¶ added in v0.10.1
OrderTerminals orders the unordered set of grammar terminals in a deterministic way.
The goal of this function is to ensure a consistent and deterministic order for any given set of terminals.
type CNFError ¶ added in v0.10.1
type CNFError struct {
P *Production
}
CNFError represents an error for a production rule in the form A → α that does not conform to Chomsky Normal Form (CNF).
type FIRST ¶
type FIRST func(String[Symbol]) *TerminalsAndEmpty
FIRST is the FIRST function associated with a context-free grammar.
FIRST(α), where α is any string of grammar symbols (terminals and non-terminals), is the set of terminals that begin strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
type FOLLOW ¶
type FOLLOW func(NonTerminal) *TerminalsAndEndmarker
FOLLOW is the FIRST function associated with a context-free grammar.
FOLLOW(A), for non-terminal A, is the set of terminals 𝑎 that can appear immediately to the right of A in some sentential form; that is; the set of terminals 𝑎 such that there exists a derivation of the form S ⇒* αAaβ for some α and β strings of grammar symbols (terminals and non-terminals).
type LL1Error ¶ added in v0.10.1
type LL1Error struct { A NonTerminal Alpha, Beta String[Symbol] FOLLOWA *TerminalsAndEndmarker FIRSTα, FIRSTβ *TerminalsAndEmpty // contains filtered or unexported fields }
LL1Error represents an error where two distinct production rules in the form A → α | β violate LL(1) parsing requirements for a context-free grammar.
type NonTerminal ¶
type NonTerminal string
NonTerminal represents a non-terminal symbol. Non-terminals are syntaxtic variables that denote sets of strings. Non-terminals impose a hierarchical structure on a language.
func (NonTerminal) Equal ¶ added in v0.10.3
func (n NonTerminal) Equal(rhs Symbol) bool
Equal determines whether or not two non-terminal symbols are the same.
func (NonTerminal) IsTerminal ¶
func (n NonTerminal) IsTerminal() bool
IsTerminal always returns false for non-terminal symbols.
func (NonTerminal) Name ¶
func (n NonTerminal) Name() string
Name returns the name of non-terminal symbol.
func (NonTerminal) String ¶
func (n NonTerminal) String() string
String returns a string representation of a non-terminal symbol.
type Production ¶
type Production struct { // Head or left side defines some of the strings denoted by the non-terminal symbol. Head NonTerminal // Body or right side describes one way in which strings of the non-terminal at the head can be constructed. Body String[Symbol] }
Production represents a context-free production rule. The productions of a context-free grammar determine how the terminals and non-terminals can be combined to form strings.
func OrderProductionSet ¶ added in v0.10.1
func OrderProductionSet(set set.Set[*Production]) []*Production
OrderProductionSet orders an unordered set of production rules in a deterministic way.
func (*Production) Equal ¶ added in v0.10.3
func (p *Production) Equal(rhs *Production) bool
Equal determines whether or not two production rules are the same.
func (*Production) IsCNF ¶
func (p *Production) IsCNF() (bool, bool)
IsCNF checks if a production rule is in Chomsky Normal Form (CNF).
A production is in CNF if it is either:
- A → BC: where A, B, and C are non-terminal symbols.
- A → a: where A is a non-terminal symbol and a is a terminal symbol.
The function returns two boolean values:
- The first value indicates if the rule is of the form A → BC.
- The second value indicates if the rule is of the form A → a.
func (*Production) IsEmpty ¶
func (p *Production) IsEmpty() bool
IsEmpty determines whether or not a production rule is an empty production (ε-production).
An empty production (ε-production) is any production of the form A → ε.
func (*Production) IsLeftRecursive ¶
func (p *Production) IsLeftRecursive() bool
IsLeftRecursive determines whether or not a production rule is left recursive (immediate left recursive).
A left recursive production is a production rule of the form of A → Aα
func (*Production) IsSingle ¶
func (p *Production) IsSingle() bool
IsSingle determines whether or not a production rule is a single production (unit production).
A single production (unit production) is a production whose body is a single non-terminal (A → B).
func (*Production) String ¶
func (p *Production) String() string
String returns a string representation of a production rule.
type Productions ¶
type Productions struct {
// contains filtered or unexported fields
}
Productions represents a set of production rules for a context-free grammar.
func NewProductions ¶
func NewProductions() *Productions
NewProductions creates a new instance of the Productions.
func (*Productions) Add ¶
func (p *Productions) Add(ps ...*Production)
Add adds a new production rule.
func (*Productions) All ¶
func (p *Productions) All() iter.Seq[*Production]
All returns an iterator sequence containing all production rules.
func (*Productions) AllByHead ¶
func (p *Productions) AllByHead() iter.Seq2[NonTerminal, set.Set[*Production]]
AllByHead returns an iterator sequence sequence of pairs, where each pair consists of a head non-terminal and its associated set of production rules.
func (*Productions) AllMatch ¶
func (p *Productions) AllMatch(pred generic.Predicate1[*Production]) bool
AllMatch returns true if all production rules satisfy the provided predicate. If the set of production rules is empty, it returns true.
func (*Productions) AnyMatch ¶
func (p *Productions) AnyMatch(pred generic.Predicate1[*Production]) bool
AnyMatch returns true if at least one production rule satisfies the provided predicate.
func (*Productions) Clone ¶ added in v0.10.5
func (p *Productions) Clone() *Productions
Clone returns a deep copy of the production rules, ensuring the clone is independent of the original.
func (*Productions) Equal ¶ added in v0.10.5
func (p *Productions) Equal(rhs *Productions) bool
Equal determines whether or not two sets of production rules are the same.
func (*Productions) Get ¶
func (p *Productions) Get(head NonTerminal) set.Set[*Production]
Get finds and returns a production rule by its head non-terminal symbol. It returns nil if no production rules are found for the specified head.
func (*Productions) Remove ¶
func (p *Productions) Remove(ps ...*Production)
Remove removes a production rule.
func (*Productions) RemoveAll ¶
func (p *Productions) RemoveAll(heads ...NonTerminal)
RemoveAll removes all production rules with the specified head non-terminal.
func (*Productions) SelectMatch ¶ added in v0.10.5
func (p *Productions) SelectMatch(pred generic.Predicate1[*Production]) *Productions
SelectMatch selects a subset of production rules that satisfy the given predicate. It returns a new set of production rules containing the matching productions, of the same type as the original set of production rules.
func (*Productions) String ¶ added in v0.10.5
func (p *Productions) String() string
String returns a string representation of production rules.
type String ¶
type String[T Symbol] []T
String represent a string of grammar symbols.
func LongestCommonPrefixOf ¶
LongestCommonPrefixOf computes the longest common prefix of a list of strings. If the input is empty or there is no common prefix, it returns the empty string ε.
func (String[T]) AnyMatch ¶
func (s String[T]) AnyMatch(pred generic.Predicate1[T]) bool
AnyMatch returns true if at least one symbol satisfies the provided predicate.
func (String[T]) Append ¶
Append appends new symbols to the current string and returns a new string.
func (String[T]) Concat ¶
Concat concatenates the current string with one or more strings and returns a new string.
func (String[T]) ContainsSymbol ¶
ContainsSymbol checks whether a string contains the given symbol.
func (String[Symbol]) NonTerminals ¶
func (s String[Symbol]) NonTerminals() String[NonTerminal]
NonTerminals returns all non-terminal symbols of a string of symbols.
func (String[T]) Prepend ¶ added in v0.11.0
Prepend prepends new symbols to the current string and returns a new string.
type Terminal ¶
type Terminal string
Terminal represents a terminal symbol. Terminals are the basic symbols from which strings of a language are formed. Token name or token for short are equivalent to terminal.
func (Terminal) Equal ¶ added in v0.10.3
Equal determines whether or not two terminal symbols are the same.
func (Terminal) IsTerminal ¶
IsTerminal always returns true for terminal symbols.
type TerminalsAndEmpty ¶
TerminalsAndEmpty is the return type for the FIRST function.
It contains:
- A set of terminals that may appear at the beginning of strings derived from α.
- A flag indicating whether the empty string ε is included in the FIRST set..
func (*TerminalsAndEmpty) String ¶ added in v0.10.1
func (s *TerminalsAndEmpty) String() string
String returns a string representation of the FIRST set.
type TerminalsAndEndmarker ¶
TerminalsAndEndmarker is the return type for the FOLLOW function.
It contains:
- A set of terminals that can appear immediately after the given non-terminal.
- A flag indicating whether the special endmarker symbol is included in the FOLLOW set.
func (*TerminalsAndEndmarker) String ¶ added in v0.10.1
func (s *TerminalsAndEndmarker) String() string
String returns a string representation of the FOLLOW set.