grammar

package
v0.11.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 22, 2025 License: ISC Imports: 12 Imported by: 0

Documentation

Overview

Package grammar implements data structures and algorithms for formal grammars.

In the theory of languages, a formal grammar is a set of rules or productions that define the structure of strings within a formal language. A formal grammar defines which strings in a formal language's alphabet are syntactically valid. It specifies the structure of the strings but does not address their meaning or usage in a given context.

Index

Constants

View Source
const Endmarker = Terminal("\uEEEE")

Endmarker is a special symbol that is used to indicate the end of a string. This special symbol assumed not to be a symbol of any grammar and it is taken from a Private Use Area (PUA) in Unicode.

The endmarker is not a formal part of the grammar itself but is introduced during parsing to simplify the handling of end-of-input scenarios, especially in parsing algorithms like LL(1) or LR(1).

For more information and details, see "Compilers: Principles, Techniques, and Tools (2nd Edition)".

Variables

View Source
var (
	CmpProduction  = cmpProduction
	HashProduction = hashFuncForProduction()

	EqProduction = func(lhs, rhs *Production) bool {
		return lhs.Equal(rhs)
	}

	EqProductionSet = func(lhs, rhs set.Set[*Production]) bool {
		return lhs.Equal(rhs)
	}
)
View Source
var (
	CmpString  = cmpString
	HashString = hashFuncForString()

	EqString = func(lhs, rhs String[Symbol]) bool {
		return lhs.Equal(rhs)
	}
)
View Source
var (
	EqSymbol = func(lhs, rhs Symbol) bool {
		return lhs.Equal(rhs)
	}

	CmpSymbol  = compareFuncForSymbol()
	HashSymbol = hashFuncForSymbol()

	EqTerminal   = generic.NewEqualFunc[Terminal]()
	CmpTerminal  = generic.NewCompareFunc[Terminal]()
	HashTerminal = hash.HashFuncForString[Terminal](nil)

	EqNonTerminal   = generic.NewEqualFunc[NonTerminal]()
	CmpNonTerminal  = generic.NewCompareFunc[NonTerminal]()
	HashNonTerminal = hash.HashFuncForString[NonTerminal](nil)
)

E is the empty string ε.

View Source
var EqTerminalsAndEmpty = func(lhs, rhs *TerminalsAndEmpty) bool {
	return lhs.Terminals.Equal(rhs.Terminals) && lhs.IncludesEmpty == rhs.IncludesEmpty
}
View Source
var EqTerminalsAndEndmarker = func(lhs, rhs *TerminalsAndEndmarker) bool {
	return lhs.Terminals.Equal(rhs.Terminals) && lhs.IncludesEndmarker == rhs.IncludesEndmarker
}

Functions

func WriteString

func WriteString(w io.Writer, s String[Symbol]) (n int, err error)

WriteString writes a string of symbols to the provided io.Writer. It returns the number of bytes written and any error encountered.

func WriteSymbol

func WriteSymbol(w io.Writer, s Symbol) (n int, err error)

WriteSymbol writes the string representation of a symbol to the provided io.Writer. It returns the number of bytes written and any error encountered.

Types

type CFG

type CFG struct {
	Terminals    set.Set[Terminal]
	NonTerminals set.Set[NonTerminal]
	Productions  *Productions
	Start        NonTerminal
}

CFG represents a context-free grammar in formal language theory.

Context-free grammars can express a wide range of programming language constructs while remaining computationally efficient to parse. They are used in computer science and linguistics to describe the syntax of languages.

A context-free grammar G = (V, Σ, R, S) is defined by four sets:

  1. V is a set of terminal symbols from which strings are formed. Terminal symbols are also referred to as tokens.

  2. Σ is a set of non-terminals symbols that denote sets of strings. Non-terminal symbols are sometimes called syntactic variables. Non-terminals impose a hierarchical structure on the language.

  3. R = V × (V ∪ Σ)* is a set of productions, where each production consists of a non-terminal (head), an arrow, and a sequence of terminals and/or non-terminals (body).

  4. S ∈ V is one of the non-terminal symbols designated as the start symbol. The set of strings denoted by the start symbol is the language generated by the grammar.

Context-free languages are a superset of regular languages and they are more expressive.

func NewCFG

func NewCFG(terms []Terminal, nonTerms []NonTerminal, prods []*Production, start NonTerminal) *CFG

NewCFG creates a new context-free grammar.

func (*CFG) AddNewNonTerminal added in v0.10.1

func (g *CFG) AddNewNonTerminal(prefix NonTerminal, suffixes ...string) NonTerminal

AddNewNonTerminal generates and adds a new non-terminal symbol to the grammar. It does so by appending each of the provided suffixes to the given prefix, in order, until it finds a non-terminal that does not already exist in the set of non-terminals.

If all generated non-terminals already exist, the function panics.

func (*CFG) ChomskyNormalForm

func (g *CFG) ChomskyNormalForm() *CFG

ChomskyNormalForm converts a context-free grammar into an equivalent grammar in Chomsky Normal Form.

A context-free grammar G is in Chomsky Normal Form (CNF) if all of its production rules are of the form

  • A → BC, or
  • A → a, or
  • S → ε

where A, B, and C are non-terminal symbols, a is a terminal symbol, and S is the start symbol. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G).

func (*CFG) Clone

func (g *CFG) Clone() *CFG

Clone returns a deep copy of a context-free grammar, ensuring the clone is independent of the original.

func (*CFG) ComputeFIRST

func (g *CFG) ComputeFIRST() FIRST

ComputeFIRST returns the FIRST function for a context-free grammar. The returned function memoizes the FIRST(X) for all grammar symbols (terminals and non-terminals). It will compute FIRST(α) based on the pre-computed FIRST(X) and memoize the result.

FIRST(α), where α is any string of grammar symbols (terminals and non-terminals), is the set of terminals that begin strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).

func (*CFG) ComputeFOLLOW

func (g *CFG) ComputeFOLLOW(first FIRST) FOLLOW

ComputeFOLLOW returns the FOLLOW function for a context-free grammar. The returned function memoizes the FOLLOW(A) for all non-terminals A.

FOLLOW(A), for non-terminal A, is the set of terminals 𝑎 that can appear immediately to the right of A in some sentential form; that is; the set of terminals 𝑎 such that there exists a derivation of the form S ⇒* αAaβ for some α and β strings of grammar symbols (terminals and non-terminals).

func (*CFG) EliminateCycles

func (g *CFG) EliminateCycles() *CFG

EliminateCycles converts a context-free grammar into an equivalent cycle-free grammar.

A grammar is cyclic if it has derivations of one or more steps in which A ⇒* A for some non-terminal A.

func (*CFG) EliminateEmptyProductions

func (g *CFG) EliminateEmptyProductions() *CFG

EliminateEmptyProductions converts a context-free grammar into an equivalent ε-free grammar.

An empty production (ε-production) is any production of the form A → ε.

func (*CFG) EliminateLeftRecursion

func (g *CFG) EliminateLeftRecursion() *CFG

EliminateLeftRecursion converts a context-free grammar into an equivalent grammar with no left recursion.

A grammar is left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string. For top-down parsers, left recursion causes the parser to loop forever. Many bottom-up parsers also will not accept left-recursive grammars.

Note that the resulting non-left-recursive grammar may have ε-productions.

func (*CFG) EliminateSingleProductions

func (g *CFG) EliminateSingleProductions() *CFG

EliminateSingleProductions converts a context-free grammar into an equivalent single-production-free grammar.

A single production a.k.a. unit production is a production rule whose body is a single non-terminal symbol (A → B).

func (*CFG) EliminateUnreachableProductions

func (g *CFG) EliminateUnreachableProductions() *CFG

EliminateUnreachableProductions converts a context-free grammar into an equivalent grammar with all unreachable productions and their associated non-terminal symbols removed.

An unreachable production refers to a production rule in a grammar that cannot be used to derive any string starting from the start symbol.

The function also removes unreachable terminals, which are terminals that do not appear in any reachable production.

func (*CFG) Equal added in v0.10.3

func (g *CFG) Equal(rhs *CFG) bool

Equal determines whether or not two context-free grammars are the same.

func (*CFG) IsCNF

func (g *CFG) IsCNF() error

IsCNF checks if a context-free grammar is in Chomsky Normal Form (CNF).

A context-free grammar G is in Chomsky Normal Form (CNF) if all of its production rules are of the form

  • A → BC, or
  • A → a, or
  • S → ε

where A, B, and C are non-terminal symbols, a is a terminal symbol, and S is the start symbol. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G).

The method returns nil if the grammar is in CNF, or an error detailing which production rules do not conform to CNF.

The returned error is always an instance of errors.MultiError, which contains instances of CNFError, if the CFG is not in Chomsky Normal Form (CNF).

func (*CFG) IsLL1 added in v0.10.1

func (g *CFG) IsLL1() error

IsLL1 checks if a context-free grammar (CFG) is an LL(1) grammar.

LL(1) grammars are a subset of context-free grammars used in predictive parsing, a top-down parsing technique that uses recursive descent without backtracking.

  • The first "L" indicates that the grammar is parsed from Left to right.
  • The second "L" indicates that it produces a left-most derivation of the input string.
  • The "1" means that the parser makes its decisions based on one input symbol of lookahead at each step.

LL(1) grammars are expressive enough to cover most programming constructs. No left-recursive or ambiguous grammar can be LL(1).

The method returns nil if the grammar is LL(1). It returns an error otherwise, providing details about what violates the LL(1) constraints.

The returned error is always an instance of errors.MultiError, which contains instances of LL1Error, if the CFG is not LL(1).

func (*CFG) LeftFactor

func (g *CFG) LeftFactor() *CFG

LeftFactor converts a context-free grammar into an equivalent left-factored grammar.

Left factoring is a grammar transformation for producing a grammar suitable predictive for top-down parsing. When the choice between two alternative A-productions is not clear, we may be able to rewrite the productions to defer the decision until enough of the input has been seen that we can make the right choice.

For example, if we have the two productions

𝑠𝑡𝑚𝑡 → 𝐢𝐟 𝑒𝑥𝑝𝑟 𝐭𝐡𝐞𝐧 𝑠𝑡𝑚𝑡 𝐞𝐥𝐬𝐞 𝑠𝑡𝑚𝑡
    | 𝐢𝐟 𝑒𝑥𝑝𝑟 𝐭𝐡𝐞𝐧 𝑠𝑡𝑚𝑡

on seeing the input 𝐢𝐟, we cannot immediately tell which productions to choose to expand 𝑠𝑡𝑚𝑡.

Note that the resulting left-factored grammar may have ε-productions and/or single productions.

func (*CFG) NullableNonTerminals

func (g *CFG) NullableNonTerminals() set.Set[NonTerminal]

NullableNonTerminals finds all non-terminals in a context-free grammar that can derive the empty string ε in one or more steps (A ⇒* ε for some non-terminal A).

func (*CFG) OrderNonTerminals added in v0.10.1

func (g *CFG) OrderNonTerminals() (String[NonTerminal], String[NonTerminal], String[NonTerminal])

OrderNonTerminals orders the unordered set of grammar non-terminals in a deterministic way.

The goal of this function is to ensure a consistent and deterministic order for any given set of non-terminals.

func (*CFG) OrderTerminals added in v0.10.1

func (g *CFG) OrderTerminals() String[Terminal]

OrderTerminals orders the unordered set of grammar terminals in a deterministic way.

The goal of this function is to ensure a consistent and deterministic order for any given set of terminals.

func (*CFG) String

func (g *CFG) String() string

String returns a string representation of a context-free grammar.

func (*CFG) Symbols added in v0.10.2

func (g *CFG) Symbols() set.Set[Symbol]

Symbols returns the set of all symbols in the grammar, including both terminal and non-terminal symbols.

func (*CFG) Verify

func (g *CFG) Verify() error

verify takes a context-free grammar and determines whether or not it is valid. If the given grammar is invalid, an error with a descriptive message will be returned.

type CNFError added in v0.10.1

type CNFError struct {
	P *Production
}

CNFError represents an error for a production rule in the form A → α that does not conform to Chomsky Normal Form (CNF).

func (*CNFError) Error added in v0.10.1

func (e *CNFError) Error() string

Error implements the error interface. It returns a formatted string describing the error in detail.

type FIRST

type FIRST func(String[Symbol]) *TerminalsAndEmpty

FIRST is the FIRST function associated with a context-free grammar.

FIRST(α), where α is any string of grammar symbols (terminals and non-terminals), is the set of terminals that begin strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).

type FOLLOW

type FOLLOW func(NonTerminal) *TerminalsAndEndmarker

FOLLOW is the FIRST function associated with a context-free grammar.

FOLLOW(A), for non-terminal A, is the set of terminals 𝑎 that can appear immediately to the right of A in some sentential form; that is; the set of terminals 𝑎 such that there exists a derivation of the form S ⇒* αAaβ for some α and β strings of grammar symbols (terminals and non-terminals).

type LL1Error added in v0.10.1

type LL1Error struct {
	A              NonTerminal
	Alpha, Beta    String[Symbol]
	FOLLOWA        *TerminalsAndEndmarker
	FIRSTα, FIRSTβ *TerminalsAndEmpty
	// contains filtered or unexported fields
}

LL1Error represents an error where two distinct production rules in the form A → α | β violate LL(1) parsing requirements for a context-free grammar.

func (*LL1Error) Error added in v0.10.1

func (e *LL1Error) Error() string

Error implements the error interface. It returns a formatted string describing the error in detail.

type NonTerminal

type NonTerminal string

NonTerminal represents a non-terminal symbol. Non-terminals are syntaxtic variables that denote sets of strings. Non-terminals impose a hierarchical structure on a language.

func (NonTerminal) Equal added in v0.10.3

func (n NonTerminal) Equal(rhs Symbol) bool

Equal determines whether or not two non-terminal symbols are the same.

func (NonTerminal) IsTerminal

func (n NonTerminal) IsTerminal() bool

IsTerminal always returns false for non-terminal symbols.

func (NonTerminal) Name

func (n NonTerminal) Name() string

Name returns the name of non-terminal symbol.

func (NonTerminal) String

func (n NonTerminal) String() string

String returns a string representation of a non-terminal symbol.

type Production

type Production struct {
	// Head or left side defines some of the strings denoted by the non-terminal symbol.
	Head NonTerminal
	// Body or right side describes one way in which strings of the non-terminal at the head can be constructed.
	Body String[Symbol]
}

Production represents a context-free production rule. The productions of a context-free grammar determine how the terminals and non-terminals can be combined to form strings.

func OrderProductionSet added in v0.10.1

func OrderProductionSet(set set.Set[*Production]) []*Production

OrderProductionSet orders an unordered set of production rules in a deterministic way.

func (*Production) Equal added in v0.10.3

func (p *Production) Equal(rhs *Production) bool

Equal determines whether or not two production rules are the same.

func (*Production) IsCNF

func (p *Production) IsCNF() (bool, bool)

IsCNF checks if a production rule is in Chomsky Normal Form (CNF).

A production is in CNF if it is either:

  1. A → BC: where A, B, and C are non-terminal symbols.
  2. A → a: where A is a non-terminal symbol and a is a terminal symbol.

The function returns two boolean values:

  • The first value indicates if the rule is of the form A → BC.
  • The second value indicates if the rule is of the form A → a.

func (*Production) IsEmpty

func (p *Production) IsEmpty() bool

IsEmpty determines whether or not a production rule is an empty production (ε-production).

An empty production (ε-production) is any production of the form A → ε.

func (*Production) IsLeftRecursive

func (p *Production) IsLeftRecursive() bool

IsLeftRecursive determines whether or not a production rule is left recursive (immediate left recursive).

A left recursive production is a production rule of the form of A → Aα

func (*Production) IsSingle

func (p *Production) IsSingle() bool

IsSingle determines whether or not a production rule is a single production (unit production).

A single production (unit production) is a production whose body is a single non-terminal (A → B).

func (*Production) String

func (p *Production) String() string

String returns a string representation of a production rule.

type Productions

type Productions struct {
	// contains filtered or unexported fields
}

Productions represents a set of production rules for a context-free grammar.

func NewProductions

func NewProductions() *Productions

NewProductions creates a new instance of the Productions.

func (*Productions) Add

func (p *Productions) Add(ps ...*Production)

Add adds a new production rule.

func (*Productions) All

func (p *Productions) All() iter.Seq[*Production]

All returns an iterator sequence containing all production rules.

func (*Productions) AllByHead

func (p *Productions) AllByHead() iter.Seq2[NonTerminal, set.Set[*Production]]

AllByHead returns an iterator sequence sequence of pairs, where each pair consists of a head non-terminal and its associated set of production rules.

func (*Productions) AllMatch

func (p *Productions) AllMatch(pred generic.Predicate1[*Production]) bool

AllMatch returns true if all production rules satisfy the provided predicate. If the set of production rules is empty, it returns true.

func (*Productions) AnyMatch

func (p *Productions) AnyMatch(pred generic.Predicate1[*Production]) bool

AnyMatch returns true if at least one production rule satisfies the provided predicate.

func (*Productions) Clone added in v0.10.5

func (p *Productions) Clone() *Productions

Clone returns a deep copy of the production rules, ensuring the clone is independent of the original.

func (*Productions) Equal added in v0.10.5

func (p *Productions) Equal(rhs *Productions) bool

Equal determines whether or not two sets of production rules are the same.

func (*Productions) Get

func (p *Productions) Get(head NonTerminal) set.Set[*Production]

Get finds and returns a production rule by its head non-terminal symbol. It returns nil if no production rules are found for the specified head.

func (*Productions) Remove

func (p *Productions) Remove(ps ...*Production)

Remove removes a production rule.

func (*Productions) RemoveAll

func (p *Productions) RemoveAll(heads ...NonTerminal)

RemoveAll removes all production rules with the specified head non-terminal.

func (*Productions) SelectMatch added in v0.10.5

func (p *Productions) SelectMatch(pred generic.Predicate1[*Production]) *Productions

SelectMatch selects a subset of production rules that satisfy the given predicate. It returns a new set of production rules containing the matching productions, of the same type as the original set of production rules.

func (*Productions) String added in v0.10.5

func (p *Productions) String() string

String returns a string representation of production rules.

type String

type String[T Symbol] []T

String represent a string of grammar symbols.

func LongestCommonPrefixOf

func LongestCommonPrefixOf(ss ...String[Symbol]) String[Symbol]

LongestCommonPrefixOf computes the longest common prefix of a list of strings. If the input is empty or there is no common prefix, it returns the empty string ε.

func (String[T]) AnyMatch

func (s String[T]) AnyMatch(pred generic.Predicate1[T]) bool

AnyMatch returns true if at least one symbol satisfies the provided predicate.

func (String[T]) Append

func (s String[T]) Append(syms ...T) String[T]

Append appends new symbols to the current string and returns a new string.

func (String[T]) Concat

func (s String[T]) Concat(ss ...String[T]) String[T]

Concat concatenates the current string with one or more strings and returns a new string.

func (String[T]) ContainsSymbol

func (s String[T]) ContainsSymbol(symbol T) bool

ContainsSymbol checks whether a string contains the given symbol.

func (String[T]) Equal added in v0.10.3

func (s String[T]) Equal(rhs String[T]) bool

Equal determines whether or not two strings are the same.

func (String[T]) HasPrefix

func (s String[T]) HasPrefix(prefix String[T]) bool

HasPrefix checks whether a string starts with the given prefix.

func (String[T]) HasSuffix

func (s String[T]) HasSuffix(prefix String[T]) bool

HasSuffix checks whether a string ends with the given suffix.

func (String[Symbol]) NonTerminals

func (s String[Symbol]) NonTerminals() String[NonTerminal]

NonTerminals returns all non-terminal symbols of a string of symbols.

func (String[T]) Prepend added in v0.11.0

func (s String[T]) Prepend(syms ...T) String[T]

Prepend prepends new symbols to the current string and returns a new string.

func (String[T]) String

func (s String[T]) String() string

String returns a string representation of a string of symbols.

func (String[Symbol]) Terminals

func (s String[Symbol]) Terminals() String[Terminal]

Terminals returns all terminal symbols of a string of symbols.

type Symbol

type Symbol interface {
	fmt.Stringer
	generic.Equaler[Symbol]

	Name() string
	IsTerminal() bool
}

Symbol represents a grammar symbol (terminal or non-terminal).

type Terminal

type Terminal string

Terminal represents a terminal symbol. Terminals are the basic symbols from which strings of a language are formed. Token name or token for short are equivalent to terminal.

func (Terminal) Equal added in v0.10.3

func (t Terminal) Equal(rhs Symbol) bool

Equal determines whether or not two terminal symbols are the same.

func (Terminal) IsTerminal

func (t Terminal) IsTerminal() bool

IsTerminal always returns true for terminal symbols.

func (Terminal) Name

func (t Terminal) Name() string

Name returns the name of terminal symbol.

func (Terminal) String

func (t Terminal) String() string

String returns a string representation of a terminal symbol.

type TerminalsAndEmpty

type TerminalsAndEmpty struct {
	Terminals     set.Set[Terminal]
	IncludesEmpty bool
}

TerminalsAndEmpty is the return type for the FIRST function.

It contains:

  • A set of terminals that may appear at the beginning of strings derived from α.
  • A flag indicating whether the empty string ε is included in the FIRST set..

func (*TerminalsAndEmpty) String added in v0.10.1

func (s *TerminalsAndEmpty) String() string

String returns a string representation of the FIRST set.

type TerminalsAndEndmarker

type TerminalsAndEndmarker struct {
	Terminals         set.Set[Terminal]
	IncludesEndmarker bool
}

TerminalsAndEndmarker is the return type for the FOLLOW function.

It contains:

  • A set of terminals that can appear immediately after the given non-terminal.
  • A flag indicating whether the special endmarker symbol is included in the FOLLOW set.

func (*TerminalsAndEndmarker) String added in v0.10.1

func (s *TerminalsAndEndmarker) String() string

String returns a string representation of the FOLLOW set.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL