Documentation ¶
Index ¶
- Constants
- func ConnectProfile(c *Config, suffix string) apoco.StreamFunc
- func E(str string) string
- func EachStok(r io.Reader, f func(string, Stok) error) error
- func FilterLex(c *Config) apoco.StreamFunc
- func IDFromFilePath(path, fg string) string
- func UpdateInConfig(dest, val interface{})
- type Config
- type DMConfig
- type LMConfig
- type Model
- type ModelData
- type Piper
- type ProfilerConfig
- type Stok
- type StokCause
- type StokType
- type TrainingConfig
Constants ¶
const (
Cautious string = "cautious"
Courageous string = "courageous"
Redundant string = "redundant"
)
const Epsilon = "ε"
Epsilon is used to mark empty strings and slices in the IO of stoks.
const PStep = "recognition/post-correction"
PStep defines the OCR-D processing step.
const StokComment = "#"
StokComment defines the start of comments.
const StokNamePref = "#name="
StokNamePref defines the line prefix for file names.
const Version = "v0.0.61"
Version defines the version of apoco.
Functions ¶
func ConnectProfile ¶ added in v0.0.37
func ConnectProfile(c *Config, suffix string) apoco.StreamFunc
ConnectProfile obtains the profile, either by running the profiler or by reading it from the cache, and connects it with the tokens.
func EachStok ¶ added in v0.0.42
func EachStok(r io.Reader, f func(string, Stok) error) error
EachStok calls the given callback function f for each token read from r, passing the corresponding name. Stoks are read line by line from the reader; lines starting with # are skipped. If a line starting with '#name=x' is encountered, the name passed to the callback function is updated accordingly.
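The reading loop described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `eachLine` and `collect` are hypothetical helpers, the callback receives the raw line instead of a parsed Stok, and the two constants only mirror StokComment and StokNamePref.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

const (
	stokComment  = "#"      // mirrors StokComment
	stokNamePref = "#name=" // mirrors StokNamePref
)

// eachLine is a hypothetical stand-in for EachStok. It passes raw lines
// (not parsed Stok values) to the callback together with the current name.
func eachLine(r io.Reader, f func(name, line string) error) error {
	var name string
	s := bufio.NewScanner(r)
	for s.Scan() {
		line := s.Text()
		switch {
		case strings.HasPrefix(line, stokNamePref):
			// A name line updates the name used for subsequent callbacks.
			name = strings.TrimPrefix(line, stokNamePref)
		case strings.HasPrefix(line, stokComment):
			// Any other comment line is skipped.
		default:
			if err := f(name, line); err != nil {
				return err
			}
		}
	}
	return s.Err()
}

// collect runs eachLine over in and returns one "name: line" entry per token line.
func collect(in string) ([]string, error) {
	var out []string
	err := eachLine(strings.NewReader(in), func(name, line string) error {
		out = append(out, name+": "+line)
		return nil
	})
	return out, err
}

func main() {
	in := "#name=a.txt\n# a comment\ntoken-1\n#name=b.txt\ntoken-2\n"
	entries, err := collect(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(e)
	}
}
```

The name prefix is checked before the generic comment prefix because every name line also starts with #.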
func FilterLex ¶ added in v0.0.60
func FilterLex(c *Config) apoco.StreamFunc
func IDFromFilePath ¶
func IDFromFilePath(path, fg string) string
IDFromFilePath generates an ID based on the file group and the file path.
func UpdateInConfig ¶ added in v0.0.46
func UpdateInConfig(dest, val interface{})
UpdateInConfig updates the value in dest with val if val is not the zero value of its underlying type. dest must be a pointer to a string, int, float64 or bool; otherwise the function panics.
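The behavior described for UpdateInConfig can be illustrated with a type switch. The sketch below is an assumption about how such a function could look, not the package's code; `updateIfSet` is a hypothetical name.

```go
package main

import "fmt"

// updateIfSet is a hypothetical re-implementation of the described behavior:
// dest is overwritten only if val is not the zero value of its type.
// It panics if dest is not a pointer to string, int, float64 or bool,
// or if val does not have the matching type.
func updateIfSet(dest, val interface{}) {
	switch d := dest.(type) {
	case *string:
		if v := val.(string); v != "" {
			*d = v
		}
	case *int:
		if v := val.(int); v != 0 {
			*d = v
		}
	case *float64:
		if v := val.(float64); v != 0 {
			*d = v
		}
	case *bool:
		if v := val.(bool); v {
			*d = v
		}
	default:
		panic(fmt.Sprintf("invalid destination type: %T", dest))
	}
}

func main() {
	model, nocr := "default.bin", 2
	updateIfSet(&model, "") // zero value: model is left unchanged
	updateIfSet(&nocr, 3)   // non-zero value: nocr is overwritten
	fmt.Println(model, nocr)
}
```

A pattern like this is useful when values such as command line flags should override a loaded configuration only if they were actually set.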
Types ¶
type Config ¶ added in v0.0.29
type Config struct {
Model string `json:"model,omitempty"`
LM map[string]LMConfig `json:"lm"`
Profiler ProfilerConfig `json:"profiler"`
RR TrainingConfig `json:"rr"`
DM DMConfig `json:"dm"`
MS TrainingConfig `json:"ms"`
FF TrainingConfig `json:"ff"`
Nocr int `json:"nocr"`
Cache bool `json:"cache"`
GT bool `json:"gt"`
AlignLev bool `json:"alignLev"`
Lex bool `json:"lex"`
}
Config defines the command's configuration.
func ReadConfig ¶ added in v0.0.29
ReadConfig reads the config from a JSON or TOML file. If the name is empty, an empty configuration is returned. If name has the prefix '{' and the suffix '}', the name is interpreted as a JSON string and parsed accordingly (OCR-D compatibility).
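The dispatch on the name argument can be sketched as below. This is an illustration under stated assumptions: `readConfig` and the reduced `config` struct are hypothetical, only three fields of Config are reproduced, and TOML handling is omitted because it would require a third-party package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// config is a reduced, hypothetical version of Config with three of its fields.
type config struct {
	Model string `json:"model,omitempty"`
	Nocr  int    `json:"nocr"`
	Cache bool   `json:"cache"`
}

// readConfig sketches the described dispatch: an empty name yields an empty
// configuration, a name of the form {...} is parsed as inline JSON, and
// anything else is treated as the path of a JSON file (TOML is omitted here).
func readConfig(name string) (*config, error) {
	var c config
	switch {
	case name == "":
		return &c, nil
	case strings.HasPrefix(name, "{") && strings.HasSuffix(name, "}"):
		if err := json.Unmarshal([]byte(name), &c); err != nil {
			return nil, fmt.Errorf("parse inline config: %v", err)
		}
		return &c, nil
	default:
		data, err := os.ReadFile(name)
		if err != nil {
			return nil, fmt.Errorf("read config %s: %v", name, err)
		}
		if err := json.Unmarshal(data, &c); err != nil {
			return nil, fmt.Errorf("parse config %s: %v", name, err)
		}
		return &c, nil
	}
}

func main() {
	c, err := readConfig(`{"nocr": 2, "cache": true}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *c)
}
```

Accepting inline JSON lets a caller pass the whole parameter set as a single command line argument without writing a file first.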
type DMConfig ¶ added in v0.0.52
type DMConfig struct {
TrainingConfig
Filter string `json:"filter"` // cautious, courageous or redundant
}
DMConfig encloses settings for dm training.
type LMConfig ¶ added in v0.0.52
type LMConfig struct {
Path string `json:"path"`
}
LMConfig configures the path to a language model csv file.
type Model ¶ added in v0.0.52
type Model struct {
Models map[string]map[int]ModelData // Models map the name and nocr to the model data.
GlobalHistPatterns map[string]float64 // Historical pattern frequencies from the profiler.
GlobalOCRPatterns map[string]float64 // OCR pattern frequencies from the profiler.
LM map[string]*apoco.FreqList // Language models.
}
Model holds the models of the different training runs for different numbers of OCRs. It is used to save and load the models for the automatic post-correction.
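The two-level Models map (name, then nocr) can be used as in the following sketch. The types and the `put` and `get` methods are hypothetical simplifications, and the model name "rr" is only an assumed example key.

```go
package main

import "fmt"

// modelData is a reduced, hypothetical version of ModelData.
type modelData struct {
	Features []string
}

// model is a reduced, hypothetical version of Model.
type model struct {
	Models map[string]map[int]modelData
}

// put stores data under name and nocr, creating the maps on demand.
func (m *model) put(name string, nocr int, data modelData) {
	if m.Models == nil {
		m.Models = make(map[string]map[int]modelData)
	}
	if m.Models[name] == nil {
		m.Models[name] = make(map[int]modelData)
	}
	m.Models[name][nocr] = data
}

// get returns the data stored under name and nocr or an error if there is none.
// Reading from a missing (nil) inner map is safe in Go and reports ok == false.
func (m *model) get(name string, nocr int) (modelData, error) {
	data, ok := m.Models[name][nocr]
	if !ok {
		return modelData{}, fmt.Errorf("no %s model for %d OCRs", name, nocr)
	}
	return data, nil
}

func main() {
	var m model
	m.put("rr", 2, modelData{Features: []string{"feature-a", "feature-b"}})
	if data, err := m.get("rr", 2); err == nil {
		fmt.Println(data.Features)
	}
	if _, err := m.get("rr", 3); err != nil {
		fmt.Println("error:", err)
	}
}
```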
func ReadModel ¶ added in v0.0.52
ReadModel reads a model from a gob compressed input file. If the given file does not exist, the corresponding language models are loaded and a new model is returned. If create is false, no new model is created and the model must be read from an existing file.
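The read-or-create logic can be sketched with the standard library. Everything here is an assumption made for illustration: the names `readModel`, `writeModel` and `roundTrip` are hypothetical, the model type is reduced to a single map, gzip is assumed as the compression, and loading the language models is left out.

```go
package main

import (
	"compress/gzip"
	"encoding/gob"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// model is a reduced, hypothetical stand-in for Model.
type model struct {
	Models map[string]map[int][]string
}

// writeModel gob-encodes m and writes it gzip-compressed (assumed) to path.
func writeModel(path string, m *model) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	zw := gzip.NewWriter(f)
	if err := gob.NewEncoder(zw).Encode(m); err != nil {
		f.Close()
		return err
	}
	if err := zw.Close(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// readModel reads a model from path. If the file does not exist and create
// is true, a new empty model is returned; with create false it is an error.
func readModel(path string, create bool) (*model, error) {
	f, err := os.Open(path)
	if errors.Is(err, os.ErrNotExist) && create {
		return &model{Models: make(map[string]map[int][]string)}, nil
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var m model
	if err := gob.NewDecoder(zr).Decode(&m); err != nil {
		return nil, err
	}
	return &m, nil
}

// roundTrip creates a model, writes it to a temporary file and reads it back.
func roundTrip() (*model, error) {
	dir, err := os.MkdirTemp("", "model")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "model.bin.gz")
	m, err := readModel(path, true) // file is missing: a new model is created
	if err != nil {
		return nil, err
	}
	m.Models["rr"] = map[int][]string{2: {"feature-a"}}
	if err := writeModel(path, m); err != nil {
		return nil, err
	}
	return readModel(path, false) // file exists now: it must be readable
}

func main() {
	m, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Models["rr"][2])
}
```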
func (*Model) Get ¶ added in v0.0.52
Get loads the model and the corresponding feature set for the given configuration.
type ModelData ¶ added in v0.0.52
type ModelData struct {
Features []string // Feature names used to train the model.
Model *ml.LR // The trained model.
}
ModelData holds a linear regression model.
type ProfilerConfig ¶ added in v0.0.52
ProfilerConfig holds the profiler's configuration values.
type Stok ¶ added in v0.0.21
type Stok struct {
OCR, Sug, GT, ID string
OCRConfs []float64
Conf float64
Rank int
Skipped, Short, Lex, Cor bool
}
Stok represents a stats token. Stats tokens explain the correction decisions of apoco and form the basis of the correction protocols.
func MakeStok ¶ added in v0.0.21
MakeStok creates a new stats token from an appropriately formatted line.
func (Stok) Cause ¶ added in v0.0.27
Cause returns the cause of a correction error. There are three possibilities: the correct correction candidate was missing, the correct correction candidate was not selected by the reranker, or the correct correction candidate was available but could not be selected because of the imposed limit on the number of correction candidates. If the limit is less than or equal to 0, no limit is imposed.
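The three cases can be written as a small decision function. The sketch below rests on assumptions that the text above does not confirm: the `cause` constants and `causeOf` are hypothetical names, rank is taken to be the 1-based rank of the correct candidate, and a rank of 0 is taken to mean that the correct candidate is missing.

```go
package main

import "fmt"

// cause is a hypothetical stand-in for StokCause.
type cause int

const (
	badRank          cause = iota // correct candidate available but not selected by the reranker
	badLimit                      // correct candidate cut off by the candidate limit
	missingCandidate              // correct candidate not among the candidates
)

// causeOf classifies a correction error. rank is assumed to be the 1-based
// rank of the correct candidate (0 if it is missing); a limit less than or
// equal to 0 means that no limit is imposed.
func causeOf(rank, limit int) cause {
	switch {
	case rank == 0:
		return missingCandidate
	case limit > 0 && rank > limit:
		return badLimit
	default:
		return badRank
	}
}

func main() {
	fmt.Println(causeOf(0, 5) == missingCandidate)
	fmt.Println(causeOf(7, 5) == badLimit)
	fmt.Println(causeOf(7, 0) == badRank)
}
```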
type StokType ¶ added in v0.0.27
type StokType int
StokType gives the type of stoks.
const (
SkippedShort StokType = iota // Skipped short token.
SkippedShortErr // Error in short token.
SkippedNoCand // Skipped no candidate token.
SkippedNoCandErr // Error in skipped no candidate token.
SkippedLex // Skipped lexical token.
FalseFriend // Error in skipped lexical token (false friend).
RedundantCorrection // Redundant correction.
InfelicitousCorrection // Infelicitous correction.
SuccessfulCorrection // Successful correction.
DoNotCareCorrection // Do not care correction.
SuspiciousNotReplacedCorrect // Accept OCR.
DodgedBullet // Dodged bullet.
MissedOpportunity // Missed opportunity.
SuspiciousNotReplacedNotCorrectErr // Skipped do not care.
)
type TrainingConfig ¶ added in v0.0.52
type TrainingConfig struct {
Features []string `json:"features"`
LearningRate float64 `json:"learningRate"`
Ntrain int `json:"ntrain"`
}
TrainingConfig encloses different training settings.