Documentation
Overview
Chinese word segmentation for Go.
Index
- func Join(a []Text) string
- func SegmentsToSlice(segs []Segment, searchMode bool) (output []string)
- func SegmentsToString(segs []Segment, searchMode bool) (output string)
- type Dictionary
- func NewDictionary() *Dictionary
- type Segment
- type Segmenter
- func DefaultSegmenter() *Segmenter
- func (seg *Segmenter) Dictionary() *Dictionary
- func (seg *Segmenter) InternalSegment(bytes []byte, searchMode bool) []Segment
- func (seg *Segmenter) LoadDefaultDictionary()
- func (seg *Segmenter) LoadDictionary(files ...string) error
- func (seg *Segmenter) LoadDictionaryFromReader(r io.Reader)
- func (seg *Segmenter) Segment(bytes []byte) []Segment
- type Text
- type Token
Constants
This section is empty.
Variables
This section is empty.
Functions
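func SegmentsToSlice
func SegmentsToSlice(segs []Segment, searchMode bool) (output []string)
func SegmentsToString
func SegmentsToString(segs []Segment, searchMode bool) (output string)
SegmentsToString outputs the segmentation result as a string.
There are two output modes. Take "中华人民共和国" (People's Republic of China) as an example:
- Normal mode (searchMode=false) outputs a single token: "中华人民共和国/ns "
- Search mode (searchMode=true) outputs a finer-grained segmentation of the normal-mode result: "中华/nz 人民/n 共和/nz 共和国/ns 人民共和国/nt 中华人民共和国/ns "
Search mode is mainly intended to give search engines as many keywords as possible; see the comments on the Token struct for details.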
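The two modes can be illustrated with a small, self-contained sketch that does not use the package itself. The `token` type, `segmentsToString`, and `writeSearch` are hypothetical stand-ins, and the token tree is built by hand. Search mode is modeled as a post-order walk (sub-tokens before the token that contains them); skipping single-character sub-tokens such as "国" is an assumption chosen so the output matches the example above, not a documented rule of the package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// token is a hypothetical stand-in for the package's token data:
// its text, part-of-speech tag, and finer-grained sub-tokens.
type token struct {
	text string
	pos  string
	subs []*token
}

// writeSearch writes sub-tokens (post-order) before the token itself.
// Single-character sub-tokens are skipped (an assumption of this sketch).
func writeSearch(b *strings.Builder, t *token) {
	for _, s := range t.subs {
		if utf8.RuneCountInString(s.text) > 1 {
			writeSearch(b, s)
		}
	}
	b.WriteString(t.text + "/" + t.pos + " ")
}

// segmentsToString mimics the two output modes: normal mode writes only
// the top-level tokens, search mode also writes their sub-tokens.
func segmentsToString(tokens []*token, searchMode bool) string {
	var b strings.Builder
	for _, t := range tokens {
		if searchMode {
			writeSearch(&b, t)
		} else {
			b.WriteString(t.text + "/" + t.pos + " ")
		}
	}
	return b.String()
}

func main() {
	gongheguo := &token{"共和国", "ns", []*token{{text: "共和", pos: "nz"}, {text: "国", pos: "n"}}}
	renmin := &token{"人民共和国", "nt", []*token{{text: "人民", pos: "n"}, gongheguo}}
	china := &token{"中华人民共和国", "ns", []*token{{text: "中华", pos: "nz"}, renmin}}

	fmt.Printf("%q\n", segmentsToString([]*token{china}, false))
	fmt.Printf("%q\n", segmentsToString([]*token{china}, true))
}
```

With this hand-built tree, the first line printed is the normal-mode string and the second is the search-mode string from the example.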
Types
type Dictionary
type Dictionary struct {
// contains filtered or unexported fields
}
The Dictionary struct implements a string prefix tree (trie). A word may end at a leaf node or at a non-leaf node.
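A rune-keyed prefix tree shows why a word can end at a non-leaf node: "中华" is a complete word, yet its node still has children because "中华人民共和国" continues through it. This is a sketch, not the package's unexported implementation; `node`, `newNode`, `insert`, and `prefixes` are hypothetical names, and the real Dictionary may store its nodes differently.

```go
package main

import "fmt"

// node is one trie node keyed by rune. word is non-empty when a
// dictionary word ends here, which can happen at inner nodes too.
type node struct {
	children map[rune]*node
	word     string
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// insert adds w to the trie one rune at a time.
func (n *node) insert(w string) {
	cur := n
	for _, r := range w {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.word = w
}

// prefixes returns every dictionary word that is a prefix of text,
// shortest first, collected in a single walk down the trie.
func (n *node) prefixes(text string) []string {
	var out []string
	cur := n
	for _, r := range text {
		next, ok := cur.children[r]
		if !ok {
			break
		}
		cur = next
		if cur.word != "" {
			out = append(out, cur.word)
		}
	}
	return out
}

func main() {
	root := newNode()
	for _, w := range []string{"中华", "中华人民共和国", "人民"} {
		root.insert(w)
	}
	fmt.Println(root.prefixes("中华人民共和国万岁")) // prints [中华 中华人民共和国]
}
```

One walk from the start of the input yields both the short word and the long word that contains it, which is the kind of candidate lookup a dictionary-based segmenter typically needs at each position.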
func NewDictionary
func NewDictionary() *Dictionary
type Segmenter
type Segmenter struct {
// contains filtered or unexported fields
}
Segmenter is the word segmenter struct.
func DefaultSegmenter
func DefaultSegmenter() *Segmenter
DefaultSegmenter creates a new Segmenter with the default dictionary loaded.
func (*Segmenter) Dictionary
func (seg *Segmenter) Dictionary() *Dictionary
Dictionary returns the segmenter's dictionary.
func (*Segmenter) InternalSegment
func (seg *Segmenter) InternalSegment(bytes []byte, searchMode bool) []Segment
func (*Segmenter) LoadDefaultDictionary
func (seg *Segmenter) LoadDefaultDictionary()
LoadDefaultDictionary loads the default dictionary stored in data.
func (*Segmenter) LoadDictionary
func (seg *Segmenter) LoadDictionary(files ...string) error
LoadDictionary loads dictionaries from files.
Multiple dictionary files can be loaded, with file names separated by "," (for example, "User Dictionary.txt,Common Dictionary.txt"). When a word appears in both the user dictionary and the common dictionary, the user dictionary takes precedence.
The dictionary format is one word per line: the word text, its frequency, and its part-of-speech tag.
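The precedence rule can be sketched with in-memory maps. The sketch assumes that dictionaries are processed in the order given, user dictionary first, and that a word already loaded is never overwritten; `entry`, `splitFileList`, and `merge` are hypothetical helpers, and the frequencies are made-up illustration values rather than real dictionary data.

```go
package main

import (
	"fmt"
	"strings"
)

// entry holds the frequency and part-of-speech tag of one dictionary word.
type entry struct {
	freq int
	pos  string
}

// splitFileList turns a comma-separated list of file names into a slice,
// trimming surrounding spaces and ignoring empty items.
func splitFileList(list string) []string {
	var files []string
	for _, f := range strings.Split(list, ",") {
		if f = strings.TrimSpace(f); f != "" {
			files = append(files, f)
		}
	}
	return files
}

// merge combines dictionaries given in priority order. A word that is
// already present is kept, so entries from earlier dictionaries win.
func merge(dicts ...map[string]entry) map[string]entry {
	out := make(map[string]entry)
	for _, d := range dicts {
		for word, e := range d {
			if _, seen := out[word]; !seen {
				out[word] = e
			}
		}
	}
	return out
}

func main() {
	fmt.Println(splitFileList("User Dictionary.txt,Common Dictionary.txt"))

	user := map[string]entry{"共和国": {freq: 9999, pos: "n"}}
	common := map[string]entry{
		"共和国": {freq: 100, pos: "ns"},
		"人民":  {freq: 500, pos: "n"},
	}
	merged := merge(user, common)
	fmt.Println(merged["共和国"], merged["人民"]) // prints {9999 n} {500 n}
}
```

"共和国" keeps the user dictionary's entry, while "人民", which only the common dictionary defines, is still loaded.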
func (*Segmenter) LoadDictionaryFromReader
func (seg *Segmenter) LoadDictionaryFromReader(r io.Reader)
LoadDictionaryFromReader loads a dictionary from an io.Reader.
The dictionary format is one word per line: the word text, its frequency, and its part-of-speech tag.
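type Text
type Text []byte
Text is a byte-string type that can represent:
- a single character, such as "中" or "国" (in English text, one such unit is a whole word)
- a word, such as "中国" (China) or "人口" (population)
- a passage of text, such as "中国有十三亿人口" ("China has a population of 1.3 billion")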
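Because Text is a byte slice holding UTF-8, one Chinese character spans several bytes, while an English word is a run of ASCII letters. The sketch below splits a Text into character units along those lines; `splitUnits` is a hypothetical helper, and treating ASCII digits like letters and dropping whitespace are assumptions of the sketch, not documented behavior of the package.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Text repeats the package's declaration so the sketch is self-contained.
type Text []byte

// splitUnits splits b into character units: each rune that is not an
// ASCII letter or digit (e.g. a CJK character) is its own unit, while a
// run of ASCII letters or digits stays together as one word. Whitespace
// is dropped.
func splitUnits(b []byte) []Text {
	var out []Text
	start := -1 // byte offset where the current ASCII word began, or -1
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		isWordChar := r < utf8.RuneSelf && (unicode.IsLetter(r) || unicode.IsDigit(r))
		if isWordChar {
			if start < 0 {
				start = i
			}
		} else {
			if start >= 0 {
				out = append(out, Text(b[start:i]))
				start = -1
			}
			if !unicode.IsSpace(r) {
				out = append(out, Text(b[i:i+size]))
			}
		}
		i += size
	}
	if start >= 0 {
		out = append(out, Text(b[start:]))
	}
	return out
}

func main() {
	for _, u := range splitUnits([]byte("中国 has 13亿")) {
		fmt.Printf("%q ", string(u))
	}
	fmt.Println()
}
```

For the input "中国 has 13亿" this yields five units: 中, 国, has, 13 and 亿. Each unit is a sub-slice of the input, so no bytes are copied.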