megaloscope

Version: v0.0.0-...-5847dfa
Published: Apr 1, 2022 License: AGPL-3.0 Imports: 7 Imported by: 0

README

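megaloscope: sensitive-word detection

Approach

1. Build a sensitive-word dictionary.
A rule can be a single word or a combination of several words. Combinations are more sensible: words such as 澳门 (Macau), 读博 (studying for a doctorate), and 网站 (website) do not make a sentence sensitive on their own.
Pinyin detection is supported.
Exclusion rules are supported.
2. Split the input text into sentences and check them in parallel, one sentence at a time.
3. Match using the Aho-Corasick (AC) algorithm.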
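
The split-then-scan-in-parallel step can be sketched roughly as follows. This is a minimal illustration, not the package's code: `splitSentences` and `scan` are hypothetical names, the set of sentence terminators is an assumption, and `strings.Contains` stands in for the Aho-Corasick matcher.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// splitSentences is a hypothetical helper: it cuts text at a few common
// Chinese and Western sentence terminators and drops empty pieces.
func splitSentences(text string) []string {
	parts := strings.FieldsFunc(text, func(r rune) bool {
		return strings.ContainsRune("。!?!?\n", r)
	})
	var out []string
	for _, p := range parts {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// scan checks every sentence using `workers` goroutines and returns the
// sentences that contain at least one dictionary word, in input order.
// strings.Contains is a stand-in for the real matcher (assumption).
func scan(text string, words []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	sentences := splitSentences(text)
	hit := make([]bool, len(sentences))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				for _, word := range words {
					if strings.Contains(sentences[i], word) {
						hit[i] = true // each index is written by exactly one goroutine
						break
					}
				}
			}
		}()
	}
	for i := range sentences {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	var flagged []string
	for i, s := range sentences {
		if hit[i] {
			flagged = append(flagged, s)
		}
	}
	return flagged
}

func main() {
	text := "今天天气很好。这是一个坏词的例子!没有问题"
	fmt.Println(scan(text, []string{"坏词"}, 4))
}
```

Handing out sentence indexes over a channel keeps the result order stable no matter which worker finishes first.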

Usage

See the DEMO.

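Writing the rule file

Rules are stored in a plain text file.
Each line holds one rule.
A rule can be a single word or a combination of several words.
The words of a combination are joined with +.
A combination (or a single word) can be followed by ^ to define exclusion words.
Exclusion words are separated by |.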
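
As an illustration of this format, a rule line might look like `澳门+读博+网站^新闻|报道`, where the two exclusion words are invented for the example. The sketch below parses such a line. `parsedRule` and `parseRule` are hypothetical names, not the package's API, and the code assumes at most one ^ per rule, placed after the last word.

```go
package main

import (
	"fmt"
	"strings"
)

// parsedRule is a hypothetical stand-in for the package's Rule type.
type parsedRule struct {
	Words   []string // every word of the combination
	Exclude []string // exclusion words, if any
}

// splitNonEmpty splits s on sep, trims each piece, and drops empty ones.
func splitNonEmpty(s, sep string) []string {
	var out []string
	for _, p := range strings.Split(s, sep) {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// parseRule reads one rule line: words joined by "+", optionally followed
// by "^" and a list of exclusion words joined by "|".
func parseRule(line string) parsedRule {
	halves := strings.SplitN(strings.TrimSpace(line), "^", 2)
	r := parsedRule{Words: splitNonEmpty(halves[0], "+")}
	if len(halves) == 2 {
		r.Exclude = splitNonEmpty(halves[1], "|")
	}
	return r
}

func main() {
	r := parseRule("澳门+读博+网站^新闻|报道")
	fmt.Printf("words=%q exclude=%q\n", r.Words, r.Exclude)
}
```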

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type MatchResult

type MatchResult struct {
	Line      string // sentence
	MatchRule string // matching rule
}

type MatchResults

type MatchResults []*MatchResult

type Matcher

type Matcher struct {
	// contains filtered or unexported fields
}

func BuildNewMatcher

func BuildNewMatcher(dictionary []string) *Matcher

func NewMatcher

func NewMatcher() *Matcher

func (*Matcher) Build

func (m *Matcher) Build(dictionary []string)

Build initializes the Aho-Corasick automaton from the given dictionary.

func (*Matcher) GetMatchResultSize

func (m *Matcher) GetMatchResultSize(s string) int

GetMatchResultSize returns len(Match(s)), the number of matches found in s.

func (*Matcher) Match

func (m *Matcher) Match(s string) []*Term

Match searches s and returns every dictionary entry found in it. Each result holds the entry's index in the original dictionary and the position of the match in s.
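
For readers unfamiliar with the technique, here is a compact, self-contained sketch of Aho-Corasick matching. It is not this package's implementation. The types `node`, `automaton`, and `term` are hypothetical, and the meaning of `End` (the byte offset just past the match) is an assumption chosen for the example, which may differ from how `Term.EndPosition` is defined.

```go
package main

import "fmt"

// term reports one match: the dictionary index and the byte offset just
// past the end of the match, so s[End-len(word):End] == word.
type term struct {
	Index int
	End   int
}

type node struct {
	next map[byte]int // trie edges, keyed by byte
	fail int          // failure link (longest proper suffix that is a trie prefix)
	out  []int        // dictionary indexes of all words ending at this state
}

type automaton struct{ nodes []node }

func build(dict []string) *automaton {
	a := &automaton{nodes: []node{{next: map[byte]int{}}}}
	// Phase 1: insert every word into the trie.
	for i, w := range dict {
		if w == "" {
			continue
		}
		s := 0
		for j := 0; j < len(w); j++ {
			t, ok := a.nodes[s].next[w[j]]
			if !ok {
				t = len(a.nodes)
				a.nodes = append(a.nodes, node{next: map[byte]int{}})
				a.nodes[s].next[w[j]] = t
			}
			s = t
		}
		a.nodes[s].out = append(a.nodes[s].out, i)
	}
	// Phase 2: breadth-first pass to set failure links and merge outputs.
	var queue []int
	for _, t := range a.nodes[0].next {
		queue = append(queue, t) // depth-1 states fail to the root (0)
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		for c, t := range a.nodes[s].next {
			f := a.nodes[s].fail
			for {
				if u, ok := a.nodes[f].next[c]; ok {
					a.nodes[t].fail = u
					break
				}
				if f == 0 {
					break // no suffix matches; fail stays at the root
				}
				f = a.nodes[f].fail
			}
			a.nodes[t].out = append(a.nodes[t].out, a.nodes[a.nodes[t].fail].out...)
			queue = append(queue, t)
		}
	}
	return a
}

// match scans s once and reports every dictionary word found in it.
func (a *automaton) match(s string) []term {
	var res []term
	state := 0
	for i := 0; i < len(s); i++ {
		for {
			if t, ok := a.nodes[state].next[s[i]]; ok {
				state = t
				break
			}
			if state == 0 {
				break
			}
			state = a.nodes[state].fail
		}
		for _, idx := range a.nodes[state].out {
			res = append(res, term{Index: idx, End: i + 1})
		}
	}
	return res
}

func main() {
	a := build([]string{"he", "she", "his", "hers"})
	fmt.Println(a.match("ushers")) // prints [{1 4} {0 4} {3 6}]
}
```

The sketch works on bytes rather than runes. Because UTF-8 is self-synchronizing, this also finds Chinese words correctly, although offsets are then byte offsets rather than character positions.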

type Megaloscope

type Megaloscope struct {
	AllWords       WordSlice     // all words
	AllWordsPY     WordSlice     // pinyin of all words
	AllRules       map[int]*Rule // all rules (word combinations)
	WordsMatcher   *Matcher
	WordsPYMatcher *Matcher
}

Megaloscope is the sensitive-word detector.

func NewMegaloscope

func NewMegaloscope(filepath string) *Megaloscope

func (*Megaloscope) Discern

func (m *Megaloscope) Discern(src string, threads int) (ret MatchResults)

Discern scans an article using multiple threads to check whether it contains sensitive words.

type Rule

type Rule struct {
	Raw            string    // original rule definition
	Words          WordSlice // combination of Chinese words
	ExcludeWords   WordSlice // exclusion words
	WordsPY        WordSlice // pinyin of the words
	ExcludeWordsPY WordSlice // pinyin of the exclusion words
}

Rule describes a single rule.
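
One plausible way to evaluate such a rule against a sentence is sketched below. The semantics are an assumption, not confirmed by the source: a rule hits when every word of the combination occurs in the sentence and none of the exclusion words does. The `rule` type and `hits` method are hypothetical, and pinyin matching is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a simplified, hypothetical version of the Rule struct above.
type rule struct {
	Words        []string
	ExcludeWords []string
}

// hits reports whether sentence contains all of r.Words and none of
// r.ExcludeWords (assumed semantics).
func (r rule) hits(sentence string) bool {
	if len(r.Words) == 0 {
		return false
	}
	for _, w := range r.Words {
		if !strings.Contains(sentence, w) {
			return false
		}
	}
	for _, w := range r.ExcludeWords {
		if strings.Contains(sentence, w) {
			return false
		}
	}
	return true
}

func main() {
	r := rule{Words: []string{"casino", "website"}, ExcludeWords: []string{"news"}}
	fmt.Println(r.hits("visit this casino website now"))
	fmt.Println(r.hits("news report about a casino website"))
	fmt.Println(r.hits("a website about cooking"))
}
```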

type Term

type Term struct {
	Index       int
	EndPosition int
}

type WordSlice

type WordSlice []string

WordSlice is a group of words.
