WebCrawler

package
Version: v1.0.1
Warning

This package is not in the latest version of its module.

Published: May 10, 2025 License: AGPL-3.0 Imports: 7 Imported by: 0

Documentation

Constants

const (
	ModeContain = Mode(1) // Contains: string
	ModeEqual   = Mode(2) // Equals: string
	ModeRegexp  = Mode(3) // Regular expression
	ModeElTag   = Mode(4) // Element type (tag)
)

Variables

var (
	Site   webSite
	Http   httpGet
	RegExp regExp
)

Functions

This section is empty.

Types

type Filter

type Filter struct {
	Mode  Mode
	Param interface{}
}
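
The page does not document how a Filter is evaluated, so the following is only a minimal sketch of one plausible reading of the four modes. The helper `matches` is hypothetical and not part of the package. It assumes that Param always holds a string (a substring, an exact value, a regular-expression pattern, or a tag name, depending on Mode). The type declarations are local copies so the sketch compiles on its own.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Local copies of the documented declarations.
type Mode uint32

const (
	ModeContain = Mode(1)
	ModeEqual   = Mode(2)
	ModeRegexp  = Mode(3)
	ModeElTag   = Mode(4)
)

type Filter struct {
	Mode  Mode
	Param interface{}
}

// matches is a hypothetical helper (not part of the package).
// Assumption: Param is a string in every mode.
func matches(f Filter, tag, text string) bool {
	p, ok := f.Param.(string)
	if !ok {
		return false
	}
	switch f.Mode {
	case ModeContain:
		return strings.Contains(text, p)
	case ModeEqual:
		return text == p
	case ModeRegexp:
		re, err := regexp.Compile(p)
		if err != nil {
			return false // an invalid pattern matches nothing
		}
		return re.MatchString(text)
	case ModeElTag:
		return tag == p
	}
	return false // unknown mode
}

func main() {
	f := Filter{Mode: ModeContain, Param: "crawl"}
	fmt.Println(matches(f, "p", "web crawler"))
}
```

Compiling the pattern on every call keeps the sketch short; real code would more likely compile it once and cache the result.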

type IFindElement

type IFindElement interface {
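	// Find is the lookup method: it finds the elements in els that meet the requirements.
	// Note: it typically processes the Text field of each matching el and stores the result
	// in its own internal List, while also returning the matching els to the caller as original data.
	Find(els []*WebElement) []*WebElement

	// GetList returns the internal List.
	// Note: running Find produces an internally processed List, which this function returns.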
	GetList() []*WebElement
}

IFindElement can be called by the FindTextEls function of WebSite.
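
To illustrate the contract described in the interface comments, here is a hedged sketch of one possible implementation. The type `keywordFinder` and its matching rule are hypothetical. The local `WebElement` omits the `Selection *goquery.Selection` field so that the example compiles without the goquery dependency. Find returns the matching elements untouched and keeps processed copies (whitespace-trimmed Text) in the internal list.

```go
package main

import (
	"fmt"
	"strings"
)

// WebElement is a local stand-in; the documented Selection field is omitted.
type WebElement struct {
	Tag  string
	Text string
}

// IFindElement is copied from the documented interface.
type IFindElement interface {
	Find(els []*WebElement) []*WebElement
	GetList() []*WebElement
}

// keywordFinder is a hypothetical implementation that keeps elements
// whose Text contains keyword.
type keywordFinder struct {
	keyword string
	list    []*WebElement
}

// Find returns the original matching elements and stores trimmed copies internally.
func (k *keywordFinder) Find(els []*WebElement) []*WebElement {
	k.list = nil // reset the result of any previous call
	var found []*WebElement
	for _, el := range els {
		if el == nil || !strings.Contains(el.Text, k.keyword) {
			continue
		}
		found = append(found, el)
		k.list = append(k.list, &WebElement{Tag: el.Tag, Text: strings.TrimSpace(el.Text)})
	}
	return found
}

// GetList returns the processed copies produced by the last Find call.
func (k *keywordFinder) GetList() []*WebElement { return k.list }

// Compile-time check that keywordFinder satisfies the interface.
var _ IFindElement = (*keywordFinder)(nil)

func main() {
	els := []*WebElement{{Tag: "p", Text: "  go crawler  "}, {Tag: "p", Text: "other"}}
	f := &keywordFinder{keyword: "crawler"}
	fmt.Println(len(f.Find(els)), f.GetList()[0].Text)
}
```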

type Mode

type Mode uint32

type WebElement

type WebElement struct {
	Tag       string
	Selection *goquery.Selection
	Text      string
}

func (*WebElement) Clone

func (e *WebElement) Clone() *WebElement

type WebHead

type WebHead struct {
	Title       string
	Description string
	Keywords    []string
}

type WebHref

type WebHref struct {
	Href  string
	Title string
}

type WebPage

type WebPage struct {
	Host string
	Url  string

	Head   *WebHead
	Hrefs  map[string]*WebHref
	Texts  []*WebElement
	Images []*WebElement
}
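
Because Hrefs is a map, iterating over it yields links in no fixed order. The sketch below shows one way to read a WebPage deterministically. The helper `sortedHrefs` is hypothetical, the struct definitions are local copies (with `WebElement` reduced to its string fields to avoid the goquery dependency), and the assumption that the map is keyed by the link's Href is not confirmed by this page.

```go
package main

import (
	"fmt"
	"sort"
)

// Local copies of the documented types; WebElement omits Selection.
type WebElement struct {
	Tag  string
	Text string
}

type WebHead struct {
	Title       string
	Description string
	Keywords    []string
}

type WebHref struct {
	Href  string
	Title string
}

type WebPage struct {
	Host string
	Url  string

	Head   *WebHead
	Hrefs  map[string]*WebHref
	Texts  []*WebElement
	Images []*WebElement
}

// sortedHrefs is a hypothetical helper that returns the page's links
// ordered by map key, so repeated runs produce the same sequence.
func sortedHrefs(p *WebPage) []*WebHref {
	keys := make([]string, 0, len(p.Hrefs))
	for k := range p.Hrefs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]*WebHref, 0, len(keys))
	for _, k := range keys {
		out = append(out, p.Hrefs[k])
	}
	return out
}

func main() {
	page := &WebPage{
		Host: "example.com",
		Url:  "https://example.com/",
		Head: &WebHead{Title: "Example"},
		Hrefs: map[string]*WebHref{ // assumed to be keyed by Href
			"/b": {Href: "/b", Title: "B"},
			"/a": {Href: "/a", Title: "A"},
		},
	}
	for _, h := range sortedHrefs(page) {
		fmt.Println(h.Href, h.Title)
	}
}
```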
