Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Crawler
type Crawler struct {
	Name string // Name of crawler for easy identification
	*CrawlerConfig
}
Crawler crawls URLs fetched from Queue and saves their contents to Models.
A Crawler quits once the queue has stayed empty for IdleTimeout.
func NNewCrawlers
func NNewCrawlers(n int, namePrefix string, cfg *CrawlerConfig) ([]*Crawler, error)
NNewCrawlers returns n new Crawlers, each configured with cfg and named using namePrefix.
func NewCrawler
func NewCrawler(name string, cfg *CrawlerConfig) (*Crawler, error)
NewCrawler returns a pointer to a new Crawler.
type CrawlerConfig
type CrawlerConfig struct {
	Queue            *queue.UniqueQueue // global queue
	Models           *models.Models     // models to use
	BaseURL          *url.URL           // base URL to crawl
	UserAgent        string             // user-agent to use while crawling
	MarkedURLs       []string           // marked URLs to save to the model
	IgnorePatterns   []string           // URL patterns to ignore
	RequestDelay     time.Duration      // delay between subsequent requests
	IdleTimeout      time.Duration      // timeout after which the crawler quits when the queue is empty
	Logger           *log.Logger        // logs to [os.Stdout] when nil and no PrettyLogger is set; ONLY log to a file if also using PrettyLogger
	RetryTimes       int                // number of times to retry a failed request
	FailedRequests   map[string]int     // map to store failed-request stats
	KnownInvalidURLs *InvalidURLCache   // cache of known invalid URLs
	Ctx              context.Context    // context to quit on SIGINT/SIGTERM
	PrettyLogger     PrettyLogger       // optional logger to write to screen
	// contains filtered or unexported fields
}
CrawlerConfig holds the configuration for a Crawler.
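As a rough illustration of how the standard-library-typed fields might be prepared, the sketch below parses a base URL and builds a context that is cancelled on SIGINT/SIGTERM. The `crawlSettings` struct and `newSettings` helper are hypothetical stand-ins that mirror a few CrawlerConfig fields, and the user agent, delay, timeout, and retry values are made-up placeholders; none of this comes from the package itself.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/url"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// crawlSettings is a hypothetical stand-in mirroring a few exported
// CrawlerConfig fields. It is not the package's own type.
type crawlSettings struct {
	BaseURL      *url.URL
	UserAgent    string
	RequestDelay time.Duration
	IdleTimeout  time.Duration
	RetryTimes   int
	Ctx          context.Context
}

// newSettings validates rawURL and returns settings whose Ctx is cancelled
// on SIGINT or SIGTERM. The caller should invoke the returned stop function
// to release the signal handler.
func newSettings(rawURL string) (*crawlSettings, context.CancelFunc, error) {
	base, err := url.Parse(rawURL)
	if err != nil {
		return nil, nil, err
	}
	if base.Scheme == "" || base.Host == "" {
		return nil, nil, errors.New("base URL must be absolute: " + rawURL)
	}
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	return &crawlSettings{
		BaseURL:      base,
		UserAgent:    "example-crawler/0.1",  // placeholder value
		RequestDelay: 500 * time.Millisecond, // placeholder value
		IdleTimeout:  10 * time.Second,       // placeholder value
		RetryTimes:   3,                      // placeholder value
		Ctx:          ctx,
	}, stop, nil
}

func main() {
	cfg, stop, err := newSettings("https://example.com/docs")
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	defer stop()
	fmt.Println(cfg.BaseURL.Host, cfg.RequestDelay, cfg.IdleTimeout)
}
```

With the real package, the same `*url.URL`, `time.Duration`, and `context.Context` values would presumably be assigned to the matching CrawlerConfig fields before calling NewCrawler or NNewCrawlers.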
type InvalidURLCache
type InvalidURLCache struct {
	// contains filtered or unexported fields
}
InvalidURLCache is a cache of known invalid URLs.
type PrettyLogger (added in v0.8.7)
type PrettyLogger interface {
	// Log sends a message to the PrettyLogger instance
	// to be written to the terminal.
	Log(string)
	// Quit should initiate the call to quit PrettyLogger when
	// the last crawler has exited.
	Quit()
}
The PrettyLogger interface is used to write scrolling logs to the terminal.