REDDITOR
Import
go get github.com/soumitsalman/go-reddit
Sample Code
import (
    "encoding/json"
    "fmt"
    "os"
    "time"

    ds "github.com/soumitsalman/beansack/sdk"
    "github.com/soumitsalman/go-reddit/collector"
)
func collectAndStore() {
    // configure the collector with a storage callback, then run the collection
    config := collector.NewCollectorConfig(localFileStore)
    collector.NewCollector(config).Collect()
}
// localFileStore writes the collected contents to a timestamped JSON file
func localFileStore(contents []ds.Bean) {
    filename := fmt.Sprintf("outputs_REDDIT_%s.json", time.Now().Format("2006-01-02-15-04-05"))
    file, err := os.Create(filename)
    if err != nil {
        return
    }
    defer file.Close()
    json.NewEncoder(file).Encode(contents)
}
This package provides the most common read functions for the Reddit API.
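The storage callback passed to NewCollectorConfig is pluggable. As a minimal sketch, assuming only the []ds.Bean callback signature shown above, a sink that writes the collected beans to stdout instead of a file could look like this (printStore and collectAndPrint are hypothetical names, not part of the package):

// printStore is a hypothetical alternative sink that writes the
// collected beans to stdout as JSON instead of a file
func printStore(contents []ds.Bean) {
    json.NewEncoder(os.Stdout).Encode(contents)
}

func collectAndPrint() {
    config := collector.NewCollectorConfig(printStore)
    collector.NewCollector(config).Collect()
}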
NEWS COLLECTOR
A simple utility for scraping blogs, news articles, and sitemaps with more fidelity than some of the default scraping libraries.
This is a wrapper on top of existing libraries such as:
- github.com/go-shiori/go-readability
- github.com/gocolly/colly/v2
Usage
Get Package:
go get github.com/soumitsalman/newscollector
Import:
import (
    "fmt"

    "github.com/soumitsalman/newscollector/loaders"
)
Collecting One-off URLs:
func main() {
    urls := []string{
        "https://kennybrast.medium.com/planning-a-successful-devops-strategy-for-a-fortune-200-enterprise-56304f1e28a8",
        "https://medium.com/@bohane.michael/navigating-risk-in-investment-fbbec34acd5f",
        "https://mymoneychronicles.medium.com/5-underrated-michael-jackson-songs-dfb6f8b08bb9",
        "https://thehackernews.com/2024/02/new-idat-loader-attacks-using.html",
        "https://thehackernews.com/2024/02/microsoft-releases-pyrit-red-teaming.html",
        "https://blogs.scientificamerican.com/at-scientific-american/systems-analysis-look-back-1966-scientific-american-article/",
        "https://www.scientificamerican.com/article/even-twilight-zone-coral-reefs-arent-safe-from-bleaching/",
        "https://www.scientificamerican.com/blog/at-scientific-american/reception-on-capitol-hill-will-celebrate-scientific-americans-cities-issue/",
        "https://blogs.scientificamerican.com/at-scientific-american/reception-on-capitol-hill-will-celebrate-scientific-americans-cities-issue/",
    }
    // create a web text loader with the default configuration
    collector := loaders.NewDefaultWebTextLoader(&loaders.WebLoaderConfig{})
    for _, url := range urls {
        collector.LoadDocument(url)
    }
    for _, article := range collector.ListAll() {
        fmt.Println(article.ToString())
    }
}
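The scraped articles can be persisted the same way as in the Redditor example above. A minimal sketch, assuming the values returned by ListAll() serialize cleanly with encoding/json (this also needs the encoding/json, fmt, os, and time imports):

func main() {
    collector := loaders.NewDefaultWebTextLoader(&loaders.WebLoaderConfig{})
    collector.LoadDocument("https://thehackernews.com/2024/02/new-idat-loader-attacks-using.html")

    // write everything collected so far into a timestamped JSON file
    filename := fmt.Sprintf("outputs_NEWS_%s.json", time.Now().Format("2006-01-02-15-04-05"))
    file, err := os.Create(filename)
    if err != nil {
        return
    }
    defer file.Close()
    json.NewEncoder(file).Encode(collector.ListAll())
}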
Scraping From Sitemaps:
func main() {
    // built-in sitemap scraper for thehackernews.com
    // collector := loaders.NewTheHackersNewsSiteLoader(7)

    // built-in scraper for Medium's sitemap
    // collector := loaders.NewMediumSiteLoader(2)

    // built-in scraper for YC Hacker News' topstories.json
    collector := loaders.NewYCHackerNewsSiteLoader(2)

    // the integer argument is N: the loader collects posts from the last N days
    collector.LoadSite()
    for _, article := range collector.ListAll() {
        fmt.Println(article.ToString())
    }
}
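The built-in site loaders can also be combined in one run. A minimal sketch, calling each loader individually since the examples above do not state whether the loaders share a common interface:

func main() {
    // run each built-in loader over the last 7 days of posts
    hackerNews := loaders.NewYCHackerNewsSiteLoader(7)
    medium := loaders.NewMediumSiteLoader(7)
    theHackersNews := loaders.NewTheHackersNewsSiteLoader(7)

    hackerNews.LoadSite()
    medium.LoadSite()
    theHackersNews.LoadSite()

    // print everything collected from all three sites
    for _, article := range hackerNews.ListAll() {
        fmt.Println(article.ToString())
    }
    for _, article := range medium.ListAll() {
        fmt.Println(article.ToString())
    }
    for _, article := range theHackersNews.ListAll() {
        fmt.Println(article.ToString())
    }
}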