bag

v1.0.1 · Published: Aug 12, 2024 · License: MIT · Imports: 6 · Imported by: 0

Repository

github.com/GopherML/bag

README ¶

Bag

Bag is a bag of words (BoW) implementation written in Go that uses a Naive Bayes classifier for text classification. It works both as a library that can be integrated into Go code and as a command line tool, so its functionality is available from any programming language that can invoke a CLI. It also supports a training set file format, which lets a model's configuration and samples be defined in a file and reused across environments.

What is Bag of Words (BoW)?

The bag of words (BoW) model is a fundamental text representation technique in natural language processing (NLP). In this model, a text (such as a sentence or a document) is represented as an unordered collection of words, disregarding grammar and word order but keeping multiplicity. The key idea is to create a vocabulary of all the unique words in the text corpus and then represent each text by a vector of word frequencies or binary indicators. This vector indicates the presence or absence, or frequency, of each word from the vocabulary within the text. The BoW model is widely used for text classification tasks, including sentiment analysis, due to its simplicity and effectiveness in capturing word occurrences.
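
The vocabulary and count-vector idea described above can be sketched in a few lines of Go. This is an illustrative sketch only, not the package's implementation: the helper names (`tokenize`, `buildVocabulary`, `vectorize`) are hypothetical, and the tokenizer is a deliberately simple stand-in for real text normalization.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// tokenize lowercases the text and splits it on anything that is not a
// letter or digit. (Hypothetical helper, not part of the bag package.)
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// buildVocabulary returns the sorted list of unique tokens in the corpus.
func buildVocabulary(corpus []string) []string {
	seen := make(map[string]bool)
	var vocab []string
	for _, doc := range corpus {
		for _, tok := range tokenize(doc) {
			if !seen[tok] {
				seen[tok] = true
				vocab = append(vocab, tok)
			}
		}
	}
	sort.Strings(vocab)
	return vocab
}

// vectorize counts how often each vocabulary word occurs in text.
// Words outside the vocabulary are ignored.
func vectorize(vocab []string, text string) []int {
	index := make(map[string]int, len(vocab))
	for i, w := range vocab {
		index[w] = i
	}
	vec := make([]int, len(vocab))
	for _, tok := range tokenize(text) {
		if i, ok := index[tok]; ok {
			vec[i]++
		}
	}
	return vec
}

func main() {
	corpus := []string{"Very good", "Not good", "good good day"}
	vocab := buildVocabulary(corpus)
	fmt.Println(vocab) // [day good not very]
	for _, doc := range corpus {
		// Prints [0 1 0 1], then [0 1 1 0], then [1 2 0 0].
		fmt.Println(vectorize(vocab, doc))
	}
}
```

Word order is lost in the vectors, but multiplicity is kept: "good good day" yields a count of 2 for "good".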

Examples

New

func ExampleNew() {
	var (
		cfg Config
		err error
	)

	// Initialize with default values; New also returns an error
	if exampleBag, err = New(cfg); err != nil {
		log.Fatal(err)
	}
}

NewFromTrainingSet

func ExampleNewFromTrainingSet() {
	var t TrainingSet
	t.Samples = SamplesByLabel{
		"positive": {
			"I love this product, it is amazing!",
			"I am very happy with this.",
			"Very good",
		},

		"negative": {
			"This is the worst thing ever.",
			"I hate this so much.",
			"Not good",
		},
	}

	// Initialize with default values; NewFromTrainingSet also returns an error
	var err error
	if exampleBag, err = NewFromTrainingSet(t); err != nil {
		log.Fatal(err)
	}
}

Bag.Train

func ExampleBag_Train() {
	exampleBag.Train("I love this product, it is amazing!", "positive")
	exampleBag.Train("This is the worst thing ever.", "negative")
	exampleBag.Train("I am very happy with this.", "positive")
	exampleBag.Train("I hate this so much.", "negative")
	exampleBag.Train("Not good", "negative")
	exampleBag.Train("Very good", "positive")
}

Bag.GetResults

func ExampleBag_GetResults() {
	exampleResults = exampleBag.GetResults("I am very happy with this product.")
	fmt.Println("Collection of results", exampleResults)
}

Results.GetHighestProbability

func ExampleResults_GetHighestProbability() {
	match := exampleResults.GetHighestProbability()
	fmt.Println("Highest probability", match)
}

TrainingSet File

config:
  ngram-size: 1
samples:
  yes:
    - "yes"
    - "Yeah"
    - "Yep"

  no:
    - "No"
    - "Nope"
    - "Nah"

# Note: This training set is kept short to limit the README's size.
# See the examples directory for more complete examples.

Road to v1.0.0

Working implementation as Go library
Training sets
Support Character NGrams
Text normalization added to inbound text processing
CLI utility

Long term goals

Generated model as MMAP file

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Josh Montoya 💻 📖
Matt Stay 🎨
Chewxy ⚠️
Jack Muir ⚠️

This project follows the all-contributors specification. Contributions of any kind welcome!

Documentation ¶

Index ¶

Constants
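type Bag
- func New(c Config) (out *Bag, err error)
- func NewFromTrainingSet(t TrainingSet) (b *Bag, err error)
- func NewFromTrainingSetFile(filepath string) (b *Bag, err error)
- func (b *Bag) GetResults(in string) (r Results)
- func (b *Bag) Train(in, label string)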
type Config
- func (c *Config) Validate() (err error)
type Results
- func (r Results) GetHighestProbability() (match string)
type Samples
type SamplesByLabel
type TrainingSet
type Vocabulary

Constants ¶

const (
	// DefaultNGramSize is set to 3 (trigram)
	DefaultNGramSize = 3
	// DefaultNGramType is set to word
	DefaultNGramType = "word"
	// DefaultSmoothingParameter is set to 1 (common Laplace smoothing value)
	DefaultSmoothingParameter = 1
)
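
The smoothing parameter matters because a Naive Bayes classifier multiplies per-token likelihoods, so a single token never seen for a label would otherwise drive that label's probability to zero. Additive (Laplace) smoothing estimates P(token | label) as (count + alpha) / (total + alpha * V), where V is the vocabulary size. The sketch below shows that formula only; `smoothedProbability` is a hypothetical helper and the package's internals may compute this differently.

```go
package main

import "fmt"

// smoothedProbability returns the additive (Laplace) smoothed estimate of
// P(token | label): (count + alpha) / (total + alpha * vocabSize).
// count is how often the token was seen for the label, total is the number
// of tokens seen for the label, and vocabSize is the number of distinct
// tokens across all labels. (Hypothetical helper for illustration.)
func smoothedProbability(count, total, vocabSize int, alpha float64) float64 {
	return (float64(count) + alpha) / (float64(total) + alpha*float64(vocabSize))
}

func main() {
	const alpha = 1.0 // same value as DefaultSmoothingParameter

	// Token seen 3 times among 10 tokens for a label, 5 distinct tokens overall.
	fmt.Println(smoothedProbability(3, 10, 5, alpha)) // (3+1)/(10+5)

	// An unseen token still receives a small non-zero probability.
	fmt.Println(smoothedProbability(0, 10, 5, alpha)) // (0+1)/(10+5)
}
```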

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Bag ¶

type Bag struct {
	// contains filtered or unexported fields
}

Bag represents a bag of words (BoW) model

func New ¶

func New(c Config) (out *Bag, err error)

New will initialize and return a new Bag with a provided configuration

Example ¶

var (
	cfg Config
	err error
)

// Initialize with default values
if exampleBag, err = New(cfg); err != nil {
	log.Fatal(err)
}

func NewFromTrainingSet ¶

func NewFromTrainingSet(t TrainingSet) (b *Bag, err error)

NewFromTrainingSet will initialize and return a new pre-trained Bag from a provided training set

Example ¶

var (
	t   TrainingSet
	err error
)

t.Samples = SamplesByLabel{
	"positive": {
		"I love this product, it is amazing!",
		"I am very happy with this.",
		"Very good",
	},

	"negative": {
		"This is the worst thing ever.",
		"I hate this so much.",
		"Not good",
	},
}

// Initialize with default values
if exampleBag, err = NewFromTrainingSet(t); err != nil {
	log.Fatal(err)
}

func NewFromTrainingSetFile ¶ added in v0.4.1

func NewFromTrainingSetFile(filepath string) (b *Bag, err error)

NewFromTrainingSetFile will initialize and return a new pre-trained Bag from a provided training set filepath

func (*Bag) GetResults ¶

func (b *Bag) GetResults(in string) (r Results)

GetResults will return the classification results for a given input string

Example ¶

exampleResults = exampleBag.GetResults("I am very happy with this product.")
fmt.Println("Collection of results", exampleResults)

func (*Bag) Train ¶

func (b *Bag) Train(in, label string)

Train will process a given input string and assign it the provided label for training

Example ¶

exampleBag.Train("I love this product, it is amazing!", "positive")
exampleBag.Train("This is the worst thing ever.", "negative")
exampleBag.Train("I am very happy with this.", "positive")
exampleBag.Train("I hate this so much.", "negative")
exampleBag.Train("Not good", "negative")
exampleBag.Train("Very good", "positive")

type Config ¶

type Config struct {
	// NGramSize represents the NGram size (unigram, bigram, trigram, etc - default is trigram)
	NGramSize int `yaml:"ngram-size"`
	// NGramType represents the NGram type (word or character - default is word)
	NGramType string `yaml:"ngram-type"`
	// SmoothingParameter represents the smoothing value used for the Laplace Smoothing (default is 1)
	SmoothingParameter float64 `yaml:"smoothing-parameter"`
}
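
To make the difference between the two n-gram types concrete, the following sketch generates word and character n-grams from a string. The function names are hypothetical and this is not the package's tokenizer. In particular, how the package normalizes text or handles inputs shorter than the n-gram size is not shown in this documentation; the sketch simply returns no n-grams in that case.

```go
package main

import (
	"fmt"
	"strings"
)

// wordNGrams returns every run of n consecutive whitespace-separated words.
// (Hypothetical helper for illustration.)
func wordNGrams(text string, n int) []string {
	words := strings.Fields(text)
	if n <= 0 || len(words) < n {
		return nil
	}
	grams := make([]string, 0, len(words)-n+1)
	for i := 0; i+n <= len(words); i++ {
		grams = append(grams, strings.Join(words[i:i+n], " "))
	}
	return grams
}

// charNGrams returns every run of n consecutive characters (runes).
// (Hypothetical helper for illustration.)
func charNGrams(text string, n int) []string {
	runes := []rune(text)
	if n <= 0 || len(runes) < n {
		return nil
	}
	grams := make([]string, 0, len(runes)-n+1)
	for i := 0; i+n <= len(runes); i++ {
		grams = append(grams, string(runes[i:i+n]))
	}
	return grams
}

func main() {
	fmt.Printf("%q\n", wordNGrams("i am very happy", 2)) // ["i am" "am very" "very happy"]
	fmt.Printf("%q\n", charNGrams("nope", 3))            // ["nop" "ope"]
}
```

Larger word n-grams capture more context ("not good" as one unit) but need more training data, which is likely why the training set file example uses an ngram-size of 1 for its one-word samples.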

func (*Config) Validate ¶

func (c *Config) Validate() (err error)

type Results ¶

type Results map[string]float64

Results (by label) represents the probability of a processed input matching each of the possible labels (classifications)
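
Because Results is a plain map keyed by label, choosing the most likely label amounts to an argmax over the map. The sketch below uses a local `results` type and a hypothetical `highest` function to show the idea. It is not the package's GetHighestProbability implementation, and the tie-breaking rule is an assumption made so the output is deterministic.

```go
package main

import "fmt"

// results mirrors the documented Results type: a value per label.
type results map[string]float64

// highest returns the label with the largest value, or "" for an empty map.
// Ties are broken by picking the lexicographically smaller label; whether
// the package does the same is an assumption.
func highest(r results) (match string) {
	found := false
	var best float64
	for label, p := range r {
		if !found || p > best || (p == best && label < match) {
			match, best, found = label, p, true
		}
	}
	return match
}

func main() {
	r := results{"positive": 0.8, "negative": 0.2}
	fmt.Println(highest(r)) // positive
}
```

Tracking `found` separately, rather than starting from a best value of zero, keeps the function correct when scores are negative, for example if log probabilities are stored.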

func (Results) GetHighestProbability ¶

func (r Results) GetHighestProbability() (match string)

Example ¶

match := exampleResults.GetHighestProbability()
fmt.Println("Highest probability", match)

type Samples ¶

type Samples []string

Samples represents a set of input samples to be used for model training

type SamplesByLabel ¶

type SamplesByLabel map[string]Samples

SamplesByLabel represents sets of samples keyed by label

type TrainingSet ¶

type TrainingSet struct {
	Config `yaml:"config"`

	Samples SamplesByLabel `yaml:"samples"`
}

TrainingSet is used to train a bag of words (BoW) model

type Vocabulary ¶

type Vocabulary map[string]int

Vocabulary maps each n-gram (as a string) to the number of times it has been encountered
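
To illustrate how per-label count maps of this shape can drive classification, here is a compact Naive Bayes sketch over word unigrams with additive smoothing. Every name in it (`model`, `train`, `scores`) is hypothetical, it returns log scores rather than normalized probabilities, and the package's actual internals may differ.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// vocabulary mirrors the documented Vocabulary type: n-gram -> count.
type vocabulary map[string]int

// model is a hypothetical, minimal Naive Bayes model over word unigrams.
type model struct {
	byLabel map[string]vocabulary // token counts per label
	totals  map[string]int        // total tokens per label
	docs    map[string]int        // training documents per label
	all     vocabulary            // token counts across every label
	nDocs   int                   // total training documents
}

func newModel() *model {
	return &model{
		byLabel: map[string]vocabulary{},
		totals:  map[string]int{},
		docs:    map[string]int{},
		all:     vocabulary{},
	}
}

// train counts the lowercased words of in under label.
func (m *model) train(in, label string) {
	if m.byLabel[label] == nil {
		m.byLabel[label] = vocabulary{}
	}
	for _, tok := range strings.Fields(strings.ToLower(in)) {
		m.byLabel[label][tok]++
		m.all[tok]++
		m.totals[label]++
	}
	m.docs[label]++
	m.nDocs++
}

// scores returns log P(label) plus the sum of log P(token | label) for each
// label, using additive smoothing with parameter alpha.
func (m *model) scores(in string, alpha float64) map[string]float64 {
	out := make(map[string]float64, len(m.byLabel))
	v := float64(len(m.all)) // number of distinct tokens seen in training
	tokens := strings.Fields(strings.ToLower(in))
	for label, vocab := range m.byLabel {
		score := math.Log(float64(m.docs[label]) / float64(m.nDocs))
		for _, tok := range tokens {
			p := (float64(vocab[tok]) + alpha) / (float64(m.totals[label]) + alpha*v)
			score += math.Log(p)
		}
		out[label] = score
	}
	return out
}

func main() {
	m := newModel()
	m.train("very good", "positive")
	m.train("i love this", "positive")
	m.train("not good", "negative")
	m.train("i hate this", "negative")

	// The label with the larger (less negative) score is the better match.
	fmt.Println(m.scores("i love this", 1))
}
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on longer inputs.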

Directories ¶

Path	Synopsis
bag-cli
