hashtags

package module
v0.11.0
Published: Apr 21, 2025 License: GPL-3.0 Imports: 17 Imported by: 1

README

HashTags


Purpose

Sometimes one might want to find so-called #hashtags or @mentions in one's texts (in a broader sense) and store them for later retrieval. This package offers that facility. It provides the THashTags type, which parses texts for occurrences of both #hashtags and @mentions and stores the hits in an internal list for later lookup; that list can be written to a file and later loaded from that file.

Installation

You can use Go to install this package for you:

go get -u github.com/mwat56/hashtags

Usage

For each #hashtag or @mention a list of IDs is maintained. These IDs can be any (int64) data that identifies the text in which the #hashtag or @mention was found, e.g. some database record reference or article ID. The only condition is that it is unique as far as the program using this package is concerned.

Note that both #hashtag and @mention are stored lower-cased to allow for case-insensitive searches.
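
The bookkeeping described above can be pictured as a map from a lower-cased tag to a list of IDs. The following is only a minimal sketch of that idea, not the package's implementation; the tagIndex type and its add/list methods are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// tagIndex is a hypothetical stand-in for the package's internal list:
// it maps a lower-cased #hashtag/@mention to the IDs of the texts using it.
type tagIndex map[string][]int64

// add stores aID under the lower-cased aTag and reports whether
// anything changed (empty tags and duplicate IDs are ignored).
func (ti tagIndex) add(aTag string, aID int64) bool {
	key := strings.ToLower(aTag)
	if key == "" {
		return false
	}
	for _, id := range ti[key] {
		if id == aID {
			return false
		}
	}
	ti[key] = append(ti[key], aID)
	return true
}

// list returns the IDs stored for aTag, ignoring case.
func (ti tagIndex) list(aTag string) []int64 {
	return ti[strings.ToLower(aTag)]
}

func main() {
	idx := tagIndex{}
	idx.add("#GoLang", 1)
	idx.add("#golang", 2)
	// Both IDs are found although the spelling differs in case.
	fmt.Println(idx.list("#GOLANG"))
}
```

Because the key is normalised on both insertion and lookup, callers never need to care about the spelling used in the original text.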

To get a THashTags instance there's a simple way:

fName := "mytags.lst"
ht, err := hashtags.New(fName, true)
if nil != err {
	log.Printf("Problem loading file '%s': %v", fName, err)
}

// ...
// do something with the list
// ...

// Store() returns the number of bytes written and a possible error.
if _, err = ht.Store(); nil != err {
	log.Printf("Problem writing file '%s': %v", fName, err)
}

The constructor function New() takes two arguments: A string specifying the name of the file to use for loading/storing the list's data, and a bool value indicating whether the list should be thread-safe or not. The setting for the latter depends on the actual use-case.
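
What a thread-safety flag of this kind usually means can be sketched as follows. This is an assumption about the general technique (optional locking around shared data), not a description of how THashTags is built; safeIndex, newSafeIndex and addConcurrently are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// safeIndex is a hypothetical list that locks only when asked to.
type safeIndex struct {
	mtx  sync.RWMutex
	safe bool
	data map[string][]int64
}

// newSafeIndex returns an empty index; aSafe enables locking.
func newSafeIndex(aSafe bool) *safeIndex {
	return &safeIndex{safe: aSafe, data: make(map[string][]int64)}
}

// add appends aID to the list of aTag, locking if configured.
func (si *safeIndex) add(aTag string, aID int64) {
	if si.safe {
		si.mtx.Lock()
		defer si.mtx.Unlock()
	}
	si.data[aTag] = append(si.data[aTag], aID)
}

// length returns the number of IDs stored for aTag.
func (si *safeIndex) length(aTag string) int {
	if si.safe {
		si.mtx.RLock()
		defer si.mtx.RUnlock()
	}
	return len(si.data[aTag])
}

// addConcurrently adds aCount distinct IDs from separate goroutines.
// It must only be used with an index created with aSafe == true.
func addConcurrently(aIdx *safeIndex, aTag string, aCount int) {
	var wg sync.WaitGroup
	for i := 0; i < aCount; i++ {
		wg.Add(1)
		go func(aID int64) {
			defer wg.Done()
			aIdx.add(aTag, aID)
		}(int64(i))
	}
	wg.Wait()
}

func main() {
	idx := newSafeIndex(true)
	addConcurrently(idx, "#go", 100)
	fmt.Println(idx.length("#go"))
}
```

A single-goroutine program can skip the locking overhead, whereas anything that parses documents from several goroutines (e.g. HTTP handlers) would want the safe variant.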

The package provides a global boolean configuration variable called UseBinaryStorage which is true by default. It determines whether the data written by Store() and read by Load() use plain text (i.e. hashtags.UseBinaryStorage = false) or a binary data format. The advantage of the plain text format is that it can be inspected with any text-oriented tool (e.g. grep or diff). The advantage of the binary format is that it is about three to four times as fast when loading/storing data and that it uses a few bytes less than the text format. For these reasons it's used by default (i.e. hashtags.UseBinaryStorage == true). During development of your own application using this package, however, you might want to switch to the text format for diagnostic purposes.
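
To illustrate the trade-off between the two storage styles, here is a rough sketch that serialises the same data once as text and once in binary form. The formats shown are assumptions chosen for illustration (one line per tag, and encoding/gob respectively); the file formats actually used by Store() and Load() are not described here and may differ. All names in the sketch are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// tagIndex is a hypothetical in-memory list: tag -> IDs.
type tagIndex map[string][]int64

// encodeText writes one line per tag: the tag followed by its IDs.
// The result can be inspected with tools such as grep or diff.
func encodeText(aIdx tagIndex) string {
	keys := make([]string, 0, len(aIdx))
	for key := range aIdx {
		keys = append(keys, key)
	}
	sort.Strings(keys) // stable output makes diffs meaningful

	var sb strings.Builder
	for _, key := range keys {
		sb.WriteString(key)
		for _, id := range aIdx[key] {
			sb.WriteByte(' ')
			sb.WriteString(strconv.FormatInt(id, 10))
		}
		sb.WriteByte('\n')
	}
	return sb.String()
}

// encodeBinary serialises the index with encoding/gob.
func encodeBinary(aIdx tagIndex) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(aIdx); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeBinary restores an index written by encodeBinary.
func decodeBinary(aData []byte) (tagIndex, error) {
	var idx tagIndex
	err := gob.NewDecoder(bytes.NewReader(aData)).Decode(&idx)
	return idx, err
}

func main() {
	idx := tagIndex{"#go": {1, 2}, "@alice": {3}}
	fmt.Print(encodeText(idx))

	bin, err := encodeBinary(idx)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	back, err := decodeBinary(bin)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(len(back), "tags restored from", len(bin), "bytes")
}
```

The text variant needs parsing (splitting lines, converting numbers) when read back, which is the kind of work a binary format avoids.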

For more details please refer to the package documentation.

Methods

There are several kinds of methods provided:

The following methods can be used to handle hashtags:

  • HashAdd(aHash string, aID int64) bool inserts aHash as used by document aID, returning whether anything changed.
  • HashCount() int returns the number of hashtags currently handled.
  • HashLen(aHash string) int returns the number of documents using aHash.
  • HashList(aHash string) []int64 returns a list of all document IDs using aHash.
  • HashRemove(aHash string, aID int64) bool removes the document aID from the aHash list, returning whether anything changed.

The following methods can be used to handle the document IDs of the list entries.

  • IDlist(aID int64) []string returns a list of hashtags and mentions occurring in the document identified by aID.
  • IDparse(aID int64, aText []byte) bool parses the given aText for hashtags and mentions and stores aID in the respective hashtag/mention lists, returning whether anything changed.
  • IDremove(aID int64) bool deletes the given aID from all hashtag/mention lists, returning whether anything changed.
  • IDrename(aOldID, aNewID int64) bool changes the given aOldID to aNewID in the rare case that a document's ID changed, returning whether anything changed.
  • IDupdate(aID int64, aText []byte) bool replaces the current hashtags/mentions stored for aID with those found in aText, returning whether anything changed.

The following methods can be used to handle mentions:

  • MentionAdd(aMention string, aID int64) bool inserts aMention as used by document aID, returning whether anything changed.
  • MentionCount() int returns the number of mentions currently handled.
  • MentionLen(aMention string) int returns the number of documents using aMention.
  • MentionList(aMention string) []int64 returns a list of all document IDs using aMention.
  • MentionRemove(aMention string, aID int64) bool removes the document aID from the aMention list, returning whether anything changed.

Maintenance methods

  • Clear() *THashTags empties the internal data structures: all #hashtags and @mentions are deleted.
  • Filename() string returns the filename given to the initial New() call for reading/storing the list's contents.
  • Len() int returns the current length of the list i.e. how many #hashtags and @mentions are currently stored in the list.
  • LenTotal() int returns the length of all #hashtag/@mention lists and their respective number of source IDs stored in the list.
  • List() TCountList returns a list of #hashtags/@mentions with their respective count of associated IDs.
  • Load() (*THashTags, error) reads the configured file returning the data structure read from the file given with the New() call and a possible error condition.
  • SetFilename(aFilename string) error sets the filename for loading/storing the hashtags, returning a possible error.
  • Store() (int, error) writes the whole list to the configured file returning the number of bytes written and a possible error.
  • String() string returns the whole list as a linefeed separated string.

Basic Usage

Although there are a lot of options (methods) available, basically the module is quite straightforward to use.

  1. Create a new instance:

     myList, err := hashtags.New("myFile.db", true)
    
  2. Whenever your application receives a new document, retrieve or create its ID and text, then call

     ok := myList.IDparse(docID, docText)
    

Libraries

The following external libraries were used building HashTags:

Licence

Copyright © 2019, 2024 M.Watermann, 10247 Berlin, Germany
All rights reserved
EMail : <support@mwat.de>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the GNU General Public License along with this program. If not, see the GNU General Public License for details.


Documentation

Overview

Copyright © 2023, 2025 M.Watermann, 10247 Berlin, Germany

    All rights reserved
EMail : <support@mwat.de>

Package `hashtags` implements a simple #hashtag/@mentions handler.

Copyright © 2019, 2024  M.Watermann, 10247 Berlin, Germany
                All rights reserved
            EMail : <support@mwat.de>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the GNU General Public License along with this program. If not, see the [GNU General Public License](http://www.gnu.org/licenses/gpl.html) for details.

Copyright © 2019, 2025 M.Watermann, 10247 Berlin, Germany

    All rights reserved
EMail : <support@mwat.de>

Constants

const (
	// `MarkHash` is the first character in a `#hashtag`.
	MarkHash = byte('#')

	// `MarkMention` is the first character in a `@mention`.
	MarkMention = byte('@')
)
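
Since both markers are single bytes, a token can be classified by looking at its first byte. The following sketch shows that idea; the lower-case constants mirror the exported ones above, while the kind function is a hypothetical helper and not part of the package.

```go
package main

import "fmt"

const (
	// Local copies of the package's marker constants.
	markHash    = byte('#')
	markMention = byte('@')
)

// kind reports whether aToken looks like a hashtag or a mention,
// or returns an empty string for anything else.
func kind(aToken string) string {
	if aToken == "" {
		return ""
	}
	switch aToken[0] {
	case markHash:
		return "hashtag"
	case markMention:
		return "mention"
	}
	return ""
}

func main() {
	for _, token := range []string{"#golang", "@alice", "plain"} {
		fmt.Printf("%q -> %q\n", token, kind(token))
	}
}
```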

Variables

var (
	// `UseBinaryStorage` determines whether to use binary storage
	// or not (i.e. plain text).
	//
	// Loading/storing binary data is about three times as fast with
	// the `THashTags` data than reading and parsing plain text data.
	UseBinaryStorage = true
)

Functions

func HashMentionRE added in v0.9.5

func HashMentionRE() *regexp.Regexp

`HashMentionRE()` returns a compiled regular expression used to identify `#hashtags` and `@mentions` in a text.

This regular expression matches strings that start with either '@' or '#' followed by any number of characters that are not whitespace.

Returns:

  • `*regexp.Regexp`: A pointer to the compiled regular expression.
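
A pattern matching the description above could look like the following. This is a sketch under the assumption that the description is taken literally ('#' or '@' followed by non-whitespace characters); the expression actually returned by `HashMentionRE()` is not shown here and may well be stricter, e.g. regarding punctuation. The names `hashMentionRE` and `findTags` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// hashMentionRE is an assumed pattern: a '#' or '@' followed by
// one or more non-whitespace characters.
var hashMentionRE = regexp.MustCompile(`[#@]\S+`)

// findTags returns all substrings of aText matching the pattern.
func findTags(aText string) []string {
	return hashMentionRE.FindAllString(aText, -1)
}

func main() {
	// Note that trailing punctuation is part of the match with
	// this simple pattern, because a comma is not whitespace.
	for _, tag := range findTags("Hello @Alice, see #GoLang now") {
		fmt.Printf("%q\n", tag)
	}
}
```

The example also shows why a real-world expression usually needs more than "not whitespace": with this literal reading the match for `@Alice` includes the comma that follows it.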

Types

type TCountItem

type TCountItem struct {
	Count int    // number of IDs for this `#hashtag`
	Tag   string // name of `#hashtag`
}

`TCountItem` holds a `#hashtag` and its number of occurrences.

func (TCountItem) Compare added in v0.9.6

func (ci TCountItem) Compare(aItem TCountItem) int

`Compare()` compares two `TCountItem` instances based on their tags and counts.

Parameters:

  • `aItem`: The other TCountItem instance to compare with.

Returns:

  • `-1` if the current instance is less than `aItem`.
  • ` 0` if the current instance is equal to `aItem`.
  • `+1` if the current instance is greater than `aItem`.
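
A three-way comparison like this is handy for sorting. The sketch below assumes one possible ordering (by tag first, then by count); the ordering the package itself applies is not specified above. The `countItem` type mirrors the fields of `TCountItem`, and `compare` is a hypothetical reimplementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// countItem mirrors the fields of TCountItem.
type countItem struct {
	Count int    // number of IDs for this tag
	Tag   string // name of the tag
}

// compare orders items by Tag and, for equal tags, by Count.
// It returns -1, 0 or +1. The ordering is an assumption.
func (ci countItem) compare(aItem countItem) int {
	if c := strings.Compare(ci.Tag, aItem.Tag); c != 0 {
		return c
	}
	switch {
	case ci.Count < aItem.Count:
		return -1
	case ci.Count > aItem.Count:
		return 1
	}
	return 0
}

func main() {
	list := []countItem{
		{Count: 2, Tag: "#go"},
		{Count: 1, Tag: "@alice"},
		{Count: 1, Tag: "#go"},
	}
	// A "less" predicate falls out of the three-way result.
	sort.Slice(list, func(i, j int) bool {
		return list[i].compare(list[j]) < 0
	})
	fmt.Println(list)
}
```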

func (TCountItem) Equal added in v0.9.6

func (ci TCountItem) Equal(aItem TCountItem) bool

`Equal()` checks whether the current `TCountItem` is equal to `aItem`.

Parameters:

  • `aItem`: The other item to compare with.

Returns:

  • `bool`: Whether the two items are equal.

func (TCountItem) Less added in v0.7.0

func (ci TCountItem) Less(aItem TCountItem) bool

`Less()` checks whether this `TCountItem` is less than `aItem`.

Parameters:

  • `aItem`: The other `TCountItem` instance to compare with.

Returns:

  • `bool`: Whether the current instance is less than the other item.

type TCountList added in v0.7.0

type TCountList []TCountItem

`TCountList` is a list of `TCountItems`.

func (TCountList) Compare added in v0.9.6

func (cl TCountList) Compare(aList TCountList) int

`Compare()` compares the current list with another list.

Parameters:

  • `aList`: The list to compare with.

Returns:

  • `-1` if the current instance is less than `aList`.
  • ` 0` if the current instance is equal to `aList`.
  • `+1` if the current instance is greater than `aList`.

func (TCountList) Equal added in v0.9.6

func (cl TCountList) Equal(aList TCountList) bool

`Equal()` compares the current list with another list.

Parameters:

  • `aList`: The list to compare with.

Returns:

  • `bool`: True if the lists are identical, false otherwise.

func (*TCountList) Insert added in v0.9.1

func (cl *TCountList) Insert(aItem TCountItem) *TCountList

`Insert()` appends `aItem` to the list.

Parameters:

  • `aItem`: The item to insert into the list.

func (TCountList) Len added in v0.9.6

func (cl TCountList) Len() int

func (TCountList) Less added in v0.9.6

func (cl TCountList) Less(aList TCountList) bool

`Less()` checks whether this `TCountList` is less than `aList`.

Parameters:

  • `aList`: The other `TCountList` instance to compare with.

Returns:

  • `bool`: Whether the current instance is less than the other list.

func (TCountList) String added in v0.7.0

func (cl TCountList) String() (rStr string)

`String()` returns the list as a linefeed separated string.

(Implements `fmt.Stringer` interface)

Returns:

  • `string`: The string representation of this list.

func (*TCountList) Swap added in v0.7.0

func (cl *TCountList) Swap(aOldIdx, aNewIdx int) *TCountList

`Swap()` swaps the elements at the specified indices in the list.

If the list is empty, or the old and new indices are the same, or either of the indices is out of bounds, the function returns the list unchanged.

Parameters:

  • `aOldIdx`: The index of the first element to swap.
  • `aNewIdx`: The index of the second element to swap.

Returns:

  • `*TCountList`: A pointer to the list with the swapped elements.

type THashTagError added in v0.11.0

type THashTagError struct {
	Op   string // operation that caused the error (e.g., "Load", "Store")
	Path string // file path involved in the error, if applicable
	Err  error  // underlying error that occurred
}

`THashTagError` is a custom error type that provides detailed error information for hashtag-related operations.

func (*THashTagError) Error added in v0.11.0

func (e *THashTagError) Error() string

`Error()` implements the error interface, returning a formatted error message.

Returns:

  • `string`: A formatted error message containing the operation, path (if any), and underlying error.

func (*THashTagError) Unwrap added in v0.11.0

func (e *THashTagError) Unwrap() error

`Unwrap()` returns the underlying error.

Returns:

  • `error`: The underlying error that caused this `THashTagError`.

type THashTags added in v0.8.0

type THashTags struct {
	// contains filtered or unexported fields
}

`THashTags` is a list of `#hashtags` and `@mentions` pointing to sources (i.e. IDs).

func New

func New(aFilename string, aSafe bool) (*THashTags, error)

`New()` returns a new `THashTags` instance after reading the given file.

NOTE: Neither an empty filename nor a non-existent hash file is considered an error.

If there is an error, it will be of type `*THashTagError`.

Parameters:

  • `aFilename`: The name of the file to use for loading and storing.
  • `aSafe`: Whether the list should be thread-safe.

Returns:

  • `*THashTags`: The new `THashTags` instance.
  • `error`: If there is an error, it will be from reading `aFilename`.

func (*THashTags) Checksum added in v0.8.0

func (ht *THashTags) Checksum() uint32

`Checksum()` returns the list's CRC32 checksum.

This method can be used to get a kind of 'footprint' of the current contents of the handled data.

Returns:

  • `uint32`: The computed checksum.
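
A typical use for such a footprint is detecting whether the data changed, for example to skip an unnecessary write. The sketch below computes a CRC32 over a textual representation with the standard library; which bytes and which CRC32 polynomial `Checksum()` really uses is not stated above, so both the IEEE polynomial and the `checksum` helper are assumptions.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// checksum returns the CRC32 (IEEE polynomial) of aContent.
func checksum(aContent string) uint32 {
	return crc32.ChecksumIEEE([]byte(aContent))
}

func main() {
	before := checksum("#go 1 2\n@alice 3\n")
	after := checksum("#go 1 2\n@alice 3 4\n")

	// Only persist the data if its footprint differs.
	if before != after {
		fmt.Println("contents changed, storing is worthwhile")
	}
}
```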

func (*THashTags) Clear added in v0.8.0

func (ht *THashTags) Clear() *THashTags

`Clear()` empties the internal data structures: all `#hashtags` and `@mentions` are deleted.

Returns:

  • `*THashTags`: This cleared list.

func (*THashTags) Filename added in v0.8.0

func (ht *THashTags) Filename() string

`Filename()` returns the configured filename for reading/storing this list's contents.

Returns:

  • `string`: The filename for reading/storing this list.

func (*THashTags) HashAdd added in v0.8.0

func (ht *THashTags) HashAdd(aHash string, aID int64) bool

`HashAdd()` appends `aID` to the list of `aHash`.

If `aHash` is empty it is silently ignored (i.e. this method does nothing) returning `false`.

Parameters:

  • `aHash`: The hash list index to use.
  • `aID`: The object to be added to the hash list.

Returns:

  • `bool`: `true` if `aID` was added, or `false` otherwise.

func (*THashTags) HashCount added in v0.8.0

func (ht *THashTags) HashCount() int

`HashCount()` counts the number of hashtags in the list.

Returns:

  • `int`: The number of hashes in the list.

func (*THashTags) HashLen added in v0.8.0

func (ht *THashTags) HashLen(aHash string) int

`HashLen()` returns the number of IDs stored for `aHash`.

If `aHash` is empty it is silently ignored (i.e. this method does nothing), returning `-1`.

Parameters:

  • `aHash`: The list key to lookup.

Returns:

  • `int`: The number of IDs stored for `aHash`.

func (*THashTags) HashList added in v0.8.0

func (ht *THashTags) HashList(aHash string) []int64

`HashList()` returns a list of IDs associated with `aHash`.

If `aHash` is empty it is silently ignored (i.e. this method does nothing), returning an empty slice.

Parameters:

  • `aHash`: The hash to lookup.

Returns:

  • `[]int64`: The list of IDs referencing `aHash`.

func (*THashTags) HashRemove added in v0.8.0

func (ht *THashTags) HashRemove(aHash string, aID int64) bool

`HashRemove()` deletes `aID` from the list of `aHash`.

Parameters:

  • `aHash`: The hash to lookup.
  • `aID`: The referenced object to remove from the list.

Returns:

  • `bool`: `true` if `aID` was removed, or `false` otherwise.

func (*THashTags) IDlist added in v0.8.0

func (ht *THashTags) IDlist(aID int64) []string

`IDlist()` returns a list of `#hashtags` and `@mentions` associated with `aID`.

Parameters:

  • `aID`: The referenced object to lookup.

Returns:

  • `[]string`: The list of `#hashtags` and `@mentions` associated with `aID`.

func (*THashTags) IDparse added in v0.8.0

func (ht *THashTags) IDparse(aID int64, aText []byte) bool

`IDparse()` checks whether `aText` associated with `aID` contains strings starting with `[@|#]` and - if found - adds them to the respective list.

If `aText` is empty it is silently ignored (i.e. this method does nothing), returning `false`.

Parameters:

  • `aID`: The ID to add to the list.
  • `aText`: The text to search.

Returns:

  • `bool`: `true` if `aID` was updated from `aText`, or `false` otherwise.

func (*THashTags) IDremove added in v0.8.0

func (ht *THashTags) IDremove(aID int64) bool

`IDremove()` deletes all `#hashtags` and `@mentions` associated with `aID`.

Parameters:

  • `aID`: The ID to be deleted from all lists.

Returns:

  • `bool`: `true` if `aID` was removed, or `false` otherwise.

func (*THashTags) IDrename added in v0.8.0

func (ht *THashTags) IDrename(aOldID, aNewID int64) bool

`IDrename()` replaces all occurrences of `aOldID` by `aNewID`.

If `aOldID` equals `aNewID` they are silently ignored (i.e. this method does nothing), returning `false`.

This method is intended for rare cases when the ID of a document needs to get changed.

Parameters:

  • `aOldID`: The ID to be replaced in all lists.
  • `aNewID`: The replacement in all lists.

Returns:

  • `bool`: `true` if `aOldID` was renamed, or `false` otherwise.

func (*THashTags) IDupdate added in v0.8.0

func (ht *THashTags) IDupdate(aID int64, aText []byte) bool

`IDupdate()` checks `aText` removing all `#hashtags` and `@mentions` no longer present and adding `#hashtags` and `@mentions` new in `aText`.

Parameters:

  • `aID`: The ID to update.
  • `aText`: The new text to use.

Returns:

  • `bool`: `true` if `aID` was updated, or `false` otherwise.
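
Conceptually, such an update is a set difference between the tags previously stored for the ID and the tags found in the new text. The following sketch shows only that step; `diffTags` is a hypothetical helper and says nothing about how the package implements `IDupdate()` internally.

```go
package main

import "fmt"

// diffTags compares the tags stored so far (aOld) with those found
// in the new text (aNew). It returns the tags to remove and the tags
// to add; duplicates in either input are reported only once.
func diffTags(aOld, aNew []string) (rRemoved, rAdded []string) {
	oldSet := make(map[string]bool, len(aOld))
	for _, tag := range aOld {
		oldSet[tag] = true
	}

	newSet := make(map[string]bool, len(aNew))
	for _, tag := range aNew {
		if newSet[tag] {
			continue // already handled
		}
		newSet[tag] = true
		if !oldSet[tag] {
			rAdded = append(rAdded, tag)
		}
	}

	for _, tag := range aOld {
		if oldSet[tag] && !newSet[tag] {
			rRemoved = append(rRemoved, tag)
			oldSet[tag] = false // report each tag once
		}
	}
	return
}

func main() {
	removed, added := diffTags(
		[]string{"#go", "@alice"},
		[]string{"#go", "#rust"},
	)
	fmt.Println("remove:", removed)
	fmt.Println("add:", added)
	// Something changed if either list is non-empty.
	fmt.Println("changed:", len(removed)+len(added) > 0)
}
```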

func (*THashTags) Len added in v0.8.0

func (ht *THashTags) Len() int

`Len()` returns the current length of the list i.e. how many `#hashtags` and `@mentions` are currently stored in the list.

Returns:

  • `int`: The number of all `#hashtag` and `@mention` lists.

func (*THashTags) LenTotal added in v0.8.0

func (ht *THashTags) LenTotal() int

`LenTotal()` returns the length of all `#hashtag` and `@mention` lists stored in the list.

Returns:

  • `int`: The total length of all `#hashtag` and `@mention` lists.

func (*THashTags) List added in v0.8.0

func (ht *THashTags) List() TCountList

`List()` returns a list of `#hashtags` and `@mentions` with their respective count of associated IDs.

Returns:

  • `TCountList`: A list of `#hashtags` and `@mentions` with their counts of IDs.

func (*THashTags) Load added in v0.8.0

func (ht *THashTags) Load() (*THashTags, error)

`Load()` reads the configured file returning the data structure read from the file and a possible error condition.

NOTE: Neither an empty filename nor a non-existent hash file is considered an error.

Returns:

  • `*THashTags`: The updated list.
  • `error`: If there is an error, it will be of type `*PathError`.

func (*THashTags) MentionAdd added in v0.8.0

func (ht *THashTags) MentionAdd(aMention string, aID int64) bool

`MentionAdd()` appends `aID` to the list of `aMention`.

If `aMention` is empty it is silently ignored (i.e. this method does nothing) returning `false`.

Parameters:

  • `aMention`: The list index to lookup.
  • `aID`: The ID to be added to the mention list.

Returns:

  • `bool`: `true` if `aID` was added, or `false` otherwise.

func (*THashTags) MentionCount added in v0.8.0

func (ht *THashTags) MentionCount() int

`MentionCount()` returns the number of mentions in the list.

Returns:

  • `int`: The number of mentions in the list.

func (*THashTags) MentionLen added in v0.8.0

func (ht *THashTags) MentionLen(aMention string) int

`MentionLen()` returns the number of IDs stored for `aMention`.

If `aMention` is empty it is silently ignored (i.e. this method does nothing) returning `-1`.

Parameters:

  • `aMention`: Identifies the ID list to lookup.

Returns:

  • `int`: The number of IDs stored for `aMention`.

func (*THashTags) MentionList added in v0.8.0

func (ht *THashTags) MentionList(aMention string) []int64

`MentionList()` returns a list of IDs associated with `aMention`.

If `aMention` is empty it is silently ignored (i.e. this method does nothing), returning an empty slice.

Parameters:

  • `aMention`: The mention to lookup.

Returns:

  • `[]int64`: The list of IDs referencing `aMention`.

func (*THashTags) MentionRemove added in v0.8.0

func (ht *THashTags) MentionRemove(aMention string, aID int64) bool

`MentionRemove()` deletes `aID` from the list of `aMention`.

If `aMention` is empty it is silently ignored (i.e. this method does nothing) returning `false`.

Parameters:

  • `aMention`: The mention to lookup.
  • `aID`: The referenced object to remove from the list.

Returns:

  • `bool`: `true` if `aID` was removed, or `false` otherwise.

func (*THashTags) SetFilename added in v0.8.0

func (ht *THashTags) SetFilename(aFilename string) error

`SetFilename()` sets `aFilename` to be used by this list.

Parameters:

  • `aFilename`: The name of the file to use for storage.

Returns:

  • `error`: If there is an error, it will be of type `*THashTagError`.

func (*THashTags) Store added in v0.8.0

func (ht *THashTags) Store() (int, error)

`Store()` writes the whole list to the configured file returning the number of bytes written and a possible error.

If there is an error, it will be of type `*THashTagError`.

Returns:

  • `int`: Number of bytes written to storage.
  • `error`: A possible storage error, or `nil` in case of success.

func (*THashTags) String added in v0.8.0

func (ht *THashTags) String() string

`String()` returns the whole list as a linefeed separated string.

Returns:

  • `string`: The string representation of this hash list.
