Documentation ¶
Overview ¶
Package span implements common functions.
Copyright 2015 by Leipzig University Library, http://ub.uni-leipzig.de
The Finc Authors, http://finc.info
Martin Czygan, <martin.czygan@uni-leipzig.de>
This file is part of some open source application.
Some open source application is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Some open source application is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with some open source application. If not, see <http://www.gnu.org/licenses/>.
@license GPL-3.0+ <http://spdx.org/licenses/GPL-3.0+>
Index ¶
- Constants
- Variables
- func DetectLang3(text string) (string, error)
- func LanguageIdentifier(s string) string
- func LoadSet(r io.Reader, m map[string]struct{}) error
- func ReadLines(filename string) (lines []string, err error)
- func UnescapeTrim(s string) string
- func UnfreezeFilterConfig(frozenfile string) (dir string, blob string, err error)
- type ArrayFlags
- type FileReader
- type LinkReader
- type SavedLink
- type SavedReaders
- type Skip
- type SkipReader
- type WriteCounter
- type ZipContentReader
- type ZipOrPlainLinkReader
Constants ¶
const (
	// AppVersion of span package. Commandline tools will show this on -v.
	AppVersion = "0.1.240"

	// KeyLengthLimit was a limit imposed by the memcached protocol, which
	// was used for blob storage until Q1 2017. We switched the key-value
	// store, so this limit is somewhat obsolete.
	KeyLengthLimit = 250
)
Variables ¶
var ISO639BibliographicToThree = map[string]string{
"alb": "sqi",
"arm": "hye",
"baq": "eus",
"bur": "mya",
"chi": "zho",
"cze": "ces",
"dut": "nld",
"fre": "fra",
"geo": "kat",
"ger": "deu",
"gre": "ell",
"ice": "isl",
"mac": "mkd",
"mao": "mri",
"may": "msa",
"per": "fas",
"rum": "ron",
"slo": "slk",
"tib": "bod",
"wel": "cym",
}
ISO639BibliographicToThree maps the bibliographic ISO 639-2 identifiers to the corresponding three-letter ISO 639-3 identifiers.
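A minimal sketch of how such a table might be used to normalize language codes. The excerpted map and the `normalize` helper are hypothetical names introduced for illustration; passing unmapped codes through unchanged is an assumption, not documented behavior of this package.

```go
package main

import "fmt"

// bibliographicToThree is a small excerpt of the mapping listed above.
// The variable name is hypothetical.
var bibliographicToThree = map[string]string{
	"chi": "zho",
	"fre": "fra",
	"ger": "deu",
}

// normalize returns the mapped identifier for a bibliographic code and
// returns the input unchanged if the table has no entry for it.
func normalize(code string) string {
	if v, ok := bibliographicToThree[code]; ok {
		return v
	}
	return code
}

func main() {
	fmt.Println(normalize("ger")) // mapped via the table
	fmt.Println(normalize("eng")) // no entry, passed through
}
```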
var ISO639NameToThree = map[string]string{}/* 7849 elements not displayed */
ISO639NameToThree maps a language name to three letter identifier.
var ISO639NameToThreeLower = map[string]string{}/* 7850 elements not displayed */
var ISO639OneToThree = map[string]string{}/* 184 elements not displayed */
ISO639OneToThree maps a two-letter ISO 639-1 identifier (where one exists) to a three-letter ISO 639-3 identifier.
var ISSNPattern = regexp.MustCompile(`[0-9]{4,4}-[0-9]{3,3}[0-9X]`)
ISSNPattern is a regular expression matching standard ISSN.
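The following sketch extracts ISSN candidates from free text with the same expression. The `findISSNs` helper is a hypothetical name. Note that the pattern is unanchored and does not validate the check digit, so it can also match inside longer digit runs.

```go
package main

import (
	"fmt"
	"regexp"
)

// issnPattern repeats the expression documented above.
var issnPattern = regexp.MustCompile(`[0-9]{4,4}-[0-9]{3,3}[0-9X]`)

// findISSNs returns all non-overlapping matches in text.
func findISSNs(text string) []string {
	return issnPattern.FindAllString(text, -1)
}

func main() {
	fmt.Println(findISSNs("print 0028-0836, online 1476-4687, other 2049-363X"))
	// The pattern is unanchored: this finds a match inside the longer run.
	fmt.Println(findISSNs("12345-67890"))
}
```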
Functions ¶
func DetectLang3 ¶
DetectLang3 returns the best guess 3-letter language code for a given text.
func LanguageIdentifier ¶ added in v0.1.130
LanguageIdentifier returns the three letter identifier from any string. All data from http://www-01.sil.org/iso639-3/codes.asp.
func LoadSet ¶ added in v0.1.130
LoadSet reads the content from a reader and adds each line to the given set.
func ReadLines ¶ added in v0.1.130
ReadLines returns a list of trimmed lines in a file. Empty lines are skipped.
func UnescapeTrim ¶
UnescapeTrim unescapes HTML character references and trims the space of a given string.
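The described behavior can be approximated with the standard library alone. The `unescapeTrim` function below is a hypothetical sketch; unescaping before trimming is an assumption about the order of operations.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// unescapeTrim is a hypothetical stand-in: it first resolves HTML
// character references, then trims surrounding whitespace.
func unescapeTrim(s string) string {
	return strings.TrimSpace(html.UnescapeString(s))
}

func main() {
	fmt.Printf("%q\n", unescapeTrim("  Fish &amp; Chips &lt;1&gt; "))
}
```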
func UnfreezeFilterConfig ¶ added in v0.1.130
UnfreezeFilterConfig takes the name of a zip file (from span-freeze) and returns the path of the thawed filterconfig (along with the temporary directory and an error). When this function returns, all URLs in the filterconfig have been replaced by absolute paths on the file system. Cleanup of the temporary directory is the responsibility of the caller.
Types ¶
type ArrayFlags ¶ added in v0.1.130
type ArrayFlags []string
ArrayFlags allows storing lists of flag values.
func (*ArrayFlags) Set ¶ added in v0.1.130
func (f *ArrayFlags) Set(value string) error
Set appends a value.
func (*ArrayFlags) String ¶ added in v0.1.130
func (f *ArrayFlags) String() string
String representation.
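The two methods above match the standard library's `flag.Value` interface, which is what lets a flag be given multiple times. The sketch below redefines a similar type locally; the method bodies, the `parse` helper and the `-f` flag name are assumptions for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// arrayFlags mirrors the documented type; the method bodies are assumptions.
type arrayFlags []string

// String joins all collected values.
func (f *arrayFlags) String() string { return strings.Join(*f, ", ") }

// Set appends a value; the flag package calls it once per occurrence.
func (f *arrayFlags) Set(value string) error {
	*f = append(*f, value)
	return nil
}

// parse collects every occurrence of the hypothetical -f flag.
func parse(args []string) ([]string, error) {
	var files arrayFlags
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.Var(&files, "f", "input file, may be repeated")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return files, nil
}

func main() {
	files, err := parse([]string{"-f", "a.json", "-f", "b.json"})
	if err != nil {
		panic(err)
	}
	fmt.Println(files)
}
```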
type FileReader ¶ added in v0.1.130
type FileReader struct {
	Filename string
	// contains filtered or unexported fields
}
FileReader creates a ReadCloser from a filename. It postpones error handling until the first read. TODO(miku): Throw this out.
func (*FileReader) Close ¶ added in v0.1.130
func (r *FileReader) Close() (err error)
Close closes the file.
type LinkReader ¶ added in v0.1.130
type LinkReader struct {
	Link string
	// contains filtered or unexported fields
}
LinkReader implements io.Reader for a URL.
type SavedLink ¶ added in v0.1.130
type SavedLink struct {
	Link string
	// contains filtered or unexported fields
}
SavedLink saves the content of a URL to a file.
type SavedReaders ¶ added in v0.1.130
SavedReaders takes a list of readers and persists their content in a temporary file.
func (*SavedReaders) Remove ¶ added in v0.1.130
func (r *SavedReaders) Remove()
Remove removes any leftover temporary file.
func (*SavedReaders) Save ¶ added in v0.1.130
func (r *SavedReaders) Save() (filename string, err error)
Save saves all readers to a temporary file and returns the filename.
type SkipReader ¶ added in v0.1.130
type SkipReader struct {
	CommentPrefixes []string
	// contains filtered or unexported fields
}
SkipReader skips empty lines and lines with comments.
func NewSkipReader ¶ added in v0.1.130
func NewSkipReader(r *bufio.Reader) *SkipReader
NewSkipReader creates a new SkipReader.
func (SkipReader) ReadString ¶ added in v0.1.130
func (r SkipReader) ReadString(delim byte) (s string, err error)
ReadString returns only lines that are non-empty and do not start with a comment prefix.
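The filtering described above can be sketched as a loop around `bufio.Reader.ReadString`. The `skipReader` type and `readAll` helper below are hypothetical; treating whitespace-only lines as empty and matching prefixes after trimming are assumptions, not documented behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// skipReader is a hypothetical stand-in for the documented type.
type skipReader struct {
	r               *bufio.Reader
	commentPrefixes []string
}

// skip reports whether a line is blank or starts with a comment prefix.
func (s skipReader) skip(line string) bool {
	t := strings.TrimSpace(line)
	if t == "" {
		return true
	}
	for _, p := range s.commentPrefixes {
		if strings.HasPrefix(t, p) {
			return true
		}
	}
	return false
}

// ReadString reads until it finds a line worth returning or hits an error.
func (s skipReader) ReadString(delim byte) (string, error) {
	for {
		line, err := s.r.ReadString(delim)
		if !s.skip(line) {
			return line, err
		}
		if err != nil {
			return "", err
		}
	}
}

// readAll collects all remaining lines, trimmed, until the first error
// (including io.EOF).
func readAll(input string, prefixes []string) []string {
	sr := skipReader{r: bufio.NewReader(strings.NewReader(input)), commentPrefixes: prefixes}
	var lines []string
	for {
		line, err := sr.ReadString('\n')
		if line != "" {
			lines = append(lines, strings.TrimSpace(line))
		}
		if err != nil {
			return lines
		}
	}
}

func main() {
	fmt.Println(readAll("# comment\n\nalpha\n   \nbeta", []string{"#"}))
}
```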
type WriteCounter ¶ added in v0.1.130
type WriteCounter struct {
// contains filtered or unexported fields
}
WriteCounter counts the number of bytes written through it.
func (*WriteCounter) Count ¶ added in v0.1.130
func (w *WriteCounter) Count() uint64
Count returns the number of bytes written.
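A byte counter of this kind is typically an `io.Writer` that only records how much it was given. The sketch below is a hypothetical stand-in: the `Write` method, the discarding of data and the use of atomic operations are assumptions, since only `Count` is documented above.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync/atomic"
)

// writeCounter is a hypothetical stand-in for the documented type.
type writeCounter struct {
	count uint64
}

// Write records the length of p and discards the data.
func (w *writeCounter) Write(p []byte) (int, error) {
	atomic.AddUint64(&w.count, uint64(len(p)))
	return len(p), nil
}

// Count returns the number of bytes written so far.
func (w *writeCounter) Count() uint64 {
	return atomic.LoadUint64(&w.count)
}

func main() {
	var sb strings.Builder
	var wc writeCounter
	// MultiWriter duplicates every write to the real sink and the counter.
	w := io.MultiWriter(&sb, &wc)
	io.WriteString(w, "hello, ")
	io.WriteString(w, "world")
	fmt.Println(sb.String(), wc.Count())
}
```

Combining the counter with `io.MultiWriter` keeps the counting separate from the destination, so any writer can be measured without wrapping it.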
type ZipContentReader ¶ added in v0.1.130
type ZipContentReader struct {
	Filename string
	// contains filtered or unexported fields
}
ZipContentReader returns the concatenated content of all files in a zip archive given by its filename. All content is temporarily stored in memory, so this type should only be used with smaller archives.
type ZipOrPlainLinkReader ¶ added in v0.1.130
type ZipOrPlainLinkReader struct {
	Link string
	// contains filtered or unexported fields
}
ZipOrPlainLinkReader is a reader that transparently handles zipped and uncompressed content, given a URL as string.
Directories ¶
Path | Synopsis |
---|---|
cmd | |
span-check | span-check runs quality checks on input data |
span-compare | WIP: move siskin:bin/indexcompare into a tool, factor out solr stuff into solrutil.go. |
span-crossref-snapshot | Given as single file with crossref works API messages, create a potentially smaller file, which contains only the most recent version of each document. |
span-export | span-export creates various destination formats, mostly for SOLR. |
span-freeze | Freeze file containing urls along with the content of all urls. |
span-import | span-reshape is a dumbed down span-import. |
span-join-assets | span-join-assets combines a directory of json or single column TSV configurations into a single file. |
span-local-data | The span-local-data extracts data from a JSON file - something `jq` can do just as well, albeit a bit slower. |
span-oa-filter | span-oa-filter will set x.oa to true, if the given KBART file validates a record. |
span-redact | Redact intermediate schema, that is set fulltext field to the empty string. |
span-review | span-review runs plausibility queries against a SOLR server, mostly facet queries, refs #12756. |
span-tag | span-tag takes an intermediate schema file and a configuration forest of filters for various tags and runs all filters on every record of the input to produce a stream of tagged records. |
span-update-labels | span-update-labels takes a TSV of IDs and ISILs and updates an intermediate schema record x.labels field accordingly. |
 | Package sets implements basic set types. |
encoding | |
csv | Package csv implements a decoder, that supports CSV decoding. |
formeta | Package formeta implements marshaling for formeta (metafacture internal format). |
tsv | Package tsv implements a decoder for tab separated data. |
 | Package filter implements flexible ISIL attachments with expression trees[1], serialized as JSON. |
formats | |
doaj | Package doaj maps DOAJ metadata to intermediate schema. |
dummy | Package dummy is just a minimal example. |
elsevier | TODO. |
jstor | TODO. |
 | Package licensing implements support for KBART and ISIL attachments. |
kbart | Package kbart implements support for KBART (Knowledge Bases And Related Tools working group, http://www.uksg.org/kbart/) holding files (http://www.uksg.org/kbart/s5/guidelines/data_format). |
 | Package parallel implements helpers for fast processing of line oriented inputs. |
 | Package quality implements quality checks. |
 | Package solrutil implements helpers to access a SOLR index. |