Documentation ¶
Index ¶
- func DetectAll(content []byte) ([]chardet.Result, error)
- func DetectAndConvertToUtf8(content []byte) (convertedContent []byte, charset string, confidence int, converted bool, ...)
- func DetectBest(content []byte) (r *chardet.Result, err error)
- func IsValidBig5(content []byte) bool
- func IsValidGB18030(content []byte) bool
- func IsValidGBK(content []byte) bool
- func IsValidUTF16(content []byte) (isUTF16 bool, BE bool, LE bool)
- func IsValidUTF16BE(content []byte) bool
- func IsValidUTF16LE(content []byte) bool
- func IsValidUTF8(content []byte) bool
- func ToUtf8WithDecoder(content []byte, d encoding.Decoder) ([]byte, error)
- func ToUtf8WithEncoding(content []byte, e encoding.Encoding) ([]byte, error)
- type Result
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DetectAll ¶
DetectAll returns all chardet.Results that have non-zero Confidence. The Results are sorted by Confidence in descending order.
Identical to saintfish/chardet's chardet.NewTextDetector().DetectAll(content).
func DetectAndConvertToUtf8 ¶
func DetectAndConvertToUtf8(content []byte) (convertedContent []byte, charset string, confidence int, converted bool, err error)
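DetectAndConvertToUtf8 detects the charset of content and converts content to UTF-8. Alongside the converted bytes it returns the detected charset, its confidence, and whether a conversion took place.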
func DetectBest ¶
DetectBest returns the chardet.Result with highest Confidence.
Identical to saintfish/chardet's chardet.NewTextDetector().DetectBest(content).
func IsValidBig5 ¶
Checks whether content is valid under the Big5 rules. Reference: https://zh.wikipedia.org/wiki/Big5
func IsValidGB18030 ¶
Checks whether content is valid under the GB18030 rules. Reference: https://zh.wikipedia.org/wiki/GB_18030
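How this package implements the check is not shown on this page. As a minimal sketch of what byte-level GB18030 validation can look like, the hypothetical helper validGB18030Sketch below (not part of this package) walks the input and accepts only the one-, two-, and four-byte patterns defined by the standard. It checks byte ranges only and does not verify that each sequence maps to an assigned character.

```go
package main

import "fmt"

// validGB18030Sketch is a hypothetical helper, not this package's code.
// It reports whether b consists only of well-formed GB18030 byte patterns:
//   1 byte:  00-7F
//   2 bytes: 81-FE, then 40-7E or 80-FE
//   4 bytes: 81-FE, 30-39, 81-FE, 30-39
func validGB18030Sketch(b []byte) bool {
	for i := 0; i < len(b); {
		c := b[i]
		switch {
		case c <= 0x7F:
			i++ // single-byte (ASCII) character
		case c >= 0x81 && c <= 0xFE:
			if i+1 >= len(b) {
				return false // lead byte with nothing after it
			}
			t := b[i+1]
			switch {
			case t >= 0x40 && t <= 0xFE && t != 0x7F:
				i += 2 // two-byte character
			case t >= 0x30 && t <= 0x39:
				if i+3 >= len(b) {
					return false // truncated four-byte character
				}
				b3, b4 := b[i+2], b[i+3]
				if b3 < 0x81 || b3 > 0xFE || b4 < 0x30 || b4 > 0x39 {
					return false
				}
				i += 4 // four-byte character
			default:
				return false // second byte outside every allowed range
			}
		default:
			return false // 0x80 and 0xFF never start a character here
		}
	}
	return true
}

func main() {
	fmt.Println(validGB18030Sketch([]byte{0xD6, 0xD0}))             // two-byte sequence
	fmt.Println(validGB18030Sketch([]byte{0x81, 0x30, 0x81, 0x30})) // four-byte sequence
	fmt.Println(validGB18030Sketch([]byte{0x81, 0x30, 0x81}))       // truncated four-byte sequence
}
```

A range check like this cannot tell GB18030 apart from other encodings that share the same byte ranges, so it is best read as a necessary condition rather than proof of the charset.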
func IsValidGBK ¶
Checks whether content is valid under the GBK rules. Reference: https://zh.wikipedia.org/wiki/GBK
func IsValidUTF16 ¶
Checks whether content is valid under the UTF-16 rules. Reference: https://zh.wikipedia.org/wiki/UTF-16
Returns: isUTF16 bool, BE bool, LE bool
BE: true if content is valid under the UTF-16 BE rules, false if not. LE: true if content is valid under the UTF-16 LE rules, false if not.
func IsValidUTF16BE ¶
Checks whether content is valid under the UTF-16 BE rules. Reference: https://zh.wikipedia.org/wiki/UTF-16
func IsValidUTF16LE ¶
Checks whether content is valid under the UTF-16 LE rules. Reference: https://zh.wikipedia.org/wiki/UTF-16
This function assumes content is little-endian and then uses CheckIsValidUTF16BE's method to validate content.
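The approach described above, validating little-endian input by reusing the big-endian check, can be sketched as follows. The helpers validUTF16BESketch and validUTF16LESketch are hypothetical names, not this package's functions. The sketch assumes that validity means an even byte length plus correctly paired surrogates, and it treats empty input as valid; the package may decide those cases differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// validUTF16BESketch is a hypothetical helper. It reports whether b is a
// sequence of big-endian 16-bit code units in which every high surrogate
// (D800-DBFF) is immediately followed by a low surrogate (DC00-DFFF) and
// no low surrogate appears on its own.
func validUTF16BESketch(b []byte) bool {
	if len(b)%2 != 0 {
		return false // UTF-16 is made of whole 2-byte code units
	}
	for i := 0; i < len(b); i += 2 {
		u := binary.BigEndian.Uint16(b[i:])
		switch {
		case u >= 0xD800 && u <= 0xDBFF: // high surrogate
			if i+3 >= len(b) {
				return false // nothing left to pair with
			}
			lo := binary.BigEndian.Uint16(b[i+2:])
			if lo < 0xDC00 || lo > 0xDFFF {
				return false
			}
			i += 2 // skip the low surrogate just consumed
		case u >= 0xDC00 && u <= 0xDFFF:
			return false // low surrogate without a preceding high one
		}
	}
	return true
}

// validUTF16LESketch swaps the two bytes of every code unit and then
// reuses the big-endian check.
func validUTF16LESketch(b []byte) bool {
	if len(b)%2 != 0 {
		return false
	}
	swapped := make([]byte, len(b))
	for i := 0; i+1 < len(b); i += 2 {
		swapped[i], swapped[i+1] = b[i+1], b[i]
	}
	return validUTF16BESketch(swapped)
}

func main() {
	pairBE := []byte{0xD8, 0x3D, 0xDE, 0x00} // U+1F600 as a big-endian surrogate pair
	loneBE := []byte{0xD8, 0x3D}             // high surrogate with no partner
	fmt.Println(validUTF16BESketch(pairBE), validUTF16BESketch(loneBE))
	fmt.Println(validUTF16LESketch([]byte{0x3D, 0xD8, 0x00, 0xDE})) // same pair, little-endian
}
```

Many byte sequences pass under both byte orders (for example, loneBE above is well-formed when read as little-endian), which is one reason a function such as IsValidUTF16 reports BE and LE separately.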
func IsValidUTF8 ¶
Checks whether content is valid under the UTF-8 rules.
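Whether this function delegates to the standard library is not stated here. For comparison, Go's unicode/utf8 package offers an equivalent check, wrapped below in a hypothetical helper named validUTF8Sketch:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validUTF8Sketch is a hypothetical wrapper around the standard library's
// utf8.Valid, which rejects truncated sequences, overlong encodings, and
// encoded surrogate halves.
func validUTF8Sketch(content []byte) bool {
	return utf8.Valid(content)
}

func main() {
	fmt.Println(validUTF8Sketch([]byte("héllo")))     // well-formed multi-byte text
	fmt.Println(validUTF8Sketch([]byte{0xC0, 0x80})) // overlong encoding of NUL
}
```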
func ToUtf8WithDecoder ¶
ToUtf8WithDecoder returns a UTF-8 encoded []byte, decoding content with the given encoding.Decoder.
Types ¶
type Result ¶
type Result struct {
	// IANA name of the detected charset.
	Charset string
	// IANA name of the detected language. It may be empty for some charsets.
	Language string
	// Confidence of the Result. Scale from 1 to 100. The bigger, the more confident.
	Confidence int
	// Encoding of the Result, default encoding.Nop.
	Encoding encoding.Encoding
	// Whether the charset can be converted by this package
	Convertible bool
}
Result contains all the information that the charset detector gives.
func DetectEncoding ¶
DetectEncoding returns the Result with the highest Confidence and saves the matched encoding.Encoding in the Result if confidence > 95.