Documentation
Overview ¶
Package pptx is a parser for pptx files.
Index ¶
- func FindNameIterTo(r *qxml.Reader, name string, to string) bool
- func MatchNameIterTo(r *qxml.Reader, namePattern string, toPattern string) bool
- func MaxLineLenWithPrefix(s string, prefix []byte) (string, int)
- func ParseRelsMap(f *zip.File, preffix string) (map[string]string, error)
- func StringTobytes(s string) []byte
- type Image
- type PPTx
- func (pp *PPTx) Close() (err error)
- func (pp *PPTx) ExtractImages() ([]Image, error)
- func (pp *PPTx) ExtractSlideTexts(slides ...int) (string, error)
- func (pp *PPTx) ExtractTexts() (string, error)
- func (pp *PPTx) NumSlides() int
- func (pp *PPTx) SetDrawingsNoFmt(v bool)
- func (pp *PPTx) SetOcrInterface(ocr parsers.OCR)
- func (pp *PPTx) SetParseCharts(v bool)
- func (pp *PPTx) SetParseDiagrams(v bool)
- func (pp *PPTx) SetParseImages(v bool)
- func (pp *PPTx) SetPhraseSep(sep string)
- func (pp *PPTx) SetSlideSep(sep string)
- func (pp *PPTx) SetTableColSep(sep string)
- func (pp *PPTx) SetTableRowSep(sep string)
- type Parser
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func FindNameIterTo ¶
FindNameIterTo finds the given name iteratively in the qxml Reader until it reaches the specified end element.
Parameters:
- r: a pointer to the qxml Reader.
- name: the name to search for in the qxml Reader.
- to: the end element name to stop the search.
Returns:
- true if the name is found before reaching the end element, false otherwise.
func MatchNameIterTo ¶
MatchNameIterTo reads from the given qxml.Reader iteratively, matching each element name against namePattern and toPattern. It returns true as soon as namePattern matches, and false if toPattern matches first or the end of the reader is reached.
Parameters:
- r: A pointer to a qxml.Reader object
- namePattern: The regular expression pattern to match the name being searched for
- toPattern: The regular expression pattern to match the element at which the search stops
Returns:
- bool: true if the name pattern is matched, false otherwise
func MaxLineLenWithPrefix ¶
MaxLineLenWithPrefix calculates the maximum line length in a string with a given prefix.
Parameters:
- s: the input string
- prefix: the prefix to add to each line
Returns:
- string: the modified string with the added prefix
- int: the maximum line length (including the prefix)
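A minimal sketch of the behaviour described above, assuming that lines are split on "\n" and that length is counted in bytes; neither detail is confirmed by this documentation. `prefixLines` is a hypothetical name used for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixLines is a hypothetical re-implementation for illustration.
// It prepends prefix to every line of s and returns the new string together
// with the longest resulting line length in bytes (prefix included).
func prefixLines(s string, prefix []byte) (string, int) {
	lines := strings.Split(s, "\n")
	maxLen := 0
	for i, line := range lines {
		lines[i] = string(prefix) + line
		if n := len(lines[i]); n > maxLen {
			maxLen = n
		}
	}
	return strings.Join(lines, "\n"), maxLen
}

func main() {
	out, n := prefixLines("Title\nA longer bullet", []byte("  "))
	fmt.Printf("%q %d\n", out, n)
}
```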
func ParseRelsMap ¶
ParseRelsMap parses a file inside a zip archive and returns a mapping of relationship IDs to target strings.
Parameters:
- f: *zip.File object representing the archive entry to parse.
- preffix: string prefix used to construct the full part name (the target string).
Returns:
- map[string]string: a mapping of relationship IDs to target strings.
- error: an error object indicating any error occurred during parsing.
func StringTobytes ¶
StringTobytes converts a string to a byte slice.
It is implemented with the `unsafe` package so that the conversion is zero-cost.
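One common way to write such a conversion in Go 1.20 or later is shown below. It is a sketch of the general technique, not necessarily how this package implements it, and `stringToBytes` is a hypothetical name. Since the returned slice shares memory with the string, it must be treated as read-only.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes is a hypothetical illustration of a zero-copy conversion.
// The returned slice aliases the string's memory and must never be modified,
// because Go strings are immutable.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("slide")
	fmt.Println(len(b), string(b))
}
```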
Types ¶
type PPTx ¶
type PPTx struct {
// contains filtered or unexported fields
}
PPTx represents the XML file structure and settings for parsing a pptx file.
func (*PPTx) Close ¶
Close closes the zipReader and the OCR client. Call it once you have finished extracting text.
func (*PPTx) ExtractImages ¶
ExtractImages extracts images from the pptx file.
Parameters:
- None
Returns:
- []types.Image: a slice of images extracted from the pptx file.
- error: an error if any occurred during the extraction process.
func (*PPTx) ExtractSlideTexts ¶
ExtractSlideTexts extracts the texts from the specified pptx slides (numbered from 1).
It takes in one or more slide numbers as parameters and returns a string containing the extracted texts. The function also returns an error if there is any issue with parsing the slides.
Parameters:
- slides: One or more slide numbers (variadic) to extract texts from.
Returns:
- string: A string containing the extracted texts.
- error: An error object if there is any issue with parsing the slides.
func (*PPTx) ExtractTexts ¶
ExtractTexts extracts the texts from the pptx file.
It iterates through each slide of the pptx file and appends the text content to a strings.Builder object. The extracted texts are then returned as a string. If there is an error encountered during the parsing of a slide, the function returns the extracted texts up to that point, along with the error.
Returns:
- string: The extracted texts from the pptx file.
- error: An error, if any, encountered during the parsing of the slides.
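Because partial text is returned alongside the error, callers may want to keep the text even when extraction fails. The sketch below shows one way to do that while still closing the parser. The `textExtractor` interface and the `fakeDeck` type are hypothetical and exist only so the example runs without a real file; the two method signatures are the ones listed in the index for `*PPTx`.

```go
package main

import (
	"errors"
	"fmt"
)

// textExtractor names the two *PPTx methods this sketch relies on.
// The interface itself is hypothetical.
type textExtractor interface {
	ExtractTexts() (string, error)
	Close() error
}

// extractAll returns whatever text was extracted, even on failure, and
// always closes the parser. A Close error is reported only if extraction
// itself succeeded.
func extractAll(p textExtractor) (text string, err error) {
	defer func() {
		if cerr := p.Close(); err == nil {
			err = cerr
		}
	}()
	return p.ExtractTexts()
}

// fakeDeck simulates a deck whose second slide fails to parse.
type fakeDeck struct{ closed bool }

func (f *fakeDeck) ExtractTexts() (string, error) {
	return "Slide 1 text", errors.New("slide 2: malformed XML")
}

func (f *fakeDeck) Close() error {
	f.closed = true
	return nil
}

func main() {
	d := &fakeDeck{}
	text, err := extractAll(d)
	fmt.Printf("partial=%q err=%v closed=%v\n", text, err, d.closed)
}
```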
func (*PPTx) SetDrawingsNoFmt ¶
SetDrawingsNoFmt sets whether text from drawings is output without outline formatting.
func (*PPTx) SetOcrInterface ¶
SetOcrInterface overrides the default OCR interface.
func (*PPTx) SetParseCharts ¶
SetParseCharts sets whether charts are parsed. Default is false.
func (*PPTx) SetParseDiagrams ¶
SetParseDiagrams sets whether diagrams are parsed. Default is false.
func (*PPTx) SetParseImages ¶
SetParseImages sets whether images are parsed. Default is false. If no OCR interface has been set, the default tesseract-ocr is used.
func (*PPTx) SetPhraseSep ¶
SetPhraseSep sets the phrase separator. Default is " ".
func (*PPTx) SetSlideSep ¶
SetSlideSep sets the slide text separator. Default is "-" repeated 100 times.
func (*PPTx) SetTableColSep ¶
SetTableColSep sets table column separator. Default is "\t".
func (*PPTx) SetTableRowSep ¶
SetTableRowSep sets table row separator. Default is "\n".
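To make the documented defaults concrete, the sketch below joins table cells and rows with "\t" and "\n" and builds the 100-dash slide separator. How the package actually assembles its output is not described here, so `renderTable` is a hypothetical helper that only illustrates what the separator values mean.

```go
package main

import (
	"fmt"
	"strings"
)

// renderTable is a hypothetical helper: it joins the cells of each row with
// colSep and the rows with rowSep.
func renderTable(rows [][]string, colSep, rowSep string) string {
	lines := make([]string, len(rows))
	for i, row := range rows {
		lines[i] = strings.Join(row, colSep)
	}
	return strings.Join(lines, rowSep)
}

func main() {
	// Documented defaults.
	colSep, rowSep := "\t", "\n"
	slideSep := strings.Repeat("-", 100)

	table := [][]string{{"Quarter", "Revenue"}, {"Q1", "120"}}
	fmt.Println(renderTable(table, colSep, rowSep))
	fmt.Println(slideSep)
}
```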