pptx

package
v1.6.6
Warning

This package is not in the latest version of its module.

Published: Jul 3, 2025 License: MIT Imports: 15 Imported by: 0

Documentation

Overview

Package pptx is a parser for pptx (PowerPoint) files.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FindNameIterTo

func FindNameIterTo(r *qxml.Reader, name string, to string) bool

FindNameIterTo advances the qxml.Reader element by element, searching for an element with the given name, and stops when it reaches the specified end element.

Parameters:

  • r: a pointer to the qxml Reader.
  • name: the name to search for in the qxml Reader.
  • to: the end element name to stop the search.

Returns:

  • true if the name is found before reaching the end element, false otherwise.

func MatchNameIterTo

func MatchNameIterTo(r *qxml.Reader, namePattern string, toPattern string) bool

MatchNameIterTo iterates through the given qxml.Reader, matching element names against namePattern and toPattern. It returns true if an element name matches namePattern, and false if an element matching toPattern is reached first or the end of the reader is reached.

Parameters:

  • r: a pointer to a qxml.Reader.
  • namePattern: the regular expression to match the element name being searched for.
  • toPattern: the regular expression to match the element name at which the search stops.

Returns:

  • bool: true if namePattern is matched, false otherwise.

func MaxLineLenWithPrefix

func MaxLineLenWithPrefix(s string, prefix []byte) (string, int)

MaxLineLenWithPrefix prepends prefix to each line of s and returns the resulting string together with the length of its longest line, prefix included.

Parameters:

  • s: the input string
  • prefix: the prefix to add to each line

Returns:

  • string: the modified string with the added prefix
  • int: the maximum line length (including the prefix)
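The documented contract can be sketched as follows. maxLineLenWithPrefix is a hypothetical re-implementation, and counting length in bytes is an assumption; the real function might count runes or display width instead.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLineLenWithPrefix prepends prefix to every line of s and reports the
// length (in bytes, an assumption) of the longest resulting line.
func maxLineLenWithPrefix(s string, prefix []byte) (string, int) {
	lines := strings.Split(s, "\n")
	maxLen := 0
	for i, line := range lines {
		lines[i] = string(prefix) + line
		if n := len(lines[i]); n > maxLen {
			maxLen = n
		}
	}
	return strings.Join(lines, "\n"), maxLen
}

func main() {
	out, n := maxLineLenWithPrefix("ab\nabcd", []byte("> "))
	fmt.Printf("%q %d\n", out, n) // "> ab\n> abcd" 6
}
```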

func ParseRelsMap

func ParseRelsMap(f *zip.File, preffix string) (map[string]string, error)

ParseRelsMap parses a relationships part (a .rels file) inside the pptx archive and returns a mapping of relationship IDs to target strings.

Parameters:

  • f: the *zip.File entry for the .rels part to parse.
  • preffix: string prefix used to construct the full part name (target string).

Returns:

  • map[string]string: a mapping of relationship IDs to target strings.
  • error: a non-nil error if reading or parsing the part fails.

func StringTobytes

func StringTobytes(s string) []byte

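StringTobytes converts a string to a byte slice without copying.

It takes a string parameter `s` and returns a byte slice that shares the string's memory.

This function is implemented using the `unsafe` package to achieve a zero-copy conversion. Because Go strings are immutable, the returned slice must not be modified.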
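One way to write such a zero-copy conversion, assuming Go 1.20 or later, uses unsafe.StringData and unsafe.Slice. The package's own implementation may differ (older code often relied on reflect.StringHeader); stringToBytes is an illustrative name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte aliasing the bytes of s without copying
// (requires Go 1.20+). Writing through the result is undefined behavior,
// because the underlying string memory is immutable.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("slide")
	fmt.Println(len(b), string(b)) // 5 slide
}
```

The trade-off is the usual one for unsafe conversions: it saves an allocation on hot paths such as feeding XML names to a matcher, at the cost of a read-only contract the compiler cannot enforce.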

Types

type Image

type Image struct {
	Raw    image.Image
	Name   string
	Format string
}

type PPTx

type PPTx struct {
	// contains filtered or unexported fields
}

PPTx represents the XML file structure and settings for parsing a pptx file.

func NewPPTx

func NewPPTx() *PPTx

NewPPTx creates a new PPTx.

func (*PPTx) Close

func (pp *PPTx) Close() (err error)

Close closes the underlying zip reader and the OCR client. Call it once you have finished extracting text.

func (*PPTx) ExtractImages

func (pp *PPTx) ExtractImages() ([]Image, error)

ExtractImages extracts images from the pptx file.

Parameters:

  • None

Returns:

  • []Image: a slice of images extracted from the pptx file.
  • error: an error if any occurred during the extraction process.

func (*PPTx) ExtractSlideTexts

func (pp *PPTx) ExtractSlideTexts(slides ...int) (string, error)

ExtractSlideTexts extracts the texts from the specified pptx slides (slide numbers start at 1).

It takes one or more slide numbers and returns a string containing the extracted texts. It also returns an error if any of the slides cannot be parsed.

Parameters:

  • slides: one or more slide numbers (starting at 1) to extract texts from.

Returns:

  • string: A string containing the extracted texts.
  • error: An error object if there is any issue with parsing the slides.
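Since slide numbers start at 1, a caller may want to check them against NumSlides before calling ExtractSlideTexts. checkSlides is a hypothetical caller-side helper; how the library itself handles out-of-range numbers is not documented here.

```go
package main

import "fmt"

// checkSlides rejects an empty request and any slide number outside
// 1..numSlides, where numSlides would come from (*PPTx).NumSlides.
func checkSlides(numSlides int, slides ...int) error {
	if len(slides) == 0 {
		return fmt.Errorf("no slides requested")
	}
	for _, s := range slides {
		if s < 1 || s > numSlides {
			return fmt.Errorf("slide %d out of range 1..%d", s, numSlides)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkSlides(5, 1, 2)) // <nil>
	fmt.Println(checkSlides(5, 0))    // slide 0 out of range 1..5
}
```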

func (*PPTx) ExtractTexts

func (pp *PPTx) ExtractTexts() (string, error)

ExtractTexts extracts the texts from the pptx file.

It iterates through each slide of the pptx file and appends the text content to a strings.Builder. The extracted texts are then returned as a string. If an error occurs while parsing a slide, the function returns the texts extracted up to that point, along with the error.

Returns:

  • string: The extracted texts from the pptx file.
  • error: An error, if any, encountered during the parsing of the slides.

func (*PPTx) NumSlides

func (pp *PPTx) NumSlides() int

NumSlides returns the number of slides.

func (*PPTx) SetDrawingsNoFmt

func (pp *PPTx) SetDrawingsNoFmt(v bool)

SetDrawingsNoFmt sets whether text from drawings is extracted without outline formatting.

func (*PPTx) SetOcrInterface

func (pp *PPTx) SetOcrInterface(ocr parsers.OCR)

SetOcrInterface overrides the default OCR interface with ocr.

func (*PPTx) SetParseCharts

func (pp *PPTx) SetParseCharts(v bool)

SetParseCharts sets whether charts are parsed. Default is false.

func (*PPTx) SetParseDiagrams

func (pp *PPTx) SetParseDiagrams(v bool)

SetParseDiagrams sets whether diagrams are parsed. Default is false.

func (*PPTx) SetParseImages

func (pp *PPTx) SetParseImages(v bool)

SetParseImages sets whether images are parsed. Default is false. When no OCR interface has been set, the default tesseract-ocr is used.

func (*PPTx) SetPhraseSep

func (pp *PPTx) SetPhraseSep(sep string)

SetPhraseSep sets the phrase separator. Default is " ".

func (*PPTx) SetSlideSep

func (pp *PPTx) SetSlideSep(sep string)

SetSlideSep sets the slide text separator. Default is a line of 100 hyphens ("-" repeated 100 times).

func (*PPTx) SetTableColSep

func (pp *PPTx) SetTableColSep(sep string)

SetTableColSep sets the table column separator. Default is "\t".

func (*PPTx) SetTableRowSep

func (pp *PPTx) SetTableRowSep(sep string)

SetTableRowSep sets the table row separator. Default is "\n".
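With the defaults, a table is flattened to tab-separated cells and newline-separated rows. The sketch shows how the two separators could combine; renderTable is hypothetical, and whether the library also emits the row separator after the last row is not documented, so here it only goes between rows.

```go
package main

import (
	"fmt"
	"strings"
)

// renderTable joins the cells of each row with colSep and the rows with rowSep.
func renderTable(rows [][]string, colSep, rowSep string) string {
	lines := make([]string, len(rows))
	for i, cells := range rows {
		lines[i] = strings.Join(cells, colSep)
	}
	return strings.Join(lines, rowSep)
}

func main() {
	rows := [][]string{{"Quarter", "Revenue"}, {"Q1", "10"}}
	fmt.Printf("%q\n", renderTable(rows, "\t", "\n")) // "Quarter\tRevenue\nQ1\t10"
}
```

Choosing a column separator that cannot appear in cell text (such as "\t" for most slide content) keeps the output easy to split back into columns.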

type Parser

type Parser struct{}

func (*Parser) Parse

func (p *Parser) Parse(ctx context.Context, reader document.ParserReader, writer io.Writer) error

Parse tries to parse pptx content from a document.ParserReader and writes the result to an io.Writer.
