pptx

package

v1.6.6 (latest) | Published: Jul 3, 2025 | License: MIT | Imports: 15 | Imported by: 0

Repository

github.com/bububa/atomic-agents

Documentation ¶

Overview ¶

Package pptx implements a parser for pptx files.

Index ¶

func FindNameIterTo(r *qxml.Reader, name string, to string) bool
func MatchNameIterTo(r *qxml.Reader, namePattern string, toPattern string) bool
func MaxLineLenWithPrefix(s string, prefix []byte) (string, int)
func ParseRelsMap(f *zip.File, preffix string) (map[string]string, error)
func StringTobytes(s string) []byte
type Image
type PPTx
- func NewPPTx() *PPTx
type Parser
- func (p *Parser) Parse(ctx context.Context, reader document.ParserReader, writer io.Writer) error

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func FindNameIterTo ¶

func FindNameIterTo(r *qxml.Reader, name string, to string) bool

FindNameIterTo finds the given name iteratively in the qxml Reader until it reaches the specified end element.

Parameters:

r: a pointer to the qxml Reader.
name: the name to search for in the qxml Reader.
to: the end element name to stop the search.

Returns:

true if the name is found before reaching the end element, false otherwise.

func MatchNameIterTo ¶

func MatchNameIterTo(r *qxml.Reader, namePattern string, toPattern string) bool

MatchNameIterTo iterates over the given qxml.Reader, matching element names against namePattern and toPattern. It returns true if namePattern is matched first, and false if toPattern is matched first or the end of the reader is reached.

Parameters:

r: A pointer to a qxml.Reader object
namePattern: The regular expression pattern for the name to search for
toPattern: The regular expression pattern for the end element at which the search stops

Returns:

bool: true if the name pattern is matched, false otherwise

func MaxLineLenWithPrefix ¶

func MaxLineLenWithPrefix(s string, prefix []byte) (string, int)

MaxLineLenWithPrefix adds the given prefix to each line of a string and calculates the maximum line length of the result.

Parameters:

s: the input string
prefix: the prefix to add to each line

Returns:

string: the modified string with the added prefix
int: the maximum line length (including the prefix)
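
As an illustration of the documented contract, here is a minimal sketch. prefixLines is a hypothetical re-implementation, and it assumes that lines are separated by "\n" and that length is measured in bytes. The package may differ on both points.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixLines is a hypothetical sketch of the documented behaviour: it adds
// prefix to every line of s and reports the longest resulting line.
func prefixLines(s string, prefix []byte) (string, int) {
	lines := strings.Split(s, "\n")
	maxLen := 0
	for i, line := range lines {
		lines[i] = string(prefix) + line
		if n := len(lines[i]); n > maxLen { // byte length, an assumption
			maxLen = n
		}
	}
	return strings.Join(lines, "\n"), maxLen
}

func main() {
	out, n := prefixLines("title\nlonger body line", []byte("> "))
	fmt.Println(out)
	fmt.Println("max line length:", n)
}
```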

func ParseRelsMap ¶

func ParseRelsMap(f *zip.File, preffix string) (map[string]string, error)

ParseRelsMap parses a relationships file stored in a zip archive and returns a mapping of relationship IDs to target strings.

Parameters:

f: *zip.File object representing the zip entry to parse.
preffix: string prefix used to construct the full part name (the target string).

Returns:

map[string]string: a mapping of relationship IDs to target strings.
error: an error object indicating any error occurred during parsing.

func StringTobytes ¶

func StringTobytes(s string) []byte

StringTobytes converts a string to a byte slice.

It takes a string parameter `s` and returns a byte slice.

This function is implemented using the `unsafe` package to achieve a zero-cost conversion.
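
One common way to write such a conversion is shown below. This is a sketch of the general technique, not the package's source: stringToBytes is a hypothetical name, and the code assumes Go 1.20 or later for unsafe.StringData. The returned slice shares memory with the string, so it must be treated as read-only.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes is a hypothetical zero-copy conversion (requires Go 1.20+).
// The returned slice aliases the string's memory and must never be written to.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("slide")
	fmt.Println(len(b), string(b))
}
```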

Types ¶

type Image ¶

type Image struct {
	Raw    image.Image
	Name   string
	Format string
}

type PPTx ¶

type PPTx struct {
	// contains filtered or unexported fields
}

PPTx represents the XML file structure and settings for parsing a pptx file.

func NewPPTx ¶

func NewPPTx() *PPTx

func (*PPTx) Close ¶

func (pp *PPTx) Close() (err error)

Close closes the zipReader and the OCR client. Call it once text extraction is finished.

func (*PPTx) ExtractImages ¶

func (pp *PPTx) ExtractImages() ([]Image, error)

ExtractImages extracts images from the pptx file.

Parameters:

None

Returns:

[]Image: a slice of images extracted from the pptx file.
error: an error if any occurred during the extraction process.
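
How the three Image fields can be filled is sketched below with the standard image package, whose Decode function returns both the decoded image and its format name. extractMedia and buildSample are hypothetical helpers. The sketch assumes images live under ppt/media/ in the archive and that undecodable entries are skipped. The package's actual selection logic is not documented here.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"image"
	"image/png" // also registers the PNG decoder with image.Decode
	"strings"
)

// Image mirrors the package's documented Image struct.
type Image struct {
	Raw    image.Image
	Name   string
	Format string
}

// extractMedia is a hypothetical helper: it decodes every entry under
// ppt/media/ that a registered image decoder understands.
func extractMedia(zr *zip.Reader) ([]Image, error) {
	var imgs []Image
	for _, f := range zr.File {
		if !strings.HasPrefix(f.Name, "ppt/media/") {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return imgs, err
		}
		raw, format, err := image.Decode(rc)
		rc.Close()
		if err != nil {
			continue // assumption: skip entries that are not decodable images
		}
		imgs = append(imgs, Image{Raw: raw, Name: f.Name, Format: format})
	}
	return imgs, nil
}

// buildSample writes a tiny in-memory archive containing one 2x2 PNG.
func buildSample() (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("ppt/media/image1.png")
	if err != nil {
		return nil, err
	}
	if err := png.Encode(w, image.NewRGBA(image.Rect(0, 0, 2, 2))); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	zr, err := buildSample()
	if err != nil {
		fmt.Println("build error:", err)
		return
	}
	imgs, err := extractMedia(zr)
	if err != nil {
		fmt.Println("extract error:", err)
		return
	}
	for _, img := range imgs {
		fmt.Println(img.Name, img.Format, img.Raw.Bounds())
	}
}
```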

func (*PPTx) ExtractSlideTexts ¶

func (pp *PPTx) ExtractSlideTexts(slides ...int) (string, error)

ExtractSlideTexts extracts the texts from the specified pptx slides (slide numbers start at 1).

It takes in one or more slide numbers as parameters and returns a string containing the extracted texts. The function also returns an error if there is any issue with parsing the slides.

Parameters:

slides: One or more slide numbers (variadic, starting at 1) to extract texts from.

Returns:

string: A string containing the extracted texts.
error: An error object if there is any issue with parsing the slides.

func (*PPTx) ExtractTexts ¶

func (pp *PPTx) ExtractTexts() (string, error)

ExtractTexts extracts the texts from the pptx file.

It iterates through each slide of the pptx file and appends the text content to a strings.Builder. The extracted texts are then returned as a string. If an error occurs while parsing a slide, the function returns the texts extracted up to that point, along with the error.

Returns:

string: The extracted texts from the pptx file.
error: An error, if any, encountered during the parsing of the slides.

func (*PPTx) NumSlides ¶

func (pp *PPTx) NumSlides() int

NumSlides returns the number of slides.

func (*PPTx) SetDrawingsNoFmt ¶

func (pp *PPTx) SetDrawingsNoFmt(v bool)

SetDrawingsNoFmt sets whether text from drawings is output without outline formatting.

func (*PPTx) SetOcrInterface ¶

func (pp *PPTx) SetOcrInterface(ocr parsers.OCR)

SetOcrInterface overrides the default OCR interface.

func (*PPTx) SetParseCharts ¶

func (pp *PPTx) SetParseCharts(v bool)

SetParseCharts sets whether charts are parsed. The default is false.

func (*PPTx) SetParseDiagrams ¶

func (pp *PPTx) SetParseDiagrams(v bool)

SetParseDiagrams sets whether diagrams are parsed. The default is false.

func (*PPTx) SetParseImages ¶

func (pp *PPTx) SetParseImages(v bool)

SetParseImages sets whether images are parsed. The default is false. If no OCR interface has been set, the default tesseract-ocr is used.

func (*PPTx) SetPhraseSep ¶

func (pp *PPTx) SetPhraseSep(sep string)

SetPhraseSep sets the phrase separator. The default is " ".

func (*PPTx) SetSlideSep ¶

func (pp *PPTx) SetSlideSep(sep string)

SetSlideSep sets the slide text separator. The default is "-" repeated 100 times.

func (*PPTx) SetTableColSep ¶

func (pp *PPTx) SetTableColSep(sep string)

SetTableColSep sets the table column separator. The default is "\t".

func (*PPTx) SetTableRowSep ¶

func (pp *PPTx) SetTableRowSep(sep string)

SetTableRowSep sets the table row separator. The default is "\n".
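
To make the separator defaults concrete, the sketch below renders a small table with "\t" between columns and "\n" between rows, and builds the documented default slide separator. renderTable is a hypothetical helper. Whether the package joins cells this way, or appends a separator after every cell, is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// renderTable is a hypothetical helper that joins cells with colSep and
// rows with rowSep.
func renderTable(rows [][]string, colSep, rowSep string) string {
	lines := make([]string, len(rows))
	for i, cells := range rows {
		lines[i] = strings.Join(cells, colSep)
	}
	return strings.Join(lines, rowSep)
}

func main() {
	rows := [][]string{{"Quarter", "Revenue"}, {"Q1", "120"}}
	fmt.Printf("%q\n", renderTable(rows, "\t", "\n"))

	// The documented default slide separator: "-" repeated 100 times.
	slideSep := strings.Repeat("-", 100)
	fmt.Println(len(slideSep))
}
```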

type Parser ¶

type Parser struct{}

func (*Parser) Parse ¶

func (p *Parser) Parse(ctx context.Context, reader document.ParserReader, writer io.Writer) error

Parse tries to parse pptx content from a document.ParserReader and writes the result to an io.Writer.
