zipper

package
v0.2.4
Warning

This package is not in the latest version of its module.

Published: Mar 18, 2025 License: MIT Imports: 20 Imported by: 0

README

Zip compress and extract

This module provides ZIP utility functions: scanning the central directory for file headers, compressing a file or a directory (recursively, with or without root directory unwrapping), and extracting all files from a ZIP archive.

package main

import (
	"context"
	"os"

	"github.com/nguyengg/xy3/zipper"
)

func main() {
	// to compress directory "path/to/dir", I must first create the archive file and open for writing.
	archive, _ := os.CreateTemp("", "*.zip")
	defer os.Remove(archive.Name())

	_ = zipper.CompressDir(context.TODO(), "path/to/dir", archive, func(options *zipper.CompressDirOptions) {
		zipper.WithBestCompression(&options.CompressOptions)

		// If "path/to/dir" looks like this:
		//	path/to/dir/test/a.txt
		//	path/to/dir/test/path/b.txt
		//	path/to/dir/test/another/path/c.txt
		//
		// With UnwrapRoot=false (default), archive.zip looks like this:
		//	test/a.txt
		//	test/path/b.txt
		//	test/another/path/c.txt
		//
		// With UnwrapRoot=true, archive.zip looks like this:
		//	a.txt
		//	path/b.txt
		//	another/path/c.txt
		//
		// If I'm using xy3 to both compress and extract, generally it's safe to turn UnwrapRoot on because
		// extract will automatically unwrap root for me.
		options.UnwrapRoot = true
	})
	_ = archive.Close()

	// this is how I'd extract the archive in full.
	// the method returns the actual output directory that was created/used so that if there was an error, I can
	// delete output directory to clean up artifacts.
	dir, _ := zipper.Extract(context.TODO(), archive.Name(), ".", func(options *zipper.ExtractOptions) {
		// Generally I want to leave this false so that Extract will create a new directory for me to prevent
		// conflicts. Extract will use the name of the archive ("archive.zip") to create directory such as
		// archive, archive-1, archive-2, etc.
		//
		// If I don't want Extract to create a new directory, set this to true, in which case the dir argument
		// must point to a valid directory (Extract can take care of creating this directory for me as well).
		options.UseGivenDirectory = false

		// It is best to leave this setting off (default) as well. The setting is applicable only if the archive
		// actually has a single common root directory like:
		//	test/a.txt
		//	test/path/b.txt
		//	test/another/path/c.txt
		//
		// If that's the case, the output directory would look like this with NoUnwrapRoot=false:
		//	./archive/a.txt
		//	./archive/path/b.txt
		//	./archive/another/path/c.txt
		//
		// If "./archive" already exists, Extract would have used "./archive-1", "./archive-2", etc.
		//
		// If I pass NoUnwrapRoot=true, the output directory would look like this:
		//	./archive/test/a.txt
		//	./archive/test/path/b.txt
		//	./archive/test/another/path/c.txt
		//
		// If I pass both NoUnwrapRoot=true and UseGivenDirectory=true, the output directory would look like:
		//	./test/a.txt
		//	./test/path/b.txt
		//	./test/another/path/c.txt
		//
		// And finally, if I pass NoUnwrapRoot=false and UseGivenDirectory=true, the output directory becomes:
		//	./a.txt
		//	./path/b.txt
		//	./another/path/c.txt
		options.NoUnwrapRoot = false
	})
}

Documentation


Constants

const (
	// DefaultBufferSize is the default value for [Compressor.BufferSize], which is 32 KiB.
	DefaultBufferSize = 32 * 1024
)

Variables

var ErrNoEOCDFound = errors.New("end of central directory not found; most likely not a zip file")

ErrNoEOCDFound is returned by NewCDScanner if no EOCD was found.

Functions

func CompressDir

func CompressDir(ctx context.Context, dir string, dst io.Writer, optFns ...func(*CompressDirOptions)) error

CompressDir compresses a directory recursively to the archive opened as io.Writer.

See CompressDirOptions for customisation options. For example, if the directory (specified by "dir" argument) is "my-dir" and contains:

my-dir/a.txt
my-dir/path/b.txt
my-dir/another/path/c.txt

By default, the archive content looks like this:

my-dir/a.txt
my-dir/path/b.txt
my-dir/another/path/c.txt

If [CompressDirOptions.UnwrapRoot] is true, the archive content looks like this:

a.txt
path/b.txt
another/path/c.txt

If [CompressDirOptions.WriteDir] is true and [CompressDirOptions.UnwrapRoot] is false, the archive content becomes:

my-dir/
my-dir/a.txt
my-dir/path/
my-dir/path/b.txt
my-dir/another/
my-dir/another/path/
my-dir/another/path/c.txt

If both [CompressDirOptions.WriteDir] and [CompressDirOptions.UnwrapRoot] are true, the archive content becomes:

a.txt
path/
path/b.txt
another/
another/path/
another/path/c.txt

This function is a wrapper around [DefaultZipper.CompressDir].
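
The mapping from paths on disk to entry names described above can be sketched with the standard library alone. This is an illustration, not the package's implementation: entryName is a hypothetical helper, and it assumes the root kept in the archive is simply the last element of dir.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// entryName is a hypothetical helper (not part of zipper) that maps a file on
// disk to the name it would get inside the archive.
func entryName(dir, path string, unwrapRoot bool) (string, error) {
	base := dir
	if !unwrapRoot {
		// Keep the last element of dir ("my-dir") as the archive's root by
		// computing names relative to dir's parent instead of dir itself.
		base = filepath.Dir(filepath.Clean(dir))
	}
	rel, err := filepath.Rel(base, path)
	if err != nil {
		return "", err
	}
	// ZIP entry names use forward slashes regardless of the host OS.
	return filepath.ToSlash(rel), nil
}

func main() {
	for _, unwrap := range []bool{false, true} {
		name, err := entryName("my-dir", "my-dir/path/b.txt", unwrap)
		fmt.Println(unwrap, name, err)
	}
}
```

Writing directory entries (the WriteDir case) would additionally emit each directory's relative path with a trailing slash before its files.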

func CompressFile

func CompressFile(ctx context.Context, name string, dst io.Writer, optFns ...func(*CompressOptions)) error

CompressFile compresses a single file to the archive opened as io.Writer.

func CountDirContents

func CountDirContents(ctx context.Context, root string) (n int, size int64, err error)

CountDirContents uses WalkRegularFiles to count all regular files and returns the total size of those files as well.

func DefaultProgressReporter

func DefaultProgressReporter(src, dst string, written int64, done bool)

DefaultProgressReporter is the default progress reporter that only logs upon a file being successfully added to archive.

func Extract

func Extract(ctx context.Context, src, dir string, optFns ...func(*ExtractOptions)) (string, error)

Extract recursively extracts the named archive to the given parent directory.

Returns the name of the output directory which can be different from the argument "dir".

See ExtractOptions for customisation options. For example, if the archive ("default.zip") contains these files:

test/a.txt
test/path/b.txt
test/another/path/c.txt

Using "my-dir" as the dir argument, if "my-dir/test" already exists, the extracted directory looks like this:

my-dir/test-1/a.txt
my-dir/test-1/path/b.txt
my-dir/test-1/another/path/c.txt

If "my-dir/test" didn't exist, the extracted directory looks like this:

my-dir/test/a.txt
my-dir/test/path/b.txt
my-dir/test/another/path/c.txt

If the content of the archive ("no-root.zip") did not have a common top-level directory ("root" directory) like this:

a.txt
path/b.txt
another/path/c.txt

The name of the archive ("no-root.zip") will be used to create a new directory:

my-dir/no-root/a.txt
my-dir/no-root/path/b.txt
my-dir/no-root/another/path/c.txt

If "my-dir/no-root" already exists, "my-dir/no-root-1", "my-dir/no-root-2", etc. will be created.

If [ExtractOptions.UseGivenDirectory] is true, the dir argument is used as the root directory to extract files to. Extract is able to create dir if it didn't exist as a directory prior to invocation.

If [ExtractOptions.NoUnwrapRoot] is true, the common root directory in the archive will be created in the extracted directory. This flag is only meaningful if all files in the archive are under one common top-level directory ("root" directory). For example, the "no-root.zip" example above has no common root because a.txt exists at the top level while b.txt and c.txt share no common path.

Using the "default.zip" example, if [ExtractOptions.NoUnwrapRoot] is true and [ExtractOptions.UseGivenDirectory] is true, the extracted directory would become:

my-dir/test/a.txt
my-dir/test/path/b.txt
my-dir/test/another/path/c.txt

If [ExtractOptions.NoUnwrapRoot] is true and [ExtractOptions.UseGivenDirectory] is false, however, the extracted directory becomes:

my-dir/default/test/a.txt
my-dir/default/test/path/b.txt
my-dir/default/test/another/path/c.txt

In other words, because [ExtractOptions.UseGivenDirectory] is false, "default" (or "default-1", "default-2") was created as the output directory. So long as [ExtractOptions.UseGivenDirectory] is false, the default settings will always try to extract to a newly created directory to avoid conflicts.

Note: the definition of root is limited to only the top-level directory. Even if the archive may have a longer common root, in this example the archive is still considered to have only "test" as the common root:

test/path/to/a.txt
test/path/to/b.txt
test/path/to/c.txt

This is because most users will compress a directory named "test" wishing to retain the directory structure inside "test", but when extracting they don't necessarily want "test" to exist.
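
The two decisions described above (is there a single top-level root, and which output directory name is free) can be sketched as follows. Both commonRoot and uniqueDir are hypothetical helpers written for illustration; the package's own logic may differ, and the existence check is injected as a function so the sketch stays independent of the filesystem.

```go
package main

import (
	"fmt"
	"strings"
)

// commonRoot reports the single top-level directory shared by every entry
// name, if there is one. Only the first path element is considered.
func commonRoot(names []string) (string, bool) {
	root := ""
	for _, name := range names {
		first, _, nested := strings.Cut(name, "/")
		if !nested {
			// A file at the top level means there is no common root.
			return "", false
		}
		if root == "" {
			root = first
		} else if root != first {
			return "", false
		}
	}
	return root, root != ""
}

// uniqueDir returns base, base-1, base-2, ... choosing the first candidate
// for which exists reports false.
func uniqueDir(base string, exists func(string) bool) string {
	candidate := base
	for i := 1; exists(candidate); i++ {
		candidate = fmt.Sprintf("%s-%d", base, i)
	}
	return candidate
}

func main() {
	root, ok := commonRoot([]string{"test/path/to/a.txt", "test/path/to/b.txt"})
	fmt.Println(root, ok)

	taken := map[string]bool{"my-dir/test": true}
	fmt.Println(uniqueDir("my-dir/test", func(p string) bool { return taken[p] }))
}
```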

func NewWriterWithDeflateLevel

func NewWriterWithDeflateLevel(level int) func(w io.Writer) *zip.Writer

NewWriterWithDeflateLevel is a [CompressOptions.NewWriter] option.

See flate.NewWriter on the acceptable level, for example flate.BestCompression.
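
One plausible way to build such a factory with the standard library is to register a custom Deflate compressor on the zip.Writer. The sketch below is an assumption about the approach, not the package's source; newWriterWithLevel and archiveSize are hypothetical names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// newWriterWithLevel returns a factory with the same shape as the
// CompressOptions.NewWriter field. Hypothetical re-implementation.
func newWriterWithLevel(level int) func(w io.Writer) *zip.Writer {
	return func(w io.Writer) *zip.Writer {
		zw := zip.NewWriter(w)
		// Override the Deflate method so every entry uses the chosen level.
		zw.RegisterCompressor(zip.Deflate, func(out io.Writer) (io.WriteCloser, error) {
			return flate.NewWriter(out, level)
		})
		return zw
	}
}

// archiveSize zips data as a single entry and returns the archive length.
func archiveSize(data []byte, level int) (int, error) {
	var buf bytes.Buffer
	zw := newWriterWithLevel(level)(&buf)
	f, err := zw.Create("data.bin")
	if err != nil {
		return 0, err
	}
	if _, err := f.Write(data); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	data := bytes.Repeat([]byte("zipper "), 4096)
	for _, level := range []int{flate.NoCompression, flate.BestCompression} {
		n, err := archiveSize(data, level)
		fmt.Println(level, n, err)
	}
}
```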

func NoOpProgressReporter

func NoOpProgressReporter(src, dst string, written int64, done bool)

NoOpProgressReporter can be used to turn off progress reporting.

func WalkRegularFiles

func WalkRegularFiles(ctx context.Context, root string, fn func(path string, d fs.DirEntry) error) error

WalkRegularFiles is a specialisation of filepath.WalkDir that applies the callback only to regular files.

This is the same method that Compressor.CompressDir will use to compress files.
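
A minimal sketch of such a walk, assuming it is built on filepath.WalkDir and checks the context between entries. walkRegular, countRegular, and demo are hypothetical names; countRegular shows how a count-and-size helper in the style of CountDirContents could sit on top of the walk.

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// walkRegular calls fn for regular files only; directories, symlinks and
// other special files are skipped. The walk stops once ctx is cancelled.
func walkRegular(ctx context.Context, root string, fn func(path string, d fs.DirEntry) error) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if err := ctx.Err(); err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil
		}
		return fn(path, d)
	})
}

// countRegular returns the number of regular files under root and their total size.
func countRegular(ctx context.Context, root string) (n int, size int64, err error) {
	err = walkRegular(ctx, root, func(_ string, d fs.DirEntry) error {
		info, err := d.Info()
		if err != nil {
			return err
		}
		n++
		size += info.Size()
		return nil
	})
	return n, size, err
}

// demo builds a throwaway tree with two files (2 and 3 bytes) and counts it.
func demo() (int, int64, error) {
	root, err := os.MkdirTemp("", "walk-*")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "path"), 0o755); err != nil {
		return 0, 0, err
	}
	if err := os.WriteFile(filepath.Join(root, "a.txt"), []byte("hi"), 0o644); err != nil {
		return 0, 0, err
	}
	if err := os.WriteFile(filepath.Join(root, "path", "b.txt"), []byte("hey"), 0o644); err != nil {
		return 0, 0, err
	}
	return countRegular(context.Background(), root)
}

func main() {
	fmt.Println(demo())
}
```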

func WithBestCompression

func WithBestCompression(options *CompressOptions)

WithBestCompression uses a zip.Writer that registers flate.BestCompression as its compressor.

func WithNoCompression

func WithNoCompression(options *CompressOptions)

WithNoCompression uses a zip.Writer that registers flate.NoCompression as its compressor.

Types

type CDFileHeader added in v0.1.7

type CDFileHeader struct {
	zip.FileHeader

	// DiskNumber is the disk number where file starts.
	//
	// Since floppy disks aren't a thing anymore, this field is most likely unused.
	DiskNumber uint16

	// Offset is the relative offset of local file header.
	//
	// This is the number of bytes between the start of the first disk on which the file occurs, and the start of
	// the local file header.
	//
	// See https://en.wikipedia.org/wiki/ZIP_(file_format)#Central_directory_file_header_(CDFH).
	Offset uint64
}

CDFileHeader extends zip.FileHeader with additional information from the central directory.

type CDScanner added in v0.1.6

type CDScanner interface {
	// RecordCount returns the total number of records.
	RecordCount() int
	// Err returns the last error encountered, if any.
	Err() error
	// Next returns the next zip file header.
	//
	// The boolean return value is false if there is no more file header to go over, or if there was an error.
	//
	// Don't mix Next and All as they use the same underlying io.ReadSeeker src.
	Next() (bool, CDFileHeader)
	// All returns the remaining file headers as an iterator.
	//
	// Don't mix Next and All as they use the same underlying io.ReadSeeker src.
	All() iter.Seq[CDFileHeader]
}

CDScanner provides methods to scan a zip file's central directory for information.

CDScanner is not safe for use across multiple goroutines.

func NewCDScanner added in v0.1.6

func NewCDScanner(src io.ReadSeeker, size int64) (CDScanner, error)

NewCDScanner reads from the given src to extract the zip.FileHeader records from the central directory.

Returns a CDScanner that iterates over the zip.FileHeader entries and reports the expected record count via RecordCount. Any error stops the iteration and is available from Err. If src is not a zip file (the end of central directory signature is missing), ErrNoEOCDFound is returned.
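
To illustrate what locating the end of central directory (EOCD) record involves, here is a simplified sketch based on the ZIP format, not on this package's source. It searches the tail of the file backwards for the EOCD signature and reads the total record count. The names eocdRecordCount, recordCountOf, buildZip, and errNoEOCD are hypothetical, and ZIP64 archives are not handled.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

var errNoEOCD = errors.New("end of central directory not found")

// eocdRecordCount finds the EOCD record and returns the total number of
// central directory records stored in it.
func eocdRecordCount(src io.ReadSeeker, size int64) (int, error) {
	const minEOCD = 22              // fixed-size part of the EOCD record
	const maxTail = minEOCD + 65535 // EOCD plus the largest possible comment
	tail := int64(maxTail)
	if size < tail {
		tail = size
	}
	if tail < minEOCD {
		return 0, errNoEOCD
	}
	if _, err := src.Seek(size-tail, io.SeekStart); err != nil {
		return 0, err
	}
	buf := make([]byte, tail)
	if _, err := io.ReadFull(src, buf); err != nil {
		return 0, err
	}
	sig := []byte{'P', 'K', 0x05, 0x06}
	// Scan backwards because the EOCD sits at the very end of the file.
	for i := len(buf) - minEOCD; i >= 0; i-- {
		if bytes.Equal(buf[i:i+4], sig) {
			// Offset 10 holds the total record count (uint16, little endian).
			return int(binary.LittleEndian.Uint16(buf[i+10:])), nil
		}
	}
	return 0, errNoEOCD
}

// recordCountOf runs eocdRecordCount over an in-memory archive.
func recordCountOf(data []byte) (int, error) {
	return eocdRecordCount(bytes.NewReader(data), int64(len(data)))
}

// buildZip creates an in-memory archive with the given number of empty files.
func buildZip(files int) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for i := 0; i < files; i++ {
		if _, err := zw.Create(fmt.Sprintf("file-%d.txt", i)); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildZip(3)
	if err != nil {
		panic(err)
	}
	fmt.Println(recordCountOf(data))
	fmt.Println(recordCountOf(make([]byte, 100)))
}
```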

type CompressDirOptions

type CompressDirOptions struct {
	CompressOptions

	// UnwrapRoot, if true, strips the top-level directory (the directory being compressed) from entry names so
	// that its contents sit at the root of the archive. Defaults to false.
	UnwrapRoot bool

	// WriteDir will write directory entries to the archive.
	WriteDir bool
}

CompressDirOptions customises CompressDir.

type CompressOptions

type CompressOptions struct {
	// ProgressReporter controls how progress is reported.
	//
	// By default, DefaultProgressReporter is used.
	ProgressReporter ProgressReporter

	// BufferSize is the length of the buffer being used for copying/adding files to the archive.
	//
	// BufferSize indirectly controls how frequently ProgressReporter is called; after each copy is done,
	// ProgressReporter is called once.
	//
	// Default to DefaultBufferSize.
	BufferSize int

	// NewWriter allows customization of the zip.Writer being used.
	//
	// Default to a [zip.NewWriter].
	NewWriter func(w io.Writer) *zip.Writer
}

CompressOptions customises CompressFile.

type ExtractOptions

type ExtractOptions struct {
	// ProgressReporter controls how progress is reported.
	//
	// By default, DefaultProgressReporter is used.
	ProgressReporter ProgressReporter

	// BufferSize is the length of the buffer being used for copying files out of the archive.
	//
	// BufferSize indirectly controls how frequently ProgressReporter is called; after each copy is done,
	// ProgressReporter is called once.
	//
	// Default to DefaultBufferSize.
	BufferSize int

	// UseGivenDirectory will extract files directly to the dir argument passed to Extract.
	//
	// See Extract for more information on the interaction between UseGivenDirectory and NoUnwrapRoot.
	UseGivenDirectory bool

	// NoUnwrapRoot turns off root unwrapping feature.
	//
	// See Extract for more information on the interaction between UseGivenDirectory and NoUnwrapRoot.
	NoUnwrapRoot bool

	// NoOverwrite will ignore files that already exist at the target directory.
	//
	// By default, Extract will overwrite existing files. If NoOverwrite is true, those files will be skipped.
	NoOverwrite bool
}

ExtractOptions is an opaque struct for customising Extract.

type ProgressReporter

type ProgressReporter func(src, dst string, written int64, done bool)

ProgressReporter is called to provide updates on compressing individual files or extracting from an archive.

If being used with CompressFile or CompressDir:

  • src: path of the file being added to the archive
  • dst: path of the file in the archive
  • written: number of bytes of the file specified by src that has been read and written to dst so far
  • done: is true only when the file has been read and written in its entirety

If being used with Extract:

  • src: path of the file in archive being extracted
  • dst: path (relative to output directory) of the file on filesystem
  • written: number of bytes of the file specified by src that has been read and written to dst so far
  • done: is true only when the file has been read and written in its entirety

The method will be called at least once for every file being processed. If the file is small enough to fit into one read (see DefaultBufferSize), then the method is called exactly once with `done` being true.
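
The calling pattern described above can be reproduced with a buffered copy loop that reports once per filled buffer. This is a sketch under assumptions, not the package's actual loop: copyWithProgress and doneFlags are hypothetical helpers, and the ProgressReporter type is redeclared locally so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// ProgressReporter mirrors the documented callback signature.
type ProgressReporter func(src, dst string, written int64, done bool)

// copyWithProgress copies r to w in bufSize chunks and reports after each
// chunk. A source that fits into one buffer yields exactly one call, with
// done set to true.
func copyWithProgress(w io.Writer, r io.Reader, bufSize int, src, dst string, report ProgressReporter) (int64, error) {
	buf := make([]byte, bufSize)
	var written int64
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return written, werr
			}
			written += int64(n)
		}
		switch {
		case err == nil:
			// The buffer was filled completely, so more data may follow.
			report(src, dst, written, false)
		case errors.Is(err, io.EOF), errors.Is(err, io.ErrUnexpectedEOF):
			// The source is exhausted: this is the final report.
			report(src, dst, written, true)
			return written, nil
		default:
			return written, err
		}
	}
}

// doneFlags records the done flag of every report made while copying data.
func doneFlags(data string, bufSize int) ([]bool, error) {
	var flags []bool
	_, err := copyWithProgress(io.Discard, strings.NewReader(data), bufSize, "src", "dst",
		func(_, _ string, _ int64, done bool) { flags = append(flags, done) })
	return flags, err
}

func main() {
	fmt.Println(doneFlags("abc", 4))
	fmt.Println(doneFlags("0123456789", 4))
}
```

With this design a file whose size is an exact multiple of the buffer size receives one extra, final call that carries no new bytes; that still satisfies "at least once".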

func NewDirectoryProgressReporter

func NewDirectoryProgressReporter(ctx context.Context, root string, reporter func(src, dst string, written, size int64, done bool, wc, fc int)) (ProgressReporter, error)

NewDirectoryProgressReporter creates a progress reporter intended to be used for compressing a directory.

Specifically, the new progress reporter knows how many files there are to be compressed by doing a preflight filepath.WalkDir (also cancellable), and for each file being compressed, the reporter knows the total number of bytes for that file. If the initial filepath.WalkDir fails, its error will be returned.

  • src: path of the file being added to the archive
  • dst: relative path of the file in the archive
  • written: number of bytes of the file that has been read and written to archive so far
  • size: the total number of bytes of the file being compressed. Can be -1 if os.Stat fails.
  • done: is true only when the file has been read and written in its entirety (written==size)
  • wc: the number of files that have been written to archive so far
  • fc: the total number of files to be written to archive
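
As an illustration of how the extra parameters can be derived, the sketch below wraps a directory-aware callback into a plain ProgressReporter. It is not the package's implementation: withDirectoryTotals and DirReporter are hypothetical names, and the per-file sizes are passed in as a map that a preflight walk would have collected. The wrapper is not safe for concurrent use.

```go
package main

import "fmt"

// ProgressReporter mirrors the documented callback signature.
type ProgressReporter func(src, dst string, written int64, done bool)

// DirReporter mirrors the reporter argument of NewDirectoryProgressReporter.
type DirReporter func(src, dst string, written, size int64, done bool, wc, fc int)

// withDirectoryTotals adapts report into a ProgressReporter. sizes maps each
// source path to its size in bytes, as gathered by a preflight walk.
func withDirectoryTotals(sizes map[string]int64, report DirReporter) ProgressReporter {
	fc := len(sizes) // total number of files expected
	wc := 0          // number of files completed so far
	return func(src, dst string, written int64, done bool) {
		size, ok := sizes[src]
		if !ok {
			size = -1 // unknown size, following the documented -1 convention
		}
		if done {
			wc++
		}
		report(src, dst, written, size, done, wc, fc)
	}
}

func main() {
	sizes := map[string]int64{"dir/a.txt": 2, "dir/b.txt": 3}
	reporter := withDirectoryTotals(sizes, func(src, dst string, written, size int64, done bool, wc, fc int) {
		fmt.Printf("%s -> %s: %d/%d bytes, done=%t, %d/%d files\n", src, dst, written, size, done, wc, fc)
	})
	reporter("dir/a.txt", "a.txt", 2, true)
	reporter("dir/b.txt", "b.txt", 3, true)
}
```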

func NewProgressBarReporter

func NewProgressBarReporter(ctx context.Context, root string, bar *progressbar.ProgressBar) (ProgressReporter, error)

NewProgressBarReporter creates a progress reporter that uses the specified progressbar.ProgressBar.

If the given progress bar is nil, it will be created with progressbar.DefaultBytes.
