hub

package
v0.1.2 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 8, 2025 License: Apache-2.0 Imports: 24 Imported by: 3

README

hub package

Downloads HuggingFace Hub files; a port of the huggingface_hub Python library to Go.

Introduction

A simple, straightforward port of the github.com/huggingface/huggingface_hub library to Go.

Features supported:

  • Cache system that matches HuggingFace Hub, so the same cache can be shared with Python.
  • Concurrency safe: only one download happens when multiple workers try to download the same model simultaneously.
  • Allows an arbitrary progress function to be called (for a progress bar).
  • Arbitrary revision.
  • Parallel download of files, max=20 by default.

TODOs:

  • Add support for optional parameters.
  • Authentication tokens: should be relatively easy.
  • Resume downloads from interrupted connections.
  • Check disk-space before starting to download.

Example

Enumerate files from a HuggingFace repository and download all of them to a cache.

	repo := hub.New(modelID).WithAuth(hfAuthToken)
	var fileNames []string
	for fileName, err := range repo.IterFileNames() {
		if err != nil { panic(err) }
		fmt.Printf("\t%s\n", fileName)
		fileNames = append(fileNames, fileName)
	}
	downloadedFiles, err := repo.DownloadFiles(fileNames...)
	if err != nil { ... }

Documentation

Overview

Package hub can be used to download and cache files from HuggingFace Hub, which may be models, tokenizers, or any other files.

It is meant to be a port of the huggingface_hub Python library to Go, and to share the same cache structure (usually under "~/.cache/huggingface/hub").

It is also safe to use concurrently from multiple programs: it uses a file system lock to control concurrency.
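
As a rough illustration of how a file system lock can serialize work across processes, the sketch below uses atomic lock-file creation (O_CREATE|O_EXCL) from the standard library. This is a stand-in technique chosen for brevity, not necessarily the mechanism this package uses; tryLock, unlock, and runLockDemo are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// tryLock attempts to take an exclusive lock by creating lockPath atomically.
// It returns true if this caller now holds the lock, and false if the lock
// file already exists (someone else holds it).
func tryLock(lockPath string) (bool, error) {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
	if errors.Is(err, fs.ErrExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, f.Close()
}

// unlock releases the lock by removing the lock file.
func unlock(lockPath string) error {
	return os.Remove(lockPath)
}

// runLockDemo takes the lock, tries to take it again, releases it, and
// takes it once more, reporting the outcome of each attempt.
func runLockDemo() (first, second, third bool, err error) {
	dir, err := os.MkdirTemp("", "lock-sketch")
	if err != nil {
		return false, false, false, err
	}
	defer os.RemoveAll(dir)
	lockPath := filepath.Join(dir, "model.lock")

	first, _ = tryLock(lockPath)  // lock is free, so this attempt wins
	second, _ = tryLock(lockPath) // lock file exists, so this attempt loses
	if err := unlock(lockPath); err != nil {
		return first, second, false, err
	}
	third, err = tryLock(lockPath) // lock was released, so this wins again
	return first, second, third, err
}

func main() {
	first, second, third, err := runLockDemo()
	fmt.Println(first, second, third, err)
}
```

A worker that loses the race would typically wait and retry, then find the file already in the cache once the winner finishes.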

Typical usage will be something like:

	repo := hub.New(modelID).WithAuth(hfAuthToken)
	var fileNames []string
	for fileName, err := range repo.IterFileNames() {
		if err != nil { panic(err) }
		fmt.Printf("\t%s\n", fileName)
		fileNames = append(fileNames, fileName)
	}
	downloadedFiles, err := repo.DownloadFiles(fileNames...)
	if err != nil { ... }

From here, downloadedFiles will point to files in the local cache that one can read.

Environment variables:

- HF_ENDPOINT: Where to connect to HuggingFace, default is https://huggingface.co
- XDG_CACHE_HOME: Cache directory, defaults to ${HOME}/.cache
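
The defaults described above can be sketched with the standard library alone. The helper names endpointFrom and cacheDirFrom are hypothetical and only mirror the documented behavior; they are not part of this package.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultEndpoint is the documented default for HF_ENDPOINT.
const defaultEndpoint = "https://huggingface.co"

// endpointFrom returns envValue when it is non-empty, else the default.
func endpointFrom(envValue string) string {
	if envValue != "" {
		return envValue
	}
	return defaultEndpoint
}

// cacheDirFrom returns <xdgCacheHome>/huggingface/hub when xdgCacheHome is
// set, and <home>/.cache/huggingface/hub otherwise.
func cacheDirFrom(xdgCacheHome, home string) string {
	root := xdgCacheHome
	if root == "" {
		root = filepath.Join(home, ".cache")
	}
	return filepath.Join(root, "huggingface", "hub")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		panic(err)
	}
	fmt.Println(endpointFrom(os.Getenv("HF_ENDPOINT")))
	fmt.Println(cacheDirFrom(os.Getenv("XDG_CACHE_HOME"), home))
}
```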

Constants

const (
	HeaderXRepoCommit = "X-Repo-Commit"
	HeaderXLinkedETag = "X-Linked-Etag"
	HeaderXLinkedSize = "X-Linked-Size"
)
const RepoIdSeparator = "--"

RepoIdSeparator is used to separate the parts of a repository/model name when mapping it to file names. Likely only for internal use.
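
For illustration, the Python huggingface_hub cache names a repository folder by joining the repository type and the parts of the ID with "--". The sketch below reproduces that mapping; repoFolderName is a hypothetical helper, and that this package builds its folder names the same way is an assumption based on the shared cache layout.

```go
package main

import (
	"fmt"
	"strings"
)

// repoIDSeparator mirrors the value of RepoIdSeparator shown above.
const repoIDSeparator = "--"

// repoFolderName joins the repo type and the "/"-separated parts of the
// repo ID with the separator, e.g. "models" + "owner/name".
func repoFolderName(repoType, repoID string) string {
	parts := append([]string{repoType}, strings.Split(repoID, "/")...)
	return strings.Join(parts, repoIDSeparator)
}

func main() {
	fmt.Println(repoFolderName("models", "google/gemma-2-2b-it"))
	// prints "models--google--gemma-2-2b-it"
}
```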

Variables

var (
	// DefaultDirCreationPerm is used when creating new cache subdirectories.
	DefaultDirCreationPerm = os.FileMode(0755)

	// DefaultFileCreationPerm is used when creating files inside the cache subdirectories.
	DefaultFileCreationPerm = os.FileMode(0644)
)
var SessionId string

SessionId is unique and always created anew at the start of the program, and used during the life of the program.

Functions

func DefaultCacheDir

func DefaultCacheDir() string

DefaultCacheDir returns the default cache directory for HuggingFace Hub, the same one used by the Python library.

Its prefix is `${XDG_CACHE_HOME}` if set, or `~/.cache` otherwise, followed by `/huggingface/hub/`. So typically: `~/.cache/huggingface/hub/`.

func DefaultHttpUserAgent

func DefaultHttpUserAgent() string

DefaultHttpUserAgent returns a user agent to use with HuggingFace Hub API.

Types

type FileInfo

type FileInfo struct {
	Name string `json:"rfilename"`
}

FileInfo represents one of the model files in the RepoInfo structure.

type Repo

type Repo struct {
	// ID of the Repo may include owner/model. E.g.: google/gemma-2-2b-it
	ID string

	// Verbosity: 0 for quiet operation; 1 for information about progress; 2 and higher for debugging.
	Verbosity int

	// MaxParallelDownload indicates how many files to download at the same time. Default is 20.
	// If set to <= 0 it will download all files in parallel.
	// Set to 1 to make downloads sequential.
	MaxParallelDownload int
	// contains filtered or unexported fields
}

Repo from which one wants to download files. Create it with New.
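
The documented meaning of MaxParallelDownload (a positive cap, or no cap when <= 0) can be sketched with a buffered channel used as a semaphore. This is a generic pattern, not the package's actual scheduler; runLimited is a hypothetical helper that also reports the peak concurrency it observed.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited calls work(i) for every i in [0, n), with at most limit calls
// running at once. A limit <= 0 means "no cap". It returns the highest
// number of calls that were running at the same time.
func runLimited(n, limit int, work func(i int)) int {
	if limit <= 0 {
		limit = n
	}
	sem := make(chan struct{}, limit)
	var (
		wg            sync.WaitGroup
		mu            sync.Mutex
		running, peak int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are in flight
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot after bookkeeping below

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			work(i)

			mu.Lock()
			running--
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(10, 3, func(i int) {})
	fmt.Println(peak <= 3)
	fmt.Println(runLimited(10, 1, func(i int) {})) // sequential: peak is 1
}
```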

func New

func New(id string) *Repo

New creates a reference to a HuggingFace model given its id.

It uses the default cache directory in ${XDG_CACHE_HOME} (if set) or `~/.cache`, in a format that is shared with the huggingface_hub Python library. The cache is shared across programs, including Python programs. Use Repo.WithCacheDir to change it, or NewWithDir to use a plain directory structure that is not shared across programs.

The id typically includes owner/model. E.g.: "google/gemma-2-2b-it"

It defaults to being a RepoTypeModel repository. But you can change it with Repo.WithType.

If authentication is needed, use Repo.WithAuth.

func (*Repo) DownloadFile

func (r *Repo) DownloadFile(file string) (downloadedPath string, err error)

DownloadFile is a shortcut to DownloadFiles with only one file.

func (*Repo) DownloadFiles

func (r *Repo) DownloadFiles(repoFiles ...string) (downloadedPaths []string, err error)

DownloadFiles downloads the repository files (the names returned by repo.IterFileNames), and returns the paths to the downloaded files in the cache structure.

The returned downloadedPaths can be read, but shouldn't be modified, since there may be other programs using the same files.

func (*Repo) DownloadInfo

func (r *Repo) DownloadInfo(forceDownload bool) error

DownloadInfo downloads the info about the model, if it hasn't been downloaded yet.

It will attempt to use the "_info_.json" file in the cache directory first.

If forceDownload is set to true, it ignores the current info or the cached one, and downloads it again from HuggingFace.

See Repo.Info to access the info directly. Most users don't need to call this directly; instead, use the various iterators.
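
The cache-first behavior described above can be sketched as follows. This is only an illustration under assumptions: loadInfo, runInfoDemo, and the fetch callback are hypothetical, and that a freshly fetched payload is written back to "_info_.json" is assumed rather than confirmed by the documentation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// infoFileName is the cache file name mentioned in the documentation.
const infoFileName = "_info_.json"

// loadInfo returns the cached info when it exists and force is false.
// Otherwise it calls fetch and stores the result in the cache file.
func loadInfo(dir string, force bool, fetch func() ([]byte, error)) ([]byte, error) {
	path := filepath.Join(dir, infoFileName)
	if !force {
		if data, err := os.ReadFile(path); err == nil {
			return data, nil // cache hit: no network access needed
		}
	}
	data, err := fetch()
	if err != nil {
		return nil, err
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return nil, err
	}
	return data, nil
}

// runInfoDemo loads the info three times (cold, cached, forced) in a
// temporary directory and reports how many times fetch was called.
func runInfoDemo() (fetches int, err error) {
	dir, err := os.MkdirTemp("", "info-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	// Stand-in for the HTTP request; the payload is made up.
	fetch := func() ([]byte, error) {
		fetches++
		return []byte(`{"id":"example-owner/example-model"}`), nil
	}
	for _, force := range []bool{false, false, true} {
		if _, err := loadInfo(dir, force, fetch); err != nil {
			return fetches, err
		}
	}
	return fetches, nil
}

func main() {
	fetches, err := runInfoDemo()
	fmt.Println(fetches, err) // the cold and forced loads fetch; the cached one does not
}
```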

func (*Repo) FileURL

func (r *Repo) FileURL(fileName string) (string, error)

FileURL returns the URL from which to download the file from HuggingFace.

It is usually not used directly (use DownloadFile instead), but is available in case someone needs it for debugging.
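
For orientation when debugging, the sketch below builds a URL using the "resolve" layout that the Python huggingface_hub library uses for file downloads: model repos have no type prefix, while other repo types are prefixed by their type. The helper fileURL is hypothetical, and whether FileURL returns exactly this string is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// fileURL builds <endpoint>/[<repoType>/]<repoID>/resolve/<revision>/<fileName>.
// The repo type prefix is omitted for "models". Only the revision is escaped;
// fileName is used as given.
func fileURL(endpoint, repoType, repoID, revision, fileName string) string {
	prefix := ""
	if repoType != "models" {
		prefix = repoType + "/"
	}
	return fmt.Sprintf("%s/%s%s/resolve/%s/%s",
		endpoint, prefix, repoID, url.PathEscape(revision), fileName)
}

func main() {
	fmt.Println(fileURL("https://huggingface.co", "models", "google/gemma-2-2b-it", "main", "config.json"))
	// prints "https://huggingface.co/google/gemma-2-2b-it/resolve/main/config.json"
}
```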

func (*Repo) HasFile

func (r *Repo) HasFile(fileName string) bool

HasFile returns whether the repo has the given fileName. Note that fileName is relative to the repository, not to the local disk.

If the Repo hasn't downloaded its info yet, it attempts to download it here. If it fails, it simply returns false. Call Repo.DownloadInfo to handle errors downloading the info.

func (*Repo) Info

func (r *Repo) Info() *RepoInfo

Info returns the RepoInfo structure about the model. Most users don't need to call this directly; instead, use the various iterators.

If it hasn't been downloaded or loaded from the cache yet, it loads it first.

It may return nil if there was an issue with the downloading of the RepoInfo json from HuggingFace. Try DownloadInfo to get an error.

func (*Repo) IterFileNames

func (r *Repo) IterFileNames() iter.Seq2[string, error]

IterFileNames iterates over the file names stored in the repo. It doesn't trigger the downloading of the repo, only of the repo info.

func (*Repo) String

func (r *Repo) String() string

String implements fmt.Stringer.

func (*Repo) WithAuth

func (r *Repo) WithAuth(authToken string) *Repo

WithAuth sets the authentication token to use during downloads.

Setting it to empty ("") is the same as resetting and not using authentication.

func (*Repo) WithCacheDir

func (r *Repo) WithCacheDir(cacheDir string) *Repo

WithCacheDir sets the cacheDir to the given directory.

The default is given by DefaultCacheDir: `${XDG_CACHE_HOME}/huggingface/hub` if set, or `~/.cache/huggingface/hub` otherwise.

func (*Repo) WithDownloadManager

func (r *Repo) WithDownloadManager(manager *downloader.Manager) *Repo

WithDownloadManager sets the downloader.Manager to use for downloads. This is optional: one is created automatically if none is set. It is useful when downloading multiple Repos simultaneously, to coordinate limits by sharing the download manager.

func (*Repo) WithEndpoint added in v0.1.2

func (r *Repo) WithEndpoint(endpoint string) *Repo

WithEndpoint sets the HuggingFace endpoint to use. Default is "https://huggingface.co" or, if set, the environment variable HF_ENDPOINT.

func (*Repo) WithProgressBar

func (r *Repo) WithProgressBar(useProgressBar bool) *Repo

WithProgressBar configures the use of a progress bar during download. Defaults to true.

func (*Repo) WithRevision

func (r *Repo) WithRevision(revision string) *Repo

WithRevision sets the revision to use for this Repo. It defaults to "main", but can be set to a commit-hash value.

func (*Repo) WithType

func (r *Repo) WithType(repoType RepoType) *Repo

WithType sets the repository type to use during downloads.

type RepoInfo

type RepoInfo struct {
	ID          string          `json:"id"`
	ModelID     string          `json:"model_id"`
	Author      string          `json:"author"`
	CommitHash  string          `json:"sha"`
	Tags        []string        `json:"tags"`
	Siblings    []*FileInfo     `json:"siblings"`
	SafeTensors SafeTensorsInfo `json:"safetensors"`
}

RepoInfo holds information about a HuggingFace repo. It is the JSON served at the URL https://huggingface.co/api/<repo_type>/<model_id>

TODO: Not complete, only holding the fields used so far by the library.
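
As a sketch of how such a payload maps onto these fields, the example below decodes a small, made-up JSON document into local copies of the structs (same field names and tags as shown in this documentation). The payload values are invented for illustration and are not real API output; parseRepoInfo is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types, so the sketch is self-contained.
type fileInfo struct {
	Name string `json:"rfilename"`
}

type safeTensorsInfo struct {
	Total      int
	Parameters map[string]int // maps dtype name to a parameter count
}

type repoInfo struct {
	ID          string          `json:"id"`
	ModelID     string          `json:"model_id"`
	Author      string          `json:"author"`
	CommitHash  string          `json:"sha"`
	Tags        []string        `json:"tags"`
	Siblings    []*fileInfo     `json:"siblings"`
	SafeTensors safeTensorsInfo `json:"safetensors"`
}

// samplePayload is invented test data, not a real response.
const samplePayload = `{
  "id": "example-owner/example-model",
  "author": "example-owner",
  "sha": "0123456789abcdef",
  "tags": ["example"],
  "siblings": [{"rfilename": "config.json"}, {"rfilename": "model.safetensors"}],
  "safetensors": {"parameters": {"F32": 1000}, "total": 1000}
}`

// parseRepoInfo decodes a JSON payload into a repoInfo.
func parseRepoInfo(data []byte) (*repoInfo, error) {
	var info repoInfo
	if err := json.Unmarshal(data, &info); err != nil {
		return nil, err
	}
	return &info, nil
}

func main() {
	info, err := parseRepoInfo([]byte(samplePayload))
	if err != nil {
		panic(err)
	}
	for _, f := range info.Siblings {
		fmt.Println(f.Name)
	}
	fmt.Println(info.SafeTensors.Total, info.SafeTensors.Parameters["F32"])
}
```

Because encoding/json matches keys case-insensitively, the untagged Total and Parameters fields still pick up the lowercase "total" and "parameters" keys.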

type RepoType

type RepoType string

RepoType is a repository type supported by HuggingFace Hub.

const (
	RepoTypeDataset RepoType = "datasets"
	RepoTypeSpace   RepoType = "spaces"
	RepoTypeModel   RepoType = "models"
)

type SafeTensorsInfo

type SafeTensorsInfo struct {
	Total int

	// Parameters: maps dtype name to int.
	Parameters map[string]int
}

SafeTensorsInfo holds counts of the number of parameters of various types.
