Documentation ¶
Overview ¶
Package hub downloads and caches files from the HuggingFace Hub: models, tokenizers, or any other repository files.
It is meant to be a Go port of the huggingface_hub Python library, and it shares the same cache structure (usually under "~/.cache/huggingface/hub").
It is also safe for concurrent use by multiple programs: it uses a file system lock to control concurrency.
Typical usage looks like this:
	repo := hub.New(modelID).WithAuth(hfAuthToken)
	var fileNames []string
	for fileName, err := range repo.IterFileNames() {
		if err != nil {
			panic(err)
		}
		fmt.Printf("\t%s\n", fileName)
		fileNames = append(fileNames, fileName)
	}
	downloadedFiles, err := repo.DownloadFiles(fileNames...)
	if err != nil { ... }
From here, downloadedFiles will point to files in the local cache that one can read.
Environment variables:
- HF_ENDPOINT: Where to connect to HuggingFace; the default is https://huggingface.co
- XDG_CACHE_HOME: Cache directory; defaults to ${HOME}/.cache
Index ¶
- Constants
- Variables
- func DefaultCacheDir() string
- func DefaultHttpUserAgent() string
- type FileInfo
- type Repo
- func (r *Repo) DownloadFile(file string) (downloadedPath string, err error)
- func (r *Repo) DownloadFiles(repoFiles ...string) (downloadedPaths []string, err error)
- func (r *Repo) DownloadInfo(forceDownload bool) error
- func (r *Repo) FileURL(fileName string) (string, error)
- func (r *Repo) HasFile(fileName string) bool
- func (r *Repo) Info() *RepoInfo
- func (r *Repo) IterFileNames() iter.Seq2[string, error]
- func (r *Repo) String() string
- func (r *Repo) WithAuth(authToken string) *Repo
- func (r *Repo) WithCacheDir(cacheDir string) *Repo
- func (r *Repo) WithDownloadManager(manager *downloader.Manager) *Repo
- func (r *Repo) WithEndpoint(endpoint string) *Repo
- func (r *Repo) WithProgressBar(useProgressBar bool) *Repo
- func (r *Repo) WithRevision(revision string) *Repo
- func (r *Repo) WithType(repoType RepoType) *Repo
- type RepoInfo
- type RepoType
- type SafeTensorsInfo
Constants ¶
const (
	HeaderXRepoCommit = "X-Repo-Commit"
	HeaderXLinkedETag = "X-Linked-Etag"
	HeaderXLinkedSize = "X-Linked-Size"
)
const RepoIdSeparator = "--"
RepoIdSeparator separates the parts of a repository/model name when mapping it to file names. Likely only for internal use.
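As an illustration of that mapping, here is a minimal sketch that follows the folder naming used by the Python huggingface_hub cache (for example `models--google--gemma-2-2b-it`). The helper name repoFolderName and the "models" prefix are assumptions for this example; the package's internal helper is not shown in this documentation and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// repoIDSeparator mirrors the documented RepoIdSeparator constant.
const repoIDSeparator = "--"

// repoFolderName is a hypothetical helper (not part of the package API).
// It flattens a repo type such as "models" and a repo ID such as
// "google/gemma-2-2b-it" into a single directory name.
func repoFolderName(repoType, repoID string) string {
	parts := append([]string{repoType}, strings.Split(repoID, "/")...)
	return strings.Join(parts, repoIDSeparator)
}

func main() {
	fmt.Println(repoFolderName("models", "google/gemma-2-2b-it"))
}
```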
Variables ¶
var (
	// DefaultDirCreationPerm is used when creating new cache subdirectories.
	DefaultDirCreationPerm = os.FileMode(0755)

	// DefaultFileCreationPerm is used when creating files inside the cache subdirectories.
	DefaultFileCreationPerm = os.FileMode(0644)
)
var SessionId string
SessionId is unique and always created anew at the start of the program, and used during the life of the program.
Functions ¶
func DefaultCacheDir ¶
func DefaultCacheDir() string
DefaultCacheDir returns the default cache directory for HuggingFace Hub, the same one used by the Python library.
Its prefix is `${XDG_CACHE_HOME}` if set, or `~/.cache` otherwise, followed by `/huggingface/hub/`. So typically: `~/.cache/huggingface/hub/`.
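The rule above can be sketched as follows. This is not the package's implementation: cacheDirFrom is a hypothetical helper that takes the two inputs as parameters so the rule can be tested without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cacheDirFrom is a hypothetical helper (not part of the package API).
// It applies the documented rule: use xdgCacheHome when set, otherwise
// fall back to <homeDir>/.cache, then append huggingface/hub.
func cacheDirFrom(xdgCacheHome, homeDir string) string {
	base := xdgCacheHome
	if base == "" {
		base = filepath.Join(homeDir, ".cache")
	}
	return filepath.Join(base, "huggingface", "hub")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot find home directory:", err)
		os.Exit(1)
	}
	fmt.Println(cacheDirFrom(os.Getenv("XDG_CACHE_HOME"), home))
}
```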
func DefaultHttpUserAgent ¶
func DefaultHttpUserAgent() string
DefaultHttpUserAgent returns a user agent to use with HuggingFace Hub API.
Types ¶
type FileInfo ¶
type FileInfo struct {
Name string `json:"rfilename"`
}
FileInfo represents one of the model files, in the RepoInfo structure.
type Repo ¶
type Repo struct {
	// ID of the Repo may include owner/model. E.g.: google/gemma-2-2b-it
	ID string

	// Verbosity: 0 for quiet operation; 1 for information about progress; 2 and higher for debugging.
	Verbosity int

	// MaxParallelDownload indicates how many files to download at the same time. Default is 20.
	// If set to <= 0 it will download all files in parallel.
	// Set to 1 to make downloads sequential.
	MaxParallelDownload int
	// contains filtered or unexported fields
}
Repo from which one wants to download files. Create it with New.
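The MaxParallelDownload semantics (a positive limit, or no limit when <= 0) can be illustrated with a buffered channel used as a semaphore. This is only a sketch of the general technique, not the package's downloader: runLimited and peakConcurrency are hypothetical names, and the tasks sleep instead of downloading anything.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited is a hypothetical helper: it runs every task with at most
// maxParallel running at once. maxParallel <= 0 means no limit.
// It returns the first non-nil error in task order, if any.
func runLimited(maxParallel int, tasks []func() error) error {
	if maxParallel <= 0 || maxParallel > len(tasks) {
		maxParallel = len(tasks)
	}
	sem := make(chan struct{}, maxParallel)
	errs := make([]error, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxParallel tasks are in flight
		go func(i int, task func() error) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the task ends
			errs[i] = task()
		}(i, task)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// peakConcurrency runs numTasks short fake tasks through runLimited and
// reports the highest number of tasks observed running at the same time.
func peakConcurrency(maxParallel, numTasks int) int {
	var mu sync.Mutex
	running, peak := 0, 0
	tasks := make([]func() error, numTasks)
	for i := range tasks {
		tasks[i] = func() error {
			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()
			time.Sleep(5 * time.Millisecond) // stands in for a download
			mu.Lock()
			running--
			mu.Unlock()
			return nil
		}
	}
	_ = runLimited(maxParallel, tasks)
	return peak
}

func main() {
	fmt.Println("peak with limit 1:", peakConcurrency(1, 4))
	fmt.Println("peak with limit 2:", peakConcurrency(2, 6))
}
```

With a limit of 1 the tasks run strictly one after another, which matches the documented way to make downloads sequential.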
func New ¶
New creates a reference to a HuggingFace model given its id.
It uses the default cache directory in ${XDG_CACHE_HOME} (if set) or `~/.cache`, in a format shared with the huggingface-hub library for Python. The cache is shared across programs, including Python programs. Use Repo.WithCacheDir to change it, or NewWithDir to use a plain directory structure that is not shared across programs.
The id typically includes owner/model. E.g.: "google/gemma-2-2b-it"
It defaults to a RepoTypeModel repository, but you can change that with Repo.WithType.
If authentication is needed, use Repo.WithAuth.
func (*Repo) DownloadFile ¶
DownloadFile is a shortcut to DownloadFiles with only one file.
func (*Repo) DownloadFiles ¶
DownloadFiles downloads the given repository files (the names returned by repo.IterFileNames) and returns the paths to the downloaded files in the cache structure.
The returned downloadedPaths can be read but shouldn't be modified, since other programs may be using the same files.
func (*Repo) DownloadInfo ¶
DownloadInfo downloads the info about the model, if it hasn't been downloaded yet.
It will attempt to use the "_info_.json" file in the cache directory first.
If forceDownload is set to true, it ignores the current info and the cached one, and downloads it again from HuggingFace.
See Repo.Info to access the resulting info. Most users don't need to call this directly; instead, use the various iterators.
func (*Repo) FileURL ¶
FileURL returns the URL from which to download the file from HuggingFace.
It is usually not called directly (use DownloadFile instead), but it is available in case it is needed for debugging.
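For debugging, it can help to know the general shape of such a URL. The sketch below follows the public HuggingFace convention for model repositories, `<endpoint>/<repo id>/resolve/<revision>/<file>`. The helper fileURL is hypothetical, and the exact URL built by Repo.FileURL (for instance for other repository types) is an assumption, not something this documentation states.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fileURL is a hypothetical helper (not the package's FileURL method).
// It builds a "resolve" URL for a file in a model repository, escaping
// the revision and each path segment of the file name.
func fileURL(endpoint, repoID, revision, fileName string) string {
	segments := strings.Split(fileName, "/")
	for i, segment := range segments {
		segments[i] = url.PathEscape(segment)
	}
	return fmt.Sprintf("%s/%s/resolve/%s/%s",
		strings.TrimRight(endpoint, "/"),
		repoID,
		url.PathEscape(revision),
		strings.Join(segments, "/"))
}

func main() {
	fmt.Println(fileURL("https://huggingface.co", "google/gemma-2-2b-it", "main", "config.json"))
}
```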
func (*Repo) HasFile ¶
HasFile returns whether the repo has the given fileName. Note that fileName is relative to the repository, not a path on the local disk.
If the Repo hasn't downloaded its info yet, it attempts to download it here. If that fails, it simply returns false. Call Repo.DownloadInfo to handle errors downloading the info.
func (*Repo) Info ¶
Info returns the RepoInfo structure about the model. Most users don't need to call this directly; instead, use the various iterators.
If it hasn't been downloaded or loaded from the cache yet, it loads it first.
It may return nil if there was an issue downloading the RepoInfo JSON from HuggingFace. Call Repo.DownloadInfo to get the error.
func (*Repo) IterFileNames ¶
IterFileNames iterates over the file names stored in the repo. It doesn't trigger the download of the repo files, only of the repo info.
func (*Repo) WithAuth ¶
WithAuth sets the authentication token to use during downloads.
Setting it to empty ("") is the same as resetting and not using authentication.
func (*Repo) WithCacheDir ¶
WithCacheDir sets the cacheDir to the given directory.
The default is given by DefaultCacheDir: `${XDG_CACHE_HOME}/huggingface/hub` if set, or `~/.cache/huggingface/hub` otherwise.
func (*Repo) WithDownloadManager ¶
func (r *Repo) WithDownloadManager(manager *downloader.Manager) *Repo
WithDownloadManager sets the downloader.Manager to use for downloads. Setting it is optional: if none is set, one is created automatically. It is useful when downloading from multiple Repos simultaneously, to coordinate limits by sharing the download manager.
func (*Repo) WithEndpoint ¶ added in v0.1.2
WithEndpoint sets the HuggingFace endpoint to use. Default is "https://huggingface.co" or, if set, the environment variable HF_ENDPOINT.
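The resolution order can be sketched as below. It assumes that an explicitly set endpoint takes precedence over HF_ENDPOINT, which in turn takes precedence over the built-in default; resolveEndpoint is a hypothetical helper, and the environment lookup is passed in as a function so the sketch can be exercised without changing real environment variables.

```go
package main

import (
	"fmt"
	"os"
)

// defaultEndpoint is the documented default.
const defaultEndpoint = "https://huggingface.co"

// resolveEndpoint is a hypothetical helper (not part of the package API).
// Assumed precedence: explicit value, then HF_ENDPOINT, then the default.
func resolveEndpoint(explicit string, getenv func(string) string) string {
	if explicit != "" {
		return explicit
	}
	if env := getenv("HF_ENDPOINT"); env != "" {
		return env
	}
	return defaultEndpoint
}

func main() {
	fmt.Println(resolveEndpoint("", os.Getenv))
}
```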
func (*Repo) WithProgressBar ¶
WithProgressBar configures whether a progress bar is shown during download. Defaults to true.
func (*Repo) WithRevision ¶
WithRevision sets the revision to use for this Repo. It defaults to "main", but can be set to a commit-hash value.
type RepoInfo ¶
type RepoInfo struct {
	ID          string          `json:"id"`
	ModelID     string          `json:"model_id"`
	Author      string          `json:"author"`
	CommitHash  string          `json:"sha"`
	Tags        []string        `json:"tags"`
	Siblings    []*FileInfo     `json:"siblings"`
	SafeTensors SafeTensorsInfo `json:"safetensors"`
}
RepoInfo holds information about a HuggingFace repo. It is the JSON served when hitting the URL https://huggingface.co/api/<repo_type>/<model_id>
TODO: Not complete, only holding the fields used so far by the library.
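To show how these fields map onto the served JSON, the sketch below copies the documented struct definitions into a standalone program and decodes a small payload. The payload is invented for this example (it is not real API output), and parseRepoInfo is a hypothetical helper. SafeTensorsInfo has no JSON tags, so it relies on encoding/json matching field names case-insensitively.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The three types below are copied from the documented declarations.
type FileInfo struct {
	Name string `json:"rfilename"`
}

type SafeTensorsInfo struct {
	Total      int
	Parameters map[string]int
}

type RepoInfo struct {
	ID          string          `json:"id"`
	ModelID     string          `json:"model_id"`
	Author      string          `json:"author"`
	CommitHash  string          `json:"sha"`
	Tags        []string        `json:"tags"`
	Siblings    []*FileInfo     `json:"siblings"`
	SafeTensors SafeTensorsInfo `json:"safetensors"`
}

// sampleInfo is a made-up payload used only for illustration.
const sampleInfo = `{
  "id": "owner/model",
  "author": "owner",
  "sha": "0123abcd",
  "tags": ["example"],
  "siblings": [{"rfilename": "config.json"}, {"rfilename": "model.safetensors"}],
  "safetensors": {"parameters": {"F32": 1000}, "total": 1000}
}`

// parseRepoInfo is a hypothetical helper that decodes the JSON payload.
func parseRepoInfo(data []byte) (*RepoInfo, error) {
	var info RepoInfo
	if err := json.Unmarshal(data, &info); err != nil {
		return nil, err
	}
	return &info, nil
}

func main() {
	info, err := parseRepoInfo([]byte(sampleInfo))
	if err != nil {
		panic(err)
	}
	for _, file := range info.Siblings {
		fmt.Println(file.Name)
	}
}
```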
type SafeTensorsInfo ¶
type SafeTensorsInfo struct {
	Total int
	// Parameters: maps dtype name to int.
	Parameters map[string]int
}
SafeTensorsInfo holds counts of the number of parameters of various types.