package parser

v0.0.0-...-2dfbb78
Published: Aug 3, 2025 License: MIT Imports: 6 Imported by: 2

Documentation

Overview

Package parser provides advanced URL parsing capabilities by extending the standard net/url package with additional domain extraction functionality.

This package decomposes a URL’s host component into three primary parts:

  • Subdomain (e.g., "www" in "www.example.com")
  • Second-Level Domain (SLD, e.g., "example" in "www.example.com")
  • Top-Level Domain (TLD, e.g., "com" in "www.example.com")

The custom URL type embeds net/url.URL, so it can be used wherever the standard URL methods are expected, while the additional Domain struct holds the parsed components. The Parser type offers methods to parse raw URL strings into this extended URL struct. It can also apply a default scheme when the input has none, and it uses a suffix array for efficient TLD lookups.

Example Usage:

package main

import (
    "fmt"
    "github.com/hueristiq/hq-go-url/parser"
)

func main() {
    // Create a new parser with a default scheme of "https".
    // The constructor documented below is New; the package has no NewParser.
    p := parser.New(parser.WithDefaultScheme("https"))

    // Parse a raw URL string without a scheme.
    parsedURL, err := p.Parse("www.example.com")
    if err != nil {
        fmt.Println("Error parsing URL:", err)
        return
    }

    // Print the reconstructed domain.
    fmt.Println("Domain:", parsedURL.Domain.String())
}

References:

  • net/url package documentation: https://pkg.go.dev/net/url

Types

type Domain

type Domain struct {
	TopLevelDomain    string
	SecondLevelDomain string
	Subdomain         string
}

Domain represents a parsed domain name, broken down into three main components:

  • Subdomain: The subdomain part of the domain (e.g., "www" in "www.example.com").
  • SecondLevelDomain (SLD): The core part of the domain (e.g., "example" in "www.example.com").
  • TopLevelDomain (TLD): The domain extension (e.g., "com" in "www.example.com").

This structure is useful for analysis or manipulation of domain names.

func (*Domain) String

func (d *Domain) String() (domain string)

String reconstructs a full domain name from its individual components. It joins the non-empty components using a dot ("."). Empty components are omitted from the final output.

Examples:

  • "www.example.com" when all components are provided.
  • "example.com" when the subdomain is empty.
  • "example" when only the SLD is provided.

Returns:

  • A string representing the reconstructed domain.
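
The joining rule described above can be sketched with the standard library alone. The domain type and its join method below are local stand-ins that mirror the documented fields; they are assumptions made for illustration, not the package's own source.

```go
package main

import (
	"fmt"
	"strings"
)

// domain is a hypothetical stand-in that mirrors the documented Domain fields.
type domain struct {
	TopLevelDomain    string
	SecondLevelDomain string
	Subdomain         string
}

// join concatenates the non-empty components with dots,
// in subdomain, SLD, TLD order. Empty components are skipped.
func (d *domain) join() string {
	parts := make([]string, 0, 3)
	for _, p := range []string{d.Subdomain, d.SecondLevelDomain, d.TopLevelDomain} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, ".")
}

func main() {
	full := &domain{Subdomain: "www", SecondLevelDomain: "example", TopLevelDomain: "com"}
	noSub := &domain{SecondLevelDomain: "example", TopLevelDomain: "com"}
	sldOnly := &domain{SecondLevelDomain: "example"}

	fmt.Println(full.join())    // www.example.com
	fmt.Println(noSub.join())   // example.com
	fmt.Println(sldOnly.join()) // example
}
```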

type Interface

type Interface interface {
	Parse(raw string) (parsed *URL, err error)
}

Interface defines the standard interface for URL parsing functionality. Any type that implements this interface must provide a Parse method to convert a raw URL string into a parsed URL struct.

type OptionFunc

type OptionFunc func(parser *Parser)

OptionFunc defines a function type used for configuring a Parser instance. Options allow customization of the Parser (e.g., setting a default scheme or custom TLDs) during initialization.
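
OptionFunc follows the functional options pattern. A minimal sketch of that pattern is shown below; miniParser, option, withScheme, and newMiniParser are hypothetical names chosen for the example, and the real Parser may apply its options differently.

```go
package main

import "fmt"

// miniParser is a hypothetical stand-in for Parser with a single setting.
type miniParser struct {
	scheme string
}

// option has the same shape as the documented OptionFunc:
// a function that mutates the parser it is given.
type option func(p *miniParser)

// withScheme returns an option that records a default scheme.
func withScheme(scheme string) option {
	return func(p *miniParser) {
		p.scheme = scheme
	}
}

// newMiniParser builds a parser and applies each option in order,
// so later options override earlier ones.
func newMiniParser(opts ...option) *miniParser {
	p := &miniParser{}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	p := newMiniParser(withScheme("http"), withScheme("https"))
	fmt.Println(p.scheme) // https
}
```

The constructor stays stable as new settings are added, because each setting is a separate option function rather than a new parameter.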

func WithDefaultScheme

func WithDefaultScheme(scheme string) OptionFunc

WithDefaultScheme returns an OptionFunc that sets the default scheme for the Parser. This option ensures that URLs missing a scheme are treated as absolute URLs with the specified scheme.

Parameters:

  • scheme (string): The default scheme to set (e.g., "http", "https").

Returns:

  • (OptionFunc): An OptionFunc that applies the default scheme to a Parser instance.

func WithTLDs

func WithTLDs(TLDs ...string) OptionFunc

WithTLDs returns an OptionFunc that configures the Parser with a custom set of TLDs. This is useful for handling non-standard or niche TLDs that may not be included in the default set.

Parameters:

  • TLDs (...string): One or more custom TLD strings to be used by the Parser.

Returns:

  • (OptionFunc): An OptionFunc that applies the custom TLDs to the Parser.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is responsible for converting raw URL strings into the custom URL struct that includes both the standard URL components and additional domain details. It supports adding a default scheme if the URL is missing one, and it uses a suffix array for efficient TLD lookup.

Fields:

  • scheme (string): The default scheme (e.g., "http", "https") to apply when missing.
  • sa (*suffixarray.Index): A suffix array used for fast lookup of TLD strings.

func New

func New(ofs ...OptionFunc) (parser *Parser)

New creates and initializes a new Parser instance with default settings. It builds a suffix array using a default set of TLDs from the imported tlds package. Additional configurations can be applied via the provided OptionFunc functions.

Parameters:

  • ofs (...OptionFunc): A variadic list of OptionFunc functions to configure the Parser.

Returns:

  • parser (*Parser): A pointer to the newly created Parser instance.

func (*Parser) Parse

func (p *Parser) Parse(raw string) (parsed *URL, err error)

Parse converts a raw URL string into a URL struct that encapsulates both the standard URL components (parsed by net/url) and the extracted domain components. If a default scheme has been set (via SetDefaultScheme or the WithDefaultScheme option) and the raw URL string has no scheme, the default is applied before parsing. The host part of the URL is then processed to extract the subdomain, SLD, and TLD using a suffix array.

Parameters:

  • raw (string): The raw URL string to parse.

Returns:

  • parsed (*URL): A pointer to the resulting URL struct with embedded net/url.URL and domain details.
  • err (error): An error if the URL cannot be parsed.
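
The default scheme matters because net/url treats a scheme-less string such as "www.example.com" as a path and leaves Host empty. The sketch below shows one way to add a scheme before parsing. applyDefaultScheme is a hypothetical helper, and its "://" check is a simplification assumed for the example; the package's actual detection logic is not shown in this documentation.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// applyDefaultScheme prepends scheme to raw when raw does not already
// contain a "://" separator. A leading "//" (scheme-relative form) is
// dropped first so the result is not doubled.
func applyDefaultScheme(raw, scheme string) string {
	if scheme == "" || strings.Contains(raw, "://") {
		return raw
	}
	return scheme + "://" + strings.TrimPrefix(raw, "//")
}

func main() {
	// Without a scheme, the whole string lands in Path and Host is empty.
	bare, err := url.Parse("www.example.com/path")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("host=%q path=%q\n", bare.Host, bare.Path)

	// With the default scheme applied, the host is recognized.
	fixed, err := url.Parse(applyDefaultScheme("www.example.com/path", "https"))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("scheme=%q host=%q path=%q\n", fixed.Scheme, fixed.Host, fixed.Path)
}
```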

func (*Parser) SetDefaultScheme

func (p *Parser) SetDefaultScheme(scheme string)

SetDefaultScheme sets the default scheme for the Parser. This scheme is applied to URL strings that do not already include a scheme.

Parameters:

  • scheme (string): The default scheme to use (e.g., "http", "https").

func (*Parser) SetTLDs

func (p *Parser) SetTLDs(TLDs ...string)

SetTLDs configures the Parser to use a custom set of TLDs by building a new suffix array. It concatenates the provided TLD strings with a delimiter and initializes the suffix array for lookups.

Parameters:

  • TLDs (...string): One or more custom TLD strings to be used by the Parser.
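
The delimiter-joined suffix array described above can be sketched with index/suffixarray from the standard library. Everything below is an assumption made for illustration: the NUL delimiter, the helper names buildIndex, hasTLD, and splitHost, the longest-suffix-first search, and the fallback for hosts with no known TLD are not taken from the package source.

```go
package main

import (
	"fmt"
	"index/suffixarray"
	"strings"
)

// sep delimits TLD entries. It is assumed never to appear in a host name.
const sep = "\x00"

// buildIndex joins the TLDs with sep (also at both ends) and indexes the result.
func buildIndex(tlds ...string) *suffixarray.Index {
	return suffixarray.New([]byte(sep + strings.Join(tlds, sep) + sep))
}

// hasTLD reports whether tld is a whole entry in the index. Wrapping the
// query in delimiters prevents partial matches such as "co" inside "co.uk".
func hasTLD(idx *suffixarray.Index, tld string) bool {
	return len(idx.Lookup([]byte(sep+tld+sep), 1)) > 0
}

// splitHost tries the longest dot-separated suffix first and returns the
// first one that is a known TLD. The label before it becomes the SLD and
// everything earlier becomes the subdomain. When nothing matches, the
// whole host is returned as the SLD (a choice made for this sketch).
func splitHost(idx *suffixarray.Index, host string) (sub, sld, tld string) {
	labels := strings.Split(host, ".")
	for i := range labels {
		candidate := strings.Join(labels[i:], ".")
		if !hasTLD(idx, candidate) {
			continue
		}
		tld = candidate
		if i > 0 {
			sld = labels[i-1]
		}
		if i > 1 {
			sub = strings.Join(labels[:i-1], ".")
		}
		return sub, sld, tld
	}
	return "", host, ""
}

func main() {
	idx := buildIndex("com", "org", "co.uk")

	sub, sld, tld := splitHost(idx, "www.example.co.uk")
	fmt.Printf("sub=%q sld=%q tld=%q\n", sub, sld, tld)

	sub, sld, tld = splitHost(idx, "example.com")
	fmt.Printf("sub=%q sld=%q tld=%q\n", sub, sld, tld)
}
```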

type URL

type URL struct {
	*url.URL

	Domain *Domain
}

URL extends the standard net/url.URL struct by embedding a pointer to it (*url.URL) and adding domain-related information. The Domain field holds a pointer to a Domain struct representing the parsed domain broken down into subdomain, second-level domain (SLD), and top-level domain (TLD).

This design enables seamless integration with existing HTTP libraries while providing enhanced domain parsing functionality.
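
Embedding promotes the methods of *url.URL onto the outer type, which is what makes the integration work. The sketch below uses local stand-ins (extendedURL, domainParts, and the wrap helper are hypothetical) to show the effect; wrap leaves the domain fields empty because domain extraction is out of scope here.

```go
package main

import (
	"fmt"
	"net/url"
)

// domainParts is a hypothetical stand-in for the documented Domain struct.
type domainParts struct {
	Subdomain         string
	SecondLevelDomain string
	TopLevelDomain    string
}

// extendedURL mirrors the documented URL layout: an embedded *url.URL
// plus a pointer to the domain breakdown.
type extendedURL struct {
	*url.URL

	Domain *domainParts
}

// wrap parses raw with net/url and attaches an empty domain breakdown.
func wrap(raw string) (*extendedURL, error) {
	base, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	return &extendedURL{URL: base, Domain: &domainParts{}}, nil
}

func main() {
	u, err := wrap("https://www.example.com:8443/search?q=go")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}

	// Methods of *url.URL are callable directly on the wrapper.
	fmt.Println(u.Hostname(), u.Port(), u.Query().Get("q"))

	// The embedded value itself can be handed to code that expects *url.URL.
	var std *url.URL = u.URL
	fmt.Println(std.String())
}
```

Because the embedded field is a pointer, copies of the wrapper share the same underlying url.URL, so a change made through one copy is visible through the others.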
