Documentation ¶
Overview ¶
Package parser provides advanced URL parsing capabilities by extending the standard net/url package with additional domain extraction functionality.
This package decomposes a URL’s host component into three primary parts:
- Subdomain (e.g., "www" in "www.example.com")
- Second-Level Domain (SLD, e.g., "example" in "www.example.com")
- Top-Level Domain (TLD, e.g., "com" in "www.example.com")
The custom URL type embeds net/url.URL so that it integrates seamlessly with existing HTTP libraries, while the additional Domain struct holds the parsed components. The Parser type offers methods to parse raw URL strings into this extended URL struct. It can also apply a default scheme when the input lacks one, and it uses a suffix array for efficient TLD lookups.
Example Usage:
    package main

    import (
        "fmt"

        "github.com/hueristiq/hq-go-url/parser"
    )

    func main() {
        // Create a new parser with a default scheme of "https".
        p := parser.New(parser.WithDefaultScheme("https"))

        // Parse a raw URL string without a scheme.
        parsedURL, err := p.Parse("www.example.com")
        if err != nil {
            fmt.Println("Error parsing URL:", err)

            return
        }

        // Print the reconstructed domain.
        fmt.Println("Domain:", parsedURL.Domain.String())
    }
References:
- net/url package documentation: https://pkg.go.dev/net/url
Types ¶
type Domain ¶
Domain represents a parsed domain name, broken down into three main components:
- Subdomain: The subdomain part of the domain (e.g., "www" in "www.example.com").
- SecondLevelDomain (SLD): The core part of the domain (e.g., "example" in "www.example.com").
- TopLevelDomain (TLD): The domain extension (e.g., "com" in "www.example.com").
This structure is useful for analysis or manipulation of domain names.
func (*Domain) String ¶
String reconstructs a full domain name from its individual components. It joins the non-empty components using a dot ("."). Empty components are omitted from the final output.
Examples:
- "www.example.com" when all components are provided.
- "example.com" when the subdomain is empty.
- "example" when only the SLD is provided.
Returns:
- A string representing the reconstructed domain.
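The joining rule described above can be sketched with the standard library alone. The Domain type below is a local stand-in that mirrors the documented field names; it is an illustration under that assumption, not the package's own implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Domain is a local stand-in mirroring the documented fields; it is not the
// type exported by the parser package.
type Domain struct {
	Subdomain         string
	SecondLevelDomain string
	TopLevelDomain    string
}

// String joins the non-empty components with "." and skips empty ones.
func (d *Domain) String() string {
	parts := make([]string, 0, 3)

	for _, p := range []string{d.Subdomain, d.SecondLevelDomain, d.TopLevelDomain} {
		if p != "" {
			parts = append(parts, p)
		}
	}

	return strings.Join(parts, ".")
}

func main() {
	full := &Domain{Subdomain: "www", SecondLevelDomain: "example", TopLevelDomain: "com"}
	noSub := &Domain{SecondLevelDomain: "example", TopLevelDomain: "com"}
	sldOnly := &Domain{SecondLevelDomain: "example"}

	fmt.Println(full.String())    // www.example.com
	fmt.Println(noSub.String())   // example.com
	fmt.Println(sldOnly.String()) // example
}
```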
type Interface ¶
Interface defines the standard interface for URL parsing functionality. Any type that implements this interface must provide a Parse method to convert a raw URL string into a parsed URL struct.
type OptionFunc ¶
type OptionFunc func(parser *Parser)
OptionFunc defines a function type used for configuring a Parser instance. Options allow customization of the Parser (e.g., setting a default scheme or custom TLDs) during initialization.
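OptionFunc follows the functional options pattern. The sketch below shows that pattern in isolation; the names config, option, withScheme, withTLDs, and newConfig are hypothetical stand-ins, and the real Parser's unexported fields and defaults may differ.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the Parser's internal state.
type config struct {
	scheme string
	tlds   []string
}

// option plays the role that OptionFunc plays for Parser.
type option func(c *config)

// withScheme returns an option that sets the default scheme.
func withScheme(scheme string) option {
	return func(c *config) { c.scheme = scheme }
}

// withTLDs returns an option that replaces the TLD set.
func withTLDs(tlds ...string) option {
	return func(c *config) { c.tlds = tlds }
}

// newConfig starts from defaults (assumed here) and applies each option in order.
func newConfig(opts ...option) *config {
	c := &config{tlds: []string{"com", "org"}}

	for _, opt := range opts {
		opt(c)
	}

	return c
}

func main() {
	c := newConfig(withScheme("https"), withTLDs("com", "co.uk"))

	fmt.Println(c.scheme, c.tlds)
}
```

Because every option is an ordinary function value, later options override earlier ones, and a constructor called with no options keeps its defaults.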
func WithDefaultScheme ¶
func WithDefaultScheme(scheme string) OptionFunc
WithDefaultScheme returns an OptionFunc that sets the default scheme for the Parser. This option ensures that URLs missing a scheme are treated as absolute URLs with the specified scheme.
Parameters:
- scheme (string): The default scheme to set (e.g., "http", "https").
Returns:
- (OptionFunc): An OptionFunc that applies the default scheme to a Parser instance.
func WithTLDs ¶
func WithTLDs(TLDs ...string) OptionFunc
WithTLDs returns an OptionFunc that configures the Parser with a custom set of TLDs. This is useful for handling non-standard or niche TLDs that may not be included in the default set.
Parameters:
- TLDs (...string): One or more custom TLD strings to be used by the Parser.
Returns:
- (OptionFunc): An OptionFunc that applies the custom TLDs to the Parser.
type Parser ¶
type Parser struct {
// contains filtered or unexported fields
}
Parser is responsible for converting raw URL strings into the custom URL struct that includes both the standard URL components and additional domain details. It supports adding a default scheme if the URL is missing one, and it uses a suffix array for efficient TLD lookup.
Fields:
- scheme (string): The default scheme (e.g., "http", "https") to apply when missing.
- sa (*suffixarray.Index): A suffix array used for fast lookup of TLD strings.
func New ¶
func New(ofs ...OptionFunc) (parser *Parser)
New creates and initializes a new Parser instance with default settings. It builds a suffix array using a default set of TLDs from the imported tlds package. Additional configurations can be applied via the provided OptionFunc functions.
Parameters:
- ofs (...OptionFunc): A variadic list of OptionFunc functions to configure the Parser.
Returns:
- parser (*Parser): A pointer to the newly created Parser instance.
func (*Parser) Parse ¶
Parse converts a raw URL string into a URL struct that encapsulates both the standard URL components (parsed by net/url) and the extracted domain components. If a default scheme has been set (via WithDefaultScheme or SetDefaultScheme), it is applied to the raw URL string when the string lacks a scheme. The host part of the URL is then processed to extract the subdomain, SLD, and TLD using a suffix array.
Parameters:
- raw (string): The raw URL string to parse.
Returns:
- parsed (*URL): A pointer to the resulting URL struct with embedded net/url.URL and domain details.
- err (error): An error if the URL cannot be parsed.
func (*Parser) SetDefaultScheme ¶
SetDefaultScheme sets the default scheme for the Parser. This scheme is applied to URL strings that do not already include a scheme.
Parameters:
- scheme (string): The default scheme to use (e.g., "http", "https").
func (*Parser) SetTLDs ¶
SetTLDs configures the Parser to use a custom set of TLDs by building a new suffix array. It concatenates the provided TLD strings with a delimiter and initializes the suffix array for lookups.
Parameters:
- TLDs (...string): One or more custom TLD strings to be used by the Parser.
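The delimiter-joined suffix array described above can be sketched with the standard index/suffixarray package. The NUL delimiter, the helper names (buildIndex, isTLD, splitHost), and the longest-match splitting rule are all assumptions made for illustration; the Parser's internals may differ.

```go
package main

import (
	"fmt"
	"index/suffixarray"
	"strings"
)

// sep is an assumed delimiter: a byte that cannot occur inside a TLD.
const sep = "\x00"

// buildIndex joins the TLDs with sep, adds a leading and trailing sep, and
// indexes the resulting bytes.
func buildIndex(tlds ...string) *suffixarray.Index {
	return suffixarray.New([]byte(sep + strings.Join(tlds, sep) + sep))
}

// isTLD reports whether candidate occurs as a whole, delimiter-bounded entry.
func isTLD(idx *suffixarray.Index, candidate string) bool {
	return len(idx.Lookup([]byte(sep+candidate+sep), 1)) > 0
}

// splitHost tries the longest candidate suffix first, so "co.uk" wins over
// "uk". If no TLD matches, the last label is returned as the SLD.
func splitHost(idx *suffixarray.Index, host string) (sub, sld, tld string) {
	labels := strings.Split(host, ".")

	for i := 1; i < len(labels); i++ {
		if cand := strings.Join(labels[i:], "."); isTLD(idx, cand) {
			return strings.Join(labels[:i-1], "."), labels[i-1], cand
		}
	}

	return strings.Join(labels[:len(labels)-1], "."), labels[len(labels)-1], ""
}

func main() {
	idx := buildIndex("com", "uk", "co.uk")

	fmt.Println(splitHost(idx, "www.example.co.uk"))
	fmt.Println(splitHost(idx, "example.com"))
}
```

Wrapping each candidate in delimiters keeps a lookup for "o" or "co" from matching inside a longer entry such as "com".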
type URL ¶
URL extends the standard net/url.URL struct by embedding it and adding additional domain-related information. The Domain field holds a pointer to a Domain struct representing the parsed domain broken down into subdomain, second-level domain (SLD), and top-level domain (TLD).
This design enables seamless integration with existing HTTP libraries while providing enhanced domain parsing functionality.