health

package
v0.235.1
Warning

This package is not in the latest version of its module.

Published: Sep 8, 2024 License: Apache-2.0 Imports: 15 Imported by: 0

README

Health Package

The health package helps you build health checks for your Go application. It includes functionality for checking the status of various components, reporting issues, and collecting health-related metrics.

It lets you expose this information through an HTTP API. It can also gather health status information from the services your application depends on, in a cascading manner. This means that when your application encounters issues, the health check endpoint can quickly point to the root cause, saving you time and effort in troubleshooting.

The health package also lets you include contextual information about the system, the application, and its dependencies in the health check reports, in a human-readable format. This gives a quick overview of the system during a full or partial outage and helps identify the root cause of a problem efficiently.

Including metrics from dependent services on the health check endpoint makes it easier to correlate potential root causes of an outage. The health package is primarily meant to be read by humans; it is not meant to replace tools such as Prometheus.

The health package focuses on runtime information to provide contextual details about the system's dependencies and their statuses. By checking the system's state at runtime, it helps identify potential issues that affect your application's ability to serve requests on its API.

The health package is meant to assist in identifying and troubleshooting runtime issues during outages, and not to verify if your application is correctly implemented.

Features

  • Easy-to-use health check endpoint
  • Support for registering checks and dependencies
  • Customizable health state messages
  • Reporting of issues and errors during health checks
  • HTTP handler for serving health check responses

Usage

To use the health package, you will need to import it into your Go application:

package main

import "go.llib.dev/frameless/pkg/devops/health"

You can then create a new instance of Monitor and register checks, dependencies, and details (metrics) as needed.

For example:

package main

import (
	"context"
	"database/sql"
	"sync"

	"go.llib.dev/frameless/pkg/devops/health"
)

const metricKeyForHTTPRetryPerSec = "http-retry-average-per-second"

func healthCheckMonitor(appMetrics *sync.Map, db *sql.DB) health.Monitor {
	return health.Monitor{
		// our service related checks
		Checks: []health.Check{
			func(ctx context.Context) error {
				value, ok := appMetrics.Load(metricKeyForHTTPRetryPerSec)
				if !ok {
					return nil
				}
				averagePerSec, ok := value.(int)
				if !ok {
					return nil
				}
				if 42 < averagePerSec {
					return health.Issue{
						Causes: health.Degraded,
						Code:   "too-many-http-request-retries",
						Message: "There could be an underlying networking issue " +
							"that needs to be looked into; the system is working, " +
							"but the average number of retry attempts shouldn't be this high",
					}
				}
				return nil
			},
		},
		// our service's dependencies like DB or downstream services
		Dependencies: []health.DependencyCheck{
			func(ctx context.Context) health.Report {
				var hs health.Report
				err := db.PingContext(ctx)

				if err != nil {
					hs.Issues = append(hs.Issues, health.Issue{
						Causes:  health.Down,
						Code:    "xy-db-disconnected",
						Message: "failed to ping the database through the connection",
					})
				}

				// additional health checks on this DB connection

				return hs
			},

			// register a downstream service as our dependency through its /health endpoint
			health.HTTPHealthCheck("https://downstreamservice.mydomain.ext/health", nil),
		},
	}
}

Once you have registered your checks and dependencies, you can use the Monitor's HTTPHandler method to create an HTTP handler that serves health check responses. The HTTP responses of this handler are compatible with Kubernetes' health check integration.

http.Handle("/health", monitor.HTTPHandler())

Here is an example response representing a partial outage caused by a severed DB connection in our downstream service:

  • we have a PARTIAL_OUTAGE status
    • message states that one or more dependencies are experiencing issues
  • we check our dependencies for issues
    • downstream-service-name is in a PARTIAL_OUTAGE state as well
  • we check the dependencies of downstream-service-name
    • we can see that xy-db is in a DOWN state.
  • we reach out to the team responsible for xy-db
{
  "status": "PARTIAL_OUTAGE",
  "message": "The service is running, but one or more dependencies are experiencing issues.",
  "dependencies": [
    {
      "status": "PARTIAL_OUTAGE",
      "name": "downstream-service-name",
      "message": "The service is running, but one or more dependencies are experiencing issues.",
      "issues": [
        {
          "code": "xy-db-disconnected",
          "message": "failed to ping the database through the connection"
        }
      ],
      "dependencies": [
        {
          "status": "DOWN",
          "name": "xy-db",
          "timestamp": "0001-01-01T00:00:00Z"
        }
      ],
      "timestamp": "0001-01-01T00:00:00Z",
      "details": {
        "http-request-throughput": 42
      }
    }
  ],
  "timestamp": "2024-03-25T13:29:26Z",
  "details": {
    "metric-name": 42
  }
}
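The troubleshooting steps listed above (follow non-UP dependencies until one has no unhealthy dependencies of its own) can be automated. The following is a minimal sketch that decodes a report like the one above into a local reportDTO type (a stand-in mirroring a subset of the JSON fields, not the package's ReportJSONDTO) and returns the path to each deepest failing dependency. The helper name rootCauses is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reportDTO mirrors a subset of the JSON health report fields.
type reportDTO struct {
	Status       string      `json:"status"`
	Name         string      `json:"name,omitempty"`
	Dependencies []reportDTO `json:"dependencies,omitempty"`
}

// rootCauses walks the dependency tree depth-first and returns the path to every
// non-UP dependency that has no non-UP dependencies of its own.
// An empty status is treated as UP, matching the Report documentation.
func rootCauses(r reportDTO, path []string) [][]string {
	var out [][]string
	for _, dep := range r.Dependencies {
		if dep.Status == "UP" || dep.Status == "" {
			continue
		}
		// copy the path so sibling branches never share a backing array
		p := append(append([]string{}, path...), dep.Name)
		if deeper := rootCauses(dep, p); len(deeper) > 0 {
			out = append(out, deeper...)
		} else {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	raw := `{"status":"PARTIAL_OUTAGE","dependencies":[
	  {"status":"PARTIAL_OUTAGE","name":"downstream-service-name",
	   "dependencies":[{"status":"DOWN","name":"xy-db"}]}]}`

	var r reportDTO
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		panic(err)
	}
	for _, p := range rootCauses(r, nil) {
		fmt.Println(p) // [downstream-service-name xy-db]
	}
}
```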

Testing

The health package includes a suite of tests that demonstrate its functionality. These tests cover various scenarios, such as when all checks and dependencies pass, when one check or dependency fails, and when there are issues during the health check evaluation process.

To run the tests, you can use the go test command in the pkg/devops/health directory:

$ go test

Running go test -v lists each test as it runs, which shows the scenarios that are covered.

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func HTTPHealthCheck

func HTTPHealthCheck(healthCheckEndpointURL string, config *HTTPHealthCheckConfig) func(ctx context.Context) Report
Example
package main

import (
	"go.llib.dev/frameless/pkg/devops/health"
)

func main() {
	var m = health.Monitor{
		Dependencies: []health.DependencyCheck{
			health.HTTPHealthCheck("https://www.example.com/health", nil),
		},
	}

	_ = m
}

func StateMessage

func StateMessage(s Status) string

Types

type Check added in v0.203.0

type Check func(ctx context.Context) error

Check represents a health check. Check is supposed to return nil if the check passes, and an error if the check detected a problem. To describe a problem in detail, Check may return an Issue. Other (generic) errors are considered an Issue with Down Status.

type DependencyCheck added in v0.203.0

type DependencyCheck func(ctx context.Context) Report

DependencyCheck serves as a health check for a specific dependency. If an error occurs during the check, it should be represented as an Issue in the returned Report.Issues list.

For example, if a remote service is unreachable on the network, this should be reported as an Issue in Report.Issues stating that the service is unreachable, with Issue.Causes set to Down, so that the dependency's health Status is considered Down.

type DetailCheck added in v0.218.0

type DetailCheck func(ctx context.Context) (any, error)

DetailCheck represents a metric reporting function. Its result will be added to Report.Details. DetailCheck results serve analytical purposes and act as status indicators for the service at the time it was called. If numerical values are included, they should fluctuate over time, reflecting the current state.

Values that behave differently depending on how long the application has been running are not ideal. For instance, a good metric value is the current throughput of the HTTP API.

A problematic metric value would be a counter of the total number of requests handled during an application instance's lifetime, since it mostly grows with uptime.

type HTTPHealthCheckConfig

type HTTPHealthCheckConfig struct {
	Name          string
	HTTPClient    *http.Client
	BodyReadLimit iokit.ByteSize
	Unmarshal     func(ctx context.Context, data []byte, ptr *Report) error
}

type Issue

type Issue struct {
	// Code is meant for programmatic processing of an issue detection.
	// Should contain no whitespace and use dash-case/snakecase/const-case.
	Code consttypes.String
	// Message can contain further details about the detected issue.
	Message string
	// Causes will indicate the status change this Issue will cause
	Causes Status
}

Issue represents an issue detected during a health check.
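Because Issue.Code is meant for programmatic processing and should contain no whitespace, a team might lint its codes in tests. The sketch below is a hypothetical helper, validIssueCode, that accepts dash-case, snake_case and CONST_CASE identifiers. Rejecting leading, trailing, or doubled separators is a stricter assumption than the documentation states.

```go
package main

import (
	"fmt"
	"regexp"
)

// codePattern: letters and digits separated by single '-' or '_' characters.
var codePattern = regexp.MustCompile(`^[A-Za-z0-9]+([-_][A-Za-z0-9]+)*$`)

// validIssueCode reports whether code looks like a whitespace-free
// dash-case, snake_case or CONST_CASE identifier.
func validIssueCode(code string) bool {
	return codePattern.MatchString(code)
}

func main() {
	for _, c := range []string{"xy-db-disconnected", "too_many_retries", "DB_DOWN", "db down", ""} {
		fmt.Printf("%q valid=%v\n", c, validIssueCode(c))
	}
}
```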

func (Issue) Error

func (err Issue) Error() string

type IssueJSONDTO added in v0.202.1

type IssueJSONDTO struct {
	Code    string `json:"code,omitempty"`
	Message string `json:"message,omitempty"`
}

type Monitor

type Monitor struct {
	// ServiceName will be used to set the Report.Name field.
	ServiceName string
	// Checks contain the health checks about our own service.
	// Check should return nil in case the check passed.
	// Check should return an Issue or a generic error in case the check failed.
	// Returned generic errors are considered an Issue with Down Status.
	Checks []Check
	// Dependencies represent our service's dependencies and their health state (Report).
	// DependencyCheck should always return a valid Report.
	Dependencies []DependencyCheck
	// Details represents our service's monitoring metrics.
	Details map[string]DetailCheck
}
Example (Check)
package main

import (
	"context"
	"sync"

	"go.llib.dev/frameless/pkg/devops/health"
)

func main() {
	const detailKeyForHTTPRetryPerSec = "http-retry-average-per-second"
	appDetails := sync.Map{}

	var hm = health.Monitor{
		Checks: []health.Check{
			func(ctx context.Context) error {
				value, ok := appDetails.Load(detailKeyForHTTPRetryPerSec)
				if !ok {
					return nil
				}
				averagePerSec, ok := value.(int)
				if !ok {
					return nil
				}
				if 42 < averagePerSec {
					return health.Issue{
						Causes: health.Degraded,
						Code:   "too-many-http-request-retries",
						Message: "There could be an underlying networking issue " +
							"that needs to be looked into; the system is working, " +
							"but the average number of retry attempts shouldn't be this high",
					}
				}
				return nil
			},
		},
	}

	ctx := context.Background()
	hs := hm.HealthCheck(ctx)
	_ = hs // use the results
}
Example (Dependency)
package main

import (
	"context"
	"database/sql"

	"go.llib.dev/frameless/pkg/devops/health"
)

func main() {
	var hm health.Monitor
	var db *sql.DB // populate it with a live db connection

	hm.Dependencies = append(hm.Dependencies, func(ctx context.Context) health.Report {
		var hs health.Report
		err := db.PingContext(ctx)

		if err != nil {
			hs.Issues = append(hs.Issues, health.Issue{
				Causes:  health.Down,
				Code:    "xy-db-disconnected",
				Message: "failed to ping the database through the connection",
			})
		}

		// additional health checks on the DB dependency

		return hs
	})

	ctx := context.Background()
	hs := hm.HealthCheck(ctx)
	_ = hs // use the results
}

func (*Monitor) HTTPHandler

func (m *Monitor) HTTPHandler() http.Handler
Example
package main

import (
	"context"
	"net/http"

	"go.llib.dev/frameless/pkg/devops/health"
)

func main() {
	var m = health.Monitor{
		Checks: []health.Check{
			func(ctx context.Context) error {
				return nil // all good
			},
		},
	}

	mux := http.NewServeMux()
	mux.Handle("/health", m.HTTPHandler())
	_ = http.ListenAndServe("0.0.0.0:8080", mux)
}

func (*Monitor) HealthCheck

func (m *Monitor) HealthCheck(ctx context.Context) Report

type Report

type Report struct {
	// Name field typically contains a descriptive name for the service or application.
	Name string
	// Status is the current health status of a given service.
	//
	// By default, an empty Status is interpreted as Up Status.
	// If an Issue in Issues causes a Status change, it will be reflected in Report.Status as well.
	// If a dependency has a non-Up Status, then the current Status is considered PartialOutage.
	Status Status
	// Message field provides an explanation of the current state or specific issues (if any) affecting the service.
	// Message is optional; when it's empty, the default is inferred from the Report.Status value.
	Message string
	// Issues is the list of issues that the health check functions were able to detect.
	// If an Issue in Report.Issues has an Issue.Causes, then Report.Status will be affected.
	Issues []Issue
	// Dependencies are the service dependencies, which are required for the service to function correctly.
	// If a Report in Report.Dependencies has a problematic Status, it will affect Report.Status.
	Dependencies []Report
	// Timestamp represents the time at which the health check report was created.
	// Default is the current time in UTC.
	Timestamp time.Time
	// Details encompass analytical data and status indicators
	// for the service at the time the service was called.
	// For more about what values it should contain, read the documentation of DetailCheck.
	Details map[string]any
}
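The Status rules in the Report comments (empty means Up, an Issue's Causes can raise the Status, a non-Up dependency means PartialOutage) can be sketched as a bottom-up pass over the report tree. This is an illustration, not the package's Correlate method: the types are local stand-ins, and the severity ranking (Up < Degraded < PartialOutage < Down) is an assumption that may differ from Status.IsLessSevere.

```go
package main

import "fmt"

type Status string

const (
	Up            Status = "UP"
	Degraded      Status = "DEGRADED"
	PartialOutage Status = "PARTIAL_OUTAGE"
	Down          Status = "DOWN"
)

// severity is an ASSUMED ranking for this sketch only;
// statuses missing from the map (e.g. MAINTENANCE) rank like Up here.
var severity = map[Status]int{Up: 0, Degraded: 1, PartialOutage: 2, Down: 3}

type Issue struct {
	Code   string
	Causes Status
}

type Report struct {
	Name         string
	Status       Status
	Issues       []Issue
	Dependencies []Report
}

// worse returns the more severe of two statuses, treating "" as Up.
func worse(a, b Status) Status {
	if a == "" {
		a = Up
	}
	if b == "" {
		b = Up
	}
	if severity[b] > severity[a] {
		return b
	}
	return a
}

// correlate resolves dependencies first, then raises r.Status
// by its own Issues and by any non-Up dependency.
func correlate(r *Report) {
	status := worse(r.Status, Up)
	for _, issue := range r.Issues {
		status = worse(status, issue.Causes)
	}
	for i := range r.Dependencies {
		correlate(&r.Dependencies[i])
		if r.Dependencies[i].Status != Up {
			status = worse(status, PartialOutage)
		}
	}
	r.Status = status
}

func main() {
	r := Report{
		Name: "our-service",
		Dependencies: []Report{{
			Name: "downstream-service-name",
			Dependencies: []Report{{
				Name:   "xy-db",
				Issues: []Issue{{Code: "xy-db-disconnected", Causes: Down}},
			}},
		}},
	}
	correlate(&r)
	// PARTIAL_OUTAGE PARTIAL_OUTAGE DOWN
	fmt.Println(r.Status, r.Dependencies[0].Status, r.Dependencies[0].Dependencies[0].Status)
}
```

The result matches the shape of the README's example response: the database is DOWN, while every service above it reports PARTIAL_OUTAGE.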

func (*Report) Correlate

func (r *Report) Correlate()

func (*Report) Validate

func (r *Report) Validate() error

func (Report) WithIssue added in v0.218.0

func (r Report) WithIssue(issue Issue) Report

type ReportJSONDTO added in v0.202.1

type ReportJSONDTO struct {
	Status       string          `json:"status"`
	Name         string          `json:"name,omitempty"`
	Message      string          `json:"message,omitempty"`
	Issues       []IssueJSONDTO  `json:"issues,omitempty"`
	Dependencies []ReportJSONDTO `json:"dependencies,omitempty"`
	Timestamp    string          `json:"timestamp,omitempty"`
	Details      map[string]any  `json:"details,omitempty"`
}

type Status

type Status string
const (
	// Up means that service is running correctly and able to respond to requests.
	Up Status = "UP"
	// Down means that service is not running or unresponsive.
	Down Status = "DOWN"
	// PartialOutage means that service is running, but one or more dependencies are experiencing issues.
	// PartialOutage also indicates that there has been a limited disruption or degradation in the service.
	// It typically affects only a subset of services or users, rather than the entire system.
	// Examples of partial outages include slower response times, intermittent errors,
	// or reduced functionality for specific features.
	PartialOutage Status = "PARTIAL_OUTAGE"
	// Degraded means that service is running but with reduced capabilities or performance.
	// When a system is in a Degraded state, it means that overall performance or functionality has deteriorated.
	// Unlike a PartialOutage, a Degraded state may impact a broader scope of services or users.
	// It could result in slower overall system performance, increased error rates, or reduced capacity.
	// Monitoring tools often detect this state based on predefined thresholds or deviations from expected behaviour.
	Degraded Status = "DEGRADED"
	// Maintenance means that service is currently undergoing maintenance or updates and might not function correctly.
	Maintenance Status = "MAINTENANCE"
	// Unknown means that service's status cannot be determined due to an error or lack of information.
	Unknown Status = "UNKNOWN"
)

func (Status) IsLessSevere

func (hss Status) IsLessSevere(oth Status) bool

func (Status) IsZero

func (hss Status) IsZero() bool

func (Status) String

func (hss Status) String() string

func (Status) Validate

func (hss Status) Validate() error
