# Health Checks
Deploying a Service behind a load balancer requires a health check to determine whether a given Process is ready to handle requests.
Health checks must return a valid HTTP response code (200-399) within the configured timeout.
Processes that fail three consecutive health checks are assumed dead and will be terminated and replaced.
## Defining Health Checks

### Simple Health Check

```yaml
services:
  web:
    health: /check
```
Specifying `health` as a string will set the `path` and leave the other options at their defaults.
### Advanced Health Check

```yaml
services:
  web:
    health:
      grace: 5
      interval: 5
      path: /check
      timeout: 3
```
| Attribute | Default | Description |
|---|---|---|
| grace | interval | The amount of time in seconds to wait for a Process to boot before beginning health checks. Defaults to the value of interval |
| interval | 5 | The number of seconds between health checks |
| path | / | The HTTP endpoint that will be requested |
| timeout | interval - 1 | The number of seconds to wait for a valid response. Defaults to interval minus one |
| disable | false | Set to true to disable the health check entirely |
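For example, using the `disable` attribute above, health checking can be turned off entirely for a service that does not serve HTTP (the service name `worker` is illustrative):

```yaml
services:
  worker:
    health:
      disable: true
```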
## Liveness Checks
Liveness checks complement health checks by monitoring the ongoing health of running processes. While health checks (readiness probes) determine when a service is ready to receive traffic, liveness checks determine when a service should be restarted if it becomes unresponsive or enters a broken state.
When a liveness check fails, Kubernetes will restart the container, which can help recover from deadlocks, memory leaks, or other issues that cause a process to become unresponsive while still appearing to be running.
### Liveness Check Configuration

```yaml
services:
  web:
    liveness:
      path: /liveness/check
      grace: 15
      interval: 5
      timeout: 3
      successThreshold: 1
      failureThreshold: 3
```
| Attribute | Default | Description |
|---|---|---|
| path | | Required. The HTTP endpoint that will be requested for liveness checks |
| grace | 10 | The amount of time in seconds to wait for a Process to start before beginning liveness checks |
| interval | 5 | The number of seconds between liveness checks |
| timeout | 5 | The number of seconds to wait for a successful response |
| successThreshold | 1 | The number of consecutive successful checks required to consider the probe successful |
| failureThreshold | 3 | The number of consecutive failed checks required before restarting the container |
### Important Considerations

- Path is Required: Unlike health checks, you must specify a `path` to enable liveness checks
- Conservative Configuration: Liveness checks should be configured conservatively to avoid unnecessary restarts. False positives can cause service disruption
- Separate Endpoints: Consider using different endpoints for health checks and liveness checks to monitor different aspects of your application
- Startup Time: Set an appropriate `grace` period to allow your application to fully initialize before liveness checks begin
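The separate-endpoints recommendation can be sketched like this (both paths are illustrative):

```yaml
services:
  web:
    health:
      path: /ready      # readiness: can this process take traffic right now?
      interval: 5
    liveness:
      path: /live       # liveness: is the process still responsive at all?
      grace: 15
      interval: 10
```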
### Example Use Cases

Detecting Deadlocks:

```yaml
services:
  worker:
    liveness:
      path: /worker/health
      grace: 30
      interval: 10
      failureThreshold: 5
```

Monitoring Memory-Intensive Applications:

```yaml
services:
  processor:
    liveness:
      path: /memory-check
      grace: 45
      interval: 15
      timeout: 10
      failureThreshold: 3
```
## Startup Probes
Startup probes provide a way to check if an application has successfully started before allowing readiness and liveness probes to take effect. This is particularly useful for applications that require significant initialization time or have variable startup durations.
When a startup probe is configured, all other probes are disabled until it succeeds. This prevents Kubernetes from prematurely marking a service as unhealthy or restarting it before initialization completes.
### Startup Probe Configuration

A startup probe requires either a `path` (HTTP check) or `tcpSocketPort` (TCP check) to define what is checked. Timing parameters (`grace`, `interval`, `timeout`, thresholds) can be set directly on the startup probe. If not explicitly set, they are inherited from the liveness check configuration.

You must configure a `liveness` check alongside your startup probe. If startup probe timing fields are not explicitly set, they fall back to the liveness values. Without a liveness configuration and without explicit startup probe timing, these values default to zero, which will cause immediate failures.
### TCP Startup Probe

```yaml
services:
  web:
    build: .
    port: 3000
    startupProbe:
      tcpSocketPort: 3000
    liveness:
      path: /live
      grace: 30
      interval: 10
      timeout: 5
      successThreshold: 1
      failureThreshold: 30
```
| Attribute | Description |
|---|---|
| tcpSocketPort | Required (if path not set). The TCP port to check for startup success |
### HTTP Startup Probe

```yaml
services:
  api:
    build: .
    port: 8080
    startupProbe:
      path: /startup
    liveness:
      path: /live
      grace: 10
      interval: 5
      timeout: 3
      failureThreshold: 40
```
| Attribute | Description |
|---|---|
| path | Required (if tcpSocketPort not set). The HTTP endpoint to check for startup success |
### Timing Configuration
The startup probe supports its own timing parameters. When explicitly set, these values are used directly. When not set (value is 0), they fall back to the corresponding liveness check values:
| Attribute | Description | Fallback (when not set) |
|---|---|---|
| grace | Initial delay in seconds before probing | liveness.grace (default: 10) |
| interval | Seconds between probe attempts | liveness.interval (default: 5) |
| timeout | Seconds before probe times out | liveness.timeout (default: 5) |
| successThreshold | Consecutive successes to be considered ready | liveness.successThreshold (default: 1) |
| failureThreshold | Consecutive failures before pod is killed | liveness.failureThreshold (default: 3) |
Set these directly on the startup probe to use different values from the liveness check:
```yaml
services:
  api:
    startupProbe:
      path: /startup
      grace: 60
      interval: 30
      timeout: 5
      failureThreshold: 10
    liveness:
      path: /live
      grace: 10
      interval: 5
      timeout: 3
      failureThreshold: 3
```
In this example, the startup probe uses its own values (60s grace, 30s interval, 10 failure threshold), allowing a generous startup window of 60s + 10 × 30s = 360s while keeping the liveness probe tight for ongoing monitoring.
Note on versions before 3.24.1: In rack versions prior to 3.24.1, a bug caused the startup probe to always use liveness timing values regardless of explicit configuration. If you are running an older rack version, timing fields set on `startupProbe` will have no effect. Update to 3.24.1 or later to use independent startup probe timing.
### Use Cases for Startup Probes
Startup probes are ideal for:
- Database Migrations: Applications that run database migrations on startup
- Cache Warming: Services that need to populate caches before serving traffic
- Large Applications: Applications with significant initialization requirements
- Configuration Loading: Services that load extensive configuration or connect to multiple external services
- Legacy Applications: Applications with unpredictable or lengthy startup times
### Example: Application with Long Initialization

```yaml
services:
  analytics:
    build: .
    port: 5000
    startupProbe:
      tcpSocketPort: 5000
    liveness:
      path: /live
      grace: 60
      interval: 15
      timeout: 5
      failureThreshold: 20
    health:
      path: /health
      interval: 5
```
In this example:
- The startup probe checks TCP port 5000 using the liveness timing: every 15 seconds, up to 20 failures, allowing approximately 5 minutes for startup (15s × 20)
- Once the startup probe succeeds, the health (readiness) and liveness checks begin
- If startup fails after 20 attempts, the container is restarted
### Important Startup Probe Considerations

- Relationship with Other Probes: Liveness and readiness probes are disabled until the startup probe succeeds
- Liveness Required: You must define a liveness check alongside your startup probe. If startup probe timing fields are not explicitly set, they fall back to the liveness values
- Independent Timing: Timing fields (`grace`, `interval`, `timeout`, `successThreshold`, `failureThreshold`) can be set directly on the startup probe to use values independent of the liveness configuration (requires rack version 3.24.1+)
- Failure Threshold: Set a high enough `failureThreshold` on the startup probe (or on the liveness check if relying on fallback) to accommodate your application's maximum startup time
- Startup vs. Liveness: Use startup probes for initialization, liveness probes for ongoing health monitoring
- Resource Planning: Consider that pods may take longer to become ready when using startup probes
## gRPC Health Checks

For services that use gRPC instead of HTTP, Convox provides support for gRPC health checks through the gRPC health checking protocol. To enable gRPC health checks, you need to:

- Specify that your service uses the gRPC protocol in the port definition
- Enable the gRPC health check with the `grpcHealthEnabled` attribute
### Basic Configuration

```yaml
services:
  api:
    build: .
    port: grpc:50051
    grpcHealthEnabled: true
```
### Advanced Configuration

You can customize the gRPC health check behavior using the same health attributes as HTTP health checks:

```yaml
services:
  api:
    build: .
    port: grpc:50051
    grpcHealthEnabled: true
    health:
      grace: 20
      interval: 5
      path: /
      timeout: 2
```
| Attribute | Default | Description |
|---|---|---|
| grace | interval | The amount of time in seconds to wait for a Process to boot before beginning health checks. Defaults to the value of interval |
| interval | 5 | The number of seconds between health checks |
| path | / | The service name to check within your gRPC health implementation |
| timeout | interval - 1 | The number of seconds to wait for a valid response. Defaults to interval minus one |
| disable | false | Set to true to disable the health check entirely |
### Implementation Requirements
Services using gRPC health checks must implement the gRPC Health Checking Protocol, which is defined in the gRPC health checking protocol repository.
This protocol requires your service to implement a Health service with a Check method that returns the service's health status.
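For reference, the standard `grpc.health.v1` definition that your service must implement looks like this (abridged from the protocol specification):

```protobuf
syntax = "proto3";
package grpc.health.v1;

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;  // Used only by the Watch method.
  }
  ServingStatus status = 1;
}

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}
```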
### Probe Behavior
When grpcHealthEnabled is set to true, Convox configures both:
- A readinessProbe - Determines whether the service is ready to receive traffic
- A livenessProbe - Determines whether the service should be restarted
The readinessProbe ensures that gRPC services won't receive traffic until they are fully ready, while the livenessProbe monitors the ongoing health of the service and initiates restarts if necessary.
Both probes use the health settings defined in your convox.yml, ensuring consistent behavior throughout the service lifecycle.
gRPC probes use a hardcoded `failureThreshold` of 5 and `successThreshold` of 1 for both readiness and liveness. This differs from HTTP probes, where readiness uses a `failureThreshold` of 3 and liveness uses the configurable `liveness.failureThreshold` (default 3). The gRPC thresholds are not configurable.
### Example Implementation

Here's a minimal example of a gRPC health check implementation in Go:

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	server := grpc.NewServer()

	// Register your service
	// pb.RegisterYourServiceServer(server, &yourServiceImpl{})

	// Register the health service
	healthServer := health.NewServer()
	healthpb.RegisterHealthServer(server, healthServer)

	// Set your service as serving; the empty string covers the overall server
	healthServer.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)

	// Listen and serve on the gRPC port
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	if err := server.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```
With this implementation and the appropriate configuration in your convox.yml, your gRPC service will properly report its health status to Convox, ensuring that it only receives traffic when it's ready to handle requests.
## See Also
- Service Lifecycle Hooks for `preStop` and `postStart` container hooks
- Load Balancers for configuring traffic routing
- Rolling Updates for how health checks affect deployments
- Scaling for autoscaling configuration
- deploy-debug for diagnosing health check failures during deployment