gpu_metrics_max_concurrent

Description

The gpu_metrics_max_concurrent parameter caps the number of simultaneous Prometheus QueryRange invocations the rack's GPU metrics handler will issue across all in-flight requests. When the cap is reached the handler fails fast with HTTP 503; the Console surfaces this as a "Server is busy, please retry" banner.

This is a semaphore-style cap, not a queue: a 503 is immediate, not delayed. The Console retries on user action (refresh, dropdown change), not automatically.

Default Value

The default value is 10.

Allowed Range

1 to 50. The upper bound prevents a single operator from saturating Prometheus on shared racks. The validator in pkg/cli/rack.go rejects values above 50 and values below 1.
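
A range check of this kind might look like the following Go sketch. It is not the code in pkg/cli/rack.go: the function name `validateMaxConcurrent`, the constants, and the error wording are assumptions made for illustration, and the empty-value case used to clear an override is not modeled.

```go
package main

import (
	"fmt"
	"strconv"
)

// Documented bounds for gpu_metrics_max_concurrent.
const (
	minConcurrent = 1
	maxConcurrent = 50
)

// validateMaxConcurrent is a hypothetical validator that mirrors the
// documented range: integers from 1 to 50 inclusive are accepted.
func validateMaxConcurrent(raw string) (int, error) {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("gpu_metrics_max_concurrent must be an integer, got %q", raw)
	}
	if n < minConcurrent || n > maxConcurrent {
		return 0, fmt.Errorf("gpu_metrics_max_concurrent must be between %d and %d, got %d",
			minConcurrent, maxConcurrent, n)
	}
	return n, nil
}

func main() {
	for _, v := range []string{"20", "0", "51"} {
		_, err := validateMaxConcurrent(v)
		fmt.Println(v, err == nil)
	}
	// Prints:
	// 20 true
	// 0 false
	// 51 false
}
```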

Use Cases

High-fanout apps with many concurrent dashboards: Operators with several team members watching different per-service GPU charts simultaneously may need to raise the cap from 10 to 20 to avoid 503s.
Cost-sensitive Prometheus: Lower the cap to 5 to limit chart-load fan-out when Prometheus usage is metered.

Setting Parameters

To raise concurrency to 20:

$ convox rack params set gpu_metrics_max_concurrent=20 -r rackName
Setting parameters... OK

To revert to the default:

$ convox rack params set gpu_metrics_max_concurrent=10 -r rackName
Setting parameters... OK

To clear the override (falls back to the handler default of 10):

$ convox rack params set gpu_metrics_max_concurrent= -r rackName
Setting parameters... OK

Operational Notes

The cap is rack-wide, not per-app. Two apps each loading a GPU dashboard simultaneously share the budget.
A 503 is a soft signal: the Console surfaces it as a transient banner. Persistent 503s usually indicate that the cap is too low for the operator's chart fan-out, not that Prometheus itself is unhealthy.
The cap does NOT protect Prometheus from runaway query latency. If a single QueryRange takes 30s, that slot is unavailable for new requests for the whole window. Pair this parameter with appropriate Prometheus query timeouts.

Related Parameters

gpu_metrics_max_pods: Companion cap on the number of services included per request (parameter name is historical).
gpu_observability_enable: The enable switch for the DCGM exporter chart. gpu_metrics_max_concurrent is a no-op when gpu_observability_enable=false.

Version Requirements

This parameter requires at least Convox rack version 3.24.6.