gpu_metrics_max_concurrent
Description
The gpu_metrics_max_concurrent parameter caps the number of simultaneous Prometheus QueryRange invocations the rack's GPU metrics handler will issue across all in-flight requests. When the cap is reached the handler fails fast with HTTP 503; the Console surfaces this as a "Server is busy, please retry" banner.
This is a semaphore-style cap, not a queue: a 503 is immediate, not delayed. The Console retries on user action (refresh, dropdown change), not automatically.
Default Value
The default value is 10.
Allowed Range
1 to 50. The upper bound prevents a single operator from saturating Prometheus on shared racks. The validator at pkg/cli/rack.go rejects values above 50 or non-positive.
Use Cases
- High-fanout apps with many concurrent dashboards: Operators with several team members watching different per-service GPU charts simultaneously may need to bump from
10to20to avoid 503s. - Cost-sensitive Prometheus: Drop to
5to push back on chart-load fan-out when running on metered Prometheus.
Setting Parameters
To raise concurrency to 20:
$ convox rack params set gpu_metrics_max_concurrent=20 -r rackName
Setting parameters... OK
To revert to the default:
$ convox rack params set gpu_metrics_max_concurrent=10 -r rackName
Setting parameters... OK
To clear the override (falls back to the handler default 10):
$ convox rack params set gpu_metrics_max_concurrent= -r rackName
Setting parameters... OK
Operational Notes
- The cap is rack-wide, not per-app. Two apps each loading a GPU dashboard simultaneously share the budget.
- A 503 is a soft signal — the Console surfaces it as a transient banner. Persistent 503s usually indicate the cap is too low for the operator's chart fan-out, not that Prometheus itself is unhealthy.
- The cap does NOT protect Prometheus from runaway query latency. If a single QueryRange takes 30s, that slot is unavailable for new requests for the whole window. Pair this parameter with appropriate Prometheus query timeouts.
Related Parameters
- gpu_metrics_max_pods: Companion cap on the number of services included per request (parameter name is historical).
- gpu_observability_enable: The enable switch for the DCGM exporter chart.
gpu_metrics_max_concurrentis a no-op whengpu_observability_enable=false.
Version Requirements
This parameter requires at least Convox rack version 3.24.6.