cost_tracking_enable

Description

The cost_tracking_enable parameter turns on the rack-side cost-accumulator that powers convox cost and per-app budget caps. With cost tracking enabled, the rack samples the resource usage of every running pod (CPU millicores, memory megabytes, GPU count, GPU vendor) on a fixed cadence, multiplies each sample by the rack's per-instance pricing table, and persists the rolling-window total as a Kubernetes namespace annotation on each app. The cost data is exposed through the AppCost API and surfaces in the Convox Console cost dashboards, the convox cost CLI, and the budget-cap auto-shutdown machinery.

Cost tracking is a prerequisite for the per-app budget cap feature (convox budget set). With cost tracking disabled, budget caps cannot fire because the accumulator is not running — convox budget set is rejected at the CLI with a friendly error pointing here.

Default Value

The default value for cost_tracking_enable is false. Users must opt in to enable the cost accumulator and budget-cap surfaces.

Use Cases

Per-app cost visibility: Surface dollars-per-hour by app and by service in the Convox Console dashboards and the convox cost CLI without integrating an external cost-management tool.
Budget-cap enforcement: Set a monthly USD cap on an app via convox budget set --monthly-cap 1000 <app> and have the rack auto-block deploys (or auto-shutdown services) when the rolling 30-day spend reaches the cap.
Cost-per-utilization analysis: Combined with gpu_observability_enable, surface dollars-per-actual-GPU-hour rather than dollars-per-allocated-GPU-hour for inference and training workloads.
Spend forecasting: The accumulator emits a rolling spend rate that feeds Convox Console forecasting widgets so users can see "at this burn rate, you'll hit your cap in 6 days."

Capacity Considerations

The rack-side cost accumulator itself runs inside the existing rack control plane and adds no separately-allocated CPU or memory on the cluster. However, surfacing cost data in the Convox Console additionally installs the kube-prometheus-stack helm chart (Prometheus operator + state-metrics + a single-replica Prometheus statefulset + node-exporter daemonset) into the convox-monitoring namespace when the user enables monitoring through the Console UI. The chart's combined steady-state footprint is roughly 1 vCPU and 2 GiB of memory; transient install-time spikes can be 1.5x that.

For racks with a single small workload node (e.g. t3.small, t3.medium), enabling Console monitoring on top of cost tracking can overcommit the node and trigger a kubelet failure that drags pods into a stuck Terminating state. Recommended minimums for clusters that intend to surface cost data through the Console:

One workload node of t3.large or larger (or any 2 vCPU / 4+ GiB instance), OR
Two or more workload nodes of any size where the user-workload pods can spread off the prometheus statefulset's node, OR
Karpenter enabled on the rack (karpenter_enable=true) so the rack can grow capacity on demand.

Convox 3.24.6 ships explicit resource requests on the prometheus statefulset and a PodDisruptionBudget so Karpenter pre-provisions a fitting node before scheduling and so voluntary disruption pauses while a replacement reschedules. Smaller clusters still benefit from enabling Karpenter so the rack can react to chart-install spikes.

Setting Parameters

To enable cost tracking on an existing rack:

$ convox rack params set cost_tracking_enable=true -r rackName
Updating parameters... OK

To disable:

$ convox rack params set cost_tracking_enable=false -r rackName
Updating parameters... OK

Disabling stops the cost accumulator on the next rack restart. Existing budget-state annotations on each app are left intact (no data destruction); they stop receiving new samples until the parameter is re-enabled.

Additional Information

This parameter is currently AWS-only.
The cost accumulator runs inside the rack control plane, not as a sidecar — sampling cadence is bounded by the rack's existing resource budget. There is no additional CPU or memory allocation introduced by enabling this parameter.
The pricing table is updated per Convox release. Pin a custom multiplier (e.g., to model your AWS Enterprise Discount Program) via the per-app pricing-adjustment budget option (convox budget set --pricing-adjustment 0.7 <app>).
Spot capacity-type discount is applied automatically (a default factor of 0.30) when the underlying NodePool is configured for spot. Per-instance overrides are configurable via the rack's pricing table.
The rolling-window length is 30 days. Older samples roll off the tail as new samples arrive; the rack does not retain unbounded historical data.

gpu_observability_enable: Pairs with cost tracking to surface dollars-per-actual-GPU-utilization metrics.
webhook_signing_key: Webhook deliveries from cost-tracking events (app:budget:armed, app:budget:fired) carry an HMAC signature when this is set, so receivers can verify authenticity.

Version Requirements

This parameter requires at least Convox rack version 3.24.6.