3.24 Releases

Convox 3.24 upgrades Kubernetes to 1.34, introduces the convox deploy-debug command, adds mixed ARM/x86 architecture support, and adds Karpenter as an opt-in alternative to Cluster Autoscaler for AWS EKS node provisioning. The 3.24.8 release adds Contour (Envoy) as an opt-in alternative to the nginx ingress router on AWS, plus an automatic deploy-recovery fix for Services carrying a duplicate port. The 3.24.7 release adds ECR IAM policy customization, EKS Access Entries for migrating off the aws-auth ConfigMap, Karpenter enablement validation guards, and CloudWatch log streaming rate limiting. The 3.24.6 release adds KEDA-based autoscaling with scale-to-zero, per-app cost tracking and monthly budget caps, GPU observability with DCGM and Prometheus, HMAC webhook signing, and prebuilt image imports. Earlier releases include Fluentd memory tuning, Terraform timeout control, automatic parameter reconciliation across version transitions, and several reliability fixes.

3.24.0

Released: 2026-03-24

Feature Additions

  • Added convox deploy-debug command for diagnosing deploy failures without kubectl access

Updates

  • Upgraded Kubernetes to v1.34
  • Updated BuildKit to v0.28.0
  • Updated CoreDNS to v1.13.2
  • Updated EBS CSI Driver to v1.56.0
  • Updated EFS CSI Driver to v2.3.0
  • Updated Pod Identity to v1.3.10
  • Updated VPC CNI to v1.21.1

Fixes

  • Fixed local development rack DNS routing, TLS certificate issuance, and BuildKit registry push on minikube

3.24.1

Released: 2026-03-31

Feature Additions

  • Added fluentd_memory rack parameter for configuring Fluentd DaemonSet memory allocation across all providers
  • Added terraform_update_timeout rack parameter for controlling Terraform node group update operation timeouts
  • Added support for mixed ARM/x86 architecture node groups within a single rack with architecture-aware build scheduling via the BuildArch app parameter

Updates

  • Extended rack install parameter templates to Azure, GCP, and DigitalOcean with expanded AWS parameter coverage
  • Improved CLI performance with parallel rack enumeration, lazy loading, and sidecar metadata caching
  • Standardized on Go 1.24.13 across all builds, eliminating Go 1.23 CVEs in the darwin/amd64 CLI

Fixes

  • Fixed the API returning 500 for every error; it now returns the appropriate HTTP status code (404, 409, 400, 501) and supports JSON error responses
  • Fixed startupProbe using liveness timing values instead of its own configuration
  • Fixed local rack DNS resolution to route through ingress-nginx-controller instead of the vestigial router service

3.24.2

Released: 2026-04-06

Feature Additions

  • Added Karpenter support for AWS EKS as an opt-in alternative to Cluster Autoscaler, with ~25 configurable parameters for workload nodes, build nodes, and custom NodePools

Updates

  • Added automatic rack parameter reconciliation across version transitions. Stale parameters are detected and removed before terraform apply, preventing failures during upgrades, downgrades, and version pinning
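Conceptually, parameter reconciliation filters the stored parameters against the set that the target version accepts. The Go sketch below assumes a simple key-set schema and a hypothetical reconcileParams helper. The notes do not describe the real reconciliation logic or where it gets its schema. In the example, fluentd_memory and terraform_update_timeout (both added in 3.24.1) become stale when targeting a hypothetical 3.24.0 schema.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcileParams keeps only the params present in knownKeys (a hypothetical
// stand-in for the target version's schema) and returns the dropped keys sorted.
func reconcileParams(current map[string]string, knownKeys map[string]bool) (map[string]string, []string) {
	kept := make(map[string]string, len(current))
	var stale []string
	for k, v := range current {
		if knownKeys[k] {
			kept[k] = v
		} else {
			stale = append(stale, k)
		}
	}
	sort.Strings(stale)
	return kept, stale
}

func main() {
	current := map[string]string{
		"node_type":                "t3.large",
		"fluentd_memory":           "400Mi",
		"terraform_update_timeout": "2h",
	}
	// Hypothetical schema for a downgrade target that predates the 3.24.1 parameters.
	target := map[string]bool{"node_type": true}
	kept, stale := reconcileParams(current, target)
	fmt.Println(len(kept), stale) // 1 [fluentd_memory terraform_update_timeout]
}
```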

Fixes

  • Fixed convox deploy hanging or exiting silently during build log streaming due to an informer cache race condition
  • Fixed internalRouter services returning 404 due to the internal DNS resolver routing to the external router instead of the internal router
  • Fixed convox logs failing with HTTP 401 after EKS token rotation (~1 hour of rack uptime)
  • Fixed ECR image cleanup failing silently for apps with required environment variables in convox.yml

3.24.3

Released: 2026-04-13

Feature Additions

  • Added convox rack karpenter cleanup command for cleaning up orphaned Karpenter nodes after disabling Karpenter
  • Added the dedicated field to additional_karpenter_nodepools_config for simple pool isolation without manual taint configuration
  • Added automatic nodeSelectorLabels inheritance for convox run. One-off processes now target the same nodes as their deployed Service
  • Added CLI parameter validation with unknown-key detection, fuzzy suggestions, install-only guards, managed-parameter protection, and type checking
  • Added --force (-f) flag to convox rack params set to override parameter validation guards
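The notes do not say how fuzzy suggestions are computed. For illustration only, this Go sketch assumes one common approach: Levenshtein edit distance with a small threshold. The levenshtein, min3, and suggest helpers and the threshold of 3 are hypothetical. The known keys are parameter names that appear elsewhere in these notes.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein computes the edit distance between a and b byte-wise
// (adequate for ASCII parameter keys), using two rolling rows.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

// suggest returns the closest known key within maxDist edits, or "" if none qualifies.
func suggest(key string, known []string, maxDist int) string {
	best, bestDist := "", maxDist+1
	for _, k := range known {
		if d := levenshtein(key, k); d < bestDist {
			best, bestDist = k, d
		}
	}
	return best
}

func main() {
	known := []string{"karpenter_enabled", "fluentd_memory", "terraform_update_timeout"}
	fmt.Println(suggest("karpenter_enabeld", known, 3)) // karpenter_enabled
}
```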

Updates

  • Extended dedicated-node toleration auto-injection to Services and Timers targeting convox.io/nodepool pools, matching existing convox.io/label behavior
  • Pinned CoreDNS, EBS CSI controller, EFS CSI controller, and AWS Load Balancer Controller to system nodes when Karpenter is enabled
  • Added unhealthyPodEvictionPolicy: AlwaysAllow to all Convox-managed PDBs, preventing unhealthy pods from blocking node consolidation and scale-down
  • Added a Karpenter controller readiness gate before NodePool creation to prevent NodePools from silently disappearing
  • Improved convox rack params display to decode additional_karpenter_nodepools_config and karpenter_config as human-readable JSON

Fixes

  • Fixed additional node group Terraform destroy/create cycle caused by for_each key mismatch on racks configured before 3.21.1
  • Fixed spurious EKS node group rolling updates caused by $Latest launch template version string
  • Fixed Karpenter consolidation being silently blocked by CoreDNS topology spread constraints and controller pods landing on workload nodes
  • Fixed LBC Helm value types for nodeSelector and toleration when Karpenter is enabled

3.24.4

Released: 2026-04-16

Feature Additions

  • Added ecr_docker_hub_cache rack parameter for AWS that provisions an ECR pull-through cache for Docker Hub images on resource pods (Redis, Postgres, MySQL, MariaDB, Memcached, PostGIS). Docker Hub credentials are required
  • Added azure_files_enable rack parameter and azureFiles volumeOption for NFS shared storage on Azure AKS
  • Implemented convox instances terminate for Kubernetes racks with drain-aware node cordoning and EC2 termination on AWS

Updates

  • Masked sensitive values (docker_hub_password, secret_key, token) in convox rack params output as **********
  • Extended Docker Hub imagePullSecrets to resource, service, and timer pods when docker_hub_username and docker_hub_password are set
  • Added aws_s3_bucket_public_access_block on the managed storage bucket as an additional access restriction
  • Added CI linting pipeline with golangci-lint, govulncheck, tflint, and checkov
  • Bumped expr-lang/expr, opentelemetry/sdk, and stdapi for CVE patches
  • Replaced deprecated io/ioutil calls with modern standard library equivalents across the codebase

Fixes

  • Fixed rack install and update failures in AWS opt-in regions by forcing regional STS endpoints
  • Fixed deploy failures when port and ports specify the same port number in convox.yml
  • Fixed KEDA and VPA Helm install race condition on fresh AWS racks
  • Fixed Azure AKS OIDC issuer not enabled on existing clusters at Kubernetes 1.34+
  • Fixed missing cert-manager annotation on Azure API ingress causing TLS failures
  • Fixed PDB disable annotation typo (pdb-disbaled → pdb-disabled); both spellings accepted
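The port fix, which the 3.24.8 notes call the 3.24.4 port-deduplication change, amounts to merging the main port with the ports list so that no port number is rendered twice into the Kubernetes Service. The following Go sketch shows that merge. mergePorts is a hypothetical helper, and treating a zero primary port as "unset" is an assumption.

```go
package main

import "fmt"

// mergePorts combines the primary port with the additional ports, keeping
// first-seen order and dropping repeats and unset (<= 0) entries.
func mergePorts(primary int, extra []int) []int {
	seen := map[int]bool{}
	var out []int
	for _, p := range append([]int{primary}, extra...) {
		if p <= 0 || seen[p] {
			continue
		}
		seen[p] = true
		out = append(out, p)
	}
	return out
}

func main() {
	fmt.Println(mergePorts(3000, []int{3000, 9090})) // [3000 9090]
}
```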

3.24.5

Released: 2026-04-22

Feature Additions

  • Added container-level securityContext on services and timers with support for runAsNonRoot, runAsUser, runAsGroup, readOnlyRootFilesystem, allowPrivilegeEscalation, capabilities.add/drop, and seccompProfile (RuntimeDefault or Unconfined). Settings apply to Deployment pods, CronJob pods (timers), convox run, and convox exec containers. Validation catches unsupported seccomp profiles, malformed capability names, and the runAsNonRoot: true + runAsUser: 0 conflict at convox deploy time
  • Added convox env mask, convox env mask set, and convox env mask unset commands to mark environment variable keys as sensitive on a per-app basis. Masked values render as **** in convox env and convox releases info output on a TTY, while piped output and the new --reveal flag continue to show real values. The mask list is stored per-app on the rack and does not trigger a release promotion
  • Added health.port and liveness.port manifest fields so the readiness and liveness probes can target a dedicated health endpoint instead of the main service port. Accepts either scalar (port: 9090) or map (port: { port: 9090, scheme: https }) forms. Readiness auto-inherits the main service scheme when only the port is set; liveness does not auto-inherit. The startup probe continues to target the main service port
  • Added emptyDir.sizeLimit under volumeOptions to size ephemeral volumes (e.g. /dev/shm for ML inference sidecars). Validated at manifest parse time as a Kubernetes resource quantity.
  • Added --gpu and --gpu-vendor flags to convox scale for in-place GPU updates.
  • Added convox services update <service> command mirroring the convox scale update path with the same flag set (--count, --cpu, --memory, --gpu, --gpu-vendor).
  • Added a GPU column to convox scale output. Services with gpu.count: 0 render as -.
  • Added GPU-aware startup probe defaults. Services with scale.gpu.count > 0, port.port > 0, and no explicit startupProbe now receive a TCP startup probe with grace=300s, interval=10s, timeout=5s, failureThreshold=30, successThreshold=1, enough headroom for GPU model loads. Explicit user config always wins.
  • Surfaced GPU fields on the rack API: gpu and gpu-vendor on Service, gpu on Process, cluster-gpu and process-gpu on Capacity, gpu-capacity and gpu-allocatable on Instance.
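The GPU startup-probe defaulting rule can be written as a small Go function. The StartupProbe struct and both helpers are hypothetical simplifications, not the rack's own types. maxStartupWindow assumes that grace maps to the probe's initial delay. Under that assumption, the defaults allow roughly 300 + 30 * 10 = 600 seconds before the probe gives up.

```go
package main

import "fmt"

// StartupProbe is a hypothetical, simplified probe shape for illustration.
type StartupProbe struct {
	Type             string
	GraceSeconds     int
	IntervalSeconds  int
	TimeoutSeconds   int
	FailureThreshold int
	SuccessThreshold int
}

// defaultStartupProbe returns the explicit probe if one is set; otherwise it
// applies the GPU defaults only when the service requests GPUs and exposes a
// port. A nil result means the GPU default does not apply.
func defaultStartupProbe(gpuCount, port int, explicit *StartupProbe) *StartupProbe {
	if explicit != nil {
		return explicit // explicit user config always wins
	}
	if gpuCount <= 0 || port <= 0 {
		return nil
	}
	return &StartupProbe{Type: "tcp", GraceSeconds: 300, IntervalSeconds: 10,
		TimeoutSeconds: 5, FailureThreshold: 30, SuccessThreshold: 1}
}

// maxStartupWindow approximates how long the probe tolerates a slow start,
// assuming grace acts as the initial delay before the first check.
func maxStartupWindow(p *StartupProbe) int {
	return p.GraceSeconds + p.FailureThreshold*p.IntervalSeconds
}

func main() {
	p := defaultStartupProbe(1, 8000, nil)
	fmt.Println(maxStartupWindow(p)) // 600
}
```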

Updates

  • Added --max-log-requests flag to convox logs and convox rack logs so services with more than 20 pods can stream logs past the default follow-stream concurrency cap. The default remains 20 when the flag is not supplied, preserving prior behavior
  • Added -g / --group filter to convox rack params that narrows output to a curated logical group (karpenter, network, security, scaling, nodes, build, registry, logging, ingress, domain, storage, retention, versions). Supports exact and unique-prefix matching (-g karp resolves to karpenter); ambiguous or unknown inputs print the full group list. Also extended the sensitive-param masking introduced in 3.24.4 to cover access_id, private_eks_host, private_eks_user, and private_eks_pass, closing a CLI leak path for private EKS credentials and DigitalOcean access key IDs
  • Added --reveal flag and TTY-gated masking to convox rack params. Sensitive values now render as ********** only on a TTY without --reveal; piped output always shows real values so existing backup and scripting flows (convox rack params > rack.txt, | grep, | jq) continue to work. Mirrors the pattern added to convox env in the same release.
  • scale.gpu.vendor now maps through an explicit vendor → resource-key table (nvidia, nvidia.com → nvidia.com/gpu; amd, amd.com → amd.com/gpu). Previously the template used a .com-suffix heuristic that emitted garbage resource keys for unknown or misspelled vendors, leaving pods Pending forever. Unknown or unset vendors now default to nvidia.com/gpu. Users with scale.gpu.vendor: nvidia, amd, nvidia.com, or amd.com see no change. Users with an invalid vendor string will see their GPU pods begin scheduling on NVIDIA nodes instead of staying Pending indefinitely.
  • GPU pod scheduling on tainted GPU nodepools (e.g. additional_karpenter_nodepools_config with nvidia.com/gpu=true:NoSchedule) no longer depends on the ExtendedResourceToleration Kubernetes admission controller (which is not enabled by default on EKS). Convox now emits the matching tolerations: entry (operator: Exists, effect: NoSchedule) directly on each pod that declares scale.gpu.count > 0. This applies to service Deployments (via service.yml.tmpl), CronJob pods (via timer.yml.tmpl), convox scale/convox services update runtime mutations (via ServiceUpdate), and one-shot convox run --gpu N pods (via podSpecFromRunOptions). The emitted toleration is effect: NoSchedule only; clusters that taint GPU nodes with effect: NoExecute must continue to use the admission controller or custom admission webhooks.
  • convox run --gpu N --gpu-vendor VENDOR now honors the --gpu-vendor flag (previously the run path only emitted nvidia.com/gpu).
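The explicit vendor table behaves like the Go sketch below. It is written as a Go function only for illustration; the notes say the mapping lives in the rack's templates. gpuResourceKey and vendorResourceKeys are hypothetical names, and the sketch assumes exact, case-sensitive matching, which the notes do not specify.

```go
package main

import "fmt"

// vendorResourceKeys lists the documented vendor spellings. AWS Neuron is
// intentionally absent because it is not mapped in this release.
var vendorResourceKeys = map[string]string{
	"nvidia":     "nvidia.com/gpu",
	"nvidia.com": "nvidia.com/gpu",
	"amd":        "amd.com/gpu",
	"amd.com":    "amd.com/gpu",
}

// gpuResourceKey returns the extended-resource key for a vendor string,
// defaulting unknown or unset vendors to nvidia.com/gpu.
func gpuResourceKey(vendor string) string {
	if key, ok := vendorResourceKeys[vendor]; ok {
		return key
	}
	return "nvidia.com/gpu"
}

func main() {
	for _, v := range []string{"amd", "nvidia.com", "", "nvdia"} {
		fmt.Printf("%q -> %s\n", v, gpuResourceKey(v))
	}
}
```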

Fixes

  • Agent services (agent.enabled: true, backed by Kubernetes DaemonSets) now report their configured cpu and memory values via the rack API's ServiceList response, the convox scale output table, and the Console Services panel. Previously the DaemonSet branch of ServiceList omitted the resource reads, so agent services always showed cpu: 0, memory: 0 regardless of convox.yml scale settings. Any dashboard or tooling that sums per-service resource requests for an app will now include the agent's real footprint.
  • Removed the spurious sensitive = true attribute on the docker_hub_password Terraform variable that was blocking terraform apply against legacy rack state files. The credential remains masked in convox rack params output via the CLI sensitiveParams mechanism, and rack Terraform state continues to be stored encrypted; no protection was removed, only an attribute that was breaking the legacy update path.

Behavior change: privileged: true now renders into Deployment and CronJob pod specs

The top-level privileged: true service flag was previously honored only by convox run on V3. Deployment and CronJob pods silently dropped it. This release brings V3 Deployment and CronJob rendering in line with V2 semantics and the V3 convox run path. If you have privileged: true in a convox.yml and do not actually want a privileged pod, remove the flag before upgrading. On the first deploy after 3.24.5, the pod-spec diff will trigger one rolling restart on each affected service.

Notes

  • To change GPU vendor on a deployed service, edit scale.gpu.vendor in convox.yml and redeploy. Runtime vendor-swap via convox scale --gpu-vendor or convox services update --gpu-vendor is not supported in this release. The new vendor's resource key is added but the previous vendor's key remains in the pod spec, causing scheduling to stall.
  • AWS Neuron (aws.amazon.com/neuron) is not mapped in this release. Do not set scale.gpu.vendor: neuron; like any unknown vendor, it falls back to nvidia.com/gpu.

3.24.6

Released: 2026-05-20

Feature Additions

  • Added convox builds import-image command for importing prebuilt container images from any public or private registry into the Rack
  • Added imagePullSecrets field in convox.yml for declarative private registry authentication on Services, Timers, and convox run pods
  • Added KEDA-based autoscaling with scale.autoscale block supporting CPU, memory, GPU utilization, queue depth, and custom triggers
  • Added scale-to-zero support with scale.min: 0 and cold-start indicators in convox scale and convox services
  • Added Console-driven Triggers Override with convox services triggers enable/disable/threshold-set CLI commands
  • Added per-App cost tracking and monthly budget caps with convox budget and convox cost commands
  • Added GPU observability infrastructure with DCGM Exporter and Prometheus integration via gpu_observability_enable Rack parameter
  • Added prometheus_url and grafana_url Rack parameters for Prometheus integration and Grafana deep-linking
  • Added eks_api_server_private_access_cidrs Rack parameter for restricting EKS API private endpoint access by CIDR
  • Added eks_log_types Rack parameter for enabling EKS control plane logging

Updates

  • Upgraded internal TLS to ECDSA P-256 certificates and consolidated RBAC across Rack API operations
  • Added Secure and HttpOnly flags to session cookies and server-side read/idle timeouts
  • Added WebSocket proxy authorization, SSRF prevention, tar injection guards, and proxy hardening
  • Added HMAC webhook signing via webhook_signing_key Rack parameter for outbound event notifications
  • Added actor attribution for all API operations to support audit trails
  • Improved credential redaction in log output and error messages
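A webhook receiver can check signed notifications by recomputing the HMAC over the raw request body and comparing the result in constant time. The notes do not specify the hash, the encoding, or the header that carries the signature. This Go sketch therefore assumes HMAC-SHA256 with a hex-encoded signature. sign and verify are hypothetical helpers, and the payload is a placeholder.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes a hex-encoded HMAC-SHA256 of body (assumed scheme).
func sign(key, body []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it with the received signature in
// constant time; malformed hex is treated as a failed check.
func verify(key, body []byte, signatureHex string) bool {
	got, err := hex.DecodeString(signatureHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	key := []byte("value-of-webhook_signing_key") // placeholder secret
	body := []byte(`{"event":"example"}`)          // placeholder payload
	sig := sign(key, body)
	fmt.Println(verify(key, body, sig), verify(key, []byte(`{}`), sig)) // true false
}
```

Sign the exact bytes that arrive on the wire. Re-serializing parsed JSON can change whitespace or key order and break the comparison.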

Behavior Changes

  • Bool Rack parameters now persist as canonical true/false regardless of input form (1, 0, t, True, etc.)
  • Admin-role gates added on budget cap mutations (budget set --monthly-cap, budget clear, budget reset --force-clear-cooldown)
  • New webhook event types for promote lifecycle (completed/errored/cancelled), scale overrides, and budget cap events
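The input forms in the first Behavior Changes bullet are exactly the spellings that Go's strconv.ParseBool accepts (1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False). A plausible normalization is to parse and then re-format, but that is an assumption here, not something taken from the Convox source. canonicalBool is a hypothetical helper. This sketch rejects inputs such as yes, and the notes do not say how Convox treats them.

```go
package main

import (
	"fmt"
	"strconv"
)

// canonicalBool normalizes any strconv.ParseBool spelling to "true" or "false".
func canonicalBool(raw string) (string, error) {
	b, err := strconv.ParseBool(raw)
	if err != nil {
		return "", fmt.Errorf("invalid boolean %q: %w", raw, err)
	}
	return strconv.FormatBool(b), nil
}

func main() {
	for _, in := range []string{"1", "t", "True", "0", "FALSE"} {
		out, _ := canonicalBool(in)
		fmt.Printf("%s -> %s\n", in, out)
	}
}
```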

Fixes

  • Fixed CLI panic on malformed rack params set arguments
  • Fixed Karpenter cleanup timeout and nodeSelector handling during configuration changes
  • Fixed convox releases promote hang when webhook fanout coincided with budget gate evaluation
  • Added a reaper for pods stuck in the Terminating state beyond their grace period
  • Fixed startup pods reporting as "unhealthy" instead of "starting" in convox ps and convox services
  • Added periodic GPU node label reconciliation to self-heal missed convox.io/gpu-vendor labels on newly provisioned nodes
  • Fixed convox logs -a <app> returning empty output. It now streams logs from all Service pods

3.24.7

Released: 2026-05-27

Feature Additions

  • Added ecr_full_access and ecr_additional_policy_arn rack parameters for customizing ECR IAM permissions on the Rack API role. ecr_full_access restores pre-3.24.6 blanket ECR access; ecr_additional_policy_arn attaches a user-provided IAM policy for fine-grained repo access
  • Added eks_access_entries rack parameter for migrating from the legacy aws-auth ConfigMap to EKS Access Entries. Creates access entries for both the managing IAM role and the nodes role, enabling users to safely remove the aws-auth ConfigMap after migration

Updates

  • Added Karpenter enablement CLI validation guards for node_capacity_type conflicts and launch template parameter combinations on non-HA racks, preventing scheduling deadlocks during Karpenter enablement
  • Made the karpenter_enabled and karpenter_auth_mode validation checks case-insensitive to tolerate Terraform state format variations

Fixes

  • Fixed CloudWatch FilterLogEvents polling at 40+ calls/second per stream; polling is now throttled to 1-5 calls/second using per-path sleep intervals. This eliminates FilterLogEvents rate-limit exhaustion on accounts with multiple racks or concurrent log viewers
  • Fixed Helm release dependency ordering to wait for node group provisioning before chart installation, preventing race conditions on fresh rack installs
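One reading of "per-path sleep intervals" is a poller that sleeps briefly while events are flowing and sleeps longer when a call comes back empty. The Go sketch below assumes that interpretation. The 200 ms and 1 s intervals were chosen only to match the stated 1-5 calls/second range, and nextPollDelay and callsPerSecond are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical intervals: 200ms while events keep arriving (5 calls/s ceiling)
// and 1s after an empty poll (1 call/s floor).
const (
	busyInterval = 200 * time.Millisecond
	idleInterval = time.Second
)

// nextPollDelay picks the sleep before the next FilterLogEvents call based on
// which path the previous call took.
func nextPollDelay(eventsReturned int, hasNextToken bool) time.Duration {
	if eventsReturned > 0 || hasNextToken {
		return busyInterval
	}
	return idleInterval
}

// callsPerSecond converts a delay into a steady-state call rate, ignoring the
// latency of the API call itself.
func callsPerSecond(d time.Duration) float64 {
	return float64(time.Second) / float64(d)
}

func main() {
	fmt.Println(callsPerSecond(nextPollDelay(12, false)), callsPerSecond(nextPollDelay(0, false))) // 5 1
}
```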

3.24.8

Released: 2026-05-31

Feature Additions

  • Added router_type rack parameter for selecting the AWS rack ingress router. nginx (the default) is unchanged; contour switches the rack to a Contour (Envoy) ingress. Switching router_type on a rack with running apps takes every app offline until each one is redeployed, so it is currently intended for new racks or staging. See Ingress Router for the full migration caveat
  • Added contour_internal_tls, contour_cpu_request, contour_memory_request, envoy_cpu_request, and envoy_memory_request rack parameters for configuring the Contour (Envoy) ingress when router_type=contour. contour_internal_tls (default on) encrypts traffic between the internal router and services using a rack-issued certificate

Fixes

  • Fixed deploy failures that could occur after the 3.24.4 port-deduplication change when a Service's last-applied configuration carried a duplicate port. The rack now recovers automatically by recreating the affected Service on the next deploy, so a single redeploy clears the failure with no manual intervention

See Also