3.24 Releases
Convox 3.24 upgrades Kubernetes to 1.34, introduces the convox deploy-debug command, adds mixed ARM/x86 architecture support, and adds Karpenter as an opt-in alternative to Cluster Autoscaler for AWS EKS node provisioning. The 3.24.8 release adds Contour (Envoy) as an opt-in alternative to the nginx ingress router on AWS, plus an automatic deploy-recovery fix for Services carrying a duplicate port. The 3.24.7 release adds ECR IAM policy customization, EKS Access Entries for migrating off the aws-auth ConfigMap, Karpenter enablement validation guards, and CloudWatch log streaming rate limiting. The 3.24.6 release adds KEDA-based autoscaling with scale-to-zero, per-app cost tracking and monthly budget caps, GPU observability with DCGM and Prometheus, HMAC webhook signing, and prebuilt image imports. Earlier releases include Fluentd memory tuning, Terraform timeout control, automatic parameter reconciliation across version transitions, and several reliability fixes.
3.24.0
Released: 2026-03-24
Feature Additions
- Added `convox deploy-debug` command for diagnosing deploy failures without kubectl access
Updates
- Upgraded Kubernetes to v1.34
- Updated BuildKit to v0.28.0
- Updated CoreDNS to v1.13.2
- Updated EBS CSI Driver to v1.56.0
- Updated EFS CSI Driver to v2.3.0
- Updated Pod Identity to v1.3.10
- Updated VPC CNI to v1.21.1
Fixes
- Fixed local development rack DNS routing, TLS certificate issuance, and BuildKit registry push on minikube
3.24.1
Released: 2026-03-31
Feature Additions
- Added `fluentd_memory` rack parameter for configuring Fluentd DaemonSet memory allocation across all providers
- Added `terraform_update_timeout` rack parameter for controlling Terraform node group update operation timeouts
- Added support for mixed ARM/x86 architecture node groups within a single rack with architecture-aware build scheduling via the `BuildArch` app parameter
Updates
- Extended rack install parameter templates to Azure, GCP, and DigitalOcean with expanded AWS parameter coverage
- Improved CLI performance with parallel rack enumeration, lazy loading, and sidecar metadata caching
- Standardized on Go 1.24.13 across all builds, eliminating Go 1.23 CVEs in the darwin/amd64 CLI
Fixes
- Fixed API to return correct HTTP status codes (404, 409, 400, 501) instead of 500 for all errors, with JSON error response support
- Fixed startupProbe using liveness timing values instead of its own configuration
- Fixed local rack DNS resolution to route through ingress-nginx-controller instead of vestigial router service
3.24.2
Released: 2026-04-06
Feature Additions
- Added Karpenter support for AWS EKS as an opt-in alternative to Cluster Autoscaler, with ~25 configurable parameters for workload nodes, build nodes, and custom NodePools
Updates
- Added automatic rack parameter reconciliation across version transitions. Stale parameters are detected and removed before `terraform apply`, preventing failures during upgrades, downgrades, and version pinning
Fixes
- Fixed `convox deploy` hanging or exiting silently during build log streaming due to an informer cache race condition
- Fixed `internalRouter` services returning 404 due to internal DNS resolver routing to the external router instead of the internal router
- Fixed `convox logs` failing with HTTP 401 after EKS token rotation (~1 hour of rack uptime)
- Fixed ECR image cleanup failing silently for apps with required environment variables in `convox.yml`
3.24.3
Released: 2026-04-13
Feature Additions
- Added `convox rack karpenter cleanup` command for cleaning up orphaned Karpenter nodes after disabling Karpenter
- Added `dedicated` field to `additional_karpenter_nodepools_config` for simple pool isolation without manual taint configuration
- Added automatic `nodeSelectorLabels` inheritance for `convox run`. One-off processes now target the same nodes as their deployed Service
- Added CLI parameter validation with unknown-key detection, fuzzy suggestions, install-only guards, managed-parameter protection, and type checking
- Added `--force` (`-f`) flag to `convox rack params set` to override parameter validation guards
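The unknown-key detection and fuzzy suggestions mentioned above can be illustrated with a minimal Python sketch. It is not the CLI's implementation: the `KNOWN_PARAMS` list is a small illustrative subset of parameter names taken from these notes, and `check_param` is a hypothetical helper that uses the standard library's `difflib` for similarity ranking.

```python
import difflib

# Illustrative subset of rack parameter names mentioned in these release notes.
# The real CLI's list of known keys is not shown here.
KNOWN_PARAMS = [
    "fluentd_memory",
    "terraform_update_timeout",
    "karpenter_enabled",
    "karpenter_auth_mode",
    "node_capacity_type",
]


def check_param(key: str) -> str:
    """Return a validation message for a single parameter key (hypothetical helper)."""
    if key in KNOWN_PARAMS:
        return "ok"
    # difflib ranks candidates by similarity ratio; 0.6 is its default cutoff.
    matches = difflib.get_close_matches(key, KNOWN_PARAMS, n=1, cutoff=0.6)
    if matches:
        return f"unknown parameter {key!r}; did you mean {matches[0]!r}?"
    return f"unknown parameter {key!r}"


print(check_param("fluentd_memroy"))
```

A transposed key such as `fluentd_memroy` is close enough to `fluentd_memory` to produce a suggestion, while an unrelated string yields only the unknown-parameter message.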
Updates
- Extended `dedicated-node` toleration auto-injection to Services and Timers targeting `convox.io/nodepool` pools, matching existing `convox.io/label` behavior
- Pinned CoreDNS, EBS CSI controller, EFS CSI controller, and AWS Load Balancer Controller to system nodes when Karpenter is enabled
- Added `unhealthyPodEvictionPolicy: AlwaysAllow` to all Convox-managed PDBs, preventing unhealthy pods from blocking node consolidation and scale-down
- Added Karpenter controller readiness gate before NodePool creation to prevent silently disappearing NodePools
- Improved `convox rack params` display to decode `additional_karpenter_nodepools_config` and `karpenter_config` as human-readable JSON
Fixes
- Fixed additional node group Terraform destroy/create cycle caused by `for_each` key mismatch on racks configured before 3.21.1
- Fixed spurious EKS node group rolling updates caused by `$Latest` launch template version string
- Fixed Karpenter consolidation being silently blocked by CoreDNS topology spread constraints and controller pods landing on workload nodes
- Fixed LBC Helm value types for nodeSelector and toleration when Karpenter is enabled
3.24.4
Released: 2026-04-16
Feature Additions
- Added `ecr_docker_hub_cache` rack parameter for AWS that provisions an ECR pull-through cache for Docker Hub images on resource pods (Redis, Postgres, MySQL, MariaDB, Memcached, PostGIS). Docker Hub credentials are required
- Added `azure_files_enable` rack parameter and `azureFiles` volumeOption for NFS shared storage on Azure AKS
- Implemented `convox instances terminate` for Kubernetes racks with drain-aware node cordoning and EC2 termination on AWS
Updates
- Masked sensitive values (`docker_hub_password`, `secret_key`, `token`) in `convox rack params` output as `**********`
- Extended Docker Hub `imagePullSecrets` to resource, service, and timer pods when `docker_hub_username` and `docker_hub_password` are set
- Added `aws_s3_bucket_public_access_block` on the managed storage bucket as an additional access restriction
- Added CI linting pipeline with golangci-lint, govulncheck, tflint, and checkov
- Bumped `expr-lang/expr`, `opentelemetry/sdk`, and `stdapi` for CVE patches
- Replaced deprecated `io/ioutil` calls with modern standard library equivalents across the codebase
Fixes
- Fixed rack install and update failures in AWS opt-in regions by forcing regional STS endpoints
- Fixed deploy failures when `port` and `ports` specify the same port number in `convox.yml`
- Fixed KEDA and VPA Helm install race condition on fresh AWS racks
- Fixed Azure AKS OIDC issuer not enabled on existing clusters at Kubernetes 1.34+
- Fixed missing cert-manager annotation on Azure API ingress causing TLS failures
- Fixed PDB disable annotation typo (`pdb-disbaled` → `pdb-disabled`); both spellings accepted
3.24.5
Released: 2026-04-22
Feature Additions
- Added container-level `securityContext` on services and timers with support for `runAsNonRoot`, `runAsUser`, `runAsGroup`, `readOnlyRootFilesystem`, `allowPrivilegeEscalation`, `capabilities.add/drop`, and `seccompProfile` (`RuntimeDefault` or `Unconfined`). Settings apply to Deployment pods, CronJob pods (timers), `convox run`, and `convox exec` containers. Validation catches unsupported seccomp profiles, malformed capability names, and the `runAsNonRoot: true` + `runAsUser: 0` conflict at `convox deploy` time
- Added `convox env mask`, `convox env mask set`, and `convox env mask unset` commands to mark environment variable keys as sensitive on a per-app basis. Masked values render as `****` in `convox env` and `convox releases info` output on a TTY, while piped output and the new `--reveal` flag continue to show real values. The mask list is stored per-app on the rack and does not trigger a release promotion
- Added `health.port` and `liveness.port` manifest fields so the readiness and liveness probes can target a dedicated health endpoint instead of the main service port. Accepts either scalar (`port: 9090`) or map (`port: { port: 9090, scheme: https }`) forms. Readiness auto-inherits the main service scheme when only the port is set; liveness does not auto-inherit. The startup probe continues to target the main service port
- Added `emptyDir.sizeLimit` under `volumeOptions` to size ephemeral volumes (e.g. `/dev/shm` for ML inference sidecars). Validated at manifest parse time as a Kubernetes resource quantity.
- Added `--gpu` and `--gpu-vendor` flags to `convox scale` for in-place GPU updates.
- Added `convox services update <service>` command mirroring the `convox scale` update path with the same flag set (`--count`, `--cpu`, `--memory`, `--gpu`, `--gpu-vendor`).
- Added a `GPU` column to `convox scale` output. Services with `gpu.count: 0` render as `-`.
- Added GPU-aware startup probe defaults. Services with `scale.gpu.count > 0`, `port.port > 0`, and no explicit `startupProbe` now receive a TCP startup probe with `grace=300s`, `interval=10s`, `timeout=5s`, `failureThreshold=30`, `successThreshold=1`, enough headroom for GPU model loads. Explicit user config always wins.
- Surfaced GPU fields on the rack API: `gpu` and `gpu-vendor` on `Service`, `gpu` on `Process`, `cluster-gpu` and `process-gpu` on `Capacity`, `gpu-capacity` and `gpu-allocatable` on `Instance`.
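The GPU-aware startup probe rule in the list above amounts to a small decision function. The following Python sketch models it under stated assumptions: the service is represented as a plain dict whose keys mirror the field names in the note, and `startup_probe_for` is a hypothetical helper rather than rack code. Only the default values come from the release note.

```python
from typing import Optional

# Default values quoted in the release note (durations in seconds).
GPU_STARTUP_PROBE = {
    "type": "tcp",
    "grace": 300,
    "interval": 10,
    "timeout": 5,
    "failureThreshold": 30,
    "successThreshold": 1,
}


def startup_probe_for(service: dict) -> Optional[dict]:
    """Pick the startup probe for a service described as a plain dict.

    The dict layout (scale.gpu.count, port.port, startupProbe) is an
    assumption made for illustration, not the rack's internal model.
    """
    explicit = service.get("startupProbe")
    if explicit:
        return explicit  # explicit user config always wins
    gpu_count = service.get("scale", {}).get("gpu", {}).get("count", 0)
    port = service.get("port", {}).get("port", 0)
    if gpu_count > 0 and port > 0:
        return dict(GPU_STARTUP_PROBE)
    return None  # non-GPU defaults are not modeled in this sketch
```

A service with one GPU and a port but no `startupProbe` gets the defaults; the same service with any explicit `startupProbe` keeps its own settings.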
Updates
- Added `--max-log-requests` flag to `convox logs` and `convox rack logs` so services with more than 20 pods can stream logs past the default follow-stream concurrency cap. The default remains `20` when the flag is not supplied, preserving prior behavior
- Added `-g`/`--group` filter to `convox rack params` that narrows output to a curated logical group (`karpenter`, `network`, `security`, `scaling`, `nodes`, `build`, `registry`, `logging`, `ingress`, `domain`, `storage`, `retention`, `versions`). Supports exact and unique-prefix matching (`-g karp` resolves to `karpenter`); ambiguous or unknown inputs print the full group list. Also extended the sensitive-param masking introduced in 3.24.4 to cover `access_id`, `private_eks_host`, `private_eks_user`, and `private_eks_pass`, closing a CLI leak path for private EKS credentials and DigitalOcean access key IDs
- Added `--reveal` flag and TTY-gated masking to `convox rack params`. Sensitive values now render as `**********` only on a TTY without `--reveal`; piped output always shows real values so existing backup and scripting flows (`convox rack params > rack.txt`, `| grep`, `| jq`) continue to work. Mirrors the pattern added to `convox env` in the same release.
- `scale.gpu.vendor` now maps through an explicit vendor → resource-key table (`nvidia`, `nvidia.com` → `nvidia.com/gpu`; `amd`, `amd.com` → `amd.com/gpu`). Previously the template used a `.com`-suffix heuristic which emitted garbage resource keys for unknown or misspelled vendors, causing pods to stay Pending forever. Unknown or unset vendors now default to `nvidia.com/gpu`. Users with `scale.gpu.vendor: nvidia`, `amd`, `nvidia.com`, or `amd.com` see no change. Users with an invalid vendor string see their GPU pods begin scheduling on NVIDIA nodes instead of Pending indefinitely.
- GPU pod scheduling on tainted GPU nodepools (e.g. `additional_karpenter_nodepools_config` with `nvidia.com/gpu=true:NoSchedule`) no longer depends on the `ExtendedResourceToleration` Kubernetes admission controller (which is not enabled by default on EKS). Convox now emits the matching `tolerations:` entry (`operator: Exists`, `effect: NoSchedule`) directly on each pod that declares `scale.gpu.count > 0`. This applies to service Deployments (via `service.yml.tmpl`), CronJob pods (via `timer.yml.tmpl`), `convox scale` / `convox services update` runtime mutations (via `ServiceUpdate`), and one-shot `convox run --gpu N` pods (via `podSpecFromRunOptions`). The emitted toleration is `effect: NoSchedule` only; clusters tainting GPU nodes with `effect: NoExecute` must continue to use the admission controller or custom admission webhooks.
- `convox run --gpu N --gpu-vendor VENDOR` now honors the `--gpu-vendor` flag (previously the run path only emitted `nvidia.com/gpu`).
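The exact and unique-prefix matching described for the `-g`/`--group` filter can be sketched in a few lines of Python. The group names are the ones listed in the note; `resolve_group` is a hypothetical function, and raising `ValueError` stands in for the CLI printing the full group list.

```python
GROUPS = [
    "karpenter", "network", "security", "scaling", "nodes", "build",
    "registry", "logging", "ingress", "domain", "storage", "retention",
    "versions",
]


def resolve_group(query: str) -> str:
    """Resolve a -g/--group argument to a single group name (illustrative)."""
    if query in GROUPS:
        return query  # an exact match takes priority
    candidates = [g for g in GROUPS if g.startswith(query)]
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous or unknown input: the note says the full group list is shown.
    raise ValueError("expected one of: " + ", ".join(GROUPS))
```

With this list, `resolve_group("karp")` returns `"karpenter"`, while `resolve_group("s")` raises because `security`, `scaling`, and `storage` all share the prefix.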
Fixes
- Agent services (`agent.enabled: true`, backed by Kubernetes DaemonSets) now report their configured `cpu` and `memory` values via the rack API's `ServiceList` response, the `convox scale` output table, and the Console Services panel. Previously the DaemonSet branch of `ServiceList` omitted the resource reads, so agent services always showed `cpu: 0, memory: 0` regardless of `convox.yml` scale settings. Any dashboard or tooling that sums per-service resource requests for an app will now include the agent's real footprint.
- Removed the spurious `sensitive = true` attribute on the `docker_hub_password` Terraform variable that was blocking `terraform apply` against legacy rack state files. The credential remains masked in `convox rack params` output via the CLI `sensitiveParams` mechanism, and rack Terraform state continues to be stored encrypted; no protection was removed, only an attribute that was breaking the legacy update path.
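The TTY-gated masking applied to `convox rack params` in this release (mask on a terminal unless `--reveal` is passed, never mask piped output) can be modeled as follows. This is a sketch only: `render_params` and the `key=value` output format are assumptions made for illustration, and the sensitive-key set is assembled from the names listed in the 3.24.4 and 3.24.5 notes.

```python
# Sensitive keys named in the 3.24.4 and 3.24.5 notes.
SENSITIVE = {
    "docker_hub_password", "secret_key", "token", "access_id",
    "private_eks_host", "private_eks_user", "private_eks_pass",
}
MASK = "**********"


def render_params(params: dict, is_tty: bool, reveal: bool = False) -> list:
    """Format params as key=value lines (assumed format), masking only on a TTY without reveal."""
    mask = is_tty and not reveal
    lines = []
    for key in sorted(params):
        value = MASK if mask and key in SENSITIVE else params[key]
        lines.append(f"{key}={value}")
    return lines
```

In a real program `is_tty` would typically come from `sys.stdout.isatty()`, which is why redirecting output to a file or a pipe yields unmasked values.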
Behavior change: `privileged: true` now renders into Deployment and CronJob pod specs
The top-level `privileged: true` service flag was previously honored only by `convox run` on V3. Deployment and CronJob pods silently dropped it. This release brings V3 Deployment and CronJob rendering in line with V2 semantics and the V3 `convox run` path. If you have `privileged: true` in a `convox.yml` and do not actually want a privileged pod, remove the flag before upgrading. On first deploy after 3.24.5, a pod-spec diff will trigger one rolling restart on affected services.
Notes
- To change GPU vendor on a deployed service, edit `scale.gpu.vendor` in `convox.yml` and redeploy. Runtime vendor-swap via `convox scale --gpu-vendor` or `convox services update --gpu-vendor` is not supported in this release. The new vendor's resource key is added but the previous vendor's key remains in the pod spec, causing scheduling to stall.
- AWS Neuron (`aws.amazon.com/neuron`) is not mapped in this release. Users should not set `scale.gpu.vendor: neuron`.
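The vendor → resource-key table from the 3.24.5 updates, together with its default, can be written as a lookup. This Python sketch uses only the four mappings and the `nvidia.com/gpu` default stated in the notes; the function name and the exact-match, case-sensitive lookup are assumptions.

```python
from typing import Optional

# Explicit vendor -> resource-key table, as listed in the 3.24.5 update note.
VENDOR_RESOURCE_KEYS = {
    "nvidia": "nvidia.com/gpu",
    "nvidia.com": "nvidia.com/gpu",
    "amd": "amd.com/gpu",
    "amd.com": "amd.com/gpu",
}
DEFAULT_RESOURCE_KEY = "nvidia.com/gpu"


def gpu_resource_key(vendor: Optional[str] = None) -> str:
    """Map scale.gpu.vendor to a resource key; unknown or unset falls back to the default."""
    return VENDOR_RESOURCE_KEYS.get(vendor or "", DEFAULT_RESOURCE_KEY)
```

Under this rule an unmapped value such as `neuron` would resolve to `nvidia.com/gpu` rather than `aws.amazon.com/neuron`, which is consistent with the advice above not to set it.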
3.24.6
Released: 2026-05-20
Feature Additions
- Added `convox builds import-image` command for importing prebuilt container images from any public or private registry into the Rack
- Added `imagePullSecrets` field in `convox.yml` for declarative private registry authentication on Services, Timers, and `convox run` pods
- Added KEDA-based autoscaling with `scale.autoscale` block supporting CPU, memory, GPU utilization, queue depth, and custom triggers
- Added scale-to-zero support with `scale.min: 0` and cold-start indicators in `convox scale` and `convox services`
- Added Console-driven Triggers Override with `convox services triggers enable/disable/threshold-set` CLI commands
- Added per-App cost tracking and monthly budget caps with `convox budget` and `convox cost` commands
- Added GPU observability infrastructure with DCGM Exporter and Prometheus integration via `gpu_observability_enable` Rack parameter
- Added `prometheus_url` and `grafana_url` Rack parameters for Prometheus integration and Grafana deep-linking
- Added `eks_api_server_private_access_cidrs` Rack parameter for restricting EKS API private endpoint access by CIDR
- Added `eks_log_types` Rack parameter for enabling EKS control plane logging
Updates
- Upgraded internal TLS to ECDSA P-256 certificates and consolidated RBAC across Rack API operations
- Added `Secure` and `HttpOnly` flags to session cookies and server-side read/idle timeouts
- Added WebSocket proxy authorization, SSRF prevention, tar injection guards, and proxy hardening
- Added HMAC webhook signing via `webhook_signing_key` Rack parameter for outbound event notifications
- Added actor attribution for all API operations to support audit trails
- Improved credential redaction in log output and error messages
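On the receiving side, HMAC webhook signing implies parsing the `Convox-Signature` header and checking a digest. The sketch below follows the header format given under See Also (`t=<unix-ts>,v1=<hex1>[,v1=<hex2>]`, HMAC-SHA256, any one `v1=` segment may match). The signed message layout is not stated in these notes, so the `"<t>.<raw body>"` construction here is an assumption that should be checked against the Webhooks reference. The function names are hypothetical.

```python
import hashlib
import hmac


def parse_signature_header(header: str):
    """Split 't=<unix-ts>,v1=<hex>[,v1=<hex>]' into (timestamp, [signatures])."""
    timestamp = None
    signatures = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = int(value)
        elif key == "v1":
            signatures.append(value)
    if timestamp is None or not signatures:
        raise ValueError("malformed signature header")
    return timestamp, signatures


def verify(header: str, body: bytes, key: bytes) -> bool:
    """Return True if any v1 segment matches the expected HMAC-SHA256 digest."""
    timestamp, signatures = parse_signature_header(header)
    # ASSUMPTION: the signed message is "<t>.<raw body>". This layout is not
    # given in the release notes; confirm it before relying on this check.
    message = str(timestamp).encode() + b"." + body
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking match position through timing.
    return any(hmac.compare_digest(expected, sig) for sig in signatures)
```

Accepting any matching `v1=` segment is what lets a receiver keep verifying during key rotation, when two signatures may be present. A production receiver would usually also reject timestamps that are too old.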
Behavior Changes
- Bool Rack parameters now persist as canonical `true`/`false` regardless of input form (`1`, `0`, `t`, `True`, etc.)
- Admin-role gates added on budget cap mutations (`budget set --monthly-cap`, `budget clear`, `budget reset --force-clear-cooldown`)
- New webhook event types for promote lifecycle (`completed`/`errored`/`cancelled`), scale overrides, and budget cap events
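The bool canonicalization in the first item above can be pictured as a normalization step applied before a parameter is persisted. In this Python sketch the accepted spellings are an assumption: they follow the set recognized by Go's `strconv.ParseBool`, since the note lists only `1`, `0`, `t`, `True`, "etc.". The function name is hypothetical.

```python
# ASSUMPTION: accepted spellings mirror Go's strconv.ParseBool; the release
# note itself only lists "1, 0, t, True, etc.".
TRUE_FORMS = {"1", "t", "T", "true", "TRUE", "True"}
FALSE_FORMS = {"0", "f", "F", "false", "FALSE", "False"}


def canonical_bool(raw: str) -> str:
    """Normalize a bool-like parameter value to 'true' or 'false'."""
    if raw in TRUE_FORMS:
        return "true"
    if raw in FALSE_FORMS:
        return "false"
    raise ValueError(f"not a boolean value: {raw!r}")
```

Persisting one canonical spelling means later comparisons and diffs see `true` or `false` regardless of how the value was originally typed.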
Fixes
- Fixed CLI panic on malformed `rack params set` arguments
- Fixed Karpenter cleanup timeout and nodeSelector handling during configuration changes
- Fixed `convox releases promote` hang when webhook fanout coincided with budget gate evaluation
- Added reaper for pods stuck in Terminating state beyond their grace period
- Fixed startup pods reporting as "unhealthy" instead of "starting" in `convox ps` and `convox services`
- Added periodic GPU node label reconciliation to self-heal missed `convox.io/gpu-vendor` labels on newly provisioned nodes
- Fixed `convox logs -a <app>` returning empty output. It now streams logs from all Service pods
3.24.7
Released: 2026-05-27
Feature Additions
- Added `ecr_full_access` and `ecr_additional_policy_arn` rack parameters for customizing ECR IAM permissions on the Rack API role. `ecr_full_access` restores pre-3.24.6 blanket ECR access; `ecr_additional_policy_arn` attaches a user-provided IAM policy for fine-grained repo access
- Added `eks_access_entries` rack parameter for migrating from the legacy `aws-auth` ConfigMap to EKS Access Entries. Creates access entries for both the managing IAM role and the nodes role, enabling users to safely remove the `aws-auth` ConfigMap after migration
Updates
- Added Karpenter enablement CLI validation guards for `node_capacity_type` conflicts and launch template parameter combinations on non-HA racks, preventing scheduling deadlocks during Karpenter enablement
- Improved case handling for `karpenter_enabled` and `karpenter_auth_mode` validation checks using case-insensitive comparison to handle Terraform state format variations
Fixes
- Fixed CloudWatch `FilterLogEvents` polling at 40+ calls/second per stream, now throttled to 1-5 calls/second with per-path sleep intervals. Eliminates `FilterLogEvents` rate limit exhaustion on accounts with multiple racks or concurrent log viewers
- Fixed Helm release dependency ordering to wait for node group provisioning before chart installation, preventing race conditions on fresh rack installs
3.24.8
Released: 2026-05-31
Feature Additions
- Added `router_type` rack parameter for selecting the AWS rack ingress router. `nginx` (the default) is unchanged; `contour` switches the rack to a Contour (Envoy) ingress. Switching `router_type` on a rack with running apps takes every app offline until each one is redeployed, so it is currently intended for new racks or staging. See Ingress Router for the full migration caveat
- Added `contour_internal_tls`, `contour_cpu_request`, `contour_memory_request`, `envoy_cpu_request`, and `envoy_memory_request` rack parameters for configuring the Contour (Envoy) ingress when `router_type=contour`. `contour_internal_tls` (default on) encrypts traffic between the internal router and services using a rack-issued certificate
Fixes
- Fixed deploy failures that could occur after the 3.24.4 port-deduplication change when a Service's last-applied configuration carried a duplicate port. The rack now recovers automatically by recreating the affected Service on the next deploy, so a single redeploy clears the failure with no manual intervention
See Also
- Releases for the full release history
- Ingress Router for choosing between the nginx and Contour (Envoy) routers
- Karpenter for Karpenter node autoscaling configuration
- deploy-debug for deploy failure diagnostics
- BuildArch for architecture-aware build scheduling
- fluentd_memory for Fluentd memory tuning
- terraform_update_timeout for Terraform timeout configuration
- Health Checks for startupProbe configuration
- Workload Placement for mixed-architecture placement strategies
- releases_to_retain_after_active for release cleanup configuration
- ecr_docker_hub_cache for the Docker Hub pull-through cache
- azure_files_enable for Azure Files NFS volumes
- Volumes for the `azureFiles` and `awsEfs` volumeOption reference
- Instance for `convox instances terminate` behavior on v3 racks
- securityContext for container-level hardening on services and timers
- env for the env mask commands and `--reveal` flag
- rack params for the `-g`/`--group` filter and masking behavior
- Separate Health Port for routing probes to a dedicated endpoint
- logs for the `--max-log-requests` flag
- Budget Caps for the per-app monthly cap, alert threshold, and at-cap-action configuration introduced in 3.24.6
- Cost Tracking for the per-app spend rollup, breakdown, and unpriced-instance-types warning surface
- Webhook Signing for the HMAC-SHA256 outbound webhook signing key flow added in 3.24.6
- `ack_by` migration for the form-param deprecation cycle introduced in 3.24.6 (Sunset: `Thu, 01 Oct 2026 00:00:00 GMT` per RFC 8594)
- Webhooks for the `Convox-Signature` header reference (format `t=<unix-ts>,v1=<hex1>[,v1=<hex2>]`; multiple `v1=` segments may appear during key rotation, and receivers verify against any one)
- Rack Roles for the org-Admin / org-Member role distinction surfaced by the new webhook-signing reveal gate
- budget and cost for the new CLI command surfaces
- ecr_full_access for restoring pre-3.24.6 blanket ECR access
- ecr_additional_policy_arn for attaching custom ECR IAM policies
- eks_access_entries for migrating to EKS Access Entries
- Karpenter Enablement Guards for the new CLI validation on Karpenter parameter combinations