Agentic AI, Post-Training
PyTorch, vLLM
Ray
Kubernetes
Bare-Metal GPUs
Workloads like agentic AI and post-training are exploding. They run on PyTorch and on inference engines like vLLM. Those frameworks increasingly rely on Ray for distributed compute, and Ray in turn sits on top of Kubernetes and bare-metal GPU clusters.
K8s can schedule containers. It can't manage multi-tenant GPU allocation, handle fractional GPU sharing, or optimize utilization across training and inference jobs competing for the same hardware. Kueue and Volcano are early attempts to close this gap, but neither is yet production-ready for these workloads.
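For concreteness, here is what that gap looks like in a stock pod spec (image name is illustrative). Kubernetes extended resources such as `nvidia.com/gpu` only accept whole integers, so fractional sharing can't even be expressed at the API level:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  containers:
  - name: train
    image: pytorch/pytorch   # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1    # integers only; a value like 0.5 is rejected
```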
We sit between orchestration and distributed compute. Our eBPF-based control plane attaches kernel-level probes to GPU drivers — giving Kubernetes real-time visibility into VRAM pressure, compute occupancy, and thermal state — and closes the scheduling loop with dynamic taints and a 500ms reconciliation cycle.
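The reconciliation loop described above can be sketched roughly as follows. This is a minimal illustration, not our implementation: the telemetry dataclass, taint string, and thresholds are all hypothetical stand-ins, and in the real control plane the readings come from eBPF probes on the GPU driver and the taint is applied through the Kubernetes API rather than mutated in-process.

```python
import time
from dataclasses import dataclass, field

@dataclass
class GpuTelemetry:
    """Hypothetical snapshot of what the kernel-level probes report."""
    vram_used_fraction: float   # 0.0 - 1.0 VRAM pressure
    sm_occupancy: float         # 0.0 - 1.0 compute occupancy
    temperature_c: float        # thermal state

@dataclass
class Node:
    name: str
    taints: set = field(default_factory=set)

# Illustrative taint; a real controller would use its own taint key.
PRESSURE_TAINT = "gpu.example.io/pressure=high:NoSchedule"

def reconcile(node: Node, telemetry: GpuTelemetry,
              vram_limit: float = 0.9, temp_limit_c: float = 85.0) -> None:
    """One pass: taint the node while the GPU is under pressure,
    and remove the taint once pressure subsides."""
    under_pressure = (telemetry.vram_used_fraction > vram_limit
                      or telemetry.temperature_c > temp_limit_c)
    if under_pressure:
        node.taints.add(PRESSURE_TAINT)
    else:
        node.taints.discard(PRESSURE_TAINT)

def control_loop(node: Node, poll, cycles: int, period_s: float = 0.5) -> None:
    """Run the reconcile step on a fixed cadence; period_s=0.5
    mirrors the 500ms reconciliation cycle."""
    for _ in range(cycles):
        reconcile(node, poll())
        time.sleep(period_s)
```

The design point is that the loop is level-triggered: each cycle recomputes the desired taint set from current telemetry, so a missed cycle self-corrects on the next one.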
GPU availability is the bottleneck. We make sure every GPU-hour counts.