# GPU Inference

::: {.callout-note}
Self-hosted vLLM across a heterogeneous GPU fleet --- mixed VRAM, mixed vendors, Envoy load balancing.
:::

## The GPU fleet

## vLLM deployment

## Envoy L7 load balancing

## Authentication for the OpenAI API

## ROCm and CUDA coexistence
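The core idea behind L7 balancing across a mixed fleet can be sketched in a few lines: give each backend a weight proportional to its capacity and route each request to the backend with the fewest in-flight requests per unit of weight. This is a simplified illustration of Envoy's weighted least-request policy, not Envoy's actual implementation (which by default samples two random hosts rather than scanning the whole set); the backend names and weights below are made up for the example.

```python
def pick_backend(backends: list[dict]) -> dict:
    """Choose the backend with the fewest in-flight requests relative to
    its capacity weight (sketch of weighted least-request balancing)."""
    return min(backends, key=lambda b: b["active"] / b["weight"])

# Hypothetical heterogeneous fleet: weights roughly track VRAM/throughput.
fleet = [
    {"name": "a100-80g", "weight": 4, "active": 6},  # 6/4 = 1.50
    {"name": "rtx-3090", "weight": 1, "active": 1},  # 1/1 = 1.00  <- chosen
    {"name": "mi210",    "weight": 2, "active": 5},  # 5/2 = 2.50
]
print(pick_backend(fleet)["name"])  # → rtx-3090
```

In Envoy itself this corresponds to setting `lb_policy: LEAST_REQUEST` on the cluster and per-endpoint `load_balancing_weight` values; the point of the sketch is only the ratio being minimized.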
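Since vLLM's OpenAI-compatible server can require a static bearer token (`--api-key`), a client simply sends `Authorization: Bearer <key>` on every request; an Envoy filter can check the same header at the edge. The sketch below builds such a request with only the standard library; the base URL, model name, and key are placeholders, not values from this deployment.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list[dict]) -> urllib.request.Request:
    """Build an authenticated POST to an OpenAI-compatible
    /v1/chat/completions endpoint (request object only; not sent here)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            # Static bearer token, as checked by vLLM --api-key or an Envoy filter.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "http://gpu-fleet.internal:8000",   # hypothetical internal endpoint
    "example-key",
    "example-model",
    [{"role": "user", "content": "hi"}],
)
print(req.get_header("Authorization"))  # → Bearer example-key
```

Sending the request is then a matter of `urllib.request.urlopen(req)` (or any HTTP client); the only auth-relevant part is the header.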