Tagged with

Gpu

This post thumbnail

24 June 2026 06:30 PM

Deploy Qwen2.5-1.5B-Instruct on a Kubernetes GPU node with vLLM, expose it as an OpenAI-compatible API, and verify it with a real curl request.