Get in Touch

Whether you're exploring self-hosted LLMs, planning a GPU cluster migration, or building intelligent security systems — we'd love to hear about your project.

🌎 Location

Saddle River, New Jersey

🕑 Response Time

Within 24 hours on business days

💻 GitHub

github.com/mpwusr

Typical Engagements

vLLM on GKE · GPU Clusters · RAG Pipelines · CV Surveillance · Terraform IaC · Helm Charts

Common Questions

What GPU types do you work with?

Primarily NVIDIA L4 and H100 on GKE, but we also deploy on A100, T4, and AMD MI300X depending on workload requirements and cloud availability.

Do you work with cloud providers other than GCP?

Yes. While GKE is our primary platform, we also deploy on AWS (EKS), Azure (AKS), and Red Hat OpenShift AI. Our Terraform modules are cloud-agnostic where possible.

What models can you deploy with vLLM?

Any model supported by vLLM: Llama 3.x, Mistral, Mixtral, Qwen, Phi, DeepSeek, and more. We handle quantization, caching, and optimization for your specific use case.
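Once a model is live behind vLLM, clients talk to it through vLLM's OpenAI-compatible HTTP API. As a minimal stdlib-only sketch (the endpoint URL and model name below are hypothetical placeholders, not a live service), here is how a chat-completion request to such a server is assembled:

```python
import json
from urllib import request

# Hypothetical endpoint for a vLLM server started with
# `vllm serve <model>`; replace with your deployment's address.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> request.Request:
    """Build an OpenAI-compatible chat-completion request for a vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: target a Llama 3.x model served behind vLLM.
req = build_chat_request(
    "meta-llama/Llama-3.1-8B-Instruct",
    "Summarize retrieval-augmented generation in one sentence.",
)
# Sending it would be: request.urlopen(req)
```

Because vLLM mirrors the OpenAI API surface, the same request works unchanged against any of the model families listed above; only the `model` field differs.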

What's the typical project timeline?

A standard vLLM + RAG deployment on GKE usually takes 2–4 weeks from kickoff to production. Complex multi-model systems or large migrations may take longer.

Ready to Scale Your AI?

From proof-of-concept to production GPU clusters. Let's build your AI infrastructure together.

Email Us Directly →