LLMOps — Legal QA Platform

The problem

A working LLM demo is not a product. To run legal Q&A in production you need reliability, observability, cost control, and a way to ship changes safely — none of which a notebook gives you.

What I built

A full LLMOps platform that takes a legal-QA system from prototype to production-grade service:

Model routing with LiteLLM: a single interface across providers and models, with fallback to another model if a call fails.
Semantic caching in Redis: repeated and near-duplicate questions are served from cache, cutting latency and cost.
Observability: ELK for logs, Prometheus + Grafana for metrics and dashboards.
Infrastructure as code: the whole stack is defined in Terraform and deployed on Google Kubernetes Engine (GKE) through CI/CD.
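In the platform, LiteLLM handles fallback routing. The sketch below does not use LiteLLM's API. It only shows the idea in plain Python so it runs on its own. The model names, the `FallbackRouter` class and the stubbed `fake_call` backend are all hypothetical stand-ins for real provider calls. The router tries models in priority order and returns the first successful answer. It records every failure so that fallbacks stay visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


class ModelError(Exception):
    """Raised by a (hypothetical) backend when a call fails: timeout, rate limit, outage."""


@dataclass
class FallbackRouter:
    """Try each model in priority order; return (model_used, answer) for the first success.

    `call` is a hypothetical backend function (model_name, prompt) -> answer.
    In the real platform this role is played by LiteLLM.
    """
    models: Sequence[str]
    call: Callable[[str, str], str]
    errors: list = field(default_factory=list)  # (model, error message) pairs for logging

    def complete(self, prompt: str) -> tuple[str, str]:
        for model in self.models:
            try:
                return model, self.call(model, prompt)
            except ModelError as exc:
                # Record the failure, then fall through to the next model in the list.
                self.errors.append((model, str(exc)))
        raise ModelError(f"all {len(self.models)} models failed")


# Hypothetical backend: the primary model is "down", so the backup answers.
def fake_call(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise ModelError("503 from provider")
    return f"[{model}] answer to: {prompt}"


router = FallbackRouter(models=["primary-model", "backup-model"], call=fake_call)
used, reply = router.complete("What is the limitation period for contract claims?")
print(used)           # backup-model
print(router.errors)  # [('primary-model', '503 from provider')]
```

Keeping the failure list, instead of silently swallowing errors, means a degraded primary model shows up in logs and dashboards. Otherwise it would just quietly shift traffic to the backup.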

Architecture

GKE hosts the services; Terraform makes the environment reproducible.
CI/CD pipelines build, test, and roll out changes.
LiteLLM + Redis sit in front of the models for routing and caching; the ELK + Prometheus/Grafana stack makes behaviour and cost visible.
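Semantic caching means a new question is compared to past questions by embedding similarity rather than by exact string match. This sketch makes several assumptions:

- The cache is an in-memory list standing in for Redis.
- `embed` is a toy bag-of-words function, a hypothetical stand-in for a real embedding model.
- The 0.8 cosine threshold is an illustrative value, not a tuned one.
- `answer` and `fake_model` are hypothetical names.

The hit and miss counters are the kind of numbers one might export to Prometheus to watch cache effectiveness.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for the Redis cache: stores (embedding, answer) pairs."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []
        self.hits = 0    # candidate metric: cache hits
        self.misses = 0  # candidate metric: cache misses

    def lookup(self, question: str) -> Optional[str]:
        # Linear scan for the most similar stored question; a real deployment
        # would query a vector index instead.
        q = embed(question)
        best_score, best_answer = 0.0, None
        for vec, cached_answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, cached_answer
        if best_answer is not None and best_score >= self.threshold:
            self.hits += 1
            return best_answer
        self.misses += 1
        return None

    def store(self, question: str, result: str) -> None:
        self.entries.append((embed(question), result))


def answer(question: str, cache: SemanticCache, call_model: Callable[[str], str]) -> str:
    """Serve from cache when a similar question was seen; otherwise call the model and cache it."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    result = call_model(question)
    cache.store(question, result)
    return result


calls: list[str] = []


def fake_model(q: str) -> str:
    """Hypothetical model call that numbers its answers so cache hits are easy to spot."""
    calls.append(q)
    return f"answer #{len(calls)}"


cache = SemanticCache(threshold=0.8)
a1 = answer("What is the notice period for terminating a lease?", cache, fake_model)
a2 = answer("Please, what is the notice period for terminating a lease?", cache, fake_model)
a3 = answer("Who can sign a power of attorney?", cache, fake_model)
print(a1, a2, a3)                # answer #1 answer #1 answer #2
print(cache.hits, cache.misses)  # 1 2
```

The threshold is the main design choice. Set it too low, and a cached answer can be served for a question that differs in a legally significant detail. Set it too high, and the cache rarely hits.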

Outcome

A legal-QA service that is observable, cost-aware, reproducible, and safe to iterate on — the difference between a demo and something a team can actually operate.

What you get

If you have an LLM prototype that needs to become a real service, I can build the deployment, routing, caching, observability, and IaC around it.