
Voice Infrastructure at Scale: Lessons From Our Stack

Pruthvi · 1 min read

Running thousands of concurrent voice sessions isn’t “a bigger server.” It’s a set of deliberate boundaries: where audio flows, where state lives, and what happens when any layer degrades.

Media vs. control

We keep WebRTC media on paths optimized for jitter and packet loss, while tool calls, LLM turns, and CRM writes ride on separate request flows. That separation means a slow database query doesn’t starve the audio pipeline — and a spike in ASR load doesn’t block hang-up signaling.
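The separation above can be sketched with two independent event loops and bounded hand-off queues. This is a minimal illustration, not our actual pipeline: the names (`media_loop`, `control_loop`) and the simulated delay are hypothetical, and stand in for the real RTP path and downstream calls.

```python
import asyncio

async def media_loop(frames: asyncio.Queue, out: list) -> None:
    # Tight loop: forward audio frames without ever awaiting control work.
    while True:
        frame = await frames.get()
        if frame is None:          # sentinel: call ended
            return
        out.append(frame)          # stand-in for jitter buffer / RTP send

async def control_loop(jobs: asyncio.Queue, results: list) -> None:
    # Tool calls, LLM turns, CRM writes: each may be slow, but only here.
    while True:
        job = await jobs.get()
        if job is None:
            return
        await asyncio.sleep(0.05)  # simulate a slow downstream call
        results.append(f"done:{job}")

async def run_session() -> tuple[list, list]:
    frames, jobs = asyncio.Queue(), asyncio.Queue()
    sent, done = [], []
    media = asyncio.create_task(media_loop(frames, sent))
    control = asyncio.create_task(control_loop(jobs, done))
    await jobs.put("crm_write")
    for i in range(5):             # audio keeps flowing while the CRM write runs
        await frames.put(f"frame{i}")
    await frames.put(None)
    await jobs.put(None)
    await asyncio.gather(media, control)
    return sent, done
```

Because the media task never awaits anything on the control queue, a stalled `control_loop` leaves frame delivery untouched.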

Regional reality

Latency isn’t abstract for voice. We run inference and telephony entry points in regions close to where calls originate, especially for India and Southeast Asia, instead of forcing every round trip through a single US hub.
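The routing decision reduces to a small function: probe the candidate entry points and pick the lowest round-trip time rather than defaulting to one hub. The region names and RTT figures below are illustrative assumptions, not our real topology.

```python
def pick_region(rtt_ms: dict[str, float], default: str = "us-east") -> str:
    """Return the region with the lowest measured RTT, or a default if no probes."""
    if not rtt_ms:
        return default
    return min(rtt_ms, key=rtt_ms.get)

# A caller in Mumbai might see probe results roughly like this:
probes = {"us-east": 240.0, "ap-south": 35.0, "ap-southeast": 70.0}
```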

What we’re improving

  • Sharper backpressure when upstream STT or LLM providers throttle
  • Per-tenant quotas so one noisy neighbor can’t crowd out production traffic
  • More observability at the conversation level: trace IDs that follow a call end-to-end for support and for your own dashboards
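The per-tenant quota item above is usually implemented with a token bucket per tenant, so one tenant's burst drains only its own budget. A minimal sketch, with illustrative rates and tenant IDs (none of these names come from our stack):

```python
class TokenBucket:
    def __init__(self, rate: float, burst: float) -> None:
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def admit(tenant: str, now: float, rate: float = 5.0, burst: float = 2.0) -> bool:
    # Each tenant gets its own bucket; exhausting one never touches another's.
    bucket = buckets.setdefault(tenant, TokenBucket(rate, burst))
    return bucket.allow(now)
```

The same shape also gives you backpressure for free: when `admit` returns `False`, the caller can shed or queue work instead of passing the overload downstream.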

We’ll keep publishing notes like this as the stack evolves — not marketing fluff, but the tradeoffs we actually make.

Filed under: Company

Authors: Pruthvi
