Engineering

Maintaining State Across WebSocket Reconnections in Voice Applications

How to preserve conversation context when WebSocket connections drop and reconnect in real-time voice systems.

SwiftCase Engineering
January 14, 2026
8 min read
Contents
  • Why Connections Drop
  • The State Management Challenge
  • Pattern 1: Connection Parameters
  • Pattern 2: Server-Side State Store
  • Pattern 3: Event Sourcing
  • Handling Pending Operations
  • Audio State Recovery
  • Testing Reconnection Scenarios
  • Production Monitoring
  • The Graceful Degradation Mindset
  • Building reliable voice applications?

Your voice AI is mid-conversation. The user has just explained their problem in detail. Then the WebSocket drops.

When it reconnects, does your AI remember what was being discussed? Or does it start fresh with "Hello, how can I help you today?"

State management across connection interruptions is one of the more challenging aspects of building real-time voice applications. Get it wrong and your users experience a frustrating loss of context. Get it right and they never notice the reconnection happened.

Why Connections Drop

WebSocket connections in voice applications drop more often than you might expect:

Network instability. Mobile users move between cell towers. WiFi connections fluctuate. Corporate firewalls occasionally reset long-lived connections.

Load balancer timeouts. Many infrastructure configurations close idle connections after 60-120 seconds. Voice applications have natural pauses that can trigger these timeouts.

Intentional disconnection. Some operations require temporarily closing the media stream. Sending DTMF tones (pressing phone buttons) is a common example, as the telephony provider may require a stream restart after tone transmission.

Server deployments. Rolling deployments can cycle the server instance handling a call, requiring clients to reconnect to a new host.

Each of these scenarios should be invisible to the user. The conversation should continue seamlessly.

The State Management Challenge

Voice applications accumulate several types of state during a conversation:

Conversation history. What has been said so far? The AI needs this context to generate coherent responses.

Call metadata. Who is calling? What company are they from? What's their support ticket number?

Conversation stage. Are we gathering initial information? Confirming details? Wrapping up?

Pending operations. Was the AI in the middle of a tool call? Waiting for external data?

Audio state. What audio has been sent? What's currently playing? What's queued?

Some of this state lives on the server. Some lives in the WebSocket connection itself. When the connection drops, you need a strategy for each category.

Pattern 1: Connection Parameters

The simplest approach is to pass critical state through connection parameters when the WebSocket reconnects.

Most telephony platforms support custom parameters on media stream connections. When the connection establishes, these parameters are available to your handler. You might pass the client name, company, and ticket ID as URL parameters on the WebSocket connection.

When you need to reconnect (after sending DTMF tones, for example), you generate instructions that reconnect with the same parameters. The telephony platform passes these through, and your server can restore the basic context immediately.

This pattern works well for immutable metadata that was known at call start. It doesn't help with state that evolved during the conversation.
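As a minimal sketch of this pattern (names and URL are illustrative, not a specific telephony provider's API), the metadata can be round-tripped through query parameters on the media stream URL:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_reconnect_url(base_url, params):
    """Encode immutable call metadata into the media stream URL so it
    survives a reconnect (e.g. after sending DTMF tones)."""
    return f"{base_url}?{urlencode(params)}"

def restore_context(ws_url):
    """On connection open, recover the metadata the platform passed
    through as query parameters."""
    query = parse_qs(urlparse(ws_url).query)
    return {key: values[0] for key, values in query.items()}

# Reconnect with the same parameters the call started with.
url = build_reconnect_url(
    "wss://voice.example.com/media",
    {"client": "Acme Ltd", "ticket_id": "T-4821"},
)
context = restore_context(url)
```

In practice the telephony platform, not your code, delivers the parameters on the new connection; the point is that the handler can rebuild basic context from the URL alone, with no server-side lookup.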

Pattern 2: Server-Side State Store

For conversation history and evolving state, maintain a server-side store keyed by call identifier.

The store holds everything that accumulates during a call: the conversation history, any metadata collected, the current conversation stage, and timestamps for activity tracking. When a new WebSocket connection opens, the handler looks up the call state by ID. If state exists, you're reconnecting to an ongoing conversation. If not, it's a new call.

This distinction matters. A reconnection should resume seamlessly with existing context. A new call should start with a greeting. The server-side store lets you tell the difference.

Important considerations:

TTL cleanup. Call state should have a time-to-live. Calls that ended without explicit cleanup shouldn't persist forever. We use 30 minutes as a reasonable default, long enough to handle extended holds, short enough to avoid memory bloat.

Memory vs external store. For single-server deployments, an in-memory store works fine. For multi-server deployments, you need shared storage like Redis to ensure any server can resume any call.

Concurrency. What happens if two WebSocket connections claim the same call ID simultaneously? This can occur during reconnection races. Implement locking or accept last-writer-wins semantics.
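A sketch of the single-server, in-memory variant (field names are illustrative; a multi-server deployment would back this with Redis instead):

```python
import time

CALL_TTL_SECONDS = 30 * 60  # 30-minute TTL, as discussed above

class CallStateStore:
    """In-memory call state keyed by call ID. Distinguishes a
    reconnection (state exists) from a new call (state absent)."""

    def __init__(self):
        self._calls = {}

    def resume_or_create(self, call_id):
        """Return (state, is_reconnection), discarding expired entries."""
        self._evict_expired()
        state = self._calls.get(call_id)
        if state is not None:
            state["last_activity"] = time.monotonic()
            return state, True   # resume seamlessly with existing context
        state = {
            "history": [],
            "stage": "greeting",
            "last_activity": time.monotonic(),
        }
        self._calls[call_id] = state
        return state, False      # new call: start with a greeting

    def _evict_expired(self):
        now = time.monotonic()
        expired = [cid for cid, s in self._calls.items()
                   if now - s["last_activity"] > CALL_TTL_SECONDS]
        for cid in expired:
            del self._calls[cid]
```

The boolean return is what drives the behavioural difference: a greeting for new calls, silent resumption for reconnections.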

Pattern 3: Event Sourcing

For complex applications, consider event sourcing: store a log of everything that happened during the call, then replay events to reconstruct state.

Every significant action becomes an event with a timestamp and payload: user speech, AI responses, tool calls, DTMF sent, stage changes. These events are appended to a log keyed by call ID. To reconstruct state, you replay the events in order, rebuilding the conversation history and current stage from the sequence of actions.

Event sourcing adds complexity but provides powerful capabilities:

  • Audit trail. Every action is recorded with timestamps.
  • Debugging. Replay events to understand what happened during a problematic call.
  • Recovery. Reconstruct state at any point in the conversation history.
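The replay step can be sketched as a fold over the event log (event types here are illustrative):

```python
def replay(events):
    """Rebuild conversation state by replaying the event log in
    timestamp order."""
    state = {"history": [], "stage": "greeting"}
    for event in sorted(events, key=lambda e: e["ts"]):
        if event["type"] == "user_speech":
            state["history"].append(("user", event["text"]))
        elif event["type"] == "ai_response":
            state["history"].append(("assistant", event["text"]))
        elif event["type"] == "stage_change":
            state["stage"] = event["stage"]
        # other event types (tool calls, DTMF) handled similarly
    return state

log = [
    {"ts": 2, "type": "ai_response", "text": "How can I help?"},
    {"ts": 1, "type": "user_speech", "text": "Hello"},
    {"ts": 3, "type": "stage_change", "stage": "gathering"},
]
state = replay(log)
```

Because events are append-only and replay is deterministic, the same log can reconstruct state at any point by truncating it at the desired timestamp.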

Handling Pending Operations

What if the connection drops while the AI is waiting for an external API response? Or in the middle of generating a response?

Idempotent operations. Design tool calls to be safely retriable. If the AI was looking up order status and the connection dropped, it should be able to retry the lookup on reconnection.

Operation timeouts. Don't wait forever for pending operations. If a tool call hasn't completed within a reasonable timeout, assume it failed and let the AI retry or explain the issue to the user.

Generation checkpoints. For streaming AI responses, periodically checkpoint what's been sent. On reconnection, either restart generation or continue from the checkpoint.

When handling reconnection with pending operations, check how long the operation has been running. If it's exceeded your timeout threshold, reset and have the AI apologise for losing track. If it's still within bounds, attempt to continue or retry the operation.
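That decision can be expressed as a small dispatch on operation age (the timeout value is an illustrative assumption, not a recommendation):

```python
import time

TOOL_CALL_TIMEOUT = 15.0  # assumed threshold in seconds

def resolve_pending(op, now=None):
    """Decide what to do with a pending tool call on reconnection.
    `op` is None or a dict with a 'started_at' monotonic timestamp."""
    if op is None:
        return "continue"   # nothing was in flight
    now = time.monotonic() if now is None else now
    if now - op["started_at"] > TOOL_CALL_TIMEOUT:
        return "reset"      # too stale: apologise and re-ask
    return "retry"          # idempotent call: safe to retry
```

The "retry" branch only makes sense if the tool calls are idempotent, which is why that design constraint comes first.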

Audio State Recovery

Audio state is particularly tricky because you can't replay what's already been heard.

Track what's been played. Maintain a pointer into the response audio stream. On reconnection, skip to where playback left off.

Accept some loss. For brief disconnections, users may miss a word or two. This is usually acceptable. Humans do this in normal conversation ("Sorry, what was that?"). Designing for perfect audio continuity adds significant complexity.

Queue management. Clear the audio queue on disconnection. Stale audio that plays after reconnection creates a jarring experience.

The key insight is that audio state recovery doesn't need to be perfect. Clear pending audio on disconnection, note where playback was interrupted, and continue with the next response on reconnection. Users tolerate minor audio gaps far better than they tolerate context loss.
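A minimal sketch of that approach, tracking a byte offset rather than attempting perfect resumption (class and field names are illustrative):

```python
class AudioPlaybackState:
    """Track roughly where playback was interrupted and clear stale
    audio on disconnect, accepting a small gap on reconnection."""

    def __init__(self):
        self.queue = []          # chunks not yet sent to the caller
        self.bytes_played = 0    # rough pointer into the response stream

    def enqueue(self, chunk):
        self.queue.append(chunk)

    def mark_played(self, chunk):
        self.bytes_played += len(chunk)

    def on_disconnect(self):
        """Drop queued audio so stale chunks never play after
        reconnect; return the playback offset for logging."""
        self.queue.clear()
        return self.bytes_played
```

Clearing the queue is the part that matters most: replaying stale chunks after a gap sounds far worse than simply continuing with the next response.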

Testing Reconnection Scenarios

Reconnection handling is notoriously under-tested. Here's how to do it properly:

Chaos testing. Randomly drop connections during integration tests. Verify that conversations can continue.

Specific scenarios. Test each reconnection trigger explicitly:

  • Connection drop during user speech
  • Connection drop during AI response
  • Connection drop during tool execution
  • Connection drop during silence/idle

State verification. After reconnection, assert that all critical state was preserved:

  • Conversation history is intact
  • Call metadata is accessible
  • Conversation stage is correct

User experience testing. Have humans test the reconnection experience. Technical correctness doesn't guarantee good UX.
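The chaos-testing idea above can be sketched as a harness that randomly drops the connection between turns and then asserts that history survived. The handler here is a deliberately minimal stand-in whose state lives server-side, mirroring Pattern 2:

```python
import random

class FakeHandler:
    """Stand-in for a voice handler whose state lives server-side,
    so it survives connection drops."""

    def __init__(self):
        self.server_state = {"history": []}
        self.connected = True

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        self.connected = True  # server-side state is untouched

    def handle(self, utterance):
        assert self.connected
        self.server_state["history"].append(utterance)

def chaos_session(handler, turns, drop_probability=0.5, seed=0):
    """Drive a conversation, randomly dropping the connection
    between turns."""
    rng = random.Random(seed)
    for turn in turns:
        if rng.random() < drop_probability:
            handler.disconnect()
            handler.reconnect()
        handler.handle(turn)

handler = FakeHandler()
chaos_session(handler, ["hello", "my order is late", "ticket 12345"])
```

A real harness would cut the actual WebSocket at each of the specific trigger points listed above, but the assertion is the same: the full history must be intact regardless of how many drops occurred.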

Production Monitoring

Monitor reconnection patterns in production:

  • Reconnection rate. What percentage of calls experience reconnections? High rates may indicate infrastructure issues.
  • Context loss incidents. How often do users have to repeat information after reconnection? This measures whether your state preservation is working.
  • Reconnection timing. How long do reconnections take? Long reconnection times compound the problem.

Set alerts for anomalies. A spike in reconnections often precedes user complaints.

The Graceful Degradation Mindset

Perfect state preservation isn't always achievable. Design for graceful degradation:

Acknowledge uncertainty. If state might be stale, have the AI briefly confirm: "Just to make sure I have this right, you mentioned your order number is 12345?"

Fail toward helpfulness. If critical state is lost, don't pretend otherwise. "I apologise, but I seem to have lost some context. Could you briefly remind me what you needed help with?"

Log for debugging. When state loss occurs, log enough detail to diagnose the cause. Patterns in state loss often reveal fixable infrastructure issues.

Reconnection handling is a solved problem in the sense that the patterns are well understood. It's an unsolved problem in the sense that every application has unique state requirements. The principles here provide a foundation. The specific implementation will depend on your architecture and user needs.


Building reliable voice applications?

SwiftCase provides a robust platform for voice-enabled workflow automation, handling the complexity of real-time communications so your team can focus on business processes.

Book a demo | Explore the platform | View pricing
