Voice Sessions — How Savo's Voice AI Works
The technology behind Savo's voice-first conversation experience
Signal Sessions are voice-first. Participants speak naturally with Savo, an AI interviewer, using their device's microphone and speakers — no headset required, no special software to install. The conversation happens in a browser, over a real-time audio connection.
The Audio Connection
Savo uses real-time, bidirectional audio streaming for Signal Sessions. When a participant starts a session, the browser opens a live audio connection. Audio flows in both directions: Savo listens while the participant speaks, then replies with synthesized speech that plays through the participant's device speakers.
The connection is designed for low-latency, natural conversation — responses typically begin within a second of the participant finishing their turn.
Voice Activity Detection
Savo uses voice activity detection (VAD) to manage turn-taking: detecting when a participant has finished speaking and it's time for Savo to respond. VAD is designed so that a brief pause mid-thought isn't mistaken for the end of a turn, and it also handles overlaps, including moments when a participant interrupts or speaks over Savo (barge-in). The goal is an exchange that feels like a natural conversation, not a walkie-talkie.
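The turn-taking idea can be illustrated with a simple energy-threshold detector. This is a minimal sketch in TypeScript, not Savo's implementation: the article does not describe Savo's VAD model, so the `TurnDetector` class, its RMS threshold, the 20 ms frame size, and the end-of-turn silence window are all hypothetical. It shows the core logic: a turn ends only after silence lasts longer than a configured window, so short pauses are ignored, and speech that starts while Savo is talking is flagged as barge-in.

```typescript
// Hypothetical sketch: Savo's actual VAD model and parameters are not documented.
interface VadConfig {
  energyThreshold: number; // RMS level at or above which a frame counts as speech
  endOfTurnMs: number; // continuous silence required before a turn is considered finished
  frameMs: number; // duration of each audio frame fed to the detector
}

type VadEvent = "speech_start" | "barge_in" | "end_of_turn" | null;

class TurnDetector {
  private cfg: VadConfig;
  private participantSpeaking = false;
  private silenceMs = 0;

  constructor(cfg: VadConfig) {
    this.cfg = cfg;
  }

  // Root-mean-square energy of one frame of PCM samples in the range [-1, 1].
  static rms(frame: Float32Array): number {
    if (frame.length === 0) return 0;
    let sum = 0;
    for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
    return Math.sqrt(sum / frame.length);
  }

  // Feed one frame. `savoSpeaking` says whether Savo's audio is currently playing.
  push(frame: Float32Array, savoSpeaking: boolean): VadEvent {
    const isSpeech = TurnDetector.rms(frame) >= this.cfg.energyThreshold;
    if (isSpeech) {
      this.silenceMs = 0; // any pause is over; the same turn continues
      if (!this.participantSpeaking) {
        this.participantSpeaking = true;
        // Speech that starts over Savo's audio is an interruption (barge-in).
        return savoSpeaking ? "barge_in" : "speech_start";
      }
      return null;
    }
    if (this.participantSpeaking) {
      this.silenceMs += this.cfg.frameMs;
      // Only silence that reaches the window ends the turn; shorter pauses are ignored.
      if (this.silenceMs >= this.cfg.endOfTurnMs) {
        this.participantSpeaking = false;
        this.silenceMs = 0;
        return "end_of_turn";
      }
    }
    return null;
  }
}
```

A fixed energy threshold is easily fooled by background noise, which is why many production VAD systems use statistical or neural models instead; the threshold version is used here only to keep the turn-taking logic visible. Because no headset is required, a real system also needs echo cancellation so that Savo's own voice, picked up by the microphone, isn't mistaken for barge-in.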
What Participants See and Hear
The conversation screen shows a wave animation that reflects Savo's current state: speaking (active animation), listening (gentle idle), or processing (a brief transitional state). This gives the participant a clear signal about whose turn it is.
A microphone indicator confirms that the participant's audio is being received. Real-time captions of Savo's speech are available and can be toggled on or off. Participant speech is not transcribed on-screen.
Common Audio Situations
Microphone not detected: The mic setup screen at the start of the session walks participants through granting microphone permission in their browser. If a mic isn't detected after permission is granted, the participant is directed to text mode.
Poor audio quality: If the system detects a poor audio signal, the participant receives a non-disruptive notification with suggestions — check the microphone position, reduce background noise, or try text mode.
Connection drops: If the internet connection drops during a session, the system attempts to reconnect automatically. Session state is preserved during a connection loss, so the conversation can resume without losing progress.
Extended silence: If a participant goes silent for approximately 2 minutes, Savo gently checks in ("Are you still there?"). If there's no response, the session closes gracefully and any data collected is preserved.
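For connection drops, automatic reconnection is commonly implemented as retries with exponential backoff, resuming from state that was kept during the outage. The TypeScript sketch below assumes that approach; the article only says reconnection is automatic and state is preserved, so `reconnectWithBackoff`, the `SessionState` shape, the 500 ms base delay, the 8 s cap, and the attempt limit are all hypothetical. The `connect` and `sleep` functions are passed in so the retry logic can be exercised without a real network.

```typescript
// Hypothetical sketch: Savo's real retry policy (delays, attempt limit) is not documented.
interface SessionState {
  sessionId: string;
  turnIndex: number; // last completed turn; kept across drops so the interview can resume
}

type Connect = (state: SessionState) => Promise<void>;
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): 500 ms, 1 s, 2 s, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Tries to re-establish the audio connection, resuming from the preserved state.
// Returns true once connected, or false after maxAttempts failures.
async function reconnectWithBackoff(
  connect: Connect,
  state: SessionState,
  maxAttempts = 6,
  sleep: Sleep = defaultSleep,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect(state);
      return true;
    } catch {
      // Wait before the next try, but not after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  return false;
}
```

Backoff retries quickly after a brief network blip while avoiding a burst of reconnect attempts during a longer outage; because the state object is passed to every attempt, a successful reconnect can pick up at the last completed turn rather than restarting the interview.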
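The extended-silence behavior amounts to a small timer-driven state machine: check in after a long silence, then close if the silence continues. In the TypeScript sketch below, the roughly 2-minute check-in threshold comes from this article; the `SilenceMonitor` class and the 30-second grace period before closing are hypothetical, since the article does not say how long Savo waits after checking in. Time is passed in explicitly so the logic is easy to test.

```typescript
// The ~2 minute threshold comes from the article; the grace period is hypothetical.
const CHECK_IN_AFTER_MS = 2 * 60 * 1000;
const CLOSE_AFTER_CHECK_IN_MS = 30 * 1000; // hypothetical: not specified by the article

type SilenceAction = "none" | "check_in" | "close_session";

class SilenceMonitor {
  private lastActivityMs: number;
  private checkInSentAtMs: number | null = null;

  constructor(startMs: number) {
    this.lastActivityMs = startMs;
  }

  // Call whenever the participant speaks; any response cancels a pending check-in.
  onParticipantActivity(nowMs: number): void {
    this.lastActivityMs = nowMs;
    this.checkInSentAtMs = null;
  }

  // Call periodically (for example, once a second) with the current time.
  tick(nowMs: number): SilenceAction {
    if (this.checkInSentAtMs === null) {
      if (nowMs - this.lastActivityMs >= CHECK_IN_AFTER_MS) {
        this.checkInSentAtMs = nowMs;
        return "check_in"; // Savo asks "Are you still there?"
      }
      return "none";
    }
    // Still silent after the check-in: the caller saves collected data, then ends the session.
    return nowMs - this.checkInSentAtMs >= CLOSE_AFTER_CHECK_IN_MS ? "close_session" : "none";
  }
}
```

Resetting the check-in on any participant activity means a single reply is enough to keep the session going, and the silence clock starts over from that reply.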
Browser Compatibility
Signal Sessions work in any modern browser that supports real-time audio — Chrome, Firefox, Safari, and Edge on both desktop and mobile. Participants don't need to install anything. If a participant is on an unsupported browser, they receive specific guidance on which browsers work and a direct path to text mode if needed.
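A browser check like this can be sketched as feature detection rather than matching browser names. This is a minimal sketch under stated assumptions: the article does not list which APIs Savo needs, so the assumption here is microphone capture (`getUserMedia`), WebRTC (`RTCPeerConnection`), and the Web Audio API (`AudioContext`), and `chooseSessionMode` is a hypothetical helper. The environment is passed in as a plain object so the logic can be tested outside a browser.

```typescript
// Hypothetical sketch: the exact APIs Savo requires are an assumption, not documented.
interface BrowserEnv {
  getUserMedia?: unknown; // microphone capture
  RTCPeerConnection?: unknown; // real-time audio transport (WebRTC)
  AudioContext?: unknown; // audio playback and processing (Web Audio)
}

interface ModeDecision {
  mode: "voice" | "text";
  missing: string[]; // APIs the browser lacks, useful for the guidance message
}

const REQUIRED_APIS: (keyof BrowserEnv)[] = ["getUserMedia", "RTCPeerConnection", "AudioContext"];

// Feature detection: offer voice only if every required API is present,
// otherwise fall back to text mode.
function chooseSessionMode(env: BrowserEnv): ModeDecision {
  const missing = REQUIRED_APIS.filter((api) => typeof env[api] !== "function");
  return { mode: missing.length === 0 ? "voice" : "text", missing };
}
```

In a browser, the fields would be filled from `navigator.mediaDevices?.getUserMedia`, `window.RTCPeerConnection`, and `window.AudioContext`. Checking capabilities directly tends to hold up better than user-agent sniffing, since it also catches supported browsers running with a feature disabled.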