The Science Behind Signal Sessions
How Savo's methodology turns conversation into measurable, reliable data
A Signal Session isn't a chatbot interaction. It's a structured measurement instrument — designed in advance to elicit specific kinds of evidence, scored against pre-specified dimensions, with every output traceable to the language that produced it.
That distinction matters. An AI that can hold a fluent conversation isn't the same as an AI that can produce defensible measurement. Savo is the latter.
The measurement problem
Good measurement of what people think requires three things at once: depth (what each person really thinks and why), scale (enough respondents to generalize), and quantification (scores you can compare across people, cohorts, and time).
For decades, no single instrument delivered all three. In-depth interviews gave depth and rigor but not scale: you can only run so many per day, and each additional interviewer adds a new source of variance. Surveys gave scale and quantification but not depth: they can only find what you thought to ask for, and a rating scale can't tell you why someone chose that number.
Signal Sessions resolve that trade-off. They deliver interview-grade depth at survey-grade scale, with every score grounded in behavioral evidence.
How rigor is maintained
The depth is what makes Signal Sessions powerful. The rigor is what makes them trustworthy. Four mechanisms keep the measurement sound:
Calibrated scoring. Every dimension is scored against a pre-specified rubric with behavioral signal anchors — documented criteria for what evidence at each score level looks like. The rubrics are human-authored by qualified scientists and calibrated against reference sets, with ongoing monitoring to catch drift. This is what the Standards for Educational and Psychological Testing require of any instrument that produces quantitative scores from qualitative material: documented procedures, rater calibration, and evidence of reliability.
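To make "calibrated against reference sets" concrete, here is a minimal sketch. It assumes, hypothetically, that a rubric maps each score level to a behavioral anchor and that calibration is checked by comparing a scorer's outputs with expert reference scores using quadratic weighted kappa, a common agreement statistic for ordinal ratings. The names `Rubric`, `quadratic_weighted_kappa`, `calibration_ok`, the example dimension, and the 0.8 threshold are illustrative assumptions, not Savo's actual metric or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    """Hypothetical rubric: one behavioral anchor per score level."""
    dimension: str
    anchors: dict[int, str]  # score level -> what qualifying evidence looks like


EXAMPLE_RUBRIC = Rubric(
    dimension="Switching intent",  # hypothetical dimension
    anchors={
        1: "No mention of alternatives",
        3: "Names an alternative without acting on it",
        5: "Describes concrete steps taken to switch",
    },
)


def quadratic_weighted_kappa(a: list[int], b: list[int], levels: list[int]) -> float:
    """Ordinal agreement between two raters: 1.0 = perfect, 0 = chance, < 0 = worse than chance."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k, n = len(levels), len(a)
    idx = {level: i for i, level in enumerate(levels)}
    # Observed joint distribution of (rater A, rater B) scores.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[idx[x]][idx[y]] += 1 / n
    # Marginals give the joint distribution expected if the raters were independent.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # larger disagreements cost quadratically more
            num += w * observed[i][j]
            den += w * hist_a[i] * hist_b[j]
    return 1.0 - num / den if den else 1.0


def calibration_ok(model_scores: list[int], reference_scores: list[int],
                   levels: list[int], threshold: float = 0.8) -> bool:
    """Hypothetical drift check: re-run on a fixed reference set at regular intervals."""
    return quadratic_weighted_kappa(model_scores, reference_scores, levels) >= threshold
```

Running `calibration_ok` against the same reference set on a schedule is one simple way to surface drift: a falling kappa means the scorer no longer agrees with the human-authored anchors as well as it did.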
Evidence gating. When a session doesn't produce enough evidence to score a dimension reliably, the system abstains. It doesn't produce a neutral score of 3. "No Score" is a meaningful output: it means the conversation didn't go deep enough on that dimension to support a reliable finding. Abstention is honest. Confabulation is not.
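The gating rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: each piece of evidence is tagged with the rubric level it supports, a dimension needs at least two such pieces (`MIN_EVIDENCE`, a hypothetical threshold), and the score is the median of the supported levels. Both the threshold and the median aggregation are stand-ins, not Savo's documented rule. The key behavior is that thin evidence returns `None` ("No Score") rather than a default midpoint.

```python
from dataclasses import dataclass
from typing import Optional

MIN_EVIDENCE = 2  # hypothetical minimum number of evidence units per dimension


@dataclass(frozen=True)
class EvidenceUnit:
    sentence_id: str  # where in the transcript this evidence came from
    text: str         # the participant's exact words
    level: int        # rubric level this evidence supports (hypothetical tagging)


def score_dimension(evidence: list[EvidenceUnit],
                    min_evidence: int = MIN_EVIDENCE) -> Optional[int]:
    """Return a rubric score, or None ("No Score") when evidence is too thin."""
    if len(evidence) < min_evidence:
        return None  # abstain instead of falling back to a neutral 3
    levels = sorted(unit.level for unit in evidence)
    return levels[len(levels) // 2]  # median; upper median for even counts
```

Returning `None` instead of an integer forces every downstream consumer to handle abstention explicitly, so a missing finding can't be averaged in as if it were a real midpoint score.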
Traceability. Every dimension score links back to the specific language that produced it. The evidence chain is auditable at the sentence level. You can show exactly what someone said and why it scored the way it did.
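One way to picture sentence-level traceability is a score record that carries the IDs of the transcript sentences behind it, plus an audit step that resolves those IDs back to the participant's exact words. The sketch below is hypothetical: `DimensionScore`, `audit_trail`, and the sentence-ID scheme are illustrative names, not Savo's data model. It refuses to produce an audit trail if a score cites a sentence that isn't in the transcript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimensionScore:
    dimension: str
    score: Optional[int]            # None means "No Score"
    evidence_ids: tuple[str, ...]   # transcript sentence IDs that support the score
    rationale: str                  # which rubric anchor the evidence matched


def audit_trail(result: DimensionScore, transcript: dict[str, str]) -> list[str]:
    """Resolve every cited sentence ID back to the participant's exact words."""
    missing = [sid for sid in result.evidence_ids if sid not in transcript]
    if missing:
        # A score that cites evidence we can't find is not auditable.
        raise KeyError(f"score cites sentences not in transcript: {missing}")
    return [f"[{sid}] {transcript[sid]}" for sid in result.evidence_ids]
```

For example, a score citing `("s1",)` against a transcript `{"s1": "I switched tools because exports kept failing."}` yields `["[s1] I switched tools because exports kept failing."]`: the finding and the words behind it travel together.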
Separation of roles. The AI agent conducting the conversation is not the AI agent scoring it, and a separate signal-quality monitor runs in parallel throughout the session. Each role is accountable for a different source of error, which prevents any single agent from reinforcing its own outputs.
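The separation can be sketched as three components behind narrow interfaces. In this minimal, hypothetical sketch the interviewer only asks questions, the monitor inspects each exchange as it happens, and the scorer receives a copy of the finished transcript with no access to the interviewer. `Interviewer`, `Monitor`, `Scorer`, `run_session`, and the toy implementations are illustrative assumptions; in a real system each role would presumably be backed by its own model call.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Transcript:
    turns: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)


class Interviewer(Protocol):
    def next_question(self, transcript: Transcript) -> str: ...


class Monitor(Protocol):
    def check(self, transcript: Transcript) -> list[str]: ...


class Scorer(Protocol):
    def score(self, transcript: Transcript) -> dict[str, Optional[int]]: ...


def run_session(interviewer: Interviewer, monitor: Monitor, scorer: Scorer,
                answers: list[str]) -> tuple[dict[str, Optional[int]], list[str]]:
    """Run a toy session; `answers` stands in for a live participant."""
    transcript = Transcript()
    flags: list[str] = []
    for answer in answers:
        transcript.turns.append(("agent", interviewer.next_question(transcript)))
        transcript.turns.append(("participant", answer))
        flags.extend(monitor.check(transcript))  # monitor reviews every exchange as it happens
    # The scorer receives only a copy of the finished transcript, never the interviewer itself.
    scores = scorer.score(Transcript(list(transcript.turns)))
    return scores, flags


# Toy implementations, for illustration only.
class ScriptedInterviewer:
    def __init__(self, questions: list[str]) -> None:
        self.questions = questions

    def next_question(self, transcript: Transcript) -> str:
        asked = sum(1 for speaker, _ in transcript.turns if speaker == "agent")
        return self.questions[min(asked, len(self.questions) - 1)]


class ShortAnswerMonitor:
    def check(self, transcript: Transcript) -> list[str]:
        speaker, text = transcript.turns[-1]
        if speaker == "participant" and len(text.split()) < 4:
            return [f"short answer: {text!r}"]
        return []


class ThresholdScorer:
    def score(self, transcript: Transcript) -> dict[str, Optional[int]]:
        # Count substantive answers (10+ words) as evidence; abstain below two.
        evidence = sum(1 for speaker, text in transcript.turns
                       if speaker == "participant" and len(text.split()) >= 10)
        return {"depth": None if evidence < 2 else min(5, 1 + evidence)}
```

Because each component sees only what its interface passes in, a flaw in one role (for example, a leading question) can be caught by another rather than being confirmed by the same agent that introduced it.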
Structured, not just smart
Savo implements purpose-built Interview Modes — conversation protocols designed for specific measurement goals. Explore & Discover, Profile & Characterize, Recall & Reconstruct, and Intake & Triage each use a different facilitation strategy matched to what they're trying to elicit.
Each mode follows a structured protocol, which means every participant experiences the same core framework while receiving adaptive follow-ups based on what they actually say. That combination — consistency of structure, flexibility of execution — is what makes scores comparable across hundreds of participants.
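The "same structure, adaptive follow-ups" idea can be sketched as a protocol of fixed core questions, each with a set of conditional probes. In this hypothetical sketch, keyword matching stands in for whatever follow-up logic Savo actually uses; `CoreQuestion`, `choose_follow_up`, `run_protocol`, and the example prompts are illustrative only. Every participant receives the same core prompts in the same order, while the follow-up depends on what they said.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CoreQuestion:
    prompt: str                          # asked verbatim of every participant
    probes: tuple[tuple[str, str], ...]  # (trigger phrase, follow-up), checked in order
    default_probe: str                   # used when no trigger matches


def choose_follow_up(question: CoreQuestion, answer: str) -> str:
    """Pick the first probe whose trigger appears in the answer (case-insensitive)."""
    lowered = answer.lower()
    for trigger, probe in question.probes:
        if trigger in lowered:
            return probe
    return question.default_probe


def run_protocol(protocol: list[CoreQuestion],
                 respond: Callable[[str], str]) -> list[tuple[str, str]]:
    """Ask each core question, then one adaptive follow-up; return (prompt, answer) pairs."""
    log: list[tuple[str, str]] = []
    for question in protocol:
        answer = respond(question.prompt)           # fixed: same prompt, same order
        log.append((question.prompt, answer))
        probe = choose_follow_up(question, answer)  # adaptive: depends on the answer
        log.append((probe, respond(probe)))
    return log
```

Keeping the core prompts fixed is what makes responses comparable across participants; the probes only decide how to dig deeper once a participant has answered.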
What you can trust
A Savo Insights report isn't a summary of what participants said. It's a scored, evidence-backed measurement of the dimensions the event was designed to capture — with every finding traceable to the specific language that produced it.
That's the difference between plausible and valid. Valid is the bar Savo holds itself to.
For how dimensions are defined and scored, see Understanding Dimensions. For how evidence traceability works in practice, see Evidence Units.