Picking an AI Voice That Doesn't Pull You Out of the Scene

You can write a perfect character card and still lose the scene the moment the character speaks out loud, because TTS is unforgiving in a way text isn't. A slightly-too-cheerful "I missed you" makes the line impossible to take seriously. A pitch that's half a step off makes the character sound 20 years younger than they are.

Reverie gives you a lot of control over how a character sounds. This is a guide to using it well - which engine, which voice, which knobs to actually turn.

The two engines: Edge (free) and MiniMax (premium)

Reverie ships with two TTS providers:

Microsoft Edge TTS (free, default). A workhorse. 17+ languages, multiple voice options per language, gender-specific defaults, fast generation. The catch: it's a steady-state reader. It says lines clearly but doesn't dramatically act them. Excellent for most scenes; underpowered for high-emotion ones.

MiniMax (premium). Higher fidelity, multiple voice IDs per language, and the feature that actually matters: emotion support. The synthesis can carry an emotional register - tenderness, anger, hesitation - in a way Edge cannot. Costs more per generation.

The right mental model isn't "free vs. paid." It's: Edge for everyday lines, MiniMax for moments you want to remember.

If you're running a long arc where the character is mostly bantering, Edge will carry it. The moment you hit the scene where the character finally says the thing they've been holding back, switch. It's the audio equivalent of paying for the premium model on a key reply - cheap insurance for the lines that matter.
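
As a rough sketch of that rule of thumb, the choice comes down to one flag per line. The names here (Engine, pick_engine, key_moment, premium_available) are hypothetical and not part of Reverie; they only illustrate the decision.

```python
from enum import Enum


class Engine(Enum):
    EDGE = "edge"        # free default
    MINIMAX = "minimax"  # premium, supports emotion


def pick_engine(key_moment: bool, premium_available: bool = True) -> Engine:
    """Use the premium engine only for lines flagged as key moments."""
    if key_moment and premium_available:
        return Engine.MINIMAX
    return Engine.EDGE


if __name__ == "__main__":
    print(pick_engine(key_moment=False).value)  # everyday banter
    print(pick_engine(key_moment=True).value)   # the line that matters
```

How a line gets flagged as a key moment is left to you; in practice it is usually a manual call made scene by scene.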

How voice selection actually resolves

Reverie picks the voice for a character using this hierarchy:

1. User preference (your override) for this character, if set.
2. Character default that the creator chose.
3. Locale + gender fallback - the default Edge voice for the language and gender.

Practical implications:

If a character "sounds wrong," your override beats whatever the creator set. You don't have to hope the creator updates the card.
If you don't set anything, you're hearing the locale fallback. That's almost certainly not the best voice for your character; it's just the safest.
The fallback exists so a character never has no voice. It's the floor, not the goal.
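
The resolution order described in this section can be sketched as a first-match lookup. This is a minimal illustration, not Reverie's actual code: resolve_voice, LOCALE_FALLBACKS, and the voice names in the table are all placeholders.

```python
from typing import Optional

# Illustrative fallback table keyed by (locale, gender). The voice names are
# placeholders, not a list of what Reverie actually ships.
LOCALE_FALLBACKS = {
    ("en-US", "female"): "en-US-default-female",
    ("en-US", "male"): "en-US-default-male",
}


def resolve_voice(
    user_override: Optional[str],
    character_default: Optional[str],
    locale: str,
    gender: str,
) -> str:
    """Return the first voice available, in priority order."""
    if user_override:
        return user_override
    if character_default:
        return character_default
    # Raises KeyError if the table has no entry for this pair; a real app
    # would guarantee one so a character never ends up without a voice.
    return LOCALE_FALLBACKS[(locale, gender)]
```

For example, resolve_voice(None, "creator-pick", "en-US", "female") returns "creator-pick", and passing any override as the first argument wins over it.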

Choosing a voice that fits the character

The most common mistake is choosing a voice by demographic match (the character is a 30-year-old woman, pick a 30-year-old woman voice). Demographic match gets you a generic voice. Character match gets you something memorable.

A 30-year-old battle medic doesn't sound like a 30-year-old yoga instructor. Same age, same gender, completely different voice. Listen for:

Default register. Is the voice's neutral tone warm or cool? Bright or muted? Match that to the character's resting state, not their dramatic state.
Pacing. Some voices read fast even at default speed. Some take their time. A breathless voice on a stoic character is a bad fit, no matter how "good" the voice technically is.
Implied age. Voices have a perceived age that may not match the character's stated one. A 19-year-old character with a 35-year-old voice will feel uncanny. Adjust pitch slightly, or pick a different voice.

The test: play a 15-second sample of a boring line ("Yeah. I'll be there. Around eight.") in the voice you're considering. If the boring line sounds wrong, the dramatic ones will be unrecoverable.

The knobs worth turning

Reverie exposes a small set of parameters. Most users either ignore them or over-tune them. Here's what each one is actually for.

Speed

Default is 1.0x. Most voices benefit from a small adjustment, rarely a big one.

0.9-0.95x for thoughtful characters, older characters, characters who hesitate.
1.0-1.05x for most characters.
1.1-1.15x for nervous characters, fast talkers, comic relief.
Below 0.85x or above 1.2x is a red flag - you're fighting the voice instead of choosing a different one.
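
The ranges above are easy to encode as a sanity check. The sketch below takes the numbers straight from this guide; the function and dictionary names are made up for illustration.

```python
# Suggested speed ranges from this guide, as (low, high) multipliers.
SPEED_RANGES = {
    "thoughtful": (0.90, 0.95),  # thoughtful, older, or hesitant characters
    "typical": (1.00, 1.05),     # most characters
    "nervous": (1.10, 1.15),     # nervous characters, fast talkers, comic relief
}


def is_red_flag(speed: float) -> bool:
    """True when the speed suggests the voice itself is the wrong choice."""
    return speed < 0.85 or speed > 1.2


def fits_style(speed: float, style: str) -> bool:
    """True when the speed falls inside the suggested range for a style."""
    low, high = SPEED_RANGES[style]
    return low <= speed <= high
```

A speed that is neither a red flag nor inside a named range (say 0.97x) is not wrong; the ranges are starting points, not limits.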

Pitch

Use sparingly. A small shift (a few percent) can age a voice up or down convincingly. A large shift makes the voice sound processed - artificial, even when the underlying TTS is good. If you're tempted to push pitch hard, pick a different voice.
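
A guard like the one below can catch an over-tuned pitch before you get used to it. The 5 percent limit is an assumption standing in for "a few percent"; Reverie does not define a threshold, and check_pitch_shift is a hypothetical helper.

```python
def check_pitch_shift(percent: float, limit: float = 5.0) -> str:
    """Classify a pitch shift given in percent (negative lowers the voice).

    The default limit of 5.0 is an assumed stand-in for "a few percent".
    """
    if abs(percent) <= limit:
        return "ok"
    return "too large: try a different voice"
```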

Emotion (MiniMax only)

The most underused feature. Emotion tagging lets the synthesis carry a register - the difference between "I missed you" said warmly and the same line said hollowly. If you're using MiniMax and not touching emotion, you're paying for the engine and not using its main advantage.

Use it for: emotional beats, scenes where text and tone need to disagree, moments you'd want a human voice actor to bring something to.

Don't use it for: every line. Emotion-on-everything reads as a soap opera. Default state should be neutral; the emotion lands harder when it's not constant.
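
In code terms, "neutral by default, emotion only where it counts" looks roughly like this. The request shape, field names, and the "tender" label are assumptions for illustration; they are not MiniMax's or Reverie's actual API.

```python
from typing import Optional


def build_tts_request(text: str, engine: str, emotion: Optional[str] = None) -> dict:
    """Build a hypothetical synthesis request; emotion is opt-in per line."""
    request = {"engine": engine, "text": text}
    if emotion is not None:
        if engine != "minimax":
            # Edge has no emotion support, so fail loudly instead of
            # silently dropping the tag.
            raise ValueError("emotion is only supported on the MiniMax engine")
        request["emotion"] = emotion
    return request


if __name__ == "__main__":
    # Most lines: no emotion field at all.
    print(build_tts_request("Around eight.", "minimax"))
    # The line that matters: tag it explicitly.
    print(build_tts_request("I missed you.", "minimax", emotion="tender"))
```

Leaving the emotion field out entirely for ordinary lines keeps the neutral default the norm, so the tagged lines stand out.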

Voice and language

Reverie supports voices in 17+ languages with gender-specific Edge defaults. A few practical notes:

Locale, not just language. "Spanish" is not one accent. If your character is Argentinian and the voice is Castilian, the character will sound off to anyone who hears that distinction. Pick locale carefully.
Cross-language characters. If your character switches languages mid-conversation, playback switches engine or voice to match the language of the rendered text. Most setups handle this fine; some character voices have no cross-language equivalent and will sound jarringly different across languages.
Native-language characters. A character whose native language isn't English often sounds wrong in any English voice that doesn't have an appropriate accent. Edge's accented English voices, where they exist, are usually better than a default American voice.
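
The "locale, not just language" point can be sketched as a two-step lookup: exact locale first, same language only as a last resort. The catalogue below is a small sample with Edge-style names used for illustration; check the real voice list before relying on any of them. pick_voice is a hypothetical helper, not a Reverie function.

```python
from typing import Optional

# Sample catalogue for illustration only; verify names against the engine's
# actual voice list.
VOICES = [
    {"name": "es-ES-ElviraNeural", "locale": "es-ES", "gender": "female"},
    {"name": "es-AR-ElenaNeural", "locale": "es-AR", "gender": "female"},
    {"name": "es-MX-DaliaNeural", "locale": "es-MX", "gender": "female"},
]


def pick_voice(locale: str, gender: str, voices: list = VOICES) -> Optional[dict]:
    """Prefer an exact locale match; fall back to any voice in the language."""
    exact = [v for v in voices if v["locale"] == locale and v["gender"] == gender]
    if exact:
        return exact[0]
    language = locale.split("-")[0]
    same_language = [
        v for v in voices
        if v["locale"].split("-")[0] == language and v["gender"] == gender
    ]
    # A same-language match may carry the wrong accent; None means no match.
    return same_language[0] if same_language else None
```

With this sample, an Argentinian character ("es-AR") gets the es-AR voice, while a Chilean one ("es-CL") falls back to the first Spanish voice in the list, which is exactly the kind of mismatch worth reviewing by ear.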

Common voice mistakes

Picking the "best" voice instead of the right one. The voice with the most natural prosody is not the right voice if it sounds 25 when your character is 50.
Treating speed as a quality knob. Speed is a personality knob. Slower isn't "better"; it's a different character.
Layering emotion on every line. See above. Restraint reads as competence; constant emotion reads as drama-club.
Skipping the boring-line test. Most of what a character says is "okay," "mm-hmm," and "what time?" A voice has to handle those lines first. If they sound wrong, the showpiece lines will too.
Forgetting you can override. The character creator made a choice. You're allowed to make a different one. The override is one tap away.

How this stacks with the rest of Reverie

Character writing - the voice rules in the character card translate directly to TTS choices. "Cuts their own sentences off when getting too sincere" suggests a voice with natural pause variability, not a steady reader.
Pacing - a slightly slowed voice reinforces a slow scene; the same voice at default speed can rush that scene without you noticing.
MiniMax for premium scenes - same logic as picking Claude for the showpiece reply. Use the premium engine where it matters; don't burn it on banter.

The takeaway

A good voice is one you stop noticing. It carries the line and gets out of the way.

Pick by character, not demographic. Test on boring lines. Touch speed and emotion lightly. Upgrade for the scenes you'll remember.

The voice isn't separate from the character. It is the character, for everyone who listens.