Dual-model first-response comparison

On the first reply of a new chat, Reverie quietly generates a second response from a different model and shows both side by side. You pick the one you prefer. That choice feeds back into the model picker's community stats.

When you send your first message to a character, Reverie often runs the reply twice: once with the model you have selected, and once with a randomly chosen alternative. Both responses appear side by side under a Choose a response card. Whichever you pick becomes the canonical first reply for the conversation.

The other reply isn't simply discarded: your choice is recorded as a head-to-head data point that contributes to the win-rate stats shown in the model picker. Over time, the community's choices accumulate into a ranking that every user can see.

What you see

After the AI's first reply, a small card appears:

  • Primary response — generated with your active model
  • Alternative response — generated with a different model the system picked
  • Choose a response — radio between the two
  • Don't show again — checkbox to permanently disable the comparison
  • A Beta badge

Pick one and the chat continues from that reply.

When it triggers

Only on the first AI reply of a new chat thread. After that, normal chat resumes and no further alternatives are generated.

It runs only when both models qualify:

  • Your primary model is in the comparison pool
  • An alternative model is available

The comparison pool excludes:

  • Free models (multiplier 0)
  • Very expensive models (multiplier > 2.0)
  • Hidden models
  • Always-on reasoning models (DeepSeek R1), because their slower output throws off the pairing

The comparison is also skipped in group chats and in special chat modes that need consistent model behavior.
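The trigger and exclusion rules above can be read as a simple filter. The following is a minimal sketch, not Reverie's actual implementation: the Model fields, the function names, and the is_first_reply / is_group_chat flags are all hypothetical names chosen for illustration. Only the thresholds (multiplier 0 and multiplier > 2.0) come from the rules listed above.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    """Hypothetical model record; field names are assumptions."""
    name: str
    multiplier: float                 # credit multiplier; 0 means free
    hidden: bool = False
    always_on_reasoning: bool = False


def in_comparison_pool(model: Model) -> bool:
    """Apply the documented exclusions to a single model."""
    if model.multiplier == 0:         # free models are excluded
        return False
    if model.multiplier > 2.0:        # very expensive models are excluded
        return False
    if model.hidden or model.always_on_reasoning:
        return False
    return True


def pick_alternative(
    primary: Model,
    catalog: List[Model],
    *,
    is_first_reply: bool,
    is_group_chat: bool,
) -> Optional[Model]:
    """Return a random eligible alternative, or None to skip the comparison."""
    if not is_first_reply or is_group_chat:
        return None
    if not in_comparison_pool(primary):
        return None
    candidates = [m for m in catalog if m != primary and in_comparison_pool(m)]
    if not candidates:
        return None
    return random.choice(candidates)
```

Returning None in every skipped case keeps the caller simple: it generates the primary reply as usual and only renders the comparison card when an alternative model comes back.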

Cost

The alternative response uses a separate, smaller credit check before it's generated. If you don't have enough credits for both, the alternative is skipped and you just see the primary reply.

Comparison-generated alternatives are recorded with recordOnly: true, so credits are not deducted twice. You're charged once, for the response you actually pick.
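One way to picture the billing rule is sketched below. It is an interpretation under stated assumptions, not the platform's billing code: UsageRecord, should_generate_alternative, and settle are hypothetical names, the record_only field merely mirrors the recordOnly: true flag mentioned above, and the credit amounts in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical usage log entry."""
    model: str
    cost: float
    record_only: bool   # mirrors recordOnly: true (logged, not billed)


def should_generate_alternative(
    balance: float, primary_cost: float, alternative_cost: float
) -> bool:
    """Pre-check: only generate the alternative if both replies are affordable."""
    return balance >= primary_cost + alternative_cost


def settle(costs: Dict[str, float], chosen: str) -> Tuple[float, List[UsageRecord]]:
    """Bill only the chosen reply; log the other one as record-only."""
    records = [
        UsageRecord(model, cost, record_only=(model != chosen))
        for model, cost in costs.items()
    ]
    charged = sum(r.cost for r in records if not r.record_only)
    return charged, records
```

For example, with hypothetical costs of 4 credits for the primary and 3 for the alternative, picking the alternative would bill 3 credits and leave the primary as a record-only entry.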

How the data flows back

Each time you pick Primary or Alternative, the system records a head-to-head outcome for that model pair. The model picker aggregates these and shows:

  • Win rate per model pair (e.g. "GLM 5 wins 62% of comparisons against MiMo V2 Flash"), shown only once a minimum sample size is reached so that small samples don't mislead
  • Confidence level (1–5 stars) based on sample size
  • Like rate from a separate thumb-up / thumb-down signal on individual replies
  • Performance stats — time to first token, total time, tokens per second — measured from real chat metadata

These are the numbers you see when you tap into a model in the picker, for example "Win rate: 62% · Confidence: ⭐⭐⭐⭐".
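The aggregation described above can be sketched as follows. This is an illustrative sketch only: the minimum sample size and the star cut-offs are assumed values (the real thresholds are not documented here), and the function names and the (winner, loser) tuple format are hypothetical.

```python
from typing import Iterable, Optional, Tuple

MIN_SAMPLES = 30                          # assumed threshold, not documented
STAR_CUTOFFS = (100, 300, 1000, 3000)     # assumed cut-offs for stars 2 to 5

Outcome = Tuple[str, str]                 # (winner, loser) for one comparison


def pair_counts(outcomes: Iterable[Outcome], model: str, opponent: str) -> Tuple[int, int]:
    """Count wins and total head-to-head comparisons for one model pair."""
    wins = losses = 0
    for winner, loser in outcomes:
        if winner == model and loser == opponent:
            wins += 1
        elif winner == opponent and loser == model:
            losses += 1
    return wins, wins + losses


def win_rate(outcomes: Iterable[Outcome], model: str, opponent: str) -> Optional[float]:
    """Win rate of model against opponent, or None below the sample threshold."""
    wins, total = pair_counts(outcomes, model, opponent)
    if total < MIN_SAMPLES:
        return None                       # too few samples to display
    return wins / total


def confidence_stars(sample_size: int) -> int:
    """Map sample size to 1-5 stars using the assumed cut-offs."""
    return 1 + sum(1 for cutoff in STAR_CUTOFFS if sample_size >= cutoff)
```

With 62 wins out of 100 recorded comparisons for a pair, win_rate returns 0.62 and, under these assumed cut-offs, confidence_stars(100) returns 2. Returning None below the threshold lets the picker hide the figure rather than show a misleading percentage.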

Turning it off

Tap the Don't show again checkbox in the comparison card. Comparison turns off for your account from then on — no more dual generations, primary response only.

You can re-enable it later under Chat settings → Show first-response comparison (the exact setting name may differ in your version).

Why this exists

Two reasons:

  1. You get a better first reply for free. The AI's first message sets the tone for the whole chat, so it's worth getting right, and picking the better of two candidates usually beats a single roll.
  2. The platform gets honest model rankings. Internal benchmarks measure "which model writes better English." Real users vote on "which model writes the character better, in my taste, on this specific scene." Those are different signals — only the second one matters for picking a model for actual chats.

The win-rate stats in the model picker exist because of comparison. Without it, model rankings would be marketing claims.

Related

  • Choosing a model: using the stats to pick a daily driver
  • First-response enhancement: six one-tap rewrites of the chosen first reply
  • Forking: A/B comparison anywhere in the chat (not just the first reply) by forking and switching models
