Models from eight vendors, each with quality scores from blind A/B tests, user like rates, and live speed stats — right in the picker. Switch any time, mid-conversation included.
DeepSeek V4 Flash (DeepSeek)
GLM 5 (Z.AI)
Llama 3.1 8B (Meta)
Our position
Every platform says its AI is great. We'd rather show you the win rates and let the models argue for themselves.
— Reverie team
Transparent by default
Four metrics, measured from real conversations on Reverie — not vendor benchmarks copied from a press release.
When users compare two replies without knowing which model wrote which, we record the pick. Win rates come from those blind matchups, scored with Wilson confidence intervals.
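For readers curious what a Wilson confidence interval on a win rate looks like, here is a minimal sketch. The function name `wilson_interval` and the 95% level (`z = 1.96`) are assumptions for illustration; the page does not say which confidence level Reverie uses.

```python
import math

def wilson_interval(wins: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate. z=1.96 is roughly a 95% level (assumed)."""
    if total == 0:
        return (0.0, 1.0)  # no comparisons yet: the rate could be anything
    p = wins / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Same 60% win rate, very different certainty.
print(wilson_interval(60, 100))
print(wilson_interval(600, 1000))
```

The interval narrows as comparisons accumulate, which is why a 60% win rate from 1,000 matchups says more than the same rate from 100.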
Every thumbs-up and thumbs-down on real replies rolls into a per-model like rate, so you can see how each model lands with actual roleplayers.
Time to first token: how long before the model starts answering, reported as average, median, and p95 from live traffic on our own infrastructure.
Tokens per second: raw generation speed once the reply starts flowing. Fast models keep long scenes moving, and you can see exactly which ones those are.
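As a rough illustration of the two speed stats, the sketch below summarizes time-to-first-token samples and computes generation speed for one reply. The function names, the millisecond and second units, and the nearest-rank method for p95 are all assumptions, not a description of Reverie's pipeline.

```python
import statistics

def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Average, median, and p95 (nearest-rank) of time-to-first-token samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "average": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }

def tokens_per_second(token_count: int, first_token_s: float, last_token_s: float) -> float:
    """Generation speed counted from the first token onward, excluding the initial wait."""
    elapsed = last_token_s - first_token_s
    if elapsed <= 0:
        raise ValueError("last token must arrive after the first")
    return token_count / elapsed

print(latency_summary([100, 200, 300, 400, 1000]))
print(tokens_per_second(120, 0.5, 3.5))
```

Reporting the median and p95 next to the average matters because a single slow reply, like the 1000 ms sample here, pulls the average up while the median stays put.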
Every quality stat carries a 1–5 star confidence level based on sample size, so you know how settled a number is before you trust it.
New models start in an 'evaluating' state — we show them without quality claims until enough blind comparisons have accumulated to say something honest.
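One way the star rating and the 'evaluating' state could fit together is sketched below. Every number in it is hypothetical: the page does not publish the sample-size cutoffs, so `STAR_THRESHOLDS` and `MIN_COMPARISONS` are placeholders chosen only to make the example run.

```python
# Hypothetical cutoffs; the real thresholds are not published on this page.
STAR_THRESHOLDS = [50, 200, 1000, 5000]  # comparisons needed for 2, 3, 4, 5 stars
MIN_COMPARISONS = 30                     # below this, no quality claim is shown

def confidence_stars(comparisons: int) -> int:
    """Map a sample size to a 1-5 star confidence level."""
    return 1 + sum(comparisons >= t for t in STAR_THRESHOLDS)

def quality_label(wins: int, comparisons: int) -> str:
    """Show a win rate with its stars, or 'evaluating' when data is too thin."""
    if comparisons < MIN_COMPARISONS:
        return "evaluating"
    return f"{wins / comparisons:.0%} win rate, {confidence_stars(comparisons)}/5 stars"

print(quality_label(5, 10))
print(quality_label(120, 200))
```

Under these placeholder cutoffs, a model with 10 comparisons shows no score at all, while one with 200 shows its win rate at three stars.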
The current lineup
Read live from our model registry — when the lineup changes, this table changes with it.
| Model | Tier | Vendor | Context (tokens) | Reasoning | Cost |
|---|---|---|---|---|---|
| DeepSeek V3.2 | Basic | DeepSeek | 164K | — | 0.5× credits |
| DeepSeek V4 Flash | Basic | DeepSeek | 164K | Optional | 0.3× credits |
| DeepSeek V4 Pro | Basic | DeepSeek | 164K | Optional | 0.7× credits |
| DeepSeek R1 | Basic | DeepSeek | 164K | Always on | 1× credits |
| MiMo V2 Flash | Basic | Xiaomi | 262K | Optional | 0.3× credits |
| MiMo V2.5 | Basic | Xiaomi | 262K | Optional | 0.3× credits |
| GLM 4.5 Air | Basic | Z.AI | 131K | Optional | 0.5× credits |
| GLM 4.7 | Basic | Z.AI | 200K | Optional | 1× credits |
| GLM 5 | Advanced | Z.AI | 200K | Optional | 1.3× credits |
| Gemini 3 Flash Preview | Advanced | Google | 1M | — | 1.2× credits |
| Llama 3.1 8B | Basic | Meta | 131K | — | Free |
Credit multipliers are relative to the baseline credit rate. Image and video generation models are available separately in chat.
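To make the multipliers concrete, here is a small sketch of how a per-message cost might follow from them. It assumes cost is simply the baseline charge times the multiplier and that "Free" means a multiplier of zero; the baseline of 10 credits is a made-up figure, and the multipliers are copied from the table above.

```python
# Multipliers taken from the lineup table; "Free" is modeled as 0.0 (an assumption).
MULTIPLIERS = {
    "DeepSeek V4 Flash": 0.3,
    "DeepSeek R1": 1.0,
    "GLM 5": 1.3,
    "Llama 3.1 8B": 0.0,
}

def message_cost(model: str, base_credits: float) -> float:
    """Credits charged for one message: baseline charge times the model's multiplier."""
    return base_credits * MULTIPLIERS[model]

# Hypothetical baseline of 10 credits per message.
for name in MULTIPLIERS:
    print(name, message_cost(name, 10))
```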
What you get
Switching models isn't a settings-menu easter egg here. It's how the product is meant to be used.
Change models between messages without losing the thread. Bring in a sharper model for the pivotal scene, drop back for small talk.
Didn't like a reply? Re-roll it with a different model and keep whichever version reads better. Those choices feed the win-rate stats.
Creators can set a preferred model for each character, so it speaks with the engine it was written for. Your own pick always overrides it.
A capable free model stays on the menu at zero credits, with fair-use limits — so running out of credits never means running out of conversation.
Each model shows its credit multiplier — from 0.3× budget models to 2× frontier ones — so cost is a choice you make, not a surprise on the bill.
Pure reasoning models for intricate plots, and hybrid ones that think only when asked. Pick the brain that fits the scene.
Common questions
Most platforms pick one model, brand it, and tell you it's wonderful. Reverie runs many and publishes how they actually perform against each other — in the product, where you pick.
Quality scores come from users choosing between two anonymous replies. Neither label nor vendor is visible during the comparison, so the numbers measure writing, not branding.
Beyond aggregate scores, the picker shows direct matchup data — which model beats which, and by how much, in the comparisons users actually ran.
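Direct matchup data of this kind can be tallied from the recorded picks. The sketch below assumes each blind comparison is stored as a `(winner, loser)` pair of model names; that record shape and the function name `head_to_head` are illustrative, not Reverie's actual schema.

```python
from collections import Counter
from typing import Iterable

def head_to_head(picks: Iterable[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """For each ordered pair (a, b), the share of a-vs-b comparisons that a won."""
    wins = Counter(picks)  # (winner, loser) -> number of times that outcome occurred
    table = {}
    for (a, b), won in wins.items():
        total = won + wins.get((b, a), 0)
        table[(a, b)] = won / total
    return table

# Model names here are placeholders.
picks = [("Model A", "Model B")] * 3 + [("Model B", "Model A")]
print(head_to_head(picks))
```

A pair only appears in the result if that side won at least once, so a model that lost every matchup against a rival shows up through the rival's entry alone.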
A score from forty comparisons is not a score from four thousand. Each metric carries a 1–5 star confidence level derived from sample size, displayed alongside the number.
Time-to-first-token and tokens-per-second are measured from production conversations — average, median, and p95 — not quoted from a vendor datasheet.
Different scenes want different brains. A long slow-burn romance, a tactical war council, and a quick comedic exchange don't have the same ideal model — so you shouldn't be locked to one.
Chat models from DeepSeek, Google, Z.AI, Xiaomi, Meta and more, with context windows from 131K to a million tokens, all behind one interface and one credit balance.
Models are priced individually, from 0.3× to 2× the baseline credit rate, and the multiplier is printed on the model card. Cheaper models for everyday scenes, frontier models when it matters.
Dedicated reasoning models think before they write for intricate plots; hybrid models reason on demand; multimodal models can read the images you upload into chat.
The free model stays available regardless of your balance, with fair-use limits — a floor under every conversation, not a trial that expires.
When you're ready
Open the model picker in any chat, sort by the numbers, and find your favorite.