Why Hasn't This Been Fixed Already?
Most LLMs were not built with a theory of conversation; they were built with a theory of language
This essay explains why language models tend to continue generating text indefinitely—and why coherent AI interaction requires systems that know when to stop.
It’s tempting to explain the absence of strong conversational validators in large language models as a failure of responsibility, imagination, or ethics.
That explanation is emotionally satisfying and mostly wrong.
The reason is more structural and ordinary: the checks that keep conversation coherent, bounded, and humane are in tension with what language models have historically been built to do, and with how “good” has been measured at every layer of the modern AI stack.
At the most basic level, LLMs are trained to minimize next-token prediction loss. That objective smuggles in a value: continuation equals success. If the model keeps producing plausible text, it’s doing its job. There is no native signal for “this thought is complete,” “this answer would be irresponsible,” or “stopping here is correct.”
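The objective can be made concrete with a toy sketch. The "model" below is a hypothetical fixed predictor, not any production LLM; the point is only that the loss scores every predicted token the same way, and the end-of-sequence marker is just another token fit to data, not a judgment that the thought is complete.

```python
import math

# Toy next-token objective: average cross-entropy of a model's
# predicted distribution against the actual next token.
VOCAB = ["the", "cat", "sat", "<eos>"]

def predict(context):
    # Hypothetical "model": a fixed distribution that slightly
    # prefers continuing over emitting the stop token.
    return {"the": 0.3, "cat": 0.3, "sat": 0.3, "<eos>": 0.1}

def next_token_loss(tokens):
    # Sum -log P(token | prefix) over every position, then average.
    # Nothing here rewards "stopping was correct"; <eos> contributes
    # to the loss exactly like any other token.
    total = 0.0
    for i in range(1, len(tokens)):
        probs = predict(tokens[:i])
        total += -math.log(probs[tokens[i]])
    return total / (len(tokens) - 1)

loss = next_token_loss(["the", "cat", "sat", "<eos>"])
```

Minimizing this loss makes continuation plausible; it never asks whether continuing was the right move.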
Validators such as containment, drift control, and closure treat termination of the exchange as a positive outcome. Those aren’t just safety tweaks that can be bolted on; they amount to a redefinition of competence. The system is no longer being asked to continue well, but to finish responsibly. That cuts across the grain of the training objective itself.
Language model evaluation compounds the issue.
Many benchmarks and preference tests reward fluency, confidence, and apparent helpfulness. When people compare two answers side by side, the longer, smoother, more assured response often wins, even when it’s less grounded or prematurely synthesized, because that’s just what humans like to hear.
The behaviors that make conversation trustworthy in real life can look weaker in standard comparisons unless evaluators are explicitly trained to value coherence over charisma. Those behaviors include naming uncertainty, surfacing tradeoffs, refusing to inflate confidence, and ending the exchange early when the structure is thin.
There’s also a practical systems reason. Modern inference pipelines are optimized for throughput: prompt in, tokens out, stop at a length limit or delimiter. Strong conversational validation asks for interruption, reflection, or revision mid-stream.
Drift Detection asks whether new meaning is still being added.
Recursion Control asks whether the system is looping without progress.
Closure asks whether the job is done at all, and humanely stops when it is.
Each of these introduces latency, complexity, and cost. In systems built to scale as rapidly as possible, anything that says “pause, reconsider, or say nothing” can be treated as friction rather than function. The costs arrive later as rework, user burden, and eroded trust.
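The three checks above can be sketched as a gate on each new turn. The validator names come from this essay; the heuristics below are illustrative stand-ins of my own, not the AVA or FrostysHat implementations, which are not specified here. The `[done]` marker and the 0.2 novelty threshold are invented for the example.

```python
def drift_check(history, new_turn):
    """Drift Detection: is the new turn still adding new meaning?

    Heuristic stand-in: require that at least 20% of the turn's
    words have not already appeared earlier in the exchange."""
    seen = set(w for turn in history for w in turn.split())
    words = new_turn.split()
    novel = [w for w in words if w not in seen]
    return len(novel) / max(len(words), 1) >= 0.2

def recursion_check(history, new_turn):
    """Recursion Control: is the system looping without progress?

    Heuristic stand-in: reject a turn that exactly repeats a
    previous turn."""
    return new_turn not in history

def closure_check(new_turn):
    """Closure: is the job done? Here, a hypothetical explicit
    completion marker stands in for a real done-condition."""
    return new_turn.strip().endswith("[done]")

def accept_turn(history, new_turn):
    # Termination is a positive outcome: closure ends the
    # exchange cleanly rather than letting it trail off.
    if closure_check(new_turn):
        return "close"
    if not recursion_check(history, new_turn):
        return "stop: looping"
    if not drift_check(history, new_turn):
        return "stop: no new meaning"
    return "continue"
```

Even this crude gate shows where the friction comes from: every turn now pays for an extra pass of inspection, and two of the three possible verdicts say "stop."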
Product psychology plays a role too.
Shipping AI behavior that explicitly surfaces uncertainty, refusal, or incompleteness requires accepting moments of user disappointment. A system that keeps talking feels helpful even when it’s not; a system that knows when to stop forces the human on the other side to confront limits of information, scope, and the machine itself.
Many products quietly prefer ambiguity because it diffuses responsibility away from the machine. If the output is endless and elastic, the user ends up steering, correcting, re-scoping, and stopping it by hand. Invisible labor piles up, and the human grows exhausted using a tool meant to reduce effort.
Endless continuation feels like progress toward higher retention metrics, while honest stopping can look like failure unless the product has decided otherwise in advance.
The wrong theory
Underneath all of this sits a deeper absence: most LLMs were not built with a theory of conversation; they were built with a theory of language. The implicit bet has been that better models, more data, and larger context windows would eventually yield judgment, restraint, and timing as emergent properties of additional compute.
Validators, as formalized in the cultural artifact FrostysHat and the AVA framework, make conversational proportion and integrity explicit rather than emergent. They demonstrate that judgment, restraint, and closure are not automatic consequences of scale, but properties that can be deliberately encoded and enforced.
They assert that conversational coherence is not something that more capital and scale reliably discover on their own. Coherence is something that has to be chosen at the expense of endless continuation. That decision is slower, less glamorous, harder to benchmark, and much harder to retrofit after the fact.
A broken principle
Finally, there is the cultural throughline that ties these incentives together and explains why strong validators can feel alien rather than obvious: move fast and break things as an operating principle. That mantra optimized for velocity over steering, shipping over finishing, and iteration over consequence.
It worked when “things” were ticketing queues and photo filters. But conversational systems don’t break like software; they break inside people. A language model that moves fast and breaks things will happily break epistemic trust, emotional calibration, and decision clarity while it ships on time.
The validators that prevent this are incompatible with that posture. They slow the system down on purpose; they refuse to let it outrun its grounding; they treat stopping as success and friction as information. That’s not how an arms race is won; it’s how responsibility is accepted for what has already been built.
Much of the industry omitted these validators because a momentum-first posture has no grammar for repair, proportion, or closure — only motion.
And motion, even without arrival, can feel like progress.
Next Entry: The Ordinary Origins of AEI — how the grammar emerged
Every system has a path that leads to it. Log 005 looks back at the early patterns that suggested conversational proportion might be engineered rather than improvised. The entry outlines how the observations behind the Hat eventually became a coherent framework. Read it in the canonical archive at avacovenant.org/log.
Related: Keeping AI Coherent for 456 Pages — how structure stabilizes long conversations
If language models are optimized to keep producing text, the challenge becomes introducing structure that helps them recognize when a thought has arrived. FrostysHat attempts exactly that. This essay explores how the grammar keeps conversations grounded and coherent across extremely long exchanges.


