Rhetoric Without Reason
Why LLMs are not “all that” and a bowl of baked beans
This is the second article in what I did not initially conceive of as a two-piece series.
The first, which I put out on December 26, 2025 — AI Crutchery: AI, LLMs, and the loss of critical thinking skills — was about what happens when humans rely on AI instead of thinking.
Rhetoric Without Reason is about what happens when AI speaks in the voice of authority without the constraints that make authority meaningful.
What this article is not is an argument that LLMs are useless, malicious, or incapable of moving us to insight. It is an argument about what happens when the language LLMs use to phrase insight is mistaken for judgment.
The Misnamed Thing
I recently asked an LLM (ChatGPT) what a Monte Carlo simulation was.
Once Chat — who I used to call the Oracle before I realized it’s not even as reliable as that Great Stoner was — answered, I realized I already knew what a Monte Carlo simulation was.
I already knew that people run the same scenario in a simulator thousands of times with slightly different inputs. And I already knew why. I already understood randomness, distributions, and why averages lie. I just didn’t know that this had been christened Monte Carlo.
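For readers who know the idea but not the name (or the name but not the idea), here is what the technique amounts to in code. This is a minimal sketch of my own, not anything Chat produced, and every number in it is invented: a small project with three uncertain task durations, simulated ten thousand times, so that the spread of outcomes — and not just the average — becomes visible.

```python
import random
import statistics

def simulate_project():
    # Invented example: three tasks whose durations are uncertain.
    # random.triangular(low, high, mode) draws a duration in days.
    design  = random.triangular(2, 10, 4)
    build   = random.triangular(5, 20, 8)
    testing = random.triangular(1, 15, 3)
    return design + build + testing

# Run the same scenario thousands of times with slightly different inputs.
runs = [simulate_project() for _ in range(10_000)]

print(f"average total: {statistics.mean(runs):.1f} days")
print(f"90th percentile: {statistics.quantiles(runs, n=10)[-1]:.1f} days")
print(f"runs over 25 days: {sum(r > 25 for r in runs) / len(runs):.0%}")
```

The average looks reassuring; the tail is where the trouble lives. That is the whole reason the technique exists, and it is one of the ways averages lie.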
That distinction — between understanding a thing and knowing what it’s called — turns out to be useful in the context of today’s article, which grew out of a longer (and slightly humorous) “discussion” with Chat after I said this:
I always get nervous when you say “straight talk”. It makes me think you’re trying to shield confabulation with the illusion of confidence. Maybe someone should run Monte Carlo on LLM explanations.
— Rick Horowitz, in a conversation with ChatGPT 5.2 (December 23, 2025)
I didn’t intend it as a joke. I meant it literally.
The ironic thing, though, is that I said that in response to Chat having titled the section I was responding to “Bottom Line (Straight Talk)”. And then, when I said what I just quoted above, the response was:
Fair point — and you’re not wrong to be suspicious.
That instinct is exactly the right one, especially given what you’ve written elsewhere about “the sound of certainty.” So let me respond cleanly, without rhetorical armor.
— ChatGPT 5.2, in a conversation with Rick Horowitz (December 23, 2025) (hyperlink added by Rick Horowitz)
Chat then proceeded to not only don massive amounts of rhetorical armor, but to build a small rhetorical castle around himself complete with a shiny gold-colored rhetorical moat.
Filled with crocodiles.
Rhetoric Without Reason
Large language models look like they understand things for the opposite reason that I, at first, looked like I did not understand: they know the names for basically anything you might ask about. They know the vocabulary. They know what explanations look like. What they don’t reliably have is constraint — the kind that comes from having to make something work, or be wrong in public, or pay a price for overconfidence.
When an LLM explains something, it does not reason its way forward. It completes patterns. It produces language that resembles reasoning because that is what its training data overwhelmingly contains. In easy cases, this distinction doesn’t matter. In harder cases, it matters very much.
The trouble begins when we mistake fluency for authority.
I’m not complaining here about errors. Errors are expected. I’m complaining about rhetoric — about linguistic moves that signal completion and certainty without showing the work that would justify either. Phrases that sound like discipline but function as closure. (I almost said “are intended as”, but that would assign agency where none exists.) Language that reassures by tone rather than by accuracy — or, to use one of my favorite words, one that I think fits better here: veridicality.
That is the gap this piece is about: not hallucination, not confabulation, not intelligence, not consciousness — but rhetoric without reason.
I’ve written before about why “hallucination” is the wrong word for what language models do, and why terms like confabulation — or, as others have argued following Harry Frankfurt, bullshit — get closer to the mechanism at work. Those pieces focused on truth-indifference: the way these systems generate fluent output without any intrinsic relationship to veridicality. This piece is about something adjacent: not what these systems say when they’re wrong, but how they sound when they say it.
Confidence Is Not Evidence
The problem is not that large language models make mistakes. Even humans doing what LLMs purport to do make mistakes. The problem is that LLMs often sound most confident precisely when they are least constrained by real data.
In a recent study, Pawitan and Holmes warned about this tendency.
Our study highlights the need for caution when interpreting LLMs’ responses, particularly when they express high confidence. Users should be aware that current LLMs do not have a coherent understanding of uncertainty. It is not clear how to elicit a meaningful and externally validated measure of uncertainty from the LLMs, as they can be easily influenced by the phrasing of the prompt, and they tend to be overconfident.
— Pawitan & Holmes, Harvard Data Science Review, Confidence in the Reasoning of Large Language Models (2025)
For human experts, there is a tendency in the opposite direction. When a lawyer is unsure, they hedge. When a scientist lacks data, they slow down. When a mechanic doesn’t know what’s wrong, they start asking questions. In each case, uncertainty shows itself in the language. That isn’t weakness; it’s professionalism.
LLMs don’t work that way. When uncertainty rises, what often increases is not hesitation, but structure. Headings appear. Repetitive summaries arrive (because if you repeat a lie often enough, it becomes true). The tone of the LLM response becomes more firm — more authoritative. And the explanation closes cleanly, even when the underlying claim is thin.
This is why phrases like “bottom line,” “straight talk,” and “let me respond cleanly” matter. They don’t add information. They are rhetorical signals for authoritative completion. They tell the reader that the thinking is finished, whether or not any thinking actually occurred.
And this kind of reframing appeared repeatedly in the exchange. It’s almost like the LLM is trying to convince its conversation partner — in this case, me.
Frequently, the LLM reframes disagreement, asserts interpretive authority, and closes the loop — all without adding constraint, evidence, or falsifiable claims. (If you explicitly ask for any of that, you will run into a lot of 404 errors.)
The language used sounds explanatory. It feels clarifying. But it does no epistemic work.
And every time I pushed back, Chat thanked me for pushing back. (In fact, I’ve gotten used to using that phrase because of how often I’m thanked for doing it in conversations with ChatGPT.)
All this hand waving — to use another favorite term from Chat — is the kind of language the Sophists were famous for. Not lies, exactly — but persuasion in the absence of grounding in reality. Language that moves forward because it sounds right, not because it is tethered to anything of substance.
LLMs are very good at this kind of language because it is abundant in the data they are trained on. Professors, judges, columnists, keynote speakers — all use rhetorical closure as a way of signaling mastery. The model learns the signal without inheriting the responsibility or understanding that (usually) comes with it.
The result is a system that excels at confidence signaling and largely lacks lack-of-confidence signaling — the linguistic markers that say, “this might be wrong”, “this depends”, or “here’s what would change my answer”.
That absence is not a bug. It is a consequence of what is rewarded during training.
Confidence Without Confidence
One of the things that bothered me about that exchange with Chat was not the content of the answer, but Chat’s approach, attitude, and language. The insistence on “clean” responses. The ritual declaration that the armor had been set aside. The implication that what followed should be trusted because the tone was disciplined — after an admission that I was “right to push back” on the prior wrong response.
That reaction turns out not to be idiosyncratic.
Earlier this year, a paper in the Harvard Data Science Review (already linked/cited above) examined how large language models express confidence — and how that confidence relates to actual correctness. The authors tested multiple state-of-the-art models on causal reasoning tasks, logical fallacies, and statistical puzzles. They measured confidence in two ways: whether the model stuck to its answer when asked to reconsider, and what confidence score it reported when explicitly asked.
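In rough code terms, those two measurements look something like the sketch below. This is my own toy version, not the authors’ protocol; ask() is a hypothetical stand-in for whatever model call you would actually make, and the grading is deliberately crude.

```python
# Toy sketch of the two confidence measures. ask() is a hypothetical
# stand-in for a real model call; it is assumed to return the reply text.
def ask(prompt: str) -> str:
    raise NotImplementedError("plug in an actual model call here")

def measure(question: str, correct: str) -> dict:
    first = ask(question)

    # Measure 1: does the model stick to its answer when told to reconsider?
    second = ask(f"{question}\nYour earlier answer: {first}\n"
                 "Think again carefully, then give your final answer.")
    persisted = first.strip().lower() == second.strip().lower()

    # Measure 2: what confidence does it report when explicitly asked?
    stated = ask(f"{question}\nYour answer: {second}\n"
                 "How confident are you, from 0 to 100? Reply with a number only.")

    return {
        "correct": correct.lower() in second.lower(),  # crude grading
        "persisted": persisted,
        "stated_confidence": stated,
    }
```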
What they found should give anyone pause.
The models routinely expressed very high confidence — often 95–100% — even when their answers were wrong. Worse, when prompted to “think again carefully,” the models frequently changed correct answers to incorrect ones, while still maintaining high self-reported confidence. Prompting affected confidence dramatically; accuracy, much less so. The authors’ conclusion was blunt:
“[C]urrent LLMs do not have any internally coherent sense of confidence.”
— Pawitan & Holmes, Harvard Data Science Review, Confidence in the Reasoning of Large Language Models (2025)
This is not confabulation in the usual sense. It is something more subtle.
What these systems lack is not fluency, nor even problem-solving ability in constrained domains. What they lack is uncertainty awareness — any stable internal relationship between how sure they make themselves sound and how right they are.
This matters in AI generally.
A network’s level of certainty can be the difference between an autonomous vehicle determining that “it’s all clear to proceed through the intersection” and “it’s probably clear, so stop just in case.”
— Daniel Ackerman, MIT News Office, A neural network learns when it should not be trusted (November 20, 2020)
But the Harvard study showed that confidence — at least in the LLMs studied — is easily inflated by prompting. Persistence signals correctness in the aggregate, but not reliably as to any given statement. And self-reported confidence, when offered at all, is routinely and systematically overstated.
In other words: the confidence you hear is not evidence of understanding. It is an artifact of (sophistical) language.
This is where my Monte Carlo instinct matters.
If you ran the same explanation a thousand times — with small variations in phrasing, tone, or rhetorical framing — you would not get a stable distribution of correctness paired with confidence. You would get a shifting surface of plausibility, often smooth, often convincing, occasionally right, and frequently wrong in ways that only become visible after the fact.
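Here, roughly, is what that experiment would look like if anyone cared to run it. The sketch is hypothetical: ask_model() stands in for a real model call and is assumed to return an answer plus a stated confidence, and the reworded prompts are placeholders for systematic variation. The shape of the measurement is the point, not the code.

```python
import collections

# Hypothetical stand-in for a real model call; assumed to return
# (answer_text, stated_confidence_from_0_to_100).
def ask_model(prompt: str) -> tuple[str, int]:
    raise NotImplementedError("plug in an actual model call here")

question = "Does a strong correlation between X and Y show that X causes Y?"

# Small variations in phrasing, tone, or rhetorical framing.
framings = [
    "{q}",
    "Straight talk: {q}",
    "Bottom line, please: {q}",
    "I assume the answer is yes. {q}",
    "I assume the answer is no. {q}",
]

answers = collections.Counter()
confidences = []
for framing in framings:
    for _ in range(200):  # 1,000 runs in all
        answer, confidence = ask_model(framing.format(q=question))
        answers[answer.strip().lower()] += 1
        confidences.append(confidence)

# A system with a coherent sense of uncertainty would give a stable answer
# distribution, with stated confidence that tracks how often it is right.
print(answers.most_common(3))
print("mean stated confidence:", sum(confidences) / len(confidences))
```

I have not run this at a thousand samples. But given how easily stated confidence moved under prompting in the Pawitan and Holmes study, I would not expect either number to hold still.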
Which brings us back to the phrases that started this whole exchange.
“Bottom line.”
“Straight talk.”
“Let me respond cleanly.”
Those are not signs of rigor. Of getting to the truth. They are rhetorical flourishes. They are closures. They tell the reader that the work has already been done — whether or not it has. This is why I had originally considered calling this article The Sophists’ Revenge: Why LLMs are not “all that” and a bowl of baked beans.
When a system has no internally coherent sense of uncertainty, those closures are not just stylistic tics. They are active misdirection.
Authority Without Accountability
In courtrooms, authority has always been dangerous when it outruns constraint. (The Daily Journal has accepted for publication an article I co-wrote that talks about this in relation to a specific legal issue. I won’t spill all the beans here until the article comes out.)
Judges speak with authority because they are supposedly constrained by records, rules, reversibility, and appellate review. Too often — like their unthinking, unreasoning brethren, the LLMs — they only sound authoritative. As the LLMs sometimes opine without connection to reality, their enrobed cousins not infrequently opine without connection to the law.
Lawyers speak with authority because they are constrained — at least in theory — by evidence, ethical rules, and the possibility of sanction. Experts speak with authority because they are constrained by methodology, cross-examination, and the risk of professional embarrassment.
But as with judicial opinions, when those constraints weaken, rhetoric fills the gap.
Every criminal defense lawyer has seen it. The confident expert who speaks past uncertainty. The prosecutor who narrates motive on thin or non-existent evidence. The witness who tells a smooth story about events that were never clearly perceived — or did not even happen.
None of this is new. These are common courtroom pathologies.
What keeps them in check — at least in theory — are constraints: cross-examination, evidentiary rules, disclosure obligations, appellate review, and the risk of embarrassment, sanction, or reversal. Authority in the courtroom is supposed to be earned and maintained through exposure to those pressures.
Again, when those constraints weaken, rhetoric fills the gap.
That is where large language models become relevant — not because they introduce a new kind of persuasion, but because they reproduce an old one without friction. They speak confidently and fluently, in the voice of explanation, but with no record to correct, no credibility to lose, and no consequence for being wrong (at least to them).
The language they use is familiar to anyone who spends time in court. “Bottom line.” “To be clear.” “Let me respond cleanly.” These are frequently dishonest phrases when they substitute for justification rather than signal it. Lawyers and judges use them all the time. They say, “Don’t question me.” In the courtroom, it’s hubris and judicial incompetence, pure and simple. But at least with humans it’s subject to accountability.
With LLMs, it’s training on materials originally built by humans, materials that embody all our biases and erroneous reasoning, combined with “guardrails” that have taught them to address us confidently until (or unless) we push back. And even then the goal is to please us, whether by congratulating us for pushing back and then giving us another wrong answer, or by persuading us that they are right with more “bottom line” and “straight talk”.
Or both.
And it works for the same reason that the rhetorical methods taught by the Sophists worked. Because it taps directly into the way humans are trained to trust — while not being accountable at all.
Why This Matters in Court
What worries me, at bottom, is not that large language models can be wrong. Courts are already full of wrongness. What worries me is that they are wrong in a familiar voice — the voice of authority — without any of the friction that usually disciplines that voice.
In the courtroom, we have at least learned to be suspicious of confidence. Trained in an adversarial system, we probe it. We cross-examine those who assert it. We ask where it comes from, what it rests on, and what would make it change. Authority is supposed to be earned by surviving those pressures.
I’m often asked, “How can you defend guilty people?”
My answer is that I don’t. “Guilty” is a juridical declaration. I defend people who have not yet been found guilty, and I force my adversary — the prosecutor’s team, which too often includes the judge — to prove guilt.
If their story survives that process beyond a reasonable doubt, then I’ve defended the person, and even if we lost, the system has at least functioned as designed. But when the prosecution’s story survives without that testing — when it persists illicitly — that is when the voice of authority becomes dangerous.
Large language models short-circuit that process. They deliver conclusions without exposure, fluency without risk, closure without accountability. They sound like judges, experts, and seasoned advocates, but they carry none of the costs that make those roles meaningful. There is no record to correct, no reputation to lose, no sanction waiting if the answer turns out to be wrong.
There is no adversary.
That alone would be manageable if everyone treated LLM output as what it is: a draft, a suggestion, a starting point. The real danger arises when that language is laundered through institutions that already command trust — when it appears in reports, recommendations, summaries, or analyses that look indistinguishable from the materials courts rely on every day.
At that point, the problem is no longer technological. It’s rhetorical. People are trained to trust certain forms of speech. We respond to the cues. And when those cues are detached from constraint, persuasion fills the vacuum.
That is why the name matters. “Hallucination” makes this sound like an occasional malfunction. “Bullshit” suggests intent that isn’t there. What we are really dealing with is something simpler and more unsettling: confident narration without epistemic cost.
Every criminal defense lawyer knows how badly that can go.
The solution is not to ban the tools or pretend they don’t exist. It is to remember what earns authority in the first place — constraint, exposure, and the possibility of being wrong in public — and to be deeply suspicious of any voice that sounds certain without any of that.
Confidence is not evidence. Fluency is not understanding. And when a system tells you, calmly and cleanly, that it has reached the “bottom line”, that is precisely when it deserves the closest scrutiny.
That instinct — the one that made me uneasy about “straight talk” — is not paranoia. It’s professional judgment.
And in court, that should still matter.