As always, I’ll start the chapter with my opinion on the entire chapter:
Every rule begins as a sentence before it becomes code.
The following is the third part of Epistemic Testing.
Also read Chapter 2 and Chapter 1.
Turning Dialogue into Executable Proof
Recall the conversation from Chapter 1, where the Developer and the Domain Expert established a simple rule for the cart: adding an item increases the count. That was an executable understanding. Now let’s see how a new, more complex business rule becomes an additional layer of proof of work, in the form of a test or an experiment.
Every meaningful test begins (or should begin) long before a single line of code is written. It starts as a conversation: the place where ideas collide and the invisible parts of a system are brought into language. This is the invisible language of software. This is not small talk; this is where software design is born (or, again, should be). Dialogue is the cornerstone of understanding, and the test is its proof of work. Let’s sit in on one such conversation.
Domain Expert (PM):
Our shop has a promotion: buy three mugs, get one free.
Developer:
Okay, so if I add four mugs, the total price should be for three?
Domain Expert:
Exactly. The fourth one is free.
That seems clear enough until the developer asks the question every good tester eventually asks:
Developer:
And what happens if someone adds five mugs? Or six? Do they still get one free, or two?
The room pauses. Silence, that sacred moment when a fuzzy idea meets reality. That’s when discovery begins.
Domain Expert:
Oh, good question. The rule repeats: for every group of four mugs, one is free.
Developer:
So buying eight mugs means paying for six?
Domain Expert:
Yes. But only for mugs, not for other items!
Hooray! You’ve just discovered your first expectation: a behavior you believe the software should have. Before going any further, let’s turn that belief into a test. Remember from the previous chapter: a test is a judgment, a precise statement of truth we want the machine to eventually verify.
The Test: For every group of four tea mugs in the order, one should be free.

But hold on, we’re not done yet. A test without context is just a sentence. To give it meaning, we need experiments that make this rule measurable and verifiable. So switch hats: take off your discovery hat and put on your experiment hat.
Now, let’s start exploring possible experiments:
Developer:
If each mug costs $20 and the order contains 3 mugs, the total price should be 3 * $20 = $60.
If each mug costs $20 and the order contains 4 mugs, the total price should still be $60, because one mug is free.
Domain Expert:
Exactly!
Developer:
If each mug costs $20 and the order contains 5 mugs, the total should be 4 * $20 = $80.
And if the order has 7 mugs, that’s 6 * $20 = $120.
Domain Expert:
Yes, that’s right.
Developer:
Great. Let’s push it further. If each mug costs $20 and the order contains 8 mugs, should the total be $120 (two free mugs)?
Domain Expert:
Yes, that’s correct: two sets of four, so two free mugs.
Developer:
Perfect. One last check. If each mug costs $20 and a tray costs $30, and the order contains three mugs and one tray, then the total should be (3×20) + 30 = $90, right?
Domain Expert:
Yes, that’s exactly what I’d expect.
Developer:
Awesome. We have enough experiments for our test. Each of these examples confirms and clarifies the rule.
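The experiments can be collected into a small table of inputs and expected totals. Here is a minimal sketch in plain JavaScript, using a hypothetical helper named `mugPromoTotal`, that writes the rule as arithmetic: with n mugs, Math.floor(n / 4) of them are free, so the customer pays for n - Math.floor(n / 4). The prices ($20 per mug, $30 per tray) come from the dialogue; adding the tray at full price on top is an assumption that matches the last experiment.

```javascript
// Hypothetical helper: the "buy three, get one free" rule as plain arithmetic.
// For every complete group of four mugs, one mug is free.
function mugPromoTotal(mugCount, unitPrice) {
  const freeMugs = Math.floor(mugCount / 4);
  return (mugCount - freeMugs) * unitPrice;
}

// The experiments from the dialogue as data: [mugs, trays, expected total].
// Assumes mugs cost $20 and trays cost $30, as in the conversation.
const MUG_PRICE = 20;
const TRAY_PRICE = 30;
const experiments = [
  [3, 0, 60],
  [4, 0, 60],
  [5, 0, 80],
  [7, 0, 120],
  [8, 0, 120],
  [3, 1, 90],
];

for (const [mugs, trays, expected] of experiments) {
  // Trays never take part in the promotion, so they are added at full price.
  const actual = mugPromoTotal(mugs, MUG_PRICE) + trays * TRAY_PRICE;
  const verdict = actual === expected ? 'ok' : 'MISMATCH';
  console.log(`${mugs} mugs + ${trays} tray(s): expected $${expected}, got $${actual} (${verdict})`);
}
```

Laying the experiments out as data also makes gaps easy to spot: there is no row yet for zero mugs, or for mugs sold at different prices.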

This short dialogue has now produced a shared specification by example, a conversation that has become a structured, measurable truth.
The dialogue found the rule. The experiment made it measurable. The code makes it enduring. Each phase plays a different role: discovery gives meaning, experimentation gives clarity, and automation gives permanence.
Now that you’ve gathered enough experiments with your Domain Expert, it’s time to put on your automation hat. This is where conversation turns into code, where understanding becomes executable truth: a version of the conversation the computer understands.
Each experiment we explored becomes an executable example, written not for humans to remember, but for machines to verify again and again. Here’s what our shared understanding looks like, encoded as proof of work:
// Specification by example, encoded as proof of work.
// Jest-style tests; Order is the class under test and is assumed to expose
// addItem({ name, price }) and total().
describe('Buy 3, Get 1 Free promotion', () => {
  test('three mugs cost full price (no discount)', () => {
    const order = new Order();
    const price = 20;
    for (let i = 0; i < 3; i++) order.addItem({ name: 'Tea Mug', price });
    expect(order.total()).toBe(60);
  });
  test('four mugs cost the price of three', () => {
    const order = new Order();
    const price = 20;
    for (let i = 0; i < 4; i++) order.addItem({ name: 'Tea Mug', price });
    expect(order.total()).toBe(60);
  });
  test('five mugs cost the price of four (discount applies once)', () => {
    const order = new Order();
    const price = 20;
    for (let i = 0; i < 5; i++) order.addItem({ name: 'Tea Mug', price });
    expect(order.total()).toBe(80);
  });
  test('seven mugs cost the price of six (no second free mug yet)', () => {
    const order = new Order();
    const price = 20;
    for (let i = 0; i < 7; i++) order.addItem({ name: 'Tea Mug', price });
    expect(order.total()).toBe(120);
  });
  test('eight mugs cost the price of six (two free mugs)', () => {
    const order = new Order();
    const price = 20;
    for (let i = 0; i < 8; i++) order.addItem({ name: 'Tea Mug', price });
    expect(order.total()).toBe(120);
  });
  test('promotion applies only to mugs, not to other items', () => {
    const order = new Order();
    const price = 20;
    // Three mugs (eligible for the promo) and one tray (not eligible).
    // If the tray counted toward a group of four, one item would be free.
    for (let i = 0; i < 3; i++) order.addItem({ name: 'Tea Mug', price });
    order.addItem({ name: 'Tray', price: 30 });
    expect(order.total()).toBe(90);
  });
});
Each test() block is not just a check; it’s a replayable moment of understanding. The story you uncovered through conversation now lives inside the codebase as a self-executing narrative of truth. The human dialogue discovered the rule. The experiments formalized it. And the script now preserves it: faithfully, permanently, and in a form the machine can verify.
Each example is a sentence in the language of understanding. Together, they form a miniature living specification. If a new developer joins tomorrow, these examples explain the business rule better than any document or meeting ever could. They don’t describe behavior; they demonstrate it. And because they’re executable, the computer becomes our partner in preserving that understanding.
Wait, wait. Read the last paragraph again, very carefully, and add a “may” to every sentence. Those are not facts: the tests can deliver on those claims only under certain conditions, which I’ll discuss in the next chapters. I promise, so please promise to keep reading with me!
That’s the essence of Specification by Example:
- Start with a dialogue to surface assumptions.
- Turn examples into executable tests.
- Let those tests shape and verify the implementation.
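One way to let the tests shape the implementation is to write the smallest `Order` that honours the rule agreed in the dialogue. The sketch below is hypothetical: it assumes items are plain `{ name, price }` objects, that a mug is recognised by the name 'Tea Mug', and that all mugs share one price. A real catalogue would more likely identify eligible products by ID or category.

```javascript
// A minimal sketch of the hypothetical Order class the tests assume.
// Assumption: an item is a { name, price } object and mugs are named 'Tea Mug'.
const PROMO_ITEM = 'Tea Mug';
const GROUP_SIZE = 4; // buy three, get the fourth free

class Order {
  constructor() {
    this.items = [];
  }

  addItem(item) {
    this.items.push(item);
  }

  total() {
    // Full price of everything in the order.
    const gross = this.items.reduce((sum, item) => sum + item.price, 0);

    // Only mugs take part in the promotion.
    const mugs = this.items.filter((item) => item.name === PROMO_ITEM);
    const freeMugs = Math.floor(mugs.length / GROUP_SIZE);

    // Assumption: all mugs share one price, so any mug can be the free one.
    const discount = freeMugs > 0 ? freeMugs * mugs[0].price : 0;
    return gross - discount;
  }
}

// Usage: eight mugs at $20 and one $30 tray.
const order = new Order();
for (let i = 0; i < 8; i++) order.addItem({ name: 'Tea Mug', price: 20 });
order.addItem({ name: 'Tray', price: 30 });
console.log(order.total()); // 150
```

Keeping the promotion inside `total()` is only one design choice; the discount could just as well live in a separate pricing rule that the order consults.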
Through this process, tests emerge from understanding, not after it. They become a language of truth between people and machines. Listen to this magic dialogue between the Developer and the Domain Expert, and write it down in your notebook:
Developer (reflecting): So the rule didn’t exist in the code until it existed in our words.
Domain Expert: Right, and the test is what keeps our words from drifting away.
That’s the deeper meaning of proof of work. A test is not just a script that checks behavior; it’s the crystallization of dialogue, a living memory of a moment when understanding became clear enough to be verified.
Language-Driven Design
The heart of software design isn’t the architecture diagram; it’s the words we use when we talk about it. Words like available, confirmed, reserved, or pending form the earliest version of our model. They sound innocent, but when two people use the same word differently, the system that emerges from their shared misunderstanding begins to rot before it’s even deployed.
Tests, in this sense, are not code artifacts; they’re instruments of discovery. They are how we make meaning concrete. A test begins in language, in dialogue, long before it becomes a line of code. It’s born in conversation, refined in thought, and only later formalized in automation. The conversation is the crucible where ambiguity burns away.
Example: The Booking Conversation
Imagine a Domain Expert and a Developer sitting together to refine a rule in a hotel booking system.
Domain Expert: A room can’t be double-booked.
Developer: Meaning if someone has already confirmed a booking, nobody else can reserve it for the same dates?
Domain Expert: Exactly.
Developer: Okay, but what about someone holding the room without confirming yet, say, in their cart?
Domain Expert: Hmm. That’s just a hold. It expires after fifteen minutes.
Developer: So two people can hold the same room, but only one can confirm it first?
Domain Expert: Yes, the moment it’s confirmed, all other holds are invalid.
Developer: How many people can hold the same room?
That short exchange doesn’t sound technical. But look closer: it produced at least three distinct rules, each with its own test.
- A confirmed booking prevents new confirmations for the same date.
- A held room expires after fifteen minutes.
- A confirmation cancels all existing holds on that room.
None of these were visible until the dialogue exposed them. This is testing as exploration, the art of discovering hidden rules through conversation.
The tests already exist conceptually. You could write them on sticky notes, index cards, or the whiteboard:
- Cannot confirm a room that’s already confirmed.
- Hold expires after fifteen minutes.
- Confirming a booking invalidates all other holds.
At this stage, the test is not about syntax or code. It’s about meaning. These cards are thought experiments, small proofs of understanding, still written in human language.
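Once the cards are agreed, they can be read as behavior. Here is one hypothetical way to hold all three in code for a single room on a single date; `RoomSlot`, `hold`, `activeHolds`, `confirm`, and the millisecond timestamps passed in as `now` are illustrative assumptions, not an existing API. Passing the time in explicitly, instead of reading the system clock, keeps the fifteen-minute rule easy to test. The sketch places no limit on concurrent holds, because the dialogue leaves that question open.

```javascript
// Hypothetical model of one room on one date, encoding the three cards.
// Times are plain millisecond numbers supplied by the caller.
const HOLD_DURATION_MS = 15 * 60 * 1000; // card 2: holds last fifteen minutes

class RoomSlot {
  constructor() {
    this.confirmedBy = null; // guest who confirmed the slot, if anyone
    this.holds = new Map(); // guest -> time the hold was placed
  }

  // Holds are not exclusive, so several guests may hold the slot at once.
  // Assumption: no new holds are accepted once the slot is confirmed.
  hold(guest, now) {
    if (this.confirmedBy !== null) throw new Error('Room unavailable');
    this.holds.set(guest, now);
  }

  // Assumption: a hold placed at time t is gone at exactly t + 15 minutes.
  activeHolds(now) {
    return [...this.holds]
      .filter(([, placedAt]) => now - placedAt < HOLD_DURATION_MS)
      .map(([guest]) => guest);
  }

  // Card 1: a confirmed slot cannot be confirmed again.
  // Card 3: confirming invalidates every existing hold.
  confirm(guest) {
    if (this.confirmedBy !== null) throw new Error('Room unavailable');
    this.confirmedBy = guest;
    this.holds.clear();
  }
}

// Usage: Alice's hold has expired by the fifteen-minute mark; Bob's has not.
const slot = new RoomSlot();
slot.hold('Alice', 0);
slot.hold('Bob', 60 * 1000); // one minute later
console.log(slot.activeHolds(15 * 60 * 1000)); // [ 'Bob' ]
slot.confirm('Bob');
console.log(slot.activeHolds(15 * 60 * 1000)); // []
```

Note how the code forces a decision the cards left implicit: whether a hold is still valid at exactly fifteen minutes. That is a question to take back to the Domain Expert.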
Three Hats: Discovering, Shaping, and Automating
Testing, when seen epistemically, isn’t a single activity. It’s a dialogue that moves through three distinct modes of understanding: discovering, shaping, and automating. Each mode has its own rhythm, tools, and language. You might call them hats, but they are really states of mind, different relationships between you, the domain, and the machine.
1. The Discoverer’s Hat, Exploration: Testing the Idea, Not the Code
When you wear the Discoverer’s Hat, you’re not verifying behavior yet; you’re probing meaning. You sit beside the Domain Expert, not your keyboard.
Your questions sound like:
What happens if the guest cancels after check-in?
Can a premium user book two rooms at once?
What’s the earliest date a reservation can start?
Here, you’re testing the concept, not the system. You gather stories, edge cases, and contradictions. You play with the domain’s boundaries. Artifacts from this phase are rough sketches of truth: sticky notes, user stories, “What if…” scenarios. You might express them in natural language:
We should not allow two users to confirm the same room for the same date.
At this stage, ambiguity is welcome; it’s how you find the edges of understanding. The goal is breadth, not correctness.
2. The Shaper’s Hat, Formalization: Turning Conversation into Measurable Examples
Now you switch to the Shaper’s Hat. This is where meaning becomes measurable: still human-readable, but structured and unambiguous. You’re no longer discovering new truths; you’re shaping existing ones into examples that everyone can agree on.
You and the domain expert collaborate to answer questions like:
What does success look like?
How would we know this rule holds?
Can we express that with specific numbers, names, and outcomes?
In this mode, the language becomes tighter: not yet code, but concrete. For example, your discovery statement, “We should not allow two users to confirm the same room for the same date,” becomes:
| Given | When | Then |
|---|---|---|
| Room101 confirmed by Alice for 1 Nov 2025 | Bob tries to confirm Room101 for 1 Nov 2025 | Booking is rejected with “Room unavailable” |
3. The Engineer’s Hat, Automation: Encoding Proof into the Machine
Finally, you put on the Engineer’s Hat. You’ve explored the domain and agreed on measurable examples, now you transform them into executable proof. Each example becomes a living test, translated into your language and framework:
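// Jest-style test; assumes a Hotel class whose confirm(room, date, guest)
// throws an Error with the message 'Room unavailable' on a double booking.
test('prevents double booking for same room and date', () => {
  const hotel = new Hotel();
  const date = '2025-11-01';
  hotel.confirm('Room101', date, 'Alice');
  expect(() => hotel.confirm('Room101', date, 'Bob')).toThrow('Room unavailable');
});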
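The Jest test above assumes a `Hotel` class whose `confirm(room, date, guest)` method throws when the room is already confirmed for that date. Here is a minimal sketch of that hypothetical class: the method name and error message come from the test, while the internals (a `Map` keyed by room and date) are an assumption. Keying on both values keeps the same room bookable on a different date.

```javascript
// Minimal sketch of the hypothetical Hotel used by the Jest test.
class Hotel {
  constructor() {
    // "room|date" -> guest who confirmed it
    this.confirmations = new Map();
  }

  confirm(room, date, guest) {
    const key = `${room}|${date}`;
    if (this.confirmations.has(key)) {
      throw new Error('Room unavailable');
    }
    this.confirmations.set(key, guest);
  }
}

// Usage: Bob is rejected for 1 Nov but can still take 2 Nov.
const hotel = new Hotel();
hotel.confirm('Room101', '2025-11-01', 'Alice');
try {
  hotel.confirm('Room101', '2025-11-01', 'Bob');
} catch (err) {
  console.log(err.message); // Room unavailable
}
hotel.confirm('Room101', '2025-11-02', 'Bob'); // no error: different date
```

Jest’s `toThrow('Room unavailable')` checks that the thrown error’s message contains that text, so the message itself becomes part of the agreement.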
This is the machine-readable version of your shared truth. You’re no longer negotiating meaning; you’re preserving it. When these automated tests run, the computer continually replays the agreements between human understanding and system behavior, confirming, again and again, that what you believe is still true. Automation doesn’t create knowledge; it ensures knowledge remains valid.
Why These Modes Matter
Each of the three modes (discovering, shaping, and automating) represents a different way of knowing, a different state in the life of understanding.
Each of these modes depends on the others. Discovery without shaping is storytelling without structure. Shaping without automation is clarity without endurance. Automation without discovery is just busy machinery repeating an unexamined belief.
That’s why the order matters. Discover with openness: stay curious about what could be true. Shape with precision: make that truth explicit and measurable. Automate with discipline: ensure that truth remains intact over time.
When you mix them up, when you start scripting before formalizing, or formalizing before you’ve really discovered, ambiguity seeps into your codebase like fog. But when you move through them deliberately, you create a continuous chain of trust, a smooth progression from conversation to code, from belief to measurable proof.
The beauty of epistemic testing lies in this progression. It turns human conversation into living verification.
- Exploration finds meaning.
- Formalization defines it.
- Automation preserves it.
Each step narrows uncertainty until what was once a story becomes a self-verifying artifact of truth. The conversation finds the rule. The formalization measures it. The code preserves it. And that is how ideas survive contact with reality.
Exploration as Continuous Practice
Exploration doesn’t stop once the code exists. Every change, every bug report, every new idea is a new conversation. The best teams don’t rush from requirement to code; they stay in dialogue long enough to discover the invisible edges of meaning.
When a new scenario appears, say, a customer booking multiple rooms at once, the team can simply return to their examples:
Developer: So if I book two rooms together, do they share the same hold timer?
Domain Expert: Yes, they’re part of the same reservation, one timer.
Developer: Then confirming one should confirm both?
Domain Expert: Yes, the reservation is atomic.
Another card. Another rule. Another test waiting to be scripted.
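The new card adds two ideas: rooms booked together share one hold timer, and confirming the reservation confirms all of its rooms or none of them. Here is a hypothetical sketch; `Reservation`, its `confirm` method, the 'Hold expired' message, and the `confirmedRooms` set standing in for the hotel’s state are all illustrative assumptions. The confirmation is atomic here because every room is checked before any state changes.

```javascript
// Hypothetical sketch: several rooms held together as one reservation.
// Assumption: the shared timer uses the same fifteen-minute hold as a single room.
const HOLD_DURATION_MS = 15 * 60 * 1000;

class Reservation {
  constructor(rooms, heldAt) {
    this.rooms = rooms;
    this.expiresAt = heldAt + HOLD_DURATION_MS; // one timer for every room
    this.confirmed = false;
  }

  isExpired(now) {
    return now >= this.expiresAt;
  }

  // Atomic confirmation: check every room first, change state only if all pass.
  // `confirmedRooms` is a Set of rooms already confirmed for the same dates.
  confirm(confirmedRooms, now) {
    if (this.isExpired(now)) throw new Error('Hold expired');
    const taken = this.rooms.filter((room) => confirmedRooms.has(room));
    if (taken.length > 0) {
      throw new Error(`Room unavailable: ${taken.join(', ')}`);
    }
    for (const room of this.rooms) confirmedRooms.add(room);
    this.confirmed = true;
  }
}

// Usage: Room102 is already taken, so Room101 is not confirmed either.
const confirmedRooms = new Set(['Room102']);
const reservation = new Reservation(['Room101', 'Room102'], 0);
try {
  reservation.confirm(confirmedRooms, 60 * 1000);
} catch (err) {
  console.log(err.message); // Room unavailable: Room102
}
console.log(confirmedRooms.has('Room101')); // false
```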
Practice in Action
You’re joining a team that maintains a subscription-based music platform. During a conversation, the Domain Expert says:
When a user cancels their subscription, they should still have access until the end of the billing period.
The Developer asks:
“Okay, so if someone cancels on the 10th, but their billing cycle ends on the 30th, they can still stream music until then?”
“Yes,” the expert replies.
“And if they reactivate before the 30th?”
“Then their access continues; no interruption.”
“What if they reactivate after the 30th?”
“Then they start a new cycle from that day.”
Now it’s your turn.
Step 1: Wear the Discoverer’s Hat
From this dialogue, extract the core beliefs (the rules that seem true). Write them down as plain sentences, no numbers, no code, just meaning.
Ask yourself:
- What are we really saying about time and access?
- Which edge cases were left fuzzy?
- What assumptions are being made about billing periods?
(Tip: don’t solve, uncover.)
Step 2: Switch to the Shaper’s Hat
Now, take those sentences and turn them into measurable examples. Express at least three concrete experiments using Given / When / Then style.
For example:
- Given a monthly subscription starting Oct 1 …
- When the user cancels on Oct 10 …
- Then access remains valid until Oct 30.
Challenge yourself:
- What happens if the user cancels on the same day as renewal?
- What if the system clock is in a different timezone?
- Does “access” include both streaming and downloads?
Your goal here isn’t correctness; it’s measurability. Each example should be concrete enough that two people could agree on what pass or fail means.
Step 3: Finally, Wear the Engineer’s Hat
Choose one of your formalized examples and turn it into an automated test in your favorite language or framework.
Keep it simple, but make sure it encodes the logic clearly: Arrange, Act, Assert.
Think carefully before writing:
- What exactly are you asserting?
- Are you testing a belief, or just replaying behavior?
When your test passes, ask yourself: Is this a test of code, or a test of understanding?
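For illustration only, here is one possible shape for Step 3, written in plain Node rather than a test framework, and certainly not the only answer. It encodes a single example from the dialogue: a user cancels on the 10th and keeps access until the billing period ends on the 30th. The `Subscription` class, the `check` helper, and the year 2025 are hypothetical. Dates are 'YYYY-MM-DD' strings, which compare correctly as plain text and sidestep the timezone question (a simplification, not an answer to it).

```javascript
// Hypothetical subscription model, just enough for one example.
// Dates are 'YYYY-MM-DD' strings, which sort correctly as plain text.
class Subscription {
  constructor(periodEnd) {
    this.periodEnd = periodEnd; // last day of the current billing period
    this.cancelledOn = null;
  }

  cancel(date) {
    this.cancelledOn = date;
  }

  // Assumptions: active subscriptions renew, and access includes the last day.
  hasAccess(date) {
    if (this.cancelledOn === null) return true;
    return date <= this.periodEnd;
  }
}

// A tiny stand-in for a test runner's test() function.
function check(name, fn) {
  try {
    fn();
    console.log(`pass: ${name}`);
  } catch (err) {
    console.log(`FAIL: ${name} (${err.message})`);
  }
}

check('cancelling keeps access until the end of the billing period', () => {
  // Arrange: a billing period that ends on Oct 30.
  const subscription = new Subscription('2025-10-30');

  // Act: the user cancels on Oct 10.
  subscription.cancel('2025-10-10');

  // Assert: access on the 30th, none on the day after.
  if (!subscription.hasAccess('2025-10-30')) throw new Error('expected access on Oct 30');
  if (subscription.hasAccess('2025-10-31')) throw new Error('expected no access on Oct 31');
});
```

Notice that writing the assertion forced a choice the dialogue never made: whether “until the 30th” includes the 30th. That is exactly the kind of belief Step 3 asks you to examine.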
To Wrap Up
- Every test begins as a sentence. The quality of your code-level tests depends on the quality of your conversations.
- Discovery => Shaping => Automation isn’t a workflow. It’s a chain of meaning. Each mode transforms understanding into a different kind of artifact.
- Discovery reveals beliefs. Shaping defines evidence. Automation preserves truth.
- Rushing to automation before discovery produces fragile scripts with shallow meaning.
- The most powerful tests are those that both humans and machines can read and agree upon.
The true proof of testing maturity isn’t coverage but clarity: how clearly your code replays what your team once understood to be true.
Remember: A good test doesn’t just check correctness. It captures understanding and ensures that understanding can never silently drift away.
In the next chapter, we’ll build directly on this idea of executable agreement to explore how writing tests becomes the primary language for dialogue between different team members, moving beyond a technical check and into a powerful design medium.