The Voice Spec Template: A Fill-In Framework for AI Writers

Q: What is the minimum viable voice spec?

Three sections: required actions, banned phrases, and cadence anchors. A spec built on those three produces more consistent output than one with six writing priors and no example sentences. Required actions are pass/fail conditions a reviewer can check. Banned phrases close the register shortcuts the model reaches for automatically. Cadence anchors are three example sentences (opener, mid-article, closer) that the model uses to match rhythm.

A voice spec, the structured document that tells an AI writing pipeline who to sound like and what moves to make in every article, has seven distinct components. Most teams that build one fill in two or three. The output from a two-component spec is the model guessing at the voice. The output from a fully populated one is the voice.

This template covers all seven: what each component does, a filled-in example from the Senior Practitioner persona, and a blank version you can copy into your own document. The methodology behind each component, why this format works and what earlier versions got wrong, is covered in Stop Calling It Tone. This post is the artifact: the format, field by field, ready to populate.

The blank template

Copy the block below into a document. Instructions for each section are in the field-by-field breakdown that follows.

VOICE SPEC: [Persona Name]

WHO IS THIS WRITER?
[2–3 sentences: background, disposition, and when this voice applies]

WRITING PRIORS  (3–5 items: high-level stances this voice holds or avoids)
1.
2.
3.

REQUIRED ACTIONS  (every article must perform these; write each as a pass/fail condition)
1.
2.
3.

SENTENCE RHYTHM
[One sentence: dominant length, how paragraphs open, where fragments appear]

BANNED PHRASES  (register tells to block: phrases that undermine this specific voice)
-
-
-

CADENCE ANCHORS  (three example sentences; match the rhythm, not the topic)
Opening:
Mid-article:
Closing:

The three sections with the most direct impact on output are Required Actions, Banned Phrases, and Cadence Anchors. A spec built on just those three produces more consistent output than one with six Writing Priors and nothing else. Fill those three completely before touching the others.
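One way to keep that priority visible is to hold the spec as structured data and check the three core sections before anything else. The sketch below is an assumption about representation, not part of the template: the `VoiceSpec` class, its field names, and `missing_core_sections` are hypothetical and simply mirror the template headings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceSpec:
    """Hypothetical container whose fields mirror the template headings."""

    persona_name: str
    who: str = ""
    writing_priors: list[str] = field(default_factory=list)
    required_actions: list[str] = field(default_factory=list)
    sentence_rhythm: str = ""
    banned_phrases: list[str] = field(default_factory=list)
    cadence_anchors: dict[str, str] = field(default_factory=dict)

    def missing_core_sections(self) -> list[str]:
        """Return the high-impact sections that are still unfilled."""
        missing = []
        if not self.required_actions:
            missing.append("REQUIRED ACTIONS")
        if not self.banned_phrases:
            missing.append("BANNED PHRASES")
        # All three anchor positions need a non-empty example sentence.
        positions = ("Opening", "Mid-article", "Closing")
        if not all(self.cadence_anchors.get(p) for p in positions):
            missing.append("CADENCE ANCHORS")
        return missing


# A draft with only priors filled in still lacks all three core sections.
draft = VoiceSpec(
    persona_name="Senior Practitioner",
    writing_priors=["Speak from experience, not from research"],
)
print(draft.missing_core_sections())
```

A spec with six priors and nothing else would report all three core sections as missing, which is the situation the paragraph above warns against.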

Who is this writer?

This section does two things: it gives the persona a name (a short label used to select this spec from your library, which never appears in the generated text itself) and describes who the writer is. The name is administrative. The description is not: the model reads it directly.

Write the description as a casting note, not a job posting. A casting note answers three questions in two or three sentences: who is this writer, what is their background and disposition, and when does this voice apply? The "when does this voice apply" clause matters more than most teams expect. Without it, the model has no basis for deciding when to prioritize this persona over other guidance in the prompt.

Take the Senior Practitioner casting note: "10+ years operating in the field. Opinionated, evidence-backed, scarred. Default voice when the topic rewards lived experience over instruction." Tenure and disposition in the first two sentences. Application scope in the third.

In practice: a description that reads "A confident expert writer" is a job posting. It describes an attribute. A casting note describes a context. The model performs the character; the description tells it when to walk on stage.

Write this section as a casting note: who, disposition, and when.

Writing priors

Writing priors (the high-level stances and register rules this voice consistently holds or avoids) are guardrails, not instructions. They work by exclusion: they narrow the model's trained defaults without specifying what to do instead.

The three writing priors for the Senior Practitioner persona:

"Speak from experience, not from research"
"Confident, willing to disagree with consensus"
"Domain-specific terms used precisely, not for show"

These stop the model from reaching for its default hedges: "opinions vary," "it depends," "many experts believe." They do not produce Senior Practitioner voice on their own. That is the job of Required Actions.

Concretely: the difference between writing priors and required actions is the difference between a disposition and a behavior. A prior says "speak from experience." A required action says "open at least one paragraph per section with a first-person experience sentence." One is an attitude the model can satisfy with a vague statement. The other is a condition a reviewer can check.

Writing priors exclude the wrong register. Required actions produce the right one.

Required actions

Required actions (concrete rhetorical moves the article must perform, one per item, written as a pass/fail condition) are where most of the voice control lives. They are the section that separates a functioning spec from a decorative one.

The test for a required action: can a reviewer confirm it happened in under ten seconds? "Sound conversational" fails the test. "Open at least one paragraph per section with a first-person experience sentence" passes. Either it happened or it did not.

The four required actions for the Senior Practitioner persona:

Open at least one paragraph per section with first-person experience: "I've seen…", "When I worked on…", "In [year] we…"
Name a specific entity in each section: a tool, brand, person, place, or year. Never anonymous filler ("a company," "some teams"). Always the actual name.
Call out one mistake-pattern per article: "The mistake most operators make is…" / "Where this fails is…" / "I've seen this break when…"
Include one hyper-specific tangent per article: 2–3 sentences detouring to a named client and year before returning to the main thread.

Each one is a binary condition. A review pass can check every item in under a minute.

To write a required action, start with a verb phrase that specifies the move ("Open…", "Name…", "Call out…", "Include…"), then describe the exact form. If a phrase like "at least once" or "per section" does not appear naturally, the action may not be specific enough to test.

Write three to five required actions per persona. Fewer than three means the voice is under-specified. More than six usually means some entries are attitudes rather than testable conditions; those belong in Writing Priors instead.

Required actions are the definition of done for voice. Write them so a reviewer can confirm each one without your help.
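Some required actions are mechanical enough to pre-check in code before a human review. The sketch below covers two of the Senior Practitioner actions under loud assumptions: paragraphs are separated by blank lines, the only accepted surface forms are the ones quoted in the list above, and the function names are hypothetical. It will miss valid variations, so treat it as a first filter rather than a replacement for the reviewer.

```python
import re

# Surface forms copied from the first required action; ['’] accepts straight
# and curly apostrophes. Any other first-person opener will not match.
FIRST_PERSON_OPENER = re.compile(r"^(I['’]ve seen|When I worked on|In \d{4},? we)\b")

# Surface forms copied from the mistake-pattern required action.
MISTAKE_PATTERN = re.compile(
    r"The mistake most operators make is|Where this fails is|I['’]ve seen this break when"
)


def opens_with_first_person(section: str) -> bool:
    """Pass if at least one paragraph in the section starts with a listed opener."""
    paragraphs = [p.strip() for p in section.split("\n\n") if p.strip()]
    return any(FIRST_PERSON_OPENER.match(p) for p in paragraphs)


def calls_out_mistake(article: str) -> bool:
    """Pass if the article contains at least one listed mistake-pattern phrase."""
    return MISTAKE_PATTERN.search(article) is not None


section_pass = (
    "I've seen this handoff fail at three agencies.\n\n"
    "The mistake most operators make is skipping the review."
)
section_fail = "Handoffs are important.\n\nTeams should plan carefully."

print(opens_with_first_person(section_pass), calls_out_mistake(section_pass))
print(opens_with_first_person(section_fail), calls_out_mistake(section_fail))
```

The "name a specific entity" action is deliberately left out: deciding whether a name is real or filler is a judgment a pattern match cannot make reliably.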

Sentence rhythm

The sentence rhythm line (a single sentence describing dominant structure: length, how paragraphs open, and where fragments appear) labels the cadence pattern that the Cadence Anchors section demonstrates. One sentence only. The model reads the rhythm description alongside the example sentences and uses both together to predict the pattern before generating.

The Senior Practitioner sentence rhythm: "Medium-length declaratives with occasional sharp fragments for emphasis. Paragraphs open with subjects or first-person ('I've…', 'Most teams…'), rarely with transitional adverbs."

A wrong rhythm description is "Confident and direct." That describes attitude, not rhythm. A correct one names sentence length, how paragraphs open, and where short structures appear.

If you struggle to write the rhythm description, write the Cadence Anchors first, then derive the description from the three examples. The pattern becomes visible when you have three example sentences to read side by side.

The rhythm line names the cadence. Cadence Anchors demonstrate it. Write whichever comes more easily first.

Banned phrases

Banned phrases are exact strings the model must not produce, paraphrase, or replace with synonyms. They work because the model's trained defaults lean toward the vocabulary you are trying to avoid. Blocking a specific phrase closes a path the model reaches for automatically.

A positive instruction like "use concrete, specific language" tells the model to do something it already does in a generic way. A ban defines a boundary.

Six banned phrases for the Senior Practitioner persona:

"best practice"
"industry standard"
"as a beginner you might think"
"studies show that"
"many experts believe"
"in this article we'll explore"

These are differentiating bans: phrases that undermine Senior Practitioner register specifically. Generic AI-register tells ("leveraging," "delve into," "it's important to note") are handled separately by a pipeline-level filter that runs across all personas. You do not need to repeat those in the per-persona ban list.

Consider what each ban forces. "In this article we'll explore" forces an immediate concrete lead instead of a content-announcement opener. "Many experts believe" forces an opinionated position instead of a hedged consensus take. Each ban steers the model toward the voice's correct register by closing one exit that leads away from it.

Four to eight per-persona bans is enough. A list of forty is usually two or three patterns written in many different formulations. If your ban list is that long, collapse it: name the register pattern and block the three or four most common surface forms of it.

Banned phrases close the shortcut paths the model reaches for in this specific voice. They do not need to cover every possible wrong word.
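Because bans are exact strings, the review-side check can be a plain substring scan. The sketch below is a minimal version with a hypothetical function name. It assumes case-insensitive matching and straightens curly apostrophes before comparing. It catches only the literal phrase, so paraphrases and synonyms still need a human read, and a substring match on "best practice" will also flag "best practices".

```python
from __future__ import annotations

# The six per-persona bans listed above for the Senior Practitioner.
BANNED_PHRASES = [
    "best practice",
    "industry standard",
    "as a beginner you might think",
    "studies show that",
    "many experts believe",
    "in this article we'll explore",
]


def find_banned_phrases(draft: str, banned: list[str]) -> list[str]:
    """Return the banned phrases that appear in the draft, in ban-list order."""
    # Lowercase and straighten curly apostrophes so "we’ll" matches "we'll".
    normalized = draft.lower().replace("\u2019", "'")
    return [phrase for phrase in banned if phrase.lower() in normalized]


draft = "In this article we'll explore why studies show that templates fail."
print(find_banned_phrases(draft, BANNED_PHRASES))
```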

Cadence anchors

Cadence anchors (three example sentences, one for the article opening, one for the mid-article body, one for the closing, used to pattern-match rhythm rather than topic) are the few-shot mechanism for sentence shape. The model reads these as demonstrations. It matches clause length, where the period falls, and how fragments are deployed. It does not copy the topic.

Three cadence anchors for the Senior Practitioner persona:

Opening: "I've watched six different teams ship the same kind of process. Three failed at the same handoff."
Mid-article: "The data exists. Most people never query past the dashboard view."
Closing: "The fix isn't a better template. It's killing the template entirely."

Each anchor has a different shape. The opener establishes presence with a concrete number. The mid-article pairs a short declarative with a gap callout. The closer pivots and inverts. Three shapes give the model a range to modulate across an article; a single anchor produces a single rhythm that repeats in every section.

Take these three sentences and read them aloud in sequence. If they sound like the same person in three different gears, they are right. If they sound like three different people, the voice has not settled yet, which is a useful signal to catch before you run the spec on a real article.

Write three distinct rhythmic shapes. Not three restatements of the same sentence.
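A rough way to confirm the three anchors are not restatements of one shape is to compare words per sentence. The sketch below is a crude proxy, and an assumption rather than part of the method: word counts say nothing about whether the anchors sound like the same person, so reading them aloud remains the real test. The function name and the sentence-splitting rule are hypothetical.

```python
from __future__ import annotations

import re

# The three Senior Practitioner anchors from the list above.
ANCHORS = {
    "Opening": "I've watched six different teams ship the same kind of process. "
               "Three failed at the same handoff.",
    "Mid-article": "The data exists. Most people never query past the dashboard view.",
    "Closing": "The fix isn't a better template. It's killing the template entirely.",
}


def sentence_lengths(text: str) -> list[int]:
    """Return the word count of each sentence, splitting after ., ! or ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


shapes = {position: sentence_lengths(text) for position, text in ANCHORS.items()}
for position, lengths in shapes.items():
    print(position, lengths)

# Flag the spec if two anchors share the exact same length pattern.
distinct = len({tuple(lengths) for lengths in shapes.values()}) == len(shapes)
print("distinct shapes:", distinct)
```

For these anchors the pattern runs long-then-short, short-then-long, and two near-equal short sentences, which lines up with the three shapes described above.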

The filled-in Senior Practitioner spec

Here is the complete Senior Practitioner specification using the template format above. This is what a fully populated spec looks like before it goes into a generation pipeline:

VOICE SPEC: Senior Practitioner

WHO IS THIS WRITER?
10+ years operating in the field. Opinionated, evidence-backed, scarred.
Default voice when the topic rewards lived experience over instruction.

WRITING PRIORS
1. Speak from experience, not from research
2. Confident, willing to disagree with consensus
3. Domain-specific terms used precisely, not for show

REQUIRED ACTIONS
1. Open at least one paragraph per section with first-person experience:
   "I've seen…", "When I worked on…", "In [year] we…"
2. Name a specific entity in each section: tool, brand, person, place, or year.
   Never anonymous filler ("a company", "some teams"). Always the actual name.
3. Call out one mistake-pattern per article:
   "The mistake most operators make is…" / "Where this fails is…"
4. Include one hyper-specific tangent per article: 2–3 sentences detouring
   to a named client + year before returning to the main thread.

SENTENCE RHYTHM
Medium-length declaratives with occasional sharp fragments for emphasis.
Paragraphs open with subjects or first-person ("I've…", "Most teams…"),
rarely with transitional adverbs.

BANNED PHRASES
- "best practice"
- "industry standard"
- "as a beginner you might think"
- "studies show that"
- "many experts believe"
- "in this article we'll explore"

CADENCE ANCHORS
Opening:     "I've watched six different teams ship the same kind of process.
              Three failed at the same handoff."
Mid-article: "The data exists. Most people never query past the dashboard view."
Closing:     "The fix isn't a better template. It's killing the template entirely."

Testing the spec before your first article run

Generate a single 300-word section on any topic using the new spec. Then check each required action: did it fire? Check the banned phrases: did any appear?

Missing required actions usually mean the actions are not specific enough to test. Rewrite the failing action until a reviewer can confirm it happened in under ten seconds. "Sound conversational" is not a pass/fail condition. "Ask the reader a direct question at least once" is.
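The smoke test can be recorded as a small report: which required actions did not fire and which banned phrases appeared. The sketch below assumes each required action has been turned into a function that takes the draft and returns True or False. The two checks shown are deliberately crude, hypothetical stand-ins (a question mark anywhere counts as "asks a direct question"), and `run_smoke_test` is an illustrative name.

```python
from __future__ import annotations

from typing import Callable


def run_smoke_test(
    draft: str,
    checks: dict[str, Callable[[str], bool]],
    banned: list[str],
) -> dict[str, list[str]]:
    """Report required actions that did not fire and banned phrases that appeared."""
    lowered = draft.lower()
    return {
        "failed_actions": [name for name, check in checks.items() if not check(draft)],
        "banned_hits": [phrase for phrase in banned if phrase.lower() in lowered],
    }


# Hypothetical stand-in checks; real ones should match your own required actions.
checks = {
    "asks a direct question": lambda text: "?" in text,
    "first-person opener": lambda text: text.lstrip().startswith(("I've", "When I")),
}

draft = "Best practice says to plan first. Most teams skip it."
report = run_smoke_test(draft, checks, ["best practice", "studies show that"])
print(report)
```

An empty report on a 300-word section is the signal to move on to a full article run. A non-empty one points at the specific action or ban to revisit.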

To work as part of a content operation, the voice spec has to connect to two other documents: the content brief (which governs what to write: keyword, outline, sources, CTA) and the review process (which checks whether the required actions ran on the finished draft). Our content brief guide covers the brief layer in full. Content Brief vs Style Guide maps out where the voice spec sits in the three-document stack.

For teams running AI-assisted content at volume, the Three-Review Pass covers how the voice spec connects to editorial quality control downstream, specifically how required actions become checkable criteria in the voice fidelity review pass.

Frequently Asked Questions

What is the minimum viable voice spec?

Three sections: Required Actions, Banned Phrases, and Cadence Anchors. A spec built on those three produces more consistent output than one with six Writing Priors and no anchor examples. The other sections add precision; those three supply the structural constraints the model needs to hold a consistent register across a long article.

How many required actions should I write?

Three to five. Fewer than three means the voice is under-specified, so the model fills the gap with its own trained defaults. More than six usually means some entries are attitudes rather than testable conditions. Each required action should answer this: can a reviewer confirm it happened just by reading the article, without your explanation?

How many banned phrases is enough?

Four to eight per-persona bans, in addition to whatever your pipeline applies globally. The per-persona list should target phrases that undermine this specific voice, not general AI-register tells, which are better handled at the pipeline level. A very long ban list usually signals that the voice itself is not clearly defined yet.

Should I build a different voice spec for each audience segment?

Voice, the structural substrate of how the writer sounds, usually stays constant across segments. Tone adapts. A single voice spec with stable required actions and cadence anchors can serve multiple audience contexts while the generation prompt adapts register per context. Build one spec per voice, not one per audience.

Can I extract a voice spec from existing content rather than writing one from scratch?

Yes, and it is usually more accurate. Pull your five to ten best-performing or most representative pieces. Read them for repeating moves: what does your best content do in the first paragraph of every section? What phrases appear in your strongest work but feel wrong in generic AI output? Those observations become your required actions, your banned phrases, and your cadence anchors. You are documenting the voice that already works, not inventing one.

What is a voice spec?

A voice spec is a structured document that tells an AI writing pipeline, or a human writer, who to sound like and what rhetorical moves to make. It differs from a style guide (which governs brand consistency: grammar, vocabulary, formatting) and a content brief (which governs the assignment: keyword, audience, outline, CTA). A voice spec governs the execution layer: the sentence-level moves that determine whether prose sounds like a specific author or like the statistical average of the internet. For a full comparison of all three documents, see Content Brief vs Style Guide.