Your AI voice agent reads out a customer's phone number: "Your number is zero seven billion, one hundred twenty-three million..."

The customer hangs up.

Text-to-speech engines are remarkably good at converting written language to natural-sounding audio. But they're trained on prose, not data. Phone numbers, postcodes, reference codes, dates, currency amounts: these are the building blocks of business conversations, and TTS engines routinely mangle them.

Text normalisation solves this by preprocessing text before it reaches the TTS engine, converting data formats into speakable phrases the engine can pronounce correctly.

The Problem with Raw Data

Consider what happens when you send these strings directly to a TTS engine:

Input	TTS Output (Unprocessed)
`07123456789`	"Zero seven billion, one hundred twenty-three million..."
`SW1A 1AA`	"Swuh-wun-ah one double-a" or similar gibberish
`£1,234.56`	"Pound sign one comma two three four point five six"
`25/12/2024`	"Twenty-five slash twelve slash two thousand twenty-four"
`REF123456`	"Ref one hundred twenty-three thousand..."

None of these are how a human would read the information aloud. A human says "oh seven one two three, four five six, seven eight nine" for the phone number. They say "S W one A, one A A" for the postcode. The TTS engine doesn't know these conventions unless you teach it.

UK Phone Numbers

Phone numbers should be read as individual digits with natural grouping. UK mobile numbers follow a predictable pattern: five digits, then three, then three.

The normalisation approach strips non-digit characters, then groups the digits according to the phone number type. Mobiles starting with 07 are grouped 5-3-3. Landline grouping depends on the length of the area code; a four-digit area code such as 0121 gives 4-3-4. International formats (+44) are converted to national format first.

Crucially, commas are inserted between groups to create natural pauses. "0 7 1 2 3, 4 5 6, 7 8 9" sounds like a phone number. "0 7 1 2 3 4 5 6 7 8 9" sounds like a robot counting.

Input	Output
`07123456789`	`0 7 1 2 3, 4 5 6, 7 8 9`
`+447123456789`	`0 7 1 2 3, 4 5 6, 7 8 9`
`0121 234 5678`	`0 1 2 1, 2 3 4, 5 6 7 8`
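
The grouping rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the function name `normalise_uk_phone` is hypothetical, only 11-digit national numbers are handled, and every non-mobile number is assumed to use 4-3-4 grouping.

```python
import re

def normalise_uk_phone(raw: str) -> str:
    """Return a digit-by-digit spoken form with comma pauses between groups."""
    digits = re.sub(r"\D", "", raw)  # strip spaces, dashes, brackets and "+"

    # Convert international format (+44 or 0044) to national format.
    if digits.startswith("0044"):
        digits = "0" + digits[4:]
    elif digits.startswith("44") and len(digits) == 12:
        digits = "0" + digits[2:]

    # Assumption: anything that is not an 11-digit national number is left alone.
    if len(digits) != 11 or not digits.startswith("0"):
        return raw

    # Mobiles (07...) use 5-3-3; this sketch assumes 4-3-4 for everything else.
    sizes = (5, 3, 3) if digits.startswith("07") else (4, 3, 4)

    groups, position = [], 0
    for size in sizes:
        groups.append(" ".join(digits[position:position + size]))
        position += size
    return ", ".join(groups)
```

Returning the input unchanged for anything unexpected keeps the function safe to call on strings that only look like phone numbers.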

UK Postcodes

Postcodes combine letters and numbers in specific patterns. The outward code (first part) identifies the postal district; the inward code (last three characters) identifies the specific address group.

The normalisation approach spaces out each character individually, with a comma pause between the outward and inward codes. If the postcode arrives without a space (like "SW1A1AA"), the normaliser splits it before the last three characters.

Input	Output
`SW1A 1AA`	`S W 1 A, 1 A A`
`M1 1AA`	`M 1, 1 A A`
`B338TH`	`B 3 3, 8 T H`

The comma between outward and inward codes creates the same pause a human uses when dictating a postcode.
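
As a rough sketch of the splitting rule, the Python below matches a simplified postcode shape and spaces out the characters. The regular expression is an assumption made for illustration; it is looser than the official postcode rules, and `normalise_postcode` is a hypothetical name.

```python
import re

# Simplified shape: 1-2 letters, a digit, an optional letter or digit (outward),
# then a digit and two letters (inward). The space between them is optional.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})$")

def normalise_postcode(raw: str) -> str:
    """Space out each character, with a comma pause before the inward code."""
    match = POSTCODE_RE.match(raw.strip().upper())
    if match is None:
        return raw  # partial or malformed postcodes are left unchanged
    outward, inward = match.groups()
    return f"{' '.join(outward)}, {' '.join(inward)}"
```

Because the inward code is always a digit followed by two letters, the pattern can split an unspaced postcode such as "B338TH" without extra logic.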

Currency

Currency requires context-aware formatting. "£1,234.56" should become "1,234 pounds and 56 pence", not "pound sign one comma two three four point five six".

The approach involves:

Identifying the currency symbol and mapping it to spoken names (pound/pounds, pence for GBP; dollar/dollars, cents for USD)
Separating whole and decimal parts
Using singular or plural forms based on the amount
Handling special cases like amounts under £1 ("99 pence" not "0 pounds and 99 pence")
Recognising multiplier suffixes like "k", "m", "bn" for thousands, millions, billions

Input	Output
`£1,234.56`	`1,234 pounds and 56 pence`
`£0.99`	`99 pence`
`£1.5m`	`1.5 million pounds`
`$100`	`100 dollars`
`€1`	`1 euro`
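
The steps listed above might look something like the following Python sketch. The names `CURRENCIES`, `MULTIPLIERS` and `normalise_currency` are hypothetical, and the singular minor units ("penny", "cent") are an assumption that the tables here do not cover.

```python
import re

# (singular major, plural major, singular minor, plural minor)
CURRENCIES = {
    "£": ("pound", "pounds", "penny", "pence"),
    "$": ("dollar", "dollars", "cent", "cents"),
    "€": ("euro", "euros", "cent", "cents"),
}
MULTIPLIERS = {"k": "thousand", "m": "million", "bn": "billion"}

AMOUNT_RE = re.compile(
    r"([£$€])"                    # currency symbol
    r"(\d{1,3}(?:,\d{3})+|\d+)"   # whole part, with or without thousands commas
    r"(?:\.(\d+))?"               # optional decimal part
    r"(k|m|bn)?"                  # optional multiplier suffix
    r"(?!\w)(?![.,]\d)",          # refuse to match part of a longer token
    re.IGNORECASE,
)

def _speak_amount(match: re.Match) -> str:
    symbol, whole, fraction, suffix = match.groups()
    major_one, major_many, minor_one, minor_many = CURRENCIES[symbol]

    if suffix:  # e.g. £1.5m: keep the number as written and name the multiplier
        number = whole + (f".{fraction}" if fraction else "")
        return f"{number} {MULTIPLIERS[suffix.lower()]} {major_many}"

    units = int(whole.replace(",", ""))
    minor = int(fraction.ljust(2, "0")[:2]) if fraction else 0
    major_part = f"{whole} {major_one if units == 1 else major_many}"
    minor_part = f"{minor} {minor_one if minor == 1 else minor_many}"

    if units == 0 and minor:
        return minor_part  # "99 pence", not "0 pounds and 99 pence"
    if minor:
        return f"{major_part} and {minor_part}"
    return major_part

def normalise_currency(text: str) -> str:
    """Replace every recognised currency amount in text with its spoken form."""
    return AMOUNT_RE.sub(_speak_amount, text)
```

The two trailing lookaheads mean an amount such as "£1.5million" is left untouched rather than half-converted, which fits the principle of leaving uncertain text alone.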

Dates and Times

Dates in business contexts usually appear as DD/MM/YYYY (UK format) or YYYY-MM-DD (ISO format). Neither reads naturally without transformation.

The normalisation converts numeric dates to spoken form with ordinal day numbers and full month names: "25/12/2024" becomes "the 25th of December 2024". The ordinal suffix (st, nd, rd, th) is determined by the day number.

Times require similar treatment. The approach converts 24-hour format to 12-hour with am/pm, handles special cases like noon and midnight, and formats minutes naturally ("2 oh 5 pm" for 2:05pm rather than "2 zero 5 pm").

Input	Output
`25/12/2024`	`the 25th of December 2024`
`2024-01-15`	`the 15th of January 2024`
`14:30`	`2 30 pm`
`2:05pm`	`2 oh 5 pm`
`12:00`	`noon`
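
A minimal Python sketch of these date and time rules follows. All names are hypothetical, the patterns cover only the formats shown in the table, and out-of-range values are assumed to be left unchanged.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

UK_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
TIME = re.compile(r"\b(\d{1,2}):(\d{2})(?:\s*(am|pm))?\b", re.IGNORECASE)

def ordinal(day: int) -> str:
    """1 -> 1st, 2 -> 2nd, 3 -> 3rd, 11 -> 11th, 21 -> 21st."""
    if 11 <= day % 100 <= 13:
        return f"{day}th"
    return f"{day}{ {1: 'st', 2: 'nd', 3: 'rd'}.get(day % 10, 'th') }"

def speak_date(day: int, month: int, year: str, original: str) -> str:
    if 1 <= month <= 12 and 1 <= day <= 31:
        return f"the {ordinal(day)} of {MONTHS[month - 1]} {year}"
    return original  # not a plausible date, so leave it alone

def speak_time(hour: int, minute: int, meridiem=None) -> str:
    if meridiem is None:  # 24-hour input
        if (hour, minute) == (12, 0):
            return "noon"
        if (hour, minute) == (0, 0):
            return "midnight"
        meridiem = "am" if hour < 12 else "pm"
        hour = hour % 12 or 12
    else:
        meridiem = meridiem.lower()
    if minute == 0:
        return f"{hour} {meridiem}"
    spoken_minute = f"oh {minute}" if minute < 10 else str(minute)
    return f"{hour} {spoken_minute} {meridiem}"

def _time(match: re.Match) -> str:
    hour, minute = int(match[1]), int(match[2])
    if hour > 23 or minute > 59:
        return match[0]
    return speak_time(hour, minute, match[3])

def normalise_dates_and_times(text: str) -> str:
    text = UK_DATE.sub(lambda m: speak_date(int(m[1]), int(m[2]), m[3], m[0]), text)
    text = ISO_DATE.sub(lambda m: speak_date(int(m[3]), int(m[2]), m[1], m[0]), text)
    return TIME.sub(_time, text)
```

Dates are processed before times so that the digits in a date are already spoken text by the time the time pattern runs.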

Email Addresses

Email addresses contain symbols that TTS engines pronounce literally. "@" becomes "at sign" and "." becomes "dot" or "period" depending on the engine.

The normalisation replaces "@" with " at " and "." with " dot ", producing output that sounds natural when spoken.

Input	Output
`john.smith@example.com`	`john dot smith at example dot com`
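
One way to apply the replacement only inside email addresses, so that full stops at the end of a sentence are untouched, is sketched below in Python. The pattern is a deliberately loose assumption and not a full email validator; `normalise_emails` is a hypothetical name.

```python
import re

# Loose pattern: local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def normalise_emails(text: str) -> str:
    """Speak "@" as "at" and "." as "dot", but only inside email addresses."""
    def _speak(match: re.Match) -> str:
        return match.group(0).replace("@", " at ").replace(".", " dot ")
    return EMAIL_RE.sub(_speak, text)
```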

Car Registrations

UK vehicle registrations follow several formats depending on age. Modern registrations (post-2001) use the pattern AA00 AAA. The letters and numbers should be read separately with a pause between groups.

The approach normalises the registration to uppercase, then groups characters by type (letters vs digits), inserting comma pauses when switching between letter and digit groups.

Input	Output
`AB12 CDE`	`A B, 1 2, C D E`
`A123 BCD`	`A, 1 2 3, B C D`
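
The group-by-type idea maps neatly onto `itertools.groupby`. The Python sketch below assumes only the two registration shapes shown in the table; the pattern and function names are hypothetical.

```python
import re
from itertools import groupby

# Assumed shapes: AA00 AAA (current style) and A0 AAA to A000 AAA (prefix style).
REG_RE = re.compile(
    r"\b(?:[A-Z]{2}\d{2} ?[A-Z]{3}|[A-Z]\d{1,3} ?[A-Z]{3})\b",
    re.IGNORECASE,
)

def speak_registration(registration: str) -> str:
    """Uppercase, drop spaces, then insert a pause at each letter/digit switch."""
    characters = registration.upper().replace(" ", "")
    runs = [" ".join(run) for _, run in groupby(characters, key=str.isdigit)]
    return ", ".join(runs)

def normalise_registrations(text: str) -> str:
    return REG_RE.sub(lambda m: speak_registration(m.group(0)), text)
```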

Business Names and Acronyms

Business data contains abbreviations that should be spelled out rather than pronounced as words. "AX Motors" should be "A X Motors", not "Axe Motors". But common abbreviations like "UK" or "TV" should remain as-is because TTS engines handle them correctly.

The approach maintains a list of common words and abbreviations that TTS handles well (UK, US, TV, LTD, PLC, CEO, etc.) and leaves these unchanged. Other uppercase sequences of 2-4 letters are spaced out for spelling.

Additional business-specific patterns include:

"T/A" becomes "trading as"
Ampersands become "and"
Letter-number-letter patterns like "B2B" become "B 2 B"

Input	Output
`AX Motors`	`A X Motors`
`B&M Retail`	`B and M Retail`
`ABC Ltd T/A XYZ`	`A B C Ltd trading as X Y Z`
`UK Business`	`UK Business` (unchanged)
`B2B Services`	`B 2 B Services`
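
These rules can be sketched in Python as a short sequence of substitutions. The `KEEP_AS_IS` set holds only the examples named above, and the function name is hypothetical; a real allow-list would be considerably longer.

```python
import re

# Abbreviations the TTS engine is assumed to pronounce correctly on its own.
KEEP_AS_IS = {"UK", "US", "TV", "LTD", "PLC", "CEO"}

def normalise_business_name(text: str) -> str:
    text = re.sub(r"\bT/A\b", "trading as", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*&\s*", " and ", text)
    # Letter-digit-letter tokens such as B2B.
    text = re.sub(r"\b([A-Z])(\d)([A-Z])\b", r"\1 \2 \3", text)

    def _spell(match: re.Match) -> str:
        word = match.group(0)
        return word if word in KEEP_AS_IS else " ".join(word)

    # Remaining all-uppercase runs of 2-4 letters are spelled out.
    return re.sub(r"\b[A-Z]{2,4}\b", _spell, text)
```

Mixed-case tokens such as "Ltd" never match the uppercase pattern, so they pass through untouched.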

Stripping Internal Syntax

Sometimes language models output internal syntax as text instead of using proper function calls. Tool call syntax, JSON fragments, or other machine-readable content should never be read aloud.

The normalisation identifies and removes these patterns, leaving only the human-readable content.

Input	Output
`Thank you. [action: end] Goodbye.`	`Thank you. Goodbye.`
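
A minimal Python sketch for the bracketed-tag case in the table is shown below. The `[name: value]` shape is an assumption based on that single example; JSON fragments and other formats would need their own patterns.

```python
import re

# Assumed shape: [word: anything-except-a-closing-bracket], plus leading spaces.
ACTION_TAG = re.compile(r"\s*\[\w+:[^\]\n]*\]")

def strip_internal_syntax(text: str) -> str:
    """Remove bracketed action tags and tidy up any doubled spaces left behind."""
    cleaned = ACTION_TAG.sub("", text)
    return re.sub(r" {2,}", " ", cleaned).strip()
```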

Performance Optimisation

Text normalisation runs on every response before TTS generation. For high-volume systems, avoiding unnecessary processing matters.

The optimisation approach uses a quick pre-check to determine whether text contains any patterns that need normalisation. Simple conversational responses like "How can I help you today?" skip the full normalisation pipeline entirely. Only text containing phone numbers, postcodes, currency symbols, dates, or other data patterns gets processed.
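
One possible form of the pre-check is a single cheap regular expression that looks for any character the later stages care about. The trigger set below is an assumption chosen for illustration, and `normalise_if_needed` is a hypothetical wrapper around whatever full pipeline you use.

```python
import re
from typing import Callable

# Any digit, a symbol used by the patterns above, or a short uppercase run.
NEEDS_NORMALISATION = re.compile(r"\d|[£$€@&/\[]|\b[A-Z]{2,4}\b")

def normalise_if_needed(text: str, pipeline: Callable[[str], str]) -> str:
    """Run the full pipeline only when the text could contain data patterns."""
    if NEEDS_NORMALISATION.search(text) is None:
        return text  # plain conversational text: skip all further work
    return pipeline(text)
```

The pre-check is allowed to produce false positives, since those only cost a pipeline run. It must not produce false negatives, so the trigger set should cover everything the pipeline handles.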

Pattern Ordering Matters

The order in which you apply transformations affects results. Consider the text "at 14:30 on AB12 CDE".

If you process acronyms before car registrations, the letters in "AB12 CDE" might get spaced out, and the registration pattern would then no longer match. If you process car registrations before postcodes, "AB12 CDE" correctly matches as a vehicle registration rather than being misidentified as a malformed postcode.

A robust implementation processes patterns in order of specificity:

Email addresses (before other dot processing affects them)
Car registrations (specific alphanumeric patterns)
Times with "at" prefix (before acronym processing)
Business name patterns (T/A, ampersands, acronyms)
Dates (UK and ISO formats)
Times (12-hour format)
Currency (with and without multipliers)
Postcodes
Phone numbers (most specific patterns first)
Reference numbers

Each pattern is designed to avoid false positives on the others.
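
The effect of ordering can be demonstrated with a two-step pipeline in Python. This sketch uses simplified, hypothetical versions of the acronym and registration steps and runs them in both orders.

```python
import re
from itertools import groupby
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], str]]

def spell_acronyms(text: str) -> str:
    return re.sub(r"\b[A-Z]{2,4}\b", lambda m: " ".join(m.group(0)), text)

def speak_registrations(text: str) -> str:
    def _speak(match: re.Match) -> str:
        characters = match.group(0).replace(" ", "")
        runs = groupby(characters, key=str.isdigit)
        return ", ".join(" ".join(run) for _, run in runs)
    return re.sub(r"\b[A-Z]{2}\d{2} ?[A-Z]{3}\b", _speak, text)

def run_pipeline(text: str, steps: List[Step]) -> str:
    """Apply each step in list order; the list itself encodes the priority."""
    for _name, step in steps:
        text = step(text)
    return text

GOOD_ORDER: List[Step] = [
    ("car_registrations", speak_registrations),
    ("acronyms", spell_acronyms),
]
BAD_ORDER: List[Step] = list(reversed(GOOD_ORDER))

print(run_pipeline("AB12 CDE", GOOD_ORDER))  # A B, 1 2, C D E
print(run_pipeline("AB12 CDE", BAD_ORDER))   # AB12 C D E
```

In the bad order the acronym step rewrites "CDE" first, so the registration pattern no longer recognises the plate. Keeping the steps in an explicit list makes the priority visible and easy to test.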

Testing Edge Cases

Real-world data produces surprising edge cases:

"Call me at 2" - Is "2" a time or just the number two?
"Unit 2A" - Should "2A" be spaced out?
"SW1A" without the inward code - Partial postcode or abbreviation?
"REF: TBC" - Reference number or "to be confirmed"?

The solution is specificity. "at 2pm" triggers time formatting; "at 2" alone doesn't. Reference patterns require at least 4 alphanumeric characters after the prefix. Postcodes require both outward and inward codes to match.

When in doubt, leave text unchanged. It's better for TTS to mispronounce an edge case than for normalisation to corrupt valid prose.
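
Conservative matching of this kind can be sketched in Python as patterns that demand extra evidence before firing. The patterns and names below are hypothetical, and requiring at least one digit in a reference code is an added assumption, used here to stop ordinary words such as "REFERENCE" from matching.

```python
import re

# A bare "at 2" is ambiguous, so a time must carry an explicit am/pm marker.
TIME_WITH_MERIDIEM = re.compile(r"\bat (\d{1,2}) ?(am|pm)\b", re.IGNORECASE)

# A reference needs 4+ alphanumeric characters after the prefix,
# at least one of which is a digit (assumption for this sketch).
REFERENCE = re.compile(r"\bREF(?::\s*|\s+)?(?=[A-Z0-9]*\d)([A-Z0-9]{4,})\b")

def normalise_conservatively(text: str) -> str:
    text = TIME_WITH_MERIDIEM.sub(
        lambda m: f"at {m.group(1)} {m.group(2).lower()}", text
    )
    return REFERENCE.sub(lambda m: "REF " + " ".join(m.group(1)), text)
```

Each of the ambiguous inputs listed above falls through both patterns and is returned unchanged.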

The Complete Pipeline

Text normalisation slots into the voice AI pipeline between the language model and TTS:

User speaks → Speech-to-text transcription
LLM processes → Generates response text
Normalisation → Converts data formats to speakable phrases
TTS generates → Converts normalised text to audio
Audio plays → User hears natural pronunciation

The normalisation step is invisible to both the language model and the user. The LLM can output "Your reference is REF123456" naturally, and the user hears "Your reference is REF 1 2 3 4 5 6" without either party knowing about the transformation.
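
In code, the normalisation step is simply one more function call between the model and the TTS engine. The Python sketch below uses hypothetical callables (`generate_reply`, `normalise`, `synthesise`) standing in for whatever LLM and TTS clients a real system uses.

```python
from typing import Callable

def speak_reply(
    transcript: str,
    generate_reply: Callable[[str], str],  # LLM: transcript -> response text
    normalise: Callable[[str], str],       # data formats -> speakable phrases
    synthesise: Callable[[str], bytes],    # TTS: text -> audio bytes
) -> bytes:
    """Run one conversational turn, normalising text just before synthesis."""
    reply_text = generate_reply(transcript)
    speakable_text = normalise(reply_text)
    return synthesise(speakable_text)
```

Passing the stages in as callables keeps normalisation independent of any particular LLM or TTS provider and makes the step easy to test in isolation.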

Regional Considerations

The patterns described here focus on UK conventions: UK phone formats, UK postcodes, UK date ordering (DD/MM/YYYY), pounds sterling. Adapting for other regions requires:

Different phone number patterns and groupings
Different postal code formats (US ZIP codes, German PLZ, etc.)
Different date ordering (MM/DD/YYYY for US)
Different currency handling

A production system serving multiple markets needs either locale detection or explicit configuration to apply the correct regional patterns.
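
Explicit configuration could take the form of a small table of per-locale rules, as in the Python sketch below. The `LocaleRules` fields and the `LOCALES` table are hypothetical and cover only date ordering, currency symbol and mobile digit grouping.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class LocaleRules:
    date_order: str                   # "DMY" or "MDY"
    currency_symbol: str
    mobile_grouping: Tuple[int, ...]  # digit group sizes for spoken numbers

LOCALES: Dict[str, LocaleRules] = {
    "en-GB": LocaleRules(date_order="DMY", currency_symbol="£", mobile_grouping=(5, 3, 3)),
    "en-US": LocaleRules(date_order="MDY", currency_symbol="$", mobile_grouping=(3, 3, 4)),
}

def split_numeric_date(date: str, locale: str) -> Tuple[int, int, int]:
    """Return (day, month, year) from a slash-separated date for the locale."""
    first, second, year = (int(part) for part in date.split("/"))
    if LOCALES[locale].date_order == "DMY":
        return first, second, year
    return second, first, year
```

With the rules held in data, the same normalisation functions can serve several markets by looking up the active locale instead of hard-coding UK conventions.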

Building voice experiences with business data?

SwiftCase integrates voice AI with workflow automation, handling the complexity of natural speech synthesis so your AI agents can read customer data, reference numbers, and business information clearly. Our platform manages the technical details so you can focus on your processes.

