Transcription System Setup
Transcription is a required foundation for AI insights on voice calls. If a call has no transcript (or a very low-quality transcript), AI Tasks cannot reliably produce insights.
This chapter covers transcription from a platform operator perspective: configuring transcription engines and the transcription pipeline for a multi-tenant environment.
What transcription provides (operator view)
- Converts audio to text (transcripts) per call
- Stores transcripts so they can be:
  - viewed by users
  - searched
  - analyzed by AI Tasks (CSAT, topics, summaries, etc.)
Typical transcription pipeline stages
- Audio ingestion (recordings + metadata)
- Transcription job creation (queueing)
- Speech-to-text execution (provider/model)
- Transcript post-processing (punctuation, diarization, redaction as applicable)
- Transcript persistence and indexing
- Availability in UI and downstream analytics
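The stages above can be sketched as an explicit job state machine. This is an illustrative model only; the names (`Stage`, `TranscriptionJob`, `advance`) are hypothetical and do not correspond to any product API.

```python
# Minimal sketch of the transcription pipeline as a job state machine.
# All names here are illustrative, not product identifiers.
from dataclasses import dataclass
from enum import Enum, auto

class Stage(Enum):
    INGESTED = auto()         # audio + metadata received
    QUEUED = auto()           # transcription job created
    TRANSCRIBING = auto()     # speech-to-text running at the provider
    POST_PROCESSING = auto()  # punctuation, diarization, redaction
    PERSISTED = auto()        # transcript stored and indexed
    AVAILABLE = auto()        # visible in UI and downstream analytics

PIPELINE = list(Stage)  # members in definition order

@dataclass
class TranscriptionJob:
    call_id: str
    stage: Stage = Stage.INGESTED

    def advance(self) -> Stage:
        """Move the job to the next pipeline stage (no-op at the end)."""
        idx = PIPELINE.index(self.stage)
        if idx < len(PIPELINE) - 1:
            self.stage = PIPELINE[idx + 1]
        return self.stage

job = TranscriptionJob(call_id="call-123")
while job.stage is not Stage.AVAILABLE:
    job.advance()
```

Modeling the stages explicitly makes it easy to expose per-stage counts for the monitoring discussed later in this chapter.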
Engine selection and language strategy
Engine selection (examples)
Operators may configure one or more transcription engines/providers. Consider documenting:
- which providers are supported
- how to choose engines per tenant, per language, or per region (if applicable)
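One common resolution scheme is "most specific configuration wins": per-language overrides beat the tenant default, which beats the platform default. A minimal sketch, assuming a hypothetical config shape and made-up engine names:

```python
# Illustrative engine resolution: per-language > per-tenant > platform default.
# TENANT_CONFIG shape and engine names are assumptions for this sketch.
DEFAULT_ENGINE = "engine-a"

TENANT_CONFIG = {
    "acme": {
        "default": "engine-b",
        "per_language": {"de": "engine-c"},
    },
}

def resolve_engine(tenant: str, language: str) -> str:
    cfg = TENANT_CONFIG.get(tenant, {})
    per_language = cfg.get("per_language", {})
    return per_language.get(language) or cfg.get("default") or DEFAULT_ENGINE

resolve_engine("acme", "de")   # per-language override
resolve_engine("acme", "en")   # tenant default
resolve_engine("other", "fr")  # platform default
```

Whatever the real mechanism is, documenting the precedence order explicitly saves operators from guessing why a particular engine was used.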
Language strategy
Common approaches:
- Auto-detect language (simplest, may be less accurate for short calls)
- Tenant-configured languages (more accurate, requires setup)
- Per-conversation language hint (from ingestion metadata)
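These approaches can be combined into a single precedence rule: trust an explicit hint first, then an unambiguous tenant configuration, and fall back to auto-detection. A hedged sketch (function and parameter names are illustrative):

```python
# Illustrative language precedence: hint > single tenant language > auto-detect.
def resolve_language(hint, tenant_languages, detected):
    if hint:                        # per-conversation hint from ingestion metadata
        return hint
    if len(tenant_languages) == 1:  # tenant config is unambiguous
        return tenant_languages[0]
    return detected                 # fall back to auto-detection

resolve_language("de", ["en"], "fr")         # explicit hint wins
resolve_language(None, ["en"], "fr")         # single configured language wins
resolve_language(None, ["en", "de"], "fr")   # ambiguous config, use detection
```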
Validation / smoke test (operator)
For a test tenant:
1. Ingest a short call (30–60 seconds) with known audio clarity.
2. Confirm the transcript is produced within the expected latency.
3. Verify:
  - the transcript is visible in call details
  - speaker separation (if supported) is reasonable
  - timestamps align with the audio
4. Confirm the transcript is available to AI Tasks (see AI Assistant job smoke test).
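The steps above can be automated as a script. This is only a sketch: `client` and its methods (`ingest_call`, `get_transcript`) are placeholders for whatever admin API or CLI the platform actually exposes, and the transcript record shape is assumed.

```python
# Hedged sketch of the operator smoke test. The client interface and the
# transcript record shape ("segments" with speaker/start/end) are assumptions.
import time

def smoke_test(client, tenant_id: str, audio_path: str, max_wait_s: int = 120):
    call_id = client.ingest_call(tenant_id, audio_path)   # step 1: ingest
    deadline = time.time() + max_wait_s
    transcript = None
    while time.time() < deadline:                         # step 2: latency bound
        transcript = client.get_transcript(call_id)
        if transcript:
            break
        time.sleep(5)
    assert transcript, f"no transcript within {max_wait_s}s"
    segments = transcript["segments"]
    assert segments, "transcript is empty"
    speakers = {s.get("speaker") for s in segments}       # step 3: diarization sanity
    assert len(speakers) >= 1, "no speaker labels"
    assert all(s["start"] <= s["end"] for s in segments), "bad timestamps"
    return call_id
```

Step 4 (visibility to AI Tasks) is intentionally left out here; it belongs to the AI Assistant job smoke test referenced above.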
Monitoring and alerting (transcription)
Track, at minimum:
- transcription backlog/lag
- job failure rate
- provider timeouts/quotas
- average transcription latency
- percent of calls with missing/empty transcripts
- language distribution and “unknown language” rate
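Several of these metrics can be derived from a batch of call records, whatever the real storage is. A minimal sketch, assuming a hypothetical record shape with `transcript`, `language`, and `latency_s` fields:

```python
# Sketch: deriving health metrics from call records. The record shape is an
# assumption for illustration, not the product's actual data model.
def transcription_health(calls):
    total = len(calls)
    missing = sum(1 for c in calls if not c.get("transcript"))
    unknown = sum(1 for c in calls if c.get("language") in (None, "unknown"))
    latencies = [c["latency_s"] for c in calls if c.get("latency_s") is not None]
    return {
        "missing_transcript_pct": 100.0 * missing / total if total else 0.0,
        "unknown_language_pct": 100.0 * unknown / total if total else 0.0,
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else None,
    }
```

Emitting these as gauges to the monitoring system of choice turns the checklist above into alertable thresholds.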
Operational best practices
- Provide a reprocessing mechanism (retranscribe) for:
  - provider incidents
  - improved models
  - bug fixes in diarization/punctuation
- Define tenant-level retention for audio and transcripts (aligned with compliance requirements)
- If using external providers, document:
  - credential rotation
  - regional/data residency constraints
  - rate limits and quotas
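A reprocessing runbook usually starts with a dry run: select the affected calls and estimate cost before retranscribing anything. A sketch under assumed inputs (record shape and the per-minute cost figure are illustrative):

```python
# Illustrative dry run for a retranscription job: select calls ingested during
# a provider incident window and estimate cost. cost_per_min is a placeholder.
def select_for_reprocess(calls, incident_start, incident_end, cost_per_min=0.02):
    eligible = [
        c for c in calls
        if incident_start <= c["ingested_at"] <= incident_end
    ]
    minutes = sum(c["duration_s"] for c in eligible) / 60
    return eligible, round(minutes * cost_per_min, 2)
```

Surfacing the estimated cost and call count before the job runs is what makes the "expected impact" note in the runbook concrete.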
Implementation notes
- Transcription is typically configured via Administration > Speech Analytics > Transcription
- Multiple transcription engines may be available with per-tenant or per-language selection
- Features like speaker diarization and punctuation depend on the transcription engine
- Document a baseline "supported audio quality" guideline and minimum call duration thresholds
- Implement and document a "transcription health dashboard" for operators
- Provide a "reprocess transcription" runbook and clearly describe expected impact (cost/time)
EDITOR NOTE: fill in with product specifics
Purpose of this section
Give operators a concrete and supportable transcription setup process, including monitoring and reprocessing.
Missing / unclear (confirm with Engineering/Product)
- Where transcription is configured in UI
  - A) Administration > Speech Analytics > Transcription
  - B) Administration > Speech Analytics > Transcription Engines
  - C) Other (provide exact path)
- Transcription engine model
  - A) Single global engine for all tenants
  - B) Multiple engines with per-tenant selection
  - C) Multiple engines with per-language selection
  - D) Other (explain)
- Features supported
  - Speaker diarization: A) Yes B) No C) Optional/per-engine
  - Punctuation/casing: A) Yes B) No C) Optional/per-engine
  - Redaction: A) Pre-ingestion B) Post-transcription C) Both D) Not supported
- Retry & failure handling
  - A) Automatic retries with backoff
  - B) Manual retry only
  - C) Both
- Backfill / historical transcription
  - A) Supported via bulk job
  - B) Not supported
  - C) Supported only via API/import