Transcription System Setup
Transcription is a required foundation for AI insights on voice calls. If a call has no transcript (or a very low-quality transcript), AI Tasks cannot reliably produce insights.
This chapter covers transcription from a platform operator perspective: configuring transcription engines and the transcription pipeline for a multi-tenant environment.
What transcription provides (operator view)
- Converts audio to text (transcripts) per call
- Stores transcripts so they can be:
  - viewed by users
  - searched
  - analyzed by AI Tasks (CSAT, topics, summaries, etc.)
Typical transcription pipeline stages
- Audio ingestion (recordings + metadata)
- Transcription job creation (queueing)
- Speech-to-text execution (provider/model)
- Transcript post-processing (punctuation, diarization, redaction as applicable)
- Transcript persistence and indexing
- Availability in UI and downstream analytics
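The stages above can be sketched as an explicit job state machine. This is an illustrative model only; the names (`Stage`, `TranscriptionJob`, `advance`) are hypothetical and do not correspond to any product API.

```python
# Minimal sketch of the transcription pipeline as a job state machine.
# All names here are illustrative, not product identifiers.
from dataclasses import dataclass
from enum import Enum, auto

class Stage(Enum):
    INGESTED = auto()         # audio + metadata received
    QUEUED = auto()           # transcription job created
    TRANSCRIBING = auto()     # speech-to-text running at the provider
    POST_PROCESSING = auto()  # punctuation, diarization, redaction
    PERSISTED = auto()        # transcript stored and indexed
    AVAILABLE = auto()        # visible in UI and downstream analytics

PIPELINE = list(Stage)  # members in definition order

@dataclass
class TranscriptionJob:
    call_id: str
    stage: Stage = Stage.INGESTED

    def advance(self) -> Stage:
        """Move the job to the next pipeline stage (no-op at the end)."""
        idx = PIPELINE.index(self.stage)
        if idx < len(PIPELINE) - 1:
            self.stage = PIPELINE[idx + 1]
        return self.stage

job = TranscriptionJob(call_id="call-123")
while job.stage is not Stage.AVAILABLE:
    job.advance()
```

Modeling the stages explicitly makes it easy to expose per-stage counts for the monitoring discussed later in this chapter.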
Engine selection and language strategy
Engine selection (examples)
Operators may configure one or more transcription engines/providers. Consider documenting:
- which providers are supported
- how to choose engines per tenant, per language, or per region (if applicable)
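One common resolution scheme is "most specific configuration wins": per-language overrides beat the tenant default, which beats the platform default. A minimal sketch, assuming a hypothetical config shape and made-up engine names:

```python
# Illustrative engine resolution: per-language > per-tenant > platform default.
# TENANT_CONFIG shape and engine names are assumptions for this sketch.
DEFAULT_ENGINE = "engine-a"

TENANT_CONFIG = {
    "acme": {
        "default": "engine-b",
        "per_language": {"de": "engine-c"},
    },
}

def resolve_engine(tenant: str, language: str) -> str:
    cfg = TENANT_CONFIG.get(tenant, {})
    per_language = cfg.get("per_language", {})
    return per_language.get(language) or cfg.get("default") or DEFAULT_ENGINE

resolve_engine("acme", "de")   # per-language override
resolve_engine("acme", "en")   # tenant default
resolve_engine("other", "fr")  # platform default
```

Whatever the real mechanism is, documenting the precedence order explicitly saves operators from guessing why a particular engine was used.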
Language strategy
Common approaches:
- Auto-detect language (simplest, may be less accurate for short calls)
- Tenant-configured languages (more accurate, requires setup)
- Per-conversation language hint (from ingestion metadata)
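These approaches can be combined into a single precedence rule: trust an explicit hint first, then an unambiguous tenant configuration, and fall back to auto-detection. A hedged sketch (function and parameter names are illustrative):

```python
# Illustrative language precedence: hint > single tenant language > auto-detect.
def resolve_language(hint, tenant_languages, detected):
    if hint:                        # per-conversation hint from ingestion metadata
        return hint
    if len(tenant_languages) == 1:  # tenant config is unambiguous
        return tenant_languages[0]
    return detected                 # fall back to auto-detection

resolve_language("de", ["en"], "fr")         # explicit hint wins
resolve_language(None, ["en"], "fr")         # single configured language wins
resolve_language(None, ["en", "de"], "fr")   # ambiguous config, use detection
```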
Validation / smoke test (operator)
For a test tenant:
1. Ingest a short call (30–60 seconds) with known audio clarity.
2. Confirm the transcript is produced within the expected latency.
3. Verify:
  - the transcript is visible in call details
  - speaker separation (if supported) is reasonable
  - timestamps align with the audio
4. Confirm the transcript is available to AI Tasks (see AI Assistant job smoke test).
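The steps above can be automated as a script. This is only a sketch: `client` and its methods (`ingest_call`, `get_transcript`) are placeholders for whatever admin API or CLI the platform actually exposes, and the transcript record shape is assumed.

```python
# Hedged sketch of the operator smoke test. The client interface and the
# transcript record shape ("segments" with speaker/start/end) are assumptions.
import time

def smoke_test(client, tenant_id: str, audio_path: str, max_wait_s: int = 120):
    call_id = client.ingest_call(tenant_id, audio_path)   # step 1: ingest
    deadline = time.time() + max_wait_s
    transcript = None
    while time.time() < deadline:                         # step 2: latency bound
        transcript = client.get_transcript(call_id)
        if transcript:
            break
        time.sleep(5)
    assert transcript, f"no transcript within {max_wait_s}s"
    segments = transcript["segments"]
    assert segments, "transcript is empty"
    speakers = {s.get("speaker") for s in segments}       # step 3: diarization sanity
    assert len(speakers) >= 1, "no speaker labels"
    assert all(s["start"] <= s["end"] for s in segments), "bad timestamps"
    return call_id
```

Step 4 (visibility to AI Tasks) is intentionally left out here; it belongs to the AI Assistant job smoke test referenced above.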
Monitoring and alerting (transcription)
Track, at minimum:
- transcription backlog/lag
- job failure rate
- provider timeouts/quotas
- average transcription latency
- percent of calls with missing/empty transcripts
- language distribution and “unknown language” rate
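Several of these metrics can be derived from a batch of call records, whatever the real storage is. A minimal sketch, assuming a hypothetical record shape with `transcript`, `language`, and `latency_s` fields:

```python
# Sketch: deriving health metrics from call records. The record shape is an
# assumption for illustration, not the product's actual data model.
def transcription_health(calls):
    total = len(calls)
    missing = sum(1 for c in calls if not c.get("transcript"))
    unknown = sum(1 for c in calls if c.get("language") in (None, "unknown"))
    latencies = [c["latency_s"] for c in calls if c.get("latency_s") is not None]
    return {
        "missing_transcript_pct": 100.0 * missing / total if total else 0.0,
        "unknown_language_pct": 100.0 * unknown / total if total else 0.0,
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else None,
    }
```

Emitting these as gauges to the monitoring system of choice turns the checklist above into alertable thresholds.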
Operational best practices
- Provide a reprocessing mechanism (retranscribe) for:
  - provider incidents
  - improved models
  - bug fixes in diarization/punctuation
- Define tenant-level retention for audio and transcripts (aligned with compliance requirements)
- If using external providers, document:
  - credential rotation
  - regional/data residency constraints
  - rate limits and quotas
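A reprocessing runbook usually starts with a dry run: select the affected calls and estimate cost before retranscribing anything. A sketch under assumed inputs (record shape and the per-minute cost figure are illustrative):

```python
# Illustrative dry run for a retranscription job: select calls ingested during
# a provider incident window and estimate cost. cost_per_min is a placeholder.
def select_for_reprocess(calls, incident_start, incident_end, cost_per_min=0.02):
    eligible = [
        c for c in calls
        if incident_start <= c["ingested_at"] <= incident_end
    ]
    minutes = sum(c["duration_s"] for c in eligible) / 60
    return eligible, round(minutes * cost_per_min, 2)
```

Surfacing the estimated cost and call count before the job runs is what makes the "expected impact" note in the runbook concrete.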
Implementation notes
- Transcription is typically configured via Administration > Speech Analytics > Transcription
- Multiple transcription engines may be available with per-tenant or per-language selection
- Features like speaker diarization and punctuation depend on the transcription engine
- Document a baseline "supported audio quality" guideline and minimum call duration thresholds
- Implement and document a "transcription health dashboard" for operators
- Provide a "reprocess transcription" runbook and clearly describe expected impact (cost/time)
EDITOR NOTE: fill in with product specifics
Purpose of this section
Give operators a concrete and supportable transcription setup process, including monitoring and reprocessing.
Missing / unclear (confirm with Engineering/Product)
- Where transcription is configured in UI
  - A) Administration > Speech Analytics > Transcription
  - B) Administration > Speech Analytics > Transcription Engines
  - C) Other (provide exact path)
- Transcription engine model
  - A) Single global engine for all tenants
  - B) Multiple engines with per-tenant selection
  - C) Multiple engines with per-language selection
  - D) Other (explain)
- Features supported
  - Speaker diarization: A) Yes B) No C) Optional/per-engine
  - Punctuation/casing: A) Yes B) No C) Optional/per-engine
  - Redaction: A) Pre-ingestion B) Post-transcription C) Both D) Not supported
- Retry & failure handling
  - A) Automatic retries with backoff
  - B) Manual retry only
  - C) Both
- Backfill / historical transcription
  - A) Supported via bulk job
  - B) Not supported
  - C) Supported only via API/import