Transcription System Setup
Transcription is a required foundation for AI insights on voice calls. If a call has no transcript (or a very low-quality transcript), AI Tasks cannot reliably produce insights.
This chapter covers transcription from a platform operator's perspective: configuring transcription engines and the transcription pipeline for a multi-tenant environment.
What transcription provides (operator view)
- Converts audio to text (transcripts) per call
- Stores transcripts so they can be:
  - viewed by users
  - searched
  - analyzed by AI Tasks (CSAT, topics, summaries, etc.)
Typical transcription pipeline stages
- Audio ingestion (recordings + metadata)
- Transcription job creation (queueing)
- Speech-to-text execution (provider/model)
- Transcript post-processing (punctuation, diarization, redaction as applicable)
- Transcript persistence and indexing
- Availability in UI and downstream analytics
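The stages above can be modeled as an ordered sequence that each call's job moves through, which makes it easy to see where a failed job stopped. The following is a minimal Python sketch; `Stage` and `TranscriptionJob` are hypothetical names for illustration, not part of the product's API.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """Hypothetical pipeline stages, in processing order."""
    INGESTED = 1        # recording + metadata received
    QUEUED = 2          # transcription job created
    TRANSCRIBED = 3     # speech-to-text finished
    POST_PROCESSED = 4  # punctuation / diarization / redaction applied
    PERSISTED = 5       # transcript stored and indexed
    AVAILABLE = 6       # visible in UI and to downstream analytics


@dataclass
class TranscriptionJob:
    call_id: str
    stage: Stage = Stage.INGESTED
    error: Optional[str] = None
    history: List[Stage] = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage; a finished job stays put, a failed job raises."""
        if self.error is not None:
            raise RuntimeError(f"job {self.call_id} failed at {self.stage.name}: {self.error}")
        if self.stage is Stage.AVAILABLE:
            return self.stage
        self.history.append(self.stage)
        self.stage = Stage(self.stage + 1)
        return self.stage

    def fail(self, reason: str) -> None:
        """Record a failure at the current stage (e.g. a provider timeout)."""
        self.error = reason


job = TranscriptionJob("call-001")
while job.stage is not Stage.AVAILABLE:
    job.advance()
print(job.stage.name, len(job.history))  # AVAILABLE 5
```

Keeping the failed stage on the job record is what lets a later retranscription decide whether to restart from audio or only rerun post-processing.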
Engine selection and language strategy
Engine selection (examples)
Operators may configure one or more transcription engines/providers. Consider documenting:
- which providers are supported
- how to choose engines per tenant, per language, or per region (if applicable)
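When engines are chosen per tenant, language, or region, one common pattern is a rule table resolved by specificity. A minimal Python sketch, assuming hypothetical rule keys and placeholder engine names (no real providers are implied); it shows a data-residency rule for one tenant outranking a language-wide rule.

```python
from typing import Dict, Optional, Tuple

# Hypothetical routing rules; engine names are placeholders, not real providers.
# Keys are (tenant, language, region); None acts as a wildcard.
Rule = Tuple[Optional[str], Optional[str], Optional[str]]

ROUTING: Dict[Rule, str] = {
    ("tenant-eu", None, "eu-west"): "engine-eu-resident",
    (None, "de-DE", None): "engine-german",
    ("tenant-acme", "en-US", None): "engine-acme-custom",
    (None, None, None): "engine-default",
}


def select_engine(tenant: str, language: str, region: str) -> str:
    """Return the engine of the most specific matching rule.

    Specificity is the number of non-wildcard fields; on a tie, the rule
    that appears first in ROUTING wins.
    """
    best_engine, best_score = None, -1
    for (r_tenant, r_lang, r_region), engine in ROUTING.items():
        fields = ((r_tenant, tenant), (r_lang, language), (r_region, region))
        if any(rule is not None and rule != actual for rule, actual in fields):
            continue  # this rule does not apply to the call
        score = sum(rule is not None for rule, _ in fields)
        if score > best_score:
            best_engine, best_score = engine, score
    if best_engine is None:
        raise LookupError("no routing rule matched; add a (None, None, None) default")
    return best_engine
```

For example, `select_engine("tenant-eu", "de-DE", "eu-west")` returns `"engine-eu-resident"`, because the tenant-and-region rule is more specific than the German-language rule.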
Language strategy
Common approaches:
- Auto-detect language (simplest, may be less accurate for short calls)
- Tenant-configured languages (more accurate, requires setup)
- Per-conversation language hint (from ingestion metadata)
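The three approaches can also be combined into a precedence order. The sketch below assumes one possible order (explicit hint, then confident auto-detection, then the tenant's first configured language) and hypothetical thresholds; it falls back to `"unknown"` so such calls show up in the unknown-language metric.

```python
from typing import Optional, Sequence

# Hypothetical thresholds; tune them per engine.
MIN_AUTODETECT_SECONDS = 20.0
MIN_AUTODETECT_CONFIDENCE = 0.8


def resolve_language(
    hint: Optional[str],
    tenant_languages: Sequence[str],
    detected: Optional[str],
    detect_confidence: float,
    duration_s: float,
) -> str:
    """Pick a transcription language, most explicit source first."""
    # 1. Per-conversation hint from ingestion metadata, if the tenant allows it.
    if hint and (not tenant_languages or hint in tenant_languages):
        return hint
    # 2. Auto-detection, trusted only for long-enough calls with confident results.
    reliable = (
        detected is not None
        and duration_s >= MIN_AUTODETECT_SECONDS
        and detect_confidence >= MIN_AUTODETECT_CONFIDENCE
    )
    if reliable and (not tenant_languages or detected in tenant_languages):
        return detected
    # 3. Tenant-configured default: the first configured language.
    if tenant_languages:
        return tenant_languages[0]
    return "unknown"
```

The duration check encodes the caveat above: auto-detection on very short calls is the least trustworthy source, so those calls fall through to the tenant default.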
Validation / smoke test (operator)
For a test tenant:
1. Ingest a short call (30–60 seconds) with known audio clarity.
2. Confirm the transcript is produced within the expected latency.
3. Verify that:
  - the transcript is visible in call details
  - speaker separation (if supported) is reasonable
  - timestamps align with the audio
4. Confirm the transcript is available to AI Tasks (see AI Assistant job smoke test).
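The latency and transcript-shape checks in this smoke test can be partly automated. A minimal Python sketch, assuming the transcript can be exported as hypothetical `Segment` records with a speaker label and start/end times in seconds; visibility in call details and availability to AI Tasks still need to be checked separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str   # diarization label, e.g. "S1" (hypothetical format)
    start_s: float
    end_s: float
    text: str


def smoke_test(
    segments: List[Segment],
    audio_duration_s: float,
    latency_s: float,
    max_latency_s: float,
    expect_speakers: int = 2,   # set to 1 if the engine has no diarization
    tolerance_s: float = 1.0,
) -> List[str]:
    """Return a list of failed checks; an empty list means the test passed."""
    failures = []
    if latency_s > max_latency_s:
        failures.append(f"latency {latency_s:.0f}s exceeds {max_latency_s:.0f}s")
    if not any(s.text.strip() for s in segments):
        failures.append("transcript is empty")
    speakers = {s.speaker for s in segments}
    if len(speakers) < expect_speakers:
        failures.append(f"expected {expect_speakers} speakers, found {len(speakers)}")
    prev_start = 0.0
    for s in segments:
        # Segments should be ordered and lie within the audio (plus a small tolerance).
        if s.start_s < prev_start or s.end_s < s.start_s:
            failures.append(f"segment at {s.start_s}s is out of order")
            break
        if s.end_s > audio_duration_s + tolerance_s:
            failures.append(f"segment ends at {s.end_s}s, past audio end")
            break
        prev_start = s.start_s
    return failures
```

Returning every failed check, rather than stopping at the first, gives a single readable report per test call.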
Monitoring and alerting (transcription)
Track, at minimum:
- transcription backlog/lag
- job failure rate
- provider timeouts/quotas
- average transcription latency
- percent of calls with missing/empty transcripts
- language distribution and “unknown language” rate
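A minimal Python sketch of turning one monitoring window of job records into several of the metrics above, plus a simple threshold check. The `JobRecord` fields, status values, and threshold numbers are assumptions for illustration; derive real thresholds from your own baseline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobRecord:
    status: str                 # hypothetical: "done", "failed", "timeout", "pending"
    latency_s: Optional[float]  # set when status == "done"
    transcript_chars: int
    language: str               # "unknown" when detection failed


def transcription_metrics(jobs: List[JobRecord]) -> Dict[str, float]:
    """Aggregate one window of job records; percentages are over finished jobs."""
    pending = [j for j in jobs if j.status == "pending"]
    finished = [j for j in jobs if j.status != "pending"]
    done = [j for j in finished if j.status == "done"]
    latencies = [j.latency_s for j in done if j.latency_s is not None]
    n = len(finished)

    def pct(count: int) -> float:
        return 100.0 * count / n if n else 0.0

    return {
        "backlog": float(len(pending)),
        "failure_pct": pct(sum(j.status == "failed" for j in finished)),
        "timeout_pct": pct(sum(j.status == "timeout" for j in finished)),
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
        # Failed or timed-out jobs and zero-length transcripts both count as missing.
        "missing_or_empty_pct": pct(sum(j.status != "done" or j.transcript_chars == 0 for j in finished)),
        "unknown_language_pct": pct(sum(j.language == "unknown" for j in finished)),
    }


# Hypothetical alert thresholds.
THRESHOLDS = {"backlog": 500.0, "failure_pct": 2.0, "timeout_pct": 1.0, "missing_or_empty_pct": 5.0}


def alerts(metrics: Dict[str, float]) -> List[str]:
    """Names of metrics above their threshold, sorted for stable output."""
    return sorted(k for k, limit in THRESHOLDS.items() if metrics.get(k, 0.0) > limit)
```

Language distribution can be tracked alongside these, for example with `collections.Counter(j.language for j in finished)`.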
Operational best practices
- Provide a reprocessing mechanism (retranscribe) for:
  - provider incidents
  - improved models
  - bug fixes in diarization/punctuation
- Define tenant-level retention for audio and transcripts (aligned with compliance requirements)
- If using external providers, document:
  - credential rotation
  - regional/data residency constraints
  - rate limits and quotas
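Selecting which calls to reprocess is usually a filter over transcript metadata. A minimal Python sketch, assuming hypothetical `engine_version` and `audio_retained` fields; it reflects the dependency on retention: once audio has been deleted, a call can no longer be retranscribed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List, Optional


@dataclass
class TranscriptMeta:
    call_id: str
    transcribed_at: datetime
    engine_version: str   # hypothetical model/post-processing version tag
    audio_retained: bool  # False once the audio retention period has expired


def select_for_retranscription(
    transcripts: Iterable[TranscriptMeta],
    incident_start: Optional[datetime] = None,
    incident_end: Optional[datetime] = None,
    outdated_versions: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Call IDs to requeue after a provider incident or a model/post-processing upgrade."""
    selected = []
    for t in transcripts:
        if not t.audio_retained:
            continue  # audio is gone, so retranscription is impossible
        in_incident = (
            incident_start is not None
            and incident_end is not None
            and incident_start <= t.transcribed_at < incident_end
        )
        if in_incident or t.engine_version in outdated_versions:
            selected.append(t.call_id)
    return selected
```

The half-open incident window (`start <= t < end`) avoids double-selecting calls when consecutive windows are processed.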
Implementation notes
- Transcription is typically configured via Administration > Speech Analytics > Transcription
- Multiple transcription engines may be available with per-tenant or per-language selection
- Features like speaker diarization and punctuation depend on the transcription engine
- Document a baseline "supported audio quality" guideline and minimum call duration thresholds
- Implement and document a "transcription health dashboard" for operators
- Provide a "reprocess transcription" runbook and clearly describe expected impact (cost/time)
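The audio-quality guideline and the reprocessing runbook both benefit from explicit numbers. A minimal Python sketch, assuming hypothetical thresholds and a simple cost model (price per audio minute, a fixed real-time factor, and perfect parallelism up to the provider's concurrency limit); real estimates should also account for queueing and retries.

```python
from dataclasses import dataclass

# Hypothetical baseline thresholds; publish your own in the audio-quality guideline.
MIN_CALL_SECONDS = 5.0
MIN_SAMPLE_RATE_HZ = 8000  # narrowband telephony


def is_transcribable(duration_s: float, sample_rate_hz: int) -> bool:
    """Apply the minimum-duration and minimum-sample-rate thresholds."""
    return duration_s >= MIN_CALL_SECONDS and sample_rate_hz >= MIN_SAMPLE_RATE_HZ


@dataclass
class ReprocessEstimate:
    cost: float   # in the provider's billing currency
    hours: float  # wall-clock time


def estimate_reprocess(
    total_audio_minutes: float,
    price_per_minute: float,
    concurrency: int,
    realtime_factor: float,
) -> ReprocessEstimate:
    """Rough cost/time for retranscribing a backlog.

    realtime_factor: processing time / audio time for one job
    (0.5 means a job finishes in half the audio's duration).
    concurrency: jobs the provider quota allows in parallel.
    """
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    cost = total_audio_minutes * price_per_minute
    hours = total_audio_minutes * realtime_factor / concurrency / 60.0
    return ReprocessEstimate(cost=round(cost, 2), hours=round(hours, 2))
```

For example, 12,000 audio minutes at 0.01 per minute, with 20 concurrent jobs and a real-time factor of 0.5, gives a cost of 120.0 and about 5 hours.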