Monitoring and Alerting
This chapter defines what “healthy” looks like for Conversation Analytics and what operators should monitor and alert on.
Goal: detect issues before tenants notice (missing transcripts, stale dashboards, failed AI Tasks, provider outages).
Monitoring layers (recommended)
1) Ingestion health
- ingestion throughput (conversations/hour)
- ingestion error rate
- duplicate rate (if applicable)
- latency from source → platform
- per-tenant ingestion volume anomalies
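Per-tenant volume anomalies can be caught with a simple statistical check. A minimal sketch (the `ingestion_anomaly` helper and its z-score approach are illustrative, not a MiaRec API):

```python
from statistics import mean, stdev

def ingestion_anomaly(history, latest, z_threshold=3.0):
    """Flag a tenant's latest hourly conversation count if it deviates
    more than z_threshold standard deviations from recent history."""
    if len(history) < 3:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

spike = ingestion_anomaly([100, 105, 98, 102], 500)   # sudden surge
normal = ingestion_anomaly([100, 105, 98, 102], 101)  # within range
```

A z-score works for steady workloads; tenants with strong daily or weekly seasonality may need per-hour-of-day baselines instead.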
2) Transcription health (voice)
- transcript completion latency (p50/p95)
- transcript failure rate
- percent of calls with missing transcript
- language detection distribution and “unknown” rate
- provider-specific error codes/quota events
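The p50/p95 completion latencies above can be computed with a nearest-rank percentile. A minimal sketch (the helper name and sample values are illustrative):

```python
import math

def latency_percentile(samples, pct):
    """Nearest-rank percentile of transcript completion latencies (seconds)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    k = math.ceil(pct / 100 * len(ordered)) - 1
    return ordered[max(0, k)]

latencies = [12, 15, 14, 90, 13, 16, 14, 15, 13, 240]
p50 = latency_percentile(latencies, 50)
p95 = latency_percentile(latencies, 95)
```

Tracking p95 alongside p50 surfaces tail latency (stuck or retried transcriptions) that an average would hide.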
3) AI Assistant job health
- backlog/lag (time from transcript ready → tasks completed)
- execution success rate
- retry rate (high retry rates often indicate provider instability)
- dead-letter/poison count
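Backlog lag is just the elapsed time between the two pipeline events named above. A minimal sketch (function and field names are illustrative):

```python
from datetime import datetime

def job_lag_minutes(transcript_ready_at, tasks_completed_at):
    """Minutes between a transcript becoming available and the AI
    Assistant job finishing its tasks for that conversation."""
    return (tasks_completed_at - transcript_ready_at).total_seconds() / 60

lag = job_lag_minutes(datetime(2024, 1, 1, 10, 0),
                      datetime(2024, 1, 1, 10, 45))
```

Alert on the lag of the oldest unprocessed conversation, not the average, so a single stuck record is still visible.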
4) AI Task health
- task-level failure rate (per task, per tenant)
- schema validation failures (JSON invalid)
- average tokens/request (cost proxy)
- number of conversations skipped due to filters (sanity check)
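Schema validation failures can be counted by checking each task output before it is stored. A minimal sketch (the `validate_task_output` helper, its return codes, and the `sentiment` key are illustrative assumptions):

```python
import json

def validate_task_output(raw, required_keys):
    """Return None if the AI Task output parses as a JSON object with the
    required keys; otherwise return a short failure reason for metrics."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(data, dict):
        return "not_an_object"
    missing = [k for k in required_keys if k not in data]
    return f"missing:{','.join(missing)}" if missing else None

ok = validate_task_output('{"sentiment": "positive"}', ["sentiment"])
bad = validate_task_output('not json', ["sentiment"])
```

Emitting the failure reason as a metric label makes it easy to see whether a spike is malformed JSON (provider drift) or missing keys (a prompt change).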
5) AI Engine health
- API availability (errors, timeouts)
- rate limiting/quota
- latency
- cost per tenant/task
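Cost per tenant/task can be approximated by aggregating token counts from request records. A minimal sketch (record shape, tenant/task names, and pricing are illustrative assumptions):

```python
from collections import defaultdict

def cost_by_tenant_task(requests, price_per_1k_tokens):
    """Aggregate approximate LLM spend per (tenant, task) from request
    records shaped like {"tenant": ..., "task": ..., "tokens": ...}."""
    totals = defaultdict(float)
    for r in requests:
        totals[(r["tenant"], r["task"])] += r["tokens"] / 1000 * price_per_1k_tokens
    return dict(totals)

spend = cost_by_tenant_task(
    [{"tenant": "acme", "task": "summary", "tokens": 2000},
     {"tenant": "acme", "task": "summary", "tokens": 1000}],
    price_per_1k_tokens=0.5,
)
```

Real pricing usually differs for input vs output tokens and by model; split the aggregation accordingly if you need billing-grade accuracy.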
6) User-facing indicators (synthetic checks)
- can you open a transcript?
- do dashboards load?
- does search return results for known test filters?
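The synthetic checks above can be driven by a small runner that treats any exception as a failure. A minimal sketch (check names and bodies are illustrative; real checks would call the platform's UI or API):

```python
def run_synthetic_checks(checks):
    """Run named check callables (each returning True on success) and
    collect pass/fail results for alerting."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as a failure
    return results

results = run_synthetic_checks({
    "open_transcript": lambda: True,   # e.g. fetch a known test transcript
    "dashboard_loads": lambda: 1 / 0,  # simulated crash -> failure
})
```

Run these on a schedule against a dedicated test tenant so failures reflect the platform, not tenant data changes.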
Alerting recommendations
Severity levels
- SEV-1: platform-wide outage or major data loss (ingestion down, job stopped)
- SEV-2: partial outage (one provider down, one region impacted)
- SEV-3: degradation (latency high, retries spiking)
- SEV-4: tenant-specific issues
Suggested alert conditions
- ingestion backlog grows continuously for > X minutes
- transcript missing rate exceeds Y%
- AI Assistant job lag exceeds Z minutes
- AI engine error rate > threshold
- schema validation failures spike after task change
- per-tenant usage spikes unexpectedly (cost guardrail)
Operators should tune X/Y/Z based on tenant expectations and workload.
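The alert conditions above, with operator-tuned X/Y/Z thresholds, can be sketched as a single evaluation pass. Metric and threshold names here are illustrative, not MiaRec configuration keys:

```python
def evaluate_alerts(metrics, thresholds):
    """Compare current pipeline metrics against operator-tuned thresholds
    and return the names of firing alerts."""
    firing = []
    if metrics["backlog_growth_minutes"] > thresholds["max_backlog_minutes"]:
        firing.append("ingestion_backlog")
    if metrics["transcript_missing_pct"] > thresholds["max_missing_pct"]:
        firing.append("transcript_missing")
    if metrics["job_lag_minutes"] > thresholds["max_job_lag_minutes"]:
        firing.append("job_lag")
    if metrics["engine_error_rate"] > thresholds["max_engine_error_rate"]:
        firing.append("engine_errors")
    return firing

firing = evaluate_alerts(
    {"backlog_growth_minutes": 40, "transcript_missing_pct": 1.0,
     "job_lag_minutes": 5, "engine_error_rate": 0.2},
    {"max_backlog_minutes": 30, "max_missing_pct": 2.0,
     "max_job_lag_minutes": 15, "max_engine_error_rate": 0.05},
)
```

In practice these rules would live in a monitoring system (e.g. Prometheus alert rules); the point is that each suggested condition reduces to a metric, a threshold, and a duration.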
Operational dashboards (recommended)
Maintain at least:
- Pipeline health dashboard (ingestion → transcription → AI tasks)
- Top failing tasks (by failures and by tenants impacted)
- Usage dashboard (requests/tokens/cost by tenant and by task)
- Overrides dashboard (tenants with prompt/filter overrides)
Incident response workflow (recommended)
When an alert fires:
1. Identify scope: platform-wide vs tenant-specific
2. Identify stage: ingestion vs transcription vs AI job vs engine
3. Apply the runbook (see Troubleshooting → Runbooks)
4. Communicate status to impacted tenants (template recommended)
5. Post-incident:
   - root cause
   - remediation
   - prevention (guardrails, monitoring improvements)
Built-in monitoring in MiaRec
The AI Assistant job view provides built-in monitoring through several tabs:
Figure: All Runs tab showing execution history with success/failure chart over time.
Job monitoring tabs
- Latest run – Current execution status and progress
- All runs – Historical execution chart showing patterns over time
- Processing records – Individual conversation processing status and results
- Logs – Detailed execution logs for troubleshooting
Figure: Processing Records tab showing per-conversation execution status.
AI Assistant menu structure
The AI Assistant section (Administration > Speech Analytics > AI Assistant) includes:
- AI Tasks – Task configuration and tenant activation
- Global Tasks – System-wide task definitions
- Engines – LLM provider/model configuration
- Jobs – Processing job configuration and monitoring

