
Monitoring and Alerting

This chapter defines what “healthy” looks like for Conversation Analytics and what operators should monitor and alert on.

Goal: detect issues before tenants notice (missing transcripts, stale dashboards, failed AI Tasks, provider outages).


1) Ingestion health

  • ingestion throughput (conversations/hour)
  • ingestion error rate
  • duplicate rate (if applicable)
  • latency from source → platform
  • per-tenant ingestion volume anomalies
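
A minimal sketch of how these signals could be exposed for scraping, assuming a Prometheus-based metrics stack; the metric names and the record_conversation / record_ingestion_error hooks are illustrative, not part of MiaRec:

```python
# Illustrative only: assumes a Prometheus metrics stack; metric and label names are made up.
import time

from prometheus_client import Counter, Histogram, start_http_server

INGESTED = Counter(
    "ca_conversations_ingested_total", "Conversations ingested", ["tenant", "source"]
)
INGEST_ERRORS = Counter(
    "ca_ingestion_errors_total", "Ingestion failures", ["tenant", "source", "reason"]
)
SOURCE_LATENCY = Histogram(
    "ca_source_to_platform_seconds",
    "Delay between call end at the source and ingestion into the platform",
    ["tenant"],
    buckets=(30, 60, 300, 900, 3600),
)


def record_conversation(tenant: str, source: str, call_ended_at: float) -> None:
    """Called by the ingestion pipeline after a conversation is stored."""
    INGESTED.labels(tenant, source).inc()
    SOURCE_LATENCY.labels(tenant).observe(time.time() - call_ended_at)


def record_ingestion_error(tenant: str, source: str, reason: str) -> None:
    INGEST_ERRORS.labels(tenant, source, reason).inc()


if __name__ == "__main__":
    start_http_server(9108)  # expose /metrics for the monitoring system to scrape
    # ... ingestion loop calls record_conversation / record_ingestion_error
```

Error rate, duplicate rate, and per-tenant volume anomalies can then be computed in the monitoring system from these counters over a time window.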

2) Transcription health (voice)

  • transcript completion latency (p50/p95)
  • transcript failure rate
  • percent of calls with missing transcript
  • language detection distribution and “unknown” rate
  • provider-specific error codes/quota events
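
As a sketch of how the completion-latency percentiles and the missing-transcript rate might be derived, assuming per-call records that carry a call end time and a transcript-ready time (the field names below are hypothetical, not a MiaRec schema):

```python
# Illustrative sketch: derive transcription health figures from per-call records.
# The fields call_ended_at / transcript_ready_at are assumptions, not a MiaRec schema.
from statistics import quantiles
from typing import Optional, Sequence


def transcription_health(records: Sequence[dict]) -> dict:
    latencies = [
        r["transcript_ready_at"] - r["call_ended_at"]
        for r in records
        if r.get("transcript_ready_at") is not None
    ]
    missing = sum(1 for r in records if r.get("transcript_ready_at") is None)

    p50: Optional[float] = None
    p95: Optional[float] = None
    if len(latencies) >= 2:
        cuts = quantiles(latencies, n=100)  # 99 cut points
        p50, p95 = cuts[49], cuts[94]

    return {
        "completion_latency_p50_s": p50,
        "completion_latency_p95_s": p95,
        "missing_transcript_pct": 100.0 * missing / max(len(records), 1),
    }
```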

3) AI Assistant job health

  • backlog/lag (time from transcript ready → tasks completed)
  • execution success rate
  • retry rate (high retries often indicate provider instability)
  • dead-letter/poison count
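
Backlog/lag can be reduced to a single number, sketched below under the assumption that the monitoring system can list conversations whose transcript is ready but whose AI tasks have not yet completed (this is not a MiaRec API):

```python
# Illustrative sketch: job lag = age of the oldest conversation still waiting
# for AI tasks; queue access and thresholds here are assumptions.
import time


def job_lag_seconds(pending_transcript_ready_times: list[float]) -> float:
    """pending = transcript ready, AI tasks not yet completed (Unix timestamps)."""
    if not pending_transcript_ready_times:
        return 0.0
    return time.time() - min(pending_transcript_ready_times)


def job_health_alerts(pending_times: list[float], retries_per_run: list[int],
                      dead_letter_count: int, lag_limit_s: int = 1800) -> list[str]:
    alerts = []
    if job_lag_seconds(pending_times) > lag_limit_s:
        alerts.append(f"AI Assistant job lag exceeds {lag_limit_s}s")
    if retries_per_run and sum(retries_per_run) / len(retries_per_run) > 1.0:
        alerts.append("average retries per run above 1 (possible provider instability)")
    if dead_letter_count > 0:
        alerts.append(f"{dead_letter_count} conversations in dead-letter state")
    return alerts
```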

4) AI Task health

  • task-level failure rate (per task, per tenant)
  • schema validation failures (invalid JSON or output that violates the task's schema; see the sketch after this list)
  • average tokens/request (cost proxy)
  • number of conversations skipped due to filters (sanity check)
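
One way to detect schema validation failures, sketched with the jsonschema library against a hypothetical sentiment-task schema (the schema and field names are illustrative, not MiaRec task definitions):

```python
# Illustrative sketch: flag AI Task outputs that are not valid JSON or that
# violate the expected schema. The schema is hypothetical.
import json

from jsonschema import Draft7Validator

SENTIMENT_SCHEMA = {
    "type": "object",
    "required": ["sentiment", "confidence"],
    "properties": {
        "sentiment": {"enum": ["positive", "neutral", "negative"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}
_validator = Draft7Validator(SENTIMENT_SCHEMA)


def is_valid_task_output(raw: str) -> bool:
    """Return False for non-JSON or schema-violating task results."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return not list(_validator.iter_errors(payload))
```

The per-tenant failure ratio produced by a check like this feeds the "schema validation failures spike after task change" alert condition below.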

5) AI Engine health

  • API availability (errors, timeouts)
  • rate limiting/quota
  • latency
  • cost per tenant/task
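
Cost per tenant/task is usually derived from token usage; the sketch below assumes per-request usage events and example prices, both of which must be replaced with the provider's actual figures:

```python
# Illustrative sketch: aggregate LLM spend per (tenant, task) from usage events.
# Prices and event fields are assumptions, not provider or MiaRec values.
from collections import defaultdict
from typing import Iterable

PRICE_PER_1K_TOKENS = {"prompt": 0.0005, "completion": 0.0015}  # example rates only


def cost_by_tenant_and_task(usage_events: Iterable[dict]) -> dict:
    """usage_events: dicts with tenant, task, prompt_tokens, completion_tokens."""
    totals: dict = defaultdict(float)
    for e in usage_events:
        cost = (e["prompt_tokens"] / 1000) * PRICE_PER_1K_TOKENS["prompt"]
        cost += (e["completion_tokens"] / 1000) * PRICE_PER_1K_TOKENS["completion"]
        totals[(e["tenant"], e["task"])] += cost
    return dict(totals)
```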

6) User-facing indicators (synthetic checks)

  • can you open a transcript?
  • do dashboards load?
  • does search return results for known test filters?
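
These checks are easy to script; the sketch below uses placeholder URLs, a test conversation ID, and a service-account token, none of which are real MiaRec endpoints:

```python
# Illustrative synthetic check: exercise the user-facing paths listed above.
# Base URL, paths, IDs, and the token are placeholders to be replaced per deployment.
import requests

BASE = "https://analytics.example.com"
HEADERS = {"Authorization": "Bearer <service-account-token>"}

CHECKS = {
    "transcript_opens": f"{BASE}/transcripts/<known-test-call-id>",
    "dashboard_loads": f"{BASE}/dashboards/overview",
    "search_returns_results": f"{BASE}/search?q=<known-test-filter>",
}


def run_synthetic_checks() -> dict:
    results = {}
    for name, url in CHECKS.items():
        try:
            resp = requests.get(url, headers=HEADERS, timeout=10)
            results[name] = resp.ok
        except requests.RequestException:
            results[name] = False
    return results


if __name__ == "__main__":
    failed = [name for name, ok in run_synthetic_checks().items() if not ok]
    if failed:
        print("SYNTHETIC CHECK FAILED:", ", ".join(failed))
```

Run the script on a short schedule and alert on consecutive failures rather than a single failure to avoid noise.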

Alerting recommendations

Severity levels

  • SEV-1: platform-wide outage or major data loss (ingestion down, job stopped)
  • SEV-2: partial outage (one provider down, one region impacted)
  • SEV-3: degradation (latency high, retries spiking)
  • SEV-4: tenant-specific issues

Suggested alert conditions

  • ingestion backlog grows continuously for > X minutes
  • transcript missing rate exceeds Y%
  • AI Assistant job lag exceeds Z minutes
  • AI engine error rate > threshold
  • schema validation failures spike after task change
  • per-tenant usage spikes unexpectedly (cost guardrail)

Operators should tune X/Y/Z based on tenant expectations and workload.
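
As a sketch, the conditions above can be evaluated against tunable thresholds; the default X/Y/Z values below are placeholders, not recommendations:

```python
# Illustrative sketch: suggested alert conditions with operator-tunable thresholds.
# Defaults are placeholders standing in for X/Y/Z; tune per tenant expectations.
from dataclasses import dataclass


@dataclass
class Thresholds:
    backlog_growth_minutes: int = 15      # X: continuous backlog growth
    missing_transcript_pct: float = 2.0   # Y: missing transcript rate
    job_lag_minutes: int = 30             # Z: AI Assistant job lag
    engine_error_rate_pct: float = 5.0


def evaluate(metrics: dict, t: Thresholds) -> list[str]:
    """metrics keys mirror the threshold names; values come from the monitoring stack."""
    alerts = []
    if metrics.get("backlog_growth_minutes", 0) > t.backlog_growth_minutes:
        alerts.append("ingestion backlog growing continuously")
    if metrics.get("missing_transcript_pct", 0) > t.missing_transcript_pct:
        alerts.append("transcript missing rate above threshold")
    if metrics.get("job_lag_minutes", 0) > t.job_lag_minutes:
        alerts.append("AI Assistant job lag above threshold")
    if metrics.get("engine_error_rate_pct", 0) > t.engine_error_rate_pct:
        alerts.append("AI engine error rate above threshold")
    return alerts
```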


Dashboards

Maintain at least:

  • Pipeline health dashboard (ingestion → transcription → AI tasks)
  • Top failing tasks (by failures and by tenants impacted)
  • Usage dashboard (requests/tokens/cost by tenant and by task)
  • Overrides dashboard (tenants with prompt/filter overrides)


Incident response

When an alert fires:

  1. Identify scope: platform-wide vs tenant-specific
  2. Identify stage: ingestion vs transcription vs AI job vs engine
  3. Apply the runbook (see Troubleshooting → Runbooks)
  4. Communicate status to impacted tenants (template recommended)
  5. Post-incident:
     • root cause
     • remediation
     • prevention (guardrails, monitoring improvements)


Built-in monitoring in MiaRec

The AI Assistant job view provides built-in monitoring through several tabs:

Figure: AI Assistant Job - All Runs tab, showing execution history with a success/failure chart over time.

Job monitoring tabs

  • Latest run – Current execution status and progress
  • All runs – Historical execution chart showing patterns over time
  • Processing records – Individual conversation processing status and results
  • Logs – Detailed execution logs for troubleshooting

Figure: AI Assistant Job - Processing Records tab, showing per-conversation execution status.

AI Assistant menu structure

The AI Assistant section (Administration > Speech Analytics > AI Assistant) includes:

  • AI Tasks – Task configuration and tenant activation
  • Global Tasks – System-wide task definitions
  • Engines – LLM provider/model configuration
  • Jobs – Processing job configuration and monitoring