
Usage, Limits, and Cost Controls

LLM-powered analytics introduces variable cost and latency. This chapter describes the controls operators should implement and document to keep the platform predictable.


Cost drivers (what matters most)

  1. Number of executions: conversations processed × enabled AI Tasks per tenant
  2. Prompt + transcript size: long transcripts increase token counts dramatically
  3. Retries: failures with retries can multiply cost
  4. Model choice: higher-end models cost more per token and may have higher latency
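
These drivers multiply together, so a back-of-the-envelope estimate is straightforward. A minimal sketch, assuming hypothetical per-token prices and averages (all names and figures below are illustrative, not MiaRec APIs or real pricing):

```python
# Back-of-the-envelope monthly cost estimate. All prices and rates are
# illustrative assumptions, not values from any specific deployment.

PRICE_PER_1K_INPUT = 0.0005   # assumed provider price, USD per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.0015  # assumed provider price, USD per 1K output tokens

def estimate_monthly_cost(
    conversations_per_month: int,
    enabled_tasks: int,
    avg_input_tokens: int,     # prompt + transcript
    avg_output_tokens: int,
    retry_rate: float = 0.05,  # assumed fraction of executions retried once
) -> float:
    executions = conversations_per_month * enabled_tasks * (1 + retry_rate)
    cost_per_execution = (
        avg_input_tokens / 1000 * PRICE_PER_1K_INPUT
        + avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT
    )
    return executions * cost_per_execution

# Example: 50,000 conversations, 3 enabled tasks, ~4K-token transcripts.
print(f"${estimate_monthly_cost(50_000, 3, 4_000, 300):,.2f}")
```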

1) Default tasks to Disabled

Publish global tasks but keep them disabled for new tenants until explicitly enabled.

2) Use task filters

Examples:

  • duration > 15 seconds for voice calls
  • minimum text length for chats/emails/tickets
  • exclude internal/test queues
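
If your deployment lacks built-in filters, the same checks can run as a pre-flight step in an integration layer. A minimal sketch; the field names and thresholds are assumptions, not MiaRec configuration keys:

```python
# Illustrative eligibility check applied before submitting a conversation
# to an AI Task. Field names and thresholds are assumptions for this sketch.

EXCLUDED_QUEUES = {"internal", "qa-test"}

def is_eligible(conversation: dict) -> bool:
    if conversation.get("queue") in EXCLUDED_QUEUES:
        return False
    if conversation.get("type") == "voice":
        return conversation.get("duration_seconds", 0) > 15
    # chats/emails/tickets: require a minimum amount of text
    return len(conversation.get("text", "")) >= 200

print(is_eligible({"type": "voice", "queue": "sales", "duration_seconds": 42}))  # True
```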

3) Cap transcript/thread size (if supported)

Options:

  • truncate to the last N minutes
  • summarize in stages (two-pass approach)
  • drop low-value content (hold music segments)
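
A sketch of the simplest option, truncating a timestamped transcript to the last N minutes. The segment structure is an assumption for illustration:

```python
# Keep only transcript segments from the last N minutes of a call.
# The segment shape (offset_seconds, text) is assumed for this sketch.

def truncate_to_last_minutes(segments: list[dict], minutes: int) -> list[dict]:
    if not segments:
        return []
    call_end = max(s["offset_seconds"] for s in segments)
    cutoff = call_end - minutes * 60
    return [s for s in segments if s["offset_seconds"] >= cutoff]

segments = [
    {"offset_seconds": 10, "text": "Thanks for calling..."},
    {"offset_seconds": 1500, "text": "Let me check that order."},
    {"offset_seconds": 1560, "text": "Is there anything else?"},
]
print(truncate_to_last_minutes(segments, minutes=5))  # keeps the last two segments
```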

4) Standardize prompt templates

Avoid per-tenant prompt sprawl:

  • provide recommended prompts
  • allow overrides but track them (audit/monitor)
  • consider approval workflows for high-cost tasks (if practical)
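
One way to keep overrides trackable is to log every deviation from the recommended template. A minimal sketch, assuming a hypothetical template registry and in-memory audit log (a real deployment would persist this):

```python
# Minimal audit trail for prompt overrides. The template registry and
# logging destination are assumptions for this sketch.
import hashlib
import json
import time

RECOMMENDED = {"call_summary": "Summarize the call in 3 bullet points."}

def resolve_prompt(tenant: str, task: str, override: str | None, audit_log: list) -> str:
    prompt = override or RECOMMENDED[task]
    if override and override != RECOMMENDED[task]:
        audit_log.append({
            "ts": time.time(),
            "tenant": tenant,
            "task": task,
            # hash rather than store the full prompt text
            "prompt_sha256": hashlib.sha256(override.encode()).hexdigest(),
        })
    return prompt

log: list = []
resolve_prompt("acme", "call_summary", "Summarize the call in French.", log)
print(json.dumps(log, indent=2))
```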

5) Rate limiting / quotas (if supported)

  • per-tenant execution limits per hour/day
  • per-tenant token budget
  • per-engine concurrency limits
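
If no built-in quotas exist, a per-tenant daily token budget can be enforced in the integration layer before each execution. A minimal sketch; the budget value and in-memory counters are assumptions (a real deployment would use a shared datastore):

```python
# Per-tenant daily token budget, checked before each execution.
# Budget size and storage are assumptions for this sketch.
from collections import defaultdict
from datetime import date

DAILY_TOKEN_BUDGET = 2_000_000  # assumed per-tenant budget

_usage: dict[tuple[str, date], int] = defaultdict(int)

def try_consume(tenant: str, estimated_tokens: int) -> bool:
    key = (tenant, date.today())
    if _usage[key] + estimated_tokens > DAILY_TOKEN_BUDGET:
        return False  # over budget: defer or skip this execution
    _usage[key] += estimated_tokens
    return True

print(try_consume("acme", 5_000))  # True until the budget is exhausted
```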

Usage reporting (what to expose)

At minimum, track:

  • executions per tenant per task
  • tokens per tenant per task (input/output)
  • average tokens per execution
  • estimated cost (if provider pricing is known)

Recommended UI/report:

  • top 10 tenants by cost
  • top 10 tasks by cost
  • anomaly detection (spikes after a prompt change)
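
For the anomaly check, a simple comparison against a trailing average is often enough to catch a cost regression after a prompt change. A sketch; the 7-day window and 2x threshold are assumptions to tune per deployment:

```python
# Flag a tenant/task whose daily token usage jumps well above its
# trailing average. Window and threshold are assumed starting points.
from statistics import mean

def is_spike(daily_tokens: list[int], threshold: float = 2.0, window: int = 7) -> bool:
    if len(daily_tokens) <= window:
        return False  # not enough history yet
    baseline = mean(daily_tokens[-window - 1:-1])
    return daily_tokens[-1] > threshold * baseline

history = [100_000] * 7 + [260_000]  # usage jumps after a prompt change
print(is_spike(history))  # True
```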


Performance and latency considerations

  • Long transcripts increase latency.
  • JSON schema validation failures may increase retries (if the platform retries on invalid JSON).
  • If near-real-time dashboards are needed, avoid expensive multi-output tasks in the hot path.

Prompt change management

Because prompt changes can affect both cost and quality:

  • run prompt changes through a test suite (representative transcripts)
  • measure token usage before and after
  • stage rollouts (pilot tenants first)
  • document rollback steps
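
A sketch of the before/after measurement step over a fixed set of representative transcripts, using the tiktoken tokenizer as an assumed stand-in (swap in your provider's tokenizer if it differs; the templates and transcripts are illustrative):

```python
# Compare average prompt token counts before and after a prompt change,
# over the same set of representative transcripts.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer choice

def avg_prompt_tokens(template: str, transcripts: list[str]) -> float:
    totals = [len(enc.encode(template.format(transcript=t))) for t in transcripts]
    return sum(totals) / len(totals)

transcripts = ["Caller asked about a refund...", "Agent reset the password..."]
old = "Summarize the call:\n{transcript}"
new = "Summarize the call in 3 bullets, include sentiment:\n{transcript}"
print(f"before: {avg_prompt_tokens(old, transcripts):.0f} tokens/exec")
print(f"after:  {avg_prompt_tokens(new, transcripts):.0f} tokens/exec")
```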


Implementation notes

  • Track "tokens per execution" as the most actionable cost KPI
  • Use task filters aggressively to control eligible conversations
  • Monitor job processing records and logs for error patterns
  • If no built-in quotas exist, implement operational procedures: manual throttling and staged rollouts
  • Contact MiaRec for specific rate limiting and usage reporting capabilities in your deployment