Usage, Limits, and Cost Controls
LLM-powered analytics introduces variable cost and variable performance. This chapter describes the controls operators should implement and document to keep the platform predictable.
Cost drivers (what matters most)
- Number of executions: conversations processed × enabled AI Tasks per tenant
- Prompt + transcript size: long transcripts increase token counts dramatically
- Retries: failures that trigger retries can multiply cost
- Model choice: higher-end models cost more per token and may have higher latency
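These drivers multiply, so a back-of-envelope model is useful when sizing a tenant. The sketch below is purely illustrative: the function name, token counts, per-token prices, and retry rate are placeholder assumptions, not MiaRec or provider figures.

```python
# Rough per-tenant monthly cost estimate combining the drivers above.
# All prices and rates are placeholder assumptions, not actual
# provider or MiaRec figures.

def estimate_monthly_cost(
    conversations: int,        # conversations processed per month
    tasks_enabled: int,        # enabled AI Tasks for this tenant
    avg_input_tokens: int,     # prompt + transcript tokens per execution
    avg_output_tokens: int,    # model output tokens per execution
    price_in_per_1k: float,    # $ per 1K input tokens (assumed)
    price_out_per_1k: float,   # $ per 1K output tokens (assumed)
    retry_rate: float = 0.05,  # fraction of executions retried (assumed)
) -> float:
    executions = conversations * tasks_enabled * (1 + retry_rate)
    cost_per_execution = (
        avg_input_tokens / 1000 * price_in_per_1k
        + avg_output_tokens / 1000 * price_out_per_1k
    )
    return executions * cost_per_execution

# Example: 10,000 calls/month, 3 enabled tasks, long transcripts.
print(f"${estimate_monthly_cost(10_000, 3, 6_000, 400, 0.005, 0.015):,.2f}")
```

Even with rough inputs, the model makes the leverage points obvious: halving enabled tasks or transcript tokens halves cost, while the retry rate is a silent multiplier.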
Primary cost controls (recommended)
1) Default tasks to Disabled
Publish global tasks but keep them disabled for new tenants until explicitly enabled.
2) Use task filters
Examples:
- duration > 15 seconds for voice calls
- minimum text length for chats/emails/tickets
- exclude internal/test queues
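A filter like this typically reduces to a simple eligibility predicate per conversation. As a rough illustration (the Conversation fields and thresholds here are hypothetical, not a MiaRec schema):

```python
# Illustrative eligibility filter mirroring the examples above.
# The Conversation fields are hypothetical, not a MiaRec schema.
from dataclasses import dataclass

@dataclass
class Conversation:
    channel: str           # "voice", "chat", "email", "ticket"
    duration_seconds: int  # voice call length
    text: str              # transcript or message body
    queue: str             # routing queue name

EXCLUDED_QUEUES = {"internal", "test"}
MIN_TEXT_LENGTH = 200  # characters; tune per channel

def is_eligible(conv: Conversation) -> bool:
    if conv.queue in EXCLUDED_QUEUES:
        return False
    if conv.channel == "voice":
        return conv.duration_seconds > 15
    return len(conv.text) >= MIN_TEXT_LENGTH
```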
3) Cap transcript/thread size (if supported)
Options:
- truncate to last N minutes
- summarize in stages (two-pass approach)
- drop low-value content (hold music segments)
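If the platform exposes transcripts as timed segments, the first and third options combine into a single pass. A minimal sketch, assuming a hypothetical segment shape rather than a MiaRec format:

```python
# Illustrative transcript trimming: keep the last N minutes and drop
# low-value segments. The Segment shape is assumed, not a MiaRec format.
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float  # offset from call start, in seconds
    end_s: float
    speaker: str    # "agent", "customer", "system"
    text: str

def trim_transcript(segments: list[Segment],
                    last_minutes: float = 10.0) -> list[Segment]:
    if not segments:
        return []
    cutoff = segments[-1].end_s - last_minutes * 60
    return [
        s for s in segments
        if s.end_s >= cutoff and s.speaker != "system"  # drop hold/IVR audio
    ]
```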
4) Standardize prompt templates
Avoid per-tenant prompt sprawl:
- provide recommended prompts
- allow overrides but track them (audit/monitor)
- consider approval workflows for high-cost tasks (if practical)
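One way to keep overrides visible is a registry that serves the recommended prompt by default and logs every tenant-level override. A minimal sketch (the storage, names, and fields are assumptions):

```python
# Minimal sketch of a prompt registry with audited per-tenant overrides.
# In-memory storage here; a real deployment would persist both maps.
import datetime

RECOMMENDED_PROMPTS = {
    "call_summary": "Summarize the call in 3 bullet points...",
    "sentiment": "Classify the customer's sentiment as ...",
}

overrides: dict[tuple[str, str], str] = {}  # (tenant, task) -> prompt
audit_log: list[dict] = []

def set_override(tenant: str, task: str, prompt: str, author: str) -> None:
    overrides[(tenant, task)] = prompt
    audit_log.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "tenant": tenant, "task": task, "author": author,
        "prompt_chars": len(prompt),  # quick proxy for prompt growth
    })

def get_prompt(tenant: str, task: str) -> str:
    return overrides.get((tenant, task), RECOMMENDED_PROMPTS[task])
```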
5) Rate limiting / quotas (if supported)
- per-tenant execution limits per hour/day
- per-tenant token budget
- per-engine concurrency limits
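If the platform itself offers no quotas, an application-side gate in front of task submission can approximate them. The sketch below uses in-memory daily counters with arbitrary example limits; a real deployment would need a shared store (e.g., Redis) and the platform's actual submission hook:

```python
# Sketch of application-side daily quotas per tenant. Limits are
# arbitrary examples; counters are in-memory and reset by date key.
import datetime
from collections import defaultdict

DAILY_EXECUTION_LIMIT = 5_000
DAILY_TOKEN_BUDGET = 2_000_000

_execs: dict[tuple[str, str], int] = defaultdict(int)   # (tenant, day)
_tokens: dict[tuple[str, str], int] = defaultdict(int)

def allow_execution(tenant: str, est_tokens: int) -> bool:
    key = (tenant, datetime.date.today().isoformat())
    if _execs[key] >= DAILY_EXECUTION_LIMIT:
        return False
    if _tokens[key] + est_tokens > DAILY_TOKEN_BUDGET:
        return False
    _execs[key] += 1
    _tokens[key] += est_tokens
    return True
```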
Usage reporting (what to expose)
At minimum, track:
- executions per tenant per task
- tokens per tenant per task (input/output)
- average tokens/execution
- estimated cost (if provider pricing known)
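All of these fall out of a single aggregation over raw execution records. A sketch, assuming hypothetical record fields (tenant, task, input/output token counts):

```python
# Illustrative aggregation of the minimum metrics above from raw
# execution records. Record fields are assumed, not a MiaRec API.
from collections import defaultdict

def summarize(records: list[dict]) -> dict:
    agg = defaultdict(lambda: {"execs": 0, "in": 0, "out": 0})
    for r in records:  # r: {"tenant", "task", "input_tokens", "output_tokens"}
        k = (r["tenant"], r["task"])
        agg[k]["execs"] += 1
        agg[k]["in"] += r["input_tokens"]
        agg[k]["out"] += r["output_tokens"]
    for v in agg.values():
        v["avg_tokens_per_exec"] = (v["in"] + v["out"]) / v["execs"]
    return dict(agg)
```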
Recommended UI/report:
- top 10 tenants by cost
- top 10 tasks by cost
- anomaly detection (spikes after a prompt change)
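Spike detection does not need to be sophisticated to be useful: comparing today's token total against a recent baseline catches most prompt-change regressions. An illustrative check, with arbitrary thresholds:

```python
# Simple spike check for the anomaly report: flag a tenant/task whose
# daily token total jumps well above its recent average. The ratio and
# history window are illustrative defaults.
from statistics import mean

def spiked(daily_tokens: list[int], ratio: float = 2.0,
           min_history: int = 7) -> bool:
    """daily_tokens: oldest-first daily totals, with today last."""
    if len(daily_tokens) <= min_history:
        return False  # not enough history to judge
    baseline = mean(daily_tokens[:-1])
    return baseline > 0 and daily_tokens[-1] > ratio * baseline
```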
Performance and latency considerations
- Long transcripts increase latency.
- JSON schema validation failures may increase retries (if the platform retries on invalid JSON); see the sketch after this list.
- If near-real-time dashboards are needed, avoid expensive multi-output tasks in the hot path.
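For the invalid-JSON case, the key cost control is a hard cap on attempts, since every retry re-spends the full input tokens. A minimal sketch; call_model here is a stand-in for whatever LLM call your deployment actually makes:

```python
# Bounded retries on invalid JSON output. call_model is a stand-in
# for the deployment's actual LLM call, not a real MiaRec API.
import json

MAX_ATTEMPTS = 3

def run_task(call_model, prompt: str) -> dict | None:
    for _ in range(MAX_ATTEMPTS):
        raw = call_model(prompt)
        try:
            return json.loads(raw)  # schema validation could follow here
        except json.JSONDecodeError:
            continue                # each retry re-spends all input tokens
    return None  # surface as a failed execution rather than retrying forever
```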
Guardrails for prompt changes (recommended)
Because prompt changes can affect both cost and quality:
- run prompt changes through a test suite of representative transcripts
- measure token usage before/after
- stage rollouts (pilot tenants first)
- document rollback steps
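The before/after token measurement can be a small harness over the test suite. In the sketch below, call_model and count_tokens are stand-ins (use your provider's tokenizer in practice):

```python
# Before/after token comparison over a fixed suite of representative
# transcripts. call_model and count_tokens are stand-ins; use your
# provider's tokenizer and API in practice.
def compare_prompts(call_model, count_tokens, transcripts: list[str],
                    old_prompt: str, new_prompt: str) -> dict:
    def total(prompt: str) -> int:
        tokens = 0
        for t in transcripts:
            full = prompt + "\n" + t
            tokens += count_tokens(full) + count_tokens(call_model(full))
        return tokens
    before, after = total(old_prompt), total(new_prompt)
    # assumes before > 0, i.e. a non-empty test suite
    return {"before": before, "after": after,
            "change_pct": 100 * (after - before) / before}
```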
Implementation notes
- Track "tokens per execution" as the most actionable cost KPI
- Use task filters aggressively to control eligible conversations
- Monitor job processing records and logs for error patterns (a minimal log-scan sketch follows this list)
- If no built-in quotas exist, implement operational procedures: manual throttling and staged rollouts
- Contact MiaRec for specific rate limiting and usage reporting capabilities in your deployment
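For the log-monitoring note above, even a crude pattern count surfaces the dominant failure modes. The log line format in this sketch is an assumption, not a MiaRec format; adapt the parsing to your deployment's actual job records:

```python
# Count error patterns in job processing logs. The "ERROR task=... 
# reason=..." line format is an assumption; adapt to your real logs.
import re
from collections import Counter

ERROR_RE = re.compile(r"ERROR\s+task=(\S+)\s+reason=(\S+)")

def top_errors(log_lines: list[str], n: int = 10):
    counts = Counter()
    for line in log_lines:
        m = ERROR_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1  # (task, reason)
    return counts.most_common(n)
```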