← Use casesUSE CASE
§ USE CASE · BI

BI agent / KPI co-pilot

A natural-language interface for operational analytics, grounded on the headless semantic layer. Operators ask questions in plain language; the agent resolves them against governed metric and dimension definitions, generates SQL, runs it on the warehouse, and returns the answer alongside the SQL, the rows it pulled, and the metric definitions it used. No ungrounded numbers, no dashboard drift between tools, no parallel definitions of revenue.

  • BI
  • Semantic layer
  • Self-service analytics
  • Governance

Executive summary

Most organizations have plenty of dashboards and not enough trustable answers. The same metric — net revenue, churn, gross margin — gets calculated three different ways depending on which tool the operator opens. Self-service BI made the surface area larger, not the answers more defendable.

The KPI co-pilot binds a conversational interface to a single governed semantic layer. When an operator asks 'what was net revenue by segment last quarter?', the agent does not invent the formula — it resolves 'net revenue', 'segment' and 'last quarter' against the canonical definitions, generates SQL, runs it on the warehouse, and returns the answer with the SQL and the metric definitions visible.

The agent never answers operational questions from model memory. Every numeric claim resolves to: the metric definition that produced it, the SQL that ran, and the source rows that came back. If the metric drifts, the answer drifts with it — by design.

What this case fixes

Operators who need a number now have three options: open a dashboard that may not have the right cut, ping the data team and wait, or paste numbers into ChatGPT and hope. The first is rigid, the second is slow, the third is unsafe — generic chatbots over data are confidently wrong because they have no notion of how the business defines its metrics.

What the operation needs is a chat interface that is *only* allowed to answer through the governed semantic layer. Metric definitions live in one place; the agent queries through them; every answer is reproducible because the SQL is shown. The dashboards a human reads and the answers an agent returns come from the same source — that is the operational definition of one source of truth.

How it runs

Three phases on the Xophia orchestration core, all executed under the asking user's identity and row-level security context. The agent only ever reads; warehouse credentials are issued per-tenant and scoped to read-only roles.

  1. 01AGENT

    Resolve intent against the semantic layer

    The agent classifies the question and resolves the named entities (metrics, dimensions, time grains, filters) against the registered semantic layer. Ambiguous resolutions ('which segment definition?') trigger a one-shot clarification rather than a guess.

  2. 02AGENT

    Generate and validate SQL

    The agent emits SQL through the semantic layer's compiler, not free-hand. The SQL is type-checked against the warehouse schema and the operator's row-level security context before any execution. Invalid plans are rejected, not patched.

  3. 03DATA

    Execute on the warehouse, return with provenance

    The query runs against the warehouse under a per-tenant read-only role. The answer ships with the SQL, the metric definitions used, the row count and the run cost. The operator can drill into source rows on any cell.

  4. 04HUMAN

    Operator inspection or escalation

    Numbers that disagree with a dashboard surface the difference: which metric definition the dashboard used vs. which one the agent used. The escalation path is to fix the metric definition once, not to patch one of the surfaces.

  5. 05AUTOMATION

    Log decision for audit and feedback

    Every question, resolution, SQL, result-set hash and operator action is logged. The aggregate signal (which metrics get asked about most, which resolutions get clarified) feeds back into semantic-layer maintenance.

Systems and connectors

SystemRole
Cube / dbt Metrics / MetricFlowHeadless semantic layer (governed metric definitions)
Snowflake · BigQuery · Databricks · PostgresWarehouse (execution surface)
Slack · Microsoft TeamsConversational entry point
Looker · Tableau · MetabaseDashboard surfaces (read for cross-check)
Xophia audit logReproducibility and governance

KPI targets

Grounded-answer rate

Baseline
Generic LLM-on-warehouse setups: ~50–70% answers backed by traceable SQL
Target
100% of answers expand into the metric definitions and SQL that produced them

Audit-log check: every numeric answer must reference at least one metric definition and one executed SQL statement; ungrounded answers are blocked by policy.

Cross-surface concordance

Baseline
Same metric reported with 5–15% spread across dashboards and exports
Target
≤ 0.5% deviation between agent answers and the canonical dashboard for the same cut

Daily diff job over a fixed set of canonical metrics: agent answer vs. dashboard query vs. semantic-layer ground truth. Deviation is flagged and reviewed.

Time-to-answer for ad-hoc operational questions

Baseline
Hours to days when routed through the data team
Target
≤ 30 seconds for in-semantic-layer questions; clarification or escalation otherwise

From question submitted in Slack/Teams to answer rendered, measured at the agent gateway.

Semantic-layer coverage

Baseline
Whatever fraction of asked questions can be answered today
Target
Increasing month-over-month, driven by escalation telemetry

Of the questions the agent receives, % that are fully answerable without escalation. Escalations are the input to the semantic-layer roadmap.

Sample execution trace

An anonymized run from a Wednesday afternoon. Every line below corresponds to a row in the audit log; nothing is editorialized.

A revenue lead in Slack asks: 'What was net revenue by customer segment last quarter, in CLP?'. The agent resolved the question against the semantic layer, ran one query and returned the answer with provenance.

ToolInputOutput
agent.classify_intentquestion='What was net revenue by customer segment last quarter, in CLP?'intent=metric_query, metrics=['net_revenue'], dimensions=['customer_segment'], grain=quarter, period='last_quarter', currency=CLP
semantic.resolvemetrics=['net_revenue'], dimensions=['customer_segment']net_revenue → defs.revenue.net_revenue@v3 (gross_revenue − refunds − credit_notes); customer_segment → defs.customer.segment_v2
semantic.compile_sqlmetric=net_revenue, dim=customer_segment, period=2026-Q1, currency=CLP, rls_user=USR-117Compiled query: SELECT customer_segment, SUM(net_revenue) FROM analytics.fct_revenue WHERE quarter='2026-Q1' GROUP BY customer_segment — type-checked, RLS-bound.
warehouse.run_queryengine=Snowflake, role=xophia_read_USR-117, query_hash=qh-7c2a4 rows returned (mid_market: CLP 1,242M; enterprise: CLP 893M; smb: CLP 318M; channel: CLP 96M). Run cost: 0.04 credits; latency: 1.8s.
console.render_answerresult=qh-7c2a, definitions=['net_revenue@v3','segment_v2']Answer rendered with table, definitions panel and 'show SQL' affordance. Drill-into-rows enabled per cell.

Outcome. Answer delivered to the operator in 4.1s end-to-end, $0.012 agent cost. Audit row Q-2026-05-06-USR-117-qh-7c2a recorded with the full chain (intent, resolutions, SQL, result hash). Cross-checked nightly against the canonical dashboard with 0% deviation.

Want to scope this for your stack?

30 minutes with engineering. We map the case to your CRM, your policies and your approval boundaries, and respond with a scoped estimate within 24 hours.

Request a working session