§ USE CASE · BI

BI agent / KPI co-pilot

A natural-language interface for operational analytics, grounded on the headless semantic layer. Operators ask questions in plain language; the agent resolves them against governed metric and dimension definitions, generates SQL, runs it on the warehouse, and returns the answer alongside the SQL, the rows it pulled, and the metric definitions it used. No ungrounded numbers, no dashboard drift between tools, no parallel definitions of revenue.

BI
Semantic layer
Self-service analytics
Governance

Executive summary

Most organizations have plenty of dashboards and not enough trustable answers. The same metric — net revenue, churn, gross margin — gets calculated three different ways depending on which tool the operator opens. Self-service BI made the surface area larger, not the answers more defendable.

The KPI co-pilot binds a conversational interface to a single governed semantic layer. When an operator asks 'what was net revenue by segment last quarter?', the agent does not invent the formula — it resolves 'net revenue', 'segment' and 'last quarter' against the canonical definitions, generates SQL, runs it on the warehouse, and returns the answer with the SQL and the metric definitions visible.

The agent never answers operational questions from model memory. Every numeric claim resolves to: the metric definition that produced it, the SQL that ran, and the source rows that came back. If the metric drifts, the answer drifts with it — by design.

What this case fixes

Operators who need a number now have three options: open a dashboard that may not have the right cut, ping the data team and wait, or paste numbers into ChatGPT and hope. The first is rigid, the second is slow, the third is unsafe — generic chatbots over data are confidently wrong because they have no notion of how the business defines its metrics.

What the operation needs is a chat interface that is *only* allowed to answer through the governed semantic layer. Metric definitions live in one place; the agent queries through them; every answer is reproducible because the SQL is shown. The dashboards a human reads and the answers an agent returns come from the same source — that is the operational definition of one source of truth.

How it runs

Three phases on the Xophia orchestration core, all executed under the asking user's identity and row-level security context. The agent only ever reads; warehouse credentials are issued per-tenant and scoped to read-only roles.

01AGENT
Resolve intent against the semantic layer
The agent classifies the question and resolves the named entities (metrics, dimensions, time grains, filters) against the registered semantic layer. Ambiguous resolutions ('which segment definition?') trigger a one-shot clarification rather than a guess.
02AGENT
Generate and validate SQL
The agent emits SQL through the semantic layer's compiler, not free-hand. The SQL is type-checked against the warehouse schema and the operator's row-level security context before any execution. Invalid plans are rejected, not patched.
03DATA
Execute on the warehouse, return with provenance
The query runs against the warehouse under a per-tenant read-only role. The answer ships with the SQL, the metric definitions used, the row count and the run cost. The operator can drill into source rows on any cell.
04HUMAN
Operator inspection or escalation
Numbers that disagree with a dashboard surface the difference: which metric definition the dashboard used vs. which one the agent used. The escalation path is to fix the metric definition once, not to patch one of the surfaces.
05AUTOMATION
Log decision for audit and feedback
Every question, resolution, SQL, result-set hash and operator action is logged. The aggregate signal (which metrics get asked about most, which resolutions get clarified) feeds back into semantic-layer maintenance.

Systems and connectors

System	Role
Cube / dbt Metrics / MetricFlow	Headless semantic layer (governed metric definitions)
Snowflake · BigQuery · Databricks · Postgres	Warehouse (execution surface)
Slack · Microsoft Teams	Conversational entry point
Looker · Tableau · Metabase	Dashboard surfaces (read for cross-check)
Xophia audit log	Reproducibility and governance

KPI targets

Grounded-answer rate

Baseline: Generic LLM-on-warehouse setups: ~50–70% answers backed by traceable SQL
Target: 100% of answers expand into the metric definitions and SQL that produced them

Audit-log check: every numeric answer must reference at least one metric definition and one executed SQL statement; ungrounded answers are blocked by policy.

Cross-surface concordance

Baseline: Same metric reported with 5–15% spread across dashboards and exports
Target: ≤ 0.5% deviation between agent answers and the canonical dashboard for the same cut

Daily diff job over a fixed set of canonical metrics: agent answer vs. dashboard query vs. semantic-layer ground truth. Deviation is flagged and reviewed.

Time-to-answer for ad-hoc operational questions

Baseline: Hours to days when routed through the data team
Target: ≤ 30 seconds for in-semantic-layer questions; clarification or escalation otherwise

From question submitted in Slack/Teams to answer rendered, measured at the agent gateway.

Semantic-layer coverage

Baseline: Whatever fraction of asked questions can be answered today
Target: Increasing month-over-month, driven by escalation telemetry

Of the questions the agent receives, % that are fully answerable without escalation. Escalations are the input to the semantic-layer roadmap.

Sample execution trace

An anonymized run from a Wednesday afternoon. Every line below corresponds to a row in the audit log; nothing is editorialized.

A revenue lead in Slack asks: 'What was net revenue by customer segment last quarter, in CLP?'. The agent resolved the question against the semantic layer, ran one query and returned the answer with provenance.

Tool	Input	Output
`agent.classify_intent`	`question='What was net revenue by customer segment last quarter, in CLP?'`	intent=metric_query, metrics=['net_revenue'], dimensions=['customer_segment'], grain=quarter, period='last_quarter', currency=CLP
`semantic.resolve`	`metrics=['net_revenue'], dimensions=['customer_segment']`	net_revenue → defs.revenue.net_revenue@v3 (gross_revenue − refunds − credit_notes); customer_segment → defs.customer.segment_v2
`semantic.compile_sql`	`metric=net_revenue, dim=customer_segment, period=2026-Q1, currency=CLP, rls_user=USR-117`	Compiled query: SELECT customer_segment, SUM(net_revenue) FROM analytics.fct_revenue WHERE quarter='2026-Q1' GROUP BY customer_segment — type-checked, RLS-bound.
`warehouse.run_query`	`engine=Snowflake, role=xophia_read_USR-117, query_hash=qh-7c2a`	4 rows returned (mid_market: CLP 1,242M; enterprise: CLP 893M; smb: CLP 318M; channel: CLP 96M). Run cost: 0.04 credits; latency: 1.8s.
`console.render_answer`	`result=qh-7c2a, definitions=['net_revenue@v3','segment_v2']`	Answer rendered with table, definitions panel and 'show SQL' affordance. Drill-into-rows enabled per cell.

Outcome. Answer delivered to the operator in 4.1s end-to-end, $0.012 agent cost. Audit row Q-2026-05-06-USR-117-qh-7c2a recorded with the full chain (intent, resolutions, SQL, result hash). Cross-checked nightly against the canonical dashboard with 0% deviation.

Want to scope this for your stack?

30 minutes with engineering. We map the case to your CRM, your policies and your approval boundaries, and respond with a scoped estimate within 24 hours.

Request a working session

Executive summary

What this case fixes

How it runs

Resolve intent against the semantic layer

Generate and validate SQL

Execute on the warehouse, return with provenance

Operator inspection or escalation

Log decision for audit and feedback