Planera

Inspectable Analytics Copilot

I built Planera as an analytics workspace where planning, SQL generation, validation, and execution stay visible enough for a human to trust the answer.

NL-to-SQL, Validation Pipelines, LLM Orchestration, Execution Tracing

Trust Surface: Traceable query path
Risk Control: Validation before answer

Section 01

The Problem Worth Solving

Analytics teams live between two bad options: either someone writes every query by hand and the workflow stays slow, or an AI tool hides the work behind a polished answer and forces users to trust a black box.

What I wanted to solve was not just query generation. I wanted a system that could shorten time-to-insight without removing the operator's ability to inspect what happened, understand where the answer came from, and step in when the model was wrong.

Section 02

Why a Simpler Version Breaks

A basic NL-to-SQL demo looks convincing right up until the schema gets messy, the business logic has exceptions, or the user asks a question that requires changing grain, joining the right tables, and preserving the correct filters. At that point, plausible SQL is not the same thing as correct SQL.

In production, users need more than an answer box. They need validation signals, recovery paths, and enough visibility to tell whether a failure came from the prompt, the schema, or the model's assumptions. Without that, a single hallucination is enough to make the whole surface feel unsafe.

Section 03

How I Framed the System

I did not frame Planera as a chatbot. I framed it as an inspectable analytics workspace where every query is a controlled system interaction with visible state transitions.

That changed the design philosophy. Instead of trying to expose private model reasoning, I focused on exposing the operational trail: what the system believed the question was, how it planned the work, what SQL it produced, what validations passed or failed, and how the final answer was synthesized.

Section 04

Core Architecture

Planner

Translates the user question into a structured intent, identifies the tables and metrics involved, and turns an open-ended request into a constrained execution plan.
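
As a rough illustration of what a "constrained execution plan" could look like, here is a minimal sketch. The `QueryPlan` fields and the `check_plan_against_schema` helper are hypothetical names chosen for this example, not Planera's actual data model; the schema is assumed to be a simple mapping from table name to a set of column names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical structured intent produced by a planner step."""
    question: str
    tables: Tuple[str, ...]
    metrics: Tuple[str, ...]
    filters: Tuple[str, ...] = ()
    grain: Optional[str] = None  # column the result is grouped by, if any


def check_plan_against_schema(plan, schema):
    """Return a list of problems; an empty list means the plan is grounded."""
    problems = [f"unknown table: {t}" for t in plan.tables if t not in schema]
    # Collect the columns of every planned table that actually exists.
    known_columns = set()
    for table in plan.tables:
        known_columns |= schema.get(table, set())
    if plan.grain is not None and plan.grain not in known_columns:
        problems.append(f"unknown grain column: {plan.grain}")
    return problems
```

Because the plan is plain data, it can be shown to the user and stored alongside the query before any SQL is generated.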

SQL Generation Engine

Produces candidate SQL from the planner output, keeping the generation step tied to explicit schema context instead of free-form guessing.
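
One way to keep generation tied to explicit schema context is to read the schema from the database and pass only the planned tables to the model. The sketch below assumes a SQLite backend purely for illustration (the source does not name the database), and `describe_schema` and `build_generation_prompt` are hypothetical helpers; the model call itself is left out.

```python
import sqlite3


def describe_schema(conn):
    """Map each user table to its {column_name: declared_type}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        # Table names come from sqlite_master here, not from user input.
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = {row[1]: row[2] for row in info}
    return schema


def build_generation_prompt(question, schema, tables):
    """Render only the planned tables so the model cannot lean on others."""
    lines = ["Use only the tables and columns listed below."]
    for table in tables:
        cols = ", ".join(f"{name} {ctype}" for name, ctype in schema[table].items())
        lines.append(f"TABLE {table} ({cols})")
    lines.append(f"Question: {question}")
    lines.append("Return a single SELECT statement.")
    return "\n".join(lines)
```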

Validation Layer

Checks syntax, schema compatibility, and result-shape expectations before the answer is allowed to inherit model confidence it has not earned.
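
A minimal sketch of the syntax and schema checks, again assuming SQLite for illustration. Prefixing a statement with `EXPLAIN` makes SQLite compile it without running the query, which surfaces syntax errors and unknown tables or columns. `ValidationResult` and `validate_sql` are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ValidationResult:
    ok: bool
    stage: str
    detail: str = ""


def validate_sql(conn, sql):
    """Check a candidate query before it is allowed to execute."""
    stripped = sql.strip().rstrip(";")
    # Coarse read-only guard. It is not airtight (a WITH clause can precede
    # a write), so a read-only connection is still advisable.
    if not stripped.lower().startswith(("select", "with")):
        return ValidationResult(False, "read_only", "only SELECT statements are allowed")
    try:
        # EXPLAIN compiles the statement but does not run the query itself.
        conn.execute(f"EXPLAIN {stripped}")
    except sqlite3.Error as exc:
        return ValidationResult(False, "compile", str(exc))
    return ValidationResult(True, "compile")
```

Returning the failing stage and the database's own error message gives the user a concrete reason rather than a generic failure.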

Execution Engine

Runs validated queries, captures failures with enough context to debug them, and keeps query execution separate from narrative synthesis.
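
The separation between execution and narrative can be sketched as a function that returns a plain record and never raises into the synthesis step. This assumes SQLite and a hypothetical `ExecutionRecord` shape; the row cap is an illustrative default.

```python
import sqlite3
import time
from dataclasses import dataclass, field


@dataclass
class ExecutionRecord:
    sql: str
    ok: bool
    columns: list = field(default_factory=list)
    rows: list = field(default_factory=list)
    error: str = ""
    elapsed_ms: float = 0.0


def run_query(conn, sql, max_rows=1000):
    """Execute a query and capture either its rows or its failure context."""
    start = time.perf_counter()
    try:
        cursor = conn.execute(sql)
        rows = cursor.fetchmany(max_rows)
        columns = [d[0] for d in cursor.description or []]
        record = ExecutionRecord(sql, True, columns, rows)
    except sqlite3.Error as exc:
        # Keep the exception type and message next to the SQL that caused it.
        record = ExecutionRecord(sql, False, error=f"{type(exc).__name__}: {exc}")
    record.elapsed_ms = (time.perf_counter() - start) * 1000
    return record
```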

Synthesis

Turns validated outputs into a readable response while preserving the trace between the user's question, the query, and the result.

Persistence

Stores the query thread, execution trace, and validation results so the workflow remains inspectable across retries, reviews, and follow-up analysis.
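
A persisted trail can be as simple as an append-only table of steps keyed by thread and attempt. The table layout and function names below are assumptions made for this sketch, not Planera's actual storage schema.

```python
import json
import sqlite3


def init_trace_store(conn):
    """Create the append-only step table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS trace_steps (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               thread_id TEXT NOT NULL,
               attempt INTEGER NOT NULL,
               stage TEXT NOT NULL,      -- e.g. plan, sql, validation, execution
               payload TEXT NOT NULL     -- JSON-encoded details for the stage
           )"""
    )


def record_step(conn, thread_id, attempt, stage, payload):
    """Append one stage of one attempt to the thread's trail."""
    conn.execute(
        "INSERT INTO trace_steps (thread_id, attempt, stage, payload) VALUES (?, ?, ?, ?)",
        (thread_id, attempt, stage, json.dumps(payload)),
    )
    conn.commit()


def load_trace(conn, thread_id):
    """Return the thread's steps in the order they were recorded."""
    rows = conn.execute(
        "SELECT attempt, stage, payload FROM trace_steps WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    ).fetchall()
    return [(attempt, stage, json.loads(payload)) for attempt, stage, payload in rows]
```

Keeping failed attempts in the same table as successful ones is what lets a retry or review start from the previous state instead of from scratch.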

Section 05

Critical Flow

01

Data upload

The user connects or uploads structured data, and the system builds enough schema awareness to ground future planning and SQL generation.

02

User query

A natural-language question arrives with all the ambiguity that makes analytics hard in the first place: fuzzy intent, shorthand business language, and implicit constraints.

03

System planning

Planera turns that request into a scoped task, identifies the likely tables and metrics, and prepares the query-generation step with explicit structure.

04

SQL execution

The generated SQL is executed only after it passes the pre-execution checks for syntax and schema compatibility, so the system does not confuse confident prose with a valid analytic result.

05

Validation

After execution, the results are inspected for mismatches, empty outputs, and other signals that the query technically ran but semantically missed the user's intent.

06

Traced response

The final answer returns with the supporting trail intact, so the user can inspect the generated SQL, review validation signals, and recover quickly if the system drifted.
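
The post-execution validation step in the flow above could be sketched as a function that turns a result set into a list of named signals. The specific checks and signal names here are illustrative assumptions; a real system would likely tune them per dataset.

```python
def check_result_shape(columns, rows, expected_columns, max_rows=None):
    """Return warning signals for a result that ran but may have missed intent."""
    signals = []
    if not rows:
        signals.append("empty_result")
    missing = [c for c in expected_columns if c not in columns]
    if missing:
        signals.append("missing_columns: " + ", ".join(missing))
    if max_rows is not None and len(rows) > max_rows:
        signals.append(f"too_many_rows: {len(rows)} > {max_rows}")
    # A column that is NULL in every row often points to a bad join or filter.
    for index, name in enumerate(columns):
        if rows and all(row[index] is None for row in rows):
            signals.append(f"all_null_column: {name}")
    return signals
```

An empty list means no signal fired, which is weaker than proof of correctness but enough to separate clean runs from suspicious ones in the trace.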

Section 06

The Decisions That Shaped the Build

Make planning visible

I treated the system plan as a first-class artifact because hidden intent resolution is where trust usually starts to erode.

Validate before summarizing

I separated answer generation from answer permission. The model can propose SQL, but it does not get to narrate success until the data path has earned it.
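
The split between generation and permission can be expressed as a small gate: the synthesis callable only runs when no validation signal is outstanding. `gated_answer` and the status strings are hypothetical names for this sketch.

```python
def gated_answer(signals, synthesize):
    """Call synthesize() only when validation raised no signals."""
    if signals:
        # Surface the signals instead of a narrative the data has not earned.
        return {"status": "needs_review", "answer": None, "signals": list(signals)}
    return {"status": "verified", "answer": synthesize(), "signals": []}
```

Passing the synthesis step in as a callable means it is never invoked, and never billed, for a result that failed validation.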

Persist the full interaction trail

Storing query attempts, validation outcomes, and execution traces makes recovery and iteration much faster than forcing users to restart from scratch.

Optimize for operator recovery

When the system fails, the user should be one step away from understanding why, not three layers away behind hidden orchestration.

Section 07

Failure Modes and Tradeoffs

Hallucinated SQL can still look credible

The hardest failures are not syntax errors. They are plausible queries that run cleanly and return the wrong business answer with enough confidence to mislead the user.

Schema mismatches break otherwise good plans

Even strong intent modeling falls apart when table names, metric definitions, or join assumptions drift away from what the generator thinks is true.

Hidden reasoning creates trust debt

If the user cannot inspect the operational path, the system turns every surprise into a credibility problem. That is why I exposed artifacts instead of asking for blind trust.

Validation adds friction, but it is worth it

Every validation layer introduces latency and implementation overhead, but removing those checks would buy back a small amount of latency at the cost of long-term trust.

Section 08

What Changed

The biggest shift was not that the system answered faster. It was that the interaction became safer to trust. Users could see the generated SQL, inspect validation signals, and understand the boundary between model assistance and verified execution.

That changes adoption dynamics. Instead of treating analytics AI like a novelty, users can treat Planera as a controlled workspace where visibility, recovery, and confidence are built into the product.

Section 09

What I'd Improve Next

Schema-aware memory

I would push more durable understanding of business definitions, common joins, and prior query patterns into the planning layer so the system gets better at repeat analysis.

Stronger repair loops

The next step is better guided recovery when validation fails, especially around automated query repair and clearer explanations of what changed between attempts.
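
For the "what changed between attempts" part, one low-cost option is a line-level diff of the two SQL texts using Python's standard `difflib`. The function name and the attempt labels are assumptions for this sketch.

```python
import difflib


def diff_attempts(previous_sql, repaired_sql):
    """Return a unified diff of two SQL attempts, or '' if they are identical."""
    diff = difflib.unified_diff(
        previous_sql.splitlines(),
        repaired_sql.splitlines(),
        fromfile="previous_attempt",
        tofile="repaired_attempt",
        lineterm="",
    )
    return "\n".join(diff)
```

This works best when generated SQL is formatted one clause per line, so that a changed filter or join shows up as a single added or removed line.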

Deeper evaluation coverage

I would expand the evaluation harness to stress ambiguous prompts, schema drift, and edge-case metric definitions so trust does not depend on only the happy path.
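
One common way to build such a harness is execution-based comparison: run the generated SQL and a hand-written reference query against a fixture database and compare the rows. The sketch below assumes SQLite, a hypothetical case format with `question` and `reference_sql` keys, and a `generate` callable standing in for the NL-to-SQL step.

```python
import sqlite3


def results_match(conn, candidate_sql, reference_sql):
    """True if both queries return the same rows, ignoring row order."""
    try:
        got = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    expected = conn.execute(reference_sql).fetchall()
    # Sort by repr so rows containing NULLs or mixed types stay comparable.
    return sorted(got, key=repr) == sorted(expected, key=repr)


def run_eval(conn, cases, generate):
    """Return the fraction of cases whose generated SQL matches the reference."""
    if not cases:
        return 0.0
    passed = sum(
        results_match(conn, generate(case["question"]), case["reference_sql"])
        for case in cases
    )
    return passed / len(cases)
```

Comparing results rather than SQL text lets differently written but equivalent queries pass, while a query that runs cleanly and returns the wrong answer still fails.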