AI Product · Agentic AI · LLM Integration

Ask Iris: An AI Agent That Does the Experimentation Work

I designed and built Ask Iris, an agentic AI assistant embedded in the Iris platform. It doesn't just answer questions about experiments; it runs workflows: writing specs, analyzing results, generating test ideas, and surfacing insights across 12,000+ experiments.

17 tool capabilities that write specs, analyze results, and surface insights.
Unlocks cross-program intelligence that compounds client impact over time.
Gives teams their time back — no more searching, formatting, or starting from scratch.

The Problem

How can we unlock 12,000 experiments' worth of insights to improve CRO program performance?

Iris solved the operational problem — the testing workflow was streamlined and knowledge was accessible at a smaller scale. But when a program has over 1,000 past tests, and you want insights across programs and industries, using all of that data efficiently becomes a real challenge.

Strategists spent significant time digging through past experiments to avoid re-testing and to build novel strategies — manually reconstructing context that already existed somewhere in the system.

There was a second problem: repetitive tasks like spec writing and results analysis consumed hours per experiment, and quality varied depending on who did the work. Ask Iris didn't just make these faster — it standardized and improved the output.

📝
Spec Writing
Program managers used spec templates to bootstrap experiment specifications, but still had to manually write out all the requirements engineers build from.
📊
Results Analysis
Pulling results from testing platforms, contextualizing findings, and writing closeout reports was time-intensive and repetitive across every experiment.
🔍
Institutional Knowledge
With 12,000+ experiments across 300+ clients, no one could keep it all in their head. Past learnings were technically available but practically inaccessible.

What I Built

An AI assistant that acts, not just answers.

Ask Iris is an AI assistant embedded directly in the Iris platform. Users interact with it through natural language, but under the hood it's an agentic system with 17 purpose-built tools that can read, write, search, and take action across the entire experiment database.

The key distinction: Ask Iris doesn't just answer questions — it does work. It writes specs, analyzes closeout results, generates test ideas and prioritizes them using ImpactLens, captures screenshots for UX analysis, and searches a vector knowledge base. It's the layer that turns a breadth of experiment data into accessible, actionable insights.

Screenshots: Ask Iris generating an experiment specification, analyzing experiment results, and generating an impact presentation.

17 tools across four capability areas:

Experiment Management
Fetch, list, and count experiments with filters
Create and edit experiments
Retrieve test images and templates
Access swimlanes and KPIs
Spec & Analysis Workflows
Two-step spec generation from hypothesis to full spec
Multi-turn closeout wizard for results analysis
Impact presentation generation
Knowledge Base
Semantic search across experiment history
Document retrieval by ID
Filtered queries by client, status, and outcome
Research & Capture
Webpage screenshot capture for UX analysis
ImpactLens test idea generation
Interactive confirmation flows for destructive actions
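To make the tool pattern concrete, here is a minimal sketch of one such tool. With the Vercel AI SDK this would be declared via `tool()` with a Zod parameter schema; the same shape is modeled below in plain TypeScript, with a tiny in-memory store standing in for the Iris API. The field names and sample data are illustrative assumptions, not the production schema.

```typescript
// Hypothetical shape of one Ask Iris tool: a description, typed
// parameters, and an execute function the runtime calls when the
// model selects the tool.

interface Experiment {
  id: string;
  client: string;
  status: "draft" | "running" | "closed";
  outcome?: "win" | "loss" | "flat";
}

interface ListExperimentsParams {
  client?: string;
  status?: Experiment["status"];
  outcome?: Experiment["outcome"];
}

// In production this would call the Iris API; an in-memory store
// stands in for it here.
const store: Experiment[] = [
  { id: "exp-1", client: "acme", status: "closed", outcome: "win" },
  { id: "exp-2", client: "acme", status: "running" },
  { id: "exp-3", client: "globex", status: "closed", outcome: "loss" },
];

const listExperiments = {
  description:
    "List experiments, optionally filtered by client, status, and outcome.",
  execute: async (params: ListExperimentsParams): Promise<Experiment[]> =>
    store.filter(
      (e) =>
        (params.client === undefined || e.client === params.client) &&
        (params.status === undefined || e.status === params.status) &&
        (params.outcome === undefined || e.outcome === params.outcome),
    ),
};

// The model supplies the params; the runtime invokes execute and
// streams the result back into the conversation.
listExperiments.execute({ client: "acme", status: "closed" }).then((r) =>
  console.log(r.map((e) => e.id)), // [ 'exp-1' ]
);
```

The other 16 tools follow the same contract, which is what lets a single agent loop dispatch reads, writes, and multi-step workflows uniformly.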

Under the Hood

Production AI, built with intention.

Ask Iris runs as a Next.js application embedded in the Iris platform via iframe, with its own authentication layer, streaming infrastructure, and state management. I made deliberate architecture decisions to ensure it could operate reliably at scale across 300+ multi-tenant client environments.
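The parent-frame handoff described above can be sketched as follows: the Iris platform posts a JWT into the embedded iframe via `postMessage`, and Ask Iris verifies the sender's origin before trusting the token. This is a minimal sketch of the pattern; the origins, message type, and `startSession` are placeholder names, not the production values.

```typescript
// Sketch of the iframe auth handshake: only accept a token from a
// known parent origin, and only from a message of the expected shape.

const TRUSTED_PARENT_ORIGINS = ["https://iris.example.com"]; // placeholder

interface AuthMessage {
  type: "iris:auth";
  token: string; // JWT minted by the parent app
}

function isTrustedOrigin(origin: string, allowed: string[]): boolean {
  return allowed.includes(origin);
}

// Returns the token if the message is trustworthy, otherwise null.
function handleMessage(origin: string, data: unknown): string | null {
  if (!isTrustedOrigin(origin, TRUSTED_PARENT_ORIGINS)) return null;
  const msg = data as AuthMessage;
  if (msg?.type !== "iris:auth" || typeof msg.token !== "string") return null;
  return msg.token; // hand off to the app's own auth layer
}

// In the browser this would be wired up roughly as:
// window.addEventListener("message", (e) => {
//   const token = handleMessage(e.origin, e.data);
//   if (token) startSession(token); // hypothetical session bootstrap
// });

console.log(
  handleMessage("https://iris.example.com", { type: "iris:auth", token: "abc" }),
); // abc
console.log(
  handleMessage("https://evil.example", { type: "iris:auth", token: "abc" }),
); // null
```

Checking the origin on every message is what keeps a multi-tenant embed from accepting credentials injected by an untrusted page.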

Iris Platform (Parent App)
↓ JWT auth + postMessage ↓
Ask Iris — Next.js App
💬 React Chat UI: Assistant-UI + streaming
🤖 AI SDK + Tools: Vercel AI SDK + 17 tools
📋 Prompt Engine: LangSmith versioning
Backing services:
OpenAI: GPT-4o inference
Qdrant: vector search
Redis: sessions + state
Iris API: experiment data

Key architecture decisions:

Why LangSmith for prompt management?
Prompts are the product logic in an AI application. Hardcoding them means every change requires a deploy. LangSmith lets me version, evaluate, and update system prompts in production without touching code, and its tracing gives visibility into every conversation and tool execution for debugging and quality improvement. It also makes prompts accessible to SMEs, who are often best positioned to iterate on them toward the expected output.
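The "prompts live outside the deploy" idea can be sketched as a loader that fetches the system prompt by name from a registry (LangSmith, in this stack) and caches it briefly, so an edit in the registry changes production behavior on the next cache expiry without a code change. `fetchPrompt` here is a stand-in for the real LangSmith client call, and the prompt name is illustrative.

```typescript
// Minimal prompt loader: pull a named prompt from a registry and
// cache it for ttlMs so production isn't coupled to deploys. This
// sketch does not dedupe concurrent in-flight fetches.

type PromptFetcher = (name: string) => Promise<string>;

function makePromptLoader(fetchPrompt: PromptFetcher, ttlMs: number) {
  const cache = new Map<string, { text: string; at: number }>();
  return async (name: string): Promise<string> => {
    const hit = cache.get(name);
    if (hit && Date.now() - hit.at < ttlMs) return hit.text; // cached version
    const text = await fetchPrompt(name);
    cache.set(name, { text, at: Date.now() });
    return text;
  };
}

// Stand-in registry; in production this would hit the LangSmith API.
let calls = 0;
const registry: PromptFetcher = async (name) => {
  calls++;
  return `You are Ask Iris. (prompt: ${name})`;
};

const load = makePromptLoader(registry, 60_000);
load("ask-iris/system")
  .then(() => load("ask-iris/system")) // second call hits the cache
  .then((p) => {
    console.log(calls); // 1
    console.log(p.includes("Ask Iris")); // true
  });
```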
Why Qdrant for knowledge base search?
We needed semantic search — the ability to ask natural language questions across 12,000+ experiments and get relevant results, not just keyword matches. Qdrant was the right fit for two reasons: it’s fully API-driven, so we could ingest text chunks directly without creating and managing document objects, and its robust metadata filtering and sorting let us layer structured filters (client, status, outcome, experiment type) on top of semantic search. That combination is what makes Ask Iris queries like ‘what checkout tests have won for ecommerce clients in the last year?’ actually work.
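The "structured filters on top of semantic search" combination can be sketched as a small builder that turns optional structured fields into a Qdrant-style filter object (`must` clauses with `key`/`match`), passed alongside the query vector at search time. The payload field names, collection name, and `embed` helper below are illustrative assumptions, not the production schema.

```typescript
// Build a Qdrant-style metadata filter from optional structured fields.
// Only the fields the user constrained become match clauses.

interface KbQuery {
  client?: string;
  status?: string;
  outcome?: string;
  experimentType?: string;
}

interface MatchClause {
  key: string;
  match: { value: string };
}

function buildFilter(q: KbQuery): { must: MatchClause[] } | undefined {
  const must: MatchClause[] = [];
  if (q.client) must.push({ key: "client", match: { value: q.client } });
  if (q.status) must.push({ key: "status", match: { value: q.status } });
  if (q.outcome) must.push({ key: "outcome", match: { value: q.outcome } });
  if (q.experimentType)
    must.push({ key: "experiment_type", match: { value: q.experimentType } });
  return must.length ? { must } : undefined; // no filter = pure semantic search
}

// With the Qdrant JS client the search would then look roughly like:
// await qdrant.search("experiments", {
//   vector: await embed("checkout tests that won"),   // hypothetical embedder
//   filter: buildFilter({ client: "ecom-client", outcome: "win" }),
//   limit: 10,
// });

console.log(JSON.stringify(buildFilter({ outcome: "win", client: "acme" })));
// {"must":[{"key":"client","match":{"value":"acme"}},{"key":"outcome","match":{"value":"win"}}]}
```

Narrowing by metadata first is what keeps a natural-language query like the checkout example scoped to the right clients and outcomes instead of the whole 12,000-experiment corpus.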

My Role

Product vision, system design, and hands-on building.

Ask Iris was my initiative from concept through production. I identified the opportunity — that Iris's structured experiment data was an untapped asset for AI — pitched the vision to leadership, defined the product scope, designed the agent architecture, wrote the system prompts, and led the engineering effort to ship it.

This isn't a product I managed from a distance. I wrote the prompt architecture, defined every tool schema, designed the multi-step workflows for spec writing and closeout analysis, and built the adoption strategy that drove usage across the team. I work directly in the codebase alongside engineering — prototyping with AI-assisted development tools, debugging tool execution chains, and iterating on prompt behavior based on LangSmith traces.

Product
Opportunity identification, vision, roadmap, success metrics, adoption strategy
AI / Design
Agent architecture, prompt engineering, tool schema design, workflow design
Technical
System architecture, AI-assisted development, debugging, LangSmith observability
Leadership
Executive pitch, change management, cross-functional alignment, usage coaching

The Foundation

Ask Iris exists because Iris exists.

The most important product decision behind Ask Iris was made years earlier: structuring experiment data in Iris so it could be queried, analyzed, and acted on programmatically. Without that structured data layer — 12,000+ experiments with consistent schemas for hypotheses, specifications, results, and learnings — there's nothing for an AI agent to work with. Ask Iris is the payoff of building the right foundation first.

📊
Iris Platform
The experimentation platform I built from zero — managing the full testing lifecycle for 300+ clients. Its structured data layer is what makes Ask Iris possible.
See Iris case study
🎯
ImpactLens
A predictive modeling engine that uses historical experiment data to forecast which tests will drive the highest impact — lifting average client outcomes by 103%.
See ImpactLens case study

Building AI products on top of real data?

I've designed agent architectures, written production prompt systems, and shipped AI tools that do real work — not demos. Let's talk about what you're building.

Get in Touch