Page: Work · Case study
Subject: Internal AI agent · Drops, a Kahoot! Company
Role: Senior PM · Solo build
Filed: 2026
CASE.01 · Case study · Featured · Agentic AI · Product intelligence · Internal tool · 2026

“One agent for every place our context lives.”

I built an internal AI agent for Drops that turns scattered product context into answers the team can trust.

Our product context lived across a dozen systems, among them Confluence, Jira, Slack, Amplitude, GrowthBook, Sanity and RevenueCat. Pulling it together for a decision was slow, high-effort work that we reserved for the big calls. So I built a tool-using agent that reads across those seven sources, works in four modes with 17 on-demand skills, and cites every claim back to its source. It’s live internally and the team uses it daily.

FIG.A · THE SYSTEM, LIVE 7 sources → 1 agent → 4 digests
Internal AI agent system diagram. Seven data sources (Confluence, Jira, Slack, Amplitude, GrowthBook, Sanity and RevenueCat) feed into a single agent that works in four modes with seventeen on-demand skills, which produces four scheduled digests: an anomaly pulse, weekly feedback themes, a competitor scan and experiment readouts. Live, internal to Drops, read-only. Custom integrations, every claim cited.
Problem · The brief

Our context was spread across a dozen systems, and pulling it together was a job in itself.

I’d joined an experimental AI team at Drops, and this was one of the first projects we prioritised. The reason was simple: our data lived everywhere. Delivery was in Jira, decisions in Confluence, experiments in GrowthBook, behaviour in Amplitude, content rules in Sanity, monetisation in RevenueCat, and user feedback scattered across Slack. Any real investigation meant stitching those sources together by hand.

On a small team with an ambitious roadmap, that effort had a cost. Combining everything into one coherent picture was slow enough that we only did it for the biggest calls, and even then it usually pulled product, data and sometimes engineering together. Good decisions still got made, but later than they should have, and on a narrower slice of what we actually knew.

Approach · How I ran it

AI’s real superpower is context. So wire the agent into everything, and make it show its work.

The bet was simple: an agent’s real edge is holding context. If one tool could read across every source we use and reason over them together, the half-day investigation becomes a question you ask in plain language. So I built a tool-using agent, not a chatbot, with its own custom integrations into each system, hosted internally so anyone on the team can use it.

The hard part was keeping it fast and trustworthy at the same time. I gave it four modes and seventeen skills that load on demand, so a simple question stays lean while a complex one pulls in exactly the methodology it needs. Then I wrapped it in a harness: it never invents a number, and every statistic, quote or finding links straight back to the Confluence page, Jira ticket or Slack thread it came from, so the team can check the source rather than just trust the agent.
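To make the harness idea concrete, here is a minimal sketch of one check it could run, not the production code. It assumes each claim in a draft answer arrives with the link and the excerpt it was drawn from; `Claim`, `check_claim` and `review` are hypothetical names, and matching numbers by regular expression is a deliberately strict stand-in for whatever the real quality control does.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One sentence of a draft answer (hypothetical structure)."""
    text: str         # the sentence the agent wants to say
    source_url: str   # link to the page, ticket or thread it cites
    source_text: str  # the excerpt retrieved from that source


# Integers, decimals and percentages, e.g. "14", "4.5", "12%".
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def check_claim(claim: Claim) -> list[str]:
    """Return the problems found; an empty list means the claim passes."""
    problems = []
    if not claim.source_url:
        problems.append("missing citation")
    # Every number in the claim must appear verbatim in the cited excerpt.
    source_numbers = set(NUMBER.findall(claim.source_text))
    for number in NUMBER.findall(claim.text):
        if number not in source_numbers:
            problems.append(f"unsupported number: {number}")
    return problems


def review(claims: list[Claim]):
    """Split claims into those that pass and those rejected with reasons."""
    passed, rejected = [], []
    for claim in claims:
        problems = check_claim(claim)
        if problems:
            rejected.append((claim, problems))
        else:
            passed.append(claim)
    return passed, rejected
```

A rejected claim would be dropped or sent back for another retrieval pass rather than shown, which is the point of preferring no answer to a confident wrong one.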

  • STEP.01 Wire every source Custom integrations into Confluence, Jira, Slack, Amplitude, GrowthBook, Sanity and RevenueCat. Read-only by design, so it reads live state and never writes back.
  • STEP.02 Modes + on-demand skills Four modes (strategy, feedback, competitive and an experimental personas mode), backed by 17 skills the agent loads only when a question needs them. Lean by default, deep on demand.
  • STEP.03 A harness for trust Quality control on every answer: no fabricated stats, and a citation on each claim back to its source. A confident wrong answer is worse than no answer.
  • STEP.04 Automate the recurring work I turned the same wiring on other teams’ manual jobs: a daily anomaly pulse, a weekly feedback-theme digest, a competitor digest and weekly experiment readouts, running in production.
  • STEP.05 Live, and built to hand off It’s live internally and used daily. I’m iterating on response quality from the team’s feedback, and writing its documentation so less-technical teammates can extend it with Claude Code, without depending on me.
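The first and fourth steps above can be sketched together, under stated assumptions. The interface below exposes only a `search` method, so there is nothing to write back with, and a recurring digest is the same gathering step run over a fixed list of topics. `ReadOnlySource`, `InMemorySource`, `Evidence`, `gather` and `build_digest` are all hypothetical names, and the in-memory source stands in for a real integration whose API calls are not shown here.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Evidence:
    source: str   # which system it came from
    url: str      # link used later for the citation
    excerpt: str  # the text that was found


class ReadOnlySource(Protocol):
    """Hypothetical integration interface: read methods only, no writes."""
    name: str

    def search(self, query: str) -> list[Evidence]: ...


class InMemorySource:
    """Stand-in for a real integration; holds url -> text pairs."""

    def __init__(self, name: str, documents: dict[str, str]):
        self.name = name
        self._documents = dict(documents)

    def search(self, query: str) -> list[Evidence]:
        needle = query.lower()
        return [
            Evidence(self.name, url, text)
            for url, text in self._documents.items()
            if needle in text.lower()
        ]


def gather(query: str, sources: list[ReadOnlySource]) -> list[Evidence]:
    """Ask every source the same question and pool the evidence."""
    evidence: list[Evidence] = []
    for source in sources:
        evidence.extend(source.search(query))
    return evidence


def build_digest(topics: list[str], sources: list[ReadOnlySource]) -> dict[str, list[Evidence]]:
    """A scheduled digest reuses the same wiring over a fixed topic list."""
    return {topic: gather(topic, sources) for topic in topics}
```

Because every `Evidence` item carries its URL from the moment it is read, the citation never has to be reconstructed after the fact.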
Outcome · The numbers

One place to ask, and an answer the team can trust.

  • 7 sources, wired into one agent: read live, in one place
  • 4 modes, analytical lenses: strategy · feedback · competitive
  • 17 skills, loaded on demand: lean, deep when needed
  • Every claim, cited to source: no fabricated numbers

I’ve watched the team fold it into how they actually work. Project status that used to mean piecing together Slack threads and Confluence docs is now a single question. PMs use it to shape roadmaps and epics from user feedback, reviews and past decisions together, to check how far a sprint has got, and to pull together their own updates.

It’s become a genuine analysis layer over our data, too: people read Amplitude and RevenueCat to inform decisions, and connect a GrowthBook experiment back to the user feedback and themes that explain why a variant is behaving the way it is. The real change is reach. This work now happens routinely, on the full picture, instead of only on the calls big enough to justify the effort.

Reflection · What I'd do differently

The hard part was context, not wiring.

The real work was context management and progressive loading: keeping the agent fast and cheap on simple questions while making sure the complex ones still had everything they needed to answer well. That balance, lean by default and deep on demand, is what I’d point to as the genuine engineering.
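As a rough illustration of progressive loading, the sketch below keeps a one-line summary of every skill in the context and adds a skill's full methodology only when the question matches it. The keyword matching is an assumption made to keep the example self-contained; a real agent might let the model pick skills itself. `Skill`, `select_skills` and `build_context` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    summary: str              # one line, always present in the context
    keywords: frozenset[str]  # assumed trigger words for this sketch
    body: str                 # full methodology, loaded only on a match


def select_skills(question: str, skills: list[Skill], max_skills: int = 2) -> list[Skill]:
    """Pick the skills whose keywords overlap the question, best match first."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    scored = [(len(skill.keywords & words), skill) for skill in skills]
    matched = [(score, skill) for score, skill in scored if score > 0]
    matched.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [skill for _, skill in matched[:max_skills]]


def build_context(question: str, skills: list[Skill], max_skills: int = 2) -> str:
    """Lean by default: summaries only. Deep on demand: add matched bodies."""
    index = "\n".join(f"- {skill.name}: {skill.summary}" for skill in skills)
    parts = ["Available skills:", index]
    for skill in select_skills(question, skills, max_skills):
        parts.append(f"## {skill.name}\n{skill.body}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

A status question matches no keywords and so pays only for the short index, while an experiment question pulls in one full methodology and nothing else. The `max_skills` cap is what keeps a broad question from loading the whole library.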

The part I enjoyed most was building the skills, each one a small, sharp piece of methodology the agent reaches for when a question calls for it. Getting that library right is the difference between a chatbot and something that does the analysis.

Learning · How this changed me

Consistency is the whole game, and the work that never ends.

My biggest lesson was about the harness and quality control. Making an AI respond exactly the way you want, every time, across a genuinely wide range of requests, is hard, and response quality is the thing I’m still iterating on most. I built harnesses for each new use case. As the team leans on it, their feedback keeps surfacing the next thing to tighten, and I work through it piece by piece.

The other lesson is about handing it off. The interesting challenge now is making the agent work confidently for everyone else, not just for me. So I’m writing its documentation to encode the goals, the non-negotiables and the room to move, so a less-technical teammate can open it with Claude Code and improve their own scenarios without waiting on me.