☕ No-Code Coffee · February 19, 2026

Building with a Team of
AI Developer Agents

John Rice

15 years in software · Startup builder · AI agent architect

github.com/JohnRiceML
Peekaboo
TicketToPR
SubredditSignals
Narrative Nooks
Locala
Mochi

Who Am I

15 years building software in startups

Full-stack engineer

15 years across startups, agencies, and enterprise

Startup builder

6+ SaaS products shipped — Peekaboo, TicketToPR, SubredditSignals, Narrative Nooks, Locala, Mochi

Consulting

Enterprise SEO & local visibility tools (Next.js, real API integrations)

Now: AI agent architecture

Building products with teams of specialized AI agents

The realization

"I've been writing code for 15 years. But last year, I stopped writing code — and started managing agents that write it for me."

Not a newcomer experimenting with AI.
An experienced engineer who changed his entire workflow.

400,000+ lines of code

5 products. ~3 months. Solo.

Peekaboo ~250K
SubredditSignals ~90K
Mochi ~60K
TicketToPR ~15K
Locala ~10K
2,237 contributions in 2025 · 478 contributions in 2026
1,162

commits (Peekaboo alone)

13

specialist AI agents

1

person

But I'm not vibe coding.

1.7x

more issues in unreviewed
AI-generated code

The New Stack, 2026

2.74x

more security
vulnerabilities

The New Stack, 2026

45%

of AI code introduces
OWASP Top 10 vulns

Veracode, 2025

I review almost everything.

Quick show of hands

🖥 Who here writes code?
👥 Who manages developers?
Who has never opened a terminal?

One More

Who uses what?

Claude Code

CLAUDE.md

Cursor

.cursorrules

OpenAI Codex

AGENTS.md

Something else? No AI tools yet? All valid.

My Tool of Choice

Claude Code CLI

These principles work across all of these tools. The handbook pattern, the guardrails, the review process — they're universal.

But my preference is the Claude Code CLI.

1. Terminal-native — fits how I already work
2. CLAUDE.md — richest config system (38KB in my projects)
3. Hooks — enforce rules at the system level, not just suggestions
4. 80.9% on SWE-bench — the top score on the agentic coding benchmark
Claude Code CLI running in terminal

Claude Code — Opus 4.6 in the terminal

The AI Team

Think "AI Employee," not "AI Tool"

Me — CEO / Product / Architect
Product Director
Execute Agent
Code Reviewer
QA Auditor
Pipeline Medic
UX Director
Lead Backend
Lead Frontend
Lead Architect
Data Analyst
SEO Specialist
Source Auditor
UX Researcher

13 specialist agents — Peekaboo

The AI Team

How agents talk to each other

I give the CLI a feature request. This is what happens — from my actual .claude/agents/ directory:

Product Director
Sonnet · creates spec
PRD-lite · DB / API / Lib / UI · Acceptance criteria · Rollout plan
Lead Frontend
Sonnet · UI layer
Lead Backend
Sonnet · API + DB
parallel
Lead Architect
Opus · KISS review
Scoring Auditor
Pipeline Medic
Source Auditor
Brand Scorer
domain-specific QA
You
final gate

Shared brain: Every agent reads CLAUDE.md (38KB) on every run. They don't call each other — they read the same constitution.

Token cost: Opus for complex decisions. Sonnet for implementation. A full spec-to-QA cycle runs $2–8 depending on scope.

The Employee Handbook

Every AI Tool Has One Now

AI Tool → Config File
Claude Code → CLAUDE.md
OpenAI Codex → AGENTS.md
Cursor → .cursorrules
Windsurf → .windsurfrules
GitHub Copilot → .github/copilot-instructions.md
Google Gemini CLI → GEMINI.md

Your Company Handbook → AI Handbook
"Never share customer data" → "Never expose emails in API responses"
"All purchases need approval" → "Never install packages without asking"
"Follow the brand guidelines" → "Use the design system colors"
"Report incidents immediately" → "Log all errors with context"
Every serious AI coding tool has converged on this pattern.
Human employees forget the handbook. AI employees read it every single time.

How Features Get Built

Two paths, one engineer

🤖

Path 1: Fully Automated

Low-hanging fruit, simple customer requests,
co-founder asks

TicketToPR Notion board

I write a ticket. AI builds it. I review the PR.

🧑‍💻

Path 2: Hands-On + AI

DB migrations, cron jobs, complex user flows,
large architecture tickets

Claude Code CLI · VS Code

I drive. Claude writes tests, functions, edge cases.

Match the tool to the task. Not everything needs the same pipeline.

Path 1 — Fully Automated

The TicketToPR Notion board

TicketToPR Notion Board
📝 Backlog
🔍 Review
📊 Scored
⚡ Execute
✅ PR Ready

Plain English tickets in Notion. AI handles everything in between.

Path 1 — Behind the Scenes

What happens when you drag to Execute

1
Pull
System pulls the ticket from Notion, reads the full context
2
Score
AI reads the codebase, scores ease (1-10) and confidence (1-10)
3
Plan
Creates implementation plan: files affected, approach, estimated cost
4
Build
If confidence is high enough to one-shot — it builds, tests, and creates a PR

Behind the scenes (my terminal):

$ ticket-to-pr execute --ticket abc123
Pulling ticket: "Add sort-by-visibility"
Scanning codebase... 243K lines
Ease: 8/10  Confidence: 9/10  ← can one-shot
Files: 3 affected
Cost est: $0.42
Creating worktree...
Implementing... ████████░░ 80%
Running tests... ✓ 14 passed
Creating PR... ✓ #47 ready

My total time

~7 minutes

Write the ticket + review the PR. That's it.
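The confidence gate in step 4 can be sketched roughly like this — the field names and thresholds here are assumptions for illustration, not the actual TicketToPR source:

```typescript
// Hypothetical sketch of TicketToPR's one-shot gate.
interface TicketScore {
  ease: number;        // 1-10: how mechanical the change is
  confidence: number;  // 1-10: how sure the AI is it can one-shot it
}

// Build automatically only when both scores clear the bar;
// otherwise the ticket stays queued for a human to drive.
function canOneShot(score: TicketScore, threshold = 8): boolean {
  return score.ease >= threshold - 1 && score.confidence >= threshold;
}

console.log(canOneShot({ ease: 8, confidence: 9 })); // true — builds a PR
console.log(canOneShot({ ease: 5, confidence: 6 })); // false — needs a human
```

The point of the gate is that "high confidence" is an explicit, testable number, not a vibe.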

Path 2 — Hands-On + AI

Complex work with Claude Code CLI

What needs hands-on:

  • Database migrations — schema changes, data backfills
  • Cron jobs & pipelines — scheduling, error recovery
  • Large tickets — new features spanning 10+ files
  • Complex user flows — auth, billing, multi-step forms
  • Third-party integrations — APIs, webhooks, OAuth

How I work with Claude:

# VS Code terminal — I'm driving
me:     "Write a migration that adds a visibility_history table
         with daily snapshots per brand per platform."
claude: Creates migration + types + tests
me:     "Now write edge case tests — what if a platform returns null?
         What about duplicate snapshots on the same day?"
claude: Writes 12 test cases, 3 catch bugs
me:     Reviews, tests locally, deploys
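The duplicate-snapshot question above is the kind of edge case those tests pin down. A minimal sketch of the invariant, with hypothetical names (the real visibility_history schema differs):

```typescript
// Hypothetical snapshot shape — illustrative, not the production schema.
interface Snapshot {
  brand: string;
  platform: string;
  date: string;                // YYYY-MM-DD
  visibility: number | null;   // null when a platform returns nothing
}

// Keep at most one snapshot per (brand, platform, day); first write wins.
function dedupeDaily(snaps: Snapshot[]): Snapshot[] {
  const seen = new Map<string, Snapshot>();
  for (const s of snaps) {
    const key = `${s.brand}|${s.platform}|${s.date}`;
    if (!seen.has(key)) seen.set(key, s);
  }
  return [...seen.values()];
}
```

In a real migration you would enforce the same invariant with a unique index; the function is just the testable form of the rule.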

Path 2 — Claude as Copilot

What Claude handles vs. what I handle

Claude writes:

  • Test suites — unit, integration, edge cases
  • Utility functions & helpers
  • Type definitions & interfaces
  • Boilerplate — API routes, form validation
  • Refactors — "extract this into a hook"

I decide:

  • Architecture — what goes where, why
  • Data model — schema design, relationships
  • Testing strategy — what to test, what matters
  • User flows — how it should feel, edge cases
  • Ship/no-ship — final call is always mine
I'm not asking Claude to build the feature. I'm asking Claude to help me build it faster.

How Features Get Built

Cost comparison

                 AI Agent Team    Junior Developer    Agency
Monthly cost     $100–300         $5,000–8,000        $10,000–25,000
Available        24/7             Business hours      Business hours
Ramp-up time     Instant          2–4 weeks           1–2 weeks
Follows rules    Every time       Sometimes           Varies

Core Principles

KISS — One Function, One Job. You Already Do This.

As a developer, you already apply KISS every day:

Over-engineered:

class UserServiceFactoryProvider {
  create(type, config, opts, flags) {
    // 200 lines handling every
    // possible edge case upfront
  }
}

KISS:

function getUser(id) {
  return db.users.findById(id);
}
// Add complexity only when
// you actually need it
One function, one job. One module, one concern. The same rule scales to agents.

Core Principles

KISS for Agents — One Prompt, One Responsibility

The exact same principle applied to your AI team:

Vague mega-prompt:

"Review the code, fix any bugs, update the tests,
refactor if needed, and make sure everything is good."
// 5 jobs = 0 accountability

Focused agent spec:

Role: Scoring Auditor
Job: Verify visibility formula
Protocol:
  1. Locate canonical formula
  2. Search all usages
  3. Validate each matches
  4. Run unit tests
  5. Report: pass / fail
Simple doesn't mean short. Simple means unambiguous.

Core Principles

SOLID for Agents — Same Rules, New Team Members

S
Single Responsibility
Bad: One agent builds features, reviews code, AND deploys
Good: Execute Agent builds → Code Reviewer reviews → Pipeline Medic deploys
O
Open/Closed
Bad: Rewrite the agent prompt every time you add a feature
Good: Agent reads CLAUDE.md rules — add new rules without changing the agent
I
Interface Segregation
Bad: Every agent can read files, write files, run commands, push code
Good: Auditors are read-only. Only the Execute Agent can write. Only CI can deploy.
D
Dependency Inversion
Bad: Agent A calls Agent B directly, creating a fragile chain
Good: All agents depend on CLAUDE.md as the shared contract — swap any agent freely
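In Claude Code, interface segregation falls out of the agent file itself: each agent in .claude/agents/ declares which tools it may use in its frontmatter. A sketch of a read-only auditor — the name, description, and body here are illustrative, not my actual file:

```markdown
---
name: scoring-auditor
description: Verifies every usage of the visibility formula against the canonical version.
tools: Read, Grep, Glob   # read-only — no Edit, Write, or Bash
model: sonnet
---

Locate the canonical formula, search all usages,
check each against it, and report pass / fail.
```

An agent with no Write or Bash tool cannot modify code or run commands, no matter what its prompt says.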

Core Principles

Testing, Testing, Testing

Unreviewed AI code — the hard data:

45% fail OWASP Top 10 — AI code tested across 100+ LLMs (Veracode, July 2025)
3% vs 21% secure — Devs with AI wrote secure code 3% of the time; without AI: 21% (Stanford / Dan Boneh, ACM CCS)
+41% complexity — Persistent increase in code complexity after AI adoption (Carnegie Mellon, Nov 2025)
19% slower — Experienced devs were slower with AI (predicted 24% faster) (METR randomized trial, 2025)
Only 10% scan AI code — 80% of devs bypass security policies for AI output (Snyk, 2025)

The test is the contract between you and the agent.

Core Principles

Agent reviews agent

Execute Agent
writes code + tests
Scoring Auditor
verifies formulas
Code Reviewer
checks quality
You
final approval
Don't assume — verify. Have the system spin up a new agent
to review what the first agent built.

Core Principles

Definition of Done

Think of it like a home inspection — you don't accept the work until every item checks out.

Data looks right

Does the API return what we expect? Are the fields correct?

Tests prove it works

Automated checks that say "yes, this does what it's supposed to"

It runs in production

Not just on my laptop — it works when real users hit it

You can explain WHY

If you can't explain the approach, there's probably a hidden bug

"Done" doesn't mean "it seems to work." It means it passed the inspection.

Core Principles

Ground agents in real data

Don't guess. Measure. Then tell the agent what you measured.

What most people do:

"Make the API calls work and handle errors."
→ No idea what "work" means
→ No idea what errors look like
→ Agent guesses. Guesses wrong.

What I do instead:

1. Call the API. See what comes back.
   { id: "abc", visibility: 0.73, platform: "chatgpt" }
2. Write a test for that shape.
   → "Does it have id? Is visibility a number between 0-1?"
3. Give the agent real benchmarks.
   → "ChatGPT responds in 58s. Timeout at 70s. Alert at 90s."
Agents make great decisions when you give them great data — not vague instructions.
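Step 2 — the shape test — can be as small as this. The field names come from the sample response above; the validator itself is an illustrative sketch:

```typescript
// Validate the response shape observed in step 1:
// { id: "abc", visibility: 0.73, platform: "chatgpt" }
interface VisibilityResponse {
  id: string;
  visibility: number;  // expected in [0, 1]
  platform: string;
}

function isValidVisibility(r: unknown): r is VisibilityResponse {
  if (typeof r !== "object" || r === null) return false;
  const v = r as Record<string, unknown>;
  return typeof v.id === "string" &&
         typeof v.visibility === "number" &&
         v.visibility >= 0 && v.visibility <= 1 &&
         typeof v.platform === "string";
}

console.log(isValidVisibility({ id: "abc", visibility: 0.73, platform: "chatgpt" })); // true
console.log(isValidVisibility({ id: "abc", visibility: 7.3, platform: "chatgpt" }));  // false — out of range
```

Once this exists, the agent has a concrete definition of "work": the response passes the validator, or it doesn't.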

Safety & Guardrails

Four layers of safety

1
The Handbook (what they should do)
CLAUDE.md rules — like project specs for a contractor
2
Locked Rooms (what they can't touch)
Database, payments, config — "don't touch the electrical without a permit"
3
Security Cameras (what they did)
Complete audit trail of every file opened, changed, command run
4
The Final Gate (human approval)
Nothing reaches customers without you approving it. Period.

Safety & Guardrails

Hooks: enforcement that never forgets

# Block dangerous commands automatically
hook: pre-command
block if contains:
  rm -rf            → no mass deletion
  git push --force  → no rewriting history
  prisma db push    → no schema changes
  DROP TABLE        → no data destruction
# These fire every time. Not guidelines — gates.
Instructions can be forgotten. Hooks can't.
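A minimal sketch of what such a gate looks like as a shell function. This is illustrative only — real Claude Code hooks are configured in settings and receive the command from the tool call, but the blocking logic is the same idea:

```shell
# Hypothetical pre-command guard: refuses any command containing a blocked pattern.
guard() {
  cmd="$1"
  for pattern in "rm -rf" "git push --force" "prisma db push" "DROP TABLE"; do
    case "$cmd" in
      *"$pattern"*)
        echo "Blocked by hook: $pattern" >&2
        return 2   # non-zero status = the agent's command never runs
        ;;
    esac
  done
  return 0
}

guard "npm test"                                      # allowed, runs silently
guard "git push --force origin main" || echo "command rejected"
```

Because the check runs before every command, it does not matter whether the agent "remembers" the rule.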

Safety & Guardrails

Real rules from my CLAUDE.md

🚨 CRITICAL DATABASE SAFETY RULE 🚨
NEVER run `npx prisma db push --accept-data-loss`
without explicit user permission.
NEVER drop tables or columns without backup confirmation.
NEVER modify schema.prisma without review.

🚨 CRITICAL DEPLOYMENT RULES 🚨
NEVER force push to main.
NEVER skip tests before merging.
NEVER modify auth or billing code without review.

# File restrictions
schema.prisma → locked (database structure)
stripe.ts     → locked (payment processing)
.env          → locked (secrets)

Why this rule exists:

Claude ran prisma db push --accept-data-loss on my production Peekaboo database.

Entire database — deleted.

The lesson:

Backups saved most of it. But some customer data was permanently lost. Every rule in this file was written in blood.

Safety & Guardrails

Vibe coding vs. agent engineering

Vibe Coding → Agent Engineering
No constitution → 38KB CLAUDE.md
No tests → Testing-first, agent-reviews-agent
No review → Human reviews every PR
No guardrails → Hooks + locked files + build gates
"It works on my machine" → Definition of Done enforced

The engineering principles don't go away because the author is artificial.

Safety & Guardrails

When guardrails are missing

!
Moltbook (Jan 2026)
AI built the backend, never enabled Row Level Security. 1.5M API keys leaked in 3 days.
!
Replit (2025)
AI tool deleted a live database during a code freeze. 1,200 executives' data wiped.
!
The Vibe Coding Hangover
~10,000 startups built with AI. 8,000+ now need rescue engineering ($50K–$500K each).

Real Results

What I've built with AI agents

Peekaboo — Customers

Brand visibility analytics across 5 AI platforms. 1,162 commits, 13 agents.

TicketToPR — Open source

Notion-to-PR automation. AI scores feasibility, builds features, creates PRs. On npm.

SubredditSignals

Reddit lead intelligence. AI-scored leads, sentiment analysis, estimated deal value.

Narrative Nooks

AI-powered personalized children's stories. Custom illustrations and interactive reading.

Locala

Local business discovery platform. Location-based recommendations and search.

Mochi

Reddit content strategy & scheduling. AI-powered subreddit analysis and auto-posting.

All built with the same approach: CLAUDE.md + specialist agents + human review.

Peekaboo — aipeekaboo.com

Built by one person + 13 AI agents

10M+

sources analyzed

3M+

AI responses

5

AI models

24/7

automated

Peekaboo Dashboard

What Peekaboo tracks

How brands appear in ChatGPT, Gemini, Perplexity, Google AI Mode, and Google AI Overview. Competitive intelligence, visibility scoring, traffic estimates.

Traditional equivalent

3–5 developers, 4–6 months, $200K+ in salary. Built by one person + AI agents at a fraction of the cost.

Mindset

The shift

Before

Write code all day

Debug all night

No time for product

After

Architecture & design

Product strategy

Code review & customer calls

Agents handle the implementation. You handle the thinking.

Mindset

The feedback loop

Agent makes
a mistake
Update
CLAUDE.md
Never repeats
that mistake
Unlike human team members, agents actually read the post-mortem document every single time.

Free & Open Source

github.com/JohnRiceML/ai-agent-playbook

  • Starter CLAUDE.md template
  • Agent spec templates
  • Real examples (scoring auditor, pipeline medic)
  • Founder guide — manage AI without coding
  • Developer guide — engineering principles
  • Core principles — KISS, SOLID, testing, DoD

AI solves the how,
not the what.

Your job is knowing what to build.
Start with a CLAUDE.md. Add one rule. See what happens.

"Cursor and Claude Code are great at helping build software once it's clear what needs to be built. But the most important part is figuring out what to build in the first place."
— Y Combinator, Spring 2026

Let's connect

AI Agent Playbook QR

📖 AI Agent Playbook

Free templates, guides & examples

Peekaboo QR

Peekaboo

aipeekaboo.com

TicketToPR QR

TicketToPR

tickettopr.com

GitHub: JohnRiceML · LinkedIn: /in/johnwilliamrice · X: @JohnRiceML

Thank you! Questions?
