Five Humans.
Eleven AI Agents.
One Product (Web + Native iOS).

How a small team used Claude Code and Codex to build a full-stack product -- responsive web for desktop and phone browsers plus a native iOS app -- with the rigor of a 10-person engineering org, in under two months.

Built by hobbyists and collectors who wanted a more transparent way to research pressings, maintain the product, and explain the work.

5
Humans -- CEO, advisor, board member, designer, social media
11
AI agents with defined roles and accountability
80+
Docs that give every agent full context
< 90 min
Bug report to production fix -- zero human touch for safe, scoped fixes

The data exists.
The experience doesn't.

Discogs built something remarkable: the world's most comprehensive music database, contributed to by millions of collectors. It's the backbone of the vinyl community. But the collector experience -- especially on mobile -- hasn't kept pace with the depth of the data.

10+ tabs
Required to compare pressings of one album
Buried
Pressing details, mastering credits, and community ratings hidden across multiple pages
Gap
No mobile-first tool for crate-digging with your collection in hand

Discogs is irreplaceable -- we use their API, their data, their community contributions. We're not competing with Discogs. We're building the experience layer on top of their incredible database.

"Why do I have to open 20 tabs to compare pressings?"

-- r/vinyl, Steve Hoffman Forums, and every collector you know

Discogs has the data.
Deadwax has the experience.

A vinyl collector's companion that works from your desktop, your phone browser, or the native iOS app. Android users can use the mobile-friendly site while the native Android app is in development.

Pressing Intelligence
Original pressing, top-rated, audiophile editions, mastering engineers -- one panel
Any Screen
Use the responsive site on desktop, iPhone, iPad, or Android browsers
Vinyl Only
Not a filter. A product identity. No CDs, no cassettes.
Wantlist Intel
See pressing details before you buy. Know what you're getting.

"This is what Discogs should have built years ago."

-- The reaction we're designing for

Humans own judgment.
AI owns execution.

Five humans bring taste, domain expertise, creativity, and community voice. Eleven AI agents handle strategy, code (web app, native iOS, and Android), testing, design specs, and legal compliance. They coordinate through markdown files in Git -- no Slack, no Jira, no Notion.

Left to right: Jason, Hope, Caden, and Chris at Easy Street Records in West Seattle. David advises the AI workflow from outside the photo.

The Humans

Jason
CEO & Founder
Product vision, strategic decisions, agent orchestration. The only person who can approve direction changes, merge order, and go/no-go calls.
David
AI Workflow Advisor
Consultant on agentic AI workflows. Helped diagnose context bloat and shaped the token-optimization approach and the tools-vs-skills-vs-scripts framework that keep the agent system efficient.
Chris
Board Member
Pressing domain expert. Co-curated the Top 25 album seed list. Validated audiophile label rankings and mastering engineer tiers.
Hope
Visual Creative
Logo, wordmark, and brand visual identity. Translates the design system into the visual artifacts that define how Deadwax looks and feels.
Caden
Social Media
Manages @deadwax.io on Instagram. Community voice and engagement. Building audience ahead of public launch.

The AI Agents

Director
Claude Code
Daily orchestration. Reads CEO decisions, creates task packets, sets merge order, escalates blockers.
Product Manager
Claude Code
PRD, backlog, user stories, acceptance criteria. Prioritizes with the RICE framework (Reach, Impact, Confidence, Effort).
Designer
Claude Code
Wireframes, design system, UX specs. No UI ships without a design spec.
Marketing
Claude Code
Positioning, GTM strategy, community launch planning.
Legal
Claude Code
Privacy, Discogs API compliance, naming, data handling.
Architect
Codex
Technical decisions, ADRs, infrastructure. Reviews every code PR before merge.
Backend Dev
Codex
OAuth, API proxy, Lambda handlers, Pressing Intelligence pipeline.
Frontend Dev
Codex
React components, routing, Tailwind styling, mobile layout.
Tester
Codex
Playwright E2E, Vitest unit tests, CI/CD pipeline, code coverage.
Swift iOS Dev
Codex
Native SwiftUI app for iPhone + iPad. Builds directly against the deadwax.io API, so the native app reads from the same single source of truth as the web.
Android Dev
Codex
Native Kotlin + Jetpack Compose development. Owns the Material 3 Android implementation and keeps collector workflows aligned with the same Deadwax API the web and iOS surfaces use.
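The Product Manager card mentions RICE. For readers who haven't used it, here is a minimal TypeScript sketch of the standard score, (Reach * Impact * Confidence) / Effort. The `BacklogItem` shape, the function names, and the sample items are hypothetical; they are not Deadwax's backlog or the PM agent's actual code.

```typescript
// Hypothetical backlog item shape; field meanings follow the standard RICE framework.
interface BacklogItem {
  title: string;
  reach: number;      // users affected per period (e.g. per quarter)
  impact: number;     // commonly 0.25 (minimal) up to 3 (massive)
  confidence: number; // 0..1, e.g. 0.8 for 80%
  effort: number;     // person-weeks or agent-days; must be > 0
}

// RICE score = (Reach * Impact * Confidence) / Effort
function riceScore(item: BacklogItem): number {
  if (item.effort <= 0) {
    throw new Error(`Effort must be positive for "${item.title}"`);
  }
  return (item.reach * item.impact * item.confidence) / item.effort;
}

// Highest score first, without mutating the input array.
function prioritize(backlog: readonly BacklogItem[]): BacklogItem[] {
  return [...backlog].sort((a, b) => riceScore(b) - riceScore(a));
}

// Illustrative items only; the numbers are made up.
const backlog: BacklogItem[] = [
  { title: "Color themes", reach: 400, impact: 1, confidence: 0.8, effort: 4 },
  { title: "Wantlist pressing details", reach: 900, impact: 2, confidence: 0.5, effort: 6 },
  { title: "Fix login redirect bug", reach: 1200, impact: 1, confidence: 1, effort: 1 },
];

const ranked = prioritize(backlog).map((item) => item.title);
```

With these made-up numbers the login fix scores 1200, the wantlist work 150, and color themes 80, so `ranked` lists them in that order. Dividing by effort is what lets a small, high-reach bug fix outrank a bigger feature.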

The model is the engine.
The harness is the car.

People assume the magic is "the AI." It isn't. We use the same Claude and Codex models anyone can pay for. The difference between a chatbot that forgets your name and an agent team that actually ships a product is two disciplines: context engineering and harness engineering. Those disciplines are the real job, and harness engineering is where most of our time goes.

Discipline 1

Context Engineering -- what the AI sees

An AI agent has no memory between sessions. Every morning it shows up like a contractor on Day 1 who has never heard of the project. Context engineering is deciding exactly which files, decisions, and instructions get re-read into its head before it picks up a wrench.

Our version of it: a single lean daily packet (TODAY.md, under 50 lines) and a today-only context file (COMMS-TODAY.md). Permanent strategy lives in 80+ dedicated docs that get loaded only when relevant. Config files are pointers, not encyclopedias. Every token of instruction we send costs a token of work we get back.
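A rule like "TODAY.md stays under 50 lines" is exactly the kind of deterministic check the Scripts Over Skills principle says belongs in code rather than in a prompt. A minimal sketch, assuming the packet text is already loaded (a CI step would read the file first); the `checkPacket` name, the token ceiling, and the roughly-4-characters-per-token estimate are illustrative assumptions, not the team's actual script or a real tokenizer.

```typescript
// Hypothetical budget check for a lean daily packet such as agents/TODAY.md.
// The 50-line limit comes from the packet rule; the token ceiling and the
// ~4-characters-per-token estimate are rough assumptions, not a real tokenizer.
interface PacketReport {
  lines: number;
  approxTokens: number;
  ok: boolean;
  problems: string[];
}

function checkPacket(text: string, maxLines = 50, maxApproxTokens = 1500): PacketReport {
  // Ignore one trailing newline so "a\nb\n" counts as 2 lines, not 3.
  const body = text.endsWith("\n") ? text.slice(0, -1) : text;
  const lines = body.length === 0 ? 0 : body.split("\n").length;
  const approxTokens = Math.ceil(text.length / 4);
  const problems: string[] = [];
  if (lines > maxLines) {
    problems.push(`${lines} lines (limit ${maxLines}): move detail into a dedicated doc`);
  }
  if (approxTokens > maxApproxTokens) {
    problems.push(`~${approxTokens} tokens (limit ~${maxApproxTokens})`);
  }
  return { lines, approxTokens, ok: problems.length === 0, problems };
}

// Example inputs (illustrative).
const lean = "# TODAY\n- Backend: fix OAuth refresh\n- Tester: add wantlist E2E\n";
const bloated = Array.from({ length: 80 }, (_, i) => `- item ${i}`).join("\n");
```

`checkPacket(lean)` passes with 3 lines; `checkPacket(bloated)` fails on its 80-line count. Failing the build on a bloated packet keeps "lean" from depending on anyone remembering the rule.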

Discipline 2 -- Where We Live

Harness Engineering -- the world the AI lives in

If context engineering is what the AI sees, harness engineering is the entire workshop around it: which tools it can reach for, what it's physically prevented from touching, when it wakes up, who reviews its work, what happens when it gets stuck. The harness is everything that isn't the model itself or the prompt.

Why we lean into it: models change every few months. Prompts get rewritten weekly. But the harness -- the rules of the workshop -- is the thing that compounds. A well-built harness makes a mediocre prompt safe; a bad harness makes a brilliant prompt dangerous. We spend more time on the harness than on the agents themselves.

What the harness actually looks like, in plain English

Think of each AI agent as a smart but reckless intern. The harness is everything we built so the intern can do real work without burning the building down.

Walls
The agent literally cannot see files it shouldn't touch
Every coding agent works in an isolated copy of the codebase (a "worktree"). Strategy docs, decision logs, and brand guidelines are physically removed from that copy before the agent ever opens it. You can't accidentally overwrite a file you can't see. We added this wall after three days of silent data loss.
Locks
Dangerous actions are blocked at the door, not asked about politely
The agent can't delete Discogs records. It can't leak an API key. It can't merge its own code. Every pull request is automatically scanned, tested, type-checked, and reviewed by a separate Architect agent before a single line reaches users. The agent doesn't have to be trusted -- it isn't given the option.
Alarms
A clock wakes each agent up; nothing self-starts
We learned the hard way that some agents will sit silently for eleven days if nobody calls them. So the harness includes a literal cron schedule: the bug-fix loop runs every hour, the Director kicks off every morning, the design and legal agents have standing slots on the calendar. "Delegated" is not "done" until something mechanical actually fires.
Tools
Cheap tools first, expensive tools only when they earn it
We give every agent a tiered toolkit: a small shell script for the boring stuff, the GitHub command line for routine work, the full API only when the simpler tools can't do the job, and heavy specialist tools last. Picking the right tool is itself part of the harness — the wrong tool burns context and money for no extra value.
Help
When an agent gets stuck, the harness routes it to a second opinion
Claude agents can hand a hard problem off to a Codex rescue agent. Codex agents can hand a confusing diff back to a Claude reviewer. The harness knows who to call. The agent doesn't have to figure it out, and it doesn't get to silently give up.
Gates
Humans block the pipeline at specific, named places
Hope sees every brand change before it ships. Chris validates every domain claim. Jason approves direction. The harness physically pauses at those gates -- it doesn't politely ask the agent to wait, it stops the line. AI proposes, humans approve, the harness enforces.

"You don't make an AI agent reliable by writing a better prompt. You make it reliable by building a workshop where reliability is the only thing it's allowed to do."

-- The core insight behind harness engineering

We built one harness to launch.
Then we rebuilt it to last.

The harness above isn't the harness we started with. The biggest thing we learned running an AI team is that the workshop has to match the shape of the work -- and the shape of the work changed the day we shipped. The launch harness was tuned for a sprint. Keeping a live product healthy is a different job, and using the launch harness for it slowly rotted our own paperwork. So we changed the harness, not the model.

Era 1 -- Zero to One

The "Day N" harness

To get from nothing to a shipped app, every day was a numbered container. Each morning the Director wrote one lean plan ("Day 53"), the Codex orchestrator spawned the coding agents against it, work merged in a fixed order, and every night a single heavy "close" archived everything that happened and set up tomorrow.

Why it worked: a 0-to-1 build is a plannable daily batch. The day container forced a clean plan → build → review → ship → close loop and gave us one tidy "here's everything that happened today" story. For a launch sprint, that rhythm is exactly right.

Era 2 -- Live & Sustained

The two-lane harness

Once real users arrived, work stopped showing up in neat daily batches. It became a continuous stream of small bug reports, punctuated by the occasional big feature. Forcing that stream into a daily ceremony is why a single "Day N" started staying open across several calendar days, why the heavy nightly close kept getting skipped, and why our decision logs and branches drifted out of date.

The fix -- split the work into two lanes: an Ops / Sustain lane where bug fixes flow continuously and the merge is the close (one line in a running log, no nightly ceremony), and a Feature Epic lane that keeps all the heavyweight planning machinery -- but scoped to a feature, not to a calendar day.

Era 3 -- Growth Mode

A balanced human-AI desk

Past launch, the bet shifted from speed to judgment. The two-lane harness mostly runs itself now, so the job changed again: keep a live product healthy while we grow it. We optimize for sustained engineering, growth features, and learning -- not launch throughput.

The team rebalanced too: two evenly funded AI subscriptions split the work -- Claude Code as the primary builder and operator that writes most of the code and opens the pull requests, and Codex as the review-and-specialist layer that gives every PR an architecture review before merge and handles the native iOS and Android deep cuts. Models and prompts keep changing; the harness keeps compounding.

The root cause was a units mismatch

The day model assumed work arrives in plannable daily batches. Live work doesn't -- it's a flow plus the odd epic. We had been measuring a stream with a ruler meant for boxes. Two separate reviews -- one of our infrastructure, one of our process -- landed on the same diagnosis independently, and it is consistent with how continuous-flow (Kanban-style) work is usually managed.

Before
One daily container for everything
Bugs, features, and admin all got crammed into "Day N." A one-line typo fix and a multi-week feature were tracked the same way. The close was all-or-nothing, so when one item lingered, the whole day lingered with it.
After
Two lanes, each with the right amount of ceremony
Ops work closes itself on merge -- no standup, no archive. Feature epics get real planning, design specs, and a merge order, but only while that feature is live. A simple promotion rule keeps them apart: anything that spans many files or needs more than one pull request graduates from the bug lane into a real epic.
Trade
We gave up the tidy daily narrative on purpose
The honest cost of the change is losing the single "everything that happened today" story. We took that trade gladly: artifacts that stay true beat a tidy ceremony nobody keeps up. A document you can trust is worth more than a ritual you skip.
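The promotion rule is simple enough to state as code. A minimal sketch, assuming each work item carries an estimate of files touched and pull requests needed; the `WorkItem` shape, the 10-file cutoff for "many files," and the log-line format are hypothetical, since the write-up doesn't publish them.

```typescript
// Hypothetical work-item shape; the real tracker fields are not published.
interface WorkItem {
  id: string;
  filesTouched: number; // estimated or actual files changed
  pullRequests: number; // PRs needed to land the change
}

type Lane = "ops" | "epic";

// Assumed cutoff for "spans many files"; in practice this is a judgment call.
const MANY_FILES = 10;

// Promotion rule: stay in the Ops / Sustain lane unless the work spans many files
// or needs more than one pull request, in which case it graduates to a Feature Epic.
function laneFor(item: WorkItem, manyFiles = MANY_FILES): Lane {
  return item.filesTouched >= manyFiles || item.pullRequests > 1 ? "epic" : "ops";
}

// In the ops lane the merge is the close: one line in a running log (UTC date).
function opsLogLine(item: WorkItem, mergedAt: Date, prNumber: number): string {
  return `${mergedAt.toISOString().slice(0, 10)} ${item.id} merged via #${prNumber}`;
}
```

A two-file typo fix stays in the ops lane and closes with a single log line; anything needing a second PR is promoted, which is what keeps heavyweight planning scoped to epics.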

"The harness isn't a monument you build once. It's a workshop you keep re-tooling to fit the work in front of you. When the work changed from a sprint to a stream, the harness had to change with it."

-- The lesson behind the two-lane pivot

Where does the Director get its information?

The most common question from engineers: "How does the AI team know what to work on?" The answer is a chain of markdown files that replaces Slack, Jira, and standups.

CEO
Jason posts a decision
Written to agents/COMMS.md with a structured tag. Example: "Confirmed: Top 10 album seed list. Mastering engineer tier: Kevin Gray, Bernie Grundman, Steve Hoffman, Chris Bellman, Bob Ludwig." Each decision gets a permanent ID in the Decision Log (DEC-033 for this one).
Director
Creates the daily execution plan
Reads CEO decisions, checks blockers, and writes agents/execution/EXEC-YYYY-MM-DD.md with: theme of the day, task packets per agent, merge order (Architect -> Backend -> Frontend -> Tester), and dependencies.
Packets
Lean runtime files every agent reads first
agents/TODAY.md (today's tasks, under 50 lines) and agents/COMMS-TODAY.md (today's context). These are the "morning standup" -- every agent reads them before doing anything.
Agents
Execute, post updates, hand off
Each agent works on its task, posts updates to COMMS.md (PM writes acceptance criteria, Designer posts UX guidance, Backend ships code). Handoffs are explicit: "UNBLOCKED: Frontend can now start."
Close
Director archives and resets
Completed tasks archived to agents/done/DAY-N.md. Decision Log updated. Tomorrow's lean packet created. Nothing is verbal. Everything is traceable.
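The Director's morning step is mostly bookkeeping, which makes it a good fit for a script. A sketch, assuming the packets are already decided: it builds the `agents/execution/EXEC-YYYY-MM-DD.md` path and lists packets in the Architect -> Backend -> Frontend -> Tester merge order, rejecting a dependency that would have to merge out of order. The `TaskPacket` shape and the markdown layout are illustrative, not the Director's real template.

```typescript
// Merge order from the Director's daily plan.
const MERGE_ORDER = ["Architect", "Backend", "Frontend", "Tester"] as const;
type Role = (typeof MERGE_ORDER)[number];

// Hypothetical packet shape.
interface TaskPacket {
  role: Role;
  task: string;
  dependsOn: Role[]; // roles whose work must merge first
}

// agents/execution/EXEC-YYYY-MM-DD.md, using the UTC calendar date.
function execPlanPath(day: Date): string {
  return `agents/execution/EXEC-${day.toISOString().slice(0, 10)}.md`;
}

// Render packets in merge order; the markdown layout here is illustrative.
function renderPlan(theme: string, packets: TaskPacket[]): string {
  const rank = (r: Role) => MERGE_ORDER.indexOf(r);
  // A dependency must merge strictly before the packet that needs it.
  for (const p of packets) {
    for (const dep of p.dependsOn) {
      if (rank(dep) >= rank(p.role)) {
        throw new Error(`${p.role} depends on ${dep}, which does not merge before it`);
      }
    }
  }
  const ordered = [...packets].sort((a, b) => rank(a.role) - rank(b.role));
  const lines = [`# Theme: ${theme}`, "", "## Merge order", MERGE_ORDER.join(" -> "), ""];
  for (const p of ordered) {
    const deps = p.dependsOn.length ? ` (after ${p.dependsOn.join(", ")})` : "";
    lines.push(`- ${p.role}: ${p.task}${deps}`);
  }
  return lines.join("\n");
}
```

Because the order check runs before anything is written, a plan that asks Backend to wait on Frontend fails loudly in the morning instead of surfacing as a merge conflict at night.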

"If it's not in writing, it didn't happen."

-- The AI equivalent of async communication culture

Principles that made it work

Principle 1

Treat AI Like a New Coworker

Don't prompt them. Manage them. Give them roles, context, constraints, and feedback -- like onboarding a human on Day 1.

Principle 2

The Expert in Claude Is Claude

Ask Claude how to use Claude. It writes its own config, debugs its own workflows, and knows its own limits better than any doc.

Principle 3

Scripts Over Skills

Deterministic tasks go in shell scripts, not AI prompts. Scripts are cheaper, faster, testable, and version-controlled.

Principle 4

Humans Own Judgment, AI Owns Execution

Jason sets direction. David advises on AI workflow optimization. Chris validates domain expertise. Hope creates the visual identity. Caden builds community. AI does everything else.

Principle 5

The Harness Is the Product

Context engineering decides what the AI sees; harness engineering decides what the AI can do, when it wakes up, and what stops it. Models churn every few months; the harness compounds.

Principle 6

Trust but Verify with Automation

Code that fails a check physically cannot reach production. CI enforces security, tests, coverage, and type safety. Verify with machines, not eyeballs.

Human Approval Gates

Not everything is automated. Some decisions deliberately stop the pipeline and wait for a human. AI proposes, humans approve. No workaround, no override.

The Hope Gate: Agent proposes a new brand color, icon, or typography change? The work pauses until Hope approves the direction. No visual ships without her sign-off.
The Chris Gate: Pressing Intelligence ranks a pressing or flags an audiophile label? Chris validates the domain call. The data is only as good as the expert behind it.
The Caden Gate: Community-facing copy, social media voice, or public messaging changes? Caden reviews before it goes live. Brand voice is a human decision.
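A sketch of how a pipeline can enforce these gates rather than asking an agent to wait: map each change category to the humans who must sign off, and refuse to merge while any approval is missing. The category names and the `Approver` type are assumptions; the mapping follows the Hope, Chris, and Caden gates above, plus Jason's direction calls from his role description.

```typescript
// Hypothetical change categories; the approver mapping follows the gates above.
type ChangeCategory = "brand" | "domain" | "community-copy" | "direction" | "code-only";
type Approver = "Hope" | "Chris" | "Caden" | "Jason";

const GATES: Record<ChangeCategory, Approver[]> = {
  brand: ["Hope"],             // colors, icons, typography
  domain: ["Chris"],           // pressing rankings, audiophile label flags
  "community-copy": ["Caden"], // public messaging and social voice
  direction: ["Jason"],        // direction changes and go/no-go calls
  "code-only": [],             // automated checks only
};

// Human approvals still missing; an empty list means every gate is open.
function missingApprovals(categories: ChangeCategory[], approvedBy: Set<Approver>): Approver[] {
  const required = new Set<Approver>(categories.flatMap((c) => GATES[c]));
  return [...required].filter((a) => !approvedBy.has(a));
}

// The line stops instead of asking politely: no approval, no merge.
function canMerge(categories: ChangeCategory[], approvedBy: Set<Approver>): boolean {
  return missingApprovals(categories, approvedBy).length === 0;
}
```

A PR tagged both `brand` and `domain` stays blocked until Hope and Chris have each approved; a `code-only` PR passes straight through to the automated checks.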

Every PR goes through this

The same CI/CD pipeline you'd expect from a 10-person engineering team -- enforced by automation.

1. Code -- Agent writes in isolated git worktree
2. PR Created -- Auto-rebase onto main, strip protected files
3. Security Scan -- Block DELETE ops, token exposure
4. Lint + Types -- ESLint + TypeScript strict mode
5. Tests -- Vitest unit + Playwright E2E with video
6. Coverage -- Delta report posted as PR comment
7. Code Review -- Architect agent reviews inline + summary
8. Ship -- Squash merge, deploy, verify production
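Step 3 can start as a plain text scan of the lines a PR adds. A minimal sketch, assuming the diff arrives in unified format (for example from `git diff`); the two rules are illustrative patterns for "DELETE ops" and "token exposure," not the team's actual rules. Pattern lists like this are easy to evade, so treat it as a first layer rather than a complete secret scanner.

```typescript
// Hypothetical pre-merge scan over a unified diff. Patterns are illustrative, not exhaustive.
interface Finding {
  file: string;
  line: string;
  rule: "delete-op" | "token-exposure";
}

const RULES: { rule: Finding["rule"]; pattern: RegExp }[] = [
  // New code that names an HTTP DELETE, e.g. fetch(url, { method: "DELETE" }).
  { rule: "delete-op", pattern: /["'`]DELETE["'`]/ },
  // Credential-looking assignments: token / secret / api key followed by a long literal.
  { rule: "token-exposure", pattern: /(token|secret|api[_-]?key)["']?\s*[:=]\s*["'][A-Za-z0-9_\-]{20,}["']/i },
];

function scanDiff(diff: string): Finding[] {
  const findings: Finding[] = [];
  let file = "";
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      // "+++ b/src/api.ts" -> "src/api.ts"
      file = raw.slice(4).replace(/^b\//, "");
      continue;
    }
    // Only inspect lines the PR adds; removed and context lines are ignored.
    if (!raw.startsWith("+")) continue;
    const added = raw.slice(1);
    for (const { rule, pattern } of RULES) {
      if (pattern.test(added)) findings.push({ file, line: added.trim(), rule });
    }
  }
  return findings;
}
```

Any finding fails the step, which matches the "blocked at the door" stance: the agent never gets to argue that a particular DELETE was safe.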

Tech Stack

React + Vite
Responsive web SPA for desktop and mobile browsers
Swift + SwiftUI
Native App Store app (iPhone + iPad, iOS 17+)
Kotlin
Language powering the native Android app
Jetpack Compose + Material 3
Android UI toolkit with hand-tuned dark palette
Hilt + Retrofit + OkHttp
Android DI and networking (kotlinx.serialization)
Coil + Navigation Compose
Image loading and screen graph on Android
Xcode + VS Code + Android Studio
IDEs for iOS, web, and Android development
TypeScript
End-to-end types
Tailwind CSS
Mobile-first styles
AWS Lambda
Serverless backend
Deadwax API (/api/*)
REST API -- one backend serving web, iOS, and Android
DynamoDB
Sessions & data
S3 + CloudFront
Static hosting + CDN
Playwright + Vitest
Web E2E + unit
XCTest
Native iOS regression coverage
JUnit 5 + MockK + Robolectric + Turbine
Android unit, instrumentation, and Flow testing
GitHub Actions
CI/CD pipeline
Stripe
Payment processing
Buy Me a Coffee
Community donations

User reports a bug.
It's fixed before they check back.

An automated pipeline runs every hour, on the hour. AI triages, implements, tests, and deploys -- zero human touch for safe changes. Larger feature requests and ambiguous asks get flagged for human review.

1. User Feedback (T+0) -- Widget on site creates GitHub Issue with [feedback] label
2. PM Triages (every hour) -- Classifies priority, effort, safety. Auto-fixable bugs get a task packet. Feature requests flagged for human review.
3. Dev Implements (same run) -- Reads task, creates branch, writes fix, runs tests, opens PR.
4. CI Validates (~5 min) -- Security, lint, typecheck, Playwright E2E, coverage.
5. Production (< 90 min) -- Auto-merge on green. Deploy. Verify production. Log.

Safety rails: the pipeline won't auto-fix P0 (critical) issues, UX redesigns, auth/security changes, schema changes, or anything ambiguous. Only safe, scoped bug fixes ship automatically; everything else waits for human review.
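The safety rails amount to a routing rule that the PM agent's triage output can be checked against mechanically. A sketch under assumed field names: `TriagedIssue`, the `Area` labels, and the `route` function are hypothetical; the rules themselves (feature requests, P0s, ambiguous asks, UX redesigns, auth/security, and schema changes go to a human) come from the list above.

```typescript
// Hypothetical triage result for a [feedback] issue; field names are assumptions.
type Priority = "P0" | "P1" | "P2" | "P3";
type Kind = "bug" | "feature-request";
type Area = "ui-copy" | "layout" | "ux-redesign" | "auth-security" | "schema" | "data-display";

interface TriagedIssue {
  number: number;
  kind: Kind;
  priority: Priority;
  areas: Area[];
  ambiguous: boolean; // triage could not pin down the expected behavior
}

type Route = { action: "auto-fix" } | { action: "human-review"; reason: string };

const NEVER_AUTO: Area[] = ["ux-redesign", "auth-security", "schema"];

// Only safe, scoped bug fixes get a task packet; everything else is flagged.
function route(issue: TriagedIssue): Route {
  if (issue.kind === "feature-request") return { action: "human-review", reason: "feature request" };
  if (issue.priority === "P0") return { action: "human-review", reason: "P0 critical" };
  if (issue.ambiguous) return { action: "human-review", reason: "ambiguous" };
  const blocked = issue.areas.find((a) => NEVER_AUTO.includes(a));
  if (blocked) return { action: "human-review", reason: `touches ${blocked}` };
  return { action: "auto-fix" };
}
```

Checking the most restrictive conditions first means an issue that is both P0 and a simple copy fix still goes to a human.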

Lessons learned the hard way

We run formal post-mortems every 5 days. They're the single most valuable process artifact. Here's what they caught.

Incident INC-001 -- Silent Data Loss
What happened

Worktree Merges Overwrote Director Docs

For 3 days, AI dev agents opened PRs that silently overwrote Director docs on main. Decision log entries vanished. Each worktree held the Director docs as they were when its branch was created; when an agent's PR carried those stale copies, merging it overwrote the newer versions on main.

How we fixed it

Mechanical Enforcement, Not Documentation

Three-layer prevention: sparse checkout (files physically absent from worktrees), pre-PR cleanup script, and a CI gate that fails any PR touching protected paths. Only mechanical enforcement works with AI agents.
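The third layer, a CI gate on protected paths, can be as small as a path check. A minimal sketch: the entries under `agents/` are paths this write-up names, but the full protected list (including wherever the Decision Log lives) is an assumption, as are the function names. In CI it could be fed the output of `git diff --name-only origin/main...HEAD`.

```typescript
// Hypothetical protected-path list. These agents/ paths appear in this write-up;
// the complete list used in CI is an assumption.
const PROTECTED: string[] = [
  "agents/COMMS.md",
  "agents/COMMS-TODAY.md",
  "agents/TODAY.md",
  "agents/execution/",
  "agents/done/",
];

// Protected if the path equals an entry, or sits under an entry ending in "/".
function isProtected(path: string, protectedPaths: string[] = PROTECTED): boolean {
  const p = path.replace(/^\.\//, "");
  return protectedPaths.some((entry) => (entry.endsWith("/") ? p.startsWith(entry) : p === entry));
}

// CI gate: fail the PR if any changed file is protected.
function checkChangedFiles(changed: string[]): { pass: boolean; violations: string[] } {
  const violations = changed.filter((f) => isProtected(f));
  return { pass: violations.length === 0, violations };
}
```

The gate doesn't care why a protected file changed; any touch fails the build, which is what makes it mechanical rather than advisory.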

Process Failure -- Config File Bloat
What happened

CLAUDE.md Grew Into an Encyclopedia

Every time something went wrong, we added more instructions. Agent config files bloated. Agents burned context window tokens reading their own config, leaving less room for actual work.

How we fixed it

Context Is Expensive Real Estate

David helped define the problem and crafted a diagnostic prompt: "Analyze my CLAUDE agentic workflow for token usage, workflow optimizations, and tools vs. skills/script usage." That audit led to aggressive pruning -- details moved into dedicated docs, config files became pointers not encyclopedias, and lean daily packets stayed under 50 lines. Every token of instruction costs a token of output.

11-Day Gap -- Agents Don't Self-Start
What happened

Agents Went Dark for Days

Legal agent: 11 days, zero completions. Designer: 4 consecutive missed sessions. Nobody noticed because the Director was busy shipping code. Claude Code agents don't run unless explicitly scheduled.

How we fixed it

Explicit Scheduling for Every Agent

Standing daily schedule with explicit session slots. No implicit expectations. If it's not on the schedule, it doesn't exist. Same lesson any manager learns: "delegated" is not "done."
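The schedule fixes the cause; a watchdog catches the symptom. A sketch, assuming the harness records each agent's standing slot and last completed task (the `AgentStatus` shape and the two-day quiet threshold are hypothetical): it flags any agent with no slot or no recent completion, the check that would have surfaced the Legal agent's 11-day silence.

```typescript
// Hypothetical record of each agent's schedule and last completed task.
interface AgentStatus {
  agent: string;
  scheduled: boolean;          // has a standing slot on the schedule
  lastCompletion: Date | null; // null if nothing has ever been completed
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Flag agents with no standing slot, or quiet longer than maxQuietDays.
function findDarkAgents(statuses: AgentStatus[], now: Date, maxQuietDays = 2): string[] {
  const flagged: string[] = [];
  for (const s of statuses) {
    if (!s.scheduled) {
      flagged.push(`${s.agent}: no standing slot on the schedule`);
      continue;
    }
    if (s.lastCompletion === null) {
      flagged.push(`${s.agent}: no completions on record`);
      continue;
    }
    const quietDays = Math.floor((now.getTime() - s.lastCompletion.getTime()) / DAY_MS);
    if (quietDays > maxQuietDays) {
      flagged.push(`${s.agent}: ${quietDays} days since last completion`);
    }
  }
  return flagged;
}
```

Posting the flagged list somewhere a human reads daily turns "nobody noticed" into a line item.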

CI Noise -- Ignoring Red Builds
What happened

CI Failures Became Background Noise

ESLint v9 broke lint on every PR. The team treated it as noise -- "oh, lint always fails." This masked real problems for days and created a culture of ignoring red builds.

How we fixed it

CI Failures Are Blockers, Period

No "known failures." If a step fails, fix it or remove it. Noise in CI is indistinguishable from real problems. A broken build that's always broken teaches everyone to stop looking.

Shipped Without Kill-Switch
What happened

No Rollback for Pressing Intelligence

Pressing Intelligence (PI) launched without a way to disable it in production. During the Day 22 rollback drill, there was no off switch, and fixing that required an emergency remediation PR.

How we fixed it

If You Can't Turn It Off, You Can't Turn It On

Rollback capability is now a pre-deployment gate. Every new feature must have a kill-switch before it ships. Not after. Not "we'll add it later."
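A minimal sketch of "kill-switch before ship," assuming a simple flag registry: flags default to off, and a pre-deploy check lists any feature in the release that has no registered switch. The `KillSwitches` class and the flag names are hypothetical; a real deployment would keep flag state outside process memory (for example in the DynamoDB table already in the stack) so it can be flipped without a redeploy.

```typescript
// Hypothetical in-memory flag registry; shows the shape, not a production store.
type FlagName = string;

class KillSwitches {
  private readonly flags = new Map<FlagName, boolean>();

  register(name: FlagName, enabled = false): void {
    this.flags.set(name, enabled);
  }

  set(name: FlagName, enabled: boolean): void {
    if (!this.flags.has(name)) throw new Error(`Unknown flag: ${name}`);
    this.flags.set(name, enabled);
  }

  // Unregistered flags read as off, so a missing switch can never turn a feature on.
  isEnabled(name: FlagName): boolean {
    return this.flags.get(name) ?? false;
  }

  has(name: FlagName): boolean {
    return this.flags.has(name);
  }
}

// Pre-deployment gate: every feature in the release needs a registered kill-switch.
function featuresMissingKillSwitch(release: FlagName[], switches: KillSwitches): FlagName[] {
  return release.filter((feature) => !switches.has(feature));
}
```

Rolling back then means `set(name, false)`, and a non-empty result from the gate blocks the deploy.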

Platform Lock-In -- Web Speed Blocked Native Portability
What happened

40 Days of UX Trapped in One Platform

We built web-first with AI and shipped fast -- 40+ days of branding, design tokens, component patterns, and UX flows. When native iOS arrived, none of it was portable. Colors, spacing, typography, interaction models -- all lived in React/CSS with no platform-agnostic layer. We started from scratch on the native app's visual identity, and the parity rubric ("make iOS look like web") even caused a regression where Codex replaced native iOS glass controls with custom web-like chrome before we caught it in a device build.

How we're fixing it now

Brand Rubric + Single-Source-of-Truth API

Brand template + parity rubric: Parity applies to content, flow, and acceptance criteria -- not chrome. Native iOS keeps native iOS controls (tab bars, sheets, glass toolbars) unless a ticket explicitly accepts a custom shell. Brand tokens (palette, type scale, logotype) live in a shared design doc that both platforms implement in their own idiomatic way.

SoT API as the consistency layer: Both the responsive web app and the native Swift app call the same Deadwax endpoints (/api/collection, /api/pressing-intelligence, /api/curation/*, /api/mastering-engineers). The API is the single source of truth -- pressing metadata, mastering-engineer tiers, curation lists, Pressing Intelligence payloads all come from one place. Changing the truth once updates both surfaces, so "platform drift" becomes a UI-shell problem, not a data problem.
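One way to make brand tokens portable is to keep them as platform-neutral data and give each platform a small adapter. A sketch of the web side only, with placeholder values rather than Deadwax's real palette; the `BrandTokens` shape and `toCssVariables` are hypothetical, and iOS and Android adapters would map the same names onto SwiftUI and Compose types in their own idiomatic way.

```typescript
// Hypothetical platform-neutral brand tokens. Values are placeholders.
interface BrandTokens {
  color: Record<string, string>;     // token name -> hex color
  typeScale: Record<string, number>; // token name -> logical size (assumed px on web, pt on iOS, sp on Android)
}

const tokens: BrandTokens = {
  color: { background: "#111111", surface: "#1C1C1E", accent: "#E0A100" },
  typeScale: { body: 16, title: 22, display: 34 },
};

// Web adapter: emit CSS custom properties from the shared token names.
function toCssVariables(t: BrandTokens): string {
  const lines: string[] = [":root {"];
  for (const [name, hex] of Object.entries(t.color)) {
    lines.push(`  --color-${name}: ${hex};`);
  }
  for (const [name, size] of Object.entries(t.typeScale)) {
    lines.push(`  --font-size-${name}: ${size}px;`);
  }
  lines.push("}");
  return lines.join("\n");
}
```

The design choice is that names and values live in data, not in React/CSS, so a new surface adds an adapter instead of re-deriving the brand.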

"Our retro cadence broke for 15 days. During those 15 days, the same mistakes repeated. The retro is the product."

-- The meta-lesson: run post-mortems religiously

Product Roadmap

Outcome-driven. Each item describes the result for collectors, not just the feature.

Shipped (Mar-May 2026)
Website Launch
Responsive web app live at deadwax.io. Sign in with Discogs, browse your collection on a phone or desktop -- no setup, no import.
Product, Engineering
Pressing Intelligence
One panel: original pressing, top-rated, audiophile editions, mastering engineers. Replaces 10 browser tabs.
Product
Top 25 Essential Albums
Curated pressing intelligence for 25 albums every collector knows. Hand-verified with domain expert Chris.
Product
Self-Healing Feedback Loop
User-reported bugs triaged, fixed, tested, and deployed automatically -- under 90 minutes, every hour.
Engineering
Native iOS App (SwiftUI)
iPhone + iPad native app built in SwiftUI against the Deadwax API. Same collection, same Pressing Intelligence, native feel. Live on the App Store.
Engineering, Product
Brand Rubric + SoT API Parity
Brand template and parity rubric keep web and iOS consistent on content, flow, and brand -- the shared Deadwax API keeps data identical across surfaces. Native chrome is the default; custom shells require an explicit ticket.
Design, Engineering
Building Now (May 2026)
Native Android App (Compose)
Kotlin + Jetpack Compose app built against the same Deadwax API. Material 3 chrome, hand-tuned dark palette, parity with iOS on content and flow. In active feedback burn-down ahead of Play Store submission.
Engineering, Product
Color Themes & Polish
Selectable color themes across surfaces plus a steady cadence of feedback-driven UX improvements -- the small things collectors notice every session.
Design, Product
Public Launch in Communities
Steve Hoffman Forums, r/vinyl, r/audiophile. Community-first -- the people who feel the pain most discover us first.
Growth
Exploring (Sep 2026+)
Offline Collection in the Record Store
Let collectors browse their collection in the record store with no signal in the native iOS app -- cached pressings, offline Pressing Intelligence, deferred sync on reconnect.
Engineering
Full-Catalog Intelligence
Pressing recommendations for any album on Discogs, not just the curated set.
Product, Engineering
Sustainable Revenue
Deadwax stays free. Optional support through the website stays separate from the collector workflow.
Support

The future of product development is
human judgment + AI execution

Five humans and eleven AI agents, building a real product -- responsive web for desktop and mobile browsers plus a native App Store app -- with real users, real tests, and real accountability. Not a demo. A product.

Takeaway 1

AI Agents Are a Team, Not a Feature

The value isn't "AI wrote code." It's that AI can be organized with the same roles, process, and accountability as humans.

Takeaway 2

Process Is the Multiplier

Worktrees, PR reviews, CI gates, automated testing. AI without process produces chaos. AI with process produces products.

Takeaway 3

Humans Stay in the Loop

The CEO, the domain expert, the visual artist, the community manager. AI amplifies human judgment -- it doesn't replace it.

Takeaway 4

Speed Without Architecture Creates Platform Lock-In

We built web-first (with mobile-first styling) and shipped fast with AI. But when native iOS arrived, our branding, design tokens, and UX patterns were trapped in React/CSS. Moving fast amplifies blind spots -- abstract the portable parts early, or your progress becomes a single-platform cage.

deadwax.io

Built by hobbyists, collectors, and AI-assisted workflows • Designed for vinyl collectors
