March 19, 2026 · Ryan X. Charles
Research Driven Development: Engineering AI-Assisted Software for Capital Markets
How research-first methodology turns AI coding tools into rigorous engineering infrastructure, and why that matters when you build for regulated markets.

AI coding tools have changed the economics of software engineering. Tools like Claude Code and OpenAI Codex CLI can generate thousands of lines of working code in minutes. That capability is remarkable, but it shifts the bottleneck. Producing code is no longer the hard part. Knowing what code to produce (and whether it belongs in your system at all) is.
At Ironlight, we have been adapting our engineering process around this shift. The methodology we have found most effective is one we call Research Driven Development, a term that others in the industry have also begun using to describe the move from "build first" to "research first."
Why the Stakes Are Higher in Capital Markets
Most software teams can tolerate a bad deploy. You roll back, fix the bug, ship again. When your platform settles regulated securities on-chain, the calculus is different. A subtle defect in settlement logic is a compliance event. A flaky dependency under load is operational risk. That is the bar.
This is the environment in which AI-generated code becomes genuinely dangerous if left undirected. An LLM can produce a sophisticated solution to a localized problem while simultaneously introducing a systemic vulnerability that remains invisible until the system is under real load. The volume of code AI can produce is an asset only if there is a disciplined process to validate it. Without that process, it is a liability.
The Methodology: Research Driven Development
Research Driven Development (RDD) is a methodology built around a simple observation: when you are solving problems where the right answer is not known in advance, research is more valuable than planning. Instead of writing a detailed specification and then implementing it, you define a goal, design an experiment, record the results, and use what you learned to design the next experiment.
The methodology has four pillars:
- Goal, not plan. Define what you need to achieve without presuming you know how to achieve it. Plans assume knowledge you may not have. Goals give direction without constraining the solution space.
- Experiments, not tasks. Break work into sequential experiments with tight feedback loops. Each experiment produces knowledge, whether it succeeds or fails.
- Immutable documentation. Record everything: the research, the experiments, the results, the reasoning. Never revise history. This creates an engineering record that functions like a lab notebook.
- AI as collaborator. AI tools participate across all phases: researching options, generating experimental code, reviewing results, and documenting findings. This compresses cycle times without sacrificing depth.
RDD is not a replacement for established practices like test-driven development or code review. It is a layer above them: a framework for deciding what to build before you build it.
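To make the immutable-documentation pillar concrete, here is a minimal TypeScript sketch of what an append-only experiment record could look like. The shape and field names are illustrative assumptions, not Ironlight's actual log format.

```typescript
// Hypothetical shape of an immutable experiment record.
interface ExperimentRecord {
  readonly id: string;                  // e.g. "db-migration-phase-07"
  readonly goal: string;                // what the experiment set out to learn
  readonly method: string;              // how it was run
  readonly findings: readonly string[]; // observed results, success or failure
  readonly decision: string;            // what the team chose to do next
  readonly recordedAt: string;          // ISO timestamp; entries are append-only
}

// History is never revised: corrections are appended as new records that
// reference the original, so the trail of reasoning stays intact.
function record(log: readonly ExperimentRecord[], entry: ExperimentRecord): ExperimentRecord[] {
  return [...log, entry];
}
```

The append-only constraint is the point: failed experiments stay on the record, and corrections arrive as new entries rather than edits to old ones.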
What This Looks Like in Practice
Theory is only useful if it holds up under real engineering constraints. Here are three examples from Ironlight's recent work where RDD shaped the outcome.
Selecting a database migration tool. When we needed to add versioned migrations to our schema management, we did not adopt the most popular tool and move on. We evaluated seven candidates across ten structured phases. Each phase produced documented findings: compatibility with our MySQL trigger syntax, behavior under schema drift, CLI ergonomics, CI integration characteristics. The field narrowed from seven to three to one. The tool we selected, Atlas, was chosen not because it was the obvious default, but because ten phases of evidence demonstrated it was the right fit for our specific constraints.
Building a deterministic FIX testing framework. Our matching engine processes orders over the FIX protocol, and we needed a testing framework that could verify matching behavior with full determinism. Through a series of structured experiments, we built approximately 98,500 lines of testing infrastructure: seeded test generators that produce repeatable client sessions, weighted playbooks for randomized-but-reproducible action sequences, and snapshot-based regression detection. Each experiment validated a specific layer of the framework before the next was built on top of it. The result is a CI pipeline that can verify every production code path against baseline snapshots, catching regressions automatically before they reach deployment.
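As a rough illustration of the seeded, weighted playbook idea, here is a minimal TypeScript sketch. The action names, weights, and choice of PRNG (mulberry32) are assumptions made for the example, not details of Ironlight's framework.

```typescript
// A deterministic PRNG drives weighted action selection, so the same seed
// always replays the same client session.
type Action = "newOrder" | "cancel" | "replace";

// mulberry32: a small, well-known deterministic PRNG. Same seed, same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Weighted playbook: randomized but reproducible action sequences.
function playbook(seed: number, steps: number, weights: Record<Action, number>): Action[] {
  const rand = mulberry32(seed);
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  const actions: Action[] = [];
  for (let i = 0; i < steps; i++) {
    let r = rand() * total;
    for (const action of Object.keys(weights) as Action[]) {
      r -= weights[action];
      if (r <= 0) {
        actions.push(action);
        break;
      }
    }
  }
  return actions;
}

// Re-running with the same seed yields the identical sequence, so any failing
// session can be replayed exactly.
console.log(playbook(42, 8, { newOrder: 6, cancel: 3, replace: 1 }));
```

Snapshot-based regression detection builds on the same property: when every input is a pure function of the seed, any drift in output is attributable to a code change.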
Migrating 155 components from Svelte 4 to Svelte 5. Svelte 5 is not backwards compatible with Svelte 4. The Svelte team provides an automated migration tool, but it only handled 129 of our 155 components, leaving 26 that required manual intervention with no clear path forward. This is exactly the kind of problem RDD is designed for: a known goal, an unknown solution. The first experiment assessed feasibility and identified the gap. Subsequent experiments tackled the remaining components, resolved dependency conflicts, validated the async compiler, and exercised critical user flows end to end. Each phase was documented, each decision recorded. The migration shipped cleanly because every edge case had been identified and addressed in sequence, not discovered in production.
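For readers who have not touched the framework, this hypothetical component shows the shape of the rewrite: Svelte 4 props, implicit state, and reactive statements become Svelte 5 runes.

```svelte
<script lang="ts">
  // Svelte 4:
  //   export let price: number;   // prop
  //   let qty = 1;                // implicitly reactive state
  //   $: notional = price * qty;  // reactive statement
  //
  // Svelte 5 equivalent, using runes:
  let { price }: { price: number } = $props();
  let qty = $state(1);
  const notional = $derived(price * qty);
</script>

<input type="number" bind:value={qty} />
<p>Notional: {notional}</p>
```

One-to-one rewrites like this are what the automated tool handles; the phased experiments dealt with the 26 components it could not.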
The Tools That Make It Practical
RDD as a methodology predates modern AI tools, but those tools are what make it practical at the pace capital markets engineering demands. Claude Code and OpenAI Codex CLI are not autocomplete engines. They are research infrastructure.
A tool like Claude Code can read an entire codebase, understand the relationships between components, and produce structured analysis of a specific question ("What are the implications of replacing this dependency?" or "Where does this schema diverge from the ORM model?") in minutes. It can generate experimental code, run it against the existing test suite, and report the results. It can review a pull request not just for style violations but for semantic issues: authentication gaps, race conditions, arithmetic that will overflow under production data volumes.
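As one concrete instance of that last category, here is a TypeScript sketch of how integer arithmetic in JavaScript silently degrades past Number.MAX_SAFE_INTEGER. The quantities are invented for illustration; the bug class is the kind of thing a semantic review pass can surface.

```typescript
// IEEE-754 doubles cannot distinguish adjacent integers above 2^53 - 1.
console.log(2 ** 53 === 2 ** 53 + 1); // true: adjacent integers collide

// A notional expressed in minor units can cross that threshold at production
// data volumes. BigInt keeps the arithmetic exact.
const qty = 12_345_678_901n;        // quantity (hypothetical)
const priceMinor = 987_655n;        // price in minor units (hypothetical)
const notional = qty * priceMinor;  // exact product

// The same arithmetic with plain numbers rounds to the nearest double:
const approx = Number(qty) * Number(priceMinor);
console.log(BigInt(approx) === notional); // false: the exact value was rounded away
```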
This compresses the RDD cycle. An experiment that would take a day of manual investigation (reading documentation, tracing code paths, testing edge cases) can often be completed in a focused session lasting minutes. The rigor is the same. The speed is different by an order of magnitude.
Critically, the AI does not make the decisions. It produces the research that informs them. The engineer still defines the goal, designs the experiment, evaluates the findings, and decides what to do next. The tool handles the volume; the engineer provides the judgment.
Every Decision Has a Paper Trail
This is where RDD pays its deepest dividend, one that matters especially in regulated markets.
When we evaluated seven migration tools over ten phases, we did not just select Atlas. We produced a permanent record of why the other six were eliminated, each with specific, documented reasons tied to our requirements. No engineer will need to repeat that evaluation. No future architect will wonder why we did not choose sqlx or golang-migrate. The reasoning is there, in writing, permanently.
When we built our FIX testing framework, every layer of the design was documented as it was validated: why we chose seeded generation over random fuzzing, how we achieved deterministic snapshot comparisons, what trade-offs we made in the playbook weighting. An engineer onboarding onto the ATS codebase can read the experiment log and understand not just how the testing infrastructure works, but why it was built that way.
When we migrated to Svelte 5, the phased experiment record captured every decision: which components the automated tool could handle, which required manual intervention and why, what dependency conflicts emerged and how they were resolved. If a question arises about a specific migration choice six months from now, the answer is not buried in a commit message or lost in someone's memory. It is in the experiment log, with context.
The best findings do not just stay in experiment logs. They get operationalized.
We distill critical lessons into AGENTS.md files that live alongside the code they govern. When our migration tool evaluation concluded that Atlas migration files should never be edited by hand, only generated via atlas migrate diff, that rule went into an AGENTS.md file. Now every AI-assisted session in that part of the codebase follows it automatically. The research produced a guardrail, and the guardrail is enforced without anyone having to remember it.
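A minimal sketch of what such a guardrail can look like, assuming a conventional migrations/ directory (the excerpt is hypothetical; only the rule about atlas migrate diff comes from the evaluation described above):

```md
## Migrations

- Never edit files under migrations/ by hand.
- Make schema changes in the declarative schema, then regenerate migration
  files with `atlas migrate diff`.
- If a generated migration looks wrong, fix the schema and regenerate; do not
  patch the migration file.
```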
We maintain these experiment logs in a dedicated, version-controlled repository linked to the main codebase. Every issue, every experiment, every finding is structured, searchable, and permanent. For an ATS operating under Regulation SCI, where the SEC requires documented policies for system changes and development practices, this kind of engineering record is not just good practice. It is a compliance asset.
This is what RDD produces beyond working software: auditable engineering judgment. Not just what we built, but why we built it that way, what we considered and rejected, and what evidence informed each decision. For infrastructure that handles regulated financial instruments, this is not a nice-to-have. It is the standard that clients, partners, and regulators should expect. And it is the standard we hold ourselves to.
AI Makes Precision Affordable
There is a reason most software teams do not do this level of research before writing code. Historically, it was too expensive. Evaluating seven tools when you could just pick the popular one and start building? That is a luxury most teams cannot justify when they are shipping against a deadline.
AI changes that equation. When a tool like Claude Code can complete in an afternoon a dependency evaluation that would otherwise take a week of manual research, thoroughness stops being a luxury and becomes the default. When rewriting an approach that turns out to be suboptimal takes hours instead of days, the emotional and economic cost of being wrong on the first try drops to near zero. You stop clinging to sunk costs and start optimizing for correctness.
This is perhaps the least obvious benefit of AI in engineering: it does not just make teams faster at building. It makes teams faster at thinking. And when thinking is fast, you can afford to do more of it before you commit to a line of code.
Engineering for the Long Term
The teams that will struggle with AI are not the ones writing less code. They are the ones writing code without knowing why. Generating output has never been easier. Generating directed output (code that belongs in the system, that addresses a validated need, that has been evaluated against alternatives) still requires engineering discipline.
Research Driven Development is one approach to providing that discipline. It treats the research phase not as a preliminary step before the real work begins, but as the most valuable part of the engineering process. It uses AI to make that research rigorous and fast. And it produces, as a byproduct, the kind of documented, auditable engineering record that capital markets infrastructure demands.
We use these tools every day. They make us more thorough, more precise, and more deliberate. The code we ship is better because we spent more time understanding the problem before we wrote a single line. That is what Research Driven Development is about, and it is how we intend to keep building.