Verification-driven tooling prompts for fast-moving codebases

Introduction

As software projects mature, internal tooling becomes critical for maintainability.

By that point, the codebase has usually been in flight for months: multiple languages, real users, real deadlines, and the usual mix of conventions and shortcuts that accumulate when the priority is shipping. This is also the moment when teams stop trying to redesign the system and start trying to keep a fast-moving codebase understandable.

On a recent project, I helped a team build a dead-code detection script using a GenAI research agent. The capability we wanted is simple to describe and hard to do well: “one command that helps find and remove dead code safely.”

This post focuses on the research phase of that work—discovering facts, constraints, and the current state of a repository before planning or implementing anything. Most of what follows is guidance for shaping research agent output so it is verifiable and complete.

The problem

GenAI can accelerate the wrong thing.

It can assemble a plausible-looking tooling script quickly, but a plausible script is not the same as a correct one.

What went wrong during research

The research phase identified detection use cases, but they were not fully fleshed out. The research agent recommended tools and commands, but it did not say how to validate that those tools actually caught each use case.

The gaps surfaced later. The working definition of “dead code” was incomplete—it missed unused private members as a detection class.

What we actually used

The project relied on static analysis tools only. Runtime signals (like code coverage) were not included.

The static analysis tools used were:

ruff – Static linter that parses Python AST to find unused imports/variables
vulture – Static analyzer that finds unreferenced definitions by parsing code structure
biome – Static linter that checks TypeScript/JavaScript for unused imports via AST analysis
knip – Static analyzer that traces import/export graphs to find unused modules
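
As a rough illustration of how these four tools might sit behind one command, the sketch below lists one plausible invocation per tool and runs whichever ones are installed. The specific flags, the rule codes passed to ruff, the confidence threshold for vulture, and the use of npx are assumptions for illustration, not the exact commands from the project.

```python
import shutil
import subprocess
from typing import Optional

# Assumed invocations; adjust flags and paths to your repository layout.
STATIC_CHECKS = {
    # F401 = unused import, F841 = unused local variable
    "ruff": ["ruff", "check", "--select", "F401,F841", "."],
    # The confidence threshold of 80 is an arbitrary starting point.
    "vulture": ["vulture", ".", "--min-confidence", "80"],
    "biome": ["npx", "@biomejs/biome", "lint", "."],
    "knip": ["npx", "knip"],
}


def run_static_checks(checks=STATIC_CHECKS) -> dict[str, Optional[int]]:
    """Run each check and return its exit code, or None if its launcher is missing."""
    results: dict[str, Optional[int]] = {}
    for name, command in checks.items():
        if shutil.which(command[0]) is None:
            results[name] = None  # tool (or npx) is not installed on this machine
            continue
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = completed.returncode
    return results
```

Returning None for a missing tool, rather than treating it as a pass, keeps "not checked" visibly distinct from "checked and clean."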

What was missed

Biome already covered the primary linting layer for JavaScript/TypeScript, so the bigger gap was not simply the absence of ESLint. Unused private members went undetected because the selected tools were AST-based linters and import-graph analyzers; none of them ran a full type-aware semantic pass.

The script was never updated to fix these gaps.

What we should have done: start with a verification checklist

Dead-code detection is a well-understood problem in many ecosystems. Mature tooling exists for most major languages.

In a codebase that has been in flight for a while, teams are not building a new detector from scratch. They are relying on established ecosystem tools.

That changes what a research agent should be asked for. Instead of asking an agent to design a brand-new tool, ask it to recommend the best existing tools that meet a verification checklist.

Using a verification checklist

The research agent produces a verification checklist based on detection use cases from your codebase. The checklist formalizes those use cases into testable criteria.

The process has four parts:

Part 0: gather detection use cases (examples of dead code, and of code that only looks dead, in your codebase).
Part 1: define exclusions (what should not be flagged).
Part 2: define detection classes (categories of dead code to catch).
Part 3: create verification cases (proof that each detection class is covered).

Part 0: Gather detection use cases

Before the agent can produce a verification checklist, you need to identify concrete examples from your codebase—both what is dead code and what shouldn’t be flagged.

You can gather these use cases in two ways:

Manual inspection: Review the codebase and document examples.

Agent-assisted discovery: Ask an agent to analyze the codebase and identify examples.

The goal is not to find all dead code yet—that’s what the final script will do. The goal is to gather representative examples that span different detection categories and exclusion patterns. These examples become the input for Parts 1-3.

Part 1: Define exclusions

The agent should identify what is explicitly not dead code in your environment based on the codebase context you provide.

This is the part an agent will likely skip, and it is also the part that prevents tooling from turning into a false-positive generator.

The agent’s exclusions list becomes a contract between your team and the tooling.

Common exclusions the agent should identify:

Generated code and vendored artifacts
Feature-flagged code that can be enabled in production
Test-only helpers and fixtures
Dependency injection / service registration: a type name or assembly is listed in config, and the container loads it at startup
Plugin registries and dynamic loading: a type is referenced only via a string in a registry or factory pattern
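
One way to make the exclusions list behave like a contract is to keep it as data and apply it to whatever the tools report. The sketch below is a minimal illustration; the Finding shape, the path patterns, and the CsvExportPlugin symbol are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Finding:
    path: str    # file the tool reported
    symbol: str  # name the tool considers unused


# Hypothetical exclusions; in practice these come from the agreed exclusions list.
EXCLUDED_PATHS = ["generated/*", "vendor/*", "tests/fixtures/*"]
EXCLUDED_SYMBOLS = {"CsvExportPlugin"}  # loaded by name from a plugin registry


def apply_exclusions(findings, excluded_paths=EXCLUDED_PATHS,
                     excluded_symbols=EXCLUDED_SYMBOLS):
    """Drop findings that match an excluded path pattern or an excluded symbol."""
    kept = []
    for finding in findings:
        if any(fnmatch(finding.path, pattern) for pattern in excluded_paths):
            continue
        if finding.symbol in excluded_symbols:
            continue
        kept.append(finding)
    return kept
```

Because the exclusions are plain data, they can be reviewed in a pull request like any other change to the contract.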

Part 2: Define detection classes

Based on the use cases you provide, the agent should formalize them into detection classes—categories of dead code to catch, not specific tools.

The agent should identify which of these classes apply to your codebase. A practical starting set:

Semantic unused members
Unused exports and orphan files
Unused dependencies
Unused imports and locals
Dynamic and reflective usage (verified as not flagged)
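
To make these classes concrete, here is a small hypothetical Python module with one instance of most of them. Every name in it is invented for illustration. Unused dependencies are not shown because they live in the package manifest rather than in source.

```python
import json  # unused import: nothing below reads `json`


def total(prices):
    discount = 0.1  # unused local: assigned but never read
    return sum(prices)


class Report:
    def render(self):
        return self._header() + "body"

    def _header(self):
        return "title: "

    def _legacy_footer(self):  # unused private member: no caller anywhere
        return "footer"


def export_xml(rows):  # unused export, assuming no other module imports it
    return "".join(f"<row>{row}</row>" for row in rows)


def export_csv(rows):  # dynamic usage: referenced only by name in HANDLERS
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)


HANDLERS = {"csv": "export_csv"}


def dispatch(kind, rows):
    # Looks the handler up by string, so static tools see no direct reference.
    return globals()[HANDLERS[kind]](rows)
```

A tool that flags export_csv here is producing a false positive, which is why the last class is verified by confirming the symbol is not flagged.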

Part 3: Create verification cases

For each detection class the agent identified, the agent should produce a verification case.

Each verification case has at least three pieces:

A minimal code example that should be flagged
The exact command that should flag it
A clear expected result

These are not unit tests for your product. They are capability tests for your tooling.

If the agent cannot produce a verification case for a detection class, it cannot prove that recommended tools will cover it.
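
A verification case can be represented as a small record plus a runner that writes the example to a temporary file, runs the command, and looks for the expected text in the output. This is a sketch under assumptions: the "{fixture}" placeholder convention is invented here, and the vulture command in the sample case is an assumed invocation, not a recommendation from the research agent.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VerificationCase:
    detection_class: str
    fixture_name: str    # file name for the minimal code example
    fixture_source: str  # the code that should be flagged
    command: list        # "{fixture}" is replaced with the fixture's path
    expected: str        # text that must appear in the tool's output


def run_case(case: VerificationCase) -> bool:
    """Return True only if the command ran and its output contains `expected`."""
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp) / case.fixture_name
        fixture.write_text(case.fixture_source, encoding="utf-8")
        command = [part.replace("{fixture}", str(fixture)) for part in case.command]
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError:
            return False  # a tool that is not installed proves nothing
        return case.expected in completed.stdout + completed.stderr


# Sample case; the command is an assumption about how the tool would be invoked.
UNUSED_PRIVATE_METHOD = VerificationCase(
    detection_class="Semantic unused members",
    fixture_name="report.py",
    fixture_source="class Report:\n    def _legacy_footer(self):\n        return 'footer'\n",
    command=["vulture", "{fixture}"],
    expected="_legacy_footer",
)
```

Calling run_case(UNUSED_PRIVATE_METHOD) on a machine with the tool installed shows whether that tool actually covers the class, rather than whether it is documented to.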

Verification checklist output format

The agent’s verification checklist can follow a format like this:

Detection class	Verification example	Command to run
Semantic unused members	Add an unused private method	Run semantic/type-aware check
Unused exports and orphan files	Export a symbol that is never imported, or create an orphan file	Run graph-based analysis
Unused dependencies	Add a dependency that is never imported	Run dependency analysis
Unused imports and locals	Add an unused import and unused local	Run linter check
Dynamic and reflective usage	Add a dynamically referenced symbol (e.g., plugin registry)	Run full script and verify not flagged

The agent should also produce an exclusions table focused on what should not be flagged:

Exclusion class	Verification example	Command to run
Generated code	A generated folder that should be ignored	Run full script and verify no findings
Plugin registry	A symbol loaded via a registry	Run full script and verify not flagged
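
Once the checklist exists as rows, a short check can report which detection classes still lack an example or an exact command. The sketch below uses hypothetical rows; the commands and the dead_code.py script name are assumptions, and the empty command for semantic unused members mirrors the kind of gap described earlier.

```python
DETECTION_CLASSES = [
    "Semantic unused members",
    "Unused exports and orphan files",
    "Unused dependencies",
    "Unused imports and locals",
    "Dynamic and reflective usage",
]

# Hypothetical agent output; commands are assumed, not taken from the project.
CHECKLIST = [
    {"detection_class": "Semantic unused members",
     "example": "Add an unused private method", "command": ""},
    {"detection_class": "Unused exports and orphan files",
     "example": "Export a symbol that is never imported", "command": "npx knip"},
    {"detection_class": "Unused dependencies",
     "example": "Add a dependency that is never imported", "command": "npx knip"},
    {"detection_class": "Unused imports and locals",
     "example": "Add an unused import and unused local",
     "command": "ruff check --select F401,F841 ."},
    {"detection_class": "Dynamic and reflective usage",
     "example": "Add a symbol loaded via a plugin registry",
     "command": "python dead_code.py"},
]


def uncovered_classes(detection_classes, checklist):
    """List detection classes with no row that has both an example and a command."""
    covered = {
        row["detection_class"]
        for row in checklist
        if row.get("example") and row.get("command")
    }
    return [name for name in detection_classes if name not in covered]


print(uncovered_classes(DETECTION_CLASSES, CHECKLIST))
```

An empty list means every class has a candidate verification case; it does not mean the cases pass, which still requires running them.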

A note on inherited prompts and instruction drift

Research agents often come with baseline instructions that define how they behave—whether they implement code, what tools they can run, how they format output.

When you write a task-specific prompt, you’re layering your instructions on top of those baseline instructions.

If your prompt contradicts or doesn’t account for what the agent already knows, the agent may interpret your request in unexpected ways.

In the dead-code detection work, the research agent had baseline instructions that said “do not implement.” My kickoff prompt was specific to the use case—analyze dead code detection tools and recommend approaches. Those instructions were compatible: research without implementation.

But if I had written a prompt that assumed the agent would write and test code, or if the baseline instructions had conflicting scopes (like “only touch Python files” vs. “analyze TypeScript”), the result would have been incomplete or incorrect.

Treat this as part of verification work:

Know what baseline instructions your research agent inherits
Write task-specific prompts that complement, not contradict, those baseline instructions

Closing thought

The biggest win is not a specific tool. It is a repeatable process you can trust.

When code production accelerates, verification becomes a limiting factor. A verification checklist turns tooling prompts from a guess into a contract.

Attribution

Featured image was created by the author for this post. No third-party images, logos, or screenshots are used.
