Introduction
As software projects mature, internal tooling becomes critical for maintainability.
By that point, the codebase has usually been in flight for months: multiple languages, real users, real deadlines, and the usual mix of conventions and shortcuts that accumulate when the priority is shipping. This is also the moment when teams stop trying to redesign the system and start trying to keep a fast-moving codebase understandable.
On a recent project, I helped a team build a dead-code detection script using a GenAI research agent. The capability we wanted is simple to describe and hard to do well: “one command that helps find and remove dead code safely.”
This post focuses on the research phase of that work—discovering facts, constraints, and the current state of a repository before planning or implementing anything. Most of what follows is guidance for shaping research agent output so it is verifiable and complete.
The problem
GenAI can accelerate the wrong thing.
It can assemble a plausible-looking tooling script quickly, but a plausible script is not the same as a correct one.
What went wrong during research
The research phase identified detection use cases, but they were not fully fleshed out. The research agent recommended tools and commands, but did not address validation: how to confirm that those tools actually catch each use case.
The gaps surfaced later. The working definition of “dead code” was incomplete—it missed unused private members as a detection class.
What we actually used
The project relied on static analysis tools only. Runtime signals (like code coverage) were not included.
The static analysis tools used were:
- ruff – Static linter that parses Python AST to find unused imports/variables
- vulture – Static analyzer that finds unreferenced definitions by parsing code structure
- biome – Static linter that checks TypeScript/JavaScript for unused imports via AST analysis
- knip – Static analyzer that traces import/export graphs to find unused modules
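A "one command" wrapper around these four tools could look roughly like the sketch below. It is not the script from the project. The rule selection (`F401`, `F841`), the `--min-confidence` value, the `npx` invocations, and the target path `.` are all assumptions to check against the tool versions pinned in your repository.

```python
import subprocess
from typing import Callable, Dict, List

# Assumed invocations, not the project's actual commands.
COMMANDS: Dict[str, List[str]] = {
    # F401 = unused import, F841 = unused local variable
    "ruff": ["ruff", "check", "--select", "F401,F841", "."],
    "vulture": ["vulture", ".", "--min-confidence", "80"],
    "biome": ["npx", "@biomejs/biome", "lint", "."],
    "knip": ["npx", "knip"],
}


def run_command(cmd: List[str]) -> int:
    """Run one tool and return its exit code (127 if the executable is missing)."""
    try:
        return subprocess.run(cmd, check=False).returncode
    except FileNotFoundError:
        return 127


def run_all(runner: Callable[[List[str]], int] = run_command) -> Dict[str, int]:
    """Run every tool and collect exit codes so one failure does not hide the others."""
    return {name: runner(cmd) for name, cmd in COMMANDS.items()}
```

Calling `run_all()` returns a mapping from tool name to exit code. A non-zero code usually signals findings or an error, and the exact meaning differs per tool, so treat it as a prompt to read the output rather than as a verdict.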
What was missed
Biome already covered the primary linting layer for JavaScript/TypeScript, so the gap was not simply that ESLint was absent. Unused private members went undetected because the selected tools were AST-based linters and import-graph analyzers; none of them ran a full type-aware semantic pass.
The script was never updated to fix these gaps.
What we should have done: start with a verification checklist
Dead-code detection is a well-understood problem in many ecosystems. Mature tooling exists for most major languages.
In a codebase that has been in flight for a while, teams are not building a new detector from scratch—they are relying on established ecosystem tools.
That changes what a research agent should be asked for. Instead of asking an agent to design a brand-new tool, ask it to recommend the best existing tools that meet a verification checklist.
Using a verification checklist
The research agent produces a verification checklist based on detection use cases from your codebase. The checklist formalizes those use cases into testable criteria.
The process has four parts:
- Part 0: Gather detection use cases (examples of dead code in your codebase).
- Part 1: Define exclusions (what should not be flagged).
- Part 2: Define detection classes (categories of dead code to catch).
- Part 3: Create verification cases (proof that each detection class is covered).
Part 0: Gather detection use cases
Before the agent can produce a verification checklist, you need to identify concrete examples from your codebase—both what is dead code and what shouldn’t be flagged.
You can gather these use cases in two ways:
- Manual inspection: Review the codebase and document examples.
- Agent-assisted discovery: Ask an agent to analyze the codebase and identify examples.
The goal is not to find all dead code yet—that’s what the final script will do. The goal is to gather representative examples that span different detection categories and exclusion patterns. These examples become the input for Parts 1-3.
Part 1: Define exclusions
The agent should identify what is explicitly not dead code in your environment based on the codebase context you provide.
This is the part an agent will likely skip, and it is also the part that prevents tooling from turning into a false-positive generator.
The agent’s exclusions list becomes a contract between your team and the tooling.
Common exclusions the agent should identify:
- Generated code and vendored artifacts
- Feature-flagged code that can be enabled in production
- Test-only helpers and fixtures
- Dependency injection / service registration: a type name or assembly is listed in config, and the container loads it at startup
- Plugin registries and dynamic loading: a type is referenced only via a string in a registry or factory pattern
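One way to make the exclusions list behave like a contract is to keep it as data and filter tool findings through it. The sketch below assumes findings have already been parsed into a path and a symbol name. The `Finding` type, the glob patterns, and the `CsvExporter` symbol are hypothetical placeholders, not names from the project.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List

# Hypothetical patterns; replace with the exclusions agreed for your repository.
EXCLUDED_PATHS = ["generated/*", "vendor/*", "tests/fixtures/*"]
# Hypothetical symbol that is referenced only by string in a registry.
EXCLUDED_SYMBOLS = ["CsvExporter"]


@dataclass(frozen=True)
class Finding:
    path: str
    symbol: str


def is_excluded(finding: Finding) -> bool:
    """Return True when a finding matches an agreed exclusion."""
    if any(fnmatchcase(finding.path, pattern) for pattern in EXCLUDED_PATHS):
        return True
    return finding.symbol in EXCLUDED_SYMBOLS


def apply_exclusions(findings: List[Finding]) -> List[Finding]:
    """Keep only the findings that no exclusion covers."""
    return [f for f in findings if not is_excluded(f)]
```

Keeping the patterns in one reviewed place means a change to what counts as "not dead code" shows up in a diff instead of being buried in individual tool configs.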
Part 2: Define detection classes
Based on the use cases you provide, the agent should formalize them into detection classes—categories of dead code to catch, not specific tools.
The agent should identify which of these classes apply to your codebase. A practical starting set:
- Semantic unused members
- Unused exports and orphan files
- Unused dependencies
- Unused imports and locals
- Dynamic and reflective usage (to confirm it is not flagged)
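To make some of these classes concrete, here is a small illustrative Python module. Every name in it is hypothetical. It covers only the classes that fit in a single file: unused imports and locals, an unused private member, and reflective usage. Unused exports, orphan files, and unused dependencies need a multi-file fixture.

```python
import json  # unused import: never referenced below


class ReportBuilder:
    def build(self) -> str:
        draft = "unused"  # unused local: assigned but never read
        return "report"

    def _legacy_format(self) -> str:  # unused private member: nothing calls it
        return "legacy"


class Exporters:
    def export_csv(self) -> str:  # no static reference; looked up by name below
        return "csv"


def run_exporter(kind: str) -> str:
    # Reflective usage: the method name is assembled from a string at runtime,
    # so a static analyzer may report export_csv as unused.
    return getattr(Exporters(), f"export_{kind}")()
```

The first three comments mark code that should be flagged. `export_csv` is the opposite case: it is live, and a tool that flags it is producing a false positive that belongs in the exclusions list.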
Part 3: Create verification cases
For each detection class the agent identified, the agent should produce a verification case.
Each verification case has at least three pieces:
- A minimal code example that should be flagged
- The exact command that should flag it
- A clear expected result
These are not unit tests for your product. They are capability tests for your tooling.
If the agent cannot produce a verification case for a detection class, it cannot prove that recommended tools will cover it.
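The three pieces map naturally onto a small data structure plus a runner. The sketch below is one possible shape, not the project's implementation. The `VerificationCase` fields and the `{path}` placeholder convention are assumptions made for illustration.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class VerificationCase:
    detection_class: str
    filename: str
    source: str         # minimal code example that should be flagged
    command: List[str]  # exact command; "{path}" is replaced with the example file
    expected: str       # text that must appear in the tool output


def run_case(case: VerificationCase) -> bool:
    """Write the example to a temp directory, run the command, check the output."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / case.filename
        target.write_text(case.source, encoding="utf-8")
        cmd = [part.replace("{path}", str(target)) for part in case.command]
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        return case.expected in (result.stdout + result.stderr)
```

A case for unused imports might use the source `"import os\n"`, the command `["ruff", "check", "--select", "F401", "{path}"]`, and the expected text `"F401"`, assuming ruff is installed and prints rule codes in its output. If `run_case` returns `False`, the tool does not cover that class as configured.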
Verification checklist output format
For example, the agent’s verification checklist can follow this format:
| Detection class | Verification example | Command to run |
|---|---|---|
| Semantic unused members | Add an unused private method | Run semantic/type-aware check |
| Unused exports and orphan files | Export a symbol that is never imported, or create an orphan file | Run graph-based analysis |
| Unused dependencies | Add a dependency that is never imported | Run dependency analysis |
| Unused imports and locals | Add an unused import and unused local | Run linter check |
| Dynamic and reflective usage | Add a dynamically referenced symbol (e.g., plugin registry) | Run full script and verify not flagged |
The agent should also produce an exclusions table focused on what should not be flagged:
| Exclusion class | Verification example | Command to run |
|---|---|---|
| Generated code | A generated folder that should be ignored | Run full script and verify no findings |
| Plugin registry | A symbol loaded via a registry | Run full script and verify not flagged |
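If the checklist is kept as data rather than prose, it becomes easy to spot a detection class that has no verification case and to regenerate tables like the ones above. This is a minimal sketch; `ChecklistRow`, `uncovered`, and `to_markdown` are hypothetical helper names.

```python
from typing import List, NamedTuple


class ChecklistRow(NamedTuple):
    detection_class: str
    example: str
    command: str


def uncovered(required: List[str], rows: List[ChecklistRow]) -> List[str]:
    """Return the detection classes that have no verification case yet."""
    covered = {row.detection_class for row in rows}
    return [name for name in required if name not in covered]


def to_markdown(rows: List[ChecklistRow]) -> str:
    """Render the checklist in the same three-column table format."""
    lines = [
        "| Detection class | Verification example | Command to run |",
        "|---|---|---|",
    ]
    lines += [f"| {r.detection_class} | {r.example} | {r.command} |" for r in rows]
    return "\n".join(lines)
```

A non-empty result from `uncovered` is the signal to go back to the agent before accepting its tool recommendations.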
A note on inherited prompts and instruction drift
Research agents often come with baseline instructions that define how they behave—whether they implement code, what tools they can run, how they format output.
When you write a task-specific prompt, you’re layering your instructions on top of those baseline instructions.
If your prompt contradicts or doesn’t account for what the agent already knows, the agent may interpret your request in unexpected ways.
In the dead-code detection work, the research agent had baseline instructions that said “do not implement.” My kickoff prompt was specific to the use case—analyze dead code detection tools and recommend approaches. Those instructions were compatible: research without implementation.
But if I had written a prompt that assumed the agent would write and test code, or if the baseline instructions had conflicting scopes (like “only touch Python files” vs. “analyze TypeScript”), the result would have been incomplete or incorrect.
Treat this as part of verification work:
- Know what baseline instructions your research agent inherits
- Write task-specific prompts that complement, not contradict, those baseline instructions
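Both checks can be done by reading the two prompts side by side. If you prefer something mechanical, the sketch below models the two kinds of conflict described above: implementation versus research-only, and mismatched file scopes. It is a deliberately simplified model; the `Instructions` fields are assumptions, and real baseline instructions are free text that you would have to summarize into this shape by hand.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Instructions:
    may_implement: bool
    # File extensions in scope; an empty set means no restriction.
    file_types: Set[str] = field(default_factory=set)


def find_conflicts(baseline: Instructions, task: Instructions) -> List[str]:
    """List the ways a task prompt asks for more than the baseline allows."""
    conflicts = []
    if task.may_implement and not baseline.may_implement:
        conflicts.append("task expects implementation, baseline forbids it")
    if baseline.file_types:
        out_of_scope = sorted(task.file_types - baseline.file_types)
        if out_of_scope:
            conflicts.append("task needs file types outside baseline scope: " + ", ".join(out_of_scope))
    return conflicts
```

An empty list corresponds to the compatible case described above, where both the baseline and the kickoff prompt asked for research without implementation.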
Closing thought
The biggest win is not a specific tool. It is a repeatable process you can trust.
When code production accelerates, verification becomes a limiting factor. A verification checklist turns tooling prompts from a guess into a contract.
Attribution
Featured image was created by the author for this post. No third-party images, logos, or screenshots are used.