{"id":2085,"date":"2026-04-01T04:24:38","date_gmt":"2026-04-01T04:24:38","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/all-things-azure\/?p=2085"},"modified":"2026-04-01T04:30:05","modified_gmt":"2026-04-01T04:30:05","slug":"project-nighthawk-a-research-agent-built-for-field-engineering","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/all-things-azure\/project-nighthawk-a-research-agent-built-for-field-engineering\/","title":{"rendered":"Project Nighthawk: A Research Agent Built for Field Engineering"},"content":{"rendered":"<p align=\"center\"><a href=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2026\/04\/nighthawk-cover-small.webp\"><img decoding=\"async\" class=\"aligncenter size-medium wp-image-2095\" src=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2026\/04\/nighthawk-cover-small-300x289.webp\" alt=\"nighthawk cover small image\" width=\"300\" height=\"289\" srcset=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2026\/04\/nighthawk-cover-small-300x289.webp 300w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2026\/04\/nighthawk-cover-small-768x740.webp 768w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2026\/04\/nighthawk-cover-small-24x24.webp 24w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2026\/04\/nighthawk-cover-small.webp 1023w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p style=\"text-align: left;\" align=\"center\">If you work in field engineering, you know the scenario. A customer is deploying AKS in a regulated environment. They hit an issue during node bootstrapping. They want to know exactly what happens when a node joins the cluster, which components run in which order, and whether the behaviour they&#8217;re seeing is expected. The question sounds simple. 
The answer is not.<\/p>\n<p class=\"code-line\" dir=\"auto\" style=\"text-align: left;\" data-line=\"12\">The answer is spread across half a dozen places. It&#8217;s in the source code: AgentBaker, the node controller, cloud-provider-azure. It&#8217;s in a Microsoft Learn article that&#8217;s technically correct but three levels of abstraction above what actually runs on the node. It&#8217;s in release notes buried in a changelog. It&#8217;s in the institutional knowledge of a teammate who worked on that feature and may or may not be on Teams right now. Assembling a reliable answer means pulling all of that together, reconciling it, and communicating it clearly, ideally in writing that someone else can use later.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"14\">That&#8217;s the job. And it doesn&#8217;t scale.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"16\">The knowledge required to do field engineering well grows faster than any individual can absorb it. Services change. Networking models evolve. Identity patterns that were best practice eighteen months ago have been superseded. The expertise exists in the ecosystem (in repos, docs, release notes, and the people who built the thing), but the cost of retrieving it, correlating it, and turning it into something actionable is high. And when you do find the answer, it usually ends up in your head or in a Teams thread that ages out in a week.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"18\">Project Nighthawk is built to close that gap. 
Ray Kao and I built it specifically for our work as Global Black Belts: the AKS and ARO questions we field every week require a level of depth and precision that general-purpose AI assistants consistently fall short on.<\/p>\n<h2 id=\"what-nighthawk-is\" class=\"code-line\" dir=\"auto\" data-line=\"20\">What Nighthawk Is<\/h2>\n<p dir=\"auto\" data-line=\"22\"><iframe title=\"Project Nighthawk: AI Research Agents That Make You a Better Azure Solution Engineer\" src=\"\/\/www.youtube.com\/embed\/EOAF84FzJlc\" width=\"1900\" height=\"771\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\" data-mce-fragment=\"1\"><\/iframe>Nighthawk is a multi-agent research system built directly inside VS Code with GitHub Copilot. The core idea is simple: field expertise is not just about knowing things; it&#8217;s about being able to retrieve, verify, and communicate the right things quickly. Nighthawk handles the retrieval and verification so you can focus on the communication and the judgment.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"24\">You ask it a technical question about AKS or Azure Red Hat OpenShift, and it produces a fact-checked, source-cited technical report in markdown. Not a summary of what a language model remembers from training data. An actual investigation: source code read, official documentation consulted, claims verified, findings written up. 
The kind of report a senior engineer would produce after two hours of focused research, delivered in a fraction of the time.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"26\">The entry point is intentionally simple:<\/p>\n<pre><code class=\"code-line\" dir=\"auto\" data-line=\"28\">\/Nighthawk How does AKS implement KMS encryption with customer-managed keys?\r\n<\/code><\/pre>\n<p class=\"code-line\" dir=\"auto\" data-line=\"32\">Behind that single command is a six-agent pipeline that classifies the question, researches it against live source code and official documentation, synthesizes findings into a structured report, and validates every claim before it lands in the\u00a0<code>notes\/<\/code>\u00a0directory.<\/p>\n<h2 id=\"the-problem-with-asking-ai-to-research\" class=\"code-line\" dir=\"auto\" data-line=\"34\">The Problem with Asking AI to Research<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"36\">Asking a language model directly runs into a predictable problem. LLMs are excellent at synthesizing patterns from training data, but Azure is a moving target. Source code changes. Features ship. Behaviors differ between versions. A model trained six months ago may confidently describe a code path that was refactored in the last release. For general background knowledge, this is usually fine. For the kind of precise, version-specific, behaviorally accurate answers that field engineering demands, it is not.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"38\">The solution is grounding. Nighthawk doesn&#8217;t ask the model to recall what it knows about AKS kubelet bootstrapping. It directs researcher agents to search the locally cloned AgentBaker repository, read the relevant code, cross-reference the Microsoft Learn documentation, and report what they actually find. Source code is one input. Official docs are another. Release notes and changelogs are another. The researcher correlates all of them and surfaces conflicts when they exist. 
That&#8217;s closer to how a good engineer actually investigates a problem, and it produces answers that hold up when a customer asks a follow-up.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"40\">This required a specific architectural choice: Nighthawk researchers operate against locally cloned repositories. Before a research session, you run a one-time setup:<\/p>\n<pre><code class=\"code-line language-bash\" dir=\"auto\" data-line=\"42\">git <span class=\"hljs-built_in\">clone<\/span> --depth=1 https:\/\/github.com\/Azure\/AgentBaker.git repos\/AgentBaker\r\ngit <span class=\"hljs-built_in\">clone<\/span> --depth=1 https:\/\/github.com\/Azure\/AKS.git repos\/AKS\r\ngit <span class=\"hljs-built_in\">clone<\/span> --depth=1 https:\/\/github.com\/kubernetes-sigs\/cloud-provider-azure.git repos\/cloud-provider-azure\r\n<\/code><\/pre>\n<p class=\"code-line\" dir=\"auto\" data-line=\"48\">Before each research run, the researcher agent pulls the latest changes into its local clones, for example:<\/p>\n<pre><code class=\"code-line language-bash\" dir=\"auto\" data-line=\"50\">git -C repos\/AgentBaker pull --ff-only\r\n<\/code><\/pre>\n<p class=\"code-line\" dir=\"auto\" data-line=\"54\">The model is now working against the actual current state of the codebase, not a memory of it.<\/p>\n<h2 id=\"six-agents-one-pipeline\" class=\"code-line\" dir=\"auto\" data-line=\"56\">Six Agents, One Pipeline<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"58\">Nighthawk implements the\u00a0<strong>Agent Handoff Pattern<\/strong>\u00a0as described in the\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/ai-ml\/guide\/ai-agent-design-patterns#agent-handoff-pattern-example\" data-href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/ai-ml\/guide\/ai-agent-design-patterns#agent-handoff-pattern-example\">Azure Architecture Center AI Agent Design Patterns guide<\/a>: specialized agents complete distinct tasks and pass results to the next agent through well-defined contracts. 
No single agent tries to do everything; each one is scoped, and the quality of the final output depends on that separation.<\/p>\n<pre><code class=\"code-line\" dir=\"auto\" data-line=\"60\">\/Nighthawk question\r\n        |\r\n        v\r\n  [Orchestrator]   &lt;-- coordinates the workflow\r\n        |\r\n        v\r\n  [Classifier]     &lt;-- AKS or ARO? what question type?\r\n        |\r\n        v\r\n  [Researcher]     &lt;-- AKS or ARO researcher: searches local repos + Microsoft Learn\r\n        |\r\n        v\r\n  [Synthesizer]    &lt;-- writes the structured report\r\n        |\r\n        v\r\n  [FactChecker]    &lt;-- validates every claim against sources\r\n        |\r\n        v\r\n  notes\/Nighthawk-&lt;date&gt;-&lt;topic&gt;.md\r\n<\/code><\/pre>\n<p class=\"code-line\" dir=\"auto\" data-line=\"82\"><strong>Orchestrator<\/strong>\u00a0coordinates the run. It reads the question, invokes the Classifier, routes to the right Researcher, hands off to the Synthesizer, and triggers the FactChecker. It&#8217;s the glue.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"84\"><strong>Classifier<\/strong>\u00a0determines which service the question is about (AKS or ARO) and what type of question it is (architecture, bug, guidance), and it extracts keywords that researchers use to focus their search. This matters because AKS and ARO have different source repos, different team structures, and very different implementation patterns.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"86\"><strong>Researchers<\/strong>\u00a0(separate agents for AKS and ARO) do the heavy lifting. They read the Nighthawk-LocalRepos skill to understand exactly which repositories to search and in which order. They use\u00a0<code>grep_search<\/code>\u00a0to find relevant code,\u00a0<code>read_file<\/code>\u00a0to examine it in depth, and the Microsoft Learn MCP server to pull in official documentation. 
MCP (Model Context Protocol) is what gives the researcher agent structured, tool-mediated access to external knowledge sources without leaving the VS Code context. The output is structured research notes, not a report; the report comes later.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"88\"><strong>Synthesizer<\/strong>\u00a0takes the research notes and writes the actual report. Before writing, it reads the Nighthawk-ReportTemplates skill, which defines three report formats (architecture, bug, guidance) with specific sections: TL;DR, Technical Deep Dive, Key Findings, and References. Where a concept benefits from a visual (a flow, a component relationship, a decision tree), the Synthesizer generates a Mermaid diagram inline. The structure is consistent because the template is encoded, not left to model discretion.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"90\"><strong>FactChecker<\/strong>\u00a0is the quality gate. It reads the finished report and validates each factual claim against the cited sources. Claims that can be verified get a checkmark. Claims that can&#8217;t be verified get flagged. 
The summary includes a count of verified and unverified claims so the person sharing the report with a customer knows exactly which claims to double-check before sending it.<\/p>\n<h2 id=\"what-a-report-looks-like\" class=\"code-line\" dir=\"auto\" data-line=\"92\">What a Report Looks Like<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"94\">The output for a question like &#8220;What are the required permissions for Terraform-based AKS deployment?&#8221; looks like this:<\/p>\n<ul class=\"code-line\" dir=\"auto\" data-line=\"96\">\n<li class=\"code-line\" dir=\"auto\" data-line=\"96\"><strong>TL;DR<\/strong>: One or two sentences with the direct answer, no hedging<\/li>\n<li class=\"code-line\" dir=\"auto\" data-line=\"97\"><strong>Recommendations table<\/strong>: Specific roles mapped to scope and reason<\/li>\n<li class=\"code-line\" dir=\"auto\" data-line=\"98\"><strong>Terraform examples<\/strong>: Real HCL with working configuration patterns<\/li>\n<li class=\"code-line\" dir=\"auto\" data-line=\"99\"><strong>Feature-specific guidance<\/strong>: What changes when you add BYO VNet, private DNS, ACR, or workload identity<\/li>\n<li class=\"code-line\" dir=\"auto\" data-line=\"100\"><strong>Mermaid diagrams<\/strong>: Architecture flows, component relationships, and decision trees rendered inline where they add clarity<\/li>\n<li class=\"code-line\" dir=\"auto\" data-line=\"101\"><strong>Reference table<\/strong>: Complete list of Microsoft Learn links and GitHub file paths used<\/li>\n<li class=\"code-line\" dir=\"auto\" data-line=\"102\"><strong>Fact-check summary<\/strong>: Claim counts and any flagged items for review<\/li>\n<\/ul>\n<p class=\"code-line\" dir=\"auto\" data-line=\"104\">You can see a full example in the\u00a0<a href=\"https:\/\/github.com\/microsoftgbb\/project-nighthawk\/blob\/HEAD\/notes\/Nighthawk-2026-03-31-AKS-Terraform-Permissions.md\" data-href=\"..\/notes\/Nighthawk-2026-03-31-AKS-Terraform-Permissions.md\">notes\/ 
directory<\/a>. That report was generated by Nighthawk in a single run from the\u00a0<code>\/Nighthawk<\/code>\u00a0command.<\/p>\n<h2 id=\"skills-encoding-expertise-as-reusable-instructions\" class=\"code-line\" dir=\"auto\" data-line=\"106\">Skills: Encoding Expertise as Reusable Instructions<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"108\">One of the more interesting design choices in Nighthawk is the use of VS Code agent skills to codify workflow knowledge. Skills are markdown files that agents read at the start of a run to understand their operating context.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"110\">The\u00a0<code>Nighthawk-LocalRepos<\/code>\u00a0skill tells every researcher agent exactly which repositories exist locally, what each one is for, and why running\u00a0<code>git pull<\/code>\u00a0before research is mandatory. Agents don&#8217;t need this information in their system prompt; they load it on demand, which keeps the core agent definitions focused.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"112\">The\u00a0<code>Nighthawk-ReportTemplates<\/code>\u00a0skill gives the Synthesizer the exact structure for each report type, writing guidelines, and Mermaid diagram conventions. The result is a consistent report structure across every research run, regardless of which question was asked.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"114\">This pattern generalizes well beyond Nighthawk. Skills are a clean way to separate durable domain knowledge from the agent definition itself. The agent knows how to reason; the skill tells it what to reason about in this specific context.<\/p>\n<h2 id=\"what-nighthawk-is-not\" class=\"code-line\" dir=\"auto\" data-line=\"116\">What Nighthawk Is Not<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"118\">Nighthawk is not a general-purpose AI assistant for Azure. 
It&#8217;s a research pipeline designed for a specific use case: a field engineer needs a deep technical report on a narrow AKS or ARO topic, and they need it grounded in verifiable sources. It doesn&#8217;t replace the judgment that comes from years of working with the platform; it gives that judgment better raw material to work with.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"120\">It doesn&#8217;t browse the web. It doesn&#8217;t query live Azure APIs. And it&#8217;s deliberately scoped to AKS and ARO because building a quality system for a focused domain is more useful than building a mediocre one for everything. Field expertise is domain-specific, and so is Nighthawk.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"122\">Adding support for a new Azure service means creating a new researcher agent following the existing pattern. The architecture is designed for that kind of extension; each researcher is isolated and follows the same research contract.<\/p>\n<h2 id=\"getting-started\" class=\"code-line\" dir=\"auto\" data-line=\"124\">Getting Started<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"126\">If you have VS Code with GitHub Copilot and access to the repository, you&#8217;re ready. 
Clone the repos once, enable the required tools in the VS Code chat panel, and run:<\/p>\n<pre><code class=\"code-line\" dir=\"auto\" data-line=\"128\">\/Nighthawk What are the networking options for ARO private clusters?\r\n<\/code><\/pre>\n<p class=\"code-line\" dir=\"auto\" data-line=\"132\">The full setup guide is in\u00a0<a href=\"https:\/\/github.com\/microsoftgbb\/project-nighthawk\/blob\/HEAD\/USAGE.md\" data-href=\"..\/USAGE.md\">USAGE.md<\/a>.<\/p>\n<h2 id=\"a-note-on-the-architecture-decision\" class=\"code-line\" dir=\"auto\" data-line=\"134\">A Note on the Architecture Decision<\/h2>\n<p class=\"code-line\" dir=\"auto\" data-line=\"136\">We wrote up the rationale for first-principles design choices in\u00a0<a href=\"https:\/\/github.com\/microsoftgbb\/project-nighthawk\/blob\/HEAD\/ARCHITECTURE-DECISION-FRAMEWORK.md\" data-href=\"..\/ARCHITECTURE-DECISION-FRAMEWORK.md\">ARCHITECTURE-DECISION-FRAMEWORK.md<\/a>. If you&#8217;re building your own multi-agent system and want to understand why we made specific tradeoffs (why local repos over MCP, why six specialized agents over one general one, why separate Synthesizer and FactChecker stages), that&#8217;s where to look.<\/p>\n<p class=\"code-line\" dir=\"auto\" data-line=\"138\">The short version: quality comes from constraints. A researcher that can only search a defined set of repos produces more accurate output than one that can search anywhere. A FactChecker that runs after synthesis catches more errors than one baked into the synthesis stage. 
Separation of concerns applies to agents as much as it does to software.<\/p>\n<hr class=\"code-line\" dir=\"auto\" data-line=\"140\" \/>\n<p class=\"code-line\" dir=\"auto\" data-line=\"142\">The repository is at\u00a0<a href=\"https:\/\/github.com\/microsoftgbb\/project-nighthawk\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/microsoftgbb\/project-nighthawk\">https:\/\/github.com\/microsoftgbb\/project-nighthawk<\/a>. Clone it, run a question, and see what comes back. If you work with AKS or ARO customers regularly, the time savings become obvious fast.<\/p>\n<hr class=\"code-line\" dir=\"auto\" data-line=\"144\" \/>\n","protected":false},"excerpt":{"rendered":"<p>If you work in field engineering, you know the scenario. A customer is deploying AKS in a regulated environment. They hit an issue during node bootstrapping. They want to know exactly what happens when a node joins the cluster, which components run in which order, and whether the behaviour they&#8217;re seeing is expected. The question [&hellip;]<\/p>\n","protected":false},"author":172655,"featured_media":2095,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[35,1,20,19,134,90],"tags":[],"class_list":["post-2085","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-agents","category-azure","category-developer-productivity","category-github-copilot","category-github-copilot-cli","category-operations"],"acf":[],"blog_post_summary":"<p>If you work in field engineering, you know the scenario. A customer is deploying AKS in a regulated environment. They hit an issue during node bootstrapping. They want to know exactly what happens when a node joins the cluster, which components run in which order, and whether the behaviour they&#8217;re seeing is expected. 
The question [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts\/2085","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/users\/172655"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/comments?post=2085"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts\/2085\/revisions"}],"predecessor-version":[{"id":2096,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts\/2085\/revisions\/2096"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/media\/2095"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/media?parent=2085"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/categories?post=2085"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/tags?post=2085"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}