March 12th, 2025

Customer Case Story: Creating a Semantic Kernel Agent for Automated GitHub Code Reviews

Today I want to welcome a guest author to our Semantic Kernel blog, Rasmus Wulff Jensen, to cover how he’s created a Semantic Kernel agent for automated GitHub code review. We’ll turn it over to Rasmus to dive in.

Introduction

If you work in software development, you know that code reviews are an essential part of the process, ensuring quality, security, and maintainability. However, they also consume a significant amount of developers’ time, often delaying feature delivery and increasing workload. Recognizing this challenge, Relewise, a search and recommendation SaaS business, streamlined its development workflow by introducing an AI Agent that conducts automated pull request (PR) reviews before a human reviewer steps in.

Why build it yourself?

Given that GitHub Copilot offers a similar preview feature, one might ask why we chose to build our own solution. The key reasons are control and cost. We have full control over the developer (system) messages, can include additional task-definition data, and control every single step of the way. We also only pay for token usage rather than a per-user subscription. Finally, we find the quality of the reviews to be much better.

How it all works (Building blocks and flow)

The diagram below shows the building blocks and the flow of the AI Agent; each of the six steps is explained in detail afterwards.

[Image: the six steps of the AI Agent flow]

Step 1: Developer creates Pull Request

When the developer creates their pull request and marks it as ready, a GitHub webhook fires and notifies the AI Agent that it is time to do a review.
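A minimal sketch of the trigger check follows. It assumes the agent receives GitHub's `pull_request` webhook event (identified by the `X-GitHub-Event` header) together with its JSON payload. The class name `PrWebhook` is hypothetical. Reviewing PRs that are opened directly as non-drafts, and not only those marked ready, is our own assumption. The sketch also does not validate the webhook signature.

```cs
using System.Text.Json;

public static class PrWebhook
{
    // Returns true when a GitHub webhook means "this PR is ready for review":
    // either a draft was marked ready, or (assumption) a non-draft PR was opened.
    public static bool ShouldReview(string eventName, string payloadJson)
    {
        // eventName is the value of the X-GitHub-Event header.
        if (eventName != "pull_request") return false;

        using var doc = JsonDocument.Parse(payloadJson);
        var root = doc.RootElement;
        var action = root.GetProperty("action").GetString();
        var isDraft = root.GetProperty("pull_request").GetProperty("draft").GetBoolean();

        return action == "ready_for_review" || (action == "opened" && !isDraft);
    }
}
```

Other actions, such as `synchronize` (new commits pushed), are ignored here. Whether to re-review on new commits is a team decision.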

Step 2: Get the Pull Request

Based on its settings, the AI Agent now determines what type of review to do and gathers the PR diff content (the code the developer has added, changed, and removed).

Here is how we retrieve the PR diff using the GitHub API (via the Octokit .NET client):

```cs
using Octokit;

public async Task<string> GetPrDiff(GitHubClient client, string owner, string repo, int pullRequestNumber)
{
    // Same endpoint as a normal "get pull request" call; requesting the diff
    // media type makes GitHub return the raw unified diff instead of JSON.
    var diffUrl = $"https://api.github.com/repos/{owner}/{repo}/pulls/{pullRequestNumber}";
    var diff = await client.Connection.Get<string>(
        new Uri(diffUrl),
        new Dictionary<string, string>(),
        "application/vnd.github.v3.diff");

    return diff.Body;
}
```

Step 3: Get Task Definition

Beyond the code, we also want to give the AI the title and description of the task linked to the PR, that is, what the developer set out to achieve, so that the AI can evaluate whether the code actually accomplishes the job.
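The post does not say how the linked task is found. The sketch below is therefore hypothetical. It assumes the task key (for example `REL-123`) appears in the branch name or PR title, and that the task is then looked up in your task tracker (not shown). The `TaskDefinition` record only mirrors the `taskDefinition.Title` / `taskDefinition.Description` usage in the prompt code later in this post.

```cs
using System.Text.RegularExpressions;

// Assumed shape of the task data used in the review prompt.
public record TaskDefinition(string Title, string Description);

public static class LinkedTask
{
    // Hypothetical convention: an uppercase project key, a dash, and a number.
    private static readonly Regex TaskKey = new(@"\b[A-Z][A-Z0-9]+-\d+\b");

    // Looks for a task key in the branch name first, then in the PR title.
    // Returns null when neither contains one.
    public static string? FindTaskKey(string branchName, string prTitle)
    {
        var match = TaskKey.Match(branchName);
        if (!match.Success) match = TaskKey.Match(prTitle);
        return match.Success ? match.Value : null;
    }
}
```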

Step 4: Conduct the AI Review (+ optional Step 5)

We now have all the data needed for the review, so we create our Semantic Kernel object with o3-mini-high (o3-mini at high reasoning effort) on Azure OpenAI Service as the backend:

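```cs
using Microsoft.SemanticKernel;

var kernelBuilder = Kernel.CreateBuilder();

// High-effort reasoning calls can run long, so raise HttpClient's default 100-second timeout.
var httpClient = new HttpClient
{
    Timeout = TimeSpan.FromMinutes(5)
};

// "o3-mini" is the Azure OpenAI deployment name; endpoint and apiKey are defined elsewhere.
kernelBuilder.AddAzureOpenAIChatCompletion("o3-mini", endpoint, apiKey, httpClient: httpClient);

var kernel = kernelBuilder.Build();
```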

Next, we define our agent:

```cs
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using OpenAI.Chat;

var agent = new ChatCompletionAgent
{
    Name = "RelewisePrReviewer",
    Kernel = kernel,
    Instructions = """
        You are a highly experienced and extremely thorough C# code reviewer.
        Your task is to review the provided GitHub PR diff.
        You will particularly pay attention to logical and performance bugs and
        regressions, and generally be very meticulous in determining what this PR
        introduces in terms of new and/or changed features and/or behaviors.
        Should you at any point discover that it would be beneficial to you as the
        reviewer to see additional types, you may at any point continue to ask me for
        additional types, and then I may provide them to you.
        """,
    Arguments = new KernelArguments(new AzureOpenAIPromptExecutionSettings
    {
        ReasoningEffort = ChatReasoningEffortLevel.High, // o3-mini "high"
        ResponseFormat = typeof(Review)                  // structured output instead of free text
    })
};
```

Note that we set o3-mini's reasoning effort level to High, and that ResponseFormat is set so that, instead of a normal text response, we get structured output. This keeps the developer (system) message shorter and lets us get both the review and the list of additional files the model would like to see back in a single request.

The structured output class is as follows:

```cs
using System.ComponentModel;

public class Review
{
    [Description("A Short, concise but sufficient summary of what this PR introduces/changes, and any general observations you find relevant in the context of the review, focus on functional changes that make a difference to the users of the system")]
    public required string Summary { get; init; }

    [Description("You will for all bug/regression/errors explain what the error is, where it is (include type names and namespaces when possible), why it's an error, and if possible short guidance on how to fix it. Do not use any type of bullet-list formatting")]
    public string[]? PotentialIssuesFound { get; init; }

    // ReviewAdditionalFile (not shown in this post) describes one requested repository file.
    [Description("A List of other repo files that could improve the review if you had them")]
    public ReviewAdditionalFile[]? AdditionalFilesThatCouldImproveReview { get; init; }
}
```

We now call the agent's InvokeAsync method to get the Review object (and the token usage statistics):

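```cs
using System.Text.Json;
using Microsoft.SemanticKernel.ChatCompletion;
using OpenAI.Chat;

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage($"""
    Task: {taskDefinition.Title}
    - Description: {taskDefinition.Description}
    ---
    PR Diff: {prDiffContent}
    """);

// Declared outside the loop so the results are usable after it.
Review? review = null;
var tokenIn = 0;
var tokenOut = 0;

await foreach (var content in agent.InvokeAsync(chatHistory))
{
    // With ResponseFormat = typeof(Review), the message content is the Review as JSON.
    var json = content.ToString();

    // Token usage is reported in the message metadata under "Usage".
    if (content.Metadata?.TryGetValue("Usage", out var usage) == true && usage is ChatTokenUsage chatTokenUsage)
    {
        tokenIn = chatTokenUsage.InputTokenCount;
        tokenOut = chatTokenUsage.OutputTokenCount;
    }

    review = JsonSerializer.Deserialize<Review>(json)!;
}
```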
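Because we pay per token, these usage numbers translate directly into cost. The sketch below is minimal and uses a hypothetical helper, `TokenCost`. It assumes per-million-token prices taken from your own Azure price sheet, and no real prices are hard-coded. For reasoning models such as o3-mini, the hidden reasoning tokens are billed as output tokens, so the output side is not negligible.

```cs
public static class TokenCost
{
    // Estimated cost of one call. Prices are per 1 million tokens, in the
    // currency of your price sheet (placeholders, supplied by the caller).
    public static decimal Estimate(int inputTokens, int outputTokens,
                                   decimal inputPricePerMillion, decimal outputPricePerMillion)
        => inputTokens / 1_000_000m * inputPricePerMillion
         + outputTokens / 1_000_000m * outputPricePerMillion;
}
```

Usage with the values captured above: `TokenCost.Estimate(tokenIn, tokenOut, inputPrice, outputPrice)`, where `inputPrice` and `outputPrice` are your own configured values.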

What happens next depends on the response. The AI may report that it could have made a better review with more related files. With only the diff and not the full codebase, it can only infer what happens in other files, so providing those files makes the review more accurate. In that case, the additional files are fetched on the fly using the GitHub API, the original review is discarded, and a new review that includes the additional file content is conducted (Step 5).
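The review-then-retry logic of Steps 4 and 5 can be sketched independently of Semantic Kernel and GitHub by passing those calls in as delegates. Several assumptions apply. `ReviewResult` and `AdditionalFile` are simplified, hypothetical stand-ins for the `Review` and `ReviewAdditionalFile` types. `reviewAsync` would wrap the agent call shown above, and `fetchFileAsync` would wrap a GitHub file-content lookup. The way extra files are appended to the prompt is also our own choice.

```cs
using System;
using System.Text;
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for the Review / ReviewAdditionalFile types.
public record AdditionalFile(string Path);
public record ReviewResult(string Summary, string[]? Issues, AdditionalFile[]? AdditionalFiles);

public static class TwoPassReview
{
    // Runs the review once. If the model asks for more files, fetches them and
    // runs a second, final review that includes their content.
    public static async Task<(ReviewResult Review, int Passes)> RunAsync(
        string basePrompt,
        Func<string, Task<ReviewResult>> reviewAsync,   // e.g. wraps agent.InvokeAsync
        Func<string, Task<string?>> fetchFileAsync)     // e.g. wraps a GitHub contents lookup
    {
        var first = await reviewAsync(basePrompt);
        var requested = first.AdditionalFiles ?? Array.Empty<AdditionalFile>();
        if (requested.Length == 0)
            return (first, 1); // single pass: the diff was sent only once

        var prompt = new StringBuilder(basePrompt);
        foreach (var file in requested)
        {
            var content = await fetchFileAsync(file.Path);
            if (content is null) continue; // skip files that could not be fetched
            prompt.AppendLine().AppendLine($"--- Additional file: {file.Path} ---").AppendLine(content);
        }

        // The first review is discarded; the second pass replaces it.
        var second = await reviewAsync(prompt.ToString());
        return (second, 2);
    }
}
```

The second pass is treated as final even if the model asks for more files again. This caps the diff at being sent at most twice.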

Some might ask: why not ask the AI which extra files it needs before the review? We do it this way to save input tokens, which are by far the biggest cost here. By leveraging the excellent Structured Outputs feature, we only need to send the PR diff twice when additional files are needed; otherwise, a single pass to the LLM is all it takes.
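The saving can be made concrete with a simple input-token model. Here d is the prompt with the task and PR diff, f is the extra files, and p is the fraction of PRs where the model asks for files. This is a simplification: it ignores the system message, output tokens, and any shorter prompt an "ask first" pass might use. The class and method names are hypothetical.

```cs
public static class ReviewTokenCost
{
    // Review first (this post's approach): one pass with the diff, plus a
    // second pass with diff + files only when files are requested.
    public static double ReviewFirst(double d, double f, double p) => d + p * (d + f);

    // Ask first (the alternative): always one pass with the diff to ask which
    // files are needed, then always a review pass with the diff (+ files if requested).
    public static double AskFirst(double d, double f, double p) => d + (d + p * f);
}
```

The difference is always (1 - p) * d, so review-first never costs more. With the made-up values d = 20,000, f = 10,000 and p = 0.25 (not Relewise measurements), review-first averages 27,500 input tokens per PR versus 42,500 for ask-first.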

Step 6: Add Pull Request Comment

What you do with the review from here is a matter of preference. We have chosen to add a short PR comment listing the potential issues found, with a link to the full review on an internal website.
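The sketch below shows one minimal way to build such a short comment in Markdown. It assumes you have the potential issues from the `Review` object and a URL to the full review on your internal site. The formatter class and heading text are hypothetical. Posting the comment could be done with Octokit's `client.Issue.Comment.Create(owner, repo, pullRequestNumber, body)`, since GitHub treats top-level PR comments as issue comments.

```cs
using System.Text;

public static class PrCommentFormatter
{
    // Builds a short Markdown comment: the potential issues (if any) plus a
    // link to the full review hosted elsewhere.
    public static string Format(string[]? issues, string reviewUrl)
    {
        var sb = new StringBuilder();
        sb.AppendLine("### AI review");
        if (issues is null || issues.Length == 0)
        {
            sb.AppendLine("No potential issues found. Ready for human review.");
        }
        else
        {
            sb.AppendLine($"Found {issues.Length} potential issue(s):");
            for (int i = 0; i < issues.Length; i++)
                sb.AppendLine($"{i + 1}. {issues[i]}");
        }
        sb.AppendLine();
        sb.Append($"[Full review]({reviewUrl})");
        return sb.ToString();
    }
}
```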

[Image: example PR comment posted by the AI Agent]

Alternatively, you can instruct the AI via the structured output to give line-by-line comments if that suits your team (it is clever enough to do it), or post the full review directly as a GitHub comment.
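To get line-by-line comments, the structured output could be extended with a per-line type. This is a hypothetical sketch, and the type and property names are ours, not from the original agent. As with the `Review` class, the `Description` attributes double as instructions to the model. Mapping `FilePath`/`Line` onto GitHub's pull request review-comment API is left out.

```cs
using System.ComponentModel;

// Hypothetical structured-output shape for line-level findings.
public class LineComment
{
    [Description("Repository-relative path of the file the comment applies to")]
    public required string FilePath { get; init; }

    [Description("Line number in the new version of the file")]
    public required int Line { get; init; }

    [Description("The review comment for this line, including why it is a problem")]
    public required string Comment { get; init; }
}

public class LineLevelReview
{
    [Description("Short summary of what this PR introduces/changes")]
    public required string Summary { get; init; }

    [Description("One entry per finding, anchored to the changed line it concerns")]
    public LineComment[]? LineComments { get; init; }
}
```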

The review is now done. If there are no findings, it is time for a human to do the final review; if anything is found, the original developer can react to it immediately instead of losing flow while waiting for colleagues to find time for a review.

Reception from the developers

Because we started prototyping this project before we had API access to the OpenAI reasoning models, early iterations of the agent sometimes produced inaccurate findings, which led to skepticism among the developers. However, after switching to reasoning models and refining the instructions, accuracy has improved significantly. It has even become a thing for one of the developers to do this 😉

[Image: a developer thanking the AI reviewer]

Future

We will continue to tweak the agent, upgrade to newer LLMs, and leverage other tools and techniques from Semantic Kernel and Azure OpenAI. Perhaps one day in the distant future we will get so far that, once the AI is done with the review, it merges and deploys the code itself… We are not there yet, but this is still a huge first step toward it 🙌

Link to original discussion: https://github.com/microsoft/semantic-kernel/discussions/10634
