January 5th, 2026

Generative AI with Large Language Models in C# in 2026

Jeremy Likness
Principal Program Manager - .NET AI experience

At the end of 2022, OpenAI released a free preview of GPT‑3.5, delivered as a conversational chat client: ChatGPT. The model was fine‑tuned using Reinforcement Learning from Human Feedback (RLHF), and its release marked the moment generative AI hit mainstream awareness. ChatGPT became the fastest‑growing consumer technology in history, surpassing Instagram and TikTok by reaching 100 million users in under two months. In early 2023, Microsoft responded by launching the Azure OpenAI Service, allowing developers to securely provision and use OpenAI‑compatible models behind Azure‑managed endpoints.

Soon after, Microsoft introduced:

  • Semantic Kernel (SK) → tools for orchestrating prompts, memories, and plugins using C# or Python
  • Microsoft Extensions for AI (MEAI) → unified abstractions for interacting with models (e.g., IChatClient)
  • Microsoft Extensions for Vector Data → standard interfaces for vector databases used in RAG systems

This post takes a step back from rapid AI innovation and focuses on core concepts, providing a foundation for .NET/C# developers working with Microsoft Foundry, GitHub Models, AI Extensions, and local runtimes like Ollama.

Understanding AI Terms

AI has its own distinct set of terms with very specific meanings.

Artificial Intelligence (AI)

AI involves techniques that enable computers to perform tasks typically requiring human intelligence—reasoning, language, planning, or perception. AI is not new, but today most people use “AI” to refer to generative AI.

Generative AI (GenAI)

Generative AI refers to AI systems capable of producing text, images, audio, or other content.

For example: GPT stands for Generative Pre‑trained Transformer. To break that down, we get:

  • Generative → it produces content
  • Pre‑trained → trained on huge datasets
  • Transformer → the neural‑network architecture enabling high‑quality language modeling

Large Language Models (LLMs)

LLMs are trained on billions of tokens and can generate text, images, code, or reasoning steps. Their ability to operate across multiple languages comes from learning relationships between words—not simple one‑to‑one dictionary translations.

Why translation is hard

Words have many meanings:

  • pass the car
  • mountain pass
  • pass on the opportunity
  • your park pass on the dashboard

Traditional software struggled with such ambiguity; LLMs excel because they operate in semantic space.

Tokens and embeddings

Models don’t read text directly. They break it into tokens:

  • Whole words
  • Word fragments
  • Characters

These tokens are converted into numeric vectors known as embeddings — mathematical representations of meaning.
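
To make tokens concrete, here is a minimal sketch using the Microsoft.ML.Tokenizers package. The GPT‑4o tiktoken vocabulary is an assumption for illustration; every model family ships its own tokenizer.

```csharp
using Microsoft.ML.Tokenizers;

// Load the tiktoken vocabulary used by GPT-4o (an assumption for this sketch).
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

// Text is split into tokens, each mapped to a numeric id.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("the actor was a star");
Console.WriteLine($"{tokenizer.CountTokens("the actor was a star")} tokens: {string.Join(", ", ids)}");
```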

Example phrases:

  • “the actor was a star”
  • “they loved the stars”

The word star appears in both, but with different meanings.

Embeddings capture this difference.

Here is a simplified way to visualize this concept. In the graph, the semantic meaning of the word “star” can be plotted based on its proximity to the concept of “celestial body” (a star at night) and the concept of “actor” (star of the show).

[Figure: semantic graph plotting “star” between the concepts “celestial body” and “actor”]

Now imagine billions of such points. Models generate text by navigating this space and predicting the next likely vector.

Examples of semantic distance:

  • school ↔ schol (close distance → spelling correction)
  • cat ↔ dog (close distance → similar animals)
  • cat ↔ laptop (far apart)

Semantic search uses distance in embedding space, not string matching.
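
Here is a minimal sketch of measuring semantic distance with MEAI's IEmbeddingGenerator abstraction (covered later in this post). The local Ollama endpoint and the all-minilm embedding model are assumptions; any MEAI-compatible embedding generator works the same way.

```csharp
using System.Numerics.Tensors;
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaApiClient implements IEmbeddingGenerator<string, Embedding<float>>.
// The endpoint and model name are assumptions for this sketch.
IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaApiClient(new Uri("http://localhost:11434"), "all-minilm");

var embeddings = await generator.GenerateAsync(new[] { "cat", "dog", "laptop" });

// Cosine similarity: closer to 1.0 means closer in semantic space.
float catDog = TensorPrimitives.CosineSimilarity(
    embeddings[0].Vector.Span, embeddings[1].Vector.Span);
float catLaptop = TensorPrimitives.CosineSimilarity(
    embeddings[0].Vector.Span, embeddings[2].Vector.Span);

Console.WriteLine($"cat <-> dog:    {catDog:F3}");    // close
Console.WriteLine($"cat <-> laptop: {catLaptop:F3}"); // far apart
```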

Parameters: model size

LLMs are often described by their parameter counts:

7B, 14B, 70B, 123B, etc.

Parameters are trained weights.

More parameters generally bring deeper reasoning, richer knowledge, and better nuance, at the cost of more compute.

  • GPT‑1 (2018) → 117M parameters
  • Modern frontier models → 100B–400B+ parameters

Prompts, instructions, and tools

The previous sections covered information about the model itself. The terms in this section relate to input sent to the model and output received from it.

Prompts

User input sent to the model, for example: “What’s the best way to skin a mango?”

System Instructions

A hidden “blueprint” that guides model behavior, for example: “You are a mango skinner and considered an expert in your area.”
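
With MEAI, system instructions and prompts are just chat messages with different roles. A minimal sketch follows; any IChatClient implementation works here, and the local Ollama endpoint and model are assumptions:

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

// Any IChatClient works; the Ollama endpoint and model are assumptions.
IChatClient chatClient = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3");

List<ChatMessage> messages =
[
    // System instructions: the hidden "blueprint" for behavior.
    new(ChatRole.System, "You are a mango skinner and considered an expert in your area."),
    // The user's prompt.
    new(ChatRole.User, "What's the best way to skin a mango?")
];

ChatResponse response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response.Text);
```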

Tools / Functions

LLMs are trained on historical data. Tools let them access current or authoritative information, e.g.:

  • Weather API
  • Database lookup
  • Search engine
  • Company knowledge index

When tools retrieve information that augments the model’s response, the pattern is referred to as Retrieval‑Augmented Generation (RAG). Let’s look at two scenarios. First, imagine a concierge agent that’s provided with an API for local restaurants and an API for the weather. The user enters the prompt:

Can you book me a dinner this week at a restaurant with outdoor seating? 

The LLM first calls the weather API to determine which evenings are likely to be dry and warm, then calls the restaurant API to find which restaurants are open and have available seating. Finally, it returns a list of suggestions that are right on target.
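
Here is a sketch of that flow using MEAI's function-invocation middleware. The two tools are hypothetical stand-ins for real weather and restaurant APIs, and the local endpoint and model are assumptions (the model must support tool calling):

```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;
using OllamaSharp;

// Hypothetical stand-ins for real weather and restaurant APIs.
[Description("Gets this week's evening weather forecast.")]
static string GetForecast() =>
    "Dry and warm on Tuesday and Thursday; rain expected other evenings.";

[Description("Finds open restaurants with outdoor seating on a given day.")]
static string FindRestaurants(string day) =>
    day == "Tuesday" ? "Bella Vista (patio), table at 7pm" : "The Garden Table, table at 8pm";

// UseFunctionInvocation adds middleware that automatically executes the tool
// calls the model requests and feeds the results back into the conversation.
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.1")
    .AsBuilder()
    .UseFunctionInvocation()
    .Build();

ChatOptions options = new()
{
    Tools =
    [
        AIFunctionFactory.Create(GetForecast),
        AIFunctionFactory.Create(FindRestaurants)
    ]
};

var response = await client.GetResponseAsync(
    "Can you book me a dinner this week at a restaurant with outdoor seating?", options);

Console.WriteLine(response.Text);
```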

Next, imagine a customer service agent for a retail store that has all of the product information uploaded. The user types,

"What kind of batteries does the traveling wonder cube take?"

The LLM extracts the product name, “traveling wonder cube,” vectorizes the text of the query, and then calls the product API with the product name and the vectors. A semantic search function finds the points in the product manual that are semantically closest to the query and returns the relevant passage about the required batteries, if such a section exists.
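
A drastically simplified, in-memory sketch of that semantic search step follows. A real system would persist vectors in a vector database through Microsoft Extensions for Vector Data; the manual snippets, model, and endpoint here are all assumptions:

```csharp
using System.Numerics.Tensors;
using Microsoft.Extensions.AI;
using OllamaSharp;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaApiClient(new Uri("http://localhost:11434"), "all-minilm");

// Hypothetical snippets from the product manual.
string[] sections =
{
    "The Traveling Wonder Cube requires two AAA batteries.",
    "Clean the cube with a soft, dry cloth.",
    "Store the cube away from direct sunlight."
};

var sectionEmbeddings = await generator.GenerateAsync(sections);
var query = await generator.GenerateAsync(
    new[] { "What kind of batteries does the traveling wonder cube take?" });

// Rank sections by cosine similarity to the query vector and keep the best.
var best = sections
    .Select((text, i) => (Text: text, Score: TensorPrimitives.CosineSimilarity(
        query[0].Vector.Span, sectionEmbeddings[i].Vector.Span)))
    .MaxBy(s => s.Score);

Console.WriteLine($"Closest match ({best.Score:F3}): {best.Text}");
```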

Model Context Protocol (MCP)

Model Context Protocol, or MCP for short, is a set of standards for interoperability between agents and tools. It makes it easy for models to understand what tools are available and how to call them. This empowers you to build virtual toolboxes that any of your models or agents can call.
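
Here is a minimal sketch using the official ModelContextProtocol C# SDK (in preview at the time of writing). The stdio server package launched here is hypothetical; because MCP tools derive from MEAI's AIFunction, they plug directly into ChatOptions.Tools:

```csharp
using Microsoft.Extensions.AI;
using ModelContextProtocol.Client;

// Launch and connect to a hypothetical MCP server over stdio.
var mcpClient = await McpClientFactory.CreateAsync(
    new StdioClientTransport(new StdioClientTransportOptions
    {
        Name = "Weather",
        Command = "npx",
        Arguments = ["-y", "@example/weather-mcp-server"] // hypothetical package
    }));

// Discover the tools the server exposes; each one is an AIFunction.
IList<McpClientTool> tools = await mcpClient.ListToolsAsync();

// Hand the whole virtual toolbox to any MEAI chat client.
ChatOptions options = new() { Tools = [.. tools] };
```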

What about agents?

Wait, did I say agent? An agent is simply a way of providing a specialized solution that includes a model, tools, and context. A “concierge agent” might include a reasoning model with tools that provide information about weather, events, and local businesses, combined with a specialized model capable of generating maps with turn-by-turn instructions. I’ll look at agents more closely and cover C#‑based solutions in a later post.

I’ve covered all of the foundational concepts, so now it’s time to go hands-on. First, I want to briefly share the timeline between generative AI going mainstream and the tools that are available today.

From GPT-1 to today

Here is a brief look at the evolution of AI in .NET over the past few years.

[Figure: the evolution of AI in .NET]

Model management in the .NET Ecosystem

Working with models is about more than identifying the right model and using it. Many companies choose to host their own models out of concerns related to trust, security, and cost. Other companies require fine-tuned models and the ability to perform their own training. Fortunately, working with models in .NET and C# is not only possible, but streamlined with the help of several products and services.

GitHub Models

GitHub Models provides a hosted catalog of open and frontier models through an OpenAI‑compatible API. It is a great way for developers to get started on their AI journey. A few reasons include:

  • No infrastructure required
  • Switch between models with minimal code changes
  • Perfect for prototyping, evaluations, automation, extensions, and CI/CD pipelines

Get started with GitHub Models.
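
Because the API is OpenAI-compatible, the OpenAI .NET SDK plus the Microsoft.Extensions.AI.OpenAI adapter are enough to get an IChatClient. The endpoint, model id, and GITHUB_TOKEN environment variable reflect the GitHub Models docs at the time of writing but are assumptions that may change:

```csharp
using System.ClientModel;
using Microsoft.Extensions.AI;
using OpenAI;

// Authenticate with a GitHub personal access token.
var openAi = new OpenAIClient(
    new ApiKeyCredential(Environment.GetEnvironmentVariable("GITHUB_TOKEN")!),
    new OpenAIClientOptions { Endpoint = new Uri("https://models.github.ai/inference") });

// Adapt the OpenAI chat client to MEAI's IChatClient abstraction.
IChatClient chatClient = openAi.GetChatClient("openai/gpt-4o-mini").AsIChatClient();

var response = await chatClient.GetResponseAsync("Explain embeddings in one sentence.");
Console.WriteLine(response.Text);
```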

Microsoft Foundry (Cloud)

Formerly Azure AI Studio, Microsoft Foundry is the enterprise platform for:

  • Model catalogs (OpenAI, Meta, DeepSeek, Cohere, Mistral, etc.)
  • Agentic workflows (Foundry Agent Service)
  • Security, content safety, governance
  • Monitoring, tracing, evaluations
  • Fine‑tuning and customization

Foundry is where organizations take AI into production at scale.

Explore Microsoft Foundry.

Foundry Local

Foundry Local brings the Foundry developer experience offline:

  • On‑premise, air‑gapped, or edge environments
  • The same agents, tools, evaluations as cloud Foundry
  • Supports hybrid “develop local → deploy cloud” lifecycle

This is a great option for testing new models, trying out new code without blowing through your budget, and building CI/CD pipelines that have minimal overhead and don’t require a third‑party hosted account to succeed.

Ollama (Local Runtime)

Ollama is a popular open‑source engine for running lightweight and mid‑sized models locally.

Features:

  • Runs models like Mistral, Llama 3, Phi‑3
  • Simple CLI and server
  • Excellent for privacy‑sensitive workflows
  • Integrates cleanly with MEAI (IChatClient) via OllamaSharp (see the sketch below)
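
A minimal sketch, assuming Ollama is running locally with a model already pulled (for example, ollama pull llama3):

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaApiClient implements IChatClient, so a locally hosted model is
// consumed exactly like any cloud provider.
IChatClient chatClient = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3");

var response = await chatClient.GetResponseAsync("Why is the sky blue?");
Console.WriteLine(response.Text);
```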

Bringing It All Together: A Unified Abstraction

As a .NET developer, you shouldn’t have to choose a single provider or lock into a single solution. That’s why the .NET team invested in a set of extensions that provide consistent APIs for working with models that are universal yet flexible. It also enables scenarios such as middleware to ease the burden of logging, tracing, injecting behaviors, and other custom processes you might use. Most of the major providers implement our extensions contracts so that you can, for example, use an IChatClient instance regardless of whether you’re talking to:

  • GitHub Models
  • Azure AI Foundry
  • OpenAI / Azure OpenAI
  • Foundry Local
  • Ollama
  • Custom provider

… and the code can stay the same.
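
As an illustration, the application code below targets only IChatClient, and MEAI's UseLogging middleware wraps whichever provider you pick without touching call sites. The Ollama client is a stand-in for any of the providers above, and console logging assumes the Microsoft.Extensions.Logging.Console package:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;
using OllamaSharp;

// Application code depends only on the abstraction, never on a provider.
static async Task<string> SummarizeAsync(IChatClient chat, string text) =>
    (await chat.GetResponseAsync($"Summarize in one sentence: {text}")).Text;

// Swap in any provider here; Ollama is illustrative.
IChatClient inner = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3");

// Middleware (logging here) wraps the client without changing call sites.
using ILoggerFactory loggerFactory = LoggerFactory.Create(b => b.AddConsole());
IChatClient client = inner.AsBuilder().UseLogging(loggerFactory).Build();

Console.WriteLine(await SummarizeAsync(client, "Embeddings map tokens into semantic space."));
```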

We’ll dive deeper into these tools in future posts, so stay tuned to the .NET blog, subscribe to our newsletter, and join an upcoming community standup on the .NET YouTube channel!

Author

Jeremy Likness
Principal Program Manager - .NET AI experience

Jeremy is a Principal Product Manager at Microsoft, responsible for the AI experience in .NET. He’s also managed minimal APIs, ASP.NET’s authentication/authorization capabilities, and .NET data products including Entity Framework.

8 comments

  • SHAKIB SADMAN EMON

    A class mastermind definition

  • Weihan Li 8 hours ago · Edited

    Since we’ve mentioned the Semantic Kernel, maybe better to mention the Microsoft Agent Framework also since that’s the successor of SK

  • Tim Dawson 9 hours ago

    I appreciate you starting at the beginning, for those of us who have not yet started integrating AI into our code and (in many cases) haven’t even really used it as a consumer much.

  • Prashant Shivhare 11 hours ago

    I’m currently working on several ASP.NET Core projects and would like to implement AI-powered searching and filtering of records based on user or entity profiles (for example, semantic search rather than strict keyword matching).

    Could you share recommended starting steps, architecture flow, or reference patterns for achieving this in .NET? Specifically:

    1. How to model and generate embeddings for existing records
    2. How to store and query them efficiently (e.g., vector databases or alternatives)
    3. How this fits into a typical ASP.NET Core application flow (RAG, APIs, etc.)

    Any guidance or links to sample implementations would be greatly appreciated.

    • Jeremy Likness (Microsoft employee, Author) 5 hours ago · Edited

      Here are a few resources to get started. If this doesn't address what you are looking for or you find gaps or issues, please feel free to reach out to me directly! My goal is to provide you with the right tools and docs for you to confidently implement the capabilities you described. Email is just my first name and last name with a dot in between at microsoft (dot) com.

      https://devblogs.microsoft.com/dotnet/introducing-data-ingestion-building-blocks-preview/

      https://devblogs.microsoft.com/dotnet/vector-data-qdrant-ai-search-dotnet/

  • Viktor Nikolaev

    Hi! Thanks for the article.
    What should I use now for a greenfield project Semantic Kernel or Microsoft Extensions for AI (MEAI)?

    • Jeremy Likness (Microsoft employee, Author)

      My pleasure! I would always start with MEAI, then add other packages as needed/when they are identified. The primitives from Semantic Kernel have been migrated to MEAI, and the workflow capabilities are being addressed by the Microsoft Agent Framework.