At the end of 2022, OpenAI released a free preview of GPT‑3.5 delivered as a conversational chat client: ChatGPT. The model was fine‑tuned using Reinforcement Learning from Human Feedback (RLHF), and its release marked the moment generative AI hit mainstream awareness. Generative AI became the fastest‑growing consumer technology in history, surpassing Instagram and TikTok by reaching 100 million users in under two months. In early 2023, Microsoft responded by launching the Azure OpenAI Service, allowing developers to securely provision and use OpenAI models behind Azure‑managed endpoints.
Soon after, Microsoft introduced:
- Semantic Kernel (SK) → tools for orchestrating prompts, memories, and plugins using C# or Python
- Microsoft Extensions for AI (MEAI) → unified abstractions for interacting with models (e.g., IChatClient)
- Microsoft Extensions for Vector Data → standard interfaces for vector databases used in RAG systems
This post takes a step back from rapid AI innovation and focuses on core concepts, providing a foundation for .NET/C# developers working with Microsoft Foundry, GitHub Models, AI Extensions, and local runtimes like Ollama.
Understanding AI Terms
AI has its own distinct set of terms with very specific meanings.
Artificial Intelligence (AI)
AI involves techniques that enable computers to perform tasks typically requiring human intelligence—reasoning, language, planning, or perception. AI is not new, but today most people use “AI” to refer to generative AI.
Generative AI (GenAI)
Generative AI refers to AI systems capable of producing text, images, audio, or other content.
For example: GPT stands for Generative Pre‑trained Transformer. To break that down, we get:
- Generative → it produces content
- Pre‑trained → trained on huge datasets
- Transformer → neural‑network architecture enabling high‑quality language modeling
Large Language Models (LLMs)
LLMs are trained on billions of tokens and can generate text, images, code, or reasoning steps. Their ability to operate across multiple languages comes from learning relationships between words—not simple one‑to‑one dictionary translations.
Why translation is hard
Words have many meanings:
- pass the car
- mountain pass
- pass on the opportunity
- your park pass on the dashboard
Traditional software struggled with such ambiguity; LLMs excel because they operate in semantic space.
Tokens and embeddings
Models don’t read text directly. They break it into tokens:
- Whole words
- Word fragments
- Characters
These tokens are converted into numeric vectors known as embeddings — mathematical representations of meaning.
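Here is a minimal sketch of the tokenization step using the Microsoft.ML.Tokenizers NuGet package. The gpt‑4o vocabulary below is an assumption for illustration; every model family splits text differently.

```csharp
using Microsoft.ML.Tokenizers;

// A minimal sketch using the Microsoft.ML.Tokenizers NuGet package;
// the "gpt-4o" vocabulary is an assumption -- other models tokenize differently.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

string phrase = "your park pass on the dashboard";
IReadOnlyList<int> ids = tokenizer.EncodeToIds(phrase);

Console.WriteLine($"{ids.Count} tokens");   // whole words, fragments, or characters
Console.WriteLine(string.Join(", ", ids));  // the numeric ids the model actually sees
```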
Example Phrases:
- “the actor was a star”
- “they loved the stars”
The word star appears in both, but with different meanings.
Embeddings capture this difference.
Here is a simplified way to visualize this concept. In the graph, the semantic meaning of the word “star” can be plotted based on its proximity to the concept of “celestial body” (a star at night) and the concept of “actor” (star of the show).

Now imagine billions of such points. Models generate text by navigating this space and predicting the next likely vector.
Examples of semantic distance:
- school ↔ schol (close distance → spelling correction)
- cat ↔ dog (close distance → similar animals)
- cat ↔ laptop (far apart)
Semantic search uses distance in embedding space, not string matching.
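Here is a minimal sketch of measuring semantic distance with MEAI’s IEmbeddingGenerator abstraction and System.Numerics.Tensors. The OllamaSharp wiring and the all-minilm model are assumptions; any embedding provider that implements the abstraction works the same way.

```csharp
using System.Numerics.Tensors;
using Microsoft.Extensions.AI;
using OllamaSharp;

// Assumptions: a local Ollama server with an embedding model pulled
// ("ollama pull all-minilm"); OllamaApiClient implements IEmbeddingGenerator.
IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaApiClient(new Uri("http://localhost:11434"), "all-minilm");

var embeddings = await generator.GenerateAsync(["cat", "dog", "laptop"]);

// Cosine similarity: values closer to 1.0 mean closer in semantic space.
float catDog = TensorPrimitives.CosineSimilarity(
    embeddings[0].Vector.Span, embeddings[1].Vector.Span);
float catLaptop = TensorPrimitives.CosineSimilarity(
    embeddings[0].Vector.Span, embeddings[2].Vector.Span);

Console.WriteLine($"cat/dog:    {catDog:F3}");    // relatively close
Console.WriteLine($"cat/laptop: {catLaptop:F3}"); // farther apart
```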
Parameters: model size
LLMs are often described by their parameter counts:
7B, 14B, 70B, 123B, etc.
Parameters are trained weights.
Generally, more parameters → deeper reasoning, richer knowledge, better nuance (and higher compute costs).
- GPT‑1 (2018) → 117M parameters
- Modern frontier models → 100B–400B+ parameters
Prompts, instructions, and tools
The previous sections described the model itself. The terms in this section relate to the input you send to a model and the output it returns.
Prompts
User input to the model. “What’s the best way to skin a mango?”
System Instructions
Hidden “blueprint” guiding model behavior. “You are a mango skinner and considered an expert in your area.”
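Here is how the two come together with Microsoft.Extensions.AI, as a minimal sketch; the OllamaSharp-backed client is an assumption, and any IChatClient implementation works in its place.

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

// Any IChatClient works here; the local Ollama wiring is an assumption.
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.2");

List<ChatMessage> messages =
[
    // System instructions: the hidden "blueprint" guiding behavior.
    new(ChatRole.System, "You are a mango skinner and considered an expert in your area."),
    // Prompt: the user's input.
    new(ChatRole.User, "What's the best way to skin a mango?")
];

ChatResponse response = await client.GetResponseAsync(messages);
Console.WriteLine(response.Text);
```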
Tools / Functions
LLMs are trained on historical data. Tools let them access current or authoritative information, e.g.:
- Weather API
- Database lookup
- Search engine
- Company knowledge index
This pattern of retrieving information and using it to ground the model’s response is referred to as Retrieval‑Augmented Generation (RAG). Let’s look at two scenarios. First, imagine a concierge agent that’s provided with an API for local restaurants and an API for the weather. The user enters the prompt:
Can you book me a dinner this week at a restaurant with outdoor seating?
The LLM first calls the weather API to determine which evenings are likely to be dry and warm, then calls the restaurant API to find which restaurants are open and have available seating. Finally, it returns a list of suggestions that are right on target.
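Here is a sketch of how the concierge scenario maps to MEAI tool calling. The GetForecast and FindRestaurants helpers are hypothetical stand-ins for the real APIs, and the Ollama wiring is just one option.

```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;
using OllamaSharp;

// UseFunctionInvocation lets the client run tools the model asks for.
IChatClient client = new ChatClientBuilder(
        new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.2"))
    .UseFunctionInvocation()
    .Build();

ChatOptions options = new()
{
    Tools = [AIFunctionFactory.Create(GetForecast), AIFunctionFactory.Create(FindRestaurants)]
};

ChatResponse response = await client.GetResponseAsync(
    "Can you book me a dinner this week at a restaurant with outdoor seating?", options);

Console.WriteLine(response.Text);

// Hypothetical stand-ins for the weather and restaurant APIs.
[Description("Gets the evening weather forecast for a given date.")]
static string GetForecast(string date) => "Dry and warm";

[Description("Finds open restaurants with the requested seating type.")]
static string[] FindRestaurants(string seating) => ["Patio Bistro", "Garden Grill"];
```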
Next, imagine a customer service agent for a retail store that has all of the product information uploaded. The user types,
"What kind of batteries does the traveling wonder cube take?"
The LLM extracts the product name, “traveling wonder cube,” vectorizes the text of the query, and then calls the product API with the product name and the vectors. A semantic search finds the points in the product manual that are semantically closest to the query and returns the relevant battery information, if such a section exists.
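Here is a rough sketch of that lookup, assuming the manual sections were embedded ahead of time. The LoadManualSections helper is hypothetical, and the in-memory LINQ scan is only for illustration; a production system would use a vector database.

```csharp
using System.Numerics.Tensors;
using Microsoft.Extensions.AI;

// Hypothetical precomputed data: each manual section paired with its embedding.
// `generator` is the same IEmbeddingGenerator shown in the earlier sketch.
List<(string Text, ReadOnlyMemory<float> Vector)> sections = LoadManualSections();

var query = (await generator.GenerateAsync(
    ["What kind of batteries does the traveling wonder cube take?"]))[0];

// Semantic search: rank sections by cosine similarity to the query embedding.
var best = sections
    .OrderByDescending(s =>
        TensorPrimitives.CosineSimilarity(s.Vector.Span, query.Vector.Span))
    .First();

Console.WriteLine(best.Text); // ideally the "Batteries" section of the manual
```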
Model Context Protocol (MCP)
The Model Context Protocol, or MCP for short, is a set of standards for interoperability between agents and tools. It makes it easy for models to understand what tools are available and how to call them, empowering you to build virtual toolboxes that any of your models or agents can call.
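As a taste of what this looks like in C#, here is a minimal sketch using the ModelContextProtocol SDK (still evolving, so treat the exact APIs as assumptions): a stdio server exposing one hypothetical tool that any MCP-aware agent can discover and call.

```csharp
using System.ComponentModel;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using ModelContextProtocol.Server;

var builder = Host.CreateApplicationBuilder(args);
builder.Services
    .AddMcpServer()              // register the MCP server
    .WithStdioServerTransport()  // communicate over stdin/stdout
    .WithToolsFromAssembly();    // discover [McpServerTool] methods below
await builder.Build().RunAsync();

[McpServerToolType]
public static class WeatherTool
{
    // Hypothetical tool; a real one would call an actual weather service.
    [McpServerTool, Description("Gets the forecast for a city.")]
    public static string GetForecast(string city) => $"Sunny in {city}";
}
```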
What about agents?
Wait, did I say agent? An agent is simply a way of packaging a specialized solution that includes a model, tools, and context. A “concierge agent” might pair a reasoning model and tools that provide information about weather, events, and local businesses with a specialized model capable of generating maps with turn-by-turn directions. I’ll look at agents more closely and cover C#-based solutions in a later post.
I’ve covered all of the foundational concepts, so now it’s time to go hands-on. First, I want to briefly share the timeline from generative AI going mainstream to the tools that are available today.
From GPT-1 to today
Here is a brief look at the evolution of AI in .NET over the past few years.

Model management in the .NET Ecosystem
Working with models is about more than identifying the right model and using it. Many companies choose to host their own models out of concerns related to trust, security, and cost. Other companies require fine-tuned models and the ability to perform their own training. Fortunately, working with models in .NET and C# is not only possible, but streamlined with the help of several products and services.
GitHub Models
GitHub Models provides a hosted catalog of open and frontier models through an OpenAI‑compatible API. It is a great way for developers to get started on their AI journey. A few reasons include:
- No infrastructure required
- Switch between models with minimal code changes
- Perfect for prototyping, evaluations, automation, extensions, and CI/CD pipelines
Get started with GitHub Models.
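Here is a minimal sketch of calling GitHub Models through the OpenAI SDK plus the Microsoft.Extensions.AI.OpenAI adapter. The endpoint URL and model id below are assumptions; check the GitHub Models documentation for the current values.

```csharp
using System.ClientModel;
using Microsoft.Extensions.AI;
using OpenAI;

// Assumptions: a GitHub PAT in GITHUB_TOKEN and the OpenAI-compatible
// inference endpoint/model id below -- verify against the GitHub Models docs.
IChatClient client = new OpenAIClient(
        new ApiKeyCredential(Environment.GetEnvironmentVariable("GITHUB_TOKEN")!),
        new OpenAIClientOptions { Endpoint = new Uri("https://models.github.ai/inference") })
    .GetChatClient("openai/gpt-4o-mini")
    .AsIChatClient(); // adapter name may differ across preview versions

ChatResponse response = await client.GetResponseAsync("Say hello from GitHub Models!");
Console.WriteLine(response.Text);
```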
Microsoft Foundry (Cloud)
Formerly Azure AI Studio, Microsoft Foundry is the enterprise platform for:
- Model catalogs (OpenAI, Meta, DeepSeek, Cohere, Mistral, etc.)
- Agentic workflows (Foundry Agent Service)
- Security, content safety, governance
- Monitoring, tracing, evaluations
- Fine‑tuning and customization
Foundry is where organizations take AI into production at scale.
Foundry Local
Foundry Local brings the Foundry developer experience offline:
- On‑premise, air‑gapped, or edge environments
- The same agents, tools, and evaluations as cloud Foundry
- Supports hybrid “develop local → deploy cloud” lifecycle
This is a great option for testing new models, trying out new code without blowing through your budget, and building CI/CD pipelines that have minimal overhead and don’t depend on a third-party hosted account to succeed.
Ollama (Local Runtime)
Ollama is a popular open‑source engine for running lightweight and mid‑sized models locally.
Features:
- Runs models like Mistral, Llama 3, Phi‑3
- Simple CLI and server
- Excellent for privacy‑sensitive workflows
- Integrates cleanly with MEAI (IChatClient) via OllamaSharp
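Here is a minimal streaming sketch, assuming Ollama is running locally and a model has been pulled (e.g., ollama pull llama3.2):

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaApiClient implements MEAI's IChatClient, so the rest of the code
// looks identical regardless of provider. The model name is an assumption.
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.2");

await foreach (ChatResponseUpdate update in
    client.GetStreamingResponseAsync("Explain embeddings in one sentence."))
{
    Console.Write(update.Text); // tokens stream in as they're generated
}
```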
Bringing It All Together: A Unified Abstraction
As a .NET developer, you shouldn’t have to choose a single provider or lock into a single solution. That’s why the .NET team invested in a set of extensions that provide consistent yet flexible APIs for working with models. They also enable scenarios such as middleware to ease the burden of logging, tracing, injecting behaviors, and other custom processing you might use. Most of the major providers implement our extension contracts, so you can, for example, use an IChatClient instance regardless of whether you’re talking to:
- GitHub Models
- Azure AI Foundry
- OpenAI / Azure OpenAI
- Foundry Local
- Ollama
- Custom provider
… and the code can stay the same.
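For example, application code can be written once against the abstraction; here is a minimal sketch:

```csharp
using Microsoft.Extensions.AI;

// Application logic depends only on the abstraction; swapping providers
// (GitHub Models, Foundry, Ollama, ...) changes the wiring, not this method.
static async Task<string> SummarizeAsync(IChatClient client, string text)
{
    ChatResponse response = await client.GetResponseAsync(
    [
        new ChatMessage(ChatRole.System, "Summarize the user's text in one sentence."),
        new ChatMessage(ChatRole.User, text)
    ]);

    return response.Text;
}
```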
We’ll dive deeper into these tools in future posts so stay tuned to the .NET blog, subscribe to our newsletter, and join an upcoming community standup on the .NET YouTube!
A masterclass in definitions.
Since you’ve mentioned Semantic Kernel, it might be better to also mention the Microsoft Agent Framework, since that’s the successor of SK.
Thanks! Yes, my next post will introduce that and get a little deeper. Appreciate the callout!
I appreciate you starting at the beginning, for those of us who have not yet started integrating AI into our code and (in many cases) haven’t even really used it as a consumer much.
I’m currently working on several ASP.NET Core projects and would like to implement AI-powered searching and filtering of records based on user or entity profiles (for example, semantic search rather than strict keyword matching).
Could you share recommended starting steps, architecture flow, or reference patterns for achieving this in .NET? Specifically:
1. How to model and generate embeddings for existing records
2. How to store and query them efficiently (e.g., vector databases or alternatives)
3. How this fits into a typical ASP.NET Core application flow (RAG, APIs, etc.)
Any guidance or links to sample implementations would be greatly appreciated.
Here are a few resources to get started. If this doesn't address what you are looking for, or you find gaps or issues, please feel free to reach out to me directly! My goal is to provide you with the right tools and docs so you can confidently implement the capabilities you described. Email is just my first name and last name with a dot in between at microsoft (dot) com.
https://devblogs.microsoft.com/dotnet/introducing-data-ingestion-building-blocks-preview/
https://devblogs.microsoft.com/dotnet/vector-data-qdrant-ai-search-dotnet/
Hi! Thanks for the article.
What should I use now for a greenfield project: Semantic Kernel or Microsoft Extensions for AI (MEAI)?
My pleasure! I would always start with MEAI, then add other packages as needed/when they are identified. The primitives from Semantic Kernel have been migrated to MEAI, and the workflow capabilities are being addressed by the Microsoft Agent Framework.