June 19th, 2024

Using Phi-3 & C# with ONNX for text and vision samples

Bruno Capuano
Cloud Advocate

The combination of Small Language Models (SLMs) and ONNX is a game-changer for AI interoperability. Let’s show how you can harness the power of Phi-3 models in your .NET applications with C# and ONNX.

In the Phi-3 Cookbook repository we can find several samples, including a Console Application that loads a Phi-3 Vision model with ONNX and analyzes and describes an image.

A .NET Console application using Phi-3 Vision analyzes an image

Introduction to Phi-3 Small Language Model

The Phi-3 Small Language Model (SLM) represents a groundbreaking advancement in AI, developed by Microsoft. It’s part of the Phi-3 family, which includes the most capable and cost-effective SLMs available today. These models outperform others of similar or even larger sizes across various benchmarks, including language, reasoning, coding, and math tasks. The Phi-3 models, including the Phi-3-mini, Phi-3-small, and Phi-3-medium, are designed to be instruction-tuned and optimized for ONNX Runtime, ensuring broad compatibility and high performance.

You can learn more about Phi-3 in the Phi-3 Cookbook repository.

Introduction to ONNX

ONNX, or Open Neural Network Exchange, is an open-source format that allows AI models to be portable and interoperable across different frameworks and hardware. It enables developers to use the same model with various tools, runtimes, and compilers, making it a cornerstone for AI development. ONNX supports a wide range of operators and offers extensibility, which is crucial for evolving AI needs.

Why Use ONNX for Local AI Development

Local AI development benefits significantly from ONNX due to its ability to streamline model deployment and enhance performance. ONNX provides a common format for machine learning models, facilitating the exchange between different frameworks and optimizing for various hardware environments.

For C# developers, this is particularly useful because we have a set of libraries created specifically to work with ONNX models, for example Microsoft.ML.OnnxRuntime.
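As a minimal sketch of what working with Microsoft.ML.OnnxRuntime looks like (the model file name, input name, and tensor shape below are placeholders, not taken from the sample — they depend on the specific model you load):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Load an ONNX model file into an inference session (CPU by default)
using var session = new InferenceSession("model.onnx");

// Build an input tensor; the input name ("input") and the shape
// are placeholders that depend on the specific model
var tensor = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", tensor)
};

// Run inference and read the values of the first output
using var results = session.Run(inputs);
var output = results.First().AsEnumerable<float>().ToArray();
Console.WriteLine($"Output length: {output.Length}");
```

For generative models like Phi-3, the samples below use the higher-level Microsoft.ML.OnnxRuntimeGenAI package instead, which wraps tokenization and token-by-token generation.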

Sample Console Application to use an ONNX model

The main steps to use a model with ONNX in a C# application are:

  • The Phi-3 model, stored at the modelPath, is loaded into a Model object.
  • This model is then used to create a Tokenizer, which is responsible for converting our text inputs into a format that the model can understand.

For example, here is a chatbot implementation from /src/LabsPhi301/Program.cs.

  • The chatbot operates in a continuous loop, waiting for user input.
  • When a user types a question, the question is combined with a system prompt to form a full prompt.
  • The full prompt is then tokenized and passed to a Generator object.
  • The generator, configured with specific parameters, generates a response one token at a time.
  • Each token is decoded back into text and printed to the console, forming the chatbot’s response.
  • The loop continues until the user decides to exit by entering an empty string.

The complete program:
using Microsoft.ML.OnnxRuntimeGenAI;

var modelPath = @"D:\phi3\models\Phi-3-mini-4k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32";
var model = new Model(modelPath);
var tokenizer = new Tokenizer(model);

var systemPrompt = "You are an AI assistant that helps people find information. Answer questions using a direct style. Do not share more information than requested by the users.";

// chat start
Console.WriteLine(@"Ask your question. Type an empty string to Exit.");

// chat loop
while (true)
{
    // Get user question
    Console.WriteLine();
    Console.Write(@"Q: ");
    var userQ = Console.ReadLine();    
    if (string.IsNullOrEmpty(userQ))
    {
        break;
    }

    // show phi3 response
    Console.Write("Phi3: ");
    var fullPrompt = $"<|system|>{systemPrompt}<|end|><|user|>{userQ}<|end|><|assistant|>";
    var tokens = tokenizer.Encode(fullPrompt);

    var generatorParams = new GeneratorParams(model);
    generatorParams.SetSearchOption("max_length", 2048);
    generatorParams.SetSearchOption("past_present_share_buffer", false);
    generatorParams.SetInputSequences(tokens);

    var generator = new Generator(model, generatorParams);
    while (!generator.IsDone())
    {
        generator.ComputeLogits();
        generator.GenerateNextToken();
        var outputTokens = generator.GetSequence(0);
        var newToken = outputTokens.Slice(outputTokens.Length - 1, 1);
        var output = tokenizer.Decode(newToken);
        Console.Write(output);
    }
    Console.WriteLine();
}
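As a variation, the Slice-and-Decode step in the loop above can also be written with the TokenizerStream type from the same Microsoft.ML.OnnxRuntimeGenAI package, which decodes tokens incrementally. This is only a sketch and assumes the same model, tokenizer, and generator setup as in the listing:

```csharp
// Create a streaming decoder from the existing tokenizer
using var tokenizerStream = tokenizer.CreateStream();

while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();

    // Decode only the newly generated token and print it
    var outputTokens = generator.GetSequence(0);
    var newToken = outputTokens[outputTokens.Length - 1];
    Console.Write(tokenizerStream.Decode(newToken));
}
```

The behavior is the same as the original loop; the stream just keeps the decoding state between tokens instead of slicing the full sequence on each iteration.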

The running app is similar to this one:

Sample Console App Chat demo

C#, ONNX, Phi-3, and Phi-3 Vision

The Phi-3 Cookbook repository showcases how these powerful models can be utilized for tasks like question-answering and image analysis within a .NET environment.

It includes labs and sample projects that demonstrate the use of the Phi-3-mini and Phi-3-Vision models in .NET applications.

  • LabsPhi301: A sample project that uses a local Phi-3 model to answer questions. The project loads a local ONNX Phi-3 model using the Microsoft.ML.OnnxRuntime libraries.
  • LabsPhi302: A sample project that implements a console chat using Semantic Kernel.
  • LabsPhi303: A sample project that uses a local Phi-3 Vision model to analyze images. The project loads a local ONNX Phi-3 Vision model using the Microsoft.ML.OnnxRuntime libraries.
  • LabsPhi304: A sample project that uses a local Phi-3 Vision model to analyze images. The project loads a local ONNX Phi-3 Vision model using the Microsoft.ML.OnnxRuntime libraries. It also presents a menu with different options to interact with the user.

Summary

If you want to learn more, also check out the Vision samples, which demonstrate the capabilities of AI in visual computing using an SLM, ONNX, and C#.

If you want to learn more about .NET and AI, check out our additional AI samples at Get started with .NET 8 and AI using new quickstart tutorials.

Author

Bruno Capuano
Cloud Advocate

9 comments


  • Aaron Carter

    I managed to get my own Phi3 model up and running locally with C#/.NET, and opted for a DirectML/ONNX version (my preferred model image format and execution provider). This went a bit "off the grid" from the examples which show only CPU-driven examples. For anyone else going this route, you simply have to replace the NuGet packages with the alternative ending in ".DirectML" or ".CUDA" or whatever provider you want to use to run it...

    • Bruno Capuano (Microsoft employee, Author)

      Thanks for the feedback, Aaron. The idea of a table/matrix with the suggested packages to do A, B, or C is great! (Added to our to-do list!)

      Regarding training, I don't think we have guidelines on how to train LLMs. However, you can fine-tune Phi-3 to make it work better in a specific domain (defined by the training data). Check the Fine-Tuning samples at https://aka.ms/Phi-3CookBook.

      Best!

  • Eduardo Cucharro

    Great post Bruno, but I need your help!

    Can you really run it? I’ve tried running the LabsPhi3, but got “An unhandled exception of type ‘System.DllNotFoundException’ occurred in Microsoft.ML.OnnxRuntimeGenAI.dll: Unable to load shared library ‘onnxruntime-genai’ or one of its dependencies.”

    I’ve tried some of the workarounds found in the issue https://github.com/microsoft/onnxruntime/issues/9707, but none really worked. Can you help?

    • Bruno Capuano (Microsoft employee, Author)

      Hi Eduardo
      Are you running this on a Linux environment? I tested this, and the Microsoft.ML.OnnxRuntimeGenAI libraries are only available for Windows at this moment. My OS is Windows 11, with all updates installed, and it’s working fine. I can record a 5-min video, if you want, describing the project, references, and dev environment needed to run this.
      Best,
      Bruno

      • Eduardo Cucharro

        No, I’m using MacOS! Thanks for your reply!

      • Bruno Capuano (Microsoft employee, Author)

        Thanks for sharing.
        I’ll share the news once we have support for Linux and MacOS.
        Best!

    • Krishna Prasad (Krishna)

      Were you able to resolve this? I am getting the exact same error on Ubuntu 22.04.

  • Matheus Julio

    Do I need a graphics card to run this model, or does it work fine on a CPU?

    • Bruno Capuano (Microsoft employee, Author)

      Hi Matheus.
      You can run this demo with a CPU. A GPU will run the demo much faster; however, a CPU is fine for testing.
      Best.