March 13th, 2025

Flexible Tool Selection for ML Model Production

Introduction

When you think about data science and machine learning, the Python and R programming languages immediately come to mind as the default choices, and for good reason. An immense amount of academic research and business R&D in machine learning is conducted in those languages. As a result, Python-based tooling for data science and machine learning has grown rapidly over the past several years. One consideration that is top of mind for business decision makers is how best to host and deploy machine learning and generative AI capabilities in their production environment.

Production deployment of ML and AI capabilities will generally remain the responsibility of the application DevOps and cloud deployment team already in place. Businesses will generally prefer to leverage existing developer and cloud architect skill sets to build, host, observe, monitor, and support ML and AI capability deployments. For example, if a business has production expertise in Java or C#, it is probably in the organization's best interest to approach ML and AI deployments in those languages and related platforms where possible.

We recently worked with a customer that is primarily a .NET shop (with some C++) across their suite of applications. They are experimenting with machine learning and generative AI using the tools they know, primarily C#. The customer has also been hiring traditional data science and machine learning talent, primarily with Python skill sets, to augment their in-house expertise as part of their effort to produce and deploy production-ready datasets and machine learning models.

Our team engaged to assist the customer on this journey by implementing an MLOps Model Factory in Python, leveraging the Microsoft Model Factory starter kit based on Microsoft’s MLOps V2 guidance. The goal of this engagement was to produce models with strong engineering fundamentals, data traceability, and observability built into the end-to-end model creation process, giving decision makers the information they need to approve deployment to production branches.

While data science and machine learning experimentation and evaluation were conducted in the Python-based MLOps Model Factory with Azure Machine Learning Studio, the customer’s deployment target for machine learning and generative AI capabilities remains their suite of desktop and web-based applications built on .NET and C#. Implementing a pathway to production that starts with building machine learning models in Python and ends with hosting those models for production inference in .NET with C# is the focus of this blog post.

Approach

The solution we found was to focus on model outputs as the production candidate artifacts from the Model Factory. Our data science team worked with the customer’s data scientist to assist with machine learning model development in Python, while our software engineering team worked with the customer’s DevOps team and data engineers to build out the Python-based Model Factory that integrates with Azure Machine Learning Studio for data asset management, compute, reporting, and model asset management.

Model Factory Artifacts for Prod Model Inference

Model Factory Refinement

There is a well-maintained .NET library named TorchSharp that provides the PyTorch API in C#. A key difference between PyTorch and the C#-based TorchSharp is the model output format. PyTorch in Python produces models in .pkl and .pytorch formats by default. However, Python also supports exporting models to the Open Neural Network Exchange (ONNX) format, which can be executed with ONNX Runtime. In .NET, the ONNX runtime is the primary supported model format for both TorchSharp and ML.NET. Our team spent time with the customer’s data scientists creating experiments to produce machine learning model candidates in Python and then validating model inference performance for consistency by building out a C#-based test harness that served as a prototype of their production application.
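To make the test harness idea concrete, the following is a minimal sketch of loading a Model Factory ONNX output and running a single inference in C# with the Microsoft.ML.OnnxRuntime NuGet package. The file name, input tensor name, and input shape here are illustrative assumptions rather than the customer’s actual model contract; a real harness would read them from session.InputMetadata.

```csharp
// Minimal ONNX inference sketch in C# (NuGet: Microsoft.ML.OnnxRuntime).
// "model.onnx", the input name "input", and the 1xN shape are hypothetical --
// inspect session.InputMetadata for your model's real input names and shapes.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Load the model exported from the Python-based Model Factory.
using var session = new InferenceSession("model.onnx");

// Build a single example input as a dense float tensor.
var features = new float[] { 0.1f, 0.2f, 0.3f };
var inputTensor = new DenseTensor<float>(features, new[] { 1, features.Length });

var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", inputTensor)
};

// Run inference and read the first output back as a float array.
using var results = session.Run(inputs);
var output = results.First().AsEnumerable<float>().ToArray();

Console.WriteLine($"Model output: [{string.Join(", ", output)}]");
```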

This collaborative experimentation between the data science and software engineering teams, testing model inference implementations and capabilities in both Python and C# with the .pytorch and .onnx model formats, demonstrated that model performance was consistent regardless of output format and built confidence in our approach.

Understanding Production Model Inference Requirements

Another key deliverable alongside the model is the “tokenizer”, which is used to tokenize inputs for model inference and then detokenize the outputs back into an actionable result format. By default, when working with the Python-based data science and machine learning frameworks, you will find the tokenizer component is also published in .pkl format. However, there is again a common .json format for the tokenizer that is supported by both PyTorch and TorchSharp, enabling interoperability across programming platforms.
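As a simplified illustration of why the .json tokenizer artifact matters for interoperability, the sketch below reads a JSON vocabulary in C# and performs a naive encode step. The file name, JSON layout, whitespace tokenization, and the [UNK] fallback token are hypothetical placeholders; a production implementation should use a tokenizer library that reproduces exactly the rules applied in the Python training pipeline.

```csharp
// Illustrative sketch: reading a JSON tokenizer vocabulary in C#.
// Assumes a simple {"token": id, ...} layout, which is hypothetical; real
// tokenizer JSON files also carry normalization rules, merges, and special
// tokens, so production code should mirror the Python-side tokenizer exactly.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

var json = File.ReadAllText("tokenizer_vocab.json");
var vocab = JsonSerializer.Deserialize<Dictionary<string, int>>(json)
            ?? throw new InvalidOperationException("Could not parse vocabulary.");

// Naive whitespace tokenization purely for demonstration; "[UNK]" is an
// assumed fallback token for out-of-vocabulary words.
int[] Encode(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Select(token => vocab.TryGetValue(token, out var id) ? id : vocab["[UNK]"])
        .ToArray();

Console.WriteLine(string.Join(", ", Encode("hello world")));
```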

An additional benefit of focusing on model inference experimentation for production deployment early in model development is a greater understanding of how the data refinement and filtering used to train the model translates to the filtering that production inputs may also need so that model performance meets expectations. The challenge for the data science and software engineering teams is to take the Python scripts that filter data inputs and port them to .NET and C#. Authoring similar unit tests in both languages can help drive consistency between the two implementations.
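As a hypothetical example of this porting exercise, the sketch below shows a small input filter in C# together with an xUnit test that mirrors what the Python-side unit test would assert. The specific cleaning rules and the CleanText/clean_text naming are assumptions for illustration, not the customer’s actual filtering logic.

```csharp
// Hypothetical port of an input filter from the Python training pipeline to
// C#, plus an xUnit test mirroring the Python-side unit test. The rules
// (lower-casing, collapsing whitespace, length cap) are illustrative only.
using System.Text.RegularExpressions;
using Xunit;

public static class InputFilter
{
    // Mirror of the (hypothetical) Python clean_text() applied before training.
    public static string CleanText(string text, int maxLength = 512)
    {
        var cleaned = Regex.Replace(text.Trim().ToLowerInvariant(), @"\s+", " ");
        return cleaned.Length <= maxLength ? cleaned : cleaned[..maxLength];
    }
}

public class InputFilterTests
{
    [Fact]
    public void CleanText_NormalizesWhitespaceAndCase()
    {
        // Sharing the same input/expected pairs with the Python test suite
        // keeps the two filter implementations honest with each other.
        Assert.Equal("hello world", InputFilter.CleanText("  Hello\t\tWorld  "));
    }
}
```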

Producing Usable ML Output Artifacts

As a result of the above work, we enhanced the CD workflow for the Model Factory to publish all the components necessary for testing and validation in both Python and C#/.NET:

  1. Model output in .pytorch and .onnx formats.
  2. Tokenizer in .json format.
  3. References to the data asset used to train the model.
  4. A README.md that provides context and guidance on how to leverage the artifacts.
  5. Environment management files (conda.yml, requirements.txt, Dockerfile, .devcontainer).

With the above inputs and guidance, the customer’s application software engineering and deployment team can review the Azure Machine Learning Model Registry, select a model, and begin model inference testing and deployment validation in test branches of their downstream application.

Summary

While there is extensive documentation for Python developers building machine learning models, and extensive documentation for .NET/C# developers building machine learning models, it can be challenging to bring these two skill sets and capabilities together in a cohesive approach that does not require skill set retooling.

The ONNX format provides one option for the key interoperability between tool sets, but there are additional technical requirements, such as the input filtering described above, that must be considered when delivering models for production inference. We hope that what we shared above provides additional context and confidence on how to proceed technically, so that you can focus on hiring the right talent and enabling them to be successful in their preferred tooling.

The feature image was generated using OpenAI. Terms can be found here.