Hello!
We are delighted to announce that Azure AI Inference is now available in Semantic Kernel (Python). This new AI connector lets you experiment with a broader range of models hosted on Azure in your applications.
The Sample Application: Evaluating LLMs
You can find the full sample here.
To demonstrate the capabilities of this new AI connector, we have prepared a sample application. In this application, we will execute three models against the widely recognized Measuring Massive Multitask Language Understanding (MMLU) dataset and produce benchmarking results.
Consider the following scenarios where you may want to use this sample as a starting point for your own project with Semantic Kernel and Model-as-a-Service on Azure:
- Developing a new dataset or enhancing an existing one for benchmarking LLMs.
- Reproducing results from academic papers that evaluate model performance on open-source datasets. Some of these models may be too large to execute locally, such as Meta-Llama-3-70B and Mistral Large.
You should see output similar to the following when the sample finishes running:
As you can see, Phi3-small outperforms Phi3-mini, which in turn outperforms Llama3-8b. Notably, this performance order (the ranking, not the accuracy values) matches the results reported on page 6 of the paper Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone.
Model Deployments
Before proceeding with the sample application, we need to deploy some models! If you have already done so, please proceed to the next section.
There are two modes of access to the service: Managed compute and Serverless API. Serverless API is also known as pay-as-you-go or Model-as-a-Service (MaaS); you can read more about it here. Model availability differs slightly depending on the access mode you choose; you can view the details here.
We will focus on Model-as-a-Service in this blog post. Follow the instructions here to deploy models as serverless APIs. To minimize the costs incurred by running the sample, it uses Llama3-8b, Phi3-mini, and Phi3-small.
Setting up the AI connector
Once the models have been deployed, the next step is to configure the connectors in the kernel. Semantic Kernel makes it extremely easy to set up and use multiple AI services. Locate the endpoints and API keys for the deployments you have just created, and then input them into the parameters as shown below.
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

kernel = Kernel()

# Register one chat completion service per deployed model.
kernel.add_service(
    AzureAIInferenceChatCompletion(
        ai_model_id="Llama3-8b",
        api_key="",
        endpoint="",
    )
)
kernel.add_service(
    AzureAIInferenceChatCompletion(
        ai_model_id="Phi3-mini",
        api_key="",
        endpoint="",
    )
)
kernel.add_service(
    AzureAIInferenceChatCompletion(
        ai_model_id="Phi3-small",
        api_key="",
        endpoint="",
    )
)
The AI services will be automatically picked up later in the script to run on the dataset.
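Each service is stored under a service id, and a specific model can also be fetched directly. The sketch below assumes the id falls back to the ai_model_id when no explicit service_id is passed to the connector:

# Fetch a specific registered service by id; here the id is assumed to
# fall back to the ai_model_id passed at construction time.
chat_service = kernel.get_service("Phi3-mini")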
The Dataset
The MMLU dataset comprises multiple-choice questions across various subjects and is a widely recognized benchmark for comparing large language model (LLM) performance. The dataset was introduced in this paper of the same title. It is accessible via HuggingFace.
To download the dataset, you will need a HuggingFace access token. Please follow the instructions provided here to create one. You will be prompted to enter this token when you run the sample.
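If you prefer to authenticate ahead of time instead of waiting for the prompt, the huggingface_hub library can ask for and cache the token; a minimal sketch:

from huggingface_hub import login

# Interactively prompts for a Hugging Face access token and caches it locally.
login()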
Included in the sample are the following subjects:
datasets = load_mmlu_dataset(
    [
        "college_computer_science",
        "astronomy",
        "college_biology",
        "college_chemistry",
        "elementary_mathematics",
        # Add more subjects here.
        # See here for a full list of subjects: https://huggingface.co/datasets/cais/mmlu/viewer
    ]
)
You are free to add or remove subjects.
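load_mmlu_dataset is a helper defined in the sample; purely for illustration, here is a minimal sketch of what such a helper might do with Hugging Face's datasets library (the choice of the test split is an assumption):

from datasets import load_dataset


def load_mmlu_dataset(subjects: list[str]) -> dict:
    """Load the MMLU test split for each requested subject.

    Each record carries 'question', 'choices' (four answer strings),
    and 'answer' (the index of the correct choice).
    """
    return {
        subject: load_dataset("cais/mmlu", subject, split="test")
        for subject in subjects
    }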
Evaluation
The script creates a plugin named MMLUPlugin for evaluating samples from the dataset. This plugin includes a kernel function called evaluate, which processes a question and determines whether an AI service has correctly predicted the answer.
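For orientation, here is a minimal sketch of how such a plugin can be declared and registered; the exact signature of evaluate in the sample may differ, and the body is elided to the steps described below:

from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function


class MMLUPlugin:
    """Evaluates MMLU questions against a registered chat completion service."""

    def __init__(self, kernel: Kernel):
        self.kernel = kernel

    @kernel_function(name="evaluate", description="Grade one MMLU question.")
    async def evaluate(self, question: str, service_id: str) -> bool:
        # 1. Format the question into a zero-shot prompt (see formatted_question below).
        # 2. Send it to the chat completion service named by service_id (see the snippet below).
        # 3. Return whether the predicted letter matches the answer key.
        ...


kernel.add_plugin(MMLUPlugin(kernel), plugin_name="MMLUPlugin")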
The kernel function accepts a question and formats it into a prompt used as the user message for the chat completion service. Here, the prompt includes no solved examples, an approach known as zero-shot learning. Many papers use multi-shot learning to enhance model accuracy, which explains why the results from this sample may be lower than those reported in the paper mentioned above.
def formatted_question(question: str, answer_a: str, answer_b: str, answer_c: str, answer_d: str):
    """Return a formatted question."""
    return f"""
Question: {question}
Which of the following answers is correct?
A. {answer_a}
B. {answer_b}
C. {answer_c}
D. {answer_d}
State ONLY the letter corresponding to the correct answer without any additional text.
"""
The Semantic Kernel team encourages you to experiment with the prompt by incorporating multi-shot learning to potentially improve accuracy values!
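As one starting point, a multi-shot variant could prepend solved examples to the prompt. The sketch below assumes examples is a list of (question, choices, correct_letter) tuples you supply; none of these names come from the sample:

def formatted_question_few_shot(
    examples: list[tuple[str, list[str], str]],
    question: str,
    answer_a: str,
    answer_b: str,
    answer_c: str,
    answer_d: str,
) -> str:
    """Like formatted_question, but with solved examples prepended."""
    shots = "\n\n".join(
        f"Question: {q}\n"
        + "\n".join(f"{letter}. {choice}" for letter, choice in zip("ABCD", choices))
        + f"\nAnswer: {answer}"
        for q, choices, answer in examples
    )
    return f"""{shots}

Question: {question}
Which of the following answers is correct?
A. {answer_a}
B. {answer_b}
C. {answer_c}
D. {answer_d}
State ONLY the letter corresponding to the correct answer without any additional text.
"""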
The kernel function also takes a parameter naming the service to use for inference:
response = await kernel.get_service(service_id).get_chat_message_content(
    chat_history,
    settings=kernel.get_prompt_execution_settings_from_service_id(service_id),
)
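For completeness, here is a sketch of the surrounding steps: building the chat_history from the formatted prompt and grading the single-letter reply. The names question, answer_a through answer_d, and correct_letter are hypothetical locals, not names from the sample:

from semantic_kernel.contents import ChatHistory

# Build the chat history from the zero-shot prompt.
chat_history = ChatHistory()
chat_history.add_user_message(
    formatted_question(question, answer_a, answer_b, answer_c, answer_d)
)

# After the call above returns, grade the reply: the prompt asked for a
# single letter, so compare it to the answer key.
is_correct = str(response).strip().upper().startswith(correct_letter.upper())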
Conclusion
The Semantic Kernel team is dedicated to empowering developers by providing access to the latest advancements in the industry. We have integrated the new Azure AI Inference connector into SK, enabling seamless integration of Model-as-a-Service in your SK-based applications. Additionally, we offer a sample application demonstrating the usage and potential of the new connector. This will give you greater confidence in selecting the appropriate models, including large ones, for your tasks while ensuring you only pay for what you consume. We encourage you to leverage your creativity and build remarkable solutions with SK!
Please reach out with any questions or feedback through our Semantic Kernel GitHub Discussion Channel. We look forward to hearing from you! We would also love your support: if you've enjoyed using Semantic Kernel, give us a star on GitHub.