How to use Hugging Face Models with Semantic Kernel

Nilesh Acharya


We are thrilled to announce the integration of Semantic Kernel with Hugging Face models!

With this integration, you can leverage the power of Semantic Kernel combined with the accessibility of over 190,000 models from Hugging Face. It puts this vast catalog of models at your fingertips, together with the latest advancements in Semantic Kernel's orchestration, skills, planner, and contextual memory support.

What is Hugging Face?

Hugging Face is a leading provider of open-source models. Models are pre-trained on large datasets and can be used to quickly perform a variety of tasks, such as sentiment analysis, text classification, and text summarization. Using Hugging Face model services can provide great efficiencies as models are pre-trained, easy to swap out and cost-effective with many free models available.

How to use Semantic Kernel with Hugging Face?

This video walks you through how to get started, or you can dive right into the Python sample here. For the remainder of this blog post we will use the Hugging Face sample with skills as a reference.

In the first two cells we install the relevant packages with a pip install and import the Semantic Kernel dependencies.

!python -m pip install -r requirements.txt
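The requirements file pulls in Semantic Kernel plus the Hugging Face stack. A plausible sketch of its contents (the exact packages and version pins in the sample may differ):

```text
semantic-kernel
torch
transformers
sentence-transformers
```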

import semantic_kernel as sk
import semantic_kernel.connectors.ai.hugging_face as sk_hf

Next, we create a kernel instance and configure the Hugging Face services we want to use. In this example we will use gpt2 for text completion and sentence-transformers/all-MiniLM-L6-v2 for text embeddings.

kernel = sk.Kernel()

# Configure LLM service for text completion
kernel.add_text_completion_service(
    "gpt2", sk_hf.HuggingFaceTextCompletion("gpt2", task="text-generation")
)
# Configure the embeddings service used by semantic memory
kernel.add_text_embedding_generation_service(
    "sentence-transformers/all-MiniLM-L6-v2",
    sk_hf.HuggingFaceTextEmbedding("sentence-transformers/all-MiniLM-L6-v2"),
)

We have chosen to use volatile memory, which keeps everything in process memory. We then define the text memory skill that we use for this example.
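In the sample notebook this is done with calls along the lines of `kernel.register_memory_store(memory_store=sk.memory.VolatileMemoryStore())` and `kernel.import_skill(sk.core_skills.TextMemorySkill())`. Conceptually, volatile memory is just an in-process store that disappears when the notebook shuts down; the toy sketch below (not Semantic Kernel's implementation, which ranks facts by embedding similarity) illustrates the save-and-search flow:

```python
# Toy sketch of a volatile (in-process) memory store. Semantic Kernel's real
# VolatileMemoryStore ranks facts by embedding similarity; here we use a
# crude word-overlap score just to illustrate saving facts to a named
# collection and searching them by relevance.
class ToyVolatileMemory:
    def __init__(self):
        self._collections = {}  # collection name -> {fact id: fact text}

    def save_information(self, collection, fact_id, text):
        self._collections.setdefault(collection, {})[fact_id] = text

    def search(self, collection, query, limit=1):
        query_words = set(query.lower().split())
        facts = self._collections.get(collection, {}).values()
        # Rank facts by how many words they share with the query.
        ranked = sorted(
            facts,
            key=lambda t: len(query_words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:limit]


memory = ToyVolatileMemory()
memory.save_information("animal-facts", "info1", "Sharks are fish that swim")
memory.save_information("animal-facts", "info2", "Eagles are birds that fly")
print(memory.search("animal-facts", "fish that swim", limit=1))
```

Everything lives in a plain dictionary, so restarting the kernel wipes the collection; that is the trade-off volatile memory makes for zero setup.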

Now that the kernel is set up, in the next cell we define the fact memories we want the model to reference as it generates responses. In this example we have facts about animals. Feel free to edit them and get creative as you test this out for yourself. Lastly, we create a prompt template that spells out how to respond to our query. That is it! Now we are all set to send our query.
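The prompt template leans on the TextMemorySkill's recall function to pull the matching facts into the prompt at render time. A sketch of what such a template can look like (the exact wording in the sample may differ; `{{recall $query1}}` runs a memory search using the collection and relevance settings stored in the context):

```python
# Hypothetical prompt template in the style of the Hugging Face sample.
# Each {{recall $queryN}} is replaced at render time by the best-matching
# fact from the configured memory collection.
sk_prompt = """
Answer the query using only the facts below:

Fact 1: {{recall $query1}}
Fact 2: {{recall $query2}}
Fact 3: {{recall $query3}}

Query: {{$query3}}

Answer:
"""

# The template is then turned into a callable semantic function on the
# kernel, e.g. my_function = kernel.create_semantic_function(sk_prompt)
```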

The last cell in the notebook defines the queries, sets the relevance threshold, and returns our output.

context = kernel.create_new_context()
context[sk.core_skills.TextMemorySkill.COLLECTION_PARAM] = "animal-facts"
context[sk.core_skills.TextMemorySkill.RELEVANCE_PARAM] = 0.3

context["query1"] = "animal that swims"
context["query2"] = "animal that flies"
context["query3"] = "penguins are?"

# my_function is the semantic function created from the prompt template earlier
output = await kernel.run_async(my_function, input_vars=context.variables)

output = str(output).strip()

query_result1 = await kernel.memory.search_async(
    "animal-facts", context["query1"], limit=1, min_relevance_score=0.3
)
query_result2 = await kernel.memory.search_async(
    "animal-facts", context["query2"], limit=1, min_relevance_score=0.3
)
query_result3 = await kernel.memory.search_async(
    "animal-facts", context["query3"], limit=1, min_relevance_score=0.3
)

print(f"gpt2 completed prompt with: '{output}'")

Feel free to play with token sizes to vary the response length, and with the other parameters to explore different responses.

Happy testing!

Next Steps:

Explore the sample in GitHub

Learn more about Semantic Kernel

Join the community and let us know what you think.
