June 24th, 2025

Semantic Kernel Python Gets a Major Vector Store Upgrade

Eduard van Valkenburg
Senior Software Engineer

We’re excited to announce a significant update to Semantic Kernel Python’s vector store implementation. Version 1.34 brings a complete overhaul that makes working with vector data simpler, more intuitive, and more powerful. This update consolidates the API, improves developer experience, and adds new capabilities that streamline AI development workflows.

What Makes This Release Special?

The new vector store architecture consolidates everything under semantic_kernel.data.vector and delivers three key improvements:

  1. Simplified API: One unified field model replaces multiple complex field types
  2. Integrated Embeddings: Embedding generation happens automatically where you need it
  3. Enhanced Features: Advanced filtering, hybrid search, and streamlined operations

Let’s explore what makes these changes valuable.

Unified Field Model – Simplified Configuration

We’ve replaced three separate field types with one powerful VectorStoreField class that handles everything you need.

Before: The Old Way (Complex and Verbose)

from semantic_kernel.data import (
    VectorStoreRecordKeyField,
    VectorStoreRecordDataField, 
    VectorStoreRecordVectorField
)

# Multiple classes to remember and configure
fields = [
    VectorStoreRecordKeyField(name="id"),
    VectorStoreRecordDataField(name="text", is_filterable=True, is_full_text_searchable=True),
    VectorStoreRecordVectorField(name="vector", dimensions=1536, distance_function="cosine")
]

After: The New Way (Clean and Intuitive)

from semantic_kernel.data.vector import VectorStoreField
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding

embedding_service = OpenAITextEmbedding(ai_model_id="text-embedding-3-small")

# One class handles all field types
fields = [
    VectorStoreField("key", name="id"),
    VectorStoreField("data", name="text", is_indexed=True, is_full_text_indexed=True),
    VectorStoreField("vector", name="vector", dimensions=1536, 
                    distance_function="cosine", embedding_generator=embedding_service)
]

This approach yields cleaner code with better IDE support, including improved autocomplete and clearer intent.
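If you prefer to keep fields outside of a model class, the same list can back a collection definition. A minimal sketch, assuming the renamed VectorStoreCollectionDefinition class (the successor to VectorStoreRecordDefinition):

from semantic_kernel.data.vector import VectorStoreCollectionDefinition

# Group the fields into a definition that a collection can consume
definition = VectorStoreCollectionDefinition(fields=fields)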

Integrated Embeddings – Automatic Generation

The new architecture includes automatic embedding generation directly in your field definitions. No more manual embedding steps—just define what you want embedded, and it happens automatically.

from semantic_kernel.data.vector import VectorStoreField, vectorstoremodel
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from typing import Annotated
from dataclasses import dataclass

@vectorstoremodel
@dataclass
class MyRecord:
    content: Annotated[str, VectorStoreField('data', is_indexed=True, is_full_text_indexed=True)]
    title: Annotated[str, VectorStoreField('data', is_indexed=True, is_full_text_indexed=True)]
    id: Annotated[str, VectorStoreField('key')]
    vector: Annotated[list[float] | str | None, VectorStoreField(
        'vector', 
        dimensions=1536, 
        distance_function="cosine",
        embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small"),
    )] = None

    def __post_init__(self):
        if self.vector is None:
            # Combine multiple fields for richer embeddings
            self.vector = f"Title: {self.title}, Content: {self.content}"

You can now combine multiple fields into a richer embedding with a simple field assignment.
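To see this end to end, here is a minimal usage sketch (it assumes an async context; InMemoryCollection is covered in more detail below). Because the vector field carries its own embedding_generator, the combined string is embedded automatically when the record is stored:

from semantic_kernel.connectors.in_memory import InMemoryCollection

async with InMemoryCollection(record_type=MyRecord) as collection:
    await collection.ensure_collection_exists()
    # __post_init__ set record.vector to the combined title/content string
    record = MyRecord(content="Vector stores made simple", title="SK Python 1.34", id="1")
    await collection.upsert(record)  # the combined string is embedded here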

Lambda-Powered Filtering – Type-Safe and Expressive

The new filtering system uses lambda expressions that are type-safe, IDE-friendly, and highly expressive, replacing the previous string-based FilterClause objects.

Before: String-Based Complexity

from semantic_kernel.data.text_search import SearchFilter

# Multiple objects and method calls
text_filter = SearchFilter()
text_filter.equal_to("category", "AI")
text_filter.equal_to("status", "active")

After: Lambda Expression Power

# Clean, readable, and type-safe
results = await collection.search(
    "query text", 
    filter=lambda record: record.category == "AI" and record.status == "active"
)

# Complex filtering with multiple conditions
results = await collection.search(
    "machine learning concepts",
    filter=lambda record: (
        record.category == "AI" and 
        record.score > 0.8 and
        "important" in record.tags and
        0.5 <= record.confidence_score <= 0.9
    )
)

Your IDE can now provide full autocomplete support and catch errors at development time.
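Boolean combinations read like ordinary Python as well. A small additional sketch, assuming the same record model plus a hypothetical boolean archived field (and a connector that supports these operators):

# `or` and `not` work as expected; `archived` is a hypothetical field
results = await collection.search(
    "query text",
    filter=lambda record: (record.category == "AI" or record.category == "ML")
    and not record.archived
)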

Streamlined Operations – Consistent Interface

The new API provides a consistent interface that works with both single records and batches:

from semantic_kernel.connectors.in_memory import InMemoryCollection

collection = InMemoryCollection(
    record_type=MyRecord,
    embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")
)

# Single record or batch - same method
await collection.upsert(single_record)
await collection.upsert([record1, record2, record3])

# Flexible retrieval
await collection.get(["id1", "id2"])  # Get specific records
await collection.get(top=10, skip=0, order_by='title')  # Browse with pagination

# Powerful search with automatic embedding
results = await collection.search("find AI articles", top=10)
results = await collection.hybrid_search("machine learning", top=10)
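Search calls return scored results that you consume asynchronously. A minimal sketch of iterating them, where each result wraps your record together with its relevance score:

# Iterate the scored results from either search call
async for result in results.results:
    print(result.score, result.record.content)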

Instant Search Functions – Simplified Creation

Creating search functions for your kernel is now straightforward:

Before: Multiple Steps and Setup

from semantic_kernel.data import VectorStoreTextSearch

collection = InMemoryCollection(collection_name='collection', record_type=MyRecord)
search = VectorStoreTextSearch.from_vectorized_search(
    vectorized_search=collection, 
    embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")
)
search_function = search.create_search(function_name='search')

After: Streamlined Creation

# Create a search function directly on your collection
search_function = collection.create_search_function(
    function_name="search",
    search_type="vector",  # or "keyword_hybrid"
    top=10,
    vector_property_name="vector"
)

# Add to kernel
kernel.add_function(plugin_name="memory", function=search_function)
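Once registered, the search function behaves like any other kernel function. A minimal invocation sketch, assuming the generated function exposes the standard query parameter:

# Invoke the plugin function directly (or let function calling pick it up)
response = await kernel.invoke(search_function, query="find AI articles")
print(response)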

Enhanced Data Model Expressiveness

The simplified API doesn’t sacrifice expressiveness. Data models are more capable than before:

from datetime import datetime

@vectorstoremodel(collection_name="documents")
@dataclass
class DocumentRecord:
    # Rich metadata
    id: Annotated[str, VectorStoreField('key')]
    title: Annotated[str, VectorStoreField('data', is_indexed=True, is_full_text_indexed=True)]
    content: Annotated[str, VectorStoreField('data', is_full_text_indexed=True)]
    category: Annotated[str, VectorStoreField('data', is_indexed=True)]
    tags: Annotated[list[str], VectorStoreField('data', is_indexed=True)]
    created_date: Annotated[datetime, VectorStoreField('data', is_indexed=True)]
    confidence_score: Annotated[float, VectorStoreField('data', is_indexed=True)]
    
    # Multiple vectors for different purposes
    content_vector: Annotated[list[float] | str | None, VectorStoreField(
        'vector', 
        dimensions=1536,
        storage_name="content_embedding",
        embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")
    )] = None
    
    title_vector: Annotated[list[float] | str | None, VectorStoreField(
        'vector',
        dimensions=1536, 
        storage_name="title_embedding",
        embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")
    )] = None

    def __post_init__(self):
        if self.content_vector is None:
            self.content_vector = self.content
        if self.title_vector is None:
            self.title_vector = self.title
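With multiple vectors on one model, you choose which embedding a query runs against. A short sketch, assuming search accepts the same vector_property_name parameter that create_search_function uses below:

# Target the title embedding instead of the content embedding
results = await collection.search(
    "quarterly planning documents",
    vector_property_name="title_vector",
    top=5
)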

Better Connector Experience

We’ve also streamlined the connector imports and naming. Everything is now logically organized under semantic_kernel.connectors:

# Clean, consistent imports
from semantic_kernel.connectors.azure_ai_search import AzureAISearchStore
from semantic_kernel.connectors.chroma import ChromaVectorStore
from semantic_kernel.connectors.pinecone import PineconeVectorStore
from semantic_kernel.connectors.qdrant import QdrantVectorStore

# Or use the convenient lazy loading
from semantic_kernel.connectors.memory import (
    AzureAISearchStore,
    ChromaVectorStore,
    PineconeVectorStore,
    QdrantVectorStore
)
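Because every collection implements the same interface, moving from the in-memory connector to a hosted store is largely a construction change. An illustrative sketch (the constructor arguments shown are assumptions; connection details are typically read from environment variables):

from semantic_kernel.connectors.azure_ai_search import AzureAISearchCollection

# Same record model and the same upsert/search calls; only the backing store changes
collection = AzureAISearchCollection(
    record_type=MyRecord,
    collection_name="my-records"
)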

Real-World Example: Complete Implementation

Here’s how a complete example looks with the new architecture:

The New Way (Simple and Powerful)

from semantic_kernel import Kernel
from semantic_kernel.data.vector import VectorStoreField, vectorstoremodel
from semantic_kernel.connectors.in_memory import InMemoryCollection
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from typing import Annotated
from dataclasses import dataclass

@vectorstoremodel(collection_name="knowledge_base")
@dataclass
class KnowledgeBase:
    id: Annotated[str, VectorStoreField('key')]
    content: Annotated[str, VectorStoreField('data', is_full_text_indexed=True)]
    category: Annotated[str, VectorStoreField('data', is_indexed=True)]
    vector: Annotated[list[float] | str | None, VectorStoreField(
        'vector', 
        dimensions=1536,
        embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")
    )] = None

    def __post_init__(self):
        if self.vector is None:
            self.vector = self.content

kernel = Kernel()

# Create collection with automatic embedding
async with InMemoryCollection(record_type=KnowledgeBase) as collection:
    await collection.ensure_collection_exists()
    
    # Add documents (embeddings created automatically)
    docs = [
        KnowledgeBase(id="1", content="Semantic Kernel is awesome", category="general"),
        KnowledgeBase(id="2", content="Python makes AI development easy", category="programming"),
    ]
    await collection.upsert(docs)
    
    # Search with intelligent filtering
    results = await collection.search(
        "AI development", 
        top=5,
        filter=lambda doc: doc.category == "programming"
    )
    
    # Create kernel search function
    search_func = collection.create_search_function("knowledge_search", search_type="vector")
    kernel.add_function(plugin_name="kb", function=search_func)
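Since the collection API is async, the block above runs inside an async function. A minimal driver sketch:

import asyncio

async def main():
    # ... the `async with InMemoryCollection(...)` block above goes here ...
    ...

if __name__ == "__main__":
    asyncio.run(main())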

What This Means for Your Projects

This update brings several concrete benefits:

  • Faster Development: Less boilerplate, more focus on your AI logic
  • Better Maintainability: Clearer code that’s easier to understand and modify
  • Enhanced Performance: Built-in optimizations and batch operations
  • Future-Proof Architecture: Aligned with .NET SDK for consistent cross-platform development
  • Richer Functionality: Hybrid search, advanced filtering, and integrated embeddings

Ready to Upgrade?

The migration path is well-documented and the benefits are immediate: check out the comprehensive migration guide and explore the updated samples in samples/concepts/memory/ to see these changes in action.

This release represents a significant step forward in making vector search more accessible and powerful while maintaining the flexibility developers need for sophisticated AI applications.

As part of this release, we have also deprecated the MemoryStore abstractions and implementations, Semantic Text Memory, and the TextMemoryPlugin. Those connectors have been moved to semantic_kernel.connectors.memory_stores so you can still find them if you really need them; otherwise, they will be removed in August.
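If you still depend on one of the deprecated memory stores, only the import location changes for now. An illustrative sketch (the exact submodule and class name, ChromaMemoryStore, are assumptions based on the old layout):

# Before (old import location):
# from semantic_kernel.connectors.memory.chroma import ChromaMemoryStore

# After (temporary home until removal in August):
from semantic_kernel.connectors.memory_stores.chroma import ChromaMemoryStore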

The future of vector search in Semantic Kernel Python is here. 🌟


Ready to experience the new vector store architecture? Update to Semantic Kernel Python 1.34 and start building with the improved API today.

Author

Eduard van Valkenburg
Senior Software Engineer - Semantic Kernel Python
