Taking the EF Core Azure Cosmos DB Provider for a Test Drive

Jeremy

The release of EF Core 6.0 is right on the horizon (as I write this). The team has been hard at work adding features. One area of focus is the Azure Cosmos DB experience. We received feedback that many developers would prefer to use the provider for Cosmos DB but are waiting for certain key features.

Planetary docs

I built a reference app that uses Azure Cosmos DB with EF Core on Blazor Server. It includes search capability, cross-referenced entities, and an interface to create, read, and update. I recently upgraded to the latest EF Core 6.0 version and was able to simplify and remove quite a bit of code!

Screenshot of Planetary Docs

Feature overview

Here are some of the requested features that we added to the EF Core 6.0 Azure Cosmos DB provider.

Implicit ownership

EF Core was built as an object relational mapper. In relational databases, complex relationships are expressed by storing related entities in separate tables and referencing them with foreign keys. EF Core assumes non-primitive entity types encountered in a parent are expressed as foreign key relationships. The relationships are configured using HasMany or HasOne and the instances are assumed to exist independently with a configured relationship.

In document databases, the default behavior for entity types is to assume they are embedded documents owned by the parent. In other words, the complex type’s data exists within the context of the parent. In previous versions of EF Core, this behavior had to be configured explicitly for it to work with the Azure Cosmos DB provider. In EF Core 6.0, ownership is implicit. This saves configuration and ensures the behavior is consistent with NoSQL approaches from other providers.

For example, in Planetary Docs there are authors and tags. These entities “own” a list of summaries that point to the URLs and titles of related documents. This way, when a user asks “What documents have tag X?” I only need one document loaded to answer the question (I load tag X, then iterate its owned collection of titles). Using EF Core 5, I had to explicitly claim ownership:

tagModel.OwnsMany(t => t.Documents);
authorModel.OwnsMany(t => t.Documents);

In EF Core 6, the ownership is implicit so there is no need to configure the entities except to specify partition keys.
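
As a rough sketch of what the remaining configuration might look like, the context below only declares keys and partition keys. The type and property names (DocumentSummary, TagName, Alias, DocsContext) are assumptions made for illustration and may not match the actual Planetary Docs source. It assumes the Microsoft.EntityFrameworkCore.Cosmos 6.0 package is referenced.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Hypothetical shapes for illustration; the real Planetary Docs types may differ.
public class DocumentSummary
{
    public string Uid { get; set; }
    public string Title { get; set; }
}

public class Tag
{
    public string TagName { get; set; }
    public List<DocumentSummary> Documents { get; set; } = new();
}

public class Author
{
    public string Alias { get; set; }
    public List<DocumentSummary> Documents { get; set; } = new();
}

public class DocsContext : DbContext
{
    public DocsContext(DbContextOptions<DocsContext> options) : base(options) { }

    public DbSet<Tag> Tags { get; set; }
    public DbSet<Author> Authors { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // No OwnsMany call here: the Documents collections are expected to be
        // treated as owned (embedded) by convention in EF Core 6.
        modelBuilder.Entity<Tag>()
            .HasPartitionKey(t => t.TagName)
            .HasKey(t => t.TagName);

        modelBuilder.Entity<Author>()
            .HasPartitionKey(a => a.Alias)
            .HasKey(a => a.Alias);
    }
}
```

Using the same property as both key and partition key is just one possible choice for this sketch; a real model would pick the partition key based on its access patterns.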

Support for primitive collections

In relational databases, primitive collections are often modeled by either promoting them to complex types or converting them to a serialized artifact to store in a single column. Consider a blog post that can have a list of tags. One common approach would be to create an entity that represents a tag:

public class Tag 
{
    public int Id { get; set; }
    public string Text { get; set; }
}

The tag is then referenced:

public ICollection<Tag> Tags { get; set; }

The primitive is promoted to a complex type and stored in a separate table. An alternative is to collapse the tags into a single field that contains a comma-delimited list. This approach requires a value converter to marshal the list into the field for updates and decompose the field into the list for reads. It also makes it difficult and expensive to answer questions like, “How many posts are tagged X?”

Using EF Core 5, I chose the single-column approach. I serialized the list to JSON when writing and deserialized it when reading. This is the serialization code:

private static string ToJson<T>(T item) => JsonSerializer.Serialize(item);
private static T FromJson<T>(string json) => JsonSerializer.Deserialize<T>(json);

I configured EF Core to make the conversions:

docModel.Property(d => d.Tags)
    .HasConversion(
        t => ToJson(t),
        t => FromJson<List<string>>(t));

And the resulting document looked like this:

{
    "tags" : "[\"one\", \"two\", \"three\"]"
}

With EF Core 6.0, I simply deleted the conversion code to take advantage of the built-in handling of primitive collections. This results in a document like this:

{
    "tags" : [ 
        "one",
        "two",
        "three"
    ]
}

This is a schema change, which Azure Cosmos DB has no problem handling. The C# code, on the other hand, will throw when a current model that treats tags as an array encounters a legacy record that stored tags as a single string field. How do we handle this when EF Core doesn’t have the concept of NoSQL migrations?
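
The mismatch can be reproduced outside EF Core. The sketch below is only an analogy: it uses System.Text.Json directly, which is not necessarily the serializer the provider uses internally, and the Doc class is a hypothetical stand-in for the real model. It shows the current array shape deserializing cleanly while the legacy string shape fails.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model for illustration: Tags is now a real list.
public class Doc
{
    public List<string> Tags { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Legacy shape: the whole list was stored as one JSON-encoded string.
        string legacy = "{\"Tags\":\"[\\\"one\\\",\\\"two\\\"]\"}";
        // Current shape: a native JSON array.
        string current = "{\"Tags\":[\"one\",\"two\"]}";

        Doc ok = JsonSerializer.Deserialize<Doc>(current);
        Console.WriteLine($"Current document has {ok.Tags.Count} tags");

        try
        {
            JsonSerializer.Deserialize<Doc>(legacy);
        }
        catch (JsonException ex)
        {
            // A JSON string token cannot be read as List<string>.
            Console.WriteLine($"Legacy document failed: {ex.GetType().Name}");
        }
    }
}
```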

Raw SQL

A popular request is to allow developers to write their own SQL for data access. This is exactly the feature I needed to handle my code migration. For the raw SQL to work, it must project to an existing model. It is exposed as an extension method on the DbSet<T> for the entity. In my case, it enabled an in-place migration. After updating the code, attempting to load a document would fail. The stored document had a single string property for Tags but the C# model expects an array, so the JSON serializer would throw an exception. To remedy this, I used STRINGTOARRAY, a built-in Azure Cosmos DB function that parses a string into an array. Using a query, I project each legacy document to an entity that matches the current schema and then save it back. This is the migration code:

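// Project each legacy document into the current model; STRINGTOARRAY parses the JSON string in Tags into an array.
var docs = await Documents.FromSqlRaw(
    "select c.id, c.Uid, c.AuthorAlias, c.Description, c.Html, c.Markdown, c.PublishDate, c.Title, STRINGTOARRAY(c.Tags) as Tags from c").ToListAsync();
foreach (var doc in docs)
{
    // Mark every loaded entity as modified so it is rewritten with the new schema on save.
    Entry(doc).State = EntityState.Modified;
}
// Save the converted documents back (assumes this runs inside the DbContext, as the calls above do).
await SaveChangesAsync();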

This feature empowers developers to craft complex queries that may not be supported by the LINQ provider.
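
Raw SQL can also take parameters. The following is a minimal sketch, assuming the Microsoft.EntityFrameworkCore.Cosmos 6.0 package and a trimmed-down, hypothetical Document entity and DocsContext; GetByAuthorAsync is an illustrative helper, not part of Planetary Docs. Values passed after the SQL string fill the {0}-style placeholders.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical, trimmed-down entity; the real document type has more properties.
public class Document
{
    public string Uid { get; set; }
    public string AuthorAlias { get; set; }
    public string Title { get; set; }
}

public class DocsContext : DbContext
{
    public DocsContext(DbContextOptions<DocsContext> options) : base(options) { }

    public DbSet<Document> Documents { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder) =>
        modelBuilder.Entity<Document>().HasKey(d => d.Uid);

    // The alias is supplied as a query parameter for the {0} placeholder
    // rather than being concatenated into the SQL text.
    public Task<List<Document>> GetByAuthorAsync(string alias) =>
        Documents
            .FromSqlRaw("SELECT * FROM c WHERE c.AuthorAlias = {0}", alias)
            .ToListAsync();
}
```

Passing user input as a parameter instead of building the SQL string by hand is the safer habit for any raw query.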

Additional enhancements

In addition to what I already covered, these enhancements also made it in.

Summary

I’m excited about the changes coming and hope that you are, too. Are you using the Cosmos DB provider? Are you considering it now that we’ve added these features? Is there something critical you need that we missed? Let me know in the comments below. Thank you!

8 comments

  • Eric Blankenburg

    Thank you for supporting arrays of primitive types. It’s critical for what we need.

    Question — Cosmos has a private preview of hierarchical partitioning keys. This is another thing we need. When can we expect that in Entity Framework?

    • Jeremy Likness (Microsoft employee)

      Our pleasure!

      We will support features that are out of preview when EF Core is out of preview. Do you know if the Cosmos team has an ETA? I can connect with them on my end but am asking in case you know already based on your interactions.

      The best way to get it on the roadmap is to file an issue on the EF Core repository.

      https://github.com/dotnet/efcore/issues/new/choose

      We typically prioritize issues with the most upvotes as a signal of the feature’s reach. We are always open to compelling reasons to implement certain features and will consider those in our roadmap. My suggestion is to file the issue, share the business case of why it’s important, and we’ll take it into EF Core 7 planning. We always publish the plan once it’s ready so you have a clear indication of how we’re prioritizing features going into the release.

      If it’s feasible, it might be implemented as a community OSS extension, the same way you can use SQL Server hierarchyid. Community includes our team as we work on out-of-band projects like the ones at https://github.com/efcore.

  • Joris Kommeren

    Sounds good! Nice article.

    I’m curious though, would someone like to tell me why they’re choosing cosmos as a provider over a sql database for EF? Genuinely interested in use cases!

  • Midnight

    Looks like a typo near

    authorModel.OwnsMany(t => t.Documentts);
  • Nirmal Prabhu

    Does EF Core 6 offer support for continuation token with Cosmos DB for pagination? We are lacking this functionality with EF Core 5.

  • Chris DaMour

    any updates on support for heterogenous/polymorphic support for collections? hoping primitive is just a stepping stone