Under the hood of the new Azure Functions extension for Azure Cosmos DB
The Azure Cosmos DB Azure Functions extension version 4 is now GA, and it packs a load of improvements and new features. The goal of this post is to go deep into the technical details about the extension changes and guide you to take advantage of them.
As a starting point, everything I’ll be talking about in the article is available on the GitHub repository where the extension source code is developed. That is the best place to post and share your feedback. The following sections will point to the source code as reference for those curious about how each feature works.
Managed identity authentication
Azure resources support managed identities, and Azure Functions is no exception. Our Azure Function can have a system-assigned managed identity that represents it in the Azure Active Directory tenant for our subscription.
If we combine this with Azure Cosmos DB’s role-based access control, we can assign data access permissions to the Function’s identity directly. Because the new extension is using the latest Azure Cosmos DB .NET SDK, which supports authenticating using TokenCredentials, we can make this work!
The new extension replaces the “ConnectionStringSetting” property with a “Connection” one. The big difference (besides the rename) is that the value can point to configuration that specifies different authentication methods. The extension code obtains the value of the configuration from the environment and parses it. If the configuration is a connection string, it uses that value to create the CosmosClient. If the configuration describes a managed identity, it calls the AzureComponentFactory to resolve the configuration into a TokenCredential and creates the CosmosClient with that.
The steps to make it work are:
- Enable the system-assigned identity on the Azure Function
- Assign the desired permissions by giving this identity a role
- Add the Azure Functions configuration for the name of the “Connection” property in the Function, containing the endpoint (<ConnectionPropertyValue>__accountEndpoint) and the credential (<ConnectionPropertyValue>__credential). Using the special value “managedidentity” as credential tells the Azure Functions runtime to use the system-assigned identity.
For example, a Function with a “Connection” property with value “MyConnection” that connects to an Azure Cosmos DB account with the endpoint “https://mycosmosdbaccount.documents.azure.com:443/” would set MyConnection__accountEndpoint to that endpoint and MyConnection__credential to “managedidentity”.
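As a rough sketch, the two settings from the example above could look like this in the Function App's application settings, shown here in the name/value shape used by the portal's advanced editor. The setting names follow the pattern described in the steps; the "slotSetting" values are an assumption for illustration.

```json
[
  {
    "name": "MyConnection__accountEndpoint",
    "value": "https://mycosmosdbaccount.documents.azure.com:443/",
    "slotSetting": false
  },
  {
    "name": "MyConnection__credential",
    "value": "managedidentity",
    "slotSetting": false
  }
]
```

The Function itself would then only reference the prefix, that is, Connection = "MyConnection" on the trigger or binding attribute.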
Improved logging
The Azure Functions trigger for Azure Cosmos DB (AKA CosmosDBTrigger) leverages the Change Feed Processor to dynamically distribute work across Function instances. This component handles work mainly as a background process. Like any background process, it can run into failure conditions or problems. Because these failures do not happen in the context where the Function code runs, they are not easily viewable.
The new extension version emits its own logs, which let you:
- Identify processing failures in your Function code
- Identify connectivity issues affecting the CosmosDBTrigger operations
- Know when an instance started or stopped processing changes
- Know when the CosmosDBTrigger has delivered data to your Function code
- Obtain detailed network diagnostics to troubleshoot latency
These logs are critical to troubleshoot different scenarios. Remember to enable the extension specific logs to gain access to these insights.
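A minimal host.json sketch for enabling these logs follows. It assumes the extension-specific log category is Host.Triggers.CosmosDB, which is the name the Azure Cosmos DB trigger documentation uses; the "Warning" level is only an illustrative choice.

```json
{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "Host.Triggers.CosmosDB": "Warning"
    }
  }
}
```

Lowering the level (for example to "Information" or "Debug") should surface progressively more detail, at the cost of more log volume.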
Custom serialization
The previous extension version used the Azure Cosmos DB SDK V2 (through the Microsoft.Azure.DocumentDB.Core package). This SDK used Newtonsoft.Json as its serialization engine. The new extension, which uses the Azure Cosmos DB SDK V3 (Microsoft.Azure.Cosmos package), also uses Newtonsoft.Json by default, but it gives the user the freedom to replace the engine with any other. The V3 SDK allows users to provide a custom implementation of the CosmosSerializer class with any serialization technology they want, for example, System.Text.Json.
To achieve this, you just need to declare, on your FunctionsStartup class, an implementation of the ICosmosDBSerializerFactory interface that returns instances of your custom serializer.
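A minimal sketch of such a factory, assuming an in-process Function App with System.Text.Json 6.0 or later available. The names MyFunctionApp, MySerializerFactory and SystemTextJsonSerializer are hypothetical, and the camel-case naming policy is only an illustrative option.

```csharp
using System.IO;
using System.Text.Json;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Azure.WebJobs.Extensions.CosmosDB;
using Microsoft.Extensions.DependencyInjection;

[assembly: FunctionsStartup(typeof(MyFunctionApp.Startup))]

namespace MyFunctionApp
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            // Register the factory so the extension can pick it up.
            builder.Services.AddSingleton<ICosmosDBSerializerFactory, MySerializerFactory>();
        }
    }

    // Hypothetical factory: returns the serializer the extension should use.
    public class MySerializerFactory : ICosmosDBSerializerFactory
    {
        public CosmosSerializer CreateSerializer()
        {
            var options = new JsonSerializerOptions
            {
                PropertyNamingPolicy = JsonNamingPolicy.CamelCase
            };
            return new SystemTextJsonSerializer(options);
        }
    }

    // Hypothetical CosmosSerializer backed by System.Text.Json.
    public class SystemTextJsonSerializer : CosmosSerializer
    {
        private readonly JsonSerializerOptions options;

        public SystemTextJsonSerializer(JsonSerializerOptions options)
        {
            this.options = options;
        }

        public override T FromStream<T>(Stream stream)
        {
            // If the caller asked for the raw stream, hand it over untouched.
            if (typeof(Stream).IsAssignableFrom(typeof(T)))
            {
                return (T)(object)stream;
            }

            using (stream)
            {
                if (stream.CanSeek && stream.Length == 0)
                {
                    return default;
                }
                return JsonSerializer.Deserialize<T>(stream, this.options);
            }
        }

        public override Stream ToStream<T>(T input)
        {
            var payload = new MemoryStream();
            JsonSerializer.Serialize(payload, input, this.options);
            payload.Position = 0; // Rewind so the SDK reads from the start.
            return payload;
        }
    }
}
```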
When the internal CosmosClient instances are created, they will use your factory to customize the serializer. Any POCO type used in the Trigger, Input or Output bindings will now be serialized and deserialized with your custom serializer.
Which leads us to another change: you are no longer limited to using the Document type for your Trigger definitions. You can now use the POCO type of your choice that matches your serialization engine.
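As an illustration, a Trigger bound to a POCO type could be declared along these lines. The MyItem type, the database, container and lease container names, and the "MyConnection" setting name are all hypothetical.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Hypothetical POCO. The lowercase "id" matches the JSON property name
// regardless of the naming policy of the configured serializer.
public class MyItem
{
    public string id { get; set; }

    public string Description { get; set; }
}

public static class ProcessChanges
{
    [FunctionName("ProcessChanges")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "mydatabase",
            containerName: "mycontainer",
            Connection = "MyConnection",
            LeaseContainerName = "leases",
            CreateLeaseContainerIfNotExists = true)] IReadOnlyList<MyItem> changes,
        ILogger log)
    {
        // Each change arrives already deserialized into the POCO type.
        foreach (MyItem item in changes)
        {
            log.LogInformation("Received change for item {id}", item.id);
        }
    }
}
```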
Custom User Agent
Another common scenario is having multiple Functions working with the same Azure Cosmos DB account. There are several monitoring options for your account. With some of them, you can identify which application is performing those operations through the User Agent. By using the new extension version, you can now add a custom identifier to the requests made by each Function and the identifier will appear in the User Agent in the monitoring options!
Just go to the host.json in your Function project and add the userAgentSuffix setting to the Azure Cosmos DB extension configuration.
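A host.json along these lines would apply the suffix. The value "MyFunctionIdentifier" is a placeholder; pick whatever helps you tell your Functions apart.

```json
{
  "version": "2.0",
  "extensions": {
    "cosmosDB": {
      "userAgentSuffix": "MyFunctionIdentifier"
    }
  }
}
```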
Retry policies
The CosmosDBTrigger does not retry on failed executions by default. This design aligns with other event-based triggers (like Event Hubs). Using a “poison queue” is often the general recommendation: all the documents that failed processing are sent to the queue and retried or investigated afterwards. But there is another alternative. Azure Functions added support for Retry Policies, which can be defined at the Function level using decorators that currently support fixed delay and exponential delay in-between retries. The new extension version added support for these Retry Policies.
Keep in mind that the retry does not apply to the individual documents that failed, but to the entire batch. If your Trigger delivered 100 events, and your code threw an unhandled exception on number 50, the retry will execute the Function again with all 100 events as input. So, be aware of potential duplicate processing when writing your Function.
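A short sketch of a Trigger with a fixed delay Retry Policy, assuming five retries with ten seconds between them. The MyItem type, the Process helper and the resource names are hypothetical; ExponentialBackoffRetry could be used instead of FixedDelayRetry for exponential delays.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Hypothetical POCO for illustration.
public class MyItem
{
    public string id { get; set; }
}

public static class ProcessChangesWithRetry
{
    [FunctionName("ProcessChangesWithRetry")]
    [FixedDelayRetry(5, "00:00:10")] // Up to 5 retries, 10 seconds apart.
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "mydatabase",
            containerName: "mycontainer",
            Connection = "MyConnection",
            LeaseContainerName = "leases")] IReadOnlyList<MyItem> changes,
        ILogger log)
    {
        foreach (MyItem item in changes)
        {
            // An unhandled exception here fails the execution, and the
            // retry re-runs the Function with the whole batch, so the
            // processing of each item should tolerate being repeated.
            Process(item, log);
        }
    }

    // Hypothetical processing step; replace with your own logic.
    private static void Process(MyItem item, ILogger log)
    {
        log.LogInformation("Processing item {id}", item.id);
    }
}
```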
Using the CosmosClient directly
Because the extension lets you access the CosmosClient directly, you can also take advantage of all the new SDK features available.
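As one possible illustration, the sketch below binds a CosmosClient as an input and uses Transactional Batch from the V3 SDK. It assumes a timer-triggered Function and a container partitioned by /pk; the MyItem type and all resource names are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Hypothetical POCO; "pk" is assumed to be the partition key property.
public class MyItem
{
    public string id { get; set; }

    public string pk { get; set; }
}

public static class WriteBatch
{
    [FunctionName("WriteBatch")]
    public static async Task Run(
        [TimerTrigger("0 */5 * * * *")] TimerInfo timer,
        [CosmosDB(
            databaseName: "mydatabase",
            containerName: "mycontainer",
            Connection = "MyConnection")] CosmosClient client,
        ILogger log)
    {
        Container container = client.GetContainer("mydatabase", "mycontainer");

        // Both upserts succeed or fail together within one partition key.
        using (TransactionalBatchResponse response = await container
            .CreateTransactionalBatch(new PartitionKey("tenant-1"))
            .UpsertItem(new MyItem { id = "item-1", pk = "tenant-1" })
            .UpsertItem(new MyItem { id = "item-2", pk = "tenant-1" })
            .ExecuteAsync())
        {
            log.LogInformation(
                "Batch finished with status {status}, success: {success}",
                response.StatusCode,
                response.IsSuccessStatusCode);
        }
    }
}
```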
How do I migrate?
Given this is a major version change of a package, it might contain some breaking changes in your code. The best migration path is:
- Update the extension package (or bundle) to the latest available version 4.X
- Apply attribute renames:
  - Anything named Collection is now Container
  - UseMultipleWriteLocations is no longer needed, it is automatically detected
  - UseDefaultJsonSerialization is no longer needed, you can fully customize the serialization if you want
  - ConnectionStringSetting is now Connection
- Use the Azure Cosmos DB .NET SDK migration guide for moving from V2 .NET SDK types to V3 .NET SDK types.
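To make the renames concrete, here is a sketch of a Function that copies changes from one container to another using the version 4 names, with the previous names noted in comments. The MyItem type and all resource and setting names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

// Hypothetical POCO; before the migration the Trigger delivered
// IReadOnlyList<Document> from the V2 SDK.
public class MyItem
{
    public string id { get; set; }
}

public static class CopyChanges
{
    [FunctionName("CopyChanges")]
    public static async Task Run(
        [CosmosDBTrigger(
            databaseName: "mydatabase",
            containerName: "source",                 // was: collectionName
            Connection = "MyConnection",             // was: ConnectionStringSetting
            LeaseContainerName = "leases",           // was: LeaseCollectionName
            CreateLeaseContainerIfNotExists = true)] // was: CreateLeaseCollectionIfNotExists
            IReadOnlyList<MyItem> changes,
        [CosmosDB(
            databaseName: "mydatabase",
            containerName: "destination",            // was: collectionName
            Connection = "MyConnection")]            // was: ConnectionStringSetting
            IAsyncCollector<MyItem> output)
    {
        foreach (MyItem item in changes)
        {
            await output.AddAsync(item);
        }
    }
}
```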
Additional resources
- Azure Friday – Azure Functions with AAD support
- Azure Functions trigger for Cosmos DB samples
- Azure Functions output binding for Cosmos DB samples
- Minimum required permissions for RBAC and Azure Functions