Distributed tracing with Azure Functions Event Grid triggers

Liudmila

The Azure Event Grid client libraries support distributed tracing for the CloudEvents schema. They populate the Distributed Tracing extension that allows connecting event consumer telemetry to producer calls. The Event Grid documentation shows how to enable tracing in the producer. It also shows how to configure the Event Hubs or Service Bus subscription.

Azure Functions supports distributed tracing with Azure Monitor, which includes built-in tracing of executions and bindings, performance monitoring, and more.

The Microsoft.Azure.WebJobs.Extensions.EventGrid package, version 3.1.0 or later, enables correlation for CloudEvents between producer calls and Functions Event Grid trigger executions, as shown in the screenshot below.

Azure portal screenshot showing Azure Function consumer calls linked to producer

If you use an Event Grid trigger with the CloudEvents schema, you only need to update the Microsoft.Azure.WebJobs.Extensions.EventGrid package to the latest version. In other cases, Functions still traces trigger calls so you can monitor how events are processed. However, the connection between send calls on the producer and Function trigger calls is missing in Azure Monitor’s end-to-end trace.

In this post, I’ll show how to enable correlation in .NET and Java Functions for events in both the CloudEvents schema and the Event Grid schema.

Get started

Before enabling correlation, make sure you get telemetry from both the producer and Functions in Azure Monitor:

  1. If you publish CloudEvents using Azure Event Grid client library version 4 with distributed tracing enabled, it populates the Distributed Tracing extension on the events it sends.

  2. If you use another library to publish events and it doesn’t support tracing, see the example of manual instrumentation for Event Grid schema below.

  3. Enable tracing on Azure Functions.

  4. If you use an Event Grid trigger and the CloudEvents schema in your Functions, update the Microsoft.Azure.WebJobs.Extensions.EventGrid package to version 3.1.0 or later. You don’t need to change your Functions code to enable correlation.

Connect send and Function calls

To correlate Azure Function executions to producer traces, we’re going to read context from the event and populate it on Azure Functions telemetry. For CloudEvents, we’ll just read it from the Distributed Tracing extension. For the Event Grid schema, we’ll have to read and write custom properties in the data payload.
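To make the format concrete, here is a minimal Java sketch of how a W3C traceparent value splits into the trace ID and span ID that end up in a link. The `Traceparent` class and its `parse` method are hypothetical names used for illustration; they aren't part of any Azure or OpenTelemetry library, and the sketch assumes version 00 of the header only.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Azure or OpenTelemetry library.
final class Traceparent {
    // Version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags.
    private static final Pattern FORMAT =
            Pattern.compile("00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}");

    final String traceId;
    final String spanId;
    final boolean sampled;

    private Traceparent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    /** Returns the parsed value, or empty if the input is missing or malformed. */
    static Optional<Traceparent> parse(String value) {
        if (value == null || !FORMAT.matcher(value).matches()) {
            return Optional.empty();
        }
        String traceId = value.substring(3, 35);
        String spanId = value.substring(36, 52);
        // The W3C Trace Context spec treats all-zero IDs as invalid.
        if (traceId.matches("0+") || spanId.matches("0+")) {
            return Optional.empty();
        }
        int flags = Integer.parseInt(value.substring(53, 55), 16);
        return Optional.of(new Traceparent(traceId, spanId, (flags & 1) == 1));
    }

    public static void main(String[] args) {
        // Sample value taken from the W3C Trace Context specification.
        parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
                .ifPresent(tp -> System.out.println(
                        tp.traceId + " " + tp.spanId + " sampled=" + tp.sampled));
    }
}
```

The .NET examples below do the same split with Substring calls after a lighter validity check. Stricter validation like this is optional, but it keeps malformed values out of your telemetry.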

.NET

Azure Functions uses System.Diagnostics.Activity under the hood. We’ll update the Activity created by Functions to link the producer context. Links connect independent traces together in the Azure Monitor UX.

CloudEvents with webhook (HTTP) trigger

  1. Here, we extract the context from the CloudEvent and link it to the Activity. It’s done in the Azure Monitor-specific format:

    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Text.Json;
    // code omitted for brevity
    
    public static void LinkContext(this Activity activity, 
        IDictionary<string, object> extensionAttributes)
    {
       if (activity != null &&
           extensionAttributes.TryGetValue("traceparent", out var tp) &&
           tp is string traceparent &&
           IsValidTraceparent(traceparent))
       {
           var link = new AzureMonitorLink
           (
               // parse traceparent according to https://www.w3.org/TR/trace-context/
               traceparent.Substring(3, 32), // traceId
               traceparent.Substring(36, 16) // spanId
           );
    
           // consider formatting JSON manually for best performance
           activity.AddTag("_MS.links", JsonSerializer.Serialize(new[] { link }));
    
           if (extensionAttributes.TryGetValue("tracestate", out var ts) &&
               ts is string tracestate)
           {
               activity.TraceStateString = tracestate;
           }
       }
    }
    
    private static bool IsValidTraceparent(string traceparent) => traceparent != null && 
       traceparent.StartsWith("00-") && traceparent.Length == 55;
    
    // Property names match Azure Monitor's over-the-wire format. Don't change them.
    private readonly record struct AzureMonitorLink(string operation_Id, string id);
  2. Now we need to call the LinkContext method in the Function execution as early as possible.

    using Azure.Messaging;
    using System.Diagnostics;
    // code omitted for brevity
    
    [FunctionName("MyFunction")]
    public static async Task<IActionResult> RunAsync(
            [HttpTrigger(AuthorizationLevel.Anonymous, "POST", "OPTIONS", Route = "handler")] HttpRequest req,
            ILogger log)
    {
       // handshake, code omitted for brevity
    
       var @event = CloudEvent.Parse(await BinaryData.FromStreamAsync(req.Body));
    
       // Activity can be null if Azure Monitor isn't enabled.
       Activity.Current?.LinkContext(@event.ExtensionAttributes);
       // ...
    }
  3. Deploy your Function and trigger it. It takes up to 5 minutes for data to propagate and become accessible through the Azure portal.

  4. View your traces in the Azure portal by navigating to Transaction search on the left for your Application Insights resource. Select See all data in the last 24 hours.

The following Azure portal screenshot shows Azure Function consumer calls linked to the producer:

Azure portal screenshot showing Azure Function consumer calls linked to producer

CloudEvents: batch processing

With batching enabled on the Event Grid subscription, we’ll receive multiple events at once and link each of them.

  1. Let’s add a LinkContext method that adds several links to the Activity. Adding multiple tracestate properties is unsupported, but it doesn’t affect correlation between the producer and the Function.

    using Azure.Messaging;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text.Json;
    // code omitted for brevity
    
    public static void LinkContext(this Activity activity, IEnumerable<CloudEvent> events)
    {
       if (activity != null && events.Any())
       {
           var links = new List<AzureMonitorLink>();
           foreach (CloudEvent @event in events)
           {
               if (@event.ExtensionAttributes.TryGetValue("traceparent", out var tp) && tp is string traceparent &&
                   IsValidTraceparent(traceparent))
               {
                   links.Add(new AzureMonitorLink(
                               traceparent.Substring(3, 32), // traceId
                               traceparent.Substring(36, 16))); // spanId
    
                   // multiple tracestates are not currently supported.
               }
           }
    
           // skip the tag when no event carried a valid traceparent
           if (links.Count > 0)
           {
               activity.AddTag("_MS.links", JsonSerializer.Serialize(links));
           }
       }
    }
  2. Now we need to call the LinkContext method in the Function execution as soon as events are deserialized.

    using Azure.Messaging;
    using System.Diagnostics;
    // code omitted for brevity
    
    [FunctionName("BatchWebhook")]
    public static async Task<IActionResult> RunAsync(
            [HttpTrigger(AuthorizationLevel.Anonymous, "POST", "OPTIONS", Route = "handler")] HttpRequest req,
            ILogger log)
    {
       // handshake, code omitted for brevity
    
       var events = CloudEvent.ParseMany(await BinaryData.FromStreamAsync(req.Body));
       Activity.Current?.LinkContext(events);
       // ...
    }

Event Grid schema

The Event Grid schema doesn’t have dedicated properties to propagate trace context. We’ll use custom properties, inject them into the event on the producer side, and then read them in the Function. We’ll use the same approach to link producer trace context to Activity created by Azure Functions.

  1. Update your Event Grid data model definition to include traceparent and tracestate properties:

    using System.Text.Json.Serialization;
    // code omitted for brevity
    
    internal readonly record struct EventGridData
    {
        [JsonPropertyName("traceparent")]
        public string Traceparent { get; init; }
    
        [JsonPropertyName("tracestate")]
        public string Tracestate { get; init; }
    
        // code omitted for brevity
    }
  2. On the producer side, we’ll create a new Activity and add traceparent and tracestate to event data.

    • If you use the Application Insights SDK, track the new DependencyTelemetry using the StartOperation method. It creates a new Activity under the hood. Inject the context of the new Activity into the event data.

      using Azure.Messaging.EventGrid;
      using Microsoft.ApplicationInsights;
      using Microsoft.ApplicationInsights.DataContracts;
      using System.Diagnostics;
      // code omitted for brevity
      
      using (var sendEventDependency = 
          telemetryClient.StartOperation<DependencyTelemetry>("Send Event Grid event"))
      {
       sendEventDependency.Telemetry.Type = "InProc";
       var eventData = new EventGridData
       {
           Traceparent = Activity.Current.Id,
           Tracestate = Activity.Current.TraceStateString
       };
      
       var @event = new EventGridEvent("subject", "type", "data-version", eventData);
       await eventCollector.AddAsync(@event);
      }
    • If you use OpenTelemetry (experimental support), create a new Activity using a custom ActivitySource. For more information on using ActivitySource, see Adding distributed tracing instrumentation. Note that StartActivity returns null when there are no listeners for the source or the Activity isn’t sampled, so the Activity can be null here.

      using Azure.Messaging.EventGrid;
      using System.Diagnostics;
      // code omitted for brevity
      
      // make sure to enable this ActivitySource when configuring OpenTelemetry
      private static readonly ActivitySource source = new ActivitySource("MyEventGridProducer");
      // code omitted for brevity
      
      using (var sendActivity = source.StartActivity("Send Event Grid event"))
      {
       var eventData = new EventGridData
       {
           Traceparent = sendActivity?.Id,
           Tracestate = sendActivity?.TraceStateString
       };
      
       var @event = new EventGridEvent("subject", "type", "data-version", eventData);
       await publisherClient.SendEventAsync(@event);
      }
  3. Azure Functions consumer changes are similar to the CloudEvents example above. The only difference is how traceparent and tracestate are obtained from the data property. Modify this code to use your data model definition.

    using System.Diagnostics;
    using System.Text.Json;
    using System.Text.Json.Serialization;
    // code omitted for brevity
    
    public static void LinkContext(this Activity activity, EventGridData eventData)
    {
       if (activity != null && IsValidTraceparent(eventData.Traceparent))
       { 
           var link = new AzureMonitorLink(eventData.Traceparent.Substring(3, 32), 
                                           eventData.Traceparent.Substring(36, 16));
    
           // consider formatting JSON manually for best performance
           activity.AddTag("_MS.links", JsonSerializer.Serialize(new[] { link }));
           activity.TraceStateString = eventData.Tracestate;
       }
    }
  4. Call the LinkContext method in the Function execution as early as possible.

    using Azure.Messaging.EventGrid;
    using System.Diagnostics;
    
    // code omitted for brevity
    
    [FunctionName("EventGridFunction")]
    public void RunEventGrid([EventGridTrigger] EventGridEvent @event, ILogger log)
    {
       Activity.Current?.LinkContext(@event.Data.ToObjectFromJson<EventGridData>());
       // ...
    }

The following screenshot shows Azure Function consumer calls correlated with producer in the Transaction diagnostics. In this case, the producer is instrumented with OpenTelemetry:

Azure portal screenshot showing Azure Function consumer calls linked to producer

Java

Azure Functions supports distributed tracing for bindings in Java without extra configuration. Azure Monitor preview support enables collection of custom and rich telemetry from Java Functions. We’ll need it to correlate Azure Functions and the event producer.

  1. Enable Azure Monitor for Java Function apps (preview).

  2. Add a dependency on the OpenTelemetry API package: io.opentelemetry:opentelemetry-api. For more information, see OpenTelemetry documentation.

  3. Obtain an OpenTelemetry tracer instance. We’ll use it to start a new span.

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.trace.Tracer;
    // code omitted for brevity
    
    private final static Tracer TRACER = GlobalOpenTelemetry.getTracer("my-function");

The examples below use com.azure.core.models.CloudEvent and com.azure.messaging.eventgrid.EventGridEvent models from Azure SDKs. You can get them by adding a dependency on com.azure:azure-messaging-eventgrid.

If you use different implementations, you might need to adjust these examples for your use case.

CloudEvents schema

The Microsoft.Azure.WebJobs.Extensions.EventGrid package (version 3.1.0 or later) enables correlation for CloudEvents within Event Grid triggers on Java workers. Check if the extension bundle you use includes this support. You may also update Microsoft.Azure.WebJobs.Extensions.EventGrid by switching to explicit extension installation; however, using extension bundles is recommended.

If you can’t update Microsoft.Azure.WebJobs.Extensions.EventGrid yet, you can still enable correlation using the following steps:

  1. Add a helper TextMapGetter instance that reads the trace context from the CloudEvent extension attributes:

    import io.opentelemetry.context.propagation.TextMapGetter;
    import java.util.List;
    import java.util.Map;
    // code omitted for brevity
    
    private static final Iterable<String> KEYS = List.of("traceparent", "tracestate");
    private static final TextMapGetter<Map<String, Object>> CLOUD_EVENT_GETTER =
            new TextMapGetter<Map<String, Object>>() {
        @Override
        public Iterable<String> keys(Map<String, Object> carrier) { return KEYS; }
    
        @Override
        public String get(Map<String, Object> carrier, String key) {
            // an event may lack tracestate (or both attributes); return null instead of throwing
            Object value = carrier == null ? null : carrier.get(key);
            return value == null ? null : value.toString();
        }
    };
  2. We’ll read events from string input here. Since there could be a batch of events, depending upon Event Grid subscription configuration, we’ll get the trace context from each of them. We can’t modify telemetry reported by the Azure Functions runtime here. So we’ll create a new span and link trace contexts from all of the events.

    import com.azure.core.models.CloudEvent;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.SpanBuilder;
    import io.opentelemetry.api.trace.StatusCode;
    import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
    import io.opentelemetry.context.Context;
    import io.opentelemetry.context.Scope;
    // code omitted for brevity
    
    @FunctionName("CloudEvent")
    public void processCloudEvents(@EventGridTrigger(name="eventsStr") String eventsStr, 
               final ExecutionContext context) {
       List<CloudEvent> events = CloudEvent.fromString(eventsStr);
    
       SpanBuilder spanBuilder = TRACER.spanBuilder("Process CloudEvents");
    
       events.stream().forEach(event -> {
           // extract trace context from the event using OpenTelemetry propagator.
           Context eventContext = W3CTraceContextPropagator.getInstance().
                   extract(Context.current(), event.getExtensionAttributes(), CLOUD_EVENT_GETTER);
    
           spanBuilder.addLink(Span.fromContext(eventContext).getSpanContext());
       });
    
       Span span = spanBuilder.startSpan();
       try (Scope scope = span.makeCurrent()) {
    
           // process events here
    
       } catch (Throwable t) {
           span.setStatus(StatusCode.ERROR);
           throw t;
       } finally {
           span.end();
       }
    }

If you’d like to trace each event in the batch separately, you can modify this example to create a span for each event. We’re rethrowing the exception here. Azure Functions will record it. If you don’t want to rethrow the exception, you’ll probably want to record it with span.recordException(ex).
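In a batch, some events may come from producers that don't propagate trace context, so their extension attributes have no traceparent. As a minimal, library-free sketch of that situation, the following code filters a batch down to the events that can be linked. The `BatchTraceContext` class and the `linkableTraceparents` method are hypothetical names, and plain maps stand in for the CloudEvent extension attributes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; the maps stand in for CloudEvent extension attributes.
final class BatchTraceContext {
    /** Returns the traceparent of every event in the batch that carries one as a non-empty string. */
    static List<String> linkableTraceparents(List<Map<String, Object>> batch) {
        List<String> result = new ArrayList<>();
        for (Map<String, Object> attributes : batch) {
            Object value = attributes == null ? null : attributes.get("traceparent");
            if (value instanceof String && !((String) value).isEmpty()) {
                result.add((String) value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> traced = new HashMap<>();
        traced.put("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        Map<String, Object> untraced = new HashMap<>(); // producer did not propagate context

        List<Map<String, Object>> batch = new ArrayList<>();
        batch.add(traced);
        batch.add(untraced);

        // Only the first event contributes a link.
        System.out.println(linkableTraceparents(batch).size()); // prints 1
    }
}
```

In the Function itself, the OpenTelemetry propagator does this work: when the getter returns null for traceparent, the extracted context carries no valid span context.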

Event Grid schema

As in the .NET example, we’ll use custom properties in the event data. The event producer populates those properties. On the consumer side, we’ll start a new span and link it to the producer trace context.
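Before wiring this up with OpenTelemetry, it may help to see the round trip in isolation. The sketch below uses a plain map in place of the JSON data payload. The `PayloadTraceContext` class, its methods, and the `orderId` property are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the map stands in for the JSON data payload of an Event Grid event.
final class PayloadTraceContext {
    /** Producer side: writes W3C trace context into the payload as custom properties. */
    static void inject(Map<String, String> data, String traceId, String spanId,
                       boolean sampled, String tracestate) {
        data.put("traceparent", "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00"));
        // tracestate is optional; only write it when there is something to propagate
        if (tracestate != null && !tracestate.isEmpty()) {
            data.put("tracestate", tracestate);
        }
    }

    /** Consumer side: returns the producer's traceparent, or null if none was sent. */
    static String extractTraceparent(Map<String, String> data) {
        return data.get("traceparent");
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("orderId", "42"); // regular business property, made up for the example
        inject(data, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true, null);
        System.out.println(extractTraceparent(data));
    }
}
```

The trace context travels next to your business properties, which is why the data model needs the two extra fields.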

  1. Update your Event Grid data model definition to include traceparent and tracestate properties:

    import com.fasterxml.jackson.annotation.JsonProperty;
    // code omitted for brevity
    
    static class EventGridData {
        @JsonProperty("traceparent")
        public String traceparent;
    
        @JsonProperty("tracestate")
        public String tracestate;
    
        // code omitted for brevity
    }
  2. Add a helper TextMapSetter instance that writes the trace context to EventGridData:

    import io.opentelemetry.context.propagation.TextMapSetter;
    // code omitted for brevity
    
    private static final TextMapSetter<EventGridData> EVENT_GRID_SETTER =
            new TextMapSetter<EventGridData>() {
        @Override
        public void set(EventGridData carrier, String key, String value) {
            if ("traceparent".equals(key)) {
                carrier.traceparent = value;
            } else if ("tracestate".equals(key)) {
                carrier.tracestate = value;
            }
        }
    };
  3. Add traceparent and tracestate to event data on the producer side:

    private void sendEventGridEvent() {
       // change this to your event data model
       EventGridData eventData = new EventGridData();
    
       Span span = TRACER.spanBuilder("Send Event Grid event").startSpan();
       try (Scope unused = span.makeCurrent()) {
           // inject context into EventGridData
           W3CTraceContextPropagator.getInstance().inject(Context.current(), 
                eventData, EVENT_GRID_SETTER);
    
           eventGridClient.sendEvent(new EventGridEvent("subject", "type", 
                BinaryData.fromObject(eventData), "data-version"));
       } catch (Throwable t) {
           span.setStatus(StatusCode.ERROR);
           throw t;
       } finally {
           span.end();
       }
    }
  4. Similar to the CloudEvents example, read the trace context from the event data and trace event processing:

    import com.azure.messaging.eventgrid.EventGridEvent;
    // code omitted for brevity
    
    @FunctionName("EventGridEvent")
    public void processEventGridEvents(@EventGridTrigger(name="eventsStr") String eventsStr,
                final ExecutionContext context) {
       List<EventGridEvent> events = EventGridEvent.fromString(eventsStr);
    
       SpanBuilder spanBuilder = TRACER.spanBuilder("Process EventGridEvents");
    
       events.stream().forEach( event -> {
           EventGridData data = event.getData().toObject(EventGridData.class);
           Context eventContext = W3CTraceContextPropagator.getInstance().
                   extract(Context.current(), data, EVENT_GRID_GETTER);
    
           spanBuilder.addLink(Span.fromContext(eventContext).getSpanContext());
       });
    
       Span span = spanBuilder.startSpan();
       try (Scope scope = span.makeCurrent()) {
    
           // process events here
    
       } catch (Throwable t) {
           span.setStatus(StatusCode.ERROR);
           throw t;
       } finally {
           span.end();
       }
    }
    
    private static final TextMapGetter<EventGridData> EVENT_GRID_GETTER = 
           new TextMapGetter<EventGridData>() {
       @Override
       public Iterable<String> keys(EventGridData carrier) { return KEYS; }
    
       @Override
       public String get(EventGridData carrier, String key) {
           if ("traceparent".equals(key)) {
               return carrier.traceparent;
           } else if ("tracestate".equals(key)) {
               return carrier.tracestate;
           }
           return null;
       }
    };

The following screenshot shows Azure Function consumer calls correlated with the producer in the Transaction viewer. It shows the case in which batching is configured on the Event Grid subscription. The Functions runtime (with Microsoft.Azure.WebJobs.Extensions.EventGrid version 2) tracks a single Function execution that results in three calls to the Java worker.

Azure portal screenshot showing Azure Function consumer calls linked to producer

Want to hear more?

Thanks for reading this Azure SDK blog post. What do you think of distributed tracing in the Azure SDK? We’re actively seeking feedback on this feature, so let us know!
