Improvements in .NET Core 3.0 for troubleshooting and monitoring distributed apps

Sergey Kanzhelev

Thank you David Fowler and Richard Lander for reviews.

Introduction

Operating distributed apps is hard. Distributed apps typically consists of multiple components. These components may be owned and operated by different teams. Every interaction with an app results in distributed trace of code executions across many components. If your customer experiences a problem – pinpointing the root cause in one of components participated in a distributed trace is a hard task.

One big difference of distributed apps comparing to monoliths is a difficulty to correlate telemetry (like logs) across a single distributed trace. Looking at logs you can see how each component processed each request. But it is hard to know which request in once component and request in other component belong to the same distributed trace.

Historically, Application Performance Monitoring (APM) vendors provided the functionality of distributed trace context propagation from one component to another. Telemetry is correlated using this context. Due to heterogeneous nature of many environments, with components owned by different teams and using different tools for monitoring, it was always hard to instrument distributed apps consistently. APM vendors provided automatic code injection agents and SDKs to handle complexity of understanding various distributed context formats and RPC protocols.

With the upcoming transition of W3C Trace Context specification into Proposed Recommendation maturity level, and support of this specification by many vendors and platforms, the complexity of the context propagation is decreasing. The W3C Trace Context specification describes semantics of the distributed trace context and its format. This ensures that every component in a distributed app may understand this context and propagate it to components it calls into.

Microsoft is working on making distributed apps development easier with many ongoing developments like Orleans framework and project Dapr. As for distributed trace context propagation – Microsoft services and platforms will be adopting a W3C Trace Context format.

We believe that ASP.NET Core must provide an outstanding experience for building distributed tracing apps. With every release of ASP.NET Core we execute on this promise. This post describes the scenario of distributed tracing and logging highlighting improvements in .NET Core 3.0 and talks about discussions of a new exciting features we plan to add going forward.

Distributed Tracing and Logging

Let’s explore distributed tracing in .NET Core 3.0 and improvements recently made. First, we’ll see how two “out of the box” ASP.NET Core 3.0 apps has logs correlated across the entire distributed trace. Second, we’ll explore how easy it is to set distributed trace context for any .NET Core application and how it will automatically be propagated across http. And third, we’ll see how the same distributed trace identity is used by telemetry SDKs like OpenTelemetry and ASP.NET Core logs.

This demo will also demonstrate how .NET Core 3.0 embraces W3C Trace Context standard and what other features it offers.

Demo set up

In this demo we will have three simple components: ClientApp, FrontEndApp and BackEndApp.

BackEndApp is a template ASP.NET Core application called WeatherApp. It exposes a REST API to get a weather forecast.

FrontEndApp proxies all incoming requests into the calls to BackEndApp using this controller:

[ApiController]
[Route("[controller]")]
public class WeatherForecastProxyController : ControllerBase
{
    private readonly ILogger<WeatherForecastProxyController> _logger;
    private readonly HttpClient _httpClient;

    public WeatherForecastProxyController(
        ILogger<WeatherForecastProxyController> logger, 
        HttpClient httpClient)
    {
        _logger = logger;
        _httpClient = httpClient;
    }

    [HttpGet]
    public async Task<IEnumerable<WeatherForecast>> Get()
    {
        var jsonStream = await 
                  _httpClient.GetStreamAsync("http://localhost:5001/weatherforecast");

        var weatherForecast = await 
              JsonSerializer.DeserializeAsync<IEnumerable<WeatherForecast>>(jsonStream);

        return weatherForecast;
    }
}

Finally, ClientApp is a .NET Core 3.0 Windows Forms app. ClientApp calls into FrontEndApp for the weather forecast.

private async Task<string> GetWeatherForecast()
{
    return await _httpClient.GetStringAsync(
                                 "http://localhost:5000/weatherforecastproxy");
}

Please note, there were no additional SDKs enabled or libraries installed on demo apps. As the demo progresses – every code change will be mentioned.

Correlated logs

Let’s make the very first call from ClientApp and take a look at the logs produced by FrontEndApp and BackEndApp.

FrontEndApp (a few line breaks added for readability):

info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      => ConnectionId:0HLR1BR0PL1CH 
      => RequestPath:/weatherforecastproxy 
         RequestId:0HLR1BR0PL1CH:00000001, 
         SpanId:|363a800a-4cf070ad93fe3bd8., 
         TraceId:363a800a-4cf070ad93fe3bd8, 
         ParentId:
Executed endpoint 'FrontEndApp.Controllers.WeatherForecastProxyController.Get (FrontEndApp)'

BackEndApp:

info: BackEndApp.Controllers.WeatherForecastController[0]
      => ConnectionId:0HLR1BMQHFKRL 
      => RequestPath:/weatherforecast 
         RequestId:0HLR1BMQHFKRL:00000002, 
         SpanId:|363a800a-4cf070ad93fe3bd8.94c1cdba_, 
         TraceId:363a800a-4cf070ad93fe3bd8, 
         ParentId:|363a800a-4cf070ad93fe3bd8. 
Executed endpoint 'FrontEndApp.Controllers.WeatherForecastController.Get (BackEndApp)'

Like magic, logs from two independent apps share the same TraceId. Behind the scene, ASP.NET Core 3.0 app will initialize a distributed trace context and pass it in the header. This is how incoming headers to the BackEndApp looks like:

You may notice that FrontEndApp didn’t receive any additional headers:

The reason is that in ASP.NET Core apps – distributed trace being initiated by ASP.NET Core framework itself on every incoming request. Next section explains how to do it for any .NET Core 3.0 app.

Initiate distributed trace in .NET Core 3.0 app

You may have noticed the difference in behavior of Windows Forms ClientApp and ASP.NET Core FrontEndApp. ClientApp didn’t set any distributed trace context. So FrontEndApp didn’t receive it. It is easy to set up distributed operation. Easiest way to do it is to use an API called Activity from the DiagnosticSource package.

private async Task<string> GetWeatherForecast()
{
    var activity = new Activity("CallToBackend").Start();

    try
    {
        return await _httpClient.GetStringAsync(
                               "http://localhost:5000/weatherforecastproxy");
    }
    finally
    {
        activity.Stop();
    }
}

Once you have started an activity, HttpClient knows that distributed trace context needs to be propagated. Now all three components – ClientApp, FrontEndApp and BackEndApp share the same TraceId.

W3C Trace Context support

You may notice that the context is propagating using the header called Request-Id. This header was introduced in Asp.Net Core 2.0 and is used by default for better compatibility with these apps. However, as the W3C Trace Context specification is being widely adopted, it is recommended to switch to this format of context propagation.

With .NET Core 3.0, it is easy to switch to W3C Trace Context format to propagate distributed trace identifiers. Easiest way to do it is in the main method- just add a simple line in the Main method:

static void Main()
{
    Activity.DefaultIdFormat = ActivityIdFormat.W3C;
    …
    Application.Run(new MainForm());
}

Now, when the FrontEndApp receives requests from the ClientApp, you see a traceparent header in the request:

The ASP.NET Core app will understand this header and recognize that it needs to use W3C Trace Context format for outgoing calls now.

Note, ASP.NET Core apps will recognize the correct format of distributed trace context automatically. However, it is still a good practice to switch the default format of distributed trace context to W3C for better interoperability in heterogeneous environments.

You will see all the logs attributed with the TraceId and SpanId obtained from the incoming header:

info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      => ConnectionId:0HLQV2BC3VP2T
      => RequestPath:/weatherforecast 
         RequestId:0HLQV2BC3VP2T:00000001, 
         SpanId:da13aa3c6fd9c146, 
         TraceId:f11a03e3f078414fa7c0a0ce568c8b5c, 
         ParentId:5076c17d0a604244
      Request starting HTTP/1.1 GET http://localhost:5000/weatherforecast

Activity and distributed tracing with OpenTelemetry

OpenTelemetry provides a single set of APIs, libraries, agents, and collector services to capture distributed traces and metrics from your application. You can analyze them using Prometheus, Jaeger, Zipkin, and other observability tools.

Let’s enable OpenTelemetry on the BackEndApp. It is very easy to do, just call AddOpenTelemetry on startup:

services.AddOpenTelemetry(b => 
    b.UseZipkin(o => {
                    o.ServiceName="BackEndApp"; 
                    o.Endpoint=new Uri("http://zipkin /api/v2/spans");
               })
     .AddRequestCollector());

Now, as we just saw, TraceId in the FrontEndApp logs will match TraceId in the BackEndApp.

info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
      => ConnectionId:0HLR2RC6BIIVO 
      => RequestPath:/weatherforecastproxy 
         RequestId:0HLR2RC6BIIVO:00000001, 
         SpanId:54e2de7b9428e940, 
         TraceId:e1a9b61ec50c954d852f645262c7b31a, 
         ParentId:69dce1f155911a45 
      => FrontEndApp.Controllers.WeatherForecastProxyController.Get (FrontEndApp)
Executed action FrontEndApp.Controllers.WeatherForecastProxyController.Get (FrontEndApp) in 3187.3112ms

info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
      => ConnectionId:0HLR2RLEHSKBV 
      => RequestPath:/weatherforecast 
         RequestId:0HLR2RLEHSKBV:00000001, 
         SpanId:0e783a0867544240, 
         TraceId:e1a9b61ec50c954d852f645262c7b31a, 
         ParentId:54e2de7b9428e940 
      => BackEndApp.Controllers.WeatherForecastController.Get (BackEndApp)
Executed action BackEndApp.Controllers.WeatherForecastController.Get (BackEndApp) in 3085.9111ms

Furthermore, the same Trace will be reported by Zipkin. So now you can correlate distributed traces collected by your distributed tracing tool and logs from the machine. You can also give this TraceId to the user when ClientApp experience issues. The user can share it with your app support and corresponding logs and distributed traces can be easily discovered across all components.

Taking example one step further – you can easily enable monitoring for all three components and see them on the gantt chart.

ASP.NET Core apps integrates with distributed trace

As we just seen telemetry collected by Application Monitoring vendors is correlated using the same distributed trace context as ASP.NET Core uses. This makes ASP.NET Core 3.0 apps great for the environments where different components are owned by different teams.

Imagine that only two of apps – A and C on the picture below enabled telemetry collection using SDK like OpenTelemetry. Before ASP.NET Core 3.0 it would mean that distributed tracing will not work, and a trace will be “broken” by app B.

With ASP.NET Core 3.0, since in most deployments ASP.NET Core apps are configured with the basic logging enabled, app B will propagate distributed trace context. This distributed traces from A and C will be correlated.

With the example of apps from before – if ClientApp and BackEndApp are instrumented and FrontEndApp is not – you see distributed trace is still being correlated:

This also makes ASP.NET Core apps great for the service mesh environments. In service mesh deployments, A and C from the picture above may represent a service mesh. In order for service mesh to stitch request entering and leaving component B – certain headers have to be propagated by an app. See this note from the Istio for example:

Although Istio proxies are able to automatically send spans, they need some hints to tie together the entire trace. Applications need to propagate the appropriate HTTP headers so that when the proxies send span information, the spans can be correlated correctly into a single trace.

As we work with service mesh authors to adopt W3C Trace Context format, ASP.NET Core apps will “just work” and propagate needed headers.

Passing additional context

Talking about other scenarios, it is often the case that you want to share more context between components in a distributed app. Let’s say a ClientApp wants to send its version so all REST calls will know where the request is coming from. You can add these properties in Activity.Baggage like this:

private async Task<string> GetWeatherForecast()
{
    var activity = new Activity("CallToBackend")
        .AddBaggage("appVersion", "v1.0")
        .Start();

    try
    {
        return await _httpClient.GetStringAsync(
                                         "http://localhost:5000/weatherforecastproxy");
    }
    finally
    {
        activity.Stop();
    }
}

Now on server side you see an additional header Correlation-Context in both – FrontEndApp and BackEndApp.

And you can use the Activity.Baggage to attribute your logs:

var appVersion =  Activity.Current.Baggage.FirstOrDefault(b => b.Key == "appVersion").Value;
using (_logger.BeginScope($"appVersion={appVersion}"))
{
    _logger.LogInformation("this weather forecast is from random source");
}

And you see the scope now contains an appVersion:

info: FrontEndApp.Controllers.WeatherForecastController[0]
      => ConnectionId:0HLQV353507UG
      => RequestPath:/weatherforecast 
         RequestId:0HLQV353507UG:00000001, 
         SpanId:37a0f7ebf3ecac42, 
         TraceId:c7e07b7719a7a3489617663753f985e4, 
         ParentId:f5df77ba38504846
      => FrontEndApp.Controllers.WeatherForecastController.Get (BackEndApp) 
      => appVersion=v1.0
      this weather forecast is from random source

Next steps

With the improvements for ASP.NET Core 3.0 we hear that some of the features included in ASP.NET Core is hard to consume. Developers and DevOps wants a turnkey telemetry solution that will work with many APM vendors. We believe that investments we are making in OpenTelemetry will allow more people to benefit from investments we are making in ASP.NET Core monitoring and troubleshooting. This is one of the big areas of investments for a team.

We also help people adopt W3C Trace Context everywhere and will be making it a default distributed trace context propagation format in future versions of ASP.NET Core.

Another area of investments is to improve distributed context propagation scenarios. Distributed apps comparing to monoliths are lacking common shared state with the lifetime of a single distributed trace. This shared state (or context) can be used for basic logging as was described in this article, as well as for advanced routing of requests, experimentation, A/B testing, business context propagation, etc. Some of scenarios are described in this epic: Distributed Context in ASP.NET and Open Telemetry.

Please send us your feedback and tell what improvements in distributed apps troubleshooting and monitoring we need to make.

20 comments

Discussion is closed. Login to edit/delete existing comments.

  • Thangavel Muthusamy 0

    Thanks for a detailed explanation. Grpc in .net core is using Httpclient, I wonder whether this works with Grpc too.

    • David FowlerMicrosoft employee 0

      Yes it does. It works for GRPC as well.

  • Matthew Lieder 0

    Am I correct to assume that this applies to Application Insights as well? If all my backend APIs are running ASP.NET Core 3.0 and are instrumented with Application Insights, will a client app’s TraceId be visible in their Log Analytics traces, requests, etc.?

    • David FowlerMicrosoft employee 0

      Yep, it works with Application Insights as well.

  • Travis Illig 0

    Is there any plan to allow different propagation headers to be plugged in? If you already have heavy investment in B3 propagation switching everything to W3C isn’t realistic.

    • Sergey KanzhelevMicrosoft employee 0

      It was always possible to make things like logs attribution with the traceId work with any headers. ASP.NET has a lot of extensibility points. This post demonstrates how everything works out of the box without the need to install anything. And out of the box we want to follow standard headers.

      I think over time there will be more incentive to switch to w3c which will outweight the amount of work required to transition. Especially if more things will “just work” with w3c.

  • Christer Eriksson 0

    Is there any sample code somewhere I can download? I can not get the SpanID for the frontendapp and the ParentSpanId for the backendapp to match.

    • Sergey Kanzhelev 0

      TraceID would match, SpanID wouldn’t. You need logs from http client that does not exists with ILogger.

      • Johannes Schöffberger 0

        Maybe I got it wrong, but I thought that a (W3C) Trace Context has to look like this:
        Service A (SpanId:1) -> TraceId:123;ParentId:1 -> Service B (SpanId:2)
        Service B (SpanId:2) -> TraceId:123;ParentId:2 -> Service C
        Service B (SpanId:2) -> TraceId:123;ParentId:2 -> Service D (SpanId:3)
        Service D (SpanId:3) -> TraceId:123;ParentId:3 -> Service E

        So traceparent headers for request B to C and B to D are the same, to declare that both C and D were called by B during ‘123’.

        The logs mentioned above support my interpretation: Your FrontEndApp logs
        SpanId:54e2de7b9428e940
        TraceId:e1a9b61ec50c954d852f645262c7b31a
        ParentId:69dce1f155911a45

        while your BackEndApp logs
        SpanId:0e783a0867544240
        TraceId:e1a9b61ec50c954d852f645262c7b31a
        ParentId:54e2de7b9428e940

        So, SpanId from FrontEndApp results in ParentId in BackEndApp.

        But when I execute your sample code https://github.com/SergeyKanzhelev/ot-demo-2019-11 locally, FrontEndApp logs
        SpanId:8cf7f539b8a7a745
        TraceId:549b09fe083a48478b872bb763872c9c
        ParentId:c4a4e4b93370ac44

        while BackEndApp logs
        SpanId:06f073849f11ac46
        TraceId:549b09fe083a48478b872bb763872c9c
        ParentId:2b14f7812c845441

        In this case SpanId from FrontEndApp does not match with ParentId in BackEndApp. So it is not possible to conclude that BackEndApp was called by FrontEndApp, only that they were part of one ‘transaction’ in any way.

        This behavior corresponds to https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md#creating-child-activities, saying that for each outgoing request a new Activity should be created.

        I also found out that System.Net.Http.DiagnosticsHandler creates a new Activity and that the usage of Activity.Current can only be forced if I subscribe for “HttpHandlerDiagnosticListener” this way:

        var observer = new System.Reactive.AnonymousObserver<System.Collections.Generic.KeyValuePair<string, object>>(value => { });
        Predicate<string> predicate = (string eventName) => false;
        
        DiagnosticListener.AllListeners.Subscribe(delegate (DiagnosticListener listener)
        {
            if (listener.Name == "HttpHandlerDiagnosticListener")
            {
                listener.Subscribe(observer, predicate);
            }
        });

        So is my interpretation wrong and traceparent’s parent-id should not match with Activity.SpanId, but should be always a unique id per request instead? But if so, how can I derive a “call tree” from these information (A calls B, B calls C and D, D calls E, …)?

        Thanks a lot!

  • Rehan Saeed 0

    I’d really appreciate a code sample. There seems to be a bunch of stuff missing.

  • Rehan Saeed 0

    Will the W3C mode ever become the default or will I always have to add this line of code to all my apps:

    Activity.DefaultIdFormat = ActivityIdFormat.W3C;
    • Sergey KanzhelevMicrosoft employee 0

      that’s the plan.

  • Shaun Tabone 0

    It is really great that you are adding such features in .NET Core and thanks for the detailed explanation. I suggest that the code snippets uses DiagnosticSource.StartActivity to also write the event to the DiagnosticSource instead of Activity.Start directly (as suggested in the Activity Guidelines – https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md)

    • Sergey KanzhelevMicrosoft employee 0

      Thank you for advice! You are correct, that would be the right way to do it so Activity Start event can be read by the listener

  • Tom Davis 0

    This is a great article, thanks for sharing.

    My client’s policy is for all APIs to pass around a “TraceId” HTTP header which we use to enrich the logs – this now clashes with the TraceId provided by ASP.NET Core. The simplest workaround would be to enrich the logs with a different named property (e.g. ClientNameTraceId), however this might get confusing as logs would have both TraceId and ClientNameTraceId properties – also, our trace id wouldn’t be in the shared context when making outbound calls to other APIs.

    Is there a way to override the ASP.NET Core TraceId with our own value? Perhaps by writing some middleware to read our existing TraceId header and somehow overriding/setting the TraceId on the current Activity?

    • Sergey KanzhelevMicrosoft employee 0

      You will only see TraceId in logs when you enable Scopes on logging. There is no “TraceId” http header. Is the clash happens when you set TraceId on your own scope?

      • Camilo Fierro 0

        I’m using Serilog and also experience this issue. I want to override the ‘TraceId’ value with mine on the logs. What would the steps be to accomplish this? Serilog uses logging scopes.

  • Dalin WilliamsMicrosoft employee 0

    Great summary! Quick question – how should we use this approach if we are already using grpc interceptors?

  • 309429775@qq.com 0

    Does this feature support .net framwork?

Feedback usabilla icon