March 1st, 2024

Crafting an Opinionated Logging and Error Handling Framework for Enhanced Observability in PowerShell

Maho Pacheco
Sr. Software Engineer

Over the past year, our team has dedicated efforts to adapting and porting various solutions, tools, and architectural references to Azure IoT Operations, a new IoT platform announced at Microsoft Ignite 2023. While our focus has been on this platform, it’s important to note that the insights shared in this post are applicable beyond its scope.

As the complexity of these scripts increased, we worked with our customers to focus on observability, the ability to understand the behavior of the system via telemetry such as metrics, traces, and logs. We generalized our findings into a set of reusable practices and code examples.

This post will delve into the aspect of logging and error handling, and share insights gained during our implementation journey, resulting in the development of a highly opinionated set of reusable code to manage logging effectively.

Interestingly, during our journey, we discovered a lack of opinionated rules or comprehensive guidance on best practices in PowerShell. As a bash enthusiast myself, I acknowledge PowerShell’s unique features and its large, mature, and continuously expanding community. This blog post serves as our contribution to that community, sparking discussions and providing insights into enhancing the usage of PowerShell in real-world scenarios.

The Epic Quest of Forging Scripts in the Intricate Tale of Logging Magic

If you are less interested in the journey than in the results, feel free to jump to the solution here.

Why Structured Logging Matters

Structured logging is essential for software systems as it provides a standardized format for recording log messages, including severity levels, error codes, and additional metadata. Unlike traditional logging, where messages are simply emitted to the console or a file, structured logging enables developers and operators to capture meaningful information in a consistent manner.

By structuring log messages, developers can easily query and analyze log data, allowing them to identify trends, diagnose issues, and generate alerts more effectively. For example, a structured log message may include fields such as the severity of the event (e.g., info, warning, error), an error code indicating the specific issue, timestamps, and contextual metadata such as the user ID or transaction ID.

Structured logging plays a pivotal role in our project, especially in analyzing developer experience (devx) within a toolset executed across diverse environments. Each environment can simultaneously handle dozens of distinct IoT contexts. In our case, each context comprises a set of Azure resources and a Kubernetes cluster. This becomes especially critical when numerous variables, such as environment configurations, Azure subscriptions, and IoT requirements, come into play. By ensuring consistency in error handling across all scripts, structured logging makes them machine-parseable, facilitating seamless grouping, aggregation, and reporting.

In our specific scenario, structured logging serves a dual purpose. Firstly, it aids in uncovering patterns and behaviors of user experience. Secondly, structured logging enables us to leverage technologies like KQL in Azure Monitor, or even ML/AI for more sophisticated analysis. Through KQL queries, we can swiftly pinpoint relevant data points and trends, while ML/AI algorithms help in predicting potential issues or suggesting optimizations.

The ultimate goal is to empower customers by providing a low-cost yet powerful mechanism for emitting meaningful messages at the right level of detail, ensuring that structured logging communicates not only with users but also with other systems efficiently.

PoshLog

Following a brief design investigation and a spike, we opted to use PoshLog as our foundation. PoshLog is a framework built on C# Serilog, a tool we have successfully employed in previous projects. PoshLog introduces the concept of “sinks,” providing easily extendable storage mechanisms for logging data, for example, to export logging data to Azure Monitor.

Initially, we used PoshLog with the Console Sink, configured in the default logging configuration. We created a short script module that could be included in the existing scripts at the beginning (e.g., init-logger.ps1) and then modified all the existing output methods to verify if the logger exists and is configured. If not, it falls back to the regular PowerShell Write-Output cmdlets. For example:

if (Confirm-LoggerIsEnabled) {
    Write-InfoLog $Title
} else {
    Write-Host $Title
}

This allowed us to quickly integrate the logging mechanism with our existing solution without much fuss.

One challenge with this approach was that PoshLog does not support replacing or reconfiguring the ConsoleSink in an existing logger seamlessly. We wanted to provide an option to our customers to reconfigure the logger in any way they see fit but providing a pre-configured one by default. We decided to initialize the logger with a set of sinks by default but allow the user to override the original logger by creating a custom code module if necessary.

An initial implementation added the parameter LogLevel into the scripts parameter, so the user could decide how much to see in the console. We reused the type Serilog.Events.LogEventLevel setting the default to Information. However, this presented an issue that we detected while running our Pester unit tests in the CI/CD pipeline: because the init script installs the modules, including PoshLog, we cannot access the Serilog.Events.LogEventLevel class type before calling the scripts if PoshLog is not previously installed. It became a chicken and egg problem, as the execution throws an error if PoshLog is not installed. We ended up creating our own LogLevel class type. A further revision showed that the common practice is not to use a LogEventLevel (which introduces friction in DevX by asking the user to understand the different logging levels and how they overlap) but to use switch flags such as -Verbose, -Debug, or -Silent. After discovering that these flags exist by default, PowerShell Streams entered the scene.

PowerShell Streams

PowerShell has 7 streams (Success, Error, Warning, Verbose, Debug, Information, Progress), and it has its own methods (Write-Debug, Write-Verbose, Write-Host, etc.). PoshLog supports a PowerShell sink that outputs to those streams, but the way it works is different than the ConsoleSink.

All PowerShell scripts already support the switches -Verbose and -Debug for modifying the preference of such streams. See Common Parameters. However, because we thought PoshLog still adds value to the other sinks when logging (e.g., file), it is still a better alternative than using the native Write-Verbose, Write-Debug, etc. methods.

PoshLog does support a special PowerShell sink for PowerShell streams, so we decided to create a prototype with the following findings:

  • The output streams in PowerShell are not hierarchical; you can enable Debug and disable Information, for example.
  • The Information stream is turned off by default.
  • The Verbose stream is also used by PowerShell natively, adding a lot of noise (e.g., every time a new class is loaded, it is logged there).
  • We cannot overwrite -Verbose or -Debug in the parameters. These parameters always take precedence over any PoshLog configuration, making it confusing.
  • To access the value of -Verbose or -Debug, we need to access the value of the preference variables.
  • The behavior of -Debug and -Verbose is inconsistent. -Verbose is very noisy and sets the preference to Continue; -Debug sets the preference to Inquire. See issue 10873 for more information.

After all this information and some design discussions, we decided to implement our parameters -DetailedOutput and -SilentMode and use the PoshLog ConsoleSink.

A Brief Interlude on Non-terminating Errors and Terminating Errors

There are two types of errors in PowerShell: terminating errors and non-terminating errors. It is possible for the consumer of the scripts to upgrade non-terminating errors with -ErrorAction Stop to make it catchable.

A terminating error functions as an exception in the runtime stack, while a non-terminating error operates as a notification in a messages stack without triggering or propagating. By ‘upgrading’ non-terminating errors, they can propagate through the runtime stack, allowing you to ‘catch’ them, much like capturing Pokémon.

The Write-Error cmdlet declares a non-terminating error. By default, errors are sent in the error stream to the host program to be displayed, along with output.

To write a non-terminating error, enter an error message string, an ErrorRecord object, or an Exception object. Use the other parameters of Write-Error to populate the error record.

Non-terminating errors write an error to the error stream, but they don’t stop command processing. If a non-terminating error is declared on one item in a collection of input items, the command continues to process the other items in the collection.

As we talked about before in this post, PowerShell provides multiple output streams. The streams provide channels for different types of messages. You can write to these streams using the associated cmdlet or redirection.

The Error stream is the default stream for error results when using the Write-Error cmdlet. The Error stream is connected to the stderr stream for native applications. Errors written to this stream are also added to the $Error automatic variable. For more information, see PowerShell common parameters.

In PowerShell, both $PSCmdlet.ThrowTerminatingError() and throw are used to throw exceptions, but they are used in different contexts and have different implications.

Handling Errors Best Practices

When talking about terminating errors, we decided that our modules would throw exceptions, and the scripts would use try { } catch blocks. In the end, we followed the philosophy that exceptions are for exceptional circumstances. All other errors that can have a fallback or handled inside the modules, we decided to use non-terminating errors and continue.

Exit and Error Codes

Exit codes are numerical values that indicate the outcome of a script or a command in PowerShell. They are stored in the $LASTEXITCODE variable and can be used to provide feedback or perform conditional actions based on the success or failure of a script or a command.

There are two default exit codes in PowerShell:

  • 0 means normal termination or success. This exit code is returned when a script or a command completes without any unhandled errors or exceptions.
  • 1 means failure or an uncaught throw. This exit code is returned when a script or a command encounters a script-terminating error that is not caught by a try-catch block or a trap statement.

PowerShell also allows you to return custom exit codes, as long as they are integers. The range of values for the exit code is platform-dependent. On Windows, any 32-bit signed integer is valid, while on Linux and macOS, only values between 0 and 255 are valid.

To return a custom exit code, you can use the exit keyword followed by an integer value. For example, exit 42 will return the exit code 42. The exit keyword will also terminate the script or the command, so it should be used with caution.

While Bash contains a reserved list of exit codes, PowerShell does not. However, we decided to create our own standard for error codes based on the following principles:

  • The error codes should be meaningful, descriptive, and consistent.
  • The error codes should be displayed and logged using the appropriate cmdlets.
  • Each distinct error should have a unique error code. This will make it easier to identify the cause of an error.
  • Avoid using numbers that are used as exit codes by the shell or commonly used programs.
  • Use Meaningful Numbers.

To implement these principles in PowerShell, we have decided to create a PowerShell module that contains a function for generating standardized error messages based on error codes. This function, Get-ErrorMessage, takes an error code and an optional additional information string as input and returns a standardized error message. The error codes and their corresponding messages are stored in a hashtable within the function, making it easy to add, remove, or change error codes and messages as needed.

By encapsulating this functionality in a module, we can easily import it into any script that needs to generate error messages. This approach aligns with the modular design philosophy of PowerShell, which encourages the use of reusable modules to simplify script development and maintenance.

Here is an example:

function Get-ErrorMessage {
    param (
        [Parameter(Mandatory=$true)]
        [string]$ErrorCode,

        [Parameter(Mandatory=$false)]
        [string]$DirFilePath = ''
    )

    $ErrorMessages = @{
        # 100 codes are for context errors
        '100' = "Error 100: There is no default.json in the context directory [$DirFilePath]"
        '101' = "Error 101: An error occurred while getting the context path: $DirFilePath"
    }
    # ...
}

It is important to note that the error codes are not meant to serve or replace exit codes. These error codes are custom business error codes meant to be specific for our solution.

Metadata

A very nice thing about structured logging is the ability to provide structured context in the form of metadata. PoshLog provides mechanisms to add metadata to each logging record in a very easy and extensible way. PoShLog has a number of default and custom cmdlets that can be loaded with extensions that can be found here. PoshLog has a default method for enriching logs with custom parameters that we may use:

$LoggerConfig = $LoggerConfig | Add-EnrichWithProperty -Name <PROPERTY_NAME> -Value <PROPERTY_VALUE>

There is also a PoShLog.Enrichers extension that contains some custom cmdlets with some predefined commonly used parameters, such as User Name and Machine Name:

$LoggerConfig = $LoggerConfig | Add-EnrichWithEnvironment -MachineName -UserName

A Very Opinionated Logging and Error Handling Library for PowerShell

Effective error handling is a cornerstone of robust script development. With all this information and learning, we decided to create a PowerShell logging library. More than a library, it is a set of Cmdlets wrappers on top of PoshLog to help us code easier but also forcing us to follow best practices.

Introducing, RAD PowerShell Cmdlets, the Robust Adaptive Dynamic (RAD) PowerShell Cmdlets provide a comprehensive and very opinionated error handling mechanism. This not only enhances the readability and reliability of your PowerShell scripts but also contributes to their adaptability in unforeseen scenarios.

You can find the code of the solution and examples here. It is divided into three modules: Error-Utils, Text-Utils and Logging-Utils.

Text-Utils provides a set of functions to be used to show messages in different verbosity levels. It relies on PoshLog, but if PoshLog is not enabled, installed, or properly configured, it falls back to Write-Host, which makes a very reliable mechanism. We suggest adding some linters to your solution to avoid developers using Write-Host or Show-Error directly and using only the Cmdlets provided by Text-Utils.

Error-Utils provides a set of functions to show and trigger error messages. It has a dictionary of errors, and each error has a code, which can be an integer or string. It provides a mechanism to load the errors from a YAML file, allowing easy extensibility. It is also very opinionated in the way we show and throw errors by forcing the developer to always provide an error code.

Logging-Utils provides a set of functions to easily setup and configure a PoshLog logging instance. This is specially useful to include at the beginning of other scripts, and provide a single and standardized mechanism to configure the same logging instance (including templates, sinks, etc.) over all the scripts.

Combining the mandatory use of Text-Utils, Error-Utils, and Logging-Utils will ensure that 100% of all error messages in your solution are structured, standardized, logged, and formatted, improving troubleshooting and monitoring over your solution.

At the end, we decided to go one step further, and we combined the terminating error-throwing function with the error generation. We also ensured that each time a terminating error is thrown, it must use the Invoke-TerminatingError function, for example:

Invoke-TerminatingCommand -Command { az import life } -ErrorCode 1010 -ErrorParameters @("life")

This will not only ensure that 100% of all terminating error messages are properly structured and logged but also manage the flow of the application in a standardized way. In other words, each time there is a chance a command could trigger a fatal error, we should execute Invoke-TerminatingCommand instead of a try/catch. There could be exceptions to this rule, but those should be very conscious.

You could see an example of a very clean script using the these modules and implementing the Init.ps1 script here.

Conclusion

Effective error handling and logging are cornerstones of robust script development. Our learning journey led us to create the Robust Adaptive Dynamic (RAD) PowerShell Cmdlets, which take this to the next level by providing a comprehensive and very opinionated error-handling mechanism. This not only enhances the reliability of your PowerShell scripts but also contributes to their adaptability in unforeseen scenarios.

Error-handling features significantly contribute to a positive developer experience by simplifying troubleshooting, promoting customization, enabling controlled script termination, optimizing resource usage, and providing versatile message handling options. It is one of those areas where a very opinionated approach could solve more than one headache and collectively enhance the overall reliability and adaptability of your solutions.

Acknowledgments

While I penned down this blog post, it’s important to acknowledge that the real stars behind the scenes, including the code, are mostly from the Voyager Dev Crew: Olha Konstantinova, Renato Marciano, Yani Ariunbold, Wendy Reinsel, and Marcia Dos Santos. A special thanks to Emmeline Hoops, who helped build and tests the RAD cmdlets, and to Ronnie Yates and Bill Berry for their guidance and support in helping us generalize and share our learnings.

Author

Maho Pacheco
Sr. Software Engineer