July 5th, 2024

Azure IoT Operations Data Processor Pipelines: Unlocking Efficiency in the Industrial Metaverse

Maho Pacheco
Sr. Software Engineer

Picture this: the hum of machinery, the scent of freshly manufactured goods, the symphony of gears in motion with robotic arms dancing alongside traditional assembly lines.

Stepping into a manufacturing facility is like entering a world where innovation meets reality. Our team saw this firsthand: we were lucky enough to immerse ourselves in one of our customer’s factories, explore the intersection of technology and industry, and delve into processing data at the edge with Azure IoT Operations Preview.

The modern manufacturing landscape is evolving. One key player in this progression is Azure IoT Operations, a new IoT platform announced at Microsoft Ignite 2023. This industrial metaverse solution transforms physical operations from the cloud to the edge.

As our team ventured into the bustling noise of a manufacturing line, one thing became abundantly clear: at its core lies the need for efficient data processing. Data collected from the devices drives decisions, actions and operations. Moreover, data, efficiently processed at the edge, has the power to improve the overall performance of the manufacturing line.

Azure IoT Operations provides a core module for exactly this kind of processing: AIO Data Processor pipelines.

Armed with a trove of insights and a clear scenario in mind, we decided to implement two key values from the Overall Equipment Effectiveness (OEE) standard: “Machine Status” and “Total Counter”.

OEE serves as a crucial performance metric used in manufacturing industries to assess the efficiency and productivity of equipment or machinery. It provides insights into how effectively equipment is being used to produce high-quality products.

Given this, we naturally wondered: Could AIO Data Processor Pipelines handle such a scenario? What challenges might we encounter? What friction points could arise? Despite AIO being in public preview, with a wave of enhancements expected before General Availability, we felt it was important to share some of our early insights in this post.

Processing Data with jq

Let’s dive into the backbone of the data processing: jq. Data Processor pipelines offer jq as a simple yet powerful way to process the payloads, which usually arrive from OPC UA servers in JSON format. (OPC UA, or OPC Unified Architecture, is a standard developed by the OPC Foundation.) A jq program operates as a “filter”, taking an input and generating an output. jq comes equipped with built-in filters for tasks such as extracting a specific field from an object or converting a number to a string.
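As a rough illustration of the filter idea, the Python sketch below mimics what a jq expression such as `.Payload.Temperature.Value | tostring` does: it takes a JSON input, extracts one field, and converts the number to a string. The payload shape and field names are assumptions made for this example, not the format of any particular OPC UA server, and unlike jq this sketch raises an error on a missing field instead of yielding null.

```python
import json

# Hypothetical payload; the field names are assumptions for illustration only.
RAW_MESSAGE = '{"Payload": {"Temperature": {"Value": 21.5}}}'


def extract_field(document, path):
    """Walk a nested dict along `path`, similar to the jq path `.a.b.c`."""
    current = document
    for key in path:
        current = current[key]
    return current


def temperature_as_string(raw_message):
    """Rough equivalent of the jq filter `.Payload.Temperature.Value | tostring`."""
    document = json.loads(raw_message)
    value = extract_field(document, ["Payload", "Temperature", "Value"])
    # jq's tostring JSON-encodes non-string values, so json.dumps is used here.
    return json.dumps(value)


if __name__ == "__main__":
    print(temperature_as_string(RAW_MESSAGE))
```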

AIO Data Processor pipelines are divided into stages, where each stage can be a filter, a transformation, or an HTTP request. These are our takeaways from using jq specifically for the filter and transformation stages:

Pros

  • Performance. jq is written in C, so even complex expressions or scans over large arrays complete very quickly.
  • Powerful. We did not find any scenario where jq could not handle the transformation.
  • Simple. jq is simple and cross-platform; you can test and validate jq expressions easily on the command line or in a playground.

Cons

  • Learning curve. Grasping jq involves more than just knowing syntax. It requires considerable effort to learn how jq processes, transforms, and understands an input.

Tips

  • ChatGPT understands jq. You can ask GPT models for help in creating jq expressions.
  • Split large expressions. Splitting large expressions logically into multiple stages helps with readability and testability. For example, our machine status logic could have been one large expression, but it is easier to troubleshoot when it is separated into multiple stages.
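To make the "split into stages" tip concrete, here is a minimal Python sketch of one transformation broken into three small steps that can each be tested alone. It is not our actual machine status logic: the input fields (`IsRunning`, `HasFault`) and the status labels are hypothetical, chosen only to show the shape of the approach.

```python
# Hypothetical input fields and status rules, assumed for illustration only.


def stage_select_signals(message):
    """Stage 1: keep only the fields the status logic needs."""
    return {
        "running": bool(message.get("IsRunning", False)),
        "fault": bool(message.get("HasFault", False)),
    }


def stage_derive_status(signals):
    """Stage 2: turn the raw signals into a single status label."""
    if signals["fault"]:
        status = "FAULT"
    elif signals["running"]:
        status = "RUNNING"
    else:
        status = "IDLE"
    return {**signals, "status": status}


def stage_shape_output(enriched):
    """Stage 3: emit only what downstream consumers should see."""
    return {"MachineStatus": enriched["status"]}


STAGES = [stage_select_signals, stage_derive_status, stage_shape_output]


def run_stages(message, stages=STAGES):
    """Run each stage in order, passing the result of one to the next."""
    result = message
    for stage in stages:
        result = stage(result)
    return result
```

Because every stage has a narrow input and output, a wrong result can be traced to a single step rather than to one long expression.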

Data Processor pipelines model

Data Processor pipelines provide a convenient and powerful way to process data. Using the AIO MQTT broker as the central mechanism to deliver and retrieve messages, we were able to implement complex logic by separating it into different pipelines.

We successfully implemented all the OEE model calculations using the full capabilities of the AIO Data Processor. This included HTTP requests for inputs or stages, enriching messages with reference data, and leveraging the last known value feature. As it matures and gains features, the AIO Data Processor could become a cornerstone for data processing at the edge.
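For readers unfamiliar with the idea of a last known value, the Python sketch below shows the general concept only: remember the most recent value seen for a field and reuse it when a later message omits that field. It is an assumption-laden illustration, not the Data Processor's implementation or configuration, and the `TotalCounter` key is a hypothetical example.

```python
class LastKnownValue:
    """Remember the latest value seen per key and fill gaps in later messages."""

    def __init__(self, tracked_keys):
        self.tracked_keys = list(tracked_keys)
        self.cache = {}

    def apply(self, message):
        """Return a copy of `message` with missing tracked keys filled in."""
        filled = dict(message)
        for key in self.tracked_keys:
            if filled.get(key) is not None:
                # A fresh value arrived: remember it for later messages.
                self.cache[key] = filled[key]
            elif key in self.cache:
                # The value is missing: fall back to the last one seen.
                filled[key] = self.cache[key]
        return filled
```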

Tips

  • Don’t hesitate to split large pipelines into multiple smaller ones. This approach simplifies the logic of each pipeline and makes troubleshooting and management easier, and the latency added by passing messages through the MQTT broker is minimal. However, it may increase the number of Kubernetes resources (CRDs or Azure ARM resources) to manage, with the associated implications. Azure IOP DP Jumpstart provides two examples of this approach.
  • Make sure to have a well-defined topic structure. It helps performance through partitioning and standardizes the implementation of the pipelines.
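One way to keep a topic structure well defined is to build and parse topics through a single helper instead of concatenating strings in each pipeline. The sketch below assumes a hypothetical four-level hierarchy (site, line, machine, kind); the levels and names are ours for illustration and are not prescribed by Azure IoT Operations.

```python
from typing import NamedTuple


class Topic(NamedTuple):
    """Hypothetical topic hierarchy; the four levels are an assumption."""

    site: str
    line: str
    machine: str
    kind: str  # for example "input", "status", or "counter"


def build_topic(topic):
    """Join the levels with "/" after rejecting empty or reserved characters."""
    for level in topic:
        # "/" separates levels and "+" / "#" are MQTT wildcards.
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    return "/".join(topic)


def parse_topic(text):
    """Split a topic string back into its named levels."""
    parts = text.split("/")
    if len(parts) != len(Topic._fields):
        raise ValueError(f"expected {len(Topic._fields)} levels, got {len(parts)}")
    return Topic(*parts)


if __name__ == "__main__":
    print(build_topic(Topic("plant1", "line2", "press7", "status")))
```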

Validation, debugging, testing

To handle instances where we might receive invalid input, we created a validation pipeline that checks the validity of each property in the input payloads. We also crafted several supporting debugging pipelines that display the data in the payloads and datasets using passthrough stages, aiding our development work.
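The idea behind a validation step can be sketched in a few lines of Python: check each expected property for presence and type, and collect every problem rather than stopping at the first. The property names and types below are hypothetical and stand in for whatever the real input payloads contain.

```python
# Hypothetical schema: property name -> accepted Python types.
EXPECTED_PROPERTIES = {
    "MachineId": (str,),
    "IsRunning": (bool,),
    "TotalCounter": (int,),
}


def validate_payload(payload, expected=EXPECTED_PROPERTIES):
    """Return a list of problems; an empty list means the payload is valid."""
    errors = []
    for name, types in expected.items():
        if name not in payload:
            errors.append(f"missing property: {name}")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it where int is expected.
        if isinstance(value, bool) and bool not in types:
            errors.append(f"wrong type for {name}: bool")
        elif not isinstance(value, types):
            errors.append(f"wrong type for {name}: {type(value).__name__}")
    return errors
```

Returning the full list of problems makes it easier to route invalid messages to a separate topic together with the reason they were rejected.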

A significant part of our work revolved around testing. We devised integration tests to verify the end-to-end functionality of the pipelines, feeding files to the input topic and comparing the generated output with the expected results.
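The comparison at the heart of such an integration test might look like the Python sketch below. It is not our actual test harness: it assumes input and expected-output files with one JSON message per line, and `send_and_collect` is a hypothetical callable that would publish the inputs to the pipeline's input topic and return the messages observed on its output topic.

```python
import json
from pathlib import Path


def load_messages(path):
    """Read one JSON message per non-empty line from a file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def run_case(input_path, expected_path, send_and_collect):
    """Feed inputs through `send_and_collect` and list any mismatches.

    `send_and_collect` is a hypothetical callable: it receives the input
    messages and returns the messages produced by the pipeline under test.
    """
    actual = send_and_collect(load_messages(input_path))
    expected = load_messages(expected_path)
    # Compare message by message so a failure points at a specific index.
    mismatches = [
        (index, want, got)
        for index, (want, got) in enumerate(zip(expected, actual))
        if want != got
    ]
    if len(actual) != len(expected):
        mismatches.append(("count", len(expected), len(actual)))
    return mismatches
```

An empty result means the generated output matched the expected results; anything else identifies which message differed and how.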

Tips

  • Feel free to create supporting passthrough debugging pipelines to see into stages or datasets where needed and to get the whole payload including internal elements. Here are some examples.
  • Standardize your input/output topic structure and append “/debug” to the passthrough pipelines.
  • Do not hesitate to use multiple output topics. While it may be tempting to reuse output topics to favor performance, remember that MQTT brokers scale very easily. Reusing output topics can increase complexity by mixing different kinds of messages within the same topic.

References

  • Learn more about data processor pipelines here.
  • Jumpstart your journey with our sample repo.
  • Keep up with the latest features in Azure IoT Operations here.

Acknowledgments

Thanks to the Voyager Crew for the work in data processor pipelines: Olha Konstantinova, Renato Marciano, Wendy Reinsel, Marcia Dos Santos, Emmeline Hoops, Maho Pacheco, and Yani Ariunbold. A special thanks to Mohzina Zaman, Meena Gudapati, and Udbhav Trivedi for the work and collaboration from the Product Group. Special thanks to Alexander Gassman, Bill Berry, and Larry Lieberman for the guidance on customer patterns.
