{"id":16593,"date":"2026-03-19T00:00:00","date_gmt":"2026-03-19T07:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/ise\/?p=16593"},"modified":"2026-03-19T03:28:43","modified_gmt":"2026-03-19T10:28:43","slug":"aio-data-processor-pipelines-to-dataflow-processing-data-in-the-edge","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/aio-data-processor-pipelines-to-dataflow-processing-data-in-the-edge\/","title":{"rendered":"From Azure IoT Operations Data Processor Pipelines to Dataflows"},"content":{"rendered":"<p>We recently completed an <strong>18-month journey<\/strong> building an event detection system at the edge with Azure IoT Operations (AIO).\nThe goal was simple to describe but complex to execute: use a Vision Model to detect real-world events, then raise confidence by correlating that signal with <strong>other data sources<\/strong>\u2014from industrial sensors to weather APIs.<\/p>\n<p>This meant designing <strong>heuristic logic<\/strong> at the edge: the \u201cbrain\u201d that fuses data from multiple inputs to decide if something really happened. And as the technology evolved, so did our architecture.<\/p>\n<h2>The Evolution: From jq Pipelines to WASM Dataflows<\/h2>\n<p>When we started, we leaned on <strong>AIO Data Processor Pipelines<\/strong>\u2014a familiar tool we had used before for Overall Equipment Effectiveness (OEE) calculations. It was simple, declarative, and built on <code>jq<\/code> (JSON Query), making transformations approachable. We even published a <a href=\"https:\/\/devblogs.microsoft.com\/ise\/aio-data-processor-pipelines-unlocking-efficiency-industrial-metaverse\/\">blog post in July 2024<\/a> about how effective it was.<\/p>\n<p>But when we took our processing logic to a pilot in production, the Azure IoT Operations team introduced the <strong>Dataflow component<\/strong>, which evolved from the original Pipelines. This change was shaped by customer and partner feedback\u2014including ours. 
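<\/p>
<p>For context, a Data Processor pipeline stage expressed its transformation as a jq filter. A hypothetical OEE-style stage (field names are illustrative, not our production schema) looked roughly like this:<\/p>

```jq
# Hypothetical pipeline-stage filter: derive OEE from three factors
# and tag the message with its source site.
.payload.oee = (.payload.availability * .payload.performance * .payload.quality)
| .payload.site = "site-a"
```

<p>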
Dataflow brought both excitement and new challenges:<\/p>\n<ul>\n<li>Exciting because it provided a <strong>richer, more robust framework<\/strong> for connecting multiple data sources.<\/li>\n<li>Challenging because it required us to <strong>rethink our architecture<\/strong> and adjust existing pipelines.<\/li>\n<\/ul>\n<p>We even drafted a \u201cmigration\u201d blog post (planned for November 2024) on moving from Data Processor Pipelines to custom Rust\/Dataflow pods\u2014but we paused. Dataflow itself was evolving quickly, and by 2025 it had matured into two complementary flavors:<\/p>\n<ul>\n<li><strong>Expression-based Dataflows<\/strong>: out-of-the-box functions with a rich expression language.<\/li>\n<li><strong>WASM-based Dataflows<\/strong>: full flexibility to embed custom, managed code inside the flow.<\/li>\n<\/ul>\n<p>This gave us new options\u2014but also new trade-offs.<\/p>\n<h2>Hybrid Approach: Rust Pods + Dataflows<\/h2>\n<p>Instead of a rip-and-replace rewrite, we chose a <strong>hybrid approach<\/strong>:<\/p>\n<ul>\n<li><strong>Custom Rust Pods<\/strong> handled heavy transformations, enrichment, and heuristic logic.<\/li>\n<li><strong>Dataflows<\/strong> managed lightweight tasks: routing, filtering, simple enrichment, and connectivity (e.g., syncing to Azure Event Grid).<\/li>\n<\/ul>\n<p>This balance gave us the <strong>performance and control<\/strong> of Rust where it mattered, while using <strong>Dataflows\u2019 simplicity<\/strong> for integration.<\/p>\n<h2>A Deep Dive Into Our Transformations<\/h2>\n<p>Over time, we built a small Rust <strong>transformation library<\/strong> that became our toolkit. Some highlights:<\/p>\n<h3>1. <strong>Enrichment: Adding Context<\/strong><\/h3>\n<p>We appended environment variables, cluster metadata, or system data to incoming messages. This gave downstream systems richer context without extra lookups.<\/p>\n<h3>2. 
<strong>Last Known Value (LKV): Handling Gaps<\/strong><\/h3>\n<p>Sensors are chatty\u2014until they aren\u2019t. Inspired by Pipelines, we built an LKV trait with two implementations:<\/p>\n<ul>\n<li><strong>In-memory cache<\/strong> for dev\/test.<\/li>\n<li><strong>MQTT-backed store<\/strong> for production.<\/li>\n<\/ul>\n<p>Because there was no official SDK for MQTT-backed state at first, we tested in-memory and switched later. That decision highlighted an important lesson: prototype quickly with simple backends, then harden for production.<\/p>\n<h3>3. <strong>Mapping &amp; Standardization<\/strong><\/h3>\n<p>Different devices spoke different \u201cdialects.\u201d We normalized units, standardized names, and translated site codes. This reduced integration headaches.\n<em>Pro tip:<\/em> keep mapping logic separate from heuristic logic\u2014simpler to debug, swap, and test.<\/p>\n<h3>4. <strong>Filtering: Reducing Noise<\/strong><\/h3>\n<p>From basic field checks to nested conditions, filtering kept pipelines focused on what mattered. We even used <strong>hierarchical MQTT topics<\/strong> to reflect filtering stages, making it easy to trace message paths.<\/p>\n<h3>5. <strong>Heuristic &amp; Arithmetic Logic<\/strong><\/h3>\n<p>This was the \u201csecret sauce.\u201d We combined arithmetic, heuristics, and lightweight ML anomaly detection\u2014reusable via a shared Rust crate. Once the building blocks were in place, the \u201chardest\u201d part became the simplest.<\/p>\n<h2>Scaling the Architecture<\/h2>\n<p>Each transformation ran as an independent Rust thread, grouped into <strong>pods<\/strong>. 
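<\/p>
<p>As a minimal, single-process sketch (channels standing in for MQTT topics, and invented stage logic), the thread-per-transformation layout looked conceptually like this:<\/p>

```rust
use std::sync::mpsc;
use std::thread;

// Each transformation stage runs on its own thread, wired together with
// channels. In the real system the wiring is MQTT topics and the stages
// are grouped into Kubernetes pods; this sketch is illustrative only.
fn run_pipeline(inputs: Vec<f64>) -> Vec<(f64, &'static str)> {
    let (raw_tx, raw_rx) = mpsc::channel::<f64>();
    let (enriched_tx, enriched_rx) = mpsc::channel::<(f64, &'static str)>();

    // Stage 1: enrichment appends context (here, a site label).
    let enricher = thread::spawn(move || {
        for reading in raw_rx {
            enriched_tx.send((reading, "site-a")).unwrap();
        }
    });

    // Stage 2: filtering drops readings below a threshold.
    let filter = thread::spawn(move || {
        enriched_rx.into_iter().filter(|(v, _)| *v >= 10.0).collect::<Vec<_>>()
    });

    for v in inputs {
        raw_tx.send(v).unwrap();
    }
    drop(raw_tx); // close the pipeline so both stages drain and exit

    enricher.join().unwrap();
    filter.join().unwrap()
}

fn main() {
    println!("{:?}", run_pipeline(vec![3.0, 12.5, 42.0]));
}
```

<p>Because each stage owns its own thread (and, in production, its own pod), a slow or failing stage can be scaled or restarted without touching the others. 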
This gave us fine-grained scalability:<\/p>\n<ul>\n<li>Scale high-volume enrichment pods separately from compute-heavy heuristics.<\/li>\n<li>Deploy on Kubernetes with node affinity and replica tuning.<\/li>\n<li>Isolate failures\u2014one broken stage didn\u2019t topple the system.<\/li>\n<\/ul>\n<h2>MQTT as the Brain<\/h2>\n<p>From day one, <strong>MQTT was our nervous system<\/strong>. Every component\u2014Rust pod, Dataflow, sensor, or cloud connector\u2014spoke through the broker. We didn\u2019t wire services together directly; we published and subscribed to topics.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2026\/03\/mqtt-trace-example.webp\" alt=\"Example Architecture diagram showing Rust pods, Dataflows, and MQTT broker integration\" \/><\/p>\n<p><em>Figure: High-level architecture\u2014Rust pods handle heavy processing, Dataflows manage lightweight tasks, all coordinated via MQTT broker.<\/em><\/p>\n<p>That decision paid off:<\/p>\n<ul>\n<li><strong>Decoupling<\/strong>: Each component only needed to know its topic. We could add, remove, or rewire logic without breaking others.<\/li>\n<li><strong>Tracing<\/strong>: Because <em>everything<\/em> flowed through MQTT, we built a tracing component that correlated events across the system. It was as simple as subscribing to multiple topics and reconstructing event lifecycles.<\/li>\n<li><strong>Scalability<\/strong>: By partitioning topics, we scaled horizontally. We could route high-volume data to one broker, critical alerts to another, or separate ingestion (frontends) from processing (backends).<\/li>\n<li><strong>Flexibility<\/strong>: MQTT let us experiment\u2014run multiple versions of a component in parallel, replay messages from retained topics, or test new heuristics without touching production wiring.<\/li>\n<\/ul>\n<p>\u26a0\ufe0f <strong>Caution:<\/strong> brokers aren\u2019t infinite. 
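<\/p>
<p>To make the tracing idea above concrete, here is a minimal sketch of MQTT topic-filter matching with the <code>+<\/code> and <code>#<\/code> wildcards, the mechanism a tracing subscriber relies on. The topic layout is invented for illustration:<\/p>

```rust
/// Minimal MQTT topic-filter matcher supporting the single-level (+)
/// and multi-level (#) wildcards. A tracing component can subscribe
/// with filters like these across stages and reconstruct event paths.
fn topic_matches(filter: &str, topic: &str) -> bool {
    let mut f = filter.split('/');
    let mut t = topic.split('/');
    loop {
        match (f.next(), t.next()) {
            (Some("#"), _) => return true,             // matches the rest
            (Some("+"), Some(_)) => continue,          // matches one level
            (Some(fl), Some(tl)) if fl == tl => continue,
            (None, None) => return true,               // exact match
            _ => return false,
        }
    }
}

fn main() {
    // Hierarchical topics that reflect filtering stages keep paths traceable.
    println!("{}", topic_matches("site-a/+/filtered/#", "site-a/cam1/filtered/events"));
}
```

<p>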
Depending on data size, message rate, and retention policies, you\u2019ll need to test, partition, and scale across brokers. But with the right architecture, MQTT as the \u201cbrain\u201d becomes the single, reliable connective tissue.<\/p>\n<h2>Lessons Learned<\/h2>\n<p>Eighteen months, several rewrites, and one major platform evolution later, here\u2019s what we\u2019d tell our past selves:<\/p>\n<ul>\n<li><strong>Modularity Wins<\/strong>: Whether in jq, Rust, or Dataflows, keep logic small and composable. Easier to debug, scale, and migrate.<\/li>\n<li><strong>Hybrid Is Practical<\/strong>: Don\u2019t chase \u201call-in\u201d rewrites. Dataflows for routing; Rust (or the language of your choice) for heavy lifting.<\/li>\n<li><strong>Context Is Everything<\/strong>: Enrichment and LKV strategies are essential for meaningful, resilient data.<\/li>\n<li><strong>MQTT Is the Brain<\/strong>: Treat the broker as the system\u2019s nervous system. Route everything through it, and you unlock tracing, scalability, and decoupling.<\/li>\n<\/ul>\n<h2>The Takeaway: Design for Change<\/h2>\n<p>Edge AI and IoT environments evolve rapidly. Components shift, features are redefined, and what works today may no longer be viable tomorrow. The most effective way to stay resilient is to design for <strong>modularity, flexibility, and hybrid operation<\/strong> from the start.<\/p>\n<p>Keeping components <strong>lightweight and single-purpose<\/strong> not only improves performance but also simplifies troubleshooting. When each stage does one thing well, it is easier to isolate failures, replace or upgrade logic, and trace problems end-to-end. This modularity accelerates debugging and enables faster iteration as requirements or technologies change.<\/p>\n<p>For teams working in this space, we recommend following Microsoft\u2019s <a href=\"https:\/\/github.com\/microsoft\/edge-ai\">Edge AI Accelerator<\/a>. 
It provides patterns, tools, and guidance for building scalable and production-ready edge AI solutions, many of which align with the lessons we share here.<\/p>\n<p>Whether you are building event detection systems at the edge, adopting AIO Dataflows, or experimenting with custom heuristics, the core guidance remains the same: keep your architecture modular and flexible, and let MQTT serve as the central nervous system that\u2014when sized, partitioned, and tuned correctly\u2014can enable scaling, tracing, and long-term evolution.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post we explore the evolution from Azure IoT Operations Data Processor Pipelines to Dataflows, why we adopted a hybrid strategy with custom Rust pods, and the architectural lessons we learned building event detection systems at the edge.<\/p>\n","protected":false},"author":114452,"featured_media":16594,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,3451],"tags":[3539,3531,3540,3641,3537],"class_list":["post-16593","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cse","category-ise","tag-azure-iot-operations","tag-clean-architecture","tag-data-processing","tag-edge-ai","tag-industrial-metaverse"],"acf":[],"blog_post_summary":"<p>In this post we explore the evolution from Azure IoT Operations Data Processor Pipelines to Dataflows, why we adopted a hybrid strategy with custom Rust pods, and the architectural lessons we learned building event detection systems at the 
edge.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/16593","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/114452"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=16593"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/16593\/revisions"}],"predecessor-version":[{"id":16595,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/16593\/revisions\/16595"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/16594"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=16593"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=16593"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=16593"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}