Hitchhiker’s Guide to Workflow Engines

Don’t Panic! This guide is designed to help you understand the basics of workflow engines, with no prior knowledge needed!

We’ll introduce workflow engines and their qualities and features, and provide some insights into when and why you might want to use one. Moreover, we’ll highlight key considerations to keep in mind when selecting an engine that suits your specific needs. This will equip you with a good starting point for making an informed decision.

Workflow Engines: Orchestrating Complex Processes with Precision and Scalability

One of our large customers in the automotive industry approached us with the perfect scenario for a workflow engine. They wanted to read data from their PLCs (Programmable Logic Controllers) and programmatically change values based on third-party systems, values read from the PLC, or provided parameters. They also wanted the process to be developed by their production engineers.

This scenario demonstrates the need for an effective way to manage and automate complex processes, which is where workflow engines come into play. They serve as a powerful foundation for building software applications that can tackle complex challenges and adapt to ever-changing requirements. Workflow engines empower developers to create solutions that dynamically scale to meet demand, reuse components for efficiency, provide deep insights into system health, and recover gracefully from errors. They offer a robust framework to define, execute, and monitor workflows across various systems and applications.

Now that we have the perfect scenario for using a workflow engine, it’s up to us to decide which one best suits our needs and requirements, allowing us to fully harness the power of customizable and scalable processes.

Key Considerations for Selecting a Workflow Engine: Qualities and Features

Workflow engines possess a range of characteristics that enable them to execute processes efficiently. These features are discussed in greater detail in the sections that follow. Before diving in, it is crucial to establish a common understanding of certain terms. In this guide, a unit of work executed by a workflow engine is referred to as an “activity” (also called a task, step, or job). An activity can be in one of three states: successful, failed, or not started. A workflow definition describes how activities are chained together to form a sequence, and a workflow execution is the current state of a system running a specified definition.
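To make these terms concrete, here is a minimal Python sketch of how they could be modeled. The names ActivityState, WorkflowDefinition, WorkflowExecution, and run_workflow are assumptions made for illustration and do not come from any particular engine.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class ActivityState(Enum):
    NOT_STARTED = "not started"
    SUCCESSFUL = "successful"
    FAILED = "failed"


@dataclass
class WorkflowDefinition:
    """An ordered chain of named activities (hypothetical model)."""
    id: str
    activities: List[str]


@dataclass
class WorkflowExecution:
    """The current state of one run of a definition."""
    definition: WorkflowDefinition
    states: Dict[str, ActivityState] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every activity starts out as "not started".
        for name in self.definition.activities:
            self.states.setdefault(name, ActivityState.NOT_STARTED)


def run_workflow(
    execution: WorkflowExecution,
    handlers: Dict[str, Callable[[], None]],
) -> WorkflowExecution:
    """Run activities in order and stop at the first failure."""
    for name in execution.definition.activities:
        try:
            handlers[name]()
            execution.states[name] = ActivityState.SUCCESSFUL
        except Exception:
            execution.states[name] = ActivityState.FAILED
            break  # later activities stay NOT_STARTED
    return execution
```

A real engine would also persist the execution and distribute the activities, but the three states and the split between definition and execution are the same idea.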

To help you understand the different uses of an engine, we’ve put together a list of important qualities and features that engines should have, which we’ll discuss in the following paragraphs.

The features outlined below provide a general idea of what you can expect from a workflow engine. However, it’s important to drill deeper into the specifics, as each engine may have various implementations that are either desirable or undesirable depending on your needs.

Scalability

Scalability is a crucial feature for any application, and workflow engines enable your code to handle an increasing number of executions on a large scale. For instance, a workflow engine can efficiently manage and coordinate numerous PLCs across a factory floor, even as the number of devices and their complexity grows.

A scalable workflow engine should offer these features:

Multi-cluster replication: Distribute and synchronize workflow data across multiple clusters for high availability and fault tolerance.
Archival of old data: Store and manage historical workflow data for compliance, auditing, and analysis purposes.
Scalable persistence stores: Use scalable storage solutions to accommodate growing workflow data and activity loads.
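As a rough illustration of handling many executions at once, the sketch below runs one workflow execution per PLC concurrently while capping how many run at the same time. The function names and the simulated I/O delay are assumptions. A real engine would distribute this work across processes or machines rather than a single event loop.

```python
import asyncio
from typing import Iterable, List


async def run_execution(plc_id: int) -> str:
    """Stand-in for one workflow execution against a single PLC."""
    await asyncio.sleep(0.01)  # simulated network I/O (assumption)
    return f"plc-{plc_id}: done"


async def run_all(plc_ids: Iterable[int], max_concurrent: int = 10) -> List[str]:
    """Run one execution per PLC, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(plc_id: int) -> str:
        async with semaphore:
            return await run_execution(plc_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(plc_id) for plc_id in plc_ids))
```

Calling asyncio.run(run_all(range(25), max_concurrent=5)) would process 25 simulated devices five at a time.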

Multiple workflow executions shown in parallel

Reusability

Workflow engines should enable developers to create reusable activities and components that can be easily integrated into new workflows. This not only saves time but also ensures consistent behavior across different processes. For instance, a reusable payment processing activity can be used in various workflows, such as online purchases, subscription renewals, or refunds.

A workflow engine that enables reusability should offer these features:

Sub-workflow execution: Allow workflows to call and execute other workflows.
Activity reuse: Enable activities or components to be reused across multiple workflows.
Shared libraries or templates: Provide a repository of predefined workflow patterns, activities, or templates to simplify and standardize workflow creation.
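The sketch below illustrates activity reuse and sub-workflow execution using the payment example from above. It assumes a deliberately simple model in which a workflow is just a chain of functions, so a workflow can be nested inside another one. All function names are hypothetical.

```python
from typing import Callable

Step = Callable[[dict], dict]


def workflow(*steps: Step) -> Step:
    """Chain steps into a workflow. The result is itself a step,
    so it can be embedded in other workflows as a sub-workflow."""
    def run(data: dict) -> dict:
        for step in steps:
            data = step(data)
        return data
    return run


def validate_order(data: dict) -> dict:
    if data["amount"] <= 0:
        raise ValueError("amount must be positive")
    return data


def process_payment(data: dict) -> dict:
    return {**data, "paid": True}


def send_receipt(data: dict) -> dict:
    return {**data, "receipt_sent": True}


# Reusable sub-workflow shared by several parent workflows.
payment = workflow(process_payment, send_receipt)

online_purchase = workflow(validate_order, payment)
subscription_renewal = workflow(payment)
```

Both parent workflows share the same payment sub-workflow, so a fix to process_payment applies to every workflow that uses it.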

Use of a sub-workflow shown within another workflow

Observability

Monitoring and maintaining the health of a system is essential to keep processes running smoothly. Workflow engines should offer the capability to query and observe the state of every activity, workflow, group of workflows, or the entire system. For example, a system administrator can easily track the progress of a long-running data processing workflow, identifying bottlenecks or errors in real time.

A workflow engine with robust observability should provide these features:

Metrics: Collect and analyze performance indicators for workflows, activities, and resources to optimize and monitor the system.
Tracing: Leverage distributed tracing to visualize the call graph of a workflow, including its activities and any child workflows. This allows you to debug and diagnose issues more effectively.
Logging: Record and store logs for workflow execution, errors, and events to aid in troubleshooting and auditing.
Alerting: Send alerts or trigger actions based on predefined conditions, such as workflow failures, high resource usage, or long-running activities.
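A minimal sketch of metrics, logging, and alerting for activities, using only the Python standard library, might look like this. The decorator name observed, the in-memory metrics store, and the failure-rate threshold are assumptions. Distributed tracing is left out because it normally requires a dedicated tracing library.

```python
import logging
import time
from collections import defaultdict
from functools import wraps

logger = logging.getLogger("workflow")

# In-memory metrics store (assumption: a real engine would export these).
metrics = defaultdict(lambda: {"runs": 0, "failures": 0, "total_seconds": 0.0})


def observed(activity):
    """Record run count, failures, and duration for an activity, and log the outcome."""
    @wraps(activity)
    def wrapper(*args, **kwargs):
        name = activity.__name__
        start = time.perf_counter()
        metrics[name]["runs"] += 1
        try:
            result = activity(*args, **kwargs)
            logger.info("activity %s succeeded", name)
            return result
        except Exception:
            metrics[name]["failures"] += 1
            logger.exception("activity %s failed", name)
            raise
        finally:
            metrics[name]["total_seconds"] += time.perf_counter() - start
    return wrapper


def should_alert(name: str, max_failure_rate: float = 0.5) -> bool:
    """Alert when more than max_failure_rate of the runs have failed."""
    entry = metrics[name]
    return entry["runs"] > 0 and entry["failures"] / entry["runs"] > max_failure_rate


@observed
def read_plc_value() -> int:
    return 42  # hypothetical successful activity


@observed
def write_plc_value() -> None:
    raise IOError("PLC offline")  # hypothetical failing activity
```

Wrapping activities this way keeps the observability concerns out of the activity code itself.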

Workflow that changes behavior based on failing activity

Resilience

Resilience is a significant advantage of workflow engines, as they provide the ability to retry activities and workflows with different policies. This feature helps ensure that transient errors, such as temporary network outages, do not cause the entire process to fail.

A resilient workflow engine should provide the following features:

Retry policies for activities and workflows: Automatically retry failed activities or workflows based on user-defined policies.
Timeouts for activities and workflows: Set limits on how long activities or workflows can run or wait for action, preventing stalled processes.
Fallback strategies: Define alternative actions or notifications when retries fail or certain conditions are met.
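The three features above can be combined in a small sketch: an activity is retried with exponential backoff, each attempt has a timeout, and a fallback runs once the retries are exhausted. RetryPolicy and run_with_policy are hypothetical names, and the default values are arbitrary.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    initial_delay: float = 0.01  # seconds before the first retry
    backoff: float = 2.0         # delay multiplier after each failed attempt
    timeout: float = 1.0         # seconds allowed per attempt


async def run_with_policy(
    activity: Callable[[], Awaitable],
    policy: RetryPolicy,
    fallback: Optional[Callable[[], Awaitable]] = None,
):
    """Retry the activity according to the policy, then use the fallback if given."""
    delay = policy.initial_delay
    last_error: Optional[Exception] = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            # wait_for cancels the attempt if it exceeds the timeout.
            return await asyncio.wait_for(activity(), timeout=policy.timeout)
        except Exception as error:
            last_error = error
            if attempt < policy.max_attempts:
                await asyncio.sleep(delay)
                delay *= policy.backoff
    if fallback is not None:
        return await fallback()
    raise RuntimeError(
        f"activity failed after {policy.max_attempts} attempts"
    ) from last_error
```

A transient error, such as a brief network outage, is absorbed by the retries, while a permanent failure ends in the fallback or in a clear error.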

Workflow that shows transitioning from a failing into a successful activity

Durability

Durability ensures that workflows can handle highly complex processes, run indefinitely, or wait for action for hours or even days without losing data or state. For example, an approval process in a large organization might require multiple levels of sign-offs, and the associated workflow must be able to wait for the necessary approvals, even if it takes several days. With durable workflow engines, developers can create long-running processes that maintain their state and data integrity across extended periods, ensuring that no critical information is lost during the execution.

A durable workflow engine should offer these features:

Timers: Schedule workflows to begin execution at predetermined times or intervals.
Sleep functionality: Allow workflows to pause or wait for action without consuming excessive resources, supporting long-running processes.
Visibility into the current and past state: Provide insights into the current status and historical execution of workflows, enabling better tracking, management, and auditing.
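One common way to achieve durability is to checkpoint the workflow state after every activity, so that a restarted process resumes where it left off. The sketch below does this with a JSON file. The file layout and the names run_durable and demo are assumptions, and production engines typically use a database and an event history instead.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


def run_durable(steps: List[Step], state_file: Path) -> dict:
    """Run steps in order, skipping those already completed in an earlier run."""
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"completed": [], "data": {}}
    for name, step in steps:
        if name in state["completed"]:
            continue  # finished before the restart
        state["data"] = step(state["data"])
        state["completed"].append(name)
        state_file.write_text(json.dumps(state))  # checkpoint after every step
    return state


def demo() -> dict:
    """Simulate a crash after the first step, then resume in a second run."""
    executed = []

    def request_approval(data: dict) -> dict:
        executed.append("request")
        return {**data, "requested": True}

    def crash(data: dict) -> dict:
        raise RuntimeError("process restarted")

    def record_approval(data: dict) -> dict:
        executed.append("record")
        return {**data, "approved": True}

    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "execution.json"
        try:
            run_durable([("request", request_approval), ("record", crash)], state_file)
        except RuntimeError:
            pass  # the first run dies after "request" was checkpointed
        state = run_durable(
            [("request", request_approval), ("record", record_approval)], state_file
        )
    return {"state": state, "executed": executed}
```

In the demo, request_approval runs only once even though the workflow is started twice, because its result was saved before the simulated crash.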

A long running workflow represented by 3 dots inside of an activity bubble

Comparing Features of Workflow Engines: Code-based vs. DSL-based

Workflow engines have traditionally used domain-specific languages (DSLs), usually expressed in data formats such as JSON or YAML, to represent workflows. This approach offers several advantages, including ease of serialization and deserialization, which enables the creation, transmission, and validation of workflows against well-defined schemas (for example, JSON Schema).

However, as workflows grow in size and complexity, developers may require more advanced features. Recently, workflow engines like DTFx and Temporal have emerged, allowing developers to use code for designing workflows.

In this chapter, we compare the features of both code-based and DSL-based workflow engines to provide a balanced perspective.

Type of workflow definitions ( Workflow as Code / Workflow as DSL )

There are two primary categories of workflow definitions, each catering to a distinct audience. One focuses on developers, while the other targets business users. Both share common features, but also possess unique elements specifically designed for their respective audiences.

Workflow as DSL

Domain-Specific Language, or DSL, refers to a usually declarative language specifically built to address challenges within a particular domain or industry.

DSLs enable domain experts to define and configure workflows without the need for extensive programming skills or an understanding of the underlying workflow engine. By streamlining the process of creating and modifying workflows, DSLs not only enhance their readability but also increase their maintainability.

The following very simple example describes a workflow with just one activity that writes to the console.

Id: HelloWorld
Activities:
- Id: Hello
  ActivityType: Console.Write.HelloWorld, MyApp
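To show how an engine might interpret such a definition, here is a small Python sketch. It makes several assumptions: the definition is written as JSON rather than YAML so that only the standard library is needed, the assembly suffix from the example is dropped, and activity types are resolved through a hypothetical in-process registry.

```python
import json

# Same workflow as above, expressed as JSON (assumption for this sketch).
DEFINITION = """
{
  "Id": "HelloWorld",
  "Activities": [
    {"Id": "Hello", "ActivityType": "Console.Write.HelloWorld"}
  ]
}
"""


def write_hello_world() -> str:
    message = "Hello World"
    print(message)
    return message


# Hypothetical registry mapping activity type names to implementations.
ACTIVITY_REGISTRY = {"Console.Write.HelloWorld": write_hello_world}


def run_definition(text: str) -> dict:
    """Parse a definition and run its activities in order."""
    definition = json.loads(text)
    results = {}
    for activity in definition["Activities"]:
        handler = ACTIVITY_REGISTRY[activity["ActivityType"]]
        results[activity["Id"]] = handler()
    return results
```

Because the definition is plain data, it can be changed or replaced without touching the code that runs it.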

Workflow as Code

Workflow-as-code definitions typically consist of three essential components. First, there are activities, which serve as the building blocks for executing tasks within the workflow. Second, there is a workflow definition, written in code, that organizes these activities. Finally, a wrapper around these activities enables the chosen framework to manage input and output. This allows for seamless internal transfer between executing instances and promotes scalability.

In this pseudocode example, the workflow is defined directly in code by the function void workflow(Context context), and the function string activity() is run inside it through context.RunActivity.

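// Activity: a unit of work that the engine schedules and tracks.
async function string activity() {
  return "Hello World";
}

// Workflow: orchestrates activities through the engine-provided context.
async function void workflow(Context context) {
  string text = await context.RunActivity(activity);
}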
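For readers who want something executable, the same idea can be sketched in Python with asyncio. The Context class here is a hypothetical stand-in for the wrapper a real framework provides. It only records which activities ran, whereas a real engine would also persist inputs and outputs.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class Context:
    """Hypothetical wrapper through which the engine runs activities."""

    def __init__(self) -> None:
        self.history: List[str] = []

    async def run_activity(self, activity: Callable[[], Awaitable[T]]) -> T:
        result = await activity()
        self.history.append(activity.__name__)  # record what has run
        return result


async def activity() -> str:
    return "Hello World"


async def workflow(context: Context) -> str:
    # The workflow only talks to activities through the context.
    text = await context.run_activity(activity)
    return text
```

Running asyncio.run(workflow(Context())) executes the workflow once in the local process.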

Comparing the two types of workflow engines

Advantages of DSL-based Workflow Engines

Visual Editing: These engines often provide a user interface where the workflow can be visually edited and viewed.
Integrations: Many DSL-based engines support pre-built integrations with various external systems and services, such as databases, messaging platforms, or third-party APIs. These integrations can be beneficial, as they simplify the process of connecting and interacting with these external components, saving time and effort compared to developing custom integrations from scratch.
Dynamic Configuration: As DSL-based workflows are essentially configuration files, they can be easily stored in blob storage and loaded into engines without deployment, offering greater flexibility and ease of use when updating or adding new workflows.
Accessible: DSL-based engines are more accessible to non-developers, thanks to the use of simplified languages geared towards non-technical users.
Enhancement Capabilities: Because DSL-based workflows are data, they can be augmented with additional configuration retrieved from other systems, making it easier for non-technical users to adapt and extend workflow functionality.

Advantages of Code-based Workflow Engines

Type Safety: Activities often require access to prior data or a global state in order to process input data. In DSL-based engines this data is typically untyped, which can result in overlooked edge cases during workflow development. Code-based engines written in a statically typed language can rely on the type system to catch such mismatches before the workflow runs.
IDE Support: Code-based workflows benefit from IDE features such as autocompletion of activities and their input and output types, an area where the DSL method falls short.
Debugger: Code-based workflows can be stepped through with a standard debugger to quickly identify bugs, whereas DSL-based engines generally lack debugger support and rely on error reporting and logging instead.
Testability: Code-based engines allow developers to write unit tests for individual workflow components, as well as integration tests for the entire workflow. This ensures that workflows function as expected, improves code quality, and reduces the likelihood of introducing errors or inconsistencies during development.
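As an illustration of the testability point, the sketch below tests a single activity and a small workflow with Python's built-in unittest module. The PLC-flavored functions clamp_setpoint and adjust_workflow are hypothetical. The workflow receives its read and write functions as parameters, so the tests can pass in fakes instead of talking to real hardware.

```python
import unittest
from typing import Callable


def clamp_setpoint(value: float, low: float, high: float) -> float:
    """Activity: keep a setpoint within the allowed range."""
    return max(low, min(high, value))


def adjust_workflow(
    read_value: Callable[[], float],
    write_value: Callable[[float], None],
    low: float,
    high: float,
) -> float:
    """Workflow: read a value, clamp it, and write it back."""
    target = clamp_setpoint(read_value(), low, high)
    write_value(target)
    return target


class AdjustWorkflowTests(unittest.TestCase):
    def test_activity_clamps_high_values(self):
        # Unit test for a single activity.
        self.assertEqual(clamp_setpoint(120.0, 0.0, 100.0), 100.0)

    def test_workflow_writes_clamped_value(self):
        # Test of the whole workflow with fake I/O.
        written = []
        result = adjust_workflow(lambda: -5.0, written.append, 0.0, 100.0)
        self.assertEqual(result, 0.0)
        self.assertEqual(written, [0.0])
```

Saved as a module, these tests can be run with python -m unittest.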

Taking into account the advantages of both code-based and DSL-based workflow engines allows organizations to make well-rounded decisions when selecting an engine that aligns with their unique needs.

Criteria for Evaluating Workflow Engines

To effectively choose a workflow engine for your specific use case, it is essential to evaluate the engines based on a set of criteria. The primary decision you must make is determining the target audience, either developers or non-technical users, as this will significantly influence the suitability of the workflow engine for your project requirements. This chapter outlines the key evaluation criteria to consider when selecting a workflow engine for each audience.

Tip: Before diving into engine comparisons, clearly define your target audience, be it developers or non-technical users. This step will help streamline your selection process by focusing on the features and capabilities most relevant to your audience.

As previously stated, the two primary types of workflow engines cater to distinct audiences.

Developers (Workflow as Code): Workflow engines geared towards developers emphasize code-based configuration, extensibility, and integration with existing developer tools and platforms.
End users (Workflow as DSL): Workflow engines designed for non-technical users prioritize user-friendly interfaces, visual design capabilities, and a gentle learning curve.

When evaluating workflow engines, consider the following criteria:

Performance: Assess the engine’s ability to meet the specific performance requirements of your use case.
Integrations: Evaluate the built-in integrations offered by various engines, particularly those targeting non-technical users, to simplify your use case implementation.
Popularity: Consider the community support and usage of an engine, as evidenced by indicators such as GitHub stars and download (pull) counts on the platform where it is hosted.
Features: Determine which features are essential for your use case and evaluate engines based on their implementation of those features.
Scalability: Consider the scalability requirements of your use case and whether the engine is designed to accommodate such demands.
Licensing: Consider the licensing requirements of your use case and check whether the workflow engine’s license is compatible with your project.
Programming Language: A key criterion is choosing an engine whose programming language makes it easy for you and your team to build custom workflows and to use the right libraries.
Integration Type: Evaluate the desired type of the workflow engine, such as a library, framework, API, or fully-fledged application, based on your project’s specific needs and how it should integrate with your existing systems or development processes.

By using these evaluation criteria, you can better understand the strengths and weaknesses of different workflow engines, enabling you to choose the most suitable one for your specific requirements and target audience.

Reference table for different engines

This is a short list of popular engines with active community support. For a more exhaustive curated list, have a look at awesome-workflow-engines.

Name	Programming Language	Type	Links
Apache Airflow	Python	Code	Website, GitHub
n8n	JavaScript/TypeScript	DSL	Website, GitHub
DTFx	C#	Code	GitHub
Temporal	Go, Java, PHP, Python, TypeScript	Code	Website, GitHub
Node-RED	JavaScript	DSL	Website, GitHub

Conclusion

Throughout this guide, we have explored the essential elements of workflow engines, their qualities and features, and the critical factors to consider when selecting the right engine for your specific use case. By understanding the unique requirements of your project, whether it is geared towards developers or non-technical users, and assessing the scalability, reusability, observability, resilience, and durability of each engine, you can make an informed decision that best meets your needs. With this guide in mind, we were able to choose a workflow engine that fit our customer’s needs, and we are confident that you will be able to do the same.

Remember that the ideal workflow engine should not only address your current requirements but also be capable of adapting to the ever-evolving challenges of your business, ensuring that you can continue to create efficient and reliable workflows that drive success.
