Write about Azure Durable Functions – a short overview

[Reading Time: 12 minutes]

First things first

Azure Durable Functions is a helpful technology for getting workflows done. You can think of Durable Functions (DF) as a lightweight workflow engine for developers. BizTalk, Nintex and all the other “tools” have their place, but with DF you write your workflows in code, in different styles (languages, patterns, …), which gives you finer control over the flow than a configured engine usually offers. With that said, I would like to introduce Azure Durable Functions with a short overview and a hands-on example.

What are we going to get?

First, I will cover the scenario; please have a look at the following diagram. The code for this scenario is in my repository on GitHub. A user wants to register on a members-only platform. To verify that the user is not a bot, the user has to confirm the registration through a second channel, email or mobile phone. Using the image below, I will explain the steps of that workflow.

Figure 1 verification workflow

  1. The user sends a request for registration with their email address and mobile phone number to Azure.
  2. The user then receives an SMS and/or an email with a PIN.
  3. When using email, the user can click on a URL that sends back the verification code;
    • alternatively, the user can enter the PIN (from SMS or email) into a UI (not developed so far) that posts it back to Azure.
  4. If this happens within 90 seconds and the submitted code is valid (no typos etc.), the user's registration is verified.
    • The process can continue with whatever should happen next (this is out of scope for this article).
  5. If the user submits a wrong PIN, a counter tracks the failed attempts and cancels the workflow after the third one.
  6. If a timeout occurs, the whole registration process becomes obsolete.
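
The rules in steps 4 to 6 can be sketched as a small state machine. This is only an illustration in plain Python, not the code from the repository: the class name Verification, its fields and the status strings are assumptions made for this sketch, and the caller passes in the current time so the logic stays easy to test.

```python
from dataclasses import dataclass

TIMEOUT_SECONDS = 90   # time window from step 4
MAX_ATTEMPTS = 3       # wrong-PIN limit from step 5


@dataclass
class Verification:
    """Tracks one registration attempt (hypothetical helper, not from the repo)."""
    expected_pin: str
    started_at: float          # seconds, supplied by the caller
    failed_attempts: int = 0
    status: str = "pending"    # pending | verified | cancelled | timed_out

    def submit(self, pin: str, now: float) -> str:
        """Apply one PIN submission and return the resulting status."""
        if self.status != "pending":
            return self.status                  # workflow already finished
        if now - self.started_at > TIMEOUT_SECONDS:
            self.status = "timed_out"           # step 6: registration is obsolete
        elif pin == self.expected_pin:
            self.status = "verified"            # step 4: valid PIN in time
        else:
            self.failed_attempts += 1           # step 5: count the wrong PIN
            if self.failed_attempts >= MAX_ATTEMPTS:
                self.status = "cancelled"
        return self.status


attempt = Verification(expected_pin="4711", started_at=0.0)
print(attempt.submit("0000", now=10.0))   # wrong PIN, still pending
print(attempt.submit("4711", now=20.0))   # correct PIN within 90 seconds
```

In the durable version, the orchestrator plays the role of this class: it waits for the external event, compares the PIN and races the wait against a timer.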

Feel free to dive into the repo and play around with the code.

What are Durable Functions?

DF is built on top of Azure Functions as an extension, the Durable Functions extension (DFx). Its core is based on the Durable Task Framework (GitHub repo here). This gives code the ability to run asynchronously (long running) and, put simply, to keep state. In that way, the Durable Task Framework makes Azure Functions stateful, and developers benefit from serverless state management. Before continuing with Azure DF, I should briefly show what Azure Functions are, in case you don’t know. Otherwise skip ahead to “Why should there be a durable solution … ?”.

What are Azure Functions – in short?

Azure Functions is “Function as a Service”, which means that a developer can concentrate on a “pure function”. But what is a “pure function”? From a mathematical perspective, it is a function that returns a specific output for a given input. Every time it is called with the same input, it produces the same output. So you could replace the function call for a specific input with its return value. For example:

y = f(x) = m*x + n, with m = 0, n = 1 and x = 2 -> y = 0*2 + 1 = 1

This means you could replace f(x) with 1 for the input 2: f(2) = 1. Basic math, isn’t it? (See Wikipedia.)

A function in Azure should be designed with this idea in mind. You can then split an Azure Function into three parts. The first is the input or trigger of the function. There are many input bindings (a binding is, in general, a way to decorate a function with information on how and where to connect it to an underlying service). For correctness, I should mention that a function can only be activated by a trigger. A trigger is also an input, but it “pushes” input values to the function, whereas an input binding mainly serves as a “data source” (queues, tables, …) and cannot trigger a function when data changes. Second, a function can have an output binding. It is the same as an input binding, but in the opposite direction (“sending to …”). For example, a function can evaluate some input data and pass its return value into a message queue. Finally, the function body can be seen as the core, the logical or computational brain. It can be written in different languages such as C#, Java, JavaScript and so on.

When writing about “serverless”, I should also say what I think it means. Serverless is a cloud-computing execution model in which a provider hosts all the servers, network components and other resources needed to provide computing power, so that the consumer does not have to care about resource allocation. The provider hosts, maintains and manages the machine resources; therefore, scaling is also managed by the provider. The consumer does not have to deal with anything below their service. Furthermore, serverless comes with pricing models that are based on actual resource usage.
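
To make the three-part split described above (trigger, function body, output binding) a little more concrete, here is a minimal sketch in plain Python. It does not use the Azure Functions SDK; the function names evaluate and run, the message shape and the list that stands in for an output queue are all assumptions for illustration.

```python
import json


def evaluate(order: dict) -> dict:
    """Function body: pure, so the same input always gives the same output."""
    total = sum(item["price"] * item["quantity"] for item in order["items"])
    return {"orderId": order["id"], "total": total}


def run(trigger_message: str, output_queue: list) -> None:
    """Glue that a Functions host would normally provide through bindings."""
    order = json.loads(trigger_message)        # trigger: pushes the input in
    result = evaluate(order)                   # body: the pure computation
    output_queue.append(json.dumps(result))    # output binding: sends the result on


queue = []  # stand-in for a message queue
run('{"id": "A1", "items": [{"price": 2, "quantity": 3}]}', queue)
print(queue)  # ['{"orderId": "A1", "total": 6}']
```

Keeping the body pure and pushing all I/O to the edges is what makes the function easy to test and to reason about.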

Why should there be a durable solution … ?

Figure 2 Scenarios function communication

… if you can do everything with Azure Functions (where you can use messaging with queue input/output and storage tables for holding state)? Consider the following scenarios. Say we want to chain functions. That’s easy: send a message from the first function to the second, and from the second to the third. No magic! Now consider the next scenario: fanning out from one message to a bunch of functions working in parallel, and waiting for all of them to complete before a final function evaluates their results. This is not so easy with plain Azure Functions: when has the last function completed? And what if an exception occurs between function calls? How do you handle it: ignore, retry or roll back? Doing all this by hand can be hard. I know, I did it for a long time.
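
To show what “by hand” means for the fan-out/fan-in case, here is a minimal sketch of the bookkeeping you would have to write yourself. It is plain Python with an in-memory dictionary standing in for a storage table; FanInTracker and final_function are hypothetical names, and concurrency (two workers reporting at the same moment) is deliberately not handled here, although a real implementation would have to deal with it.

```python
class FanInTracker:
    """Hand-rolled completion tracking (in-memory stand-in for a storage table)."""

    def __init__(self, expected: int):
        self.expected = expected
        self.results = {}

    def report(self, worker_id: str, result: int) -> bool:
        """Store one worker's result; return True once all workers have reported."""
        self.results[worker_id] = result   # keyed by worker, so redelivery is harmless
        return len(self.results) == self.expected


def final_function(results: dict) -> int:
    """Final step that evaluates the results of all parallel workers."""
    return sum(results.values())


tracker = FanInTracker(expected=3)
# "w1" reports twice to mimic a message that was delivered more than once.
for worker_id, value in [("w1", 10), ("w2", 20), ("w1", 10), ("w3", 12)]:
    if tracker.report(worker_id, value):
        print("all done:", final_function(tracker.results))  # all done: 42
```

Even this toy version needs a shared store, a completion check and protection against duplicate messages, and it still ignores failures and timeouts.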

Durable functions – benefits and concepts

In a nutshell, a developer can create workflows of any kind in code, and this is easy to implement. Exceptions can be handled in one place. Tracking progress or cancelling an orchestration is easy. And finally, state handling comes out of the box. A durable function can act in different ways, such as waiting for external events, calling activities or starting sub-orchestrations. The activities a durable function calls are plain Azure Functions, which is why a developer has full freedom in what they do. External events are, for example, user input or events from “unknown” services. Sub-orchestrations make it possible to partition complex workflows into smaller, more maintainable parts. A durable function therefore consists of three main parts:

  • Orchestrator
    • That calls activities
    • That controls the flow
    • That waits for completion
    • That keeps track of all state
    • DEV: decorated with OrchestrationTrigger
  • Starter
    • That triggers an orchestrator
    • That responds to the caller with the orchestration's management URIs
    • DEV: decorated with OrchestrationClient (renamed to DurableClient in version 2.x of the extension)
  • Activity
    • That is a step within a workflow
    • That leaves all dev-freedom
    • DEV: decorated with ActivityTrigger
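
The interplay of the three parts can be sketched with a Python generator. This is a toy model and not the Durable Functions API: all names (send_pin, store_result, ACTIVITIES, starter) are made up for this sketch, and unlike a real starter, which returns immediately, this one drives the orchestrator to completion in process.

```python
import uuid


def send_pin(phone: str) -> str:
    """Activity: one step of the workflow (a stub that pretends to send a PIN)."""
    return f"PIN sent to {phone}"


def store_result(message: str) -> str:
    """Activity: another step, also just a plain function."""
    return message.upper()


ACTIVITIES = {"send_pin": send_pin, "store_result": store_result}


def orchestrator(phone: str):
    """Orchestrator: controls the flow by yielding (activity name, input) pairs."""
    sent = yield ("send_pin", phone)
    stored = yield ("store_result", sent)
    return stored


def starter(phone: str) -> dict:
    """Starter: creates an instance, drives it and reports its id and output."""
    instance_id = uuid.uuid4().hex
    flow = orchestrator(phone)
    try:
        name, arg = next(flow)                 # run to the first activity call
        while True:
            result = ACTIVITIES[name](arg)     # execute the requested activity
            name, arg = flow.send(result)      # resume the orchestrator with the result
    except StopIteration as done:
        return {"id": instance_id, "output": done.value}


print(starter("+49123456")["output"])  # PIN SENT TO +49123456
```

The orchestrator never calls an activity itself; it only states which activity it wants next and waits for the result, which is what lets a runtime persist and resume it.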

There are some rules a developer should keep in mind when creating an orchestrator.

  • First, the code must be deterministic. For the same input, there must be the same result, every time! Orchestrator code is replayed, so it must not depend on values that change between runs, such as the current date/time.
  • Second, the code should be non-blocking. I/O access or something like Thread.Sleep() belongs in activities.
  • Third, async operations should never be initiated directly. That means no Task.Run or HttpClient.SendAsync…, because this can cause side effects and break the orchestration.
  • Last, avoid infinite loops. They can lead to an out-of-memory exception, because the orchestrator internally builds a history that would exceed its limits.
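
Why the first rule matters becomes clearer with a toy replay. The sketch below is plain Python and only mimics the idea: an orchestrator is re-run from the start and fed the results recorded so far. The replay function, the two orchestrators and the counter used as a fake clock are assumptions for illustration, not part of any SDK.

```python
import itertools

_fake_clock = itertools.count()   # stand-in for reading the wall clock


def replay(orchestrator, history: list) -> list:
    """Re-run an orchestrator from the start, feeding recorded results back in.

    Returns the activity calls the orchestrator asked for during this run.
    """
    calls = []
    flow = orchestrator()
    try:
        call = next(flow)
        for recorded_result in history:
            calls.append(call)
            call = flow.send(recorded_result)
        calls.append(call)            # first call that is not yet in the history
    except StopIteration:
        pass
    return calls


def deterministic():
    pin = yield "create_pin"          # the value comes from an activity result
    yield f"send_sms:{pin}"


def non_deterministic():
    now = next(_fake_clock)           # a new value on every replay
    yield f"send_sms_at:{now}"


print(replay(deterministic, ["4711"]))   # same calls on every replay
print(replay(non_deterministic, []))     # differs from the next replay
print(replay(non_deterministic, []))
```

The deterministic version asks for exactly the same activity calls on every replay, so the recorded history still matches. The other one asks for something different each time, and a runtime could no longer line it up with what was stored.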

Keeping state

Figure 3 Event Sourcing for durable functions

Azure Durable Functions, as said before, maintains the state. In the example above (verifying a member registration), there is a lot of state to handle. The system waits for manual user input, so the waiting time can vary, and this needs to be tracked. For that reason, DF uses Azure Storage: it uses queues to communicate with activities and Table storage to track state. Altogether this is called Event Sourcing (an abstract view of what is really going on in the back end). For further reading on that topic I recommend the architecture pages of Microsoft Docs, where you can find a lot of interesting references on cloud architecture.

In principle, the concept works as shown in the picture above. The orchestrator sends an event, which contains additional data as payload. The event is stored in Azure Table Storage to hold state and history. After it has been persisted successfully, it is sent through an Azure Storage queue. A triggered function (the activity) retrieves the payload and acts on it. When it finishes, it in turn sends an event, possibly with a payload, which the orchestrator listens for and which is also persisted in Table Storage. This continues until the orchestrator completes. This description is certainly not the full story behind the scenes, but you can have a look at Microsoft Docs as suggested before, and there are many other helpful resources on the internet.
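
The persist-then-send cycle described above can be sketched in a few lines of plain Python. Two lists stand in for the table and the queue; the class names, the event kinds used as labels and the method names are assumptions for this illustration and say nothing about the real storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str            # "TaskScheduled" or "TaskCompleted" (labels for this sketch)
    name: str
    payload: str = ""


@dataclass
class Instance:
    history: list = field(default_factory=list)   # stand-in for Table storage
    queue: list = field(default_factory=list)     # stand-in for the storage queue

    def schedule(self, name: str, payload: str) -> None:
        """Orchestrator side: persist the event first, then send it."""
        event = Event("TaskScheduled", name, payload)
        self.history.append(event)
        self.queue.append(event)

    def complete_next(self, activity) -> None:
        """Activity side: take the next message, act on it, record the result."""
        event = self.queue.pop(0)
        result = activity(event.payload)
        self.history.append(Event("TaskCompleted", event.name, result))

    def state(self) -> dict:
        """The current state is derived purely from the stored history."""
        done = {e.name: e.payload for e in self.history if e.kind == "TaskCompleted"}
        pending = [e.name for e in self.history
                   if e.kind == "TaskScheduled" and e.name not in done]
        return {"completed": done, "pending": pending}


instance = Instance()
instance.schedule("SendSms", "+49123456")
print(instance.state())   # SendSms is pending
instance.complete_next(lambda phone: f"sent to {phone}")
print(instance.state())   # SendSms is completed
```

Nothing is ever updated in place: the state at any moment is whatever falls out of reading the history from the beginning, which is the core of Event Sourcing.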

What about maintaining Durable Functions?

Let me show you some aspects of maintaining Durable Functions. It is important to understand how to track status, interact with maintenance tools, do logging and so on. Durable Functions can be maintained through a REST API. The starter function calls the orchestrator or, more precisely, starts a new instance of the orchestrator function. As a response from the starter, you get a 202 (Accepted) with a Location header that contains the URI for requesting the current state of the orchestrator instance:

https://{host}/runtime/webhooks/durabletask/instances/34ce9a28a6834d8492ce6a295f1a80e2?taskHub=DurableFunctionsHub&connection=Storage&code=XXX

A browser, or another tool that can evaluate the Location header, can use it for redirecting. As long as the response is 202, the orchestrator is running; getting back 200 means it has completed. Additionally, the response contains a JSON body with the following content:

{
  "id": "34ce9a28a6834d8492ce6a295f1a80e2",
  "statusQueryGetUri": "https://{host}/runtime/webhooks/durabletask/instances/34ce9a28a6834d8492ce6a295f1a80e2?taskHub=DurableFunctionsHub&connection=Storage&code=XXX",
  "sendEventPostUri": "https://{host}/runtime/webhooks/durabletask/instances/34ce9a28a6834d8492ce6a295f1a80e2/raiseEvent/{eventName}?taskHub=DurableFunctionsHub&connection=Storage&code=XXX",
  "terminatePostUri": "https://{host}/runtime/webhooks/durabletask/instances/34ce9a28a6834d8492ce6a295f1a80e2/terminate?reason={text}&taskHub=DurableFunctionsHub&connection=Storage&code=XXX",
  "rewindPostUri": "https://{host}/runtime/webhooks/durabletask/instances/34ce9a28a6834d8492ce6a295f1a80e2/rewind?reason={text}&taskHub=DurableFunctionsHub&connection=Storage&code=XXX"
}

  • The id attribute contains the instance ID of the orchestrator that the starter client started. It is used nearly everywhere, and you need it for troubleshooting.
  • The statusQueryGetUri attribute contains the URL from which a tool (Postman, for example) can retrieve the current state of the running orchestrator. It is the same URL as in the Location header.
  • The sendEventPostUri attribute can be used for sending events to the orchestrator. This is mainly used for implementing the “wait for external event” construct.
  • The terminatePostUri attribute delivers a URL that can be used for cancelling an orchestration. This is useful when an orchestrator runs a long-running operation that has become obsolete, for example, or is unresponsive.
  • The rewindPostUri attribute contains a URL that can be used to put a failed orchestration back into a running state so that it resumes from the point of failure.

The structure of these URLs is straightforward. The host part is the name of the app domain, and the orchestration instance ID is also part of the URL. This is the main point to understand: if you have the instance IDs of all orchestrators, you can track, maintain or cancel them by using the “same” URLs. If you are interested in further details, you may study the code in my GitHub repo or read Microsoft Docs (it is well written and easy to understand).
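
How a client might use these URIs can be sketched as follows. The code is plain Python and does no real HTTP: the fetch parameter is an assumption that stands for whatever HTTP client you use and is expected to return a status code and the parsed JSON body. The function names are hypothetical, and a real client would also wait between polls.

```python
from typing import Callable, Tuple
from urllib.parse import quote


def wait_for_completion(status_url: str,
                        fetch: Callable[[str], Tuple[int, dict]],
                        max_polls: int = 10) -> dict:
    """Poll statusQueryGetUri: 202 means still running, 200 means finished."""
    for _ in range(max_polls):
        code, body = fetch(status_url)
        if code == 200:
            return body
        if code != 202:
            raise RuntimeError(f"unexpected status code {code}")
    raise TimeoutError("orchestration is still running")


def event_url(starter_response: dict, event_name: str) -> str:
    """Fill the {eventName} placeholder of sendEventPostUri."""
    template = starter_response["sendEventPostUri"]
    return template.replace("{eventName}", quote(event_name, safe=""))


# Fake responses standing in for two HTTP round trips.
fake_responses = iter([(202, {"runtimeStatus": "Running"}),
                       (200, {"runtimeStatus": "Completed"})])
final = wait_for_completion("https://example.invalid/status",
                            lambda url: next(fake_responses))
print(final["runtimeStatus"])  # Completed
```

Injecting the fetch function keeps the polling logic independent of any particular HTTP library and makes it testable without a running function app.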

What about Error handling?

Figure 4 Handling errors

As pointed out in “Why should there be a durable solution … ?”, handling errors can be difficult in a plain Azure Functions scenario, where there is no single place to handle failures. Back to our chaining scenario of three functions communicating from the first to the third. When an error occurs in the second function, as the picture above shows, how do you deal with it? Ignore it? Fail with an exception? Or retry? Going the hard way, keeping track of state and handling the message communication by hand, raises the complexity of your system and leaves room for more bugs. It also gets hard to maintain all the state. For example, if you choose retry as your handling strategy, you need all the relevant state to roll back and reset.

With Azure Durable Functions, this can easily be done in the orchestrator. The language features you already use are all you need. For example, in C# you can use try..catch and handle an exception with a rollback, where you “simply” call a cleanup method and “await” the last activity call again. That’s all.

Exception handling also follows standard practice. Let’s say an activity fails and throws an exception; in our example this would be the activity that sends an SMS with the PIN code for verification to a phone. Best practice: that exception should be caught in the activity. The response to the orchestrator contains an error state (e.g. IsError = true) and a value (here the exception message). The orchestrator evaluates the response object and acts depending on the error flag and/or the message. If the activity succeeded, the orchestrator evaluates only the Value attribute (see my GitHub repo).

That’s it! If you like this post or want to share your ideas and thoughts with me, let me know. All feedback is welcome, as long as it is constructive. I encourage you to take my example from GitHub, go through it or add features. I would be glad if this helps someone out there.
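
As a small appendix to the error-handling section, the pattern of an activity that reports its failure as data, and an orchestrator that decides what to do with it, could look roughly like this. It is a plain Python sketch, not the code from the repository: ActivityResult mirrors the IsError/Value idea, while send_sms, orchestrate, the gateway parameter and the retry limit are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ActivityResult:
    is_error: bool
    value: str          # result on success, exception message on failure


def send_sms(phone: str, pin: str, gateway) -> ActivityResult:
    """Activity: catches its own exception and reports it as data."""
    try:
        gateway(phone, pin)
        return ActivityResult(False, f"PIN sent to {phone}")
    except Exception as exc:
        return ActivityResult(True, str(exc))


def orchestrate(phone: str, pin: str, gateway, max_retries: int = 3) -> ActivityResult:
    """Orchestrator: the single place that decides how to react to a failure."""
    result = ActivityResult(True, "not attempted")
    for _ in range(max_retries):
        result = send_sms(phone, pin, gateway)
        if not result.is_error:
            break                      # success, stop retrying
    return result


def broken_gateway(phone: str, pin: str) -> None:
    """Hypothetical SMS gateway that always fails."""
    raise ConnectionError("gateway down")


outcome = orchestrate("+49123456", "4711", broken_gateway)
print(outcome.is_error, outcome.value)  # True gateway down
```

Because the activity never lets the exception escape, the orchestrator only has to look at one flag to choose between continuing, retrying or cleaning up.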

By Thomas

As Chief Technology Officer at Xpirit Germany, I am responsible for driving productivity for our customers with a full stack of development and modern technology. I care not only about technologies from the Microsoft stack like Azure, AI and IoT, but also about delivering quality and expertise with DevOps.
