>> You've probably heard
about durable functions, but did you know that behind durable functions is the
durable task framework? This is an open source framework
that's been out for years that helps manage long-running functions and maintain state automatically. You need to check this out. [MUSIC]. >> Hello, and welcome to
another episode of ON.NET. I'm your host Jeremy Likness, and today I've got Simon
and Alphone with me. We're going to talk about
the durable task framework, not to be confused with
durable functions. This is something that I was thinking about when I
first saw the title, but you'll be surprised
to find there's an open source project
that drives many projects, actually, inside and
outside of Microsoft. Before we dive into
that framework though, Simon, tell me a little bit
about what you do at Microsoft. >> I am Simon. I'm Engineering Manager on the Azure IoT
platform team, specifically for IoT Hub, where we offer a variety of
platform as a service offerings. We make heavy use of
Durable Task within our systems. >> Okay, and Alphone? >> Hi, I'm Alphone, I'm also an Engineering Manager
in the IoT platform team. So as Simon mentioned, we make heavy use of the
Durable Task Framework, and I was involved in the original version of
the Durable Task Framework. So I've been on this
since the beginning. >> So you helped actually
contribute to making the framework? >> Yes. >> So tell me a little bit.
Let's back up for a second. What is it that the framework tries to solve? What problem are we looking at here? >> So the framework is meant to enable users to build
workflows using code. When you think of workflows, typically you think of visual designers and boxes that you drag and drop — debit account, credit account, do something, provision a VM, create a storage account — in terms of visual flows. But that did not work for us when we were building out our first provisioning system, which led us to this new approach of thinking about the problem: how can we actually use code, and specifically C#, because of all the great features in C#, to build out workflows that you can just write in code and just deploy, and have them scale and be resilient, and all of those things?
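[Editor's note: to make the idea concrete, here is a minimal sketch of a workflow written as plain C# against the open-source DurableTask.Core API. The two activity types are hypothetical stand-ins for steps like "create a VM, then a storage account."]

```csharp
using System.Threading.Tasks;
using DurableTask.Core;

// A workflow expressed as ordinary C# control flow. Each awaited task is
// durable: the framework checkpoints progress instead of holding a thread.
public class ProvisioningOrchestration : TaskOrchestration<string, string>
{
    public override async Task<string> RunTask(OrchestrationContext context, string input)
    {
        // Hypothetical activities; each call is scheduled durably.
        string vmId = await context.ScheduleTask<string>(typeof(CreateVmActivity), input);
        string storageId = await context.ScheduleTask<string>(typeof(CreateStorageActivity), vmId);
        return storageId;
    }
}

public class CreateVmActivity : TaskActivity<string, string>
{
    protected override string Execute(TaskContext context, string input) => "vm-1"; // call Azure here
}

public class CreateStorageActivity : TaskActivity<string, string>
{
    protected override string Execute(TaskContext context, string vmId) => "storage-1"; // call Azure here
}
```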
>> Well, that's interesting to hear, because I think some of the best tools are the ones that were actually created to solve a problem. Sometimes you hear about frameworks people create in search of a problem to solve, right? So this was something where you ran into an issue and said, we need a better way to do this — and that way was workflows through code. >> Yeah. Actually, it
was a Hackathon project. We were building out the provisioning system for API Management, the Azure service. We looked at various approaches. We looked at building a state machine-based system where you would create a storage account and then you would store the state — hey, I just created a storage account — and then you would go on to the next step, which is creating a VM, and then tie the VM to the storage account. All of those things, right? So there was a lot of state management in the middle. So we thought, why could we not use awaits — basically C# .NET tasks and awaits — to actually encode this very simple control flow, which becomes so complicated once you throw in databases and state machines? How can we encode that in code directly? So this is where it came up. >> So the idea, if I
understand it correctly, is instead of me explicitly opting into some sort of state management mechanism and saying, okay, I have a long-running workflow, so in code I'm going to save my state, do all this ritual and ceremony, come back at another point, rehydrate it, try to figure out what to do — you're taking advantage of the language features to manage state sort of behind the scenes for the user. So for me, it's simply: do something, await that long-running task, and continue. There's magic that happens in between, but that's taken care
of by the framework. >> Absolutely. And to add to that: not only do we actually let users have an await become durable in some sense, we also have features in the framework which allow the whole process to be dehydrated to disk — or Service Bus, in this case. Then we rehydrate when there's actually work to be done. So as an example, if you have a two-step process — again going back to my example of create the VM, and then create a storage account — you would call a CreateVm method which creates a VM under the covers, which just makes the calls to Azure. >> That's not instantaneous. >> Exactly. It takes, like, minutes — or seconds, I guess tens
of seconds at least, if things are working out fine. So once it creates the VM, the next step you want to do is to create a storage account. You want to fire off the call to create a VM, and then you want the process to go to sleep, in the sense that there is no memory being used, there is no compute being used. The only time this control flow comes back to life is when the VM has been created. Then at that point, you want the control flow to start from the next step, which is basically create the storage account. You do not want it to redo the whole thing again. In some sense, you want the instruction pointer into the code to be stored persistently. So this we enable using
this magic of durable tasks, where the task that you get back from this framework is what we call a durable task, which is automatically persisted to some storage — a key-value store, Service Bus, or Azure Storage, whichever provider you're using. It also allows you to resume when the task is actually completed. That, I guess, is the magic of durable tasks. >> So you talked a little bit about Service Bus and
Azure Storage. So there are some supporting mechanisms behind this that the framework plugs into. Is there any code or an example you can show to drive that point home and illustrate it? >> Sure. >> When I came in and started working with the framework, at that point in time there was a Service Bus provider which was used for the messaging flow and session state, and then Azure Tables was used for the instance store, which is really the history of the orchestration. One of the things we
did at that point was we made a provider model, so you could then bring your own provider. If the built-in one wasn't going to meet your needs, you could write a provider for SQL. There is one for Azure Storage. There is one coming very soon for Redis, and another one for Service Fabric. So it depends on the team's needs — some teams, the Service Bus team for instance, cannot take a dependency on themselves, so they have a Service Fabric provider. >> Okay. Interesting. >> So the main design principle behind the providers was to have as few dependencies on special features as we could. For example, all you really need is a key-value store — literally, give us a key-value store — and some compute, obviously: some place to run this code. >> So I may be getting
ahead of myself, but it sounds like you could potentially spin up something lightweight for test runs or smoke tests, but plug in a production provider — Service Bus and a storage account, for example — when it's deployed. Is that right? >> Absolutely. >> Yes, we actually have an emulator that is an in-memory provider. >> Okay. >> That you can use for local testing, so you don't have any outside dependencies, or the latency and issues that come with that. >> Very cool.
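[Editor's note: a sketch of that swap, assuming the DurableTask.Emulator package; LocalOrchestrationService implements the same interfaces as the real providers.]

```csharp
using DurableTask.Core;
using DurableTask.Emulator;

// In-memory provider: no Service Bus or storage dependency, good for local tests.
var emulator = new LocalOrchestrationService();

// The emulator serves as both the service and its client, so the same objects
// you would build against a production provider work unchanged.
var worker = new TaskHubWorker(emulator);
var client = new TaskHubClient(emulator);
```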
>> So I see you've pulled up some code. Tell me about the code you've got. >> So what I've got is, I'm using the ServiceBusOrchestrationService, which means that we're using Service Bus and Storage for our state. What I've defined here is pretty basic: I've instantiated the ServiceBusOrchestrationService, and we're going to use an AzureTableInstanceStore. The storage connection string and Service Bus connection string are going to get us our queuing mechanism, instance store, and session state. We're going to create
some of the core concepts: there's a client and a worker. The client is what you will call to create a workflow and fire it off, and then the worker is the one that's actually going to have dispatchers that'll pick up the work and run it for the orchestration, and that'll in turn spawn off activities that'll go and run. And one of the things it does is it allows you to fan out. >> Okay. >> Fan out the tasks. So not only do you get the durability guarantees of long-running processes, but if you have operations that are compute-intensive, you can fan those out as well. >> Okay.
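[Editor's note: a rough sketch of the setup being described, assuming the DurableTask.ServiceBus provider; the hub name and connection strings are placeholders.]

```csharp
using DurableTask.Core;
using DurableTask.ServiceBus;
using DurableTask.ServiceBus.Tracking;

string taskHubName = "SumOfSquaresHub"; // hypothetical
string storageConnectionString = "<azure-storage-connection-string>";
string serviceBusConnectionString = "<service-bus-connection-string>";

// Azure Tables holds the instance store (the orchestration history);
// Service Bus provides the queuing mechanism and session state.
var instanceStore = new AzureTableInstanceStore(taskHubName, storageConnectionString);
var orchestrationService = new ServiceBusOrchestrationService(
    serviceBusConnectionString, taskHubName, instanceStore, null, null);

// The client fires off workflows; the worker's dispatchers pick up the work.
var client = new TaskHubClient(orchestrationService);
var worker = new TaskHubWorker(orchestrationService);
```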
>> So the basic example I've done here is really just a sum-of-squares orchestration. It's good for illustrating the concepts, but it's obviously not something you'd end up using in production. >> An orchestration — you can think of that as the control flow for the workflow, and activities as the actual work items that the control flow invokes. >> Right. So my flowchart,
all my arrows and decisions are part of the orchestration, and my nodes are going to be activities. >> Exactly. Yes. >> Okay. >> Every orchestration has a state, and that state is what it uses to replay in the previous activities' results. So what I've got is
I've created an array. It's an array of either numbers
or sub-arrays of numbers. >> Okay. >> This orchestration is going to go through, and it's going to sum
the squares of all the numbers. >> Okay. >> So I'll get back to
this code in a second, but I want to show you the activity that we have defined
is pretty simple. All it's going to do is it's
going to receive a number, it's going to return
the square of that number. >> Okay. >> We have an orchestration which we've defined; it's going to take the input and parse it as JSON, and then it is going to loop through the items in that JSON, and if an item is an array, it's going to create what's
called a sub-orchestration. A sub-orchestration is an
orchestration within an orchestration, but it has its own lifetime, essentially. >> Right, but they're tied to each other — parent and child. >> Think of it as a workflow spinning up another workflow. >> Okay. So this is a recursive
sub-orchestration, basically. >> Yes. So we said if it's an array, we're going to create a sub-orchestration, which is going to go do its thing; if it's an integer, it's just going to go and call that activity that I just showed you that does the squaring. >> Okay. >> Then it's going
to — each of these is going to create a task, and then we just do a Task.WhenAll on our tasks, and then we're going to sum the results of that, and that's going to be the result of our orchestration.
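[Editor's note: a hedged reconstruction of the sum-of-squares sample being walked through here, assuming DurableTask.Core plus Newtonsoft.Json; the actual sample in the repo may differ in details.]

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using DurableTask.Core;
using Newtonsoft.Json.Linq;

public class SumOfSquaresTask : TaskActivity<int, int>
{
    // The leaf-level work item: square a single number.
    protected override int Execute(TaskContext context, int number) => number * number;
}

public class SumOfSquaresOrchestration : TaskOrchestration<int, string>
{
    public override async Task<int> RunTask(OrchestrationContext context, string input)
    {
        var tasks = new List<Task<int>>();
        foreach (JToken item in JArray.Parse(input))
        {
            tasks.Add(item.Type == JTokenType.Array
                // An array becomes a sub-orchestration with its own lifetime.
                ? context.CreateSubOrchestrationInstance<int>(
                      typeof(SumOfSquaresOrchestration), item.ToString())
                // An integer fans out to the square activity.
                : context.ScheduleTask<int>(typeof(SumOfSquaresTask), (int)item));
        }

        int[] squares = await Task.WhenAll(tasks); // fan back in
        return squares.Sum();
    }
}
```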
So a little bit more about how we got set up: we have the client, which we're going to call to instantiate orchestrations, and we have the worker, which is actually going to do the work. On the worker, we're registering our orchestration — you register the type itself — and then the activity; we've only done one, as a simple example: the sum-of-squares task. So we're basically registering our code with the worker so that it knows what to do.
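[Editor's note: continuing the sketch, registration looks roughly like this against the TaskHubWorker API; the Service Bus provider may also require creating the task hub resources first.]

```csharp
// Register the orchestration and activity types so the worker's dispatchers
// know what code to run, then start processing.
worker.AddTaskOrchestrations(typeof(SumOfSquaresOrchestration));
worker.AddTaskActivities(typeof(SumOfSquaresTask));
await worker.StartAsync();
```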
>> The worker is the unit of scale as well. So imagine this one worker is running on a single VM, right? It is processing N number of orchestrations as the client says, "Hey, run this workflow, run this workflow, run this workflow." This worker is the only one picking all of these up and actually executing them. You could throw in more workers as well — give them the same connection string this one is being given — and they all will compete for work on the same Service Bus store
in this case. >> So this handles the orchestration, the clients, and the flows. Does this project also handle providing the infrastructure for the scale-out, or does something else have to provide the different VMs
or processes? >> That we've left as an exercise for the deployer, because there are so many different combinations of places where you can operate it. There are samples that show you how to run it in a single process, but actually running it in a particular hosting environment is not part of the framework. >> Okay. So when we talk about this — because I know a lot of people
get confused a little bit: they hear Durable Task Framework, they think Durable Functions. Durable Functions uses this, but part of what the service is providing is managing that scale-out and provisioning. >> Exactly. >> The hosting, basically, for all the processes. >> That's the advantage, definitely. Durable Functions is a more managed way of using the Durable Task Framework. >> Got it. Okay. Makes sense. >> It's also got the rich integration with the developer experience, to host your functions and everything you get with that as well. >> Right. >> So it really brings all of those things together. >> Okay. >> So once we've got our orchestrations and
activities registered, we'll go start it, and then here we can use the client. We can create an orchestration: we'll tell it what type it is, and we'll give it the input, which is that bag-of-numbers JSON that I showed earlier. Then we're going to do a simple wait for the orchestration to complete.
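[Editor's note: the client side, sketched with an illustrative input; CreateOrchestrationInstanceAsync and WaitForOrchestrationAsync are the assumed TaskHubClient calls.]

```csharp
// A bag of numbers: integers become activities, sub-arrays become
// sub-orchestrations in the sketch above.
string numbersJson = "[2, 3, [4, 5]]";

OrchestrationInstance instance = await client.CreateOrchestrationInstanceAsync(
    typeof(SumOfSquaresOrchestration), numbersJson);

// Block until the workflow completes (the work itself may be running on
// any number of machines behind the Service Bus queues).
OrchestrationState state = await client.WaitForOrchestrationAsync(
    instance, TimeSpan.FromMinutes(5));

Console.WriteLine($"Sum of squares: {state.Output}");
```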
I've got a little bit of debug output shown here, but what's not apparent is that it's actually going to be doing a whole bunch of messaging. We could have had this running on five or ten machines. You could have the client on one machine, you could have five machines being orchestration hosts, and you could have 15 or 20 machines running activities, depending on what they're doing. So if we take this now and we run it, it's going to start up and you'll see that we've got an input. >> Now it's processing. >> So the original orchestration got that input; it then went and said, I'm going to run
activities for the two, the three, and sub-orchestrations for the four and the five. So it's gone and sent that back out as more work. >> Right. >> Then we'll get more orchestration input for that. We'll get more orchestration input for the others that we've defined in here, and those will fan out, and then they're eventually going to fan back in as you run all the squares. At the end we've got our debug output. Down here, the sum of the squares is all the orchestrations actually completing. You can see that the inner ones are fanning back in, until eventually we get the sum of squares, which is 2,869 — the sum of all those squares. >> So in some sense this is
basically doing a scatter/gather. So if you have a farm of machines, the main orchestration is going to take the input, break it into chunks, and then either create sub-orchestrations out of it, if they're bigger sequences, or hand the pieces off to activities, which are leaf-level nodes — leaf nodes versus sub-workflows. Then it's going to wrap up all of the results: after fanning them out, collect all the results and print them on the screen. Interestingly, this is also reliable. If, in the middle of the workflow, the machine just blew up, right? Just went down, restarted — it would start from the exact same spot where it had stopped. >> Okay. >> Because the state is saved. It actually remembers the last await it was stuck on, and then it just resumes from that. >> So that's actually
one of your constraints inside the main orchestration: it has to be reliable code, right? I can't generate a random number and make decisions based on that randomness, because then the flow is not — as long as it's a — I don't even know the word. >> Deterministic. >> Yeah. As long as it's a consistent flow. But the activities it calls — those nodes — those can do pretty much whatever they want to. >> Absolutely, exactly. As you said, the control flow is a special place. That's the trade-off that we made: whatever you write within that special orchestration code needs to be deterministic. So you cannot have random numbers; you cannot have "get me the current temperature" or something like that. You cannot do something that will yield a different result the next time it is replayed. >> And that's just the control flow. >> Yes. Activities are fair game.
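[Editor's note: a sketch of the determinism rule inside an orchestration's RunTask; GetTemperatureActivity is hypothetical.]

```csharp
public override async Task<string> RunTask(OrchestrationContext context, string input)
{
    // DON'T: these would yield different values on every replay.
    // Guid id = Guid.NewGuid();
    // DateTime now = DateTime.UtcNow;

    // DO: the framework records a deterministic timestamp for replays.
    DateTime now = context.CurrentUtcDateTime;

    // DO: push non-deterministic work into an activity; its result is saved
    // in the history and replayed, not re-executed.
    string temp = await context.ScheduleTask<string>(typeof(GetTemperatureActivity), "Seattle");
    return $"{now:o}: {temp}";
}
```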
>> Yeah, but we do have some specific functionality in the framework that enables the things you need. For example, we have timers and things like that. So say you're going to create an orchestration that's going to fire off an activity that's waiting for someone to manually approve something, and that could take three days; you say, I'm going to wait up to three days, and if they don't finish, you can actually code that into your orchestration by having a timer fire. Because you can't use a regular timer — you can't do anything like that inside an orchestration.
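[Editor's note: a sketch of that timer-plus-approval pattern, assuming OrchestrationContext.CreateTimer and the OnEvent override for external events.]

```csharp
public class ApprovalOrchestration : TaskOrchestration<bool, string>
{
    private readonly TaskCompletionSource<bool> approval = new TaskCompletionSource<bool>();

    public override async Task<bool> RunTask(OrchestrationContext context, string input)
    {
        // A durable timer: it fires in three days even if every process restarts.
        Task<bool> timeout = context.CreateTimer(context.CurrentUtcDateTime.AddDays(3), false);

        // Whichever completes first wins: the human approval event or the timeout.
        Task<bool> winner = await Task.WhenAny(approval.Task, timeout);
        return await winner;
    }

    // Invoked when a client raises an "Approval" event against this instance.
    public override void OnEvent(OrchestrationContext context, string name, string input)
    {
        if (name == "Approval") approval.SetResult(bool.Parse(input));
    }
}
```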
>> So what we've brought up is several advantages so far. The way this manages state means that I have a resilient process: if something were to fail in the middle of it, it's preserving state, so it can replay up to that point and then continue running. If we crash and restart, we still get the output that we're looking for. The other thing is scalability: you can have multiple workers dealing with the work that's handed off from the orchestrator. And then it sounds like there are a lot of patterns that are sort of built into the framework. We talked about fan-out and fan-in — I'm waiting for multiple processes and then I want to converge them back to a result. I'm assuming you have asynchronous sequential workflows too: wait for one, then
the next, then the next. >> The nice thing is that it's all built using .NET tasks — familiar concepts, right? >> Right. >> If you want to do a sequential workflow, you await task one, and then you await task two, and then await task three. >> I want to call that out, because in your code — >> Yes, we did a Task.WhenAll over here. >> Right. >> But we could've done an await on this, and then an await on the next one, and then an await on the one after that, if we wanted to do them one at a time.
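[Editor's note: the two shapes side by side, as fragments inside an orchestration's RunTask.]

```csharp
// Fan-out: schedule all three at once, then fan back in with WhenAll.
Task<int> a = context.ScheduleTask<int>(typeof(SumOfSquaresTask), 2);
Task<int> b = context.ScheduleTask<int>(typeof(SumOfSquaresTask), 3);
Task<int> c = context.ScheduleTask<int>(typeof(SumOfSquaresTask), 4);
int[] parallel = await Task.WhenAll(a, b, c);

// Sequential: one at a time; each await is a durable checkpoint.
int first = await context.ScheduleTask<int>(typeof(SumOfSquaresTask), 2);
int second = await context.ScheduleTask<int>(typeof(SumOfSquaresTask), 3);
int third = await context.ScheduleTask<int>(typeof(SumOfSquaresTask), 4);
```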
>> That WhenAll is deceptive, because it looks like just a simple line of code, but what you're really doing is all this orchestration and state management, and basically storing history in the background, right? >> Yes, exactly. This simple orchestration actually created, I don't know, n tasks. And every task represented
some work that was executing on a different machine, potentially. >> Potentially. >> Potentially. And a task would only complete when the machine had executed that piece of code and returned some value or some result, and then the result would be passed back to the awaiter — the entity that is actually getting the result from that task. And this task that you get back from the framework, in this special context, can be manipulated just like any other .NET task: you can await it, you can chain it with other tasks, you can use ContinueWith as well, and you can actually do a Task.WhenAll or WhenAny — all of the patterns that you are familiar with in .NET tasks you can apply here. >> This sounds like,
if I want to level up my C# knowledge and get behind the scenes of what await and async do, this open-source project would be- >> Absolutely. >> -a place to start digging, to see how you hook into those language features. >> We've done some pretty interesting gymnastics in how we built this out. It's an interesting read for someone who wants to just experience some [inaudible] >> I'm going to give
it the 15-minute rule. I'll look at the code for 15 minutes and if it's still
completely foreign to me, I'll say it's a good thing
they wrote it and not me. Now, the example that you're showing — to get started with this entire framework, do you have to pull down the code and compile it, or is it as easy as a NuGet package reference? How do you start plugging into durable tasks? >> We have some NuGet packages. >> Okay. >> If you want to dig into
the code, you can as well, but the GitHub repo that we have does have some samples that allow you to get started
with some documentation. >> I assume the packages
give you kind of fine-grained control
over what features of the framework you want to use. >> Yes. So we have a core package, which is the core runtime. >> Okay. >> That's the whole thing. Then there's the emulator, which you'd bring in for your testing purposes and just general tooling around. And then you pick your provider and bring in whatever packages are for your relevant provider. >> Okay. That sounds good. We'll put all of those links
in the show notes, so people can go out and grab that. Now — did you still have something you wanted to show with this demo? Because I have a question, but I don't want to distract. >> No, I think that's everything. >> So, we've been talking about Durable Functions, because that's something I think a lot of people are familiar with, but my understanding is there are a lot of internal projects, even at Microsoft, that depend on the Durable Task Framework. Are you familiar with others? You mentioned IoT Hub as well, using it for provisioning? >> Yeah. So the first project
that we started with was API Management, in the context of provisioning. We have used this framework a lot in just the control plane for our services. For example, API Management: when you go and create a new API Management service, the Durable Task Framework runs in the background and creates all the necessary resources, ties them together into one unit, and then presents it to the user. IoT Hub similarly uses this to build out its control plane. Whenever you go and create an IoT Hub, it actually uses this framework to go and create X and Y and Z resources and then string them together. Besides that, this is also used in the runtime plane
as well for IoT Hub. IoT Hub has this feature we call DeviceJobs: if you have a million devices connected to your IoT Hub and you want to run a command, or you want to update what we call the Device State — the Device Twin — for the million devices, you just make one call to IoT Hub, and IoT Hub will give you back an operation status, or operation cookie, which you can use to track status. >> And that's using one of these orchestrations? >> It is, basically, under the covers. It is a big, massive orchestration with a lot of sub-orchestrations which divvy the work up into per-device orchestrations, so we use it to manage state for that. >> There are a couple of other teams that have actually surprised us. I had someone new who joined the team about eight months ago, and
he said," Hey you know, I found your name on Durable Task and we've got five different
teams using it." >> Oh wow! >> So there's a couple of users
out there we don't know of. But there's number of
teams that are using us. >> It has all been
organic. So we've never like I think this is the first time
whatever I don't know. Not sure about the thing. It has been organic just because
it was useful, it was occupying. >> Can you think of
other situations? We talked about provisioning and workflows in general, but are there some canonical cases — like, if you're in manufacturing and you have this problem — or other patterns, if you will, that lend themselves to this framework? So someone's watching this and they say, "Oh, I've been struggling with this over here. Maybe I should look into the Durable Task Framework." >> Abstractly speaking, if you have a long-running process, by definition it needs to be stateful,
otherwise it is not efficient. Because it's long-running, either you keep everything in memory and wait until the whole thing's done — >> Right. >> — or you make it stateful. The first case, by the way, is also not reliable: if it's in memory, the machine can crash and you can lose your state. So if it's a long-running process, it is probably stateful, which means that you checkpoint at every step. Then you need something like Durable Task to make it easy for you to write code for that. >> Right. >> Which actually brings in a lot
of the scenarios you just mentioned. Manufacturing might potentially be an interesting one. Mostly, we've seen LOB integration — business processes, workflow scenarios. SharePoint has this document workflow scenario: we have a document that needs to be approved by five different people. So somebody writes up a document; it goes to person number one, they approve it; it goes to person number two, they approve it. Maybe an expense report kind of thing, where the gaps are dependent on a human actually looking at an alert and saying, "Okay, I need to go deal with that." >> Because you can wait for that manual process using the workflow. >> Exactly. >> It could be three days, or weeks, or months. >> Right. >> It just gets dehydrated. >> That's something I've done with Durable Functions, but not with
the Durable Task Framework. This gives me a whole new set
of things to play with. Are either of you still involved on the core team, adding features or supporting it? >> Me, not as much; Simon and folks on his team are actively working on it. >> Any major exciting features, or more stabilization and cleaning up defects that are
found in the field? >> There are some optimizations and things like that on the list right now, because it's replay-based. That means it's going to dehydrate and rehydrate, and potentially that means it's a little chatty to itself and from a compute standpoint. Durable Functions has already implemented a sticky concept where it will cache things and optimize the replay, but the core framework doesn't implement that yet. >> Okay, so that's something- >> That's an example of something that would be brought in. >> Because I know that you
have some concessions if I'm intentional about it. For example, if I'm running an infinite loop, that's obviously not going to make sense because the history builds up, but I can kick off the task as if it's new while still maintaining the state from the previous run. So those hooks are already in place. >> Yes. >> Exactly.
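[Editor's note: the hook being described here is ContinueAsNew; a sketch, with a hypothetical CheckStatusActivity.]

```csharp
public class MonitorOrchestration : TaskOrchestration<string, int>
{
    public override async Task<string> RunTask(OrchestrationContext context, int iteration)
    {
        await context.ScheduleTask<bool>(typeof(CheckStatusActivity), iteration);
        await context.CreateTimer(context.CurrentUtcDateTime.AddMinutes(5), true);

        // Restart as a fresh instance, carrying state forward, so the replay
        // history never grows without bound.
        context.ContinueAsNew(iteration + 1);
        return null;
    }
}
```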
>> Another thing that we had been thinking about is adding inline execution of activities as well — for simpler activities, just having a lambda, for example, inline. >> Okay, so you don't have to create- >> You don't have to go and define a new activity and go through all the machinery. So we're thinking about some optimizations — again, to Simon's point, more about optimizing the flow as we continue to consume it within our data plane as well. For example, IoT Hub
uses Durable Task. We continue to find opportunities
to optimize the flow as well. But it's all based on the feedback that we're getting, and generally the pain points — the top pain points that are on our list. >> Awesome, sounds great. Well, thank you so much
for coming out and sharing the demo and your knowledge about
the Durable Task Framework. I'm going to say that this is a framework that is used heavily internally and externally, it's open-source, and it's baked into some of our most important services. So if you're doing anything with
any of these patterns that we've discussed, if you're working
on long-running workflows, you definitely need to check
out the Durable Task Framework. [MUSIC]