>> You've probably heard
about durable functions, but did you know that behind durable functions is the
durable task framework? This is an open source framework
that's been out for years that helps manage long-running functions and maintain state automatically. You need to check this out. [MUSIC]. >> Hello, and welcome to
another episode of ON.NET. I'm your host Jeremy Likness, and today I've got Simon
and Alphone with me. We're going to talk about
the durable task framework, not to be confused with
durable functions. This is something that I was thinking about when I
first saw the title, but you'll be surprised
to find there's an open source project
that drives many projects, actually, inside and
outside of Microsoft. Before we dive into
that framework though, Simon, tell me a little bit
about what you do at Microsoft. >> I am Simon. I'm Engineering Manager on the Azure IoT
platform team, specifically for IoT Hub, where we offer a variety of
platform as a service offerings. We make heavy use of
Durable Task within our systems. >> Okay, and Alphone? >> Hi, I'm Alphone, I'm also an Engineering Manager
in the IoT platform team. So as Simon mentioned, we make heavy use of the
Durable Task Framework, and I was involved in the original version of
the Durable Task Framework. So I've been on this
since the beginning. >> So you helped actually
contribute to making the framework? >> Yes. >> So tell me a little bit.
Let's back up for a second. What is it that the framework tries to solve? What problem are we looking at here? >> So the framework is meant to enable users to build
workflows using code. When you think of workflows, typically you think of visual designers and boxes that you drag and drop — debit account, credit account, do something, provision a VM, create a storage account — in terms of visual flows. But that did not work for us when we were building out our first provisioning system, which led us to this new approach of thinking about the problem: how can we actually use code, and specifically C#, because of all the great features in C#, to build out workflows that you can just write in code and just deploy, and have them scale and be resilient, and all of those things?
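[Editor's note: to make the idea concrete, here is a minimal sketch of a workflow written as plain C# against the open-source DurableTask.Core API. The two activity types are hypothetical stand-ins for steps like "create a VM, then a storage account."]

```csharp
using System.Threading.Tasks;
using DurableTask.Core;

// A workflow expressed as ordinary C# control flow. Each awaited task is
// durable: the framework checkpoints progress instead of holding a thread.
public class ProvisioningOrchestration : TaskOrchestration<string, string>
{
    public override async Task<string> RunTask(OrchestrationContext context, string input)
    {
        // Hypothetical activities; each call is scheduled durably.
        string vmId = await context.ScheduleTask<string>(typeof(CreateVmActivity), input);
        string storageId = await context.ScheduleTask<string>(typeof(CreateStorageActivity), vmId);
        return storageId;
    }
}

public class CreateVmActivity : TaskActivity<string, string>
{
    protected override string Execute(TaskContext context, string input) => "vm-1"; // call Azure here
}

public class CreateStorageActivity : TaskActivity<string, string>
{
    protected override string Execute(TaskContext context, string vmId) => "storage-1"; // call Azure here
}
```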
>> Well, that's interesting to hear, because I think some of the best tools are the ones that were actually created to solve a problem. Sometimes you hear about frameworks people create in search of a problem to solve, right? So this was something where you ran into an issue and said, we need a better way to do this — and that way was workflows through code. >> Yeah. Actually, it
was a Hackathon project. We were building out the provisioning system for API Management, the Azure service. We looked at various approaches. We looked at building a state machine-based system where you would create a storage account and then you would store the state — hey, I just created a storage account — and then you would go on to the next step, which is creating a VM, and then tie the VM to the storage account. All of those things, right? So there was a lot of state management in the middle. So we thought, why could we not use awaits — basically C# .NET tasks and awaits — to actually encode this very simple control flow, which becomes so complicated once you throw in databases and state machines? How can we encode that in code directly? So this is where it came up. >> So the idea, if I
understand it correctly, is instead of me explicitly opting into some sort of state management mechanism and saying, okay, I have a long-running workflow, so in code I'm going to save my state, do all this ritual and ceremony, come back at another point, rehydrate it, try to figure out what to do — you're taking advantage of the language features to manage state sort of behind the scenes for the user. So for me, it's simply: do something, await that long-running task, and continue. There's magic that happens in between, but that's taken care
of by the framework. >> Absolutely. And to add to that: not only do we actually let users have an await become durable in some sense, we also have features in the framework which allow the whole process to be dehydrated to disk — or Service Bus, in this case. Then we rehydrate when there's actually work to be done. So as an example, if you have a two-step process — again going back to my example of create the VM, and then create a storage account — you would call a CreateVm method which creates a VM under the covers, which just makes the calls to Azure. >> That's not instantaneous. >> Exactly. It takes, like, minutes — or seconds, I guess tens
of seconds at least, if things are working out fine. So once it creates the VM, the next step you want to do is to create a storage account. You want to fire off the call to create a VM, and then you want the process to go to sleep, in the sense that there is no memory being used, there is no compute being used. The only time this control flow comes back to life is when the VM has been created. Then at that point, you want the control flow to start from the next step, which is basically create the storage account. You do not want it to redo the whole thing again. In some sense, you want the instruction pointer into the code to be stored persistently. So this we enable using
this magic of durable tasks, where the task that you get back from this framework is what we call a durable task, which is automatically persisted to some storage — a key-value store, Service Bus, or Azure Storage, whichever provider you're using. It also allows you to resume when the task is actually completed. That, I guess, is the magic of durable tasks. >> So you talked a little bit about Service Bus and
Azure Storage. So there are some supporting mechanisms behind this that the framework plugs into. Is there any code or an example you can show to drive that point home and illustrate it? >> Sure. >> When I came in and started working with the framework, at that point in time there was a Service Bus provider which was used for the messaging flow and session state, and then Azure Tables was used for the instance store, which is really the history of the orchestration. One of the things we
did at that point was we made a provider model, so you could then bring your own provider. If the built-in one wasn't going to meet your needs, you could write a provider for SQL. There is one for Azure Storage. There is one coming very soon for Redis, and another one for Service Fabric. So it depends on the team's needs — some teams, the Service Bus team for instance, cannot take a dependency on themselves, so they have a Service Fabric provider. >> Okay. Interesting. >> So the main design principle behind the providers was to have as few dependencies on special features as we could. For example, all you really need is a key-value store — literally, give us a key-value store — and some compute, obviously: some place to run this code. >> So I may be getting
ahead of myself, but it sounds like you could potentially spin up something lightweight for test runs or smoke tests, but plug in a production provider — Service Bus and a storage account, for example — when it's deployed. Is that right? >> Absolutely. >> Yes, we actually have an emulator that is an in-memory provider. >> Okay. >> That you can use for local testing, so you don't have any outside dependencies, or the latency and issues that come with that. >> Very cool.
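[Editor's note: a sketch of that swap, assuming the DurableTask.Emulator package; LocalOrchestrationService implements the same interfaces as the real providers.]

```csharp
using DurableTask.Core;
using DurableTask.Emulator;

// In-memory provider: no Service Bus or storage dependency, good for local tests.
var emulator = new LocalOrchestrationService();

// The emulator serves as both the service and its client, so the same objects
// you would build against a production provider work unchanged.
var worker = new TaskHubWorker(emulator);
var client = new TaskHubClient(emulator);
```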
>> So I see you've pulled up some code. Tell me about the code you've got. >> So what I've got is, I'm using the ServiceBusOrchestrationService, which means that we're using Service Bus and Storage for our state. What I've defined here is pretty basic: I've instantiated the ServiceBusOrchestrationService, and we're going to use an AzureTableInstanceStore. The storage connection string and Service Bus connection string are going to get us our queuing mechanism, instance store, and session state. We're going to create
some of the core concepts: there's a client and a worker. The client is what you will call to create a workflow and fire it off, and then the worker is the one that's actually going to have dispatchers that'll pick up the work and run it for the orchestration, and that'll in turn spawn off activities that'll go and run. And one of the things it does is it allows you to fan out. >> Okay. >> Fan out the tasks. So not only do you get the durability guarantees of long-running processes, but if you have operations that are compute-intensive, you can fan those out as well. >> Okay.
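[Editor's note: a rough sketch of the setup being described, assuming the DurableTask.ServiceBus provider; the hub name and connection strings are placeholders.]

```csharp
using DurableTask.Core;
using DurableTask.ServiceBus;
using DurableTask.ServiceBus.Tracking;

string taskHubName = "SumOfSquaresHub"; // hypothetical
string storageConnectionString = "<azure-storage-connection-string>";
string serviceBusConnectionString = "<service-bus-connection-string>";

// Azure Tables holds the instance store (the orchestration history);
// Service Bus provides the queuing mechanism and session state.
var instanceStore = new AzureTableInstanceStore(taskHubName, storageConnectionString);
var orchestrationService = new ServiceBusOrchestrationService(
    serviceBusConnectionString, taskHubName, instanceStore, null, null);

// The client fires off workflows; the worker's dispatchers pick up the work.
var client = new TaskHubClient(orchestrationService);
var worker = new TaskHubWorker(orchestrationService);
```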
>> So the basic example I've done here is really just a sum-of-squares orchestration. It's good for illustrating the concepts, but it's obviously not something you'd end up using in production. >> An orchestration — you can think of that as the control flow for the workflow, and activities as the actual work items that the control flow invokes. >> Right. So my flowchart,
all my arrows and decisions are part of the orchestration, and my nodes are going to be activities. >> Exactly. Yes. >> Okay. >> Every orchestration has a state, and that state is what it uses to replay in the previous activities' results. So what I've got is
I've created an array. It's an array of either numbers
or sub-arrays of numbers. >> Okay. >> This orchestration is going to go through, and it's going to sum
the squares of all the numbers. >> Okay. >> So I'll get back to
this code in a second, but I want to show you the activity that we have defined
is pretty simple. All it's going to do is it's
going to receive a number, it's going to return
the square of that number. >> Okay. >> We have an orchestration which we've defined; it's going to take the input and parse it as JSON, and then it is going to loop through the items in that JSON, and if an item is an array, it's going to create what's
called a sub-orchestration. A sub-orchestration is an
orchestration within an orchestration, but it has its own lifetime, essentially. >> Right, but they're tied to each other — parent and child. >> Think of it as a workflow spinning up another workflow. >> Okay. So this is a recursive
sub-orchestration, basically. >> Yes. So we said if it's an array, we're going to create a sub-orchestration, which is going to go do its thing; if it's an integer, it's just going to go and call that activity that I just showed you that does the squaring. >> Okay. >> Then it's going
to — each of these is going to create a task, and then we just do a Task.WhenAll on our tasks, and then we're going to sum the results of that, and that's going to be the result of our orchestration.
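[Editor's note: a hedged reconstruction of the sum-of-squares sample being walked through here, assuming DurableTask.Core plus Newtonsoft.Json; the actual sample in the repo may differ in details.]

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using DurableTask.Core;
using Newtonsoft.Json.Linq;

public class SumOfSquaresTask : TaskActivity<int, int>
{
    // The leaf-level work item: square a single number.
    protected override int Execute(TaskContext context, int number) => number * number;
}

public class SumOfSquaresOrchestration : TaskOrchestration<int, string>
{
    public override async Task<int> RunTask(OrchestrationContext context, string input)
    {
        var tasks = new List<Task<int>>();
        foreach (JToken item in JArray.Parse(input))
        {
            tasks.Add(item.Type == JTokenType.Array
                // An array becomes a sub-orchestration with its own lifetime.
                ? context.CreateSubOrchestrationInstance<int>(
                      typeof(SumOfSquaresOrchestration), item.ToString())
                // An integer fans out to the square activity.
                : context.ScheduleTask<int>(typeof(SumOfSquaresTask), (int)item));
        }

        int[] squares = await Task.WhenAll(tasks); // fan back in
        return squares.Sum();
    }
}
```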
So a little bit more about how we got set up: we have the client, which we're going to call to instantiate orchestrations, and we have the worker, which is actually going to do the work. On the worker, we're registering our orchestration — you register the type itself — and then the activity; we've only done one, as a simple example: the sum-of-squares task. So we're basically registering our code with the worker so that it knows what to do.
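[Editor's note: continuing the sketch, registration looks roughly like this against the TaskHubWorker API; the Service Bus provider may also require creating the task hub resources first.]

```csharp
// Register the orchestration and activity types so the worker's dispatchers
// know what code to run, then start processing.
worker.AddTaskOrchestrations(typeof(SumOfSquaresOrchestration));
worker.AddTaskActivities(typeof(SumOfSquaresTask));
await worker.StartAsync();
```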
>> The worker is the unit of scale as well. So imagine this one worker is running on a single VM, right? It is processing N number of orchestrations as the client says, "Hey, run this workflow, run this workflow, run this workflow." This worker is the only one picking all of these up and actually executing them. You could throw in more workers as well — give them the same connection string this one is being given — and they all will compete for work on the same Service Bus store
in this case. >> So this handles the orchestration, the clients, and the flows. Does this project also handle providing the infrastructure for the scale-out, or does something else have to provide the different VMs
or processes? >> That we've left as an exercise for the deployer, because there are so many different combinations of places where you can operate it. There are samples that show you how to run it in a single process, but actually running it in a particular hosting environment is not part of the framework. >> Okay. So when we talk about this — because I know a lot of people
get confused a little bit: they hear Durable Task Framework, they think Durable Functions. Durable Functions uses this, but part of what the service is providing is managing that scale-out and provisioning. >> Exactly. >> The hosting, basically, for all the processes. >> That's the advantage, definitely. Durable Functions is a more managed way of using the Durable Task Framework. >> Got it. Okay. Makes sense. >> It's also got the rich integration with the developer experience, to host your functions and everything you get with that as well. >> Right. >> So it really brings all of those things together. >> Okay. >> So once we've got our orchestrations and
activities registered, we'll go start it, and then here we can use the client. We can create an orchestration: we'll tell it what type it is, and we'll give it the input, which is that bag-of-numbers JSON that I showed earlier. Then we're going to do a simple wait for the orchestration to complete.
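[Editor's note: the client side, sketched with an illustrative input; CreateOrchestrationInstanceAsync and WaitForOrchestrationAsync are the assumed TaskHubClient calls.]

```csharp
// A bag of numbers: integers become activities, sub-arrays become
// sub-orchestrations in the sketch above.
string numbersJson = "[2, 3, [4, 5]]";

OrchestrationInstance instance = await client.CreateOrchestrationInstanceAsync(
    typeof(SumOfSquaresOrchestration), numbersJson);

// Block until the workflow completes (the work itself may be running on
// any number of machines behind the Service Bus queues).
OrchestrationState state = await client.WaitForOrchestrationAsync(
    instance, TimeSpan.FromMinutes(5));

Console.WriteLine($"Sum of squares: {state.Output}");
```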
I've got a little bit of debug output shown here, but what's not apparent is that it's actually going to be doing a whole bunch of messaging. We could have had this running on five or ten machines. You could have the client on one machine, you could have five machines being orchestration hosts, and you could have 15 or 20 machines running activities, depending on what they're doing. So if we take this now and we run it, it's going to start up and you'll see that we've got an input. >> Now it's processing. >> So the original orchestration got that input; it then went and said, I'm going to run
activities for the two, the three, and sub-orchestrations for the four and the five. So it's gone and sent that back out as more work. >> Right. >> Then we'll get more orchestration input for that. We'll get more orchestration input for the others that we've defined in here, and those will fan out, and then they're eventually going to fan back in as you run all the squares. At the end we've got our debug output. Down here, the sum of the squares is all the orchestrations actually completing. You can see that the inner ones are fanning back in, until eventually we get the sum of squares, which is 2,869 — the sum of all those squares. >> So in some sense this is
basically doing a scatter/gather. So if you have a farm of machines, the main orchestration is going to take the input, break it into chunks, and then either create sub-orchestrations out of it, if they're bigger sequences, or hand the pieces off to activities, which are leaf-level nodes — leaf nodes versus sub-workflows. Then it's going to wrap up all of the results: after fanning them out, collect all the results and print them on the screen. Interestingly, this is also reliable. If, in the middle of the workflow, the machine just blew up, right? Just went down, restarted — it would start from the exact same spot where it had stopped. >> Okay. >> Because the state is saved. It actually remembers the last await it was stuck on, and then it just resumes from that. >> So that's actually
one of your constraints inside the main orchestration: it has to be reliable code, right? I can't generate a random number and make decisions based on that randomness, because then the flow is not — as long as it's a — I don't even know the word. >> Deterministic. >> Yeah. As long as it's a consistent flow. But the activities it calls — those nodes — those can do pretty much whatever they want to. >> Absolutely, exactly. As you said, the control flow is a special place. That's the trade-off that we made: whatever you write within that special orchestration code needs to be deterministic. So you cannot have random numbers; you cannot have "get me the current temperature" or something like that. You cannot do something that will yield a different result the next time it is replayed. >> And that's just the control flow. >> Yes. Activities are fair game.
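[Editor's note: a sketch of the determinism rule inside an orchestration's RunTask; GetTemperatureActivity is hypothetical.]

```csharp
public override async Task<string> RunTask(OrchestrationContext context, string input)
{
    // DON'T: these would yield different values on every replay.
    // Guid id = Guid.NewGuid();
    // DateTime now = DateTime.UtcNow;

    // DO: the framework records a deterministic timestamp for replays.
    DateTime now = context.CurrentUtcDateTime;

    // DO: push non-deterministic work into an activity; its result is saved
    // in the history and replayed, not re-executed.
    string temp = await context.ScheduleTask<string>(typeof(GetTemperatureActivity), "Seattle");
    return $"{now:o}: {temp}";
}
```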
>> Yeah, but we do have some specific functionality in the framework that enables the things you need. For example, we have timers and things like that. So say you're going to create an orchestration that's going to fire off an activity that's waiting for someone to manually approve something, and that could take three days; you say, I'm going to wait up to three days, and if they don't finish, you can actually code that into your orchestration by having a timer fire. Because you can't use a regular timer — you can't do anything like that inside an orchestration.
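[Editor's note: a sketch of that timer-plus-approval pattern, assuming OrchestrationContext.CreateTimer and the OnEvent override for external events.]

```csharp
public class ApprovalOrchestration : TaskOrchestration<bool, string>
{
    private readonly TaskCompletionSource<bool> approval = new TaskCompletionSource<bool>();

    public override async Task<bool> RunTask(OrchestrationContext context, string input)
    {
        // A durable timer: it fires in three days even if every process restarts.
        Task<bool> timeout = context.CreateTimer(context.CurrentUtcDateTime.AddDays(3), false);

        // Whichever completes first wins: the human approval event or the timeout.
        Task<bool> winner = await Task.WhenAny(approval.Task, timeout);
        return await winner;
    }

    // Invoked when a client raises an "Approval" event against this instance.
    public override void OnEvent(OrchestrationContext context, string name, string input)
    {
        if (name == "Approval") approval.SetResult(bool.Parse(input));
    }
}
```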
>> So what we've brought up is several advantages so far. The way this manages state means that I have a resilient process: if something were to fail in the middle of it, it's preserving state, so it can replay up to that point and then continue running. If we crash and restart, we still get the output that we're looking for. The other thing is scalability: you can have multiple workers dealing with the work that's handed off from the orchestrator. And then it sounds like there are a lot of patterns that are sort of built into the framework. We talked about fan-out and fan-in — I'm waiting for multiple processes and then I want to converge them back to a result. I'm assuming you have asynchronous sequential workflows too: wait for one, then
the next, then the next. >> The nice thing is that it's all built using .NET tasks — familiar concepts, right? >> Right. >> If you want to do a sequential workflow, you await task one, and then you await task two, and then await task three. >> I want to call that out, because in your code — >> Yes, we did a Task.WhenAll over here. >> Right. >> But we could've done an await on this, and then an await on the next one, and then an await on the one after that, if we wanted to do them one at a time.
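[Editor's note: the two shapes side by side, as fragments inside an orchestration's RunTask.]

```csharp
// Fan-out: schedule all three at once, then fan back in with WhenAll.
Task<int> a = context.ScheduleTask<int>(typeof(SumOfSquaresTask), 2);
Task<int> b = context.ScheduleTask<int>(typeof(SumOfSquaresTask), 3);
Task<int> c = context.ScheduleTask<int>(typeof(SumOfSquaresTask), 4);
int[] parallel = await Task.WhenAll(a, b, c);

// Sequential: one at a time; each await is a durable checkpoint.
int first = await context.ScheduleTask<int>(typeof(SumOfSquaresTask), 2);
int second = await context.ScheduleTask<int>(typeof(SumOfSquaresTask), 3);
int third = await context.ScheduleTask<int>(typeof(SumOfSquaresTask), 4);
```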
>> That WhenAll is deceptive, because it looks like just a simple line of code, but what you're really doing is all this orchestration and state management, and basically storing history in the background, right? >> Yes, exactly. This simple orchestration actually created, I don't know, n tasks. And every task represented
some work that was executing on a different machine, potentially. >> Potentially. >> Potentially. And a task would only complete when the machine had executed that piece of code and returned some value or some result, and then the result would be passed back to the awaiter — the entity that is actually getting the result from that task. And this task that you get back from the framework, in this special context, can be manipulated just like any other .NET task: you can await it, you can chain it with other tasks, you can use ContinueWith as well, and you can actually do a Task.WhenAll or WhenAny — all of the patterns that you are familiar with in .NET tasks you can apply here. >> This sounds like,
if I want to level up my C# knowledge and get behind the scenes of what await and async do, this open-source project would be- >> Absolutely. >> -a place to start digging, to see how you hook into those language features. >> We've done some pretty interesting gymnastics in how we built this out. It's an interesting read for someone who wants to just experience some [inaudible] >> I'm going to give
it the 15-minute rule. I'll look at the code for 15 minutes and if it's still
completely foreign to me, I'll say it's a good thing
they wrote it and not me. Now, the example that you're showing — to get started with this entire framework, do you have to pull down the code and compile it, or is it as easy as a NuGet package reference? How do you start plugging into durable tasks? >> We have some NuGet packages. >> Okay. >> If you want to dig into
the code, you can as well, but the GitHub repo that we have does have some samples that allow you to get started
with some documentation. >> I assume the packages
give you kind of fine-grained control
over what features of the framework you want to use. >> Yes. So we have a core package, which is the core runtime. >> Okay. >> That's the whole thing. Then there's the emulator, which you'd bring in for your testing purposes and just general tooling around. And then you pick your provider and bring in whatever packages are for your relevant provider. >> Okay. That sounds good. We'll put all of those links
in the show notes, so people can go out and grab that. Now — did you still have something you wanted to show with this demo? Because I have a question, but I don't want to distract. >> No, I think that's everything. >> So, we've been talking about Durable Functions, because that's something I think a lot of people are familiar with, but my understanding is there are a lot of internal projects, even at Microsoft, that depend on the Durable Task Framework. Are you familiar with others? You mentioned IoT Hub as well, using it for provisioning? >> Yeah. So the first project
that we started with was API Management, in the context of provisioning. We have used this framework a lot in just the control plane for our services. For example, API Management: when you go and create a new API Management service, the Durable Task Framework runs in the background and creates all the necessary resources, ties them together into one unit, and then presents it to the user. IoT Hub similarly uses this to build out its control plane. Whenever you go and create an IoT Hub, it actually uses this framework to go and create X and Y and Z resources and then string them together. Besides that, this is also used in the runtime plane
as well for IoT Hub. IoT Hub has this feature we call DeviceJobs: if you have a million devices connected to your IoT Hub and you want to run a command, or you want to update what we call the Device State — the Device Twin — for the million devices, you just make one call to IoT Hub, and IoT Hub will give you back an operation status, or operation cookie, which you can use to track status. >> And that's using one of these orchestrations? >> It is, basically, under the covers. It is a big, massive orchestration with a lot of sub-orchestrations which divvy the work up into per-device orchestrations, so we use it to manage state for that. >> There are a couple of other teams that have actually surprised us. I had someone new who joined the team about eight months ago, and
he said," Hey you know, I found your name on Durable Task and we've got five different
teams using it." >> Oh wow! >> So there's a couple of users
out there we don't know of. But there's number of
teams that are using us. >> It has all been
organic. So we've never like I think this is the first time
whatever I don't know. Not sure about the thing. It has been organic just because
it was useful, it was occupying. >> Can you think of
other situations? We talked about provisioning and workflows in general, but are there some canonical cases — like, if you're in manufacturing and you have this problem — or other patterns, if you will, that lend themselves to this framework? So someone's watching this and they say, "Oh, I've been struggling with this over here. Maybe I should look into the Durable Task Framework." >> Abstractly speaking, if you have a long-running process, by definition it needs to be stateful,
otherwise it is not efficient. Because it's long-running, either you keep everything in memory and wait until the whole thing's done — >> Right. >> — or you make it stateful. The first case, by the way, is also not reliable: if it's in memory, the machine can crash and you can lose your state. So if it's a long-running process, it is probably stateful, which means that you checkpoint at every step. Then you need something like Durable Task to make it easy for you to write code for that. >> Right. >> Which actually brings in a lot
of the scenarios you just mentioned. Manufacturing might potentially be an interesting one. Mostly, we've seen LOB integration — business processes, workflow scenarios. SharePoint has this document workflow scenario: we have a document that needs to be approved by five different people. So somebody writes up a document; it goes to person number one, they approve it; it goes to person number two, they approve it. Maybe an expense report kind of thing, where the gaps are dependent on a human actually looking at an alert and saying, "Okay, I need to go deal with that." >> Because you can wait for that manual process using the workflow. >> Exactly. >> It could be three days, or weeks, or months. >> Right. >> It just gets dehydrated. >> That's something I've done with Durable Functions, but not with
the Durable Task Framework. This gives me a whole new set
of things to play with. Are either of you still involved on the core team, adding features or supporting it? >> Me, not as much; Simon and folks on his team are actively working on it. >> Any major exciting features, or more stabilization and cleaning up defects that are
found in the field? >> There are some optimizations and things like that on the list right now, because it's replay-based. That means it's going to dehydrate and rehydrate, and potentially that means it's a little chatty to itself and from a compute standpoint. Durable Functions has already implemented a sticky concept where it will cache things and optimize the replay, but the core framework doesn't implement that yet. >> Okay, so that's something- >> That's an example of something that would be brought in. >> Because I know that you
have some concessions if I'm intentional about it. For example, if I'm running an infinite loop, that's obviously not going to make sense because the history builds up, but I can kick off the task as if it's new while still maintaining the state from the previous run. So those hooks are already in place. >> Yes. >> Exactly.
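[Editor's note: the hook being described here is ContinueAsNew; a sketch, with a hypothetical CheckStatusActivity.]

```csharp
public class MonitorOrchestration : TaskOrchestration<string, int>
{
    public override async Task<string> RunTask(OrchestrationContext context, int iteration)
    {
        await context.ScheduleTask<bool>(typeof(CheckStatusActivity), iteration);
        await context.CreateTimer(context.CurrentUtcDateTime.AddMinutes(5), true);

        // Restart as a fresh instance, carrying state forward, so the replay
        // history never grows without bound.
        context.ContinueAsNew(iteration + 1);
        return null;
    }
}
```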
>> Another thing that we had been thinking about is adding inline execution of activities as well — for simpler activities, just having a lambda, for example, inline. >> Okay, so you don't have to create- >> You don't have to go and define a new activity and go through all the machinery. So we're thinking about some optimizations — again, to Simon's point, more about optimizing the flow as we continue to consume it within our data plane as well. For example, IoT Hub
uses Durable Task. We continue to find opportunities
to optimize the flow as well. But it's all based on the feedback that we're getting, and generally the pain points — the top pain points that are on our list. >> Awesome, sounds great. Well, thank you so much
for coming out and sharing the demo and your knowledge about
the Durable Task Framework. I'm going to say that this is a framework that is used heavily internally and externally, it's open-source, and it's baked into some of our most important services. So if you're doing anything with
any of these patterns that we've discussed, if you're working
on long-running workflows, you definitely need to check
out the Durable Task Framework. [MUSIC]