TensorFlow Federated Tutorial Session

Captions
All right, welcome everyone, and thanks for joining us today for our workshop on federated learning and TensorFlow Federated. I'm very excited to be here with you and to tell you about the interesting things we are doing at Google. I'm a researcher at Google, and many of my colleagues and friends will explain the concepts you are here for. Before we jump into the tutorials, a few pieces of information that will help you follow along. First, we welcome your questions, but please post them on Dory, the tool Google uses for Q&A; you'll find the link on our website, so make sure you can access Dory and post all your questions there. If you cannot access Dory, please email us at federated-learning-workshop-2020@google.com and we'll try to give you access as soon as possible.

Here is today's schedule. I'll start by introducing federated learning and explaining its general concepts, and then hand off to Chris, who will tell us more about TensorFlow Federated, the tool we're all here to learn about. Then we'll start the tutorials: we'll build up gently with an introduction to TensorFlow Federated from Nova and Emily, and then go into more advanced concepts with Weikang and Zach. A big shout-out to our amazing speakers, who have been working hard over the past few weeks to make this happen; you'll hear more from them soon.

So why are we all here? We're here because we all love machine learning and artificial intelligence, and as we all know, AI and machine learning thrive on data. Data is essential for training machine learning models, building recommendation systems, and doing great things. Data is born at the edge: on people's devices, cameras, microphones, smart cars, and all the appliances around you at home and in your city. This data is very useful for AI and machine learning, but it can also be very sensitive; it can carry very personal information, and there's a real danger in collecting it and putting it all in one place. That's why the fundamental question we are trying to tackle is this: is it possible to continue to do AI and machine learning, and to build amazing services and models, without collecting this data and storing it in one place? This is not only important for privacy; it also improves latency and battery life. Instead of shipping all the data back and forth between devices and the service provider, if we can do everything in a decentralized fashion, we gain on all of these fronts. That, essentially, is what led to the birth of federated learning.

Here is the definition. Federated learning is a machine learning setting where multiple entities collaborate in solving a machine learning problem under the coordination of a central service provider: the service provider coordinates a number of entities that are trying to learn a combined machine learning model on their combined data. And here's the catch: the raw data is stored locally, and it is never exchanged or transferred.
Instead of exchanging data with each other or with the server, clients transfer only focused, minimal updates back to the service provider. This is still a high-level description, and you may have a lot of questions about it; I'll dig deeper in a bit. But first, let's look at the two different types of federated learning.

The first, which Google has focused on a lot, is called cross-device federated learning. As the name indicates, it's the setting where you have a lot of devices: smartphones, tablets, computers, or networks of small sensors. You can have millions, hundreds of millions, or up to a billion (maybe ten billion in the near future) of these devices, each carrying very sensitive information. As you can see on this slide, the server coordinates among the clients to train the model, and once we're done, we deploy the model to select devices.

The other main application is cross-silo federated learning. Here we're not talking about phones, tablets, or computers; we're talking about institutions: hospitals, banks, schools. You might have tens, hundreds, thousands, or tens of thousands at most of these institutions. They have somewhat more powerful hardware and larger datasets, because a single hospital can hold medical records for tens of thousands of patients. The concept is the same: these institutions do not want to exchange or share information with each other or with a central service provider, so they engage in federated learning to train a combined model that works well for everybody.

Before I dive into the details of federated learning, it's worth recalling how classical machine learning development works. Typically, you've gathered and curated data and placed it in the cloud; you can see it on the slide. An engineer who wants to build a machine learning model submits a job in the cloud to train a model of some size and architecture, with some number of parameters and some hyperparameters, using some machine learning framework, on the data that sits in the cloud. You train and evaluate, maybe try a bunch of hyperparameters, then examine the model you've obtained: make sure it works well on some held-out set, check some metrics, maybe do some A/B testing. Once you're happy with the quality of the model you trained in the cloud, you deploy it to devices, where it reduces latency and bandwidth and saves battery. This is the classical workflow most of us are familiar with, and in some sense federated learning keeps it intact: we don't want the engineer to have to learn new concepts or new ways of submitting jobs and training models. In fact, we want to build infrastructure so that the engineer doesn't have to worry at all about the process happening under the hood. So let me tell you how we do it in federated learning.
In federated learning, the data does not sit in the cloud; it's distributed across people's devices, as we said before. That makes things a little challenging, because we have to orchestrate back and forth between the service provider and those devices to train and evaluate the model. Here's how we do it, in a few quick steps.

Devices frequently check in to our servers and say: I have data, do you need me, do you want to train your model on my data? Oftentimes, because our servers get a lot of these pings, the server responds: not now, I'm busy with other clients. But every now and then the server says: yes, I have work for you; I want you to compute an update to a model on your data. That's when the device joins a round of federated learning.

We start from a model that is either initialized randomly or pre-trained on some publicly available data. We send a copy of that model to the device, and on the device we use the local data to update, and actually improve, the model (I'll say more in a bit about how this local training works). Once the device is done improving the model, it sends an update back to the server. This is something I'd like to emphasize: these updates are ephemeral, meaning we never store or log them; they're held only for a few minutes, until they've been aggregated, and then we erase them. That's important for privacy. They are also focused updates, because we're not shipping the data, only updates to the model parameters, and that matters for privacy too.

In a given round, many devices participate and send their updates back to the server. The server takes all these model updates and aggregates them, combining them into an improved model; you can see on the slide a combined model that is hopefully better than the initial model we started with. Here's where the engineer can try a few things: perhaps test the accuracy of the model, do some evaluation, check some metrics. Oftentimes one round is not enough; we'll see that we need many rounds. So we kick off another round: instead of starting from a randomly initialized or pre-trained copy, we use what we learned in the previous round to do another round of federated learning. Again, we send the model to devices that have checked in; they train on it locally and send back updates, and we get another combined model that's hopefully even better. We can repeat this process many, many times, possibly hundreds if not thousands of rounds of going back and forth between the server aggregating updates into a combined model and broadcasting it to a fresh set of checked-in devices, each of which updates the model locally.
I told you I'd give you some orders of magnitude, so here are some numbers. We usually run anywhere between 100 and tens of thousands of rounds to get a high-quality model; we typically need between 100 and thousands or tens of thousands of clients checked in per round to get a good combined model; and a round typically takes on the order of one to ten minutes. This is all in the context of cross-device federated learning; in the cross-silo setting you end up with different orders of magnitude, because institutions rather than phones or computers create a different dynamic altogether.

One more thing I'd like to emphasize on this slide is this back-and-forth between the service provider and the participating clients. One round of taking the model, shipping it to devices, training on-device, sending back the updates, and combining them on the server is what we call a process; the fact that we do it many times, hundreds if not thousands, makes it an iterative process, because we iterate that process many, many times. You'll hear more about the concept of an iterative process from Chris and the other presenters later today.

Now, what I described is very general: it applies to any computation you can think of on the device and any aggregation method you can think of on the server. But one very popular algorithm, which we and many others have been using, is called the federated averaging algorithm. In this algorithm, each device runs some number of stochastic gradient descent (SGD) steps, SGD being the optimization method used for the model update, on its local data; this yields an improved local model, and the device ships the resulting update back to the server. If the server then simply averages these updates within the round, we get the algorithm that has become so popular under the name federated averaging, and you can learn more about it in the paper linked on this slide.
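To make that description concrete, here is a minimal, self-contained NumPy sketch of federated averaging for a toy least-squares model. This is illustrative pseudocode for the shape of the algorithm, not TFF code or the workshop's implementation, and every name in it is made up for the example.

```python
import numpy as np

def client_update(w, x, y, lr=0.02, sgd_steps=10):
    """A few local SGD steps on the client's private (x, y); returns the weight delta."""
    w_local = w.copy()
    for _ in range(sgd_steps):
        grad = 2.0 * x.T @ (x @ w_local - y) / len(y)  # least-squares gradient
        w_local -= lr * grad
    return w_local - w  # the focused, minimal update that is sent back

def federated_averaging(clients, w, num_rounds=100):
    """clients: list of (x, y) local datasets that never leave the 'device'."""
    for _ in range(num_rounds):
        deltas = [client_update(w, x, y) for (x, y) in clients]
        # The server only sees the updates, averages them, and discards them.
        w = w + np.mean(deltas, axis=0)
    return w

# Toy run: three clients whose private data is drawn around the same true weights.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    x = rng.normal(size=(50, 2))
    clients.append((x, x @ true_w + 0.1 * rng.normal(size=50)))
w = federated_averaging(clients, np.zeros(2))
```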
We can also do far more than federated averaging: a variety of computations on device, and a variety of optimization frameworks over the data on the device, and we'll hear more about that as the tutorial progresses. One thing we can do is analytics. Instead of focusing only on training deep neural network models from decentralized data, we can compute statistics: the mean, the mode, a histogram, the frequent elements. We can do data science and data analysis on this highly distributed data, and this is now known as federated analytics; it meshes very naturally with federated learning. There's a lot more I'd like to tell you, but this is a tutorial focused on TensorFlow Federated; if you'd like to learn more about federated learning, please check out our paper on arXiv, linked on this slide. And now I'll hand off to Chris, who will tell you about TensorFlow Federated and the philosophy of its design before we move on to our presenters.

Okay, thank you, Peter. Hi. As Peter explained, federated learning is an area of active research, so there is a lot to explore, and that's why we built TFF: to explore it together. TFF is a framework for what we call federated computations: computations on data that is born decentralized and stays that way. Federated learning is just one example; we'll see others later. TFF is also designed to be a shared foundation for various research and production uses at Google, and I'll get back to this later. We use it internally at Google, but we'd like to build it with your help, to support your use cases too.

You can think of TFF as having three parts. First, there is a layer of core abstractions for building computations. On top of this, there is a layer of higher-level APIs with implementations of federated learning algorithms. And finally there are runtime and simulation components, with things like datasets and training loops. The canned APIs are really all you need to get started, and in just a few minutes Emily and Nova will show you how to run little experiments in TFF with just a few lines of code; it can be really simple. But before we get to that, let's take a small peek under the hood, a little journey down the rabbit hole, to build some intuitions that I think will be helpful during the tutorials. If anything is confusing, don't worry about it; the details are not important, and you will hear all of this again very shortly.

All right, first: what is a federated computation? In a nutshell, it's a computation where the data stays on the device where it was born. The clients potentially have sensitive data; they're anonymous, they're interchangeable, and they come and go all the time. They do all the work, but they always work together, and the server is responsible for making that happen: it helps the clients form cohorts, it aggregates updates across clients, as you saw in Peter's slides, and it carries state across rounds (you heard about those thousands of rounds) as things evolve while clients come and go.

This is perhaps best explained with an example, so here's one that is different from federated learning. Imagine a set of client devices with temperature sensors, each holding some sensitive readings. An analyst at the server wants to know what fraction of the clients have temperatures above some threshold. The data is sensitive, so you can't just go ahead and collect it. What you do instead is broadcast the threshold to the clients; now that every client has the threshold and its own temperature reading, each client can do the work, in this case computing one if its reading is above the threshold and zero otherwise. You can think of this as the map step in MapReduce, because all the clients do the same thing; they all work together. Now they each hold a one or a zero, and all that's left is to aggregate them by taking the average on the server. That's it; that's the result you want. Of course, since you have lots of clients coming and going, you may have to repeat this many times to get a more accurate estimate, but that, in a nutshell, is what a federated computation looks like, and you can recognize the properties I talked about: clients working together, server orchestration, and so on.

So why do we need a framework for this? Why not just go ahead and write it in Python, you might ask. That's indeed what we did at first, and we found it not very convenient, for a number of reasons. First, we're dealing with data that's distributed: we talk about the temperature readings as an input, but that input exists in many places, on all the clients, so you can think of it as a sort of distributed, federated float.
Another problem is that we have inputs on the clients and inputs on the server, so this computation doesn't run in just one place; it runs all over the network. It's a little distributed system, and the communication you're looking at here is not just an implementation detail; it's an integral part of the logic of your computation. To support things like this well, and make them easy to build, you need a distributed programming framework like TFF.

So here's a little example of how you can express this temperature-sensor computation in TFF. You start by writing a Python function and declare all the inputs, wherever they live, client side or server side. In the body of the function you invoke the various federated operators, communication operators just like the ones you saw on the diagram: you can recognize the broadcast, the mapping operator, and the aggregation. Meanwhile, all the work that happens on the individual devices is expressed in plain TensorFlow, since TFF is a TensorFlow framework. That's essentially what code in TFF looks like; whenever we talk about code in TFF, about expressing federated learning algorithms, this is it. Again, the details don't matter here; you'll hear all the same concepts again later from Zach, so don't worry if this sounds confusing.
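A minimal sketch of what such a computation might look like, assuming a reasonably recent TFF release; the decorators and operators (`tff.tf_computation`, `tff.federated_computation`, `tff.federated_broadcast`, `tff.federated_map`, `tff.federated_mean`) are TFF's public API, but the function itself is a reconstruction of the slide's example, not verbatim workshop code.

```python
import tensorflow as tf
import tensorflow_federated as tff

@tff.tf_computation(tf.float32, tf.float32)
def exceeds_threshold(reading, threshold):
  # Per-client work, in plain TensorFlow: 1.0 if above the threshold, else 0.0.
  return tf.cast(reading > threshold, tf.float32)

@tff.federated_computation(
    tff.type_at_clients(tf.float32),   # one sensor reading per client
    tff.type_at_server(tf.float32))    # the threshold lives on the server
def fraction_over_threshold(readings, threshold):
  # Broadcast the threshold, map the local work, and average the results.
  threshold_at_clients = tff.federated_broadcast(threshold)
  indicators = tff.federated_map(
      exceeds_threshold, (readings, threshold_at_clients))
  return tff.federated_mean(indicators)
```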
Now, what you saw looks like Python, but in fact it isn't; if you're familiar with graph mode in TensorFlow, it's basically the same idea. TFF translates this Python-looking code into a representation in TFF's internal language. What that gets you is that you write your code just once and can then deploy it anywhere: you can run it in your Colab notebook, run it on a cluster on GCP, or deploy it on device. In particular, if you have research ideas expressed in TFF, then as you move them into production you don't have to change your code, because it's the same code running anywhere you want, and that's kind of nice. It's important to keep this in mind, especially if you're used to eager mode in TensorFlow, because not just TFF but federated computations in general are fundamentally not eager: you shouldn't assume the code you're writing will run in your Colab notebook; it may be running on a cluster, or on Android devices, and that has various implications.

So when you go through the tutorials and see a bunch of code in what looks like Python, you will in fact be looking at three different languages mixed together. There will be your model, in TensorFlow; there will be federated learning algorithms in TFF's language, constructed for you if you're using the canned APIs; and finally there will be a little bit of actual Python, which is basically just the simulation logic: looping over rounds, selecting clients randomly in each, and so on. You'll see an example of this in a minute.

Before I get there, I'd like to highlight one more important property of TFF: it is a purely functional programming language, so there's really no such thing as state in TFF, no such thing as variables. State in distributed systems in general is a huge can of worms that we don't want to open, and functional programming makes it much easier. So in TFF, if you want to maintain state, you model it as both an input and an output of your computation. You can have variables in TensorFlow inside TFF, and you need them for things like gradient descent, but they behave like locals: they don't persist across calls. One example of an important functional abstraction we use a lot in TFF, and which you'll see in Emily and Nova's tutorial, is the iterative process. An iterative process is essentially just a pair of computations: one produces an initial state of some sort, and the other models a single step of processing that takes that state and produces an updated version of it. Then you just run it in a loop, and that's essentially all there is to it; you'll see examples of iterative processes for things like federated training.

Okay, back to our three languages in one example. In the tutorials, in just a few minutes, you'll start with some code in TensorFlow; next you'll be looking at code in TFF, perhaps created by calling the canned APIs, representing the federated learning algorithms you're playing with; and finally you'll see a bunch of actual Python simulation logic. First you'll request creation of the initial state of, for example, a federated training process; here Python calls the TFF runtime under the hood to execute the TFF code for you. Then you'll be looking at a little for loop that iterates over rounds of the computation, and in each round Python asks TFF to execute a federated computation that represents a single round of federated training, in which the state, including your model parameters, gets updated with newly trained model parameters, and so it continues.

As I mentioned at the beginning of this presentation, TFF gives you flexible deployment options and runtimes you can customize yourself. We won't have time to talk about this today, but the tff.framework namespace contains various components that allow you to create runtimes you can host on GCP, or even integrate your own custom backend systems into TFF and run computations there. If you're interested in exploring this, please reach out to us; we'd be more than happy to follow up and help you. Meanwhile, this is all I have. I hope you enjoy the rest of the day and all the tutorials. Thank you very much, and now let me introduce Emily and Nova, who will be presenting the first of the tutorials.

Awesome, thanks Chris, and thanks everyone for joining us for the introduction-to-TFF tutorial. If you're following along at home, we have some setup steps to do before we jump in. Let's make sure we're connected to a hosted runtime, and then make sure we have the proper pip packages installed for TensorFlow Federated and the other libraries we'll need throughout this tutorial. We can run this cell to check that everything has been set up correctly; you should see a greeting if it has.
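The setup the public TFF tutorials use looks roughly like this; a sketch, assuming a Colab environment, with the "Hello, World!" greeting cell the speaker mentions.

```python
# In a Colab cell: install or upgrade the TFF pip package.
!pip install --quiet --upgrade tensorflow-federated

import tensorflow as tf
import tensorflow_federated as tff

# Sanity check: if setup worked, this prints the greeting.
print(tff.federated_computation(lambda: 'Hello, World!')())
```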
Today I'm going to run through a variant of the federated learning for image classification tutorial. We'll be diving into the concepts Peter and Chris just introduced and playing around with the TFF Learning API, having fun with the classic MNIST training example. Here are the sections we'll cover. We've already loaded the TFF libraries; then we'll start where every fun machine learning process starts: exploring, analyzing, and preprocessing the data available to us. We'll create a model (if you're familiar with Keras, that section will look pretty familiar to you), set up a federated averaging process for training, analyze the resulting metrics and try to understand what they actually mean, build a federated evaluation computation to evaluate the model produced by federated training, and then analyze our evaluation metrics.

Awesome, let's get started. First, the data. Federated learning requires a federated dataset, that is, a collection of data from multiple users. Federated data is typically non-IID: users typically have different distributions of data depending on usage patterns. One thing that's really cool with TFF is that we've seeded the repository with a few federated datasets, including the one we'll use today, a federated version of the MNIST dataset. It contains a version of the original NIST dataset that has been reprocessed using LEAF so that the data is keyed by the original writer of the digits. Because every writer has their own unique style, this dataset exhibits the kind of non-IID behavior that's typical of federated datasets. One thing I like to call out: in a production federated environment, it would be much more challenging to work with federated data, because you don't have access to the data itself, so you wouldn't be able to do the analysis we're about to do. Because we're working in a simulation environment, all the data is available to us locally, so we can go ahead and explore it.

Let's see how to load the sample dataset, both train and test splits, from the TFF repository; we select the EMNIST dataset and use its load_data function. Running that cell downloads the data. The object returned by load_data is an instance of TFF's simulation ClientData interface, which allows us to enumerate the set of users and to construct a tf.data.Dataset that represents a single user's data. Keep in mind that this interface lets us iterate over client IDs only because this is simulation data: as we'll see shortly, client identities are not used by the federated learning framework itself; their only purpose here is to let us select subsets of the data. If we take the length of our training client list, we see we have around 3,000 clients. We can explore the shape of the data by selecting one client's dataset and looking at its element_spec, and we see what we'd expect of an MNIST entry: the label the image maps to, and the pixels of a 28-by-28 image.
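A sketch of those loading and inspection steps, following the public image-classification tutorial's API usage:

```python
import tensorflow_federated as tff

# Download the federated EMNIST data (MNIST digits, keyed by writer).
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()

print(len(emnist_train.client_ids))  # roughly 3,000 simulated clients

# Build the tf.data.Dataset for a single simulated client and inspect its shape.
example_dataset = emnist_train.create_tf_dataset_for_client(
    emnist_train.client_ids[0])
print(example_dataset.element_spec)  # {'label': scalar, 'pixels': (28, 28)}
```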
If you're familiar with tf.data datasets, this should look pretty familiar: we get an example element by creating an iterator. Let's look at the label of the element we've selected, just because I'm curious what digit we got from our first client; it looks like we've selected a five. And let's plot the actual pixels of that five, so we can see what the training data looks like.

In this next section we're going to explore the non-IID behavior of the federated data; use this time to ask any questions on the Dory or to get caught up. First, let's look at a selection of MNIST digits from one client. Because the data is keyed by user, this is one user who wrote out all of these digits, and here's a nice little sampling of 40 of them. We can also look at the number of examples each client has for each label. This gives us an idea that each client has a different distribution of the MNIST digits available to them: some clients have less data, some have fewer examples for particular labels; client two, for instance, has fewer examples than client zero or client one. Beyond that, the clients' training examples themselves differ. If we run this cell, we plot the mean image per label for each client. Looking at it, we can see that client zero, on average, draws their twos like this, so in the local training step client zero will be teaching the model that twos look like this, whereas client two draws their twos much bigger and a little less neatly, and will nudge the model in a slightly different direction as to what a two is. It's neat to see how the local training process trains on each client's data locally; we'll see later how all these updates get combined into the global model.

Awesome, so now let's actually preprocess this data. Since the data is already available to us as a tf.data.Dataset, preprocessing can be accomplished using the typical dataset transformations you're used to (I've provided a link to those transformations if you want more information). Here we flatten the 28-by-28 images into 784-element arrays, shuffle the individual examples, organize them into batches, and rename the features from pixels and label to x and y so that they work with Keras; we've also thrown in a repeat so we can run multiple epochs over the data. Let's run that cell to build our preprocessing function, and verify it worked by applying it to one of our example datasets and getting a sample batch: our images are now flattened arrays, still with their labels, and the features have been renamed.
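A sketch of that preprocessing function, modeled on the public tutorial; the constants (epochs, batch and shuffle sizes) are illustrative choices, not necessarily the workshop's exact values.

```python
import collections
import tensorflow as tf

NUM_EPOCHS = 5        # illustrative values; tune as you like
BATCH_SIZE = 20
SHUFFLE_BUFFER = 100

def preprocess(dataset):
  def batch_format_fn(element):
    # Flatten 28x28 pixels to 784 values and rename features for Keras.
    return collections.OrderedDict(
        x=tf.reshape(element['pixels'], [-1, 784]),
        y=tf.reshape(element['label'], [-1, 1]))
  return dataset.repeat(NUM_EPOCHS).shuffle(SHUFFLE_BUFFER).batch(
      BATCH_SIZE).map(batch_format_fn)

preprocessed_example_dataset = preprocess(example_dataset)
```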
Given that this is a TFF simulation, how do we actually feed this data into it? One way is to feed the federated data to TFF as a list of tf.data datasets, where each element of the list corresponds to one simulated client's data. Because this is simulation and all the data is available to us locally, we can create a simple helper function that preprocesses each client's dataset and returns the resulting list.

The next important part of federated training and federated evaluation is choosing our clients. In a typical federated training scenario we're dealing with a potentially very large population of user devices, only a fraction of which is available for training at a given point. This is the case, for example, for mobile phones, which participate in training only when they're plugged in, idle, and on an unmetered network. Of course, since we're in a simulated environment, all of our data is available locally, so when running a simulation we would typically sample a random subset of clients each round, different clients each round. That being said, here we're only going to sample the clients once, because this is a tutorial and we want to encourage a faster convergence rate; in the next tutorial, Nova will show you how to simulate random sampling for each round. So we just grab the first 10 clients from our training dataset, and then get the data into the proper form for consumption by our TFF simulation by calling our make_federated_data helper with the training dataset and those clients. If we run this, we can see we've selected 10 clients, and their datasets have been preprocessed in the way we just defined.

Awesome. Now let's create the model, with Keras, that we'll use for federated training and federated evaluation. If you're already familiar with Keras, this should look familiar. One really nifty thing about the TFF Learning API is that you can take a Keras model you already have, defined for centralized training or used in other applications, and just plug it into this framework, and it will construct a federated training or federated evaluation computation for you. Our Keras model has an input layer, one dense layer, and a softmax that gives us the prediction for image classification. Let's also take a brief look at what centralized training looks like with Keras (again, a great time to ask questions on the Dory or catch up). Keras has its own MNIST dataset, similar to how TensorFlow Federated has its federated one: we select our training and test datasets from it, preprocess similarly (reshaping, casting to float32), use the same model we'll use for federated training, attach an optimizer, loss, and metrics the centralized way, and run two epochs over the training data, just to get a feel for how that API works.
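Sketches of the helper, client selection, and model just described, again following the public tutorial's shape; the zero initializer and layer sizes are that tutorial's choices rather than anything specific to this talk.

```python
def make_federated_data(client_data, client_ids):
  # One preprocessed tf.data.Dataset per simulated client.
  return [
      preprocess(client_data.create_tf_dataset_for_client(c))
      for c in client_ids
  ]

# Sample once for tutorial simplicity; real simulations resample every round.
sample_clients = emnist_train.client_ids[0:10]
federated_train_data = make_federated_data(emnist_train, sample_clients)

def create_keras_model():
  return tf.keras.models.Sequential([
      tf.keras.layers.InputLayer(input_shape=(784,)),
      tf.keras.layers.Dense(10, kernel_initializer='zeros'),
      tf.keras.layers.Softmax(),
  ])
```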
So now that we've done centralized training, let's see what this looks like with TFF. In order to use a model with TFF, it has to be wrapped as a tff.learning.Model, which exposes methods to stamp out the model's forward pass, metadata properties, and so on, and also provides ways of controlling the process of computing federated metrics. Let's not worry too much about that model API for now: if you have a Keras model, like the one I went over before, you can have TFF wrap it for you by invoking tff.learning.from_keras_model. You pass in the model you create, plus a sample batch or an input spec to give TFF an idea of what the data it will consume looks like; you attach your loss there, and add any metrics you'd like to see. So let's create our model-constructor function.

And now, the part we've all been waiting for: federated training in TFF. With our model wrapped as a tff.learning.Model, we can let TFF construct the federated averaging algorithm you learned about from Peter and Chris earlier. Keep in mind that the argument here is a constructor function, not the actual model itself; this is so that TFF can construct the model in different contexts that it controls. Another critical note: the federated averaging algorithm actually needs two optimizers, a client optimizer and a server optimizer. The client optimizer is only used to compute local model updates on each client; the server optimizer applies the averaged update to the global model at the server. In particular, this means the choice of optimizers and learning rates may be different from the ones you would use on a standard IID dataset: we recommend starting with regular SGD, maybe with a smaller learning rate than usual. If all of this is super interesting to you, stick around for Zach's tutorial, where he'll dive more into depth. So here's our iterative process: we use the Learning API's build_federated_averaging_process, pass in the model function that wraps our Keras model for TFF, pass our client optimizer, and define our own server optimizer function, using a Keras optimizer with a learning rate we pick here. Awesome, let's construct our iterative process.

So what just happened? Chris gave a brief overview of what an iterative process is, but I'll cover it again here. An iterative process has two computations, an initialize function and a next function, and you can think of it as being driven by a control loop: initialize creates the initial server state, and then we can use that state in a for loop to run multiple training rounds via next. We'll get more into that in a little bit. First, let's invoke the initialize computation to construct the initial server state, which contains things like the initial state of the model and the initial state of the optimizers.
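A sketch of those steps, matching the public tutorial's usage of a 2020-era TFF release (newer releases move this functionality under tff.learning.algorithms); the learning rates are that tutorial's suggested starting points.

```python
def model_fn():
  # Must construct a fresh model inside: TFF calls this in contexts it controls.
  keras_model = create_keras_model()
  return tff.learning.from_keras_model(
      keras_model,
      input_spec=preprocessed_example_dataset.element_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

state = iterative_process.initialize()
```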
The second computation in the pair is the next method, which represents a single round of our federated averaging process. This consists of pushing the server state, including the model parameters, to the clients; on-device training using each client's own local data; and then collecting and averaging the model updates back together to produce a newly updated model at the server, one that has learned from all of those clients' different usage patterns. Let's see what running a single round looks like, and visualize the results: we call next on our iterative process, and we see that round one produced a certain accuracy and a certain loss. That's pretty cool, so let's run several more rounds; here's what the control loop looks like. We'll run 11 more rounds, and again, for the sake of demonstration, we reuse the same clients from earlier to encourage faster convergence; in Nova's tutorial you'll see how to actually select different data for each round. Cool: we see accuracy going up, like you always want, and loss going down. Nifty. Later on, in evaluation, we'll see some important caveats about what these training metrics actually mean, but for now take them as an indication that training is happening.

Because just printing our output isn't as visually stunning as a nice plot, let's rerun the control loop and introduce TensorBoard into the mix so we can visualize our metrics in a cleaner way. If you're familiar with TensorBoard, this should look familiar: we create a summary file writer, passing it the temporary log directory I just created, create our initial server state again, and within the control loop pass the metrics to tf.summary.scalar so they get written out to that log directory. While that's running, let me start up a TensorBoard instance to view the metrics we just produced: we can see the loss going down and the accuracy climbing, which is pretty nifty. (To view evaluation metrics the same way, you would do the same with the evaluation metrics computation.)
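A sketch of that control loop with TensorBoard logging, following the public tutorial; note that the structure of the metrics object varies across TFF versions (here it's assumed to carry a 'train' entry), and the log directory is arbitrary.

```python
NUM_ROUNDS = 11
logdir = '/tmp/logs/scalars/training/'  # arbitrary temporary directory
summary_writer = tf.summary.create_file_writer(logdir)

state = iterative_process.initialize()
with summary_writer.as_default():
  for round_num in range(1, NUM_ROUNDS + 1):
    state, metrics = iterative_process.next(state, federated_train_data)
    print('round {:2d}, metrics={}'.format(round_num, metrics))
    for name, value in metrics['train'].items():
      # Write each training metric (e.g. loss, accuracy) for this round.
      tf.summary.scalar(name, value, step=round_num)
```

In Colab you would then load the extension with `%load_ext tensorboard` and point it at the logs with `%tensorboard --logdir /tmp/logs/scalars/`.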
Awesome. So how is our trained model actually performing? In all of our experiments so far we've only presented federated training metrics, which represent the average metrics over the batches of data trained across all clients in the round. This introduces the normal concerns about overfitting, especially since we used the same set of clients for each round for simplicity, but there's also an additional notion of overfitting in training metrics that is specific to the federated averaging algorithm. The easiest way to understand it is to imagine that each client has only a single batch of data, and we train on that batch for many iterations (multiple epochs). In this case the local model will quickly fit exactly to that one batch, and the local accuracy reported for that training round for that client will approach one. So the training metrics can only be taken as a sign that training is progressing, not as a measure of your model's actual performance. To understand how your model is really performing, one way is federated evaluation; Nova will show you how to do centralized evaluation of your model later.

So here we can construct our federated evaluation computation: let's call it evaluation. We're using the Learning API again, which has a nifty build_federated_evaluation function, and we supply our model constructor again; we're using the same Keras model, but this time TFF constructs a federated evaluation computation. Let's run that, and now prepare our test dataset. We select a different subset of clients to perform evaluation on, and use our same helper function to get the data into the format we need to feed into a TFF simulation; again we've selected 10 clients and preprocessed their datasets into the form we want to consume. So let's run an evaluation to get test metrics back: we call the evaluation computation we just built, passing the model produced by federated training, which represents the final updated model from all those rounds, together with our federated test data. Running the cell, we can look at the resulting test metrics, which give us a much better idea of how the model is actually performing: here is our accuracy, and here is our loss.
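A sketch of that evaluation step, assuming the same 2020-era API; `sample_clients_test` stands in for whatever held-out client IDs you choose.

```python
evaluation = tff.learning.build_federated_evaluation(model_fn)

# Evaluate on a different subset of clients, drawn from the test split.
sample_clients_test = emnist_test.client_ids[0:10]
federated_test_data = make_federated_data(emnist_test, sample_clients_test)

test_metrics = evaluation(state.model, federated_test_data)
print(test_metrics)
```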
That concludes this tutorial. If all of this was interesting, we encourage you to play around with the hyperparameters (batch size, number of users, epochs, learning rate, and so on) and to modify the code above to improve on the accuracy and loss we're seeing here. Before I switch off to Nova, let's go to the Dory and see if there are any questions.

Cool. It looks like the Colab notebooks are a bit different: I believe we'll be supplying these tutorials to you, and if you go to the TensorFlow Federated website you'll see tutorials you can run through if you're having trouble with the notebook here.

Is transformation a part of the iterative process, and does each device need to run the transformation outside of the TF graph? I think this refers to dataset transformations; because TFF currently works in simulation, we can preprocess the data before running training, so this actually occurs before the iterative process.

You mentioned we need two optimizers, one for the server and one for the client: is the client optimizer shared among all the clients, or is there a separate optimizer for each client? Each client receives its own copy, so essentially there is a separate optimizer per client. The same question also asks what operation should be done on the optimizers when averaging the weights on the server. I'm not sure I understand that part fully, but you can think of averaging the weights back on the server as completing the federated averaging process, where you take all the updates you've received from the clients and average them back together. And a note from one of my fellow presenters: on the website where you got the links to the Colabs, the user copy is incomplete, and the full copy has everything you saw today filled in.

Awesome. How could you deal with different preprocessing coming from clients that have different devices, for example hospitals that could send different MRI images from different device models? That's a really good question. In your preprocessing step you could detect data that arrives in different forms and apply different transformations depending on the example's type. But again, note that this is a simulation environment, so all the data is available to you locally.

Is the federated averaging algorithm customizable, and what are the levers we have? This is a really great question, and I'll refer you to Zach's tutorial later; I know that's not much of an answer now, but it's foreshadowing what you'll see later in this tutorial series.

Sorry if I'm jumping ahead, but are the data and model updates encrypted? This is a really awesome question, and it's genuinely important for user privacy. Yes, there are different techniques to make these model updates private and ensure the data stays private on each device: things like secure aggregation, differential privacy, and so on. I'd refer you to the TensorFlow Federated website, which has more about this.

What's the roadmap for non-simulation, on-device data, since that's where production apps will begin to make an impact? A really great question; stay tuned for more, and check in on the TensorFlow Federated website.

Should we standardize the data, with location-scale transformations for instance, and how does that work? Again, the data preprocessing step is all in your hands; you can play around with it in simulation and see what works for you, and we offer these libraries so you can test that out.

I'm not sure how much time I have before passing off to Nova; maybe one more question. Another topic of interest: users at the edge may subjectively label data in contradicting ways; is there any way to detect that and potentially split the clients into groups that collaborate, versus groups that cancel out each other's work? Remember that in a real federated production environment you wouldn't have access to the users' data, so you wouldn't really be able to see this happening directly. You can use techniques like federated analytics to assess different qualities of your data, and I think there could also be different approaches in dataset preprocessing. Awesome; thanks everyone for tuning in to this image classification tutorial. Let me pass it off to Nova, who'll be presenting the text generation TFF tutorial.

Hi everyone, and thank you so much, Emily, for introducing TFF. In this tutorial we'll expand on what Emily showed you by applying federated learning to a different problem: text generation. One thing to note is that you'll be continuing in the same Colab notebook, so if you're already connected you won't have to rerun these steps, but if you had a problem, make sure to do the same beginning steps: upgrade TensorFlow Federated and then check that TFF works by running the greeting cell.
With this tutorial we'll be learning a few new things about TFF (sorry for the scrolling). In addition to trying a new application of federated learning, we'll be interspersing centralized training and federated training. This is actually very useful: you might have a model that is already trained, and you want to refine it with federated training so that the model matches the data at the edge rather than the centralized data you initially trained it on. You might also want to intersperse federated training with centralized evaluation, in case you have a held-out test dataset in a centralized location, and we'll show you how to do that as well. Finally, if you have your own data, in this example loaded from a CSV file, and you want to use it with TFF, we'll show you how to do that too.

So in this tutorial we will load a model that was trained on centralized data, in this case Charles Dickens text, and fine-tune it with Shakespeare data. We can look at what the model generates right now, and the text it generates looks a lot more like a Charles Dickens novel than a Shakespeare play. Now we'll load a Shakespeare dataset from the TFF repo, which is available for anyone to load. But in a different situation you might have your own data that you want to use with TFF, so in this tutorial we'll demonstrate that you can load your own CSV file into TFF: we'll write the data from the Shakespeare dataset out to a CSV file, download it, and then try re-uploading it again.

While we write the data out to CSV, we can learn a bit about what the dataset looks like. In this dataset, each client is a Shakespeare character. If we print the CSV, we see two columns, character and snippets: character indicates one character from a Shakespeare play, and a snippet is a line that character spoke. In this case we have Adam, from As You Like It, and some lines Adam spoke. One way to think about this is that each character is typing out their lines on their phone, so Adam is typing out "Yonder comes my master, your brother" on his phone, and we want to train our model to learn from the lines these characters are typing.

Okay, so now, to get the data on our computer so we can show how we would actually upload it, we download the CSV files we just created (one thing to note: if you're following along, make sure you allow Colab to download multiple files), and then re-upload the files, choosing them from your computer. While we're uploading, let's talk about how we're going to load these into a TFF dataset. We're going to call tff.simulation.ClientData.from_clients_and_fn, and we need to provide two things. The first is the client IDs. The way we get these is that we first read our CSV file, here using pandas (you don't have to use pandas, but it's a convenient way to read the CSV and process the data), and then collect all the unique character names from the character column, because these are going to be our client IDs.
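A sketch of that first step, with a hypothetical file name standing in for the re-uploaded CSV:

```python
import pandas as pd

# Hypothetical path to the re-uploaded file.
shakespeare_df = pd.read_csv('shakespeare_train.csv')

# Each unique character becomes one simulated client.
client_ids = shakespeare_df['character'].unique()
```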
That's what we provide for the first parameter, client_ids. The next thing we have to provide is the create_tf_dataset_for_client_fn: a function that maps from a single client ID to a TensorFlow dataset for that client. The way we build it: given a client ID, we look in our pandas DataFrame and retrieve only the rows corresponding to that client ID (remember, a client ID here is a Shakespeare character), and we filter out any rows that don't have a snippet. Then we select just the data columns, because we don't want the client ID column in the TensorFlow dataset. One thing to note: in this dataset we only have a single data column, snippets, but your dataset might have multiple data columns, say several features and a label; you can extend this approach to a dataset with multiple columns with a few more modifications to this code. Then we convert our pandas DataFrame into dictionary format, where the column name is the key and the column contents are the values, and use a generator that yields each row of this list of dictionaries. Finally we call tf.data.Dataset.from_generator and provide this generator. One thing we have to do here is explicitly specify the output types and output shapes: because the generator yields dictionaries rather than tensors, TensorFlow needs to be told explicitly what the output types and shapes are.
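A sketch of that create function and the final ClientData construction, assuming a 2020-era TFF release where `tff.simulation.ClientData.from_clients_and_fn` is available; for simplicity this version uses `from_tensor_slices` instead of the generator-plus-explicit-shapes approach described above, which avoids having to spell out output types and shapes.

```python
import collections
import tensorflow as tf
import tensorflow_federated as tff

def create_tf_dataset_for_client_fn(client_id):
  # Keep only this character's rows, dropping rows with no snippet.
  client_rows = shakespeare_df[shakespeare_df['character'] == client_id]
  client_rows = client_rows.dropna(subset=['snippets'])
  # Keep only the data column: the client-id column stays out of the dataset.
  snippets = client_rows['snippets'].astype(str).tolist()
  return tf.data.Dataset.from_tensor_slices(
      collections.OrderedDict(snippets=snippets))

train_data = tff.simulation.ClientData.from_clients_and_fn(
    client_ids, create_tf_dataset_for_client_fn)
```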
so in this section we're going to talk a bit about pre-processing the data. there are no new tff concepts in this particular section, so it's a fine time to take a break and ask a question on the dory, for example — but we will need the code we're writing here to run the next section, so if you do take a break, just make sure you run all these cells before coming back and trying to run the next one. what we'll do is use some tensorflow data set transformations to prepare our data for training. the first thing is to map our string snippets to tensors of integers, where each integer is the index of a character in the vocabulary we defined above. so the pre-processing steps are: first map the characters to ids as we discussed, then split the snippets into individual characters, and then form sequences of length sequence length plus one — we want a consistent sequence length here, so we batch into sequences of consistent length, which is one hundred. then we shuffle and form mini-batches, and finally we split these sequences into an input and a target: the input has the first n characters, where n is the sequence length, and the target has the last n characters. the reason we do that is because, given characters 0 through n, we want to be able to predict character n plus one. another thing to note: if you have trouble following any of these steps related to the model itself, all of this is based on a tensorflow tutorial about text generation, so you can refer to that tutorial if you want to dive deeper into the details of the model — it's linked in this colab. i'm not going too deeply into this because we want to get to the section that's specific to tff.

okay, so if we preprocess the example data set we made earlier, what are the types? we have a tuple — remember, the tuple is because we have the input and the target — and each of them has shape 8 by 100, where 8 is the batch size and 100 is the sequence length. cool. one additional thing we're going to do is define a new metric: we want to count character-level accuracy, which is the fraction of predictions where the highest probability is actually put on the correct next character. we're defining it as a new class rather than just using keras sparse categorical accuracy because we need to flatten our tensor from rank 3 to rank 2 — the rank 3 is batch size times sequence length times vocabulary size. finally we can compile a keras model and evaluate it on our example data set. what you'll see is that our model, which was trained on the charles dickens data, still performs much better than random guessing when evaluated on shakespeare. there's also an evaluation on completely random data — we just generate a sequence of random characters and evaluate on that — and as expected it performs close to the expected accuracy for random guessing, while evaluating on shakespeare data performs much better. so we already have an advantage, but we still want to tune the model to really fit the distribution of the shakespeare data.
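a condensed sketch of the pre-processing pipeline and the flattened accuracy metric just described, following the tensorflow federated text generation tutorial this section is based on; vocab is the character vocabulary defined earlier in the colab, and the buffer size is an assumption:

```python
SEQ_LENGTH = 100
BATCH_SIZE = 8
BUFFER_SIZE = 100  # shuffle buffer size; pick what fits your data

# lookup table mapping each vocabulary character to its integer id
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=vocab,
        values=tf.constant(list(range(len(vocab))), dtype=tf.int64)),
    default_value=0)

def to_ids(x):
  s = tf.reshape(x['snippets'], shape=[1])
  chars = tf.strings.bytes_split(s).values          # split into characters
  return table.lookup(chars)                        # characters -> ids

def split_input_target(chunk):
  input_text = tf.map_fn(lambda x: x[:-1], chunk)   # first n characters
  target_text = tf.map_fn(lambda x: x[1:], chunk)   # last n characters
  return (input_text, target_text)

def preprocess(dataset):
  return (dataset.map(to_ids)
          .unbatch()
          .batch(SEQ_LENGTH + 1, drop_remainder=True)
          .shuffle(BUFFER_SIZE)
          .batch(BATCH_SIZE, drop_remainder=True)
          .map(split_input_target))

class FlattenedCategoricalAccuracy(tf.keras.metrics.SparseCategoricalAccuracy):
  """flattens rank-3 (batch x sequence x vocab) predictions to rank 2."""

  def __init__(self, name='accuracy', dtype=tf.float32):
    super().__init__(name, dtype=dtype)

  def update_state(self, y_true, y_pred, sample_weight=None):
    y_true = tf.reshape(y_true, [-1, 1])
    y_pred = tf.reshape(y_pred, [-1, len(vocab), 1])
    return super().update_state(y_true, y_pred, sample_weight)
```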
awesome, so now we'll start talking more about tff. if you were taking a break to ask a question or anything like that, this would be a great time to come back and run the previous cells so you're caught up, and then we can continue. as emily mentioned in her tutorial, we need to provide this create_tff_model function, and any model we create needs to be created within this function, because it needs to be serialized so that it can run in a potentially non-python environment. in emily's tutorial you define the entire model within this function, but in this tutorial we already have a model, so what we need to do is just clone the model and then provide that to the tff.learning.from_keras_model function. so first we pass the keras model clone, and the next thing we provide is the input spec. this is the way of telling tff what our input data looks like, and we can get it from our example data set, which is a tf data set, via its element spec — this is just a description for tff of what it should expect as input data. then the next thing is we provide the loss, which is the same loss we provided when we compiled the keras model above — sparse categorical cross-entropy with from_logits equal to true, because our model outputs a raw score for each element in the vocabulary. and we finally provide the metric we created above in the list of metrics: flattened categorical accuracy.

so now that we have our function to create the tff model, we're ready to construct our iterative process, and emily described iterative processes in more detail above. we'll again use a federated averaging process, and in this case we only provided the client optimizer function. that doesn't mean there's no server optimizer — it just means it's the default; if you wanted to provide a different server optimizer function, that would work too. then what we want to do next is create a function to initialize the state of the federated computation. first we call initialize on our iterative process, but this will only initialize the model to random initializers, not the pre-trained weights we have, because clone_model doesn't clone the weights. so in order to start the training from a pre-trained model, we actually need to set the model weights directly from the keras model. we'll use the state_with_new_model_weights function: we provide the state that our federated computation uses to keep track of all the model parameters, and then we need to provide both the trainable weights and the non-trainable weights. our model might have non-trainable weights, which are part of the model but which we never want to change, so tff needs to know which parts of our model are trainable and which are non-trainable — we just need to provide these separately. and then the last thing we need to do is return the state. okay, cool.

so we talked a bit earlier about how, as an alternative to federated evaluation, we might want to mix federated training with centralized evaluation, if we have some held-out test data set in a centralized location. first, to simulate this, we'll create a data set for centralized evaluation by concatenating some of our test data sets, for evaluation with keras. now we can write our function to perform centralized evaluation. to do this, we first compile a keras model using the same loss and metrics, and then we need to set the weights from our tff model onto the keras model. we can use the function assign_weights_to_keras_model: we pass in our keras model as the first parameter, and the second parameter is state.model — this is the model that tff has been updating. then we compute loss and accuracy by using this keras model to evaluate on our test data set.
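pulling those pieces together, here's a hedged sketch of the fine-tuning wiring; the client learning rate is a placeholder, init_state and keras_evaluate are illustrative names, and state_with_new_model_weights / assign_weights_to_keras_model are the helpers named above as they existed in the tff version used here:

```python
def create_tff_model():
  # clone_model copies the architecture but NOT the weights
  keras_model_clone = tf.keras.models.clone_model(keras_model)
  return tff.learning.from_keras_model(
      keras_model_clone,
      input_spec=example_dataset.element_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[FlattenedCategoricalAccuracy()])

fed_avg = tff.learning.build_federated_averaging_process(
    model_fn=create_tff_model,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.5))

def init_state():
  state = fed_avg.initialize()
  # start from the pre-trained dickens weights rather than random initializers
  return tff.learning.state_with_new_model_weights(
      state,
      trainable_weights=[v.numpy() for v in keras_model.trainable_weights],
      non_trainable_weights=[v.numpy() for v in keras_model.non_trainable_weights])

def keras_evaluate(state, round_num):
  # centralized evaluation: push the tff-trained weights back into keras
  keras_model.compile(
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[FlattenedCategoricalAccuracy()])
  tff.learning.assign_weights_to_keras_model(keras_model, state.model)
  loss, accuracy = keras_model.evaluate(test_dataset, verbose=0)
  print(f'round {round_num}: loss={loss:.3f}, accuracy={accuracy:.3f}')
```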
awesome. so in emily's tutorial, when we ran federated learning we just chose the same few clients and used them in every single round of training, and there are a few reasons you might want to do that. one: in the tutorial setting, emily did it because reusing the same clients achieves convergence faster. but you might also be trying to simulate a cross-silo federated learning setting, which peter discussed — in the cross-silo setting there are a few clients, hospitals for example, and they're highly available. but if you want to simulate a cross-device setting, where you're simulating phones that can only participate in federated learning when they're plugged in, on wi-fi, and idle, then the way to simulate that is through random sampling. because we're in the simulation environment and the data is locally available, we can simply use random sampling here. so let's try writing our function to get the data, using random sampling. we can use numpy: we take the list of clients that's passed in — excuse the bubbles that keep popping up — and we're going to choose three clients each round to train on, sampling without replacement, so we set replace to false. and so that we can see which clients we're training on each round, let's print them, and then we return the data for each of these clients — we had a function below that computes the data.

okay, so now that we have our function, we can define our training loop, and i'll start the training now while we discuss it, because it'll take a little while. first we have the list of clients we'll be sampling from without replacement. then we call our init_state function, which we defined above, and which initializes the model with the weights from centralized training. now we iterate for 10 rounds of training, and in each round we evaluate using the centralized keras evaluation function we defined above, and then run a round of training using the next function on our iterative process. this actually pushes the model out to the clients, the clients run their updates, and then federated averaging combines all those client updates. when we call this, we provide the sample-clients-without-replacement function so that we train on different clients each round, and we'll finally run a round of evaluation at the end to see what the final performance of our model ends up being.

so you can see that training has begun. one thing to note — and this is something interesting about federated learning that you might have to deal with in a real-world setting — is that in this case we had a loss and accuracy of zero. why might that be? if you look closely at our preprocessing function above, if clients did not have sequences long enough to match the sequence length, their data was dropped. these are all minor characters from shakespeare, so they didn't have enough data to even participate. this can introduce bias in federated learning, because there may be clients that only have a small amount of data, and if you don't want bias in your model, you want to be able to use that data as well. one thing we could do, rather than dropping characters that don't have enough data to match the sequence length, is to add a special padding token to the end of their sequences to bring them up to the sequence length, and then modify our loss function to take this special padding token into account. we just didn't do that in this tutorial in the interest of time, but it could be an interesting extension if you want to try it on your own.

so as we can see, the evaluation loss is going down over time. but what you might want to do is experiment with other optimizers. we just used plain stochastic gradient descent on the clients in the training we just ran, but we could try adding momentum — and it's very easy to try different optimizers in tff, because all you have to do is provide a different optimizer to these functions, client optimizer function and server optimizer function.
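a minimal sketch of the sampling function and training loop just described; the helper names are illustrative, and fed_avg, init_state, keras_evaluate and preprocess are the pieces sketched earlier:

```python
import numpy as np

NUM_CLIENTS_PER_ROUND = 3
NUM_ROUNDS = 10

def data(client_id):
  # illustrative helper: preprocess one client's dataset
  return preprocess(shakespeare_train.create_tf_dataset_for_client(client_id))

def sample_clients_without_replacement(client_ids):
  sampled = np.random.choice(
      client_ids, size=NUM_CLIENTS_PER_ROUND, replace=False)
  print('sampled clients:', sampled)
  return [data(c) for c in sampled]

state = init_state()
for round_num in range(NUM_ROUNDS):
  keras_evaluate(state, round_num)  # centralized evaluation before the round
  federated_data = sample_clients_without_replacement(
      shakespeare_train.client_ids)
  state, metrics = fed_avg.next(state, federated_data)
  print(f'round {round_num}: {metrics}')
keras_evaluate(state, NUM_ROUNDS)   # final evaluation
```

and, as discussed next, swapping optimizers is just a matter of passing something like client_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=1e-3) when building the process (ideally re-tuning the server optimizer alongside it).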
so on an nlp data set such as shakespeare, we might want to try an adaptive optimizer such as adam. instead of using sgd at all, we could provide adam as the client optimizer. we could try running this again, but in the interest of time i'll leave that to you — you can run the training with these different optimizers on your own. one thing to note, though: here we're only providing a different client optimizer, but when you're experimenting with optimizers for federated learning, it's important that you also tune the server optimizer at the same time, because the client optimizer and server optimizer interact with each other, and if you tune just one at a time you might not end up with the optimal learning rates.

we only ran 10 rounds of training in our loop above, and achieving convergence with random sampling might take much longer than that — if you wanted to reach convergence you might want to run more like hundreds of rounds, which would take much too long for this interactive tutorial. if you did train longer on more shakespeare data, however, you would see a difference in the style of the generated text. here we can see that we can set our trained weights back into a keras model, which we can then use to generate text. we already have a keras model with the weights from federated training — the reason it already has those weights is that we used it for evaluation — and now we use it to generate text. right now this still looks more like charles dickens, i would say, but if you were to run more rounds of federated training you should see it looking much more like shakespeare.

great, okay, i think we can go ahead and take some questions now. "how do we specify a client id, like the ip or url where the client is running or where the client data is present? in the notebooks we only have a simulation, with the data on the same machine, so we're not communicating with a remote machine — how do we take the understanding from these notebooks to production?" yeah, i think this is a good question. in this tutorial we were able to select a particular client id and look at its data, and something you should know is that in production you wouldn't be able to do this: the clients are anonymous in production, and you can't look at their individual data. the reason we make this possible in simulation is that it can be useful for debugging, and that's something you'd want to iron out before running a federated learning algorithm in production. "i like the concept of shakespeare characters chatting on phones" — yeah, i think that's fun, i kind of want to see a picture of that. "do we have some examples of how to perform federated learning in tensorflow.js?" i don't know of anything like that, but if anybody from the tensorflow or tensorflow federated teams knows, feel free to chime in. "hi, this is keith — i think we do not. architecturally it could be supported, but it's not something we've really pushed on." cool. "how do the local and server optimizers interact?" so this is getting more into research territory. there are actually some recent papers — zac charles just put out a paper on arxiv — about the way the local and server optimizers interact, to not only achieve faster convergence but sometimes affect where the model ends up converging.
there's a lot to unpack here, so i don't know that i can answer it quickly, but there are a lot of interesting papers out there that you can read. "i guess simulation is used to create data shards locally, but that's not the case for distributed data — will the package change to access data remotely?" so right now we can do simulation using tff. if in the future we support a production deployment, tff as a library would look the same, because tff has been designed to work as a language, and the way you use it would be the same if we were to support actually running on clients — however, we don't support that in a production environment right now. "how does the server determine which clients to accept?" this depends a lot on the clients themselves. first of all, as we mentioned, the clients have to be plugged in, idle, and on wi-fi or an unmetered network, and then the clients check in to the server. the server can control when clients check in by giving them a window in which to return, so that it can control the rate at which clients are checking in and ensure it doesn't get overloaded — and at the same time ensure that, if it has a small number of clients, enough clients can check in at once to actually run a round of training and have enough clients rendezvous. "when can we expect a production-deployable version of tff?" i can't give you any information on this — i don't know if anybody from the tff team wants to address it. "yeah, i don't think i can either; i don't know that anybody can." that's it, that's all i've got. "it'd be interesting to know how cross-device training would work in practice — i'm thinking, if devices are likely idle at night, do you batch training across time zones, for example?" yeah, this is a really good question, because it's very true that devices are idle at night — people plug in their phones while they're sleeping. one thing this can introduce is bias, because certain rounds of training will use data from a similar time zone; even if you don't try to batch the training across time zones, that ends up happening just by the nature of when clients are checking in. so when you're determining how to train your model, this is something you might need to take into account — make sure you're getting a wide spectrum of data from different time zones, so that you aren't learning on just one time zone and biasing your model. "how can you ensure that clients are converging at a good pace, since you would always be working on simulated data — are such metrics published from clients?" so i think in this case you can use the training loss and training metrics, because clients do publish metrics — whatever metrics you define in your tff model. we showed that we provided the flattened categorical accuracy metric, but you could provide other metrics as well, and these metrics are averaged on the server, so what you see is the average of the training loss from all the clients, or the average of whatever other metrics you chose. so yeah, i think that's how you can keep an eye on whether clients are converging. "can you elaborate on tuning the server and client optimizers together?" all i mean here is that if you tune the client optimizer by itself, pick a particular learning rate, and then tune the server optimizer later and pick a
learning rate for that, then those two learning rates you end up picking would not necessarily be as good as if you were to try combinations of different server and client learning rates. "could you dive a little deeper into the relationship between client and server optimizers?" yeah, there have been a couple of questions about this — maybe zac charles has more information in his tutorial, or wants to jump in himself. "yeah, i can jump in. we've posted links to some papers that discuss it, but in general what you want to think is that the client optimizer works a little like it does in centralized learning: it actually learns from data on the device. the server optimizer does something a little bit different: it takes all the client updates, averages them, and then treats this as a kind of pseudo-gradient for some overall loss function — at a very, very high level. there's a lot you can explore about how to set the client learning rate and the server learning rate; i can't discuss it all right now, but i'd encourage you to check out a paper we put out called adaptive federated optimization, which discusses this problem in much more detail, and if you check the comments on dory, i've posted a link to the paper." thanks. "are all clients required to use the same type of optimizer?" yeah, i believe this is true — you would use the same optimizer over all clients. if anyone knows of a way to use different optimizers you could jump in, but i'm pretty sure you have to use the same one for all clients. "should we download the notebooks?" so for future tutorials, make sure you just use the notebook in colab online — you shouldn't have to download the colabs at all. you can run them in the cloud using colab: just make a copy of the colabs we provided, connect to your runtime in your own copy, and follow along there without having to download anything. "are the client computations parallelized across clients in different threads? if not, is there a way to parallelize the client computations? is there a maximum number of clients that we can parallelize?" i think this might be a better question for someone on the tff team. "wait, could you repeat it? i spaced out for a sec there." are the client computations parallelized across clients in different threads, and is there a maximum number of clients we can parallelize? "yes, they are parallelized in different threads. you can roughly think of the tff executor architecture as spinning up a thread to represent each client if you're running everything locally. you can also run everything in a distributed manner, which will pin a bunch of clients on a bunch of distributed machines. there actually currently is such a limit, so they'll start running serially eventually — but that's a bit of an implementation detail; i think you should probably think of them as running concurrently, each on its own thread." "any pointers to recent work related to device selection for federated learning?" i don't know anything off the top of my head related to device selection in particular. i know that the systems paper published a few years ago briefly touches on the problem of device selection — in particular pace steering, which is the way the server tells the clients when to come back — but i don't know about any more recent work that particularly addresses this,
but somebody else might. "in a real-world setting, how would you tokenize words on devices and make sure device vocabs are the same across devices?" so this is a good question. in a real-world setting there is some pre-processing that would run on each device before the actual training begins, so the clients would have some instructions on exactly how to pre-process their data, and they would do that before actually training the model with this data. "what wrappers other than keras are available — well, you can use tensorflow directly — and what do you plan to support in the near future?" maybe somebody on the tff team could address this. "yeah, so you definitely can use pure tf and just implement your forward pass in straight-up normal tensorflow. i think we don't currently have a plan to support other high-level tf apis, but i put a comment on that question — we are open source, and we would definitely accept contributions for other ones if you wanted to contribute them, but i don't think anybody internally is working on it. i don't know if zach has another idea — zach g, who i just saw joined? nope, guess not." so one thing to note: if you asked a question on the dory, especially if you were asking for a link to a paper or something like that, you might want to check back on your question, because team members are actively responding in text and you might not get a notification for that. so just check back on your question, especially if you're looking for a link — you might find the information you need answered by a team member. "do all the models from clients need to be the same type?" yeah, i think this is asking whether we train the same model on every client, and that is true: part of the training step, when we run the next function on the iterative process, is that the server broadcasts the model to the clients, and the clients then all train the model in the same way. great — that looks like the bottom of our list of questions, so now i believe we're going to take a quick break, and after the break we'll come back and weikang will present a tutorial about compression in tff. it's going to be a 10-minute break, so we'll reconvene at 10:50 to hear from weikang.

okay, all right, thanks everyone — thanks for coming back to the tutorial. in my part i will talk about how to use tff to do model update compression in our federated learning research. in this tutorial we're going to use the emnist data set to demonstrate how to enable a lossy compression algorithm to reduce communication costs in federated learning algorithms. we'll mostly be using the tff.learning.build_federated_averaging_process api, as well as the tensor_encoding api, which can help us do compression on the models. so let's get started: first we install the package and run this cell, and we should see a hello world printed out, so we know the package is successfully installed.

all right, so before we jump in, i'd like to give you an overview of this tutorial. we can take a look at the table of contents first, to see what we're going to do. basically we will first prepare the input data, then define a model — first a keras model and then a tff model function, which is quite similar to what you saw in the emnist tutorial — and then we will build a training and an evaluation process and run them with tensorboard visualization enabled, and
then we will build the compression functionality into this training process, and we will train the model again to compare the difference between running with compression and without compression.

so let's go to the first step, preparing the input data. this is basically the same as what we saw in the emnist tutorial; the difference here is that the model we use in this tutorial expects a different input shape, 28 by 28 by 1, and that's why this batch-format function is different. after we load the data, let's go to the model part. we define a cnn model — note that this model is the one used in the original paper on the federated averaging algorithm. if you're familiar with keras, you know that if we call model.summary we can see the structure: as you can see, it's a conv layer followed by max pooling, then another conv and max pooling, then flatten and two fully connected layers.

all right, so with the keras model, let's try to build the tff model function. note that we need a function which produces a tff model, not just a model definition, and in addition the function cannot just capture an already-constructed model — it must create the model within the context of the function. the reason behind this is that tff is designed to go to devices, and tff needs to control when the resources are allocated and constructed. so here comes our first question: let's build the tff.learning model. as we mentioned, we first need to create a keras model within this function, and then we return tff.learning.from_keras_model — i think you're already familiar with this. we give it the keras model and the input spec — we've already prepared the input spec for you, so you just pass it in directly — and, as the comments suggest, use the sparse categorical cross-entropy loss and the sparse categorical accuracy metric.

yeah, so now let's build the training process. we're using tff.learning.build_federated_averaging_process to build the iterative process — you already know this from the emnist tutorial, and if you haven't seen it, this is basically the main api we're going to use in this tutorial. and we'll build the evaluation process as well.

okay, now we have the data set, the model, the training process and the evaluation process. what we're going to do next is build a function to run the experiment. we have this function here — don't worry that it's long; let's go through it line by line to see what it does. first, this function takes a training process, and you can specify how many rounds you want to run and how many clients you want to sample each round, and this summary writer is used by tensorboard to visualize the metrics. as you can see, first we initialize the process; then each round we sample from the training set and create the round's federated data using the make-federated-data function, and we run the round using the next function, giving it the current state and the data for this round — after running next we get a new state and the metrics. and then there are these three lines: because we're studying a compression algorithm in this tutorial, we're interested in how much data is transferred from clients to server and from server to clients, and with this little utility we can get that information.
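here's a hedged sketch of those pieces in code, following the tff compression colab this talk is based on; the learning rates, the names tff_model_fn / create_original_fedavg_cnn_model / make_federated_data / train, and the metrics['train'] structure are assumptions from that tutorial era, and the colab's utility for reporting broadcast/aggregation bits is omitted here:

```python
import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

def tff_model_fn():
  # the model must be constructed inside this function, not captured from outside
  keras_model = create_original_fedavg_cnn_model()
  return tff.learning.from_keras_model(
      keras_model=keras_model,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

federated_averaging = tff.learning.build_federated_averaging_process(
    model_fn=tff_model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

evaluation = tff.learning.build_federated_evaluation(tff_model_fn)

def train(process, num_rounds, num_clients_per_round, summary_writer):
  state = process.initialize()
  with summary_writer.as_default():
    for round_num in range(num_rounds):
      sampled = np.random.choice(
          emnist_train.client_ids, size=num_clients_per_round, replace=False)
      round_data = [make_federated_data(c) for c in sampled]  # hypothetical helper
      state, metrics = process.next(state, round_data)
      # log each training metric so tensorboard can plot it per round
      for name, value in metrics['train'].items():
        tf.summary.scalar(name, value, step=round_num)
  return state
```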
if you're interested in knowing more about this, here is the link to the api docs. here, broadcast bits is the amount of data transferred from server to clients — we use broadcast to name that direction — and we use aggregation to refer to data transferred from clients to server. then we print out that information, and we do an evaluation run: we sample from the test set, make the test federated data, and run one round of federated evaluation. note that normally you wouldn't do an evaluation every round, because it will slow down your whole training process — usually people do one round of evaluation every, say, 100 or 1000 rounds — but here we just do it per round. after we have the training and evaluation metrics, we use a summary writer to write them out for tensorboard. let's run this, and let's start tensorboard here. right now it shows nothing, which is expected, because we haven't started the training. in this cell, let's start training: as you can see, we create a summary writer for tensorboard and call the run-experiment function, give it the training process, say we're going to run 10 rounds with 10 sampled clients in each round, and give it the summary writer so that tensorboard gets populated with the metrics.

as you can see, there are already some logs printed out. let's focus on round zero: here we can see that the accuracy is increasing and the loss is decreasing, the broadcast bits are increasing and the aggregation bits are also increasing, and we also print the time spent in this round. these values are the cumulative amount of data transferred from client to server and from server to client. looks like we already have a few rounds, so if we go back to tensorboard we should be able to see this — i recommend you set the smoothing to zero and uncheck this box to get the original figures. as you can see, this one is the aggregated bits across 10 rounds, and we can also see the broadcast bits increasing, the training loss decreasing, and the accuracy increasing. we also write out the evaluation metrics, so you can see the test loss is decreasing and the test accuracy is, more or less, increasing.

all right — if you saw the image classification tutorial, you should be pretty familiar with the previous concepts. note that here we only ran ten rounds, just to make sure everything works, but for an interesting experiment we recommend running the code for at least 15,000 rounds, which should give you a pretty good model with about 98% accuracy, and if you keep training, the test and training accuracy will keep increasing. that brings us to question number two: if you're interested, try fine-tuning hyperparameters like the server learning rate, the client epochs, and the client batch size, and see how many communication rounds it takes to reach 99% test accuracy — try to converge to that accuracy using as few rounds as possible. note that running 15,000 rounds on a gpu colab runtime, which is the kind of runtime we're connected to, should take less than three hours, but if you fine-tune the hyperparameters you can get an even faster convergence rate. all right, let's move on to the compression part.
we will implement the compression functionality using the tensor_encoding api, so let me briefly introduce what the tensor_encoding api is. the tensor_encoding api is a general tensorflow tool for invertible, potentially lossy transformations. of its apis, we're interested in these two. first is the encoders surface api, where you can define either an identity encoder or a uniform quantization encoder — the identity encoder basically just does nothing, and we'll cover what uniform quantization is in a moment. the tensor_encoding api also provides apis for federated learning: for the broadcast process it defines tensor_encoding.core.SimpleEncoder, and for aggregation it has tensor_encoding.core.GatherEncoder. you may ask why we need different encoders for the broadcast and aggregation processes — i think that's a good question, and i recommend exploring the tensor_encoding api more to find out the answer.

all right, so next let's look at how to build an encoder that performs uniform quantization. what is uniform quantization? it's basically a simple rounding process, in which each tensor value is rounded to the nearest value from a set of quantization levels. i have a little example here to help you understand: suppose we have n values in an array x, the minimum value is x min and the maximum value is x max. how do we perform a 2-bit uniform quantization? first we separate this range into four parts, because two to the power of two is four, and then, if x1 is somewhere in here, after the uniform quantization x1 will be rounded to the nearest quantization level, which is this one. after the uniform quantization, all the elements in x will only take four distinct values, and usually these are stored as integers — this is how you get the values compressed. to implement that, we simply call tensor_encoding.encoders.uniform_quantization and, since we're using eight bits here, give it the number eight.

so with that, how do we build an encoder for federated learning in tff? suppose you are given a tensor with shape 10 by 10 and data type float — how are we going to create an encoder for this tensor? the answer is quite simple: you just create a simple encoder using the as_simple_encoder api, giving it the quantization encoder and the specification of the tensor, and that's it — that's everything you need to do. for the aggregation process it's similar: just use as_gather_encoder instead of as_simple_encoder.

right, now we have an encoder for each individual tensor — what are we going to do next? the first thing is to define two functions, the broadcast encoder function and the mean encoder function. the mean encoder function is for aggregation, but since here we're just averaging all the updates, we use mean as the name. each of these functions creates an instance of either a simple encoder or a gather encoder to encode an individual variable in the model. it's important to note that we don't apply a compression method to the entire model in this tutorial — you can do that, but in this example we decided to compress each individual variable in the model independently. in this example we will apply uniform quantization with eight bits to every variable with more than ten thousand elements, and for variables with ten thousand elements or fewer we do nothing — we just apply the identity encoder.
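to make the rounding concrete, here's a toy 2-bit example followed by the encoder construction described above; the values in x are made up, and the te import path is the one used in the colab:

```python
import numpy as np
import tensorflow as tf
from tensorflow_model_optimization.python.core.internal import tensor_encoding as te

# toy illustration of 2-bit uniform quantization (not the library internals):
# 2 bits give 2**2 = 4 levels spread evenly between x_min and x_max, and each
# value is rounded to the nearest level
x = np.array([0.0, 0.27, 0.53, 1.0])
levels = np.linspace(x.min(), x.max(), num=4)   # [0. , 0.333, 0.667, 1.]
quantized = levels[np.argmin(np.abs(x[:, None] - levels), axis=1)]
print(quantized)                                # [0. , 0.333, 0.667, 1.]

# building the encoders described above with the tensor_encoding api
quantizer = te.encoders.uniform_quantization(bits=8)
spec = tf.TensorSpec(shape=(10, 10), dtype=tf.float32)
simple_encoder = te.encoders.as_simple_encoder(quantizer, spec)   # broadcast
gather_encoder = te.encoders.as_gather_encoder(quantizer, spec)   # aggregation
```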
so here comes question three: create these two functions. as you can see, for the broadcast encoder, if the number of elements is larger than this threshold we create a uniform quantization encoder — i'll just copy this here; the answer is right in this tutorial — and for variables whose number of elements is less than or equal to this value, we simply do nothing: we give it the identity encoder. we still return a simple encoder, but we replace the actual encoding with the identity encoder. it's the same for the aggregation encoder — we do the same thing, but use a gather encoder instead. i'm copy-pasting the code here just to save some time, but i recommend you try it yourself. all right, let's run this.

so now we have these — what do we do next? before we go to the next step, i want to give you a preview of the iterative process. an iterative process can be seen as a composition of the following four processes: first the broadcast process, which sends the model, or whatever you want to send, from server to clients; then a client process, where you do something with the model and your local data; then, after you've done that, you send whatever you have back to the server; and after that, the server aggregates those updates and does whatever it wants to update the server model. in this compression tutorial, we change only two of these four processes, the broadcast process and the aggregation process, because we compress before broadcast and we compress before aggregation.

so can we still do that using the tff.learning api? the answer is yes. if you use tff.learning.framework.build_encoded_broadcast_process_from_model and give it the model function and the encoder function we just defined, you can create a broadcast process, and the aggregation process is created the same way. and then — this is the really nice part of the api — you simply pass these two processes into the high-level tff.learning.build_federated_averaging_process, and you get an iterative process with compression, because the broadcast and aggregation processes have been replaced by processes with compression. let's run these two cells — it's very simple and convenient, right?
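putting that together, here's a sketch of the encoder functions and the wiring, again following the compression colab; the 10,000-element threshold and the 8-bit setting come from the discussion above, and the keyword names broadcast_process / aggregation_process are as they appeared in the tff release used here:

```python
def broadcast_encoder_fn(value):
  """builds the encoder for one model variable on the broadcast path."""
  spec = tf.TensorSpec(value.shape, value.dtype)
  if value.shape.num_elements() > 10000:
    return te.encoders.as_simple_encoder(
        te.encoders.uniform_quantization(bits=8), spec)
  # small variables: identity encoding (no compression)
  return te.encoders.as_simple_encoder(te.encoders.identity(), spec)

def mean_encoder_fn(value):
  """builds the encoder for one model variable on the aggregation path."""
  spec = tf.TensorSpec(value.shape, value.dtype)
  if value.shape.num_elements() > 10000:
    return te.encoders.as_gather_encoder(
        te.encoders.uniform_quantization(bits=8), spec)
  return te.encoders.as_gather_encoder(te.encoders.identity(), spec)

encoded_broadcast_process = (
    tff.learning.framework.build_encoded_broadcast_process_from_model(
        tff_model_fn, broadcast_encoder_fn))
encoded_mean_process = (
    tff.learning.framework.build_encoded_mean_process_from_model(
        tff_model_fn, mean_encoder_fn))

federated_averaging_with_compression = tff.learning.build_federated_averaging_process(
    tff_model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    broadcast_process=encoded_broadcast_process,
    aggregation_process=encoded_mean_process)
```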
all right, let's train the model again to see the difference. here comes question four: we're going to train the model using the function we defined a few minutes ago. we'll call the run-experiment function, give it the federated averaging process with compression, run ten rounds again with ten sampled clients in each round, and use a different summary writer just for compression, so that we get two different curves in tensorboard. so let's run this — hopefully it runs... i think it is running, good, good — i don't need to debug anything, that's good. as you can see the logs are similar: the accuracy is increasing, the loss is decreasing, and we have the broadcast bits and the aggregated bits available as well — but this time these numbers are a lot smaller than what we had previously. if we navigate back to the tensorboard cell, which is here — all right, this orange line is still here, but we've got a new blue line, which is the curve for the iterative process with compression enabled. as you can see, there's a significant drop in the aggregation bits, as well as in the broadcast bits, but we see similar trends in loss and accuracy — which means we enabled the compression but we didn't sacrifice the loss and the accuracy in this case. you can see the test loss is also decreasing and the test accuracy is also increasing.

all right, let's go back to the text. okay, that's everything for my tutorial. we also provide a few exercises — if you're interested, you can try them offline — with some instructions for how to implement a custom compression algorithm, and we also provide potentially valuable open research questions, including trying non-uniform quantization, lossless compression, or adaptive compression. and if you're more interested in this research area, we also provide a few links to related papers.

all right, let's move on to the dory questions. the first question is: why should we give the input spec to the tff model — can't it just be inferred from the data set? i think this is a very good question. it can be inferred from data, but we need the input spec when building the tff model, which hasn't gone to the client yet — we just need this information for building the model. but i think anyone from the tff team can give a more in-depth answer. "yeah, i commented on it. i think that, as we mentioned, we wanted to decouple a handle on the data set itself from the ability to define the model, because these things are going to be happening in different places — across different platforms, or whatever — in the general case. actually, we used to accept a dummy batch here instead of an input spec, and inferred the data the model is going to see from that dummy batch, but this is less general, because a dummy batch is always going to have concrete, specified tensor sizes in each dimension. you might want to do things like padded batching, where a bunch of things are going to be unspecified sizes at the symbolic level and only specified later, when data is actually fed in. so rather than coupling ownership of the data set that the model is going to ingest, we leave it up to users to pull the input spec off the data set and pass it in there — we think that's the most general approach." yeah, thank you keith. i think the next question also needs you to respond — i think it's about something in the implementation details. "yeah, i was actually chatting with a couple of the other people offline here — i think the fact is that the observed behavior is a bug. tff is of course open source, but the backends that colabs run on internally and externally are slightly different, and i noticed this behavior when we were looking at these colabs previously; the internal backend didn't reproduce it, so i assumed the latest tff release would fix it, because there have been a couple of changes that could affect this. i think there must be a bug in the caching layer or something in tff which would cause the next computation to be executed twice, or something like that, but i haven't chased it down — so it's a great question, but i don't have a good answer." all right, so next:
can we use the same min and max for all the clients? this may be useful for secure aggregation. i think the answer is yes. right now we only use a few arguments of the tensor_encoding api, but there are more ways you can customize the encoder — for example, setting the min and max values used for every client. i think that's doable; if you're interested, please go to the encoding api docs for more information. is the input spec in any way related to the white noise example there's a definition of? i'm not quite familiar with this one — it looks like it's related to differential privacy; can anyone from our research team answer? "maybe i'll jump in for a second — i think we might have to look at that github example before we can answer, but we'll leave a comment on the dory afterwards to address this specifically." all right, thank you zach. so now let's move on — let's welcome zac charles for his great tutorial about how to build your own iterative process.

okay, hopefully i'm unmuted and everybody can hear me all right. so let's take a little bit of a stretch break, because i know it's been a while — thank you all so much for attending. this is the last talk, and it's going to be a little bit different from the rest; i'll get into that. so let's just start up our colabs, connect to our session, load tensorflow federated and some relevant libraries, and while that's happening i want to give you an overview of what i want to talk about. the previous three tutorials you saw primarily dealt with the tff learning api, which is really, really great if you just want to get in and start trying experiments — you want to try a simulation, you say, i have this data, i have this great idea, i'm going to try this optimizer, what have you. but the tff learning api, as some of the dory questions have gotten at, is not the be-all and end-all of federated learning algorithms — there's only so much that can go into an api without making it horrible. so if you want to go deeper, if you want to do something more complicated, you might actually have to implement your own custom federated learning algorithm, and that's what i'm going to talk to you about today. i have three goals in this talk for you, the listener: hopefully you come out of this understanding the general structure of federated learning algorithms; the second goal is to explore the federated core of tff a little — chris talked about this in his intro, and i'm going to delve deeper into it; and what we're going to do is use this federated core to implement federated averaging directly.

okay, so first things first: we're going to start with our input data, and we're going to pre-process the emnist data set — this is a federated mnist data set — basically the same way it was done in the first tutorial. we use the simulation api to load our data, i specify a number of clients and a batch size, and i define a pre-processing function that basically just flattens the pixels, converts the pixels and the label into a tuple, batches the data set, and then applies this flattening function.
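a short sketch of that loading and pre-processing step, matching the building-your-own-algorithm colab; the client count and batch size here are illustrative:

```python
import tensorflow as tf
import tensorflow_federated as tff

emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()

NUM_CLIENTS = 10
BATCH_SIZE = 20

def preprocess(dataset):
  def batch_format_fn(element):
    # flatten each 28x28 image into a 784-vector and pair it with its label
    return (tf.reshape(element['pixels'], [-1, 784]),
            tf.reshape(element['label'], [-1, 1]))
  return dataset.batch(BATCH_SIZE).map(batch_format_fn)
```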
so let's wait for my code to load — great. and now what we want to do is sample a couple of clients that we can use later on down the line. emily and nova both talked about this in more detail, so i'm not going to go into it too much, but remember that what i'm going to do is sample my client ids — these are identifiers for my clients — using np.random, sampling from emnist_train.client_ids, which is a list of identifiers for the clients; i'm going to sample num clients of them, without replacement. once i've sampled those, i can make my federated data by just pre-processing these clients' data sets — i use create_tf_dataset_for_client — and now i have my federated train data, which is a list of tf data sets. next we prepare our model. it's going to be a really, really simple model — the same one we used in the first tutorial: it has a 784 input — that's 28 by 28, a number a lot of you who have played around with mnist know quite well — a single dense layer followed by a softmax, and we wrap this into a tff.learning model. no big deal.

okay, so now let's get on to the core of what i want to talk about. the tff learning api primarily implements things like federated averaging, and you can do things like specify your custom optimizer function, but maybe you're interested in more sophisticated methods. things you might be interested in implementing are regularization, clipping, or something that may be on your mind: gans — federated gans are a really, really exciting research topic that we've put out some work on, and you might be interested in making your own. to do these, we're going to have to really write our own custom fl algorithm, and in general fl algorithms have four main components: there's a server-to-client broadcast step, which was talked about in weikang's tutorial; a local client update step — that's the client training on its own data; a client-to-server upload step — that's the server collecting the client updates; and a server update step — the server says, i saw these things from my clients, how am i going to update my own model? and here's a very, very simple diagram of the process: the server model here is broadcast to a bunch of clients; the clients do their local training — b kind of represents the end result of all these clients doing their training; the clients upload their models to the server, which is represented by this cloud here; the cloud somehow updates its own model — that's c; and then repeat.

the way we're going to represent our federated algorithm is, as we've said before, as an iterative process. an iterative process is an abstract base class that has an initialize function and a next function: the initialize function is just the server saying, i'm going to start my computation somehow, and the next function runs one round of the four steps i outlined above. so what we're going to do is implement federated averaging over the course of this tutorial, and we're going to first write a kind of skeleton of what it should look like. the initialize function is just the server getting started: we're going to instantiate our model function and pull out the trainable weights — there are only trainable weights on our model, so that's not such a big deal. so i'm going to implement the initialize function in the following way: i'm going to say
model = model_fn() — great, i've instantiated my model — and i'm going to return the trainable model weights. excellent, very easy: that's the server saying, let's get going. the harder part is the next function, and i want to sketch out what it's going to look like. it's going to have the four steps we talked about before. there's a broadcast step, so what i'm going to do is broadcast the server weights to the clients: server_weights_at_client = broadcast(server_weights) — this is pseudocode, by the way; we haven't implemented these functions yet, but we're going to. once i have the server weights at the clients, i can then make the clients do their own local update, and i'm just going to say client_weights = client_update(...), taking as input the federated data set and the server weights that are located at the clients now, after the broadcast step. then the server takes all the client weights and simply averages them — it computes a mean — and finally the server updates its own weights using a server update function. done: that is the next function, that is one round of my federated computation. we're going to implement these four components separately, and what's really nice, for those of you familiar with tensorflow, is that two of these blocks can be implemented in pure tensorflow: that's going to be the client update function and the server update function. they're going to look a lot like training functions you've seen in tensorflow, if you have experience with that.
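a pseudocode skeleton of the iterative process just sketched; broadcast, client_update, mean and server_update are placeholders for the four components we still have to build:

```python
# pseudocode: none of these four helpers exists yet
def initialize_fn():
  model = model_fn()
  return model.trainable_variables

def next_fn(server_weights, federated_dataset):
  # 1. server-to-client broadcast
  server_weights_at_client = broadcast(server_weights)
  # 2. local client training
  client_weights = client_update(federated_dataset, server_weights_at_client)
  # 3. client-to-server upload and aggregation
  mean_client_weights = mean(client_weights)
  # 4. server model update
  server_weights = server_update(mean_client_weights)
  return server_weights
```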
so let's get going. the client update is going to use our tff.learning model to do training — instead of deferring to the tff.learning api, we're going to implement the learning directly, using tf.GradientTape. this is back-propagation, for the people familiar with it: we compute the gradients using backprop and apply them using our client optimizer. pretty easy. so let's dive into the code. i've written out most of it already, and i don't want to go into it too much because it is just tensorflow code, so hopefully you've seen something like this before. the client update takes as input the model, some data set, the server weights — these are the ones that were broadcast from the server — and a client optimizer. i pull out the trainable weights of my model and assign the server weights to them: i don't really care about the initial weights of my model, i care about the server weights — i'm going to start from there and then update. then what i do is just iterate over the batches in my data set, using gradient tape to compute the gradient, and the way i do that is by saying outputs = model.forward_pass(batch). this forward pass is something implemented on tff.learning models, so it's something you can just call and say, i want to do a forward pass on my batch — done, you get some outputs. for the gradients, you just call tape.gradient — something that's pretty standard in tensorflow training — applied to the loss that's in our outputs, with respect to the client weights. we zip up the grads and the client weights, and then we just apply them using our client optimizer — again, this will look very familiar to anybody who's seen tensorflow training — i just say client_optimizer.apply_gradients on the zipped grads and weights. what this does is apply the gradients to my client weights, and i return my client weights. done — that is my client update function.

okay, so that was a lot, but hopefully digestible for people with tensorflow experience. the server update is going to be even easier. remember that the server update in federated averaging is not very complex: the server takes the mean of all the client weights and says, this is my new model — forget what i had before, this is my new model. so my server update takes as input its current model and the mean of the client weights, and all i do is pull out the trainable weights of the model and assign the mean client weights to them. that's it — done. really, i'm just saying, forget what i had before, that was garbage, it's no longer useful; take the mean of the client weights and use that. this code is a little bit of overkill — i could have just returned the mean client weights — but i've written it this way as a skeleton in case you want to do a more sophisticated server update step. for instance, one thing you could think of doing, instead of saying my previous server model is garbage and my new client-averaged model is great, is to take the midpoint between them — maybe that makes more sense, maybe that's a little bit more of a conservative step. this is actually really, really analogous to something called the lookahead optimizer — i've posted a link in case you're interested — and i challenge you to implement that: take the midpoint of the server weights and the mean client weights instead of doing what i've done here.
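here are those two pure-tensorflow blocks as just described, essentially as they appear in the building-your-own-algorithm colab:

```python
@tf.function
def client_update(model, dataset, server_weights, client_optimizer):
  """trains the model on the client's dataset, starting from the server weights."""
  client_weights = model.trainable_variables
  # assign the broadcast server weights to the client model
  tf.nest.map_structure(lambda x, y: x.assign(y),
                        client_weights, server_weights)
  for batch in dataset:
    with tf.GradientTape() as tape:
      outputs = model.forward_pass(batch)      # forward pass on one batch
    grads = tape.gradient(outputs.loss, client_weights)
    client_optimizer.apply_gradients(zip(grads, client_weights))
  return client_weights


@tf.function
def server_update(model, mean_client_weights):
  """replaces the server weights with the average of the client weights."""
  model_weights = model.trainable_variables
  # overkill for plain federated averaging, but a useful skeleton for e.g.
  # a lookahead-style midpoint update
  tf.nest.map_structure(lambda x, y: x.assign(y),
                        model_weights, mean_client_weights)
  return model_weights
```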
okay, but this has been pure tensorflow code so far, and you're here for a tff tutorial. the fact that we can use so much tensorflow code is by design: tff is supposed to let us use a lot of the tf code we're familiar with to do client training and server training. but in order to specify the orchestration logic — the things that bind these client and server updates together — we actually have to use the federated core of tff. so let's take a little diversion: i'm going to give you an introduction to the federated core, which is one of the main parts of tff. the federated core is a set of lower-level interfaces that are basically the foundation for the tff learning api — the tff learning api uses the federated core to define the things talked about in the previous tutorials — but you can use them directly. the goal, and chris mentioned this in his talk, is to combine tensorflow code with distributed communication operators — things like broadcasting and distributed averaging — and to give you explicit control over the distributed communication without requiring system-level implementation details; you probably don't want to have to specify point-to-point network communication, that's just too much. the other thing that's really important is that tff is designed for privacy preservation, so it's going to be very explicit about where data resides, and this helps prevent unwanted accumulation of data at places like the server. at first it might seem weird to have all this explicit data control, but just remember: it's in service of privacy preservation.

so in tensorflow we have tensors — this kind of fundamental thing we use everywhere. in tff, a key concept instead is federated data, which refers to a collection of data items distributed somewhere in your system. what happens is that we model an entire collection of data items across all devices as a single federated value. for instance, if my clients are a bunch of sensors in a network, each with a temperature, i'm going to model all of their temperatures as a single federated value. the way this works is that we have a float representing the temperature of a sensor, and it's actually going to be a federated float — i've defined it here, federated_float_on_clients, a tff.FederatedType — and there are two aspects to it. there's the tf.float32, which says what kind of data it is — a float32 — and there's the placement, which says where this data resides, and this tff.CLIENTS is telling me it resides at the clients, as opposed to the server. so a federated type is represented by a type t, like float32, and a group g of devices, like tff.CLIENTS, and we can look at the type signature just by calling str on my federated float: it reads as float32 at clients — pretty self-explanatory. the reason i'm harping on this so much is that, again, i want to know where my data is at any given time. so tff focuses on three things: the data — float32; where the data is placed — the clients; and how the data is being transformed. we haven't gotten to that yet, but to talk about how data is being transformed, we're going to need federated computations.

so, federated computations. tff is a strongly typed functional programming environment — this was talked about by chris — and since it's a functional programming environment, its central units are federated computations. these accept federated values, like our float32-at-clients from above, and they return federated values, so you go from a federated value to a federated value. i apologize if i get cut off — i just gesticulate a lot when i talk, so hopefully the hand motions are not necessary to understand. okay, so as an example of a federated computation, let's say again that we have a bunch of sensors with temperatures on them, and we just want to compute an average. we define the following federated computation, get_average_temperature: it takes as input my client temperatures and it applies tff.federated_mean to those client temperatures, so it computes a mean. now, the really important thing here is the type annotation at the beginning: we have this @tff.federated_computation decorator, and then we say, okay, here is the federated type we expect as input — so what i pass into get_average_temperature has to be a federated value of type float32 at clients. and what's important is that this is not quite the tf.function decorator you might be familiar with — this goes back to what chris was talking about, where tff is secretly three languages: there's tensorflow, there's python, and then there's something that's neither, kind of a glue language. but to abstract away from all of that, you should just think of tff computations as functions with really well-defined type signatures telling you where data starts and where it ends up. so let's look at the type signature of our federated computation: it takes the float32 that was at the clients, applies this federated mean, and puts the result on the server — so it starts with a float32 at clients and ends with a float32 at the server. and the fact that it goes from clients to server is baked into federated_mean; there are other operators we could use if we wanted to compute an average of the clients and then send it back to all the clients.
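the temperature example in code, as it appears in the federated core tutorials; the comments show what tff prints for these type signatures:

```python
import tensorflow as tf
import tensorflow_federated as tff

# a float32 placed at the clients
federated_float_on_clients = tff.FederatedType(tf.float32, tff.CLIENTS)
print(str(federated_float_on_clients))          # {float32}@CLIENTS

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(client_temperatures):
  # computes a mean over the clients and places the result at the server
  return tff.federated_mean(client_temperatures)

print(get_average_temperature.type_signature)   # ({float32}@CLIENTS -> float32@SERVER)
```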
there are other things we could do if we wanted to compute an average of the clients and then send it back to all the clients okay and so now to actually start debugging a little bit tff does allow you to invoke these federated computations as just a python function so what we're going to do is take our get_average_temperature function and just call it on a list of floats let's say my temperatures are 20.1 i'm trying to be in celsius here but i'm actually very bad at celsius so i hope that this is a moderate temperature and 18.5 so maybe these are the temperatures of my sensors and i call get_average_temperature and look i get 19.266668 i can't do the math in my head but this looks like the mean so everything is good okay now let's bring in tensorflow and before we do there are some really important things to note the first is that tff computations are non-eager so for those of you who love eager mode in tf you can use it to some degree in tff but tff computations as a general rule of thumb are not eager the second is that federated computations should only consist of federated operators these are things like the tff.federated_mean that we saw before they cannot contain tensorflow operations directly instead what we do is first put tensorflow code into blocks that are annotated with tff.tf_computation and really you can just take standard tensorflow code annotate it with this and it works perfectly fine so as an example we have a really simple function here that takes an input x and adds 0.5 to it and all i've done is add a decorator that says it's a tf computation and it's going to accept a float32 i've put that there because of the strongly typed nature of tff so let's define our function and now just like the federated computations these have type signatures but they don't have placements they're just abstract tf code so the type signature of this tf computation takes in a float32 the function is traced when you apply this decorator so it knows it's going to output a float32 pretty easy and what's nice is that the tensorflow code can go to the server or the clients that's why placements are omitted from this type signature so again the difference between federated computations and tf computations the former has explicit placements the latter does not to use a tf computation within an actual federated computation we're going to use tff.federated_map which basically just applies a tf computation while preserving the placement so as an example we had this function before that added one half to elements and i'm going to define a version now that applies the function to client temperatures it's going to say i have all these temperatures at my clients this float32 at clients and i want to add 0.5 to all of them and it's pretty easy to do i am just going to return tff.federated_map which says apply the tf computation i care about the add_half function to x done so let's look at its type signature it takes in a float32 at clients and it spits out a float32 at clients where all the clients have had their temperature incremented by one half
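a minimal sketch of that wrapping, assuming the same names and imports as above:

    # standard tensorflow code wrapped as a tf computation; note no placement
    @tff.tf_computation(tf.float32)
    def add_half(x):
      return tf.add(x, 0.5)

    # a federated computation that applies the tf code at the clients
    @tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
    def add_half_on_clients(x):
      return tff.federated_map(add_half, x)

    print(str(add_half_on_clients.type_signature))
    # '({float32}@clients -> {float32}@clients)'
    print(add_half_on_clients([1.0, 3.0, 2.0]))  # [1.5, 3.5, 2.5]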
so this has been a pretty short introduction to the federated core but i have a couple lessons that i think are really important to keep in mind tff operates on federated values federated values have a federated type such as float32 and a placement such as clients federated values are transformed using federated computations which have to be decorated with tff.federated_computation and a type signature to use tensorflow code you first put it in blocks with the tff.tf_computation decorator and then you can incorporate those into federated computations generally through tff.federated_map all right let's pause let's stretch a little bit we understand the federated core a little bit and if some of what i said didn't make sense don't worry we have very detailed tutorials on the website this is really just supposed to be exposure to these concepts but let's go back to implementing federated averaging remember above that we had this iterative process and we had to define two things our initialize function and our next function and the next function is going to use the tensorflow that we've already defined in our client update and our server update again those were pure tensorflow blocks now in order to make our algorithm actually federated we need both the next function and the initialize function to be federated computations because fundamentally they're going to take things that are either at the client or the server and put them either at the client or the server so let's write our tensorflow federated blocks we're going to first deal with the initialization computation and remember that we want to initialize our model we have this model function and we're just going to instantiate it so to write the corresponding tff.tf_computation what we're going to do is say okay i'm going to write a tff.tf_computation that's my decorator and i'm going to define my server init function my server is going to call the model function and it's going to return the model's trainable weights okay this is just a standard warning it doesn't really matter you'll notice that this is exactly the same initialize function that we wrote before but with the tf computation decorator that's it we just slap that on and we're good to go now we can turn this into a federated computation using tff.federated_value what tff.federated_value does is take something that doesn't have a placement and give it a placement so my federated computation is going to take the server init function and just place its result on the server and we've now created our federated initialize function the next function sounds harder but it's not going to be that much harder the first thing we're going to do is take our client update function which was pure tensorflow and turn it into a tff.tf_computation and remember it needs to accept two things the client dataset and the server weights and it outputs updated client weights so we will need to decorate our function using type signatures but we can actually extract them pretty automatically from our model remember we have our model function so i'm going to instantiate it as a dummy model and i can extract the necessary type just by calling input_spec on my dummy model and this is part of the reason getting to a question from the previous tutorial of why we pass in the input spec we can actually use it before we've even seen any data on the model
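here is a hedged sketch of those initialization pieces, reusing the model_fn from earlier in the tutorial:

    # pure tensorflow: instantiate the model and return its trainable weights
    @tff.tf_computation
    def server_init():
      model = model_fn()  # model_fn was defined earlier in the tutorial
      return model.trainable_variables

    # federated: place the initial weights at the server
    @tff.federated_computation
    def initialize_fn():
      return tff.federated_value(server_init(), tff.SERVER)

    # extract the types we need for decorating the client update; the
    # input_spec lets us do this before seeing any actual data
    dummy_model = model_fn()
    tf_dataset_type = tff.SequenceType(dummy_model.input_spec)
    model_weights_type = server_init.type_signature.result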
so let's look at the type signature remember that our model takes as input these 28 by 28 images with integer labels that have been flattened so if you look at the type signature of my tf dataset it's a float32 with a question mark by 784 the question mark refers to a variable batch size i don't know what the batch size is going to be it can depend and 784 is the dimension of the tensors my int32 again has a question mark because i don't know the batch size and a one that's the label i'm also going to extract the model weights type by using our server init function from above remember we defined a federated computation above that was our server init and just by calling its type signature i can get the type of my model weights and again that's by doing type_signature.result so let's just look at that my model weights look like this my first layer has a float32[784,10] this is my input layer that maps my 784 flattened vectors to a 10-dimensional output layer and this float32[10] over here these are the bias units so again those are the weights and these are the biases you can see our model architecture exactly in the type signature to create the tf computation for our client update we first give the correct decorator this is a tff.tf_computation and i put in the tf dataset type that i had above and the model weights type above and note that they exactly correspond to the inputs of my client update function the tf dataset has type tf_dataset_type the server weights have model_weights_type so to actually do the client update we're going to use our tf code from above but first i just instantiate my model i instantiate my client optimizer and i use my client update function and just a quick note i am instantiating my model and my client optimizer outside of the client update function because the client update function is a tf.function and you can't create variables within the scope of a tf.function there are some great tutorials on the tensorflow website about tf.functions and what you can and can't do and why you should and shouldn't use certain practices i'm not going to go into it in depth but it's just something to be aware of with tf.functions okay so this was the client update function and we're going to do exactly the same thing but with the server update function i'm going to walk you through it we want a tf computation for my server so let's start with tff.tf_computation okay and the thing that's important to remember if you go back up to the server update function i don't want to scroll back because i don't want to confuse anybody but the server update function we defined above accepted exactly one thing the mean of all the client weights so it's going to have to accept something of model_weights_type so this is my tf computation decorator and i'm going to define my server update function it's going to accept my mean client weights and i'm going to do something very similar to what i did above i'm going to first instantiate my model and return my server update of my model and my mean client weights great okay now one thing that is important to note is that there's a little bit of an asymmetry here the client optimizer doesn't have a parallel on the server side i'm doing a very simple form of federated averaging where i don't use a server optimizer but in general you could imagine doing much more
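a sketch of both wrapped computations, assuming the client_update and server_update tensorflow functions from earlier in the tutorial (the learning rate here is illustrative):

    @tff.tf_computation(tf_dataset_type, model_weights_type)
    def client_update_fn(tf_dataset, server_weights):
      # model and optimizer are created outside client_update because a
      # tf.function cannot create variables inside its own scope
      model = model_fn()
      client_optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
      return client_update(model, tf_dataset, server_weights, client_optimizer)

    @tff.tf_computation(model_weights_type)
    def server_update_fn(mean_client_weights):
      model = model_fn()
      return server_update(model, mean_client_weights)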
okay so now i want to finish up my algorithm we've talked about converting my client update and my server update into tf computations but we really need to bring it all together now so i want to remind you of the four elements of an fl algorithm there's the server to client broadcast step there's a local client update step there's a client to server upload step and there's a server update step really we've dealt with two and four we've turned them into tf computations now we're going to create a next function that uses all four of these and to do so i'm going to need some federated types because it's going to be a federated computation these are going to be my server type this is basically a federated version of the model weights type placed at the server and i'm going to need a federated dataset type this is a federated version of the tf dataset type placed at the clients i would recommend going back over this after the tutorial to understand it a little more in depth but if this part is a little confusing right now don't worry too much about it what i want to do is write my next function this is the thing that takes my server weights and my federated dataset and does one round of federated averaging and remember my four steps above the first is the server to client broadcast step and this is really easy i'm going to say server weights at client equals tff.federated_broadcast of my server weights that's all i need to do that's telling my federated computation hey these server weights were placed at the server if i do a federated broadcast they're now placed at all the clients so that's why i call it server weights at client we're then going to use our client update function from above to let our clients compute their updated weights and remember that this client update function is a tf computation so i'm just going to say my client weights are going to be tff.federated_map of the client update function and the specific input there are two inputs the federated dataset and the server weights at the clients i don't want to just put in the server weights there because those server weights lie at the server they don't lie at the clients i need to pass in the ones that were explicitly placed at the client and again federated map just applies this update function now the third part i'm going to take a federated mean of these client weights and this is pretty simple too i am just going to say mean client weights equals tff.federated_mean of my client weights great that's all it took to compute the federated mean of my client weights and what this is doing implicitly is taking something that was placed at the client and placing it at the server that's part of what federated mean does last but not least the server is going to update its own weights using tff.federated_map of the server update function on the mean client weights and i'm just going to return my updated server weights uh oh sorry i did not hit that colab button great let's do that that was a type signature error i hadn't instantiated these yet but these are just the federated types corresponding to the inputs of the next function okay but long story short i just want to emphasize something here if you talk about the orchestration logic of my federated algorithm what is happening how is data being passed around in my system our next function really had only four lines and they corresponded exactly to the four elements of an fl algorithm i broadcast i do my client update i take a federated mean and i do my server update and this is part of the power of tff this compact expression of what is kind of complicated distributed logic pretty easy so now we have our federated computations for the algorithm initialization and for the next function
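the whole next function, as a hedged sketch with the same names as above:

    federated_server_type = tff.FederatedType(model_weights_type, tff.SERVER)
    federated_dataset_type = tff.FederatedType(tf_dataset_type, tff.CLIENTS)

    @tff.federated_computation(federated_server_type, federated_dataset_type)
    def next_fn(server_weights, federated_dataset):
      # 1. broadcast the server weights to the clients
      server_weights_at_client = tff.federated_broadcast(server_weights)
      # 2. each client computes its updated weights
      client_weights = tff.federated_map(
          client_update_fn, (federated_dataset, server_weights_at_client))
      # 3. the server averages the client updates
      mean_client_weights = tff.federated_mean(client_weights)
      # 4. the server updates its model with the average
      server_weights = tff.federated_map(server_update_fn, mean_client_weights)
      return server_weights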
and so to actually create my federated algorithm i'm going to use an iterative process i'm going to specify my initialize function and my next function okay so that's it we now have our federated algorithm this is something that we can initialize and then we can run the next function in order to update the weights and we can do that repeatedly or iteratively so let's look at the type signatures of both of these functions if we look at the string of the federated algorithm's initialize type signature we see that it does not take any input whatsoever which makes sense and it outputs this at server and remember this is the type signature of my model without placement it has the dense weights and the bias units and it's placed at the server if we look at the string of the next type signature it looks a little more complicated but let's digest it it takes as input two things the model weights placed at the server and the client datasets remember my client datasets have 784-dimensional vectors and one-dimensional labels and they're placed at the clients and what it returns is a new set of model weights placed at the server so again this is just saying it accepts a server model and client data and it returns an updated server model and i can reason explicitly about where things should be placed in order to use this so let's actually evaluate it now and we're going to evaluate it in a really similar way to how evaluation was done in previous tutorials i'm going to do a really simple form of evaluation centralized evaluation so i'm going to take a single dataset to evaluate on and i do this by taking my test dataset and creating a tf dataset from all clients i'm just going to take a thousand elements to make evaluation faster but in general you would want to use the whole dataset and i'm going to apply my preprocessing function to this test dataset great i've written here a very simple function that all it does is take the server state which is these model weights and use keras to evaluate on the test dataset if you're used to tf keras this is going to look really simple i create my keras model i compile it the only difference is i'm going to set the weights according to what's in my server state and then i'm going to evaluate using keras model.evaluate so this is my evaluation function now let's initialize our algorithm and evaluate on the test set we're going to call federated_algorithm.initialize and see what the evaluation gives us okay so we have a loss of 2.3 and an accuracy of 0.1 note that there are only 10 labels so this is telling me i'm doing no better than guessing at random so let's train for a few rounds and hope that things change let's do 15 rounds i'm going to say server state equals federated algorithm remember this is our algorithm represented as an iterative process i'm going to call my next function and give as input my server state and way above this was a long time ago hopefully you remember we defined federated train data a list of tf dataset objects so now i'm going to be running those four elements of my algorithm that we defined above and they're going to be run 15 times my colab is acting pretty slow right now so hopefully this doesn't take too long great so it ran 15 rounds of my algorithm
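pulled together, a hedged sketch of the iterative process, the centralized evaluation, and the training loop (create_keras_model, central_emnist_test, and federated_train_data are assumed from earlier in the tutorial):

    federated_algorithm = tff.templates.IterativeProcess(
        initialize_fn=initialize_fn,
        next_fn=next_fn)

    # centralized evaluation: load the server weights into a keras model
    # and evaluate on a held-out, preprocessed test set
    def evaluate(server_state):
      keras_model = create_keras_model()
      keras_model.compile(
          loss=tf.keras.losses.SparseCategoricalCrossentropy(),
          metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
      keras_model.set_weights(server_state)
      keras_model.evaluate(central_emnist_test)

    server_state = federated_algorithm.initialize()
    for _ in range(15):
      server_state = federated_algorithm.next(server_state, federated_train_data)
    evaluate(server_state)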
and let's just do some evaluation all right keras is running all right great okay so let's compare i had a loss of 2.3 when i just initialized now i have a loss of 2.16 we had an accuracy of 0.1 now i have an accuracy of 0.25 so i'm doing better than simply guessing at random but again it's worth noting that i've only done 15 rounds and on a couple of clients i've barely seen any data i don't think i've done a single epoch yet if you think in terms of centralized learning so you might have to do hundreds or thousands of rounds to actually get something comparable to centralized accuracy but that's it we've implemented federated averaging and we've evaluated it and we see that it works it actually did increase our accuracy and i just want to stop and reflect on the way we created this algorithm i want to give you the after-school special viewpoint of what we've done we implemented federated averaging by combining pure tensorflow code for the client and server updates with federated computations from the federated core of tff the pure tensorflow code was used for the things you'd expect it to be used for training models using gradients and backpropagation tensorflow is great at that the federated core handled the non-tensorflow aspects telling the federated computation where data is placed and how it should be placed after each transformation things like broadcasting aggregation and federated means but to create a more sophisticated algorithm we actually have a lot of really useful tools now we can simply alter a lot of what we have above to make it more sophisticated so for instance if you just want to edit the pure tensorflow code above you can change how the client actually performs its training if you're interested in adding a weird regularizer or in doing gradient clipping you can implement that directly in your client update function no need to play around with the tff code just change your tf code i'll sketch the gradient clipping variant below if you wanted to make larger changes you might have to actually change the tff code so for instance we could make the server store not just the model weights but more than that we could make it store for instance the client learning rate and make that decay over time so maybe the hundredth time i call next on my iterative process i want the client learning rate to have decayed by a factor of one half this is going to require some changes to what we did it's going to require some changes to some type signatures but it's really not so bad once you understand what's going on so a harder challenge for you if you're interested implement federated averaging with learning rate decay on the clients and if you're interested in learning how to do these kinds of things i'd recommend checking out the more sophisticated version which is tff.learning.build_federated_averaging_process or you can check out various research projects using tff in particular we also have an examples section on our github that has something called simple_fedavg which is a very simple implementation of federated averaging that i recommend checking out
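as one concrete instance of editing only the tensorflow code, here is a hedged sketch of the client update from earlier with gradient clipping added (the clip norm of 1.0 is illustrative, and the forward_pass interface follows the tff.learning.Model used in the tutorial):

    @tf.function
    def client_update(model, dataset, server_weights, client_optimizer):
      # local training, now with per-tensor gradient clipping
      client_weights = model.trainable_variables
      # assign the broadcast server weights to the client model
      tf.nest.map_structure(lambda x, y: x.assign(y),
                            client_weights, server_weights)
      for batch in dataset:
        with tf.GradientTape() as tape:
          outputs = model.forward_pass(batch)
        grads = tape.gradient(outputs.loss, client_weights)
        # the only change from the original: clip each gradient's norm
        grads = [tf.clip_by_norm(g, 1.0) for g in grads]
        client_optimizer.apply_gradients(zip(grads, client_weights))
      return client_weights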
okay and with that i am done so let's move on to dory questions okay can you please elaborate on asynchronous communication with the server it seems that the custom model here imposes a synchronous update you are right that what we wrote was very synchronous both in the tff learning api and in the algorithm that we just wrote i think it's a really open question how asynchronicity should be used in federated learning you could try to simulate asynchronicity using a custom iterative process but i suspect that a lot of thinking needs to go into it i don't have any references offhand but i encourage you to definitely think about this and do some digging maybe somebody has a better answer than i do is it possible to define different classes of clients instead of just clients that might treat their local data slightly differently maybe a set of two client classes okay yeah so this question is asking about what if we have different tiers of clients like maybe we have some clients with very powerful devices that can do lots of sophisticated things and maybe we have clients with very minuscule devices that can't do very much i think this is a really good question you could definitely do this with your own custom iterative process so for instance in addition to the client data you could also have as input to the algorithm some kind of integer specifying zero one or two levels of your clients and you could have the training loop depend on that input so yes that definitely is possible within tff you have presented us fl as a way to train models on edge devices in an industrial context where we have a batch of similar machines in different plants which would train together a federated model how would you imagine specializing each model to fit a particular plant yeah so this is really a question about cross-silo fl and how cross-silo fl might differ from cross-device fl this is a very good question and it's hard to come up with a short answer to it i suspect that one thing that might happen is that you would want to do more local training in a cross-silo setting than you would in a cross-device setting just because cross-device you have so many devices and they're pretty lightweight whereas cross-silo you would probably want to do something more sophisticated there's definitely so much active research on cross-silo fl right now i would encourage you to do a search on google scholar or wherever and dig up some references i'm sorry that i don't have a good one for you offhand can we do a weighted average by knowing the number of client update samples absolutely this is a really good point that i wish i had talked about mona thank you very much i did a federated averaging that was purely uniform i did not specify any weights however tff.federated_mean does accept an argument that dictates how much each client should be weighted during the averaging so one thing we could have done and you'll see this if you dig into some of the examples we have in tff is also compute the number of examples seen by each client and use that to say okay this is how much my client is going to be weighted there are lots of other interesting ways that you can weight clients and for an example of how to do this definitely check out simple_fedavg which is in the examples section on our github
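the weighted variant is a one-line change to the aggregation step; in this hedged sketch client_num_examples is a hypothetical value that the client update would compute and return alongside its weights:

    # weight each client's contribution by how many examples it trained on
    mean_client_weights = tff.federated_mean(
        client_weights, weight=client_num_examples)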
all right federated learning seems like it will almost always be on heterogeneous hardware is there a way to scale the steps or batch size by the types of connected clients yeah so this was touched on in a separate question you could imagine an fl setting where clients don't just have a dataset they have some kind of value that indicates how powerful their hardware is and change the training loop based on that you can definitely simulate that in tff right now by making the client update function depend not just on the client data but on some extra parameter that is essentially how powerful the device is i think there's a lot of active research on how the number of steps and batch size influence things i'm going to be really self-aggrandizing here and say that i have a paper related to this called on the outsized importance of learning rates in local update methods but honestly there are so many great references on this what i actually recommend doing is checking out the survey paper we put out last year called advances and open problems in federated learning there's definitely discussion of this kind of thing can you please elaborate on how one can modify the data while the model is trained eg when generating adversarial samples one needs to use the currently trained model to find an adversarial sample how can one implement adversarial samples on the fly in the federated learning setting yeah this is a great question and so this kind of adversarial training cannot be done directly in the tff learning api instead you would have to use the methodology i presented to make your own custom algorithm so in our client update above if i go all the way no i'm looking at dory i can't present right now we had this pure tensorflow code that was pretty simple it was compute a gradient use that gradient to update what i could have done instead is say i compute an adversarial example then compute the gradient with respect to that adversarial example so you could just modify the tensorflow code in there in order to do adversarial training but it does require implementing your own custom fl algorithm yeah thank you is there an abstraction in tff for sending data between clients yeah so this is a good question what i didn't mention is that there are different placements you could define a different placement in tff for instance instead of just clients or server however currently server and clients are the things we focus on and clients are treated uniformly the reason we don't necessarily want to send data between clients is because that can violate data privacy but this is definitely something that could be investigated in the future it might just be difficult to reconcile with a lot of privacy issues is it possible to do client weight averaging on the client side without aggregating them on the server hmm okay i have to pause and think about that a little bit i'm not sure that this is directly possible in the framework that i've presented in particular the way i presented the algorithm is that it works server goes to client clients do an update clients go back to server server updates so you would need some methodology to have all the clients know about all the other clients' models in order to do that and you could definitely simulate that in tff but i'm not sure how that fits in with privacy issues yeah i think i would need to think more about that and as far as multiple servers how would they work in an fl framework this is a good question there's already some support for kind of hierarchical aggregation in tff i might ask keith to jump in and talk a little bit about that oh sorry i was spacing out again on a different dory question
what was the question no no worries it's about multiple servers in fl and i wanted to see if you would touch on hierarchical aggregation in tff yeah tff does have hierarchical aggregation as a feature it's not one that i know super well it's actually one i think we're probably going to be hardening in the immediate future but i think the conceptual model is really a single server and it's kind of an implementation detail that the server can be literally distributed across multiple machines in order to avoid let's say linear cost of scaling or whatever i don't know if that answered the question maybe i think so cool yeah thanks any work done on detecting or blocking malicious clients thanks yeah this is a good question there's a lot of active research on what happens with malicious clients we have in our research repository two examples that might be relevant one is robust aggregation this is using things like medians instead of averaging to help negate the influence of malicious clients and the other is a library that shows how to actually do this kind of malicious client attack and defenses against it stemming from differential privacy i don't think that we've done any work on detecting mainly because detecting might violate some privacy issues i think it's a really good question and if you're interested in this by the way the advances and open problems in federated learning paper we put out last year has a really good section on this is it true to say a peer-to-peer type of distributed learning isn't possible in the current version of tff will there always need to be a server trusted central aggregator is this on the roadmap okay this is out of my wheelhouse is somebody on the tff team maybe you want to comment yeah so you see the broadcast in what's actually on the screen right now right that delegates internally to a tff construct called an intrinsic that has a particular type signature that says okay take whatever is at the server and put it on the clients essentially tff does not currently have any intrinsics that have clients on the left and clients on the right besides essentially mapping a function so there's basically no client to client communication intrinsic but tff is designed to express such a thing so tff can express this it's not a fundamental limitation of the type system or the design or anything whether it's on the roadmap or not i actually don't know that anybody knows the answer to that tff is designed such that it could be on the roadmap so i guess the answer is no it's not on the immediate roadmap but yeah it's something that i think we'd be open to thinking about i mean i would like to add here that this is not what we necessarily refer to as a federated setting because commonly what we refer to as a federated learning type setting is one where there is a central orchestrator this orchestration back and forth between a service provider and clients really is one of the defining characteristics of federated learning i think this is a fully decentralized learning setting and it's an exciting one but it's not exactly federated learning thank you both is it possible to run training slash inference of multiple models on the same client yes definitely and this is part of the advantage of having defined our own iterative process
i showed you how to train a single model but you could easily take what i did and just slap in a different model maybe there's another model you want to train simultaneously so yes definitely possible is it possible to define different cost functions for different clients for example an attacker performing a poisoning backdoor attack in federated learning so i'm going to assume in my answer to this question that by cost you mean loss function and so this is a good question i think it is technically possible though you would have to be a little bit creative in how you do it we talked before about how to identify different clients and say maybe if the client has a one associated with them do this training loop and if the client has a two associated with them do that training loop you could imagine doing a similar thing but for the models or for the loss functions so this is definitely possible but for maybe more tff-friendly ways of doing this i would recommend checking out the research repository and the examples of how to do backdoor attacks that are in there i've noticed tff uses grpc can we extend this to production will there be support for other protocols like mpi yeah would someone from the tff team want to comment on this i can maybe jump in on this yeah this is zach from the tff team the other zach can we extend this to production i think at this point tff is kind of a framework on top and people can implement additional systems underneath and combine them with other technologies i think there was an earlier question about kind of verifiable computing yes but i don't know if the tff team has that on the immediate roadmap but can we like we as a group of humans yes we can will there be support for protocols like mpi that's also probably not on the near-term roadmap but definitely the execution stack allows alternative implementations of the protocols under the hood and that's very encouraged it's probably a bit off topic but what are your thoughts on fault tolerance of the tff server i remember you had a multi-tiered hierarchical aggregation network structure in production as explained in the towards federated learning at scale system design paper yeah i might also need to phone a friend for this one keith or other zach man you guys are asking some wild systems questions here yeah tff is not what runs production federated learning at google currently i think that pushing fault tolerance of the tff native execution stack is actually something that is on the immediate roadmap i don't know that i personally am going to be working on it but i know that people are looking at it yeah that's pretty much all i got for that one great does it need to load all client weights into memory to calculate the mean yeah this is a good question the short answer is no but for the longer answer i might still defer to keith but on a short level we have different executors that will execute this code in different ways and one of the ways is to do more intermediate aggregation to avoid exactly these problems yeah definitely not always if you are running on a single machine i think this might happen by default but i think we should consider it an implementation detail it's certainly not part of any contract
and if you run a distributed setup especially with intermediate aggregation then there should be no true linear scaling pretty much anywhere that might not be literally true but if it exists it's something we'd like to root out and get rid of the computation protobufs don't seem to restrict the placement to clients and server is this restriction a feature of the python api and the simulation runtime this is a very detailed question that might go better back to keith or zach i was very excited to see this question on dory this is an excellent question somebody really went deep into the internals of tff to figure stuff out i'm very surprised that somebody obviously figured this out with some speed yeah tff is designed to potentially extend to other placements essentially the server clients thing basically is a feature of the current python implementation there's another comment on there from somebody else i believe who mentions that there are implicit dependencies in other places in the stack that's also definitely true so tff can be extended to other placements i think that there is probably a long-term design to do this i don't know if there are any concrete plans to do it i think there are probably not currently but i think it probably will happen and will a lot of code have to be rewritten when we do this i think so but yeah amazing question could you elaborate on what makes tff incompatible with eager execution what subparts would be compatible yeah so i'm not going to go too much into depth on this but what i want to say is that when the python interpreter encounters a federated computation decorator the function is traced and serialized for future use so that is fundamentally why federated computations are non-eager but you can do some execution in an eager manner i don't want to go too much into the weeds here there's a great tutorial on the tff tutorials web page i believe it's federated core one that talks more about that in depth if there are any comments from the tff team that you want to interject now please do so but yeah oh go ahead sorry zach go ahead i might jump in real quick on that question and mention that in eager mode there are kind of differences in semantics you get kind of python code order semantics and the same thing happens when you wrap in a tf.function decorator and that's still completely compatible with tff you can call tf.functions inside of your tff.tf_computations and you get the kind of code order semantics you expect one thing to think about with graph mode versus eager mode is that these computations run out on devices and eager mode kind of assumes that you have some global context locally so there's kind of a conflict of concepts here and that's why we typically run in what we call not eager but traced and then execute later semantics could you please elaborate on what the tff.tf_computation decorator does under the hood to convert python code into tf computations this is a phenomenal question that would require another half hour tutorial at least but maybe zach or keith can give a very short answer yeah i actually responded to this one as well another great question i think as we kind of just mentioned in the answer to the previous one tff is designed to serialize everything immediately it's got to run in this cross-platform way which means that we need to make all the logic independent of python immediately
so effectively what the tff.tf_computation decorator does is drop into a graph context record all the tf operations that are happening in this graph context compute bindings into and out of this graph in order to be able to run the resulting thing as a function and then just spit the resulting serialized object out to tff and actually this is what's happening when you invoke your tf computation effectively tff is importing it into the eager runtime but you can imagine that tff is actually running this graph yeah i left a code pointer in the comment if anyone's interested there okay that's the end of the questions so with that i'm going to pass it back to peter to give maybe a couple closing remarks thanks zach and thanks everyone who attended today's tutorial we're almost done hopefully you thought this was very interesting and hopefully you'll look at the links that we've provided throughout the tutorial please continue to check our website the tutorials website we're going to hopefully post some additional information and more recordings i would like to thank all the presenters emily nova wakang and zach they worked really hard to put all these tutorials together hopefully you enjoyed them i would like to also thank chris keith rush and zach garrett for participating in q a and introducing some of these concepts in addition i would like to thank the university relations team at google who have helped us tremendously and the events and production team who are behind the scenes making this possible i hope you enjoyed it and have a great rest of your day
Info
Channel: Google TechTalks
Views: 27,380
Id: JBNas6Yd30A
Length: 193min 49sec (11629 seconds)
Published: Fri Jun 04 2021