Fairness and Robustness in Federated Learning with Virginia Smith - #504

Captions
Host: All right everyone, I am here with Virginia Smith. Virginia is an assistant professor of machine learning at Carnegie Mellon University. Virginia, welcome to the TWIML AI Podcast.

Virginia: Thanks so much for having me.

Host: I'm looking forward to diving into our conversation. We're going to be focusing on federated learning and some other topics, but before we do, I'd love to have you share a little bit about your background and how you came to work in the field.

Virginia: Absolutely. I always enjoyed math and wanted to take as many math classes as I could, but one question I had was: how can I really put this math to use? Just before my senior year in undergrad I took my first computer science class and absolutely loved it, and that's what I wanted to focus on in my PhD, something at the intersection of computer science and math, and machine learning was a really natural fit. Around the time I started my PhD there was a lot of excitement around big data, and deep learning was also taking off, so there was a lot of focus on how to make models more accurate and more efficient. That's what I focused on in my PhD: techniques for distributed learning and distributed optimization, taking a lot of the machine learning methods we knew and loved in the small-scale setting and getting them to work across large data centers and massive amounts of data. Since then, as a lot of other researchers have as well, I've realized that big data is not just big, it's also very complex, and there's a lot more to the picture than just efficiency and accuracy. So a lot of my recent work has been focusing on other constraints: things like robustness, fairness, and privacy. One application that really makes these points salient and grounds these ideas is federated learning, where the goal is to go beyond the data center and train across networks of remote devices, or across private data silos, say across different organizations. I think this is a really exciting and ongoing area of research.

Host: Awesome. So you think of federated learning as an application that grounds your research. Is there a particular application of federated learning that you like to think about?

Virginia: One thing to note is that there's an important dichotomy in the applications of federated learning. There are applications in cross-device federated learning, where the goal is to train across a large network of remote devices, and there are also applications in what is known as cross-silo federated learning, where the goal might be to train, again in a privacy-preserving way, but across say a group of ten organizations; those could be hospitals or financial institutions. I've done some work on both types of applications, but more of my work tends to be in the cross-device setting.

Host: And the main distinction between those is a one-to-many set of concerns about privacy versus few-to-few? Is that a good way to characterize it?

Virginia: There can be differences in what you care about from a privacy point of view, but I think a major difference is just the scale. In the cross-device setting you're talking about maybe thousands to millions of devices that you're learning over, and each of those devices could be really constrained from a computational point of view. In the cross-silo setting it could be, like I said, ten hospitals that you're training over; you might have more compute power at each of those hospitals, but there can be similar concerns about not sharing private information across the organizations or across the devices. In that way they're fundamentally distributed learning problems; it's just a difference in scale and, as you mentioned, the privacy characteristics.

Host: And how much of your research, or even the field's work today, focuses on the distributed learning aspect of these problems relative to the privacy aspects? I've done a number of interviews on privacy-preserving machine learning, differential privacy, techniques like that, and I'm curious if that's the bulk of your research, versus still trying to figure out better ways to do the core learning itself across devices.

Virginia: That's a great question. It's really both. In federated learning, privacy is a first-class citizen; it's one of the main motivations for performing this distributed learning problem. You don't want to move all of the raw data from these user devices to some central location, and there can be downstream privacy benefits to keeping that raw data local. So privacy is a really important consideration, and a lot of the exciting work in federated learning is thinking exactly about this: how do we take the privacy notions we've thought about in simpler centralized settings and understand them in this distributed learning context? But certainly my work focuses on both problems.

Host: And it sounds like, from a practical perspective, they're fairly tightly intertwined.

Virginia: Yes, they can be very related. As I mentioned, privacy helps motivate why we would want to perform this distributed learning problem, why we want to keep data on these devices as opposed to moving it. But it also makes the distributed learning difficult, because you want to make sure that the information you do send over the network doesn't reveal any sensitive information.

Host: One of the areas you've been focusing on from a research perspective is fairness and robustness; you've got an ICML paper on that topic. Let's start with what fairness means in this context, because I think it's different from the type of fairness we think about from an AI ethics perspective.

Virginia: That's a really great point, and it goes back to something I mentioned earlier: one thing that's interesting to me about federated learning is that it helps ground these notions in a specific way. I should say there are multiple notions of fairness you could consider in federated settings. The notion we've been looking at, and that we touch on in this work, is related to the idea of representation disparity. The idea is basically that if you have a network of heterogeneous devices, different user devices might be generating data that looks slightly different across the network. You could imagine a network of mobile phones: people might be interacting with those phones in slightly different ways, and for that reason the data might look slightly different. But you want to train a model that performs, ideally, equally well across these possibly differing, diverse devices. A good way to phrase it at a high level is that you want to ensure some reasonable quality of service across the entire network: a model that performs reasonably well across all of the different devices.

Host: So the premise is that if you apply distributed or federated learning techniques without considering the specific needs of fairness, you're likely to run into problems where the results aren't fair in that way. What are the particulars of the failure modes, and why do you see them when you're not worried about them?

Virginia: Typically, when we're training a model in a federated network, one of the most common objectives is traditional empirical risk minimization, where you minimize an average notion of loss: you're trying to minimize the average error across the different devices in the network. The concern is that if you just look at average performance, you could perform quite well on average but at the expense of performing very poorly on a small subset of the devices. So if you have a small set of devices that differ in some way, you can end up with a model that performs well on many of the devices but catastrophically on some of them. That's why you would care about alternatives to empirical risk minimization that encode this notion of fairness for federated learning.

Host: And when you're thinking about fairness in this way, what's the relationship between the model, the thing you're trying to optimize, and the different devices?

Virginia: The issue is that if you're training just one model to perform well across all of these devices, and the data coming from those devices differs in some meaningful way, there can be limited capacity for one model to capture all of that diversity. This is where fairness can become a concern. It can particularly happen because in federated settings we're thinking about training models that we can deploy on device, that run very efficiently and often perform real-time machine learning, and that naturally limits the types of models we can deploy in these settings.
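To make this failure mode concrete, here is a minimal NumPy sketch. The data is entirely made up, and the fairness-aware alternative shown is a q-FFL-style loss reweighting standing in for the kind of objective discussed here, not the exact formulation from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy federated network: each "device" holds scalar data drawn from its
# own distribution (hypothetical numbers, chosen only for illustration).
device_data = [rng.normal(0.0, 0.1, 100) for _ in range(9)]   # majority devices
device_data.append(rng.normal(5.0, 0.1, 100))                 # one differing device

def device_losses(w):
    """Per-device mean squared error of a single shared scalar model w."""
    return np.array([np.mean((x - w) ** 2) for x in device_data])

# Empirical risk minimization: minimize the *average* device loss.
# For squared error with equal device sizes this is just the grand mean.
w_erm = np.mean(np.concatenate(device_data))

losses = device_losses(w_erm)
print("average loss:", losses.mean())   # looks acceptable on average
print("worst device:", losses.max())    # catastrophic on the differing device

# A fairness-aware alternative (q-FFL-style reweighting via a simple
# gradient loop): devices with higher loss get proportionally more weight.
q = 2.0
w = w_erm
for _ in range(500):
    per_dev = device_losses(w)
    grads = np.array([2 * (w - x.mean()) for x in device_data])
    w -= 0.01 * np.mean(per_dev ** q * grads) / np.mean(per_dev ** q)

print("worst device after reweighting:", device_losses(w).max())
```

The reweighted objective trades a little average accuracy for a much better worst-device loss, which is the representation-disparity concern in miniature.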
Virginia: Even with expressive models, there can be a real limit to how well a single model can capture this entire realm of diversity across the network.

Host: Now, fairness is just one of many attributes you're looking to balance when you're training a model, federated or not. Can you talk about some of the other trade-offs you're making? In particular, your work focuses on a trade-off between fairness and robustness.

Virginia: Robustness is another really important concern in federated settings. The idea here is that because you're using user devices as a computing substrate, there can be practical issues with these devices: someone might turn their phone off, or you could potentially have an adversary in the network. So we want to develop models that are robust to things like device failures, or possibly to corrupted data. What's interesting, though, is that the fairness issue I just talked about, wanting our model to fit well to possibly diverse or heterogeneous-looking data, can be directly at odds with robustness. A common way people handle robustness is to look at the diverse or outlier data they're seeing and get rid of it; that could be data coming from a corrupted device or a device where there's been some failure. So an easy way to encode robustness is to say, let's ignore that information. The reason this can be at odds with fairness is that, from a fairness point of view, if that data is actually just coming from a device that's generating different-looking data, then that's exactly the device we want to ensure our model fits well to. And so these two notions can be at odds in federated learning.

Host: So a big part of your research, and the ICML paper I referred to, is looking at the trade-offs and how to ensure fairness while managing robustness. Walk us through the approach that you take.

Virginia: One of the insights in this work is that if you're training, as I mentioned, just a single model across the entire network, there's limited capacity for that one model to ensure fairness and robustness simultaneously. One of the techniques we propose to help address both constraints is multi-task learning. The idea is that if you have data that differs across the federated network, it makes sense not to train just a single model but possibly to train multiple models, to personalize the model to the local data, and multi-task learning is one way of doing personalized federated learning: you're solving multiple tasks, solving for multiple models, simultaneously. Again, it's intuitive, but what we've seen is that it's actually quite powerful. We're not doing anything specific regarding fairness or robustness; we're implementing a very simple multi-task learning framework.

Host: Does multi-task learning always denote two models, as opposed to a single model that's trained to do two things?

Virginia: Thanks for bringing this up. Multi-task learning has many meanings for different applications. More commonly in deep learning, people might think of multi-task learning as learning across actually very diverse tasks, like training some NLP model simultaneously with an image classifier. Here, the notion I'm referring to is that we can view each device as being its own learning task, so the overall learning objective can be similar between them. You could still be training just a single image classifier, but the notion of a task is with respect to the local dataset on the individual device. You're still trying to train an image classifier, but now you have multiple different devices generating data, and you model each of those devices as an individual task.

Host: Okay. I'm trying to put the pieces together. I was thinking about it the way I traditionally think of multi-task learning, where you might have one objective function focused on fairness, another focused on robustness, and another focused on your core task, and multi-task learning is the way you optimize across those three objectives. But it sounds like that's not really what we're talking about here.

Virginia: Right. One of the reasons we look at multi-task learning in particular is that it's been shown to improve accuracy. Setting aside fairness and robustness, just learning an accurate model in federated settings, multi-task learning and other forms of personalized federated learning have been shown to really improve the raw accuracy. The reason is exactly the point we mentioned earlier: the data might differ across the network, so learning models that are personalized to each of the individual devices can help improve the overall accuracy. What we show in this work is that there are also important benefits in terms of fairness and robustness, especially when you care about both simultaneously. If you're learning models that are personalized to the individual devices, those models have more capacity to fit the heterogeneous data, so you can learn models that are more fair to data that looks diverse. You can also break the tension of having just a single global model, which helps with issues like robustness: you can learn a separate model for all of the corrupted data in the network, for example, and that corrupted model doesn't affect the other parts of the network where you've learned other personalized models.

Host: And is there something as simple as a hyperparameter, a dial you can tune, that weights the model trained on local data versus the centralized one? I'm imagining there are multiple ways to do that; you could tune it at inference time as well as at training time.

Virginia: There's a much broader research direction here looking at multi-task learning for federated learning and other forms of personalized federated learning, but in this work we look at a very simple objective, similar to what you're saying. It's a simple form of multi-task learning where there are basically two tasks: a global model, the model trained across all of the devices, and a local model, the model that's personalized to the local data. There's a simple hyperparameter you can tune to adjust how much you want to rely on the global model versus just your own local model fitting to local data. The tension there is that the whole promise of federated learning, the reason we care about doing this, is that ideally we're getting something from sharing all this information across the network; we would hope the global model provides some useful information. But we also want to be able to trade off between learning that one global model and learning more personalized, local behavior on each of these devices. And that's exactly what you can do with this hyperparameter.

Host: Nice. And when you've got this hyperparameter, are the implications of the local data confined to the local model, which is trained on the device and stays on the device? Is that how you ensure the separation between the local data and the central data? In what ways are you leveraging the local data in creating the centralized model: are you sending the data, are you sending weights? How is the centralized model trained?

Virginia: There are two parts to training this multi-task objective: the global component and the local components. The local component, and actually the hyperparameter I just mentioned, is trained completely on local data, ignoring the information from all of the other devices; you can tune lambda just by looking at local validation data, so that's all happening locally. Where you end up sharing information across the network is when you're training the global component of the multi-task objective, and there you can apply a lot of existing work in federated learning. What you end up sharing is, as I think you alluded to, model updates computed from the local data: you're trying to find one global model by aggregating a bunch of smaller model updates from each of the devices.
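The global-versus-local trade-off controlled by this hyperparameter can be sketched on a toy scalar problem. This is in the spirit of the objective Smith describes (a local loss plus a lambda-weighted penalty pulling each device's model toward the global one), not the paper's exact formulation, and the data is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: each device's "model" is a scalar (an estimate of its data mean).
# The per-device distributions differ -- the heterogeneity discussed above.
device_means = [0.0, 0.5, 1.0, 4.0]              # hypothetical per-device truths
data = [rng.normal(m, 0.2, 50) for m in device_means]

# Global component: here simply a FedAvg-style average of local estimates.
w_global = np.mean([x.mean() for x in data])

def personalized_model(x, lam):
    """Minimize mean((x - v)^2) + (lam/2) * (v - w_global)^2 over v.

    Setting the derivative to zero gives the closed form below:
    2 * (v - x.mean()) + lam * (v - w_global) = 0.
    """
    return (2 * x.mean() + lam * w_global) / (2 + lam)

for lam in [0.0, 1.0, 100.0]:
    models = [personalized_model(x, lam) for x in data]
    print(f"lambda={lam}:", np.round(models, 2))
# lambda = 0     -> purely local models (each device fits only its own data)
# lambda large  -> every device's model collapses to the shared global model
```

Sweeping lambda interpolates between fully personalized models and a single shared model, which is exactly the dial discussed above; and because the closed-form solve uses only `x` and the already-aggregated `w_global`, each device can tune its own lambda on local validation data.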
Host: Got it, very cool. What types of datasets do you evaluate this on? And in fact, talk a little bit about evaluation of federated learning in general. What are the standard benchmarks and metrics you're looking at?

Virginia: This is a really important problem. Federated learning is very much an ongoing area of research, there are a lot of new applications coming out, and as such it's really critical that we have a reasonable set of benchmarks to look at. That was actually some of the motivation for me, along with collaborators at Carnegie Mellon and at Google: we came together and created something called the LEAF benchmark, which can be used for evaluating federated learning on common kinds of applications you would see in practice. It includes a suite of open-source datasets you can use for evaluation, as well as complementary metrics you would care about, so you can validate things like the average accuracy across devices, or look at notions of fairness as well. In terms of evaluating what performance looks like when you're actually running this on, say, a network of mobile phones, there are a couple of strategies. One of the most common is to train in something like a data center setting and then simulate what the performance might be if you were running it on device: you gather the raw metrics from training in a data center and then scale them in various ways, depending on what sorts of constraints you want to add to the training process. I should also mention there are a few benchmarks from other groups. One from Google is TensorFlow Federated, where the goal is to make it easier for people to actually run on devices, so they provide tools that let you potentially run these techniques on device as well.

Host: Maybe even more fundamentally, is there a well-accepted metric for fairness in a network, or robustness in a network, a la BLEU score? Or is that still evolving?

Virginia: I think there's still a lot of work to be done to make this more rigorous and to evaluate a lot of different metrics. For fairness, there's more of a clear answer right now, in that a lot of the work has focused on this notion of representation disparity that I mentioned. The goal is to ensure more uniform performance across the differing devices, so you could measure this by looking at, say, the variance of the test accuracy distribution, or you could look at the worst-performing accuracy: a minimax notion, where you find the worst-performing device and make sure it's above some threshold. Those are two common metrics for fairness. For robustness, there are a lot of different things you could think about. You could look at robustness to device failures, as I mentioned, seeing what happens when devices drop out of the network, or you could look at all sorts of attacks. A lot of the attacks here mirror what you see in centralized settings: traditional data or model poisoning attacks, just applied in the federated setting.
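The two fairness metrics mentioned here, the variance of the per-device accuracy distribution and the worst-device (minimax) accuracy, are easy to compute. A small sketch, with made-up accuracy numbers chosen to show how they can disagree with plain average accuracy:

```python
import numpy as np

# Hypothetical per-device test accuracies for two models evaluated on the
# same 10-device network (illustrative numbers only).
model_a = np.array([0.99] * 8 + [0.60, 0.55])   # great on average, awful for two devices
model_b = np.array([0.90] * 10)                 # uniform service across devices

def fairness_report(acc):
    return {
        "mean": acc.mean(),          # what plain empirical risk minimization targets
        "variance": acc.var(),       # uniformity of service across devices
        "worst_device": acc.min(),   # minimax notion of fairness
    }

for name, acc in [("model_a", model_a), ("model_b", model_b)]:
    print(name, fairness_report(acc))
# model_a wins on mean accuracy, but model_b is "fairer" under both the
# variance and worst-device metrics.
```

This is the representation-disparity picture in numbers: ranking the two models depends entirely on which of the three metrics you report.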
Host: Got it, awesome. Separately, you've got another paper at ICML that's focused on federated learning in more of an unsupervised setting. Can you tell us a little bit about that paper?

Virginia: There were two key motivations for this paper. One is that in practice, for a lot of these federated learning applications, you don't have labeled data, and for that reason we wanted to spearhead some work in unsupervised federated learning, specifically looking at clustering in federated networks. A second major motivation is that, so far, a lot of the problems I've discussed revolve around the issue that data is diverse in federated networks. You have this issue of heterogeneity, that the devices might be generating differing data, and that can cause a lot of problems: it can break the assumptions we have for traditional distributed optimization methods, it can result in issues of unfairness, and it can make it difficult to provide robustness. But what we show in this work is that for a certain set of problems there can actually be benefits to heterogeneity, and intuitively, clustering is one where diversity can be beneficial. The method we propose is a simple one-shot clustering scheme: you cluster locally on each of the devices and then aggregate that clustered information to form one global clustering of the data. Intuitively, if you have data that's diverse across the different devices, that can actually make the method more effective: if natural clusters already form on the devices, it's easier to do this in a totally distributed fashion. What we make rigorous in this work is the benefit of clustering in a federated network specifically when you have heterogeneous data.

Host: So the paper isn't focused on the techniques so much as on performance bounds? It's a more theoretical paper, is that the idea?

Virginia: We do propose this one-shot communication scheme; it's basically a distributed version of Lloyd's method, which is a very common method for k-means clustering. But the meat of it is really analyzing the performance guarantees for that method, and showing in particular that this issue of heterogeneity can be beneficial for the analysis.

Host: Okay. Can you summarize the intuition around how this method makes heterogeneity beneficial, how it unlocks the power of the native heterogeneity in the data?

Virginia: The main idea is that you want to do this simple, very communication-efficient type of clustering, which makes a lot of sense in federated learning: if you're training across a million devices, it makes sense to reduce communication as much as possible. The technique we're looking at is a really simple heuristic for how you might perform clustering in practice: you cluster your data locally on each device, send the result to some central server, and then aggregate those local clusters into one global clustering. The reason heterogeneity can be beneficial for this process is that in clustering, the goal is basically to split your data into separate sections, these separate clusters, and if your data is heterogeneous, it has in a sense already been distributed based on those clusters. You would imagine that some devices might only have data from a small subset of the total clusters, and that helps make the process more decoupled; it makes it easier to distribute the clustering across the devices. The analysis specifically looks at this idea that each device only has data from a small number of clusters, which is an intuitive way to think about how the data might be heterogeneous.

Host: How do you characterize the heterogeneity of your data? What's the assumption you're making in the paper?

Virginia: We make exactly that assumption: each device contains data from a small number of the underlying clusters. Say all of your data is coming from 100 different clusters; the assumption could be that every device contains data from only three of those clusters. This is just one notion of heterogeneity, but it makes sense in the clustering context: if your goal is to perform clustering, it makes sense to think about heterogeneity in terms of the underlying clusters. So the notion we look at is that a small number of clusters generates each of the local datasets on each of the devices. What we show, then, is that by performing this one-shot clustering scheme under that heterogeneity assumption, the results are basically better than if you were performing it on totally randomly IID-partitioned data.

Host: Is there a lot of prior work on this idea of local IID versus global IID in federated environments?

Virginia: I would say there's a lot of work thinking about the issue of heterogeneity in federated settings; it's really a defining characteristic of federated learning compared to something like the data center setting. The reason is that in the data center, even though you're still solving a distributed learning problem, you own and can access all of that data, and you can re-partition it any way you want. The major difference in the federated setting is that each of these devices, say a mobile phone, is generating data on that phone, and you're not moving that data or re-partitioning it across the network in any way.
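The one-shot scheme Smith describes (a distributed version of Lloyd's method) can be sketched as follows. The data, device count, and cluster layout are all hypothetical, and the heterogeneity assumption, each device holding data from only a few of the underlying clusters, is built directly into the toy setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 4 true cluster centers; each of 8 devices holds
# data from only 2 of them (the heterogeneity assumption above).
true_centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
devices = []
for i in range(8):
    owned = [i % 4, (i + 1) % 4]   # this device sees only 2 of the 4 clusters
    pts = np.vstack([true_centers[c] + rng.normal(0, 0.3, (50, 2)) for c in owned])
    devices.append(pts)

def kmeans(points, k, iters=25):
    """Plain Lloyd's method with farthest-first initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min(((points[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

# One-shot scheme: a single round of communication.
# 1) Each device runs k-means locally (k=2, matching its local clusters).
local_centers = np.vstack([kmeans(pts, k=2) for pts in devices])
# 2) The server clusters the uploaded local centers into k=4 global centers.
global_centers = kmeans(local_centers, k=4)
print(np.round(global_centers, 1))
```

Only the 2 local centers per device cross the network, not the raw points, and because each device's data is already concentrated on a few clusters, the local step does most of the work; this is the sense in which heterogeneity helps the scheme.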
Virginia: What this means is that in the data center, even though you were distributing your data across different machines, you could partition that data in an IID manner, independent and identically distributed, across the machines. In the federated setting you're getting the data as is: different devices might be generating different data, and that results in this issue of non-IID, or heterogeneous, data across the network. I mentioned this earlier, but there's been work thinking about how this affects fairness and robustness, and another major issue is that it can affect some of the convergence guarantees we have for communication-efficient optimization methods in federated settings. One of the main assumptions typically made when you're performing distributed computing is that the data is distributed IID across nodes, so this actually breaks a fundamental assumption in some of the common methods and analyses used for distributed learning.

Host: I guess I drew a parallel between one of these devices, or a subset of devices with heterogeneous data, and what I thought of as local IID. Do you see that within one of these heterogeneous segments there is an IID property, and do you rely on that? Or do you assume that IID is simply broken and replaced with this local notion of heterogeneity?

Virginia: I think a good way to frame it is that each device is generating data in an IID way, but according to its own separate distribution, and the distributions can differ across the devices. Each device might be generating data in an IID fashion according to its own unique distribution. One of the motivations for something like multi-task learning is that the distributions between different devices might be similar, so it makes sense to train them simultaneously.
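This IID-versus-non-IID distinction can be made concrete with a small partitioning sketch. The label-shard scheme below is a common way to simulate heterogeneity in federated experiments; the dataset, shard counts, and device count are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical dataset: 1000 examples, 10 class labels, 20 simulated devices.
labels = rng.integers(0, 10, size=1000)

# IID partition (data-center style): shuffle everything, then deal out evenly.
iid_parts = np.array_split(rng.permutation(1000), 20)

# Non-IID partition ("as is" federated style): sort examples by label, cut
# into 40 shards, and give each device 2 shards, so each simulated device
# sees only a couple of labels.
shards = np.array_split(np.argsort(labels, kind="stable"), 40)
order = rng.permutation(40)
noniid_parts = [np.concatenate([shards[order[2 * d]], shards[order[2 * d + 1]]])
                for d in range(20)]

def distinct_labels(parts):
    """Average number of distinct labels seen per device."""
    return np.mean([len(np.unique(labels[p])) for p in parts])

print("iid    :", distinct_labels(iid_parts))     # close to all 10 labels
print("non-iid:", distinct_labels(noniid_parts))  # only a few labels each
```

Under the IID split every device sees nearly every label, matching the classical distributed-computing assumption; under the shard split each device's local distribution is its own narrow slice, which is the setting that breaks those convergence analyses.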
It also lets you learn about how these different devices might differ from one another. They differ in a meaningful way, so it's also used not to just train one model.

Yeah. We talked a little bit about applications of this work, and federated learning generally. When you're looking at the unsupervised setting, what are some specific applications there? Is it something along the lines of: you have an army of mobile devices and you're trying to segment them by type, or something like that?

Yeah, so actually this relates to the idea of multi-task learning, and personalized learning more generally. A simple way to perform multi-task learning is just to first group your devices into clusters. If you knew that there was a natural clustering between the devices, then you could learn models specific to each of those clusters. So this one-shot clustering scheme that we look at provides you a simple way to do multi-task learning: you can just do this clustering procedure, and then you can learn models that are personalized to the individual clusters in the network. But beyond that, clustering is obviously widely used for a lot of applications in machine learning, just as an important pre-processing tool to understand and analyze the underlying data distributions that you have. And so this could also just be used as a pre-processing step to get a sense of what the data looks like in the network.

Does your first point suggest a hierarchical kind of model tiering, where you've got this centralized model, then you've got this intermediate type of model that's based on clusters, and then you've got a local device model? À la the first conversation about robustness and fairness, instead of your one lambda parameter now you've got kind of two that
you're balancing across these different models.

Yeah, you know, that's an interesting point. We haven't looked at that, but I think that's a very natural way that you could think about applying these things. You could maybe have multi-task learning happening within each of the clusters as well, and this is something that we haven't looked at either, but I think it makes sense. Another benefit of these multi-task objectives, or things like clustering, is that they could also help to reduce communication in a meaningful way. In the scenario that you're describing, maybe you could have this nice hierarchical structure where you only actually communicate within a small cluster in the network, as opposed to sending everything to one central location.

Right, right. Awesome. So what are some of the future research directions that you're looking at and excited about in your work?

So one direction that I think we started at, and that I want to circle back to, is the idea of privacy. This is something that's really important in federated settings. In particular, the common notion of privacy that's considered is that we want to be able to train models across these devices without necessarily being able to know that any one device participated in the training procedure. A common tool to address this is through techniques like differential privacy. Some recent work that I've been looking at, and that I think is really important, is thinking about how privacy then connects with issues of fairness and robustness and personalization — a lot of the other topics that I touched on. In particular, one area we've been looking at recently is defining notions of privacy for multi-task learning, so for these personalized objectives. There's a real lack of work understanding how to make those models differentially private, and so I think that's a really important area
of work, to ensure that we can simultaneously address all of these constraints: not just fairness and robustness and efficiency and accuracy, but also the constraint of privacy.

Awesome, awesome. Well, Virginia, thanks so much for taking the time to chat. It's been great learning a bit about your research and what you've been up to.

Yeah, thank you so much. Thanks again for the opportunity.

Thank you. Bye-bye.
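The one-shot clustering idea discussed in the conversation — cluster the devices once, then learn a personalized model per cluster — can be sketched as follows. This is an illustrative reconstruction, not Smith's actual algorithm: using local least-squares parameters as the clustering features, a plain k-means step, and per-cluster parameter averaging are all assumptions made for the sake of a runnable example:

```python
import numpy as np

def local_fit(X, y):
    """Cheap local model per device: least-squares weights (illustrative choice)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def kmeans(points, k, iters=50):
    """Tiny k-means over device parameter vectors (stand-in for any clustering step)."""
    centers = points[:k].copy()  # deterministic init for the sketch
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

# Simulated network: two latent groups of devices with different true weights.
rng = np.random.default_rng(1)
true_w = {0: np.array([2.0, -1.0]), 1: np.array([-3.0, 0.5])}
devices = []
for d in range(8):
    group = d % 2
    X = rng.normal(size=(200, 2))
    y = X @ true_w[group] + 0.1 * rng.normal(size=200)
    devices.append((group, X, y))

# One-shot clustering: a single round of local fits, then cluster the parameters.
local_params = np.stack([local_fit(X, y) for _, X, y in devices])
labels = kmeans(local_params, k=2)

# Personalized model per cluster: average the local parameters within each cluster.
cluster_models = {c: local_params[labels == c].mean(axis=0) for c in set(labels)}
```

Note how this also reflects the communication point from the conversation: after the single clustering round, further training or averaging only needs to happen within each cluster rather than across the whole network.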
Info
Channel: The TWIML AI Podcast with Sam Charrington
Views: 487
Keywords: Ai, artificial intelligence, data, data science, technology, TWiML, tech, machine learning, podcast, ml, virginia smith, carnegie mellon university, icml, iclr federated learning, unsupervised learning, one-shot learning, heterogeneity, clustering, ai ethics, robustness, fairness, multitask learning, privacy, cross device
Id: vv8v0fdWBUE
Length: 39min 13sec (2353 seconds)
Published: Mon Jul 26 2021