Compiling to preserve our privacy - Manohar Jonnalagedda & Jakob Odersky

Captions
Hello, good morning, and thank you for coming right before lunch to our talk. My name is Manohar. I was a PhD student in Martin Odersky's lab, working on programming languages, and after finishing my PhD I meandered around a bit and did some research with Microsoft India. But then I decided academia was enough and I would switch to industry. The main reason for me was basic common sense: you earn a lot more money. But, true academic that I was, I really wanted to find out how my salary stacks up against everyone else's. So I went and asked my colleague Jakob, "Hey, how much do you earn?"

Yes, I had already been in industry for a while, and I was a bit surprised when Mano asked me, because we never talk about this in industry. So what we did, in the order of the slides, is go to our CTO, Dimitar, and ask him what we should do. He suggested that instead of revealing everyone's salary, we just try to get an average. So we came up with a basic protocol for calculating an average. The problem is that it doesn't really solve the problem we would like to solve, namely hiding our individual salaries: either we all reveal them, or we trust a third party with access to all of them. Dimitar, understandably, did not want to reveal his salary either, so instead he suggested a different kind of protocol, one that is privacy-preserving. With some clever mathematics we can calculate the average without revealing our individual salaries.

Here is how the protocol goes. Each of us has a secret number we would like to keep secret, the numbers in red on the slide. Each of us also generates three random numbers; they can be anything, as long as they sum to zero. We'll get to why in a minute.

How would we model this in Scala? The first thing to introduce is two types: one for the secret value we would like to represent, just an integer in this example, and one for the concept of a shared number. A shared number is a list of secret numbers; it corresponds to the sequence of numbers you see on the left. The key part of this model is that every element of the list is only accessible by the corresponding person. Then we have a function that generates the random numbers, and we generate three: one for Dimitar, one for myself, and one for Manohar.

In a table view it looks like this: on every row you have one party's shares of zero (D, J and M, which all sum up to zero) and their secret data. The protocol then goes as follows: each of us sums up our shares of zero, that is our own share plus what we received from the other players, together with our secret value. This gives us a shared sum, which is published to every other player, and the sum of all published values equals the sum of all our private data.

A numeric example might be more illustrative (we're getting to the security in a second). You can see that every row indeed sums up to zero, all numbers here have been randomly generated, and if we sum up all the shared sums we arrive at the same total as if we had summed up all our data directly. Why does this work? The key, again, is that the triplets we generated all sum to zero, so in the total they cancel out. And why is it secure? Because no player has full information. From Dimitar's point of view, he knows what's in his row, because those are the numbers he generated, and he knows what's in his column, because those are the numbers he received. I have a different view of the world, and Manohar yet another. So unless someone is able to intercept all communication, there is no way to recover the original secret data.

How would we model this exchange in Scala? First we need to introduce what it means to add numbers. Adding two shared values is just element-wise addition of the individual secret shares. Then we also need the concept of revealing a value, which is just summing up all the individual secret shares. On the right of the table you see a one-to-one mapping from what I just described to the Scala model: in green we generate the zero shares, in red we model the secret data we would not like to reveal, and finally, for the shared sum, we use the additions we just defined to compute the final sum and reveal it to the world. For the sake of the demonstration, addition and secret addition are defined element-wise; with some neat Scala tricks we could make this look much nicer, but for the demo we did not do that.

So, happy as I was to now know the average of the salaries, I had also learned something new. What you have just seen is an example of what is known as secure multi-party computation (MPC). This is a subfield of cryptography whose goal is to create protocols and algorithms that let different parties compute a joint function without revealing their private data. To illustrate once more: you have secret-shared data that is hidden (only I know my share), but when we join the shares together we get a result. The idea is to jointly compute a function whose result everyone is looking for.
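The model just described can be sketched in a few lines of Scala 3. This is an illustrative reconstruction, not the talk's actual code: the names `Shared`, `share`, `add` and `reveal` are assumptions, and where the slides have each party generate shares of zero, this sketch uses the equivalent standard formulation of splitting the secret directly into additive shares.

```scala
import scala.util.Random

// A shared value: one additive share per party. In the real protocol
// party i only ever sees shares(i); here all shares live in one
// process for clarity.
case class Shared(shares: Seq[Long])

// Split a secret into n additive shares: n - 1 random numbers plus a
// correcting term, so the shares sum back to the secret. Arithmetic
// wraps modulo 2^64, which is what makes each share look random.
def share(secret: Long, n: Int, rnd: Random): Shared =
  val random = Seq.fill(n - 1)(rnd.nextLong())
  Shared(random :+ (secret - random.sum))

// Adding two shared values is element-wise addition of the shares.
def add(a: Shared, b: Shared): Shared =
  Shared(a.shares.lazyZip(b.shares).map(_ + _))

// Revealing a shared value means summing up all the shares.
def reveal(s: Shared): Long = s.shares.sum

// Three salaries, shared; we add the shared values and only reveal
// the total, never any individual salary.
val salaries = Seq(100_000L, 120_000L, 90_000L)
val rnd = new Random()
val total = reveal(salaries.map(share(_, 3, rnd)).reduce(add))
// total == 310000, but no single share reveals anyone's salary
```

Averaging is then just dividing the revealed total by the number of parties, exactly as in the talk's salary example.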
So, how does MPC work? What we have seen is an example of addition: for an average, you reveal a sum and then compute the average by dividing by the number of people. Addition is great, but it's not really enough, is it? What about multiplication of secret values? Multiplication is a non-linear function, so how would we do that? I won't go too much into the exact details (please come talk to us to find out more), but the main idea is that we can also do multiplication, and to make it more efficient in terms of performance we introduce the concept of a trusted dealer. Remember that in the earlier example Jakob, Dimitar and myself created our own shares of zero and exchanged them amongst each other. Here, we rely on "Mr. Question Mark" to do that work for us. He is only involved in handing out the shares; when we actually do the computation, he is out of the picture. That is the trusted dealer.

What do these shares allow us to do? For multiplication, the trusted dealer needs to give us some precomputed values, which we use for something called mask-and-reveal; in the case of multiplication these values are known as Beaver triples. Those are a lot of new words; don't worry, you don't need to know much about them. As I said, come talk to us, or look them up on Wikipedia when you watch the video.

Great, so we have addition of secret values and we have multiplication of secret values, which means we can actually do quite a lot of things, because with addition and multiplication we can express any polynomial. That means we can do seriously interesting things, in particular linear regression and logistic regression, because they can be expressed as polynomials. Wait, how is the exponential a polynomial? I don't know if you remember your calculus classes (for me it was also a while back), but you can approximate an exponential function using something like a Taylor polynomial. So we are back to polynomials, and it all works out quite neatly.

These are just mathematical formulas, but what you can really do once you have such functions is classification. I wouldn't want to be one of those chihuahuas getting confused with a muffin; that would be quite unfortunate. Once we have the polynomials, we can go to the real world, where MPC has genuinely valuable use cases. Here is a general picture of the different sectors: financial services, healthcare and genomics, digital advertising, defense, and manufacturing. Basically, anytime you want to compute something but are not allowed to share the data for various reasons (the data is sensitive, the reasons can be legal, they can be anything), you can use a technique like MPC.
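The Beaver-triple mask-and-reveal step just mentioned can be sketched as follows. This is a minimal educational reconstruction, not Inpher's code: the names `split`, `open` and `beaverMul` are mine, and the trusted dealer is simulated inline by a random generator rather than being a separate party.

```scala
import scala.util.Random

// Additive secret shares modulo 2^64 (Long arithmetic wraps around).
type Shares = Seq[Long]

def split(v: Long, n: Int, rnd: Random): Shares =
  val r = Seq.fill(n - 1)(rnd.nextLong())
  r :+ (v - r.sum)

def open(s: Shares): Long = s.sum

// Beaver-triple multiplication. The dealer hands out shares of a
// random triple (a, b, c) with c = a * b. The parties publish the
// masked values d = x - a and e = y - b (safe, since a and b are
// uniformly random), then combine shares locally using the identity
// x * y = c + d*b + e*a + d*e.
def beaverMul(x: Shares, y: Shares, dealer: Random): Shares =
  val n = x.length
  val a = dealer.nextLong()
  val b = dealer.nextLong()
  val sa = split(a, n, dealer)
  val sb = split(b, n, dealer)
  val sc = split(a * b, n, dealer)
  val d = open(x.lazyZip(sa).map(_ - _)) // public, reveals nothing about x
  val e = open(y.lazyZip(sb).map(_ - _)) // public, reveals nothing about y
  val partial = sc.lazyZip(sb).lazyZip(sa).map((ci, bi, ai) => ci + d * bi + e * ai)
  partial.updated(0, partial.head + d * e) // one party adds the public term
```

Note how the dealer's work happens entirely before the computation: once the shares of (a, b, c) are distributed, the multiplication itself involves only the parties.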
Let's go into one of the banking cases, a case we studied with ING. ING is a bank with customers in some of the richest countries in Europe, and it wants to build some sort of credit-scoring model over all its clients. The problem is that customer data cannot cross country borders because of various laws. I'm sure you have heard of GDPR, which was maybe one of the biggest things of the last two years in terms of privacy, legally speaking. If you have something like a multi-party computation engine, you could still build your model without the data having to cross borders.

Let's look at a slightly more fun example. Say you have the usual Russians and Americans, and maybe the Chinese now, because that's what the world is like currently, who have satellites in orbit. They don't want to reveal where their satellites are, yet they also don't want to create destruction in outer space, so you may want to secretly detect collisions. This is not an imaginary example: such a collision happened in 2009, and it resulted in the International Space Station having to maneuver just to avoid the debris. So this is not a joke; it is serious stuff. And while we know that computationally we can preserve privacy and have the security, it is also nice to know that we consulted lawyers. Baker McKenzie are experts in this area, and they have concluded as well that we do not violate the GDPR. That is also great news, legally speaking; if you are interested in a solution, this is definitely one way to go.

We showed you a short example of an embedding in Scala, and we have done a bit more work in our prototype, which also has multiplication involved, so even if you don't want to come talk to us you can definitely look at the code and find out how it works.

Great. But is it enough to have a direct embedding? What does the direct embedding give us? We can now write code in a declarative, machine-learning style; we can write machine-learning algorithms as if they were written in Scala or Python or your favorite language, and thanks to the embedding we get the secure multi-party computation protocols running underneath. But in reality we need a bit more. In reality the computations are distributed, they are not just an array, so we must target a runtime that runs securely at each of the parties. That is one reason why an embedding alone is not enough. The other, more fundamental mathematical reason is that we need to do static analysis of our programs, so that we can optimize for memory and communication, and we also want to be able to compute some statistical distributions, which we will get into.

So rather than having only a library, we decided to write a compiler. The idea is to take a high-level language that data scientists are familiar with, something that looks maybe like Python or MATLAB in a linear-algebra style, and have this high-level language completely abstract away any kind of low-level MPC details, so anyone could use it to write a program as they usually would. The compiler will then transform this program into low-level primitives that get distributed and actually run; that addresses the real-life deployment. And then of course there is also a lot of static analysis.
Much of that analysis is of the kind Mano just mentioned. I'll give you a quick overview of the system we have at Inpher and how the compiler fits into this distributed reality. A couple of components are involved in a real-life secret computation; I'll start at the bottom and work my way up in terms of abstraction. At the bottom we have the players ("player" is the technical term for a party that has data), who have access to some data they would not like to reveal. Each of them runs a virtual machine on their system with access to that data, and this virtual machine can be verified to never leak any data. These virtual machines do the number crunching; they actually run the secret-computation algorithm, which they receive from the compiler. That is where the compiler fits into the picture. We also have the trusted dealer, which Mano mentioned earlier, to support some of the more advanced computations. And just to make it a little easier to use, there is also a nice front-end. The compiler is hosted by Inpher; the engines, the things actually running the programs, are of course hosted by the customers.

How would it look if we want to run a computation? The analyst, through the front-end, submits a computation to the compiler. The compiler compiles the program and distributes it amongst all players. The trusted dealer generates random numbers and sends them to the players; the trusted dealer knows what random numbers to generate thanks to the compiler, through cooperation between the compiler and some engine components. Then the computation actually starts amongst the engines, with communication and exchange of data, and finally the end result gets sent back to whoever submitted the computation.

I just want to take two minutes to dig into why this trusted dealer is still secure, because you could think: "Hey, you generate the random numbers, why should anyone trust you? You could spy on us right after that." The key thing to notice is that after the trusted dealer has sent out its numbers, there is no more communication with the trusted dealer. That means we have no knowledge of any data that leaves the data sources, so even if we had malicious intent and knew all the random numbers we generated, we would not know any intermediate results, and there would be no way for us to reverse what comes out of the computation.

That was the distributed part, the overview of the architecture. Now, why else do I need a compiler? We're going to talk about the actual mathematical properties behind this. This will be the most technical part of the talk, but I just want to give you an intuition about what it means to mask numbers. Earlier, in the example, we masked our salaries by generating random numbers and adding them up, such that when I look at a number I cannot make out what the original plaintext value was. Of course, we did that with integers. So how do we mask integers? It's a relatively easy thing to do. Take a 64-bit integer: we have a fixed size and a fixed range, which means I can pick any integer uniformly at random and add it to my plaintext. How do I know this is secure enough? The security model is: given two masked values, can I distinguish one from the other? In this integer case I cannot, because everything is picked uniformly at random, so I am really none the wiser given the masked values. This gives us a very, very nice property called information-theoretic security; this is sort of the best of the best that I can do. Unfortunately, in real life we have floating-point numbers.
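The integer-masking intuition can be made concrete in two lines. A hedged sketch (the names `mask` and `unmask` are mine): because `Long` addition wraps modulo 2^64 and the mask is uniform over the whole range, the masked value is itself uniformly distributed no matter what the secret is, which is exactly the information-theoretic property described above.

```scala
import scala.util.Random

// Masking a 64-bit integer: add a uniformly random 64-bit value.
// Addition wraps modulo 2^64, so the masked value is uniformly
// distributed whatever the secret is; an observer comparing two
// masked values is none the wiser (information-theoretic security).
def mask(secret: Long, r: Long): Long = secret + r
def unmask(masked: Long, r: Long): Long = masked - r

val r = new Random().nextLong()
val hidden = mask(123_456L, r) // looks like random noise on the wire
// only the holder of r can recover the plaintext exactly
```

Wrap-around on overflow is harmless here: subtraction undoes the addition exactly, even for values near `Long.MaxValue`.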
Floating-point numbers are somewhat harder to mask, and I'm going to try to give you a bit of intuition as to why. With floating-point numbers, the first thing is that a fixed range does not exist; they are essentially real numbers. So how do I mask real numbers? The best I can really do is take my plaintext, put a normal distribution around it, and pick something from that distribution. Now, if I pick a distribution that is too small, I may be able to distinguish two values; that is the double camel hump on the left of the slide. Think of me masking two different values, say 0 and 1, each with a small distribution, not a big variance. The only region where the two values are indistinguishable is where the humps overlap, between the legs of the camel: if the samples fall there, I'm good, I can't distinguish the values. But if my masked 0 lands in the left hump and my masked 1 in the right hump, I can clearly distinguish both, and that is a problem, because it means an attacker learns something: you are leaking your data. And note that with these narrow distributions the humps are large, so I have a very high probability of falling into a case where I can easily distinguish the numbers.

What is the solution? The solution is on the right: instead of taking a small distribution, I take a very large one; you can see the scales have changed considerably on the right-hand side, such that the humps become very small, or at least sufficiently small. "Sufficiently small" means I have a very low probability of falling into them, and I get computational security. My attacker could maybe still distinguish the values, but to do so he would have to do a lot of work: he would have to make an enormous number of queries just to be able to tell the values apart. So we don't have information-theoretic security, where we are none the wiser given the masked values; what we have is that we make the attacker's work much more difficult by making the humps as small as possible. That is what you get with computational security.

The problem now is what these large distributions mean in practice. Say I have a floating-point value that is 10 bits long, so smaller than about 1,024. I need to take a sufficiently large distribution, masking with maybe 40 extra bits, because I want my attacker to have to work very hard. So for a 10-bit number I end up handling something like a 50-bit number. And if I multiply two such numbers, things can blow up: when I multiply plaintexts, a 10-bit number times another 10-bit number gives at most a 20-bit number, which is still fine, it fits under 64 bits; but once the 40-bit masks are involved, I really am blowing up, and that is an issue. Those are the kinds of things we need to analyze and estimate on the compiler side, to ensure that we don't blow up. What we actually do is not use a floating-point representation at all: we use something called a fixed-point representation to represent real numbers in our back-end. This different number representation helps us avoid this format explosion, or at least explode less quickly, and this is what we analyze statically in the compiler, so that we can propagate the statistical bounds correctly through the whole program before sending it to the engines.
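The fixed-point idea can be sketched as follows. This is a toy illustration, not Inpher's actual number format; the names `Fixed`, `fromDouble` and `mul` are assumptions. It shows the bit-growth issue directly: multiplying two mantissas doubles both the integer and fractional bit counts, which is exactly what the compiler's static analysis has to track before masks are added on top.

```scala
// A toy fixed-point representation: value = mantissa * 2^(-fracBits).
// The back-end works on scaled integers like this, so the compiler
// can reason exactly about how many bits a value occupies and keep
// masked intermediate results from overflowing the machine word.
case class Fixed(mantissa: Long, fracBits: Int):
  def toDouble: Double = mantissa.toDouble / (1L << fracBits)

def fromDouble(d: Double, fracBits: Int): Fixed =
  Fixed(Math.round(d * (1L << fracBits)), fracBits)

// Multiplying two fixed-point numbers (same fracBits) doubles the
// fractional bits, so we shift back down afterwards. This doubling
// is the bit-growth the compiler must budget for before adding
// ~40 bits of mask on top.
def mul(a: Fixed, b: Fixed): Fixed =
  Fixed((a.mantissa * b.mantissa) >> b.fracBits, a.fracBits)

val x = fromDouble(1.5, 16)
val y = fromDouble(2.25, 16)
// mul(x, y).toDouble == 3.375
```

With 16 fractional bits, 1.5 and 2.25 are exactly representable, so the product comes out exact; in general the final right-shift truncates, which is part of the precision budget the static analysis tracks.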
I'll give a quick overview now of what the compiler actually does and what the input language looks like. Let's jump right into it. This is some source code for our language. It looks pretty much like Scala, except semicolons are still required (why is a long story). It doesn't look really exceptional, and that's by design: we want it to be as easy as possible to use for anyone who does not know anything about MPC, so it really looks like ordinary linear-algebra code. The only thing you'll note are these `xor.`-prefixed calls; you can think of them as calls to a standard library, or as primitive instructions. These are known by the engines, the machines that have access to the data. Also, not shown here are all the masking parameters and sizes, which are of no relevance to the developer but of course very important in the actual runtime. On top of these primitives we build a language. It is strongly typed, and in fact everything is always inlined and everything gets constant-folded, so that when we actually generate code we can infer parameters such as masking sizes for the primitives; we'll get into that in a second.

When you compile a program, you just submit it as a source file and the compiler goes through various phases. It is built like a traditional compiler, with around eight phases, so we do things like parsing, type checking, and so on. Then we get into more detail where we actually do the cryptographic part, and this is where we need to check, for example, that if we multiply two numbers, then even though the resulting number may be larger, the mask does not just naively get doubled; because we know something about the distribution of the data, we can take a smaller mask and thus be much more efficient.

This is what comes out of the compiler. It might look a little bit daunting, and it looks a bit like assembly because that is what it is: assembly for the MPC engines. But we'll walk you through some of what happens here so that you can understand what's going on. The first part is memory management. Right at the beginning we said that one of the jobs of the compiler is to make sure we are as efficient as possible, and since we are doing matrix multiplications, which require a lot of memory, we need to make sure that we allocate and deallocate memory when variables are no longer used. In blue you currently see all the create calls; these are essentially like a malloc. Delete is not on the slide, it happens just below, but don't worry about that. Then we have the built-ins. Built-ins always work on containers (think of a container as a variable), and they map to the `xor.` language built-ins I showed earlier. The difference is that all built-ins now have some extra parameters passed to them, and these are the result of the compiler's analysis. Also, all dimensions of the input variables, the things you see in brown here, get constant-folded: they all get replaced by the actual values. In the example I'm showing, we have a number of rows and a number of columns taken from the input values; since it's a distributed computation, and to be as efficient as possible, we need to know the sizes of all inputs in advance, and that gets propagated through the program. And finally, in red, is the key part of the compiler: inferring these parameters, the magical masking values that the compiler tries to minimize to make the computations as efficient as possible. You could technically write all this by hand; that's actually what we did right at the beginning.
But using a compiler also allows us to compose programs much, much better. Rather than writing a prepackaged linear-regression assembly routine, you can actually build a full mathematical library out of these operations and compose them to your liking. What we currently have is a compiler that only works with MPC, the protocol we talked about; but since it is a compiler working on these high-level languages, it would be nice if in the future we could integrate other privacy-preserving techniques as well. That is more aspirational; currently we are focusing on MPC, but the compiler is in a prime spot to work as an intermediary for all these technologies.

Finally, a quick word about our team. We are currently six people working on the compiler at Inpher; the company as a whole is quite a bit bigger, and we are actually growing, and the salary is not as bad as we mentioned in the example. So yes, please come talk to us if you want more details, and thank you. I think we have enough time for questions.

Q: That's pretty cool, thanks for the talk. I'm curious about the type system: do you infer the matrix dimensions? So if I ever mismatch the matrix dimensions, will the compiler tell me at compile time?

A: Yes. It is not a Turing-complete language, everything must terminate, and we need to know a lot of information about the inputs: their shapes and sizes. When you see this `xor` input built-in, those sizes need to be known. It's important to emphasize that we need to know just these properties of the data, not the data itself.

Q: And the values you infer for this complex multi-party computation, are they also part of the type system? Is that something you infer in the type checker?

A: That's actually a bit of a trick question, and a complex one, so let me disambiguate a little. We don't encode it in the type system in our implementation; what we do instead is go through the multiple phases, with a bit of partial evaluation and so on.

Q: With Scala 3 and literal types it would be very fun to try.

A: That's something I would like to experiment and play around with as well. But I think the key is that at some point you still need something like a compiler: you need to go from a shallow embedding to a deep embedding and actually have access to your trees. In particular, if you look at phase four here, SSA is very critical for us, because we want to generate efficient assembly such that memory is properly allocated and deallocated, so you need to convert to a form where you can do that. The key part we focus on is everything after around phase eight; anything before that is general compiler construction. The front-end, how the language looks, is something we could imagine changing, and potentially having multiple of them.

Q: Thank you. I have several questions. First, how sensitive is this to noise? If you apply machine learning using backpropagation with exponential functions, and you just use an approximation, then you add some random numbers into the input, which might cause problems for the backpropagation. How do you solve this?

A: Currently, whenever you need to feed results from one computation back in as input to a new computation, we need to submit a new program and save the intermediate results elsewhere; we are working on making this more seamless.
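The polynomial approximation this question refers to, a truncated Taylor series for the exponential as mentioned earlier in the talk, can be sketched like this. This is an illustrative sketch in plain Scala (the name `expTaylor` is mine); the real system would evaluate such a polynomial on secret-shared fixed-point values, with the degree and masks chosen by the compiler.

```scala
// Approximating exp(x) with a truncated Taylor polynomial:
//   exp(x) ~ 1 + x + x^2/2! + ... + x^(n-1)/(n-1)!
// Only additions and multiplications are needed, which is what
// makes such functions expressible as MPC computations.
def expTaylor(x: Double, terms: Int): Double =
  // scanLeft accumulates successive terms x^k / k! starting from 1
  (1 until terms).scanLeft(1.0)((term, k) => term * x / k).sum

val approx = expTaylor(1.0, 15) // close to math.E = 2.71828...
```

For inputs in a bounded range (which the static analysis guarantees), a modest number of terms already matches the seven-to-eight-decimal precision target mentioned below.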
With regard to precision: of course we are approximating, but what we try very hard to do is achieve the same precision as if you did the computation in plaintext. Our current benchmarks put us at seven to eight decimal places after the decimal point, which is what our customers expect of us.

Q: OK, so you can always keep this precision, regardless of the inputs?

A: Yes; that is the key, and the most difficult part of the static analysis for us.

Q: Thank you. My next question is: what are the inputs to the trusted dealer, in general, for your libraries? What kind of information do you provide to the trusted dealer?

A: The compiled program that comes out of the compiler, which you can think of as a graph, essentially a DAG, gets fed into the trusted dealer. So the trusted dealer knows how many secret shares will ever need to be generated, generates them, and sends them to the engines. We give the trusted dealer the compiled program.

Q: OK, we can discuss later, I have some more questions. My last one: since this is distributed computing, what if failures happen on one of the nodes? How do you deal with that?

A: Currently we handle large amounts of data, not large numbers of parties: we can have several parties, but we wouldn't have thousands of parties in a computation. So if something goes wrong, we would just restart the computation.

Q: And how do you verify that the result is correct?

A: That's a good question. Just to summarize: non-interactive zero-knowledge protocols could independently verify the correctness of the computation. To give an idea, one of the protocols used there is also used in the cryptocurrency Zcash; it is called zk-SNARKs. We are not putting up a public blockchain, but we did investigate the ZK protocols. One caveat is practicality: generating a proof can be expensive, even seconds per proof. Still, there are certain techniques with which we can provide proofs; we even did it with an external company that specializes in non-interactive zero-knowledge protocols. For example, for the program that you saw there, we can give you a zero-knowledge proof that convinces you that the objective function of the machine-learning model has been minimized.

Moderator: If I may interrupt, please take this offline; there are a few more questions at the back. Victor had a question.

Q: Thank you. This may be going into the hard parts, but do I infer correctly that you need to estimate the ranges of these real-number approximations that you're encountering?

A: Yes, exactly. One of the inputs is in fact not only the size but also the distribution of the data; for certain types of computations we need to know that.

Q: OK. So do type systems help with that, or do you feel that abstract interpretation and numerical domains are a better match?

A: Honestly, I think it's a big mix of both. For us, since we use a sort of cheated partial evaluation to do this, and we haven't formalized the calculus itself, what you really need are the actual distributions. One answer that points to a probable future technique: statically estimating the bounds using mathematical knowledge of the distributions may not be enough, so you may need to do some computation at compile time.
Thank you. Could you please point to some public results that explain this?

Yes, definitely. The main reference for what we do is the SPDZ protocol; that is the setting we use, honest-but-curious with a trusted dealer, and it would be a good place to start. Another good place to start is our repo, because we have posted a few links there, and there is a tutorial by Morten Dahl that goes into some of the details; it is very educative.

As for our language, do we want to open-source it? That is a very interesting and rather tough question. The problem is that our security model tolerates some things but not others: in particular, if I can send multiple orchestrated values to the secret computation, I can recover the other person's data. This is where something like a privacy budget would be nice: you would want a static analysis that tells you, hey, the computation you are doing will leak data. We do not have this yet, and since our customers have very sensitive data, we do not want them to shoot themselves in the foot just yet. An example: if the computation is the identity, or even just transposing a matrix and giving me the result back, I can trivially invert that. So we do not have this concept yet, but it would be super exciting to do some research into it.
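The leakage example above can be made concrete: any query whose output is an invertible function of the secret input reveals the input entirely, no matter how securely the computation itself ran. A toy demonstration with a hypothetical matrix transpose query:

```python
# The "private" computation below is matrix transposition. Because transpose
# is its own inverse, the analyst who receives the result can recover the
# secret input exactly, which is the class of attack a privacy budget or a
# leakage-detecting static analysis would have to flag.
secret = [[1, 2, 3],
          [4, 5, 6]]

def transpose(m):
    return [list(row) for row in zip(*m)]

result = transpose(secret)      # what the secure computation hands back
recovered = transpose(result)   # trivially inverted by the analyst
assert recovered == secret
```

The MPC protocol itself leaks nothing during the computation; the leak is entirely in the function being computed, which is why the analysis has to happen at the compiler level.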
If I understand correctly, there is a trade-off between how complex your masking is, and thus how secure it is, versus how much computation time you are going to take. How do you choose that trade-off?

Currently, 40 bits. Security is actually paramount for us, and Dimitar can confirm that. The idea is, again, that customers need some level of security, which in this case is the computational security I was mentioning earlier, so we need to make sure that all our algorithms and protocols match that security requirement; that is the trade-off. Basically, if you think about standard symmetric encryption, nowadays there are standards for how many bits of security you want and for the amount of work needed to break a scheme such as AES; people advise certainly over 80 bits of security. Based on those standards you can define global compiler parameters that guarantee security if you follow the protocol. As to the 40 bits: that is 40 bits of security for online operations, that is, interactive queries, which is largely sufficient. Any data at rest, if you were to take it and analyze it, is much more secure, at the level of AES-128 or so; we aim to match the security of AES-128. Incidentally, our product is called XOR, and there are many reasons for the name. One of them is that XOR is the basis of the one-time pad, which gives information-theoretic security; it is also fast and used in many cryptographic operations.
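The one-time-pad property behind the product's name can be shown in a few lines. This is a textbook illustration, not the product's code; the example messages are invented.

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

msg = b"salary: 100000"
key = secrets.token_bytes(len(msg))   # fresh uniform key, used exactly once
ct = xor_bytes(msg, key)

assert xor_bytes(ct, key) == msg      # XOR with the same key decrypts

# Information-theoretic security: for ANY candidate plaintext of the same
# length there exists a key that explains the ciphertext, so the ciphertext
# alone carries no information about which message was sent.
fake = b"salary: 999999"
fake_key = xor_bytes(ct, fake)
assert xor_bytes(ct, fake_key) == fake
```

This is the same reason additive masks in secret sharing hide values perfectly: XOR (or modular addition) with a uniformly random one-time mask yields a uniformly random result.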
One more question at the back.

Thanks, I have a question regarding GDPR, because GDPR includes the right to be forgotten, and from the algorithm itself it looks like if one of the employees, as in your first example, exercises the right to be forgotten, then you need to recompute the values for everyone else. That feels very computation-heavy if you have a lot of users; how do you tackle that?

Well, my answer is that we have not faced this issue directly, because most of our current use cases are about one entity managing multiple distinct but sensitive data sources, like a bank for instance: all the data sources belong to the bank, and our product just allows them to exchange things.

But I am a user of the bank, and I can request that my data be forgotten forever by the bank: I do not want it used in the training of the models, I do not want it used anywhere, and that is my GDPR right. So it feels like in that case the bank will need to recompute a lot of things, and you would need to make that very efficient, I guess.

We do not address this problem directly; we are really about privacy preservation. I do not know the exact legal details of GDPR, but maybe it is sufficient to anonymize user handles or similar before the data gets shared. What is important is that with this protocol your data never leaves the bank, never leaves the source; that is why McKinsey estimated that this falls outside the scope of GDPR.

Any other questions? One more question.

Hello, how does your solution compare to technical alternatives for privacy-preserving computation, like what you can have in TensorFlow Federated and OpenMined?

How does the solution compare to TensorFlow on the cryptographic side, was that the question?

It would be more on the commercial side, because your competitors today are people like TensorFlow Federated, by Google, and with TensorFlow Federated you can do secure multi-party computation.

Right, I see the question. There is a good distinction between multi-party computation and federated learning. Federated learning is a protocol where we could all collectively do a computation, everybody in this room: everybody computes locally on a device and then sends a local result to a server that aggregates the results of the computation. This is obviously very different from secure multi-party computation, where you are not sending any such data anywhere, not even aggregated data.
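The federated-learning aggregation just described (clients compute locally, a server combines the local results) can be sketched as follows. The weighted-averaging rule, the scalar updates, and the dataset sizes are illustrative assumptions, not a description of any particular framework.

```python
# Each client computes locally and ships only its local result; the server
# sees those per-client results and aggregates them. In MPC, by contrast,
# even these intermediate results would stay secret-shared.
def federated_average(local_results, dataset_sizes):
    total = sum(dataset_sizes)
    return sum(r * n for r, n in zip(local_results, dataset_sizes)) / total

local_updates = [0.8, 0.6, 0.9]   # e.g. one scalar model update per client
sizes = [100, 50, 150]            # clients weighted by local dataset size
avg = federated_average(local_updates, sizes)
assert abs(avg - (0.8 * 100 + 0.6 * 50 + 0.9 * 150) / 300) < 1e-12
```

Note that the server here learns every client's individual update, which is precisely the information an MPC protocol would keep hidden.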
The parties are just computing, completely decentralized and distributed, and at the end of the computation you get the result in secret-shared form, so you do not even have to reveal the result to the public or to the analyst. So it is very different, and the two are tailored to very different use cases: in a situation where, for example, you want to compute between a handful of hospitals, secure multi-party computation of various mathematical functions is much more appropriate than federated learning. Google is interested in federated learning obviously because they want many, many parties to do a collective computation, but even from a security point of view it is very different. Does that answer your question?

Great, looks like it is time to have lunch. Thank you very much. [Applause]
Info
Channel: Scala Days Conferences
Views: 208
Rating: 5 out of 5
Keywords: Manohar Jonnalagedda, Jakob Odersky, Scala, ScalaDays, Lausanne
Id: 44EL11N3tOs
Length: 49min 29sec (2969 seconds)
Published: Thu Jul 11 2019