High performance Privacy By Design using Matryoshka & Spark - Wiem Zine El Abidine

Captions
Hello everyone, thanks for choosing this talk. Today Olivier and I are going to talk about high performance privacy by design using Matryoshka and Spark. I am Wiem Zine El Abidine, a Scala backend developer at MOIA, and in my free time I contribute to ZIO. And I am a big data architect and engineer, and co-founder of Lateral Thoughts, a Paris-based consulting company.

Today we are going to talk about a privacy framework that we built: a small introduction, how we see data structures and how to handle them, recursion schemes using a Scala library called Matryoshka, and then an in-depth presentation of the three privacy engines we built with this framework.

So what are we talking about? We are talking about user information, about protecting users' data. This is especially important in the European Union with the GDPR. What do we mean precisely by user information? Here is an example of a piece of data. In this piece of data there are a few fields: some would be considered important user information, and some would be just metadata, not very important. What would we like to protect? For example the name of the person, her email, her phone number, her address line, and maybe the last position we saw her at. By "protect" I mean that in any company we need to give access to this kind of data to data engineers, data analysts and data scientists; the goal is to have a clear sense of which kind of data you can divulge and which kind of data you need to anonymize.

And when we say protecting, there are many ways to transform the information so that data science processes can still work. Sometimes we don't want to anonymize or hash everything: we may want to encrypt a company name with a hashing mechanism and the address line with a hashing mechanism, but mask the person's name using x's while keeping the first and last letter of the name, and for emails we might want to keep the domain name. So we have many encryption functions that we might want to apply, and these functions depend not exactly on the type of the field, because the type is just going to be, for example, string, but on the deeper meaning of what this information semantically represents.

So our goal is to build a generic privacy framework that can dynamically apply privacy on specified fields with different encryption functions. When we build a framework we first need to clearly understand what we are dealing with, so we use a very usual separation in our domain: we separate a dataset into the schema on one hand and the data itself on the other hand. In big data this is very common. The schema is the field names and their types, let's say addresses, which are strings, and the data is the whole blob of records that we store. We do that in big data for many purposes: to be able to endure schema evolution, and to be able to fetch a schema without reading billions and billions of records. But it's not sufficient for a privacy framework: we need to add metadata, information that tells you what makes an address worth protecting. Is this the address of a person, which is something I need to protect, or is it something related to a company, for example? So in our framework we use a specific set of tags that are inspired by the domain of the semantic web.
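As a concrete illustration of these field-level functions, here is a minimal, hand-rolled sketch of the kind of masking and hashing helpers described above (names like maskName and maskEmail are ours, not the talk's actual API):

```scala
import java.security.MessageDigest

// Illustrative field-level anonymization helpers; a sketch, not the talk's code.
object MaskingSketch {

  // Hash a value (e.g. a company name or an address line) with SHA-256.
  def sha256(value: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(value.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // Mask a person's name with x's, keeping only the first and last letter.
  def maskName(name: String): String =
    if (name.length <= 2) "x" * name.length
    else s"${name.head}${"x" * (name.length - 2)}${name.last}"

  // Anonymize the local part of an email but keep the domain.
  def maskEmail(email: String): String =
    email.split("@", 2) match {
      case Array(_, domain) => s"xxx@$domain"
      case _                => "xxx"
    }
}
```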
So we used RDFS types to semantically annotate our pieces of data, to say: this is a person and this is its address; this is a person and this is its email; this is a company and this is its address, so I don't need to protect that. Using this separation between the schema of the data and an annotated schema with tags, we can define privacy strategies in a broader and more generic way than we would for specific datasets. We are not going to say "I'm going to protect the name field of this dataset"; we are going to say: for any given person's email, apply this strategy; for any person's password, apply this strategy; for any person's ID, delete the field, we don't want it to appear anywhere. This is particularly important — this is where the legally minded among you come in — because the European Union went to great lengths not to define exactly what user information or personal data is, so this is a way to define what is worth protecting for us.

So how can we express such a privacy framework? Using a simple type alias, we say that our privacy strategies are a simple map from the sequence of tags that we want to match to the privacy strategy that we apply to that set of tags. The privacy strategy is just a simple contract that defines how we encrypt the data — the encryption strategy — but also how this privacy strategy changes the schema of our data: if we protect an ID by hashing an integer, then its type changes into a string, so we need to reflect that mutation on the schema as well. For example, we might want a privacy strategy, which can be useful to data science processes, that protects the specific age of a person by turning it into a category like young adult, adult, senior, teenager and so on.

So how can we build such a comprehensive framework? We are going to create a schema, which is pretty usual: any piece of data is either a struct, an array or a value. The value is the leaf part of the tree, the end game; it's going to be booleans, strings, dates, doubles, simple types. Structs and arrays are recursive data types: a struct has field names and recursive sub-schemas, and the array is practically the same thing. What does it look like in Scala? It looks like a sealed trait TSchema, a simple ADT, where the struct is the list of field names and their TSchema, and the array is a container with a consistent element type — the array contains elements that are all of the same type — and the element type is once again a TSchema; and you've got the leaf nodes at the end. On every piece of our schema we add metadata: this metadata contains the semantic tags we talked about, and maybe some information regarding nullable or mandatory fields.

Our data type is practically the mirror concept, because schema and data go together, so we need a TData that follows the same hierarchy. In terms of types it's much simpler, because it only has to hold the data: you've got the concrete values, and for the array type it's going to be a sequence of data; that's the only change.
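To make this concrete, here is a minimal sketch of the schema and data pattern functors and the strategy contract described above; the names approximate the talk, and the actual repository may differ:

```scala
// Metadata carried by every schema node: semantic tags plus nullability.
final case class ColumnMetadata(tags: Seq[String], nullable: Boolean)

// Leaf value types (the talk lists booleans, strings, dates, doubles, ...).
sealed trait ValueType
case object VString  extends ValueType
case object VLong    extends ValueType
case object VDouble  extends ValueType
case object VBoolean extends ValueType
case object VDate    extends ValueType

// Schema pattern functor: the recursive positions are the type parameter A.
sealed trait SchemaF[A]
final case class StructF[A](fields: List[(String, A)], meta: ColumnMetadata) extends SchemaF[A]
final case class ArrayF[A](element: A, meta: ColumnMetadata)                 extends SchemaF[A]
final case class ValueF[A](valueType: ValueType, meta: ColumnMetadata)       extends SchemaF[A]

// Data pattern functor: same shape, no metadata, concrete values at the leaves.
sealed trait DataF[A]
final case class GStructF[A](fields: List[(String, A)]) extends DataF[A]
final case class GArrayF[A](elements: Seq[A])           extends DataF[A]
final case class GValueF[A](value: Any)                 extends DataF[A]

// The privacy strategy contract: how to encrypt a value, and how the schema changes.
trait PrivacyStrategy {
  def encrypt(value: Any): Any                       // e.g. hash an Int id into a String
  def transformType(valueType: ValueType): ValueType
}

object PrivacyStrategy {
  // The engine configuration: which sequence of semantic tags triggers which strategy.
  type PrivacyStrategies = Map[Seq[String], PrivacyStrategy]
}
```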
Spark has practically the same concepts: it has DataType with the well-known types and values, and in terms of data it's a bit more lenient but a bit more performant — a Spark SQL Row is basically an array of Any which contains primitive types, rows or lists. That's not what we want to be dealing with: we want to deal with our simple, type-safe types and create recursive functions on top of them.

Now we are dealing with recursive data structures, and in order to apply privacy on specified fields we have to implement recursive functions. Recursive functions are about thinking how to iterate over our data structure and what to do on each iteration, and our data structures are deeply nested — and we have two of them, the schema and the data. Thinking about how to recurse through such a structure and what to do on each layer is a good way to get a stack overflow in your head before you get one in the runtime system, and we would also end up with complex code. What we want is to focus only on the "what to do on each layer", which for us is to apply privacy. The best approach for this is recursion schemes, described in an interesting and famous functional programming paper, "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire". Recursion schemes are about separating how to traverse a recursive structure from what to do with each layer, so we get maintainable code: recursion schemes automate the recursion and let us think only about our business logic.

We are using a functional programming Scala library called Matryoshka that specializes in recursion schemes. It generalizes recursive traversals using anamorphisms, catamorphisms, hylomorphisms and more, but today we are going to cover ana, cata and hylo, the functions that generalize our recursion. In order to understand those functions we have to prepare our data; to simplify, let's take our schema as an example. In order to prepare it we have some ingredients — the magical steps to follow before we can cook. Our schema is recursive, but as a first step we want to remove the recursion, so that Matryoshka can evaluate our data structure into a single value of some type A: we replace the recursive reference with a new generic type parameter A. But what if we want to describe that A is itself another SchemaF of A? How could we describe a recursive SchemaF with that new type parameter, and how could we define, say, a function that takes a recursive SchemaF and returns a SchemaF of SchemaF of SchemaF, however deep it goes? We need to generalize this type, and Matryoshka has a fixed-point type that helps us recapture the recursion. So if we want to say that our schema is the recursive SchemaF, we can define it as a Fix of SchemaF, and we wrap each layer with Fix. And in order to let Matryoshka traverse our data structure, we have to say that SchemaF is a Functor and implement its map, which transforms a SchemaF of A into a SchemaF of B. Now the ingredients are ready and we can use the recursion scheme functions that automatically recurse over our data structure: ana will recursively unfold (construct) our data structure, cata will recursively fold it, and hylo will refold it.
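Here is what those ingredients look like, written out by hand so the later recipes stand alone (Matryoshka provides the equivalent Fix, Algebra, Coalgebra, ana, cata and hylo, built on scalaz; this is a simplified sketch using the SchemaF defined earlier):

```scala
object RecursionSchemes {
  // The fixed point that recaptures the recursion: a schema is a Fix[SchemaF].
  final case class Fix[F[_]](unfix: F[Fix[F]])

  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }

  type Algebra[F[_], A]   = F[A] => A   // one folding step
  type Coalgebra[F[_], A] = A => F[A]   // one unfolding step

  // ana: unfold, building the structure from a seed, top-down.
  def ana[F[_], A](a: A)(coalg: Coalgebra[F, A])(implicit F: Functor[F]): Fix[F] =
    Fix(F.map(coalg(a))(ana(_)(coalg)))

  // cata: fold, collapsing the structure into a value, bottom-up.
  def cata[F[_], A](fix: Fix[F])(alg: Algebra[F, A])(implicit F: Functor[F]): A =
    alg(F.map(fix.unfix)(cata(_)(alg)))

  // hylo: refold, unfolding and folding in one pass, never materialising Fix.
  def hylo[F[_], A, B](a: A)(alg: Algebra[F, B], coalg: Coalgebra[F, A])(implicit F: Functor[F]): B =
    alg(F.map(coalg(a))(hylo(_)(alg, coalg)))

  // The Functor instance that lets the schemes traverse one layer of SchemaF.
  implicit val schemaFFunctor: Functor[SchemaF] = new Functor[SchemaF] {
    def map[A, B](fa: SchemaF[A])(f: A => B): SchemaF[B] = fa match {
      case StructF(fields, m)   => StructF(fields.map { case (n, a) => n -> f(a) }, m)
      case ArrayF(element, m)   => ArrayF(f(element), m)
      case ValueF(valueType, m) => ValueF(valueType, m)
    }
  }
}
```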
So now we are able to start — let's cook. We want three recipes: the first is to create a SchemaF from a single value of type Spark DataType; the second is to fold a SchemaF back into a single value of type Spark DataType; and the third is to transform a DataType into another DataType via SchemaF.

In order to create a SchemaF from a Spark schema we need a function from A to F of A, and Matryoshka has a generic function called ana that returns a recursive data structure wrapped in Fix; this data structure has to be an instance of Functor, which we already did. So from a single value of type Spark DataType we want to get a Fix of SchemaF. How does ana work? It builds our SchemaF from the Spark schema from the top down. In order to build using an anamorphism we need to define a coalgebra, the function from A to F of A I mentioned earlier: in our case A is DataType and F of A is SchemaF. Now we can describe our recipe for each layer — what we want to do on that layer — and then we can cook using ana, which automatically does the recursion for us and builds, from a single Spark DataType, a Fix of SchemaF. Cool.

Now let's move to the next recipe: we want to fold a SchemaF into a Spark schema, so we need a function from F of A to A, and Matryoshka provides a generic recursive fold function called cata. How does a catamorphism work? It starts from the deeply nested data: it takes the LongF and folds it into a LongType, then the deeper StringF and folds it into a StringType, and then moves up to the top level — a catamorphism folds our data type from the bottom up. How could we imagine this recursive function working — how could we take the elements, for example an array of simple values, up to the next level to fold and collapse our SchemaF? A catamorphism takes an algebra, and this algebra is our function from F of A to A, where F is SchemaF and A is DataType. The magic happens in the type parameter DataType: it keeps what we did in the previous recursion iteration and carries it into the next one. Now we can define our recipe on each layer; StructF, for example, has fields, and it takes those fields from the previous level — in StructF, a list of DataTypes that was computed in the previous iteration of the catamorphism. So now we can take a SchemaF and fold it using cata and our algebra. Cool.

Let's see how to transform a DataType into another DataType. As we saw in the previous slides, we can build a SchemaF from a value of type DataType using ana, and we can fold a SchemaF into a DataType, so we need a coalgebra and an algebra. Matryoshka provides a generic function that does both in one step, called hylo (a hylomorphism): we just call it with our algebra and our coalgebra. Cool.
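Putting the three recipes together, using the SchemaF and the hand-rolled schemes sketched above (metadata handling and the full set of primitive types are simplified for the example):

```scala
import org.apache.spark.sql.types._
import RecursionSchemes._

object SchemaRecipes {
  private val noMeta = ColumnMetadata(tags = Nil, nullable = true)

  // Recipe 1 (coalgebra): one unfolding step from a Spark DataType to a SchemaF layer.
  val toSchemaF: Coalgebra[SchemaF, DataType] = {
    case StructType(fields)     => StructF(fields.toList.map(f => f.name -> f.dataType), noMeta)
    case ArrayType(element, _)  => ArrayF(element, noMeta)
    case LongType | IntegerType => ValueF(VLong, noMeta)
    case DoubleType             => ValueF(VDouble, noMeta)
    case BooleanType            => ValueF(VBoolean, noMeta)
    case _                      => ValueF(VString, noMeta) // catch-all, good enough for a sketch
  }

  // Recipe 2 (algebra): one folding step from a SchemaF layer back to a Spark DataType.
  val fromSchemaF: Algebra[SchemaF, DataType] = {
    case StructF(fields, _)  => StructType(fields.map { case (n, dt) => StructField(n, dt) })
    case ArrayF(element, _)  => ArrayType(element)
    case ValueF(VLong, _)    => LongType
    case ValueF(VDouble, _)  => DoubleType
    case ValueF(VBoolean, _) => BooleanType
    case ValueF(_, _)        => StringType
  }

  // Recipe 3: the two combined.
  def sparkToSchema(dt: DataType): Fix[SchemaF] = ana(dt)(toSchemaF)               // build, top-down
  def schemaToSpark(s: Fix[SchemaF]): DataType  = cata(s)(fromSchemaF)             // fold, bottom-up
  def rewrite(dt: DataType): DataType           = hylo(dt)(fromSchemaF, toSchemaF) // both in one pass
}
```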
Now that we understand those functions, what do we actually use them for? The first privacy strategy we apply is privacy on the schema itself, which was simple to do: we define an algebra that does the same work, the same logic, on each layer, so the code is very simple and we call cata to automate the recursion. Now let's see how we apply privacy on the data; this is where the three engines we built come in.

Just as a reminder: we have a map from a sequence of tags, which we want to match against the schema of the data, to a strategy that actually does the work; and the thing is, we want to encrypt data only if the tags within its schema match those of the privacy strategy. Thinking about a very naive approach, we know that when we have a specific piece of data, we need to be able to look at its dedicated schema, harvest the tags, and check whether those tags are the ones we are looking for, the ones we want to protect. So the first, very naive approach is to recursively zip the data and the schema together. That way, at each layer we have two things, the data and the schema, and we apply a simple pattern match that says: if for this specific piece of data I have the specific set of tags I'm looking for, I apply the privacy strategy; if not, I just output the data as it was, because there is no need to secure that piece of data.

How can we do that? Luckily the Matryoshka framework gives us EnvT. EnvT basically takes a pattern functor — DataF in our case — and puts right next to it a label of type E, which is going to be the schema here. It still has the type parameter A, and that type parameter is very important: it is the hole that gets filled at each computation by the Matryoshka framework; this A is what will become the final output. EnvT is basically a case class, so we can construct it, or pattern match on it to extract the necessary components, and we have access to two methods, ask and lower, which give us respectively the label and the functor itself. For example, if we have a specific piece of data and a specific piece of schema — say John McClane, of gender 0 — we have the person-name annotation on one side and the data on the other, and in the end what we want is an EnvT that contains on the left-hand side the schema with its tags and on the right-hand side the data itself.

So using Matryoshka we only need to zip schema and data together. But in the real world, schema and data might not be compatible. Most of the time the schema evolutions we make do work, but when someone didn't follow the rules, it doesn't work at all. So we need to take into account the fact that when we zip schema and data together we might hit incompatibilities: we might have a specific piece of data that is not remotely related to what the schema says. So we are not going to output just the EnvT, the data with schema; we are going to output an Either of an incompatibility — a simple case class that tells you where the problem is — or the zipped result, and fail otherwise, because there is nothing we want to output from this framework that hasn't been verified: if you have a piece of data that does not conform to the schema, you have no idea what you are manipulating, and the last thing you want is to hand a data scientist a real credit card number.
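In code, the target shape of that zipping step could look like the following sketch (EnvT mirrors matryoshka.patterns.EnvT; the Incompatibility shape is illustrative):

```scala
import RecursionSchemes._

object ZippedTypes {
  // A pattern functor F[A] annotated with a label E: here, one layer of data
  // labelled with the schema that describes it.
  final case class EnvT[E, F[_], A](run: (E, F[A])) {
    def ask: E      = run._1   // the label (the schema for this layer)
    def lower: F[A] = run._2   // the underlying data layer
  }

  // Reported when the data does not conform to the schema.
  final case class Incompatibility(path: String, expected: String, found: String)

  // One layer of data zipped with its schema; A is the hole filled by the recursion.
  type DataWithSchema[A] = EnvT[Fix[SchemaF], DataF, A]

  // What the zipping step produces for each layer: either a failure or the zipped layer.
  type ZipResult[A] = Either[Incompatibility, DataWithSchema[A]]
}
```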
The zip-with-schema step we define is actually a very simple pattern match. As we said, in Matryoshka we are only responsible for giving the recipe for one specific layer. So at a given layer you only say: when the schema specifies that I'm expecting a struct, the data must be a struct; when the schema specifies that I'm expecting an array, the data must be an array; and that's it. If anything else happens, you have an incompatibility — you don't exactly stop right there, but in the coalgebra you output the left-hand side, that's to say the incompatibility. Zipping the data and schema like this, we have the recipe.

Encrypting the data is then another transformation: from this EnvT, which the first transformation — the coalgebra — produced, to a Fix of DataF, that's to say the transformed, privacy-protected data. All we need at this point is the corresponding algebra, which extracts the schema and the data from our EnvT and says: I check the privacy strategies to see whether this specific set of tags rings a bell; if it doesn't, I just output the fixed value as it was; if it does, I apply the privacy strategy on that fixed value. And that's it — that's the first privacy engine. The beauty of functional programming is that most of the time you end up doing 80% of the work in the types, up front, and when you actually have to do something it's quite easy. We have a coalgebra and an algebra, so we apply them with a hylo: the hylo takes as input the schema and the data, as a tuple, so that we can zip them together and get the necessary output; then we only need to match on incompatibilities or the end result. This is the very naive approach; it's quite easy once you get the types right, and we have a very versatile and generic engine.
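A condensed sketch of that first, naive engine, building on the sketches above (the talk also threads the Either error channel through the recursion; here an incompatibility simply fails, to keep the example short):

```scala
import RecursionSchemes._
import ZippedTypes.EnvT
import PrivacyStrategy.PrivacyStrategies

object NaivePrivacyEngine {
  // One layer of data labelled with its schema; the hole A is filled by the recursion.
  type Zipped[A] = EnvT[Fix[SchemaF], DataF, A]

  implicit val zippedFunctor: Functor[Zipped] = new Functor[Zipped] {
    def map[A, B](fa: Zipped[A])(f: A => B): Zipped[B] = fa.lower match {
      case GStructF(fields) => EnvT((fa.ask, GStructF(fields.map { case (n, a) => n -> f(a) })))
      case GArrayF(elems)   => EnvT((fa.ask, GArrayF(elems.map(f))))
      case GValueF(v)       => EnvT((fa.ask, GValueF(v)))
    }
  }

  // Coalgebra: pair one layer of data with the matching layer of schema.
  val zipWithSchema: Coalgebra[Zipped, (Fix[SchemaF], Fix[DataF])] = {
    case (schema, data) => (schema.unfix, data.unfix) match {
      case (StructF(sFields, _), GStructF(dFields)) =>
        val bySchema = sFields.toMap // a real engine reports an Incompatibility on a missing field
        EnvT((schema, GStructF(dFields.map { case (n, d) => n -> (bySchema(n), d) })))
      case (ArrayF(elementSchema, _), GArrayF(elems)) =>
        EnvT((schema, GArrayF(elems.map(e => (elementSchema, e)))))
      case (_, GValueF(v)) =>
        EnvT((schema, GValueF(v)))
      case _ =>
        sys.error("schema/data incompatibility") // the real engine returns a Left instead
    }
  }

  // Algebra: rebuild the data, encrypting a leaf when its schema tags ring a bell.
  def applyPrivacy(strategies: PrivacyStrategies): Algebra[Zipped, Fix[DataF]] = { layer =>
    val tags = layer.ask.unfix match {
      case StructF(_, m) => m.tags
      case ArrayF(_, m)  => m.tags
      case ValueF(_, m)  => m.tags
    }
    (strategies.get(tags), layer.lower) match {
      case (Some(strategy), GValueF(v)) => Fix[DataF](GValueF(strategy.encrypt(v)))
      case (_, other)                   => Fix[DataF](other)
    }
  }

  // Zip and encrypt in one pass.
  def run(strategies: PrivacyStrategies)(schema: Fix[SchemaF], data: Fix[DataF]): Fix[DataF] =
    hylo((schema, data))(applyPrivacy(strategies), zipWithSchema)
}
```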
Are we happy? No, we're not — it's not very efficient, so we're going to try to do better: we are going to build a lambda privacy engine. The first engine is fine, but if you have a thousand records, you are zipping the schema and the data together a thousand times; the schema is not going to change, only the data changes, and the schema gives you the recipe — that's the whole point of Spark, the schema gives you the recipe. So is it possible to build some kind of one-time lens, something that is prepared once, goes down into the data, and modifies just what we need? We can, by chaining functions: we build a lambda that goes down into the data according to the schema, and that lambda can then be applied over and over. In the naive engine the recursion over schema and data is always performed, because we need both the coalgebra and the algebra to know whether anything needs to be done; this new engine checks the schema only once, and if there is nothing to do on that schema you can just call identity and that's it, it's over. So we define two kinds of operations: a trait MutationOp, with a case object NoOp — nothing necessary, please don't go down there — and a GoDown operation; and we have two functions, apply, to actually perform the mutation, and a method to chain another function so we can get inside the elements.

So once again we have a transformation using Matryoshka, but this time it is not a hylo: we don't need two transformations, we only go from a SchemaF to a mutation. We call it prepareTransform, because in a streaming application, for example, you want to do it once at the start of the application and then just apply the result for the rest of its lifecycle. We use an algebra — a cata — that goes from the schema to the mutation. Once again we devise a pattern match over all the possible cases, struct, array and values, and once again we handle one layer and that's it: in Matryoshka you always assume that the rest of the work has been done for you.

In the case of values, it's practically the same code as before: we check the schema against the privacy strategies we have, but this time we defer the execution to later — we create a GoDown operation that will take a specific piece of data later and apply the privacy strategy through the closure; if no privacy strategy was matched, it's a NoOp. When dealing with the array, remember that in your functor the hole is always filled with the previous computation, so the element type is now the previous operation: it's not an element type anymore, it's an A which has become a MutationOp. You match on the element's operation: if it's a NoOp, nothing needs to happen; if anything else needs to happen, that means you've got to secure the elements in your array, so you loop over the elements and recreate an array with the previously defined privacy strategy applied to each element. This is quite simple, just unwrapping and rewrapping; and if there's nothing to do on your data, there's nothing to do on your array either, because the array itself is just a container for the data. The struct is a bit more complicated but more interesting: it has many fields, so you check whether every field is safe; if at least one field needs work done, you unwrap and rewrap the struct to apply, on that specific element, the privacy function it needs.

So now, according to any given schema, we can build — only once — a lambda that zooms into the recursive data and only goes where it needs to. It's practically the same as very specialized code that would do something like "get field one, then get element zero of that" — basically a lens, but for arbitrary data — and it can be serialized and applied many times: you can serialize it in a Spark process, in a streaming process, use it wherever you want. So it's better, it's more efficient.
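A sketch of that lambda engine, again building on the earlier definitions (names such as MutationOp, NoOp, GoDown and prepareTransform follow the talk's description; the real code will differ in detail):

```scala
import RecursionSchemes._
import PrivacyStrategy.PrivacyStrategies

object LambdaPrivacyEngine {
  sealed trait MutationOp { def apply(data: Fix[DataF]): Fix[DataF] }
  case object NoOp extends MutationOp { def apply(data: Fix[DataF]): Fix[DataF] = data }
  final case class GoDown(mutate: Fix[DataF] => Fix[DataF]) extends MutationOp {
    def apply(data: Fix[DataF]): Fix[DataF] = mutate(data)
  }

  // One traversal of the schema builds the lambda; the data is never touched here.
  def prepareTransform(strategies: PrivacyStrategies): Fix[SchemaF] => MutationOp =
    schema => cata(schema)(algebra(strategies))

  private def algebra(strategies: PrivacyStrategies): Algebra[SchemaF, MutationOp] = {
    // Leaf: defer the strategy application into a closure, or do nothing.
    case ValueF(_, meta) =>
      strategies.get(meta.tags) match {
        case Some(strategy) => GoDown {
          case Fix(GValueF(v)) => Fix[DataF](GValueF(strategy.encrypt(v)))
          case other           => other
        }
        case None => NoOp
      }

    // Array: the hole is already the element's MutationOp from the previous layer.
    case ArrayF(elementOp, _) => elementOp match {
      case NoOp => NoOp // nothing to protect inside the elements
      case op   => GoDown {
        case Fix(GArrayF(elems)) => Fix[DataF](GArrayF(elems.map(op(_))))
        case other               => other
      }
    }

    // Struct: only rebuild if at least one field actually needs work.
    case StructF(fieldOps, _) =>
      if (fieldOps.forall(_._2 == NoOp)) NoOp
      else GoDown {
        case Fix(GStructF(fields)) =>
          val ops = fieldOps.toMap
          Fix[DataF](GStructF(fields.map { case (n, d) => n -> ops.getOrElse(n, NoOp)(d) }))
        case other => other
      }
  }
}
```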
Can we do better? Yes, because it's still managed by the garbage collector. Who here is familiar with Catalyst? OK. In Spark SQL you've got this really neat engine, the Catalyst engine, doing symbolic manipulation, which lets you define your job, manipulate the data and do codegen — that's to say Spark generates Java code, compiled to bytecode on the fly and sent to the executors. And it's extensible: not many people do it, but you can, and it's not a hack; you can integrate into Spark a bit more deeply.

In an Apache Spark job with millions of records, any of the previous methods will generate a lot of conversions back and forth: in the worst case you go from a Spark Row to a DataF, to data zipped with schema, back to a DataF, and then back to a Row. That's wasteful and it's not really integrated with Spark: you're breaking Spark's logical plan execution and optimization, because you go back to JVM objects, apply the transformation on the data, apply the transformation on the schema, and recreate the DataFrame. It's "perfect". So let's do better. The Catalyst engine goes through all these steps to guarantee an optimized logical plan, and if you look at the end — the selected physical plan and code generation — we can actually be part of that story: we can use Spark Catalyst to generate our own optimized Java code that goes down into the data precisely the same way we did with the lambda, but this time in the unsafe world, that's to say using the unsafe API and raw Java code. It's not perfect, but look at it this way: you're doing interesting functional programming to give orders to Java, which is cool, and you're type safe — the generated Java isn't, but you are. So we mutate the data according to privacy and stay as much as possible in the unsafe world.

We are going into uncharted territory here, and this part's life cycle is pretty involved: we go from a SchemaF to Java code as a string, compiled by Janino — a very fast Java compiler — and the bytecode is sent to the executors to do its magic. We need a little bit of work up front so we don't lose our minds. We wrap the input variables we are going to use in a value class, and we define the Catalyst code as something that takes an input variable and generates a string, the Java code; this generated code puts the result of its computation into an output variable that is recorded in the same case class. So in a sense, the input is given by the outside, and the output is given by you. How do you create a new expression? You only need the children, which are the columns of your DataFrame you are going to use, and you extend Expression — in a very simple case you could extend UnaryExpression, but this is not our case, this is not simple, because we are going to apply it to the whole dataset. Is your expression nullable? How does your expression transform the original schema of your data? That last one is great, because it's in the contract: we don't have to specify it anywhere else, and we already prepared for it, since we have all the necessary translations between our SchemaF — our transformed schema — and Spark's DataTypes, and Spark asks us for exactly that. There is also something quite nice, the eval function: Spark sometimes doesn't rely on code generation, because it thinks it can do better on-heap, and eval is basically where any of the previous engines we defined can be plugged in; but for the most complex cases, and in production, that's not what gets used — the generated code is. So let's start with the end: the end is, once again, an algebra from a SchemaF — the basic recipe — that generates a CatalystCode along with its DataType; I need the DataType as an intermediate computation, I'm not going to use it at the end, but I need it in between.
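A rough skeleton of such a custom Catalyst expression (Expression is Spark's internal developer API and its exact signatures move between versions; this follows the general Spark 2.x shape, and CatalystCode, transformedSchema, interpretedEngine and generatedCode are our own illustrative names, not Spark APIs):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
import org.apache.spark.sql.types.DataType

// The "value class + code generator" pair described above.
final case class InputVariable(name: String) extends AnyVal
final case class CatalystCode(gen: InputVariable => String, outputVariable: String)

case class ApplyPrivacyExpression(
    child: Expression,              // the column(s) to protect
    transformedSchema: DataType,    // schema after privacy, obtained by folding SchemaF
    interpretedEngine: Any => Any,  // on-heap fallback: one of the previous engines
    generatedCode: CatalystCode     // the SchemaF-driven Java generator, built once
) extends Expression {

  override def children: Seq[Expression] = Seq(child)
  override def nullable: Boolean         = child.nullable

  // Privacy can change field types (e.g. a hashed Int becomes a String);
  // Catalyst learns about it through this contract method.
  override def dataType: DataType = transformedSchema

  // Interpreted path, used when Spark decides not to generate code.
  override def eval(input: InternalRow): Any =
    interpretedEngine(child.eval(input))

  // Codegen path: splice our generated Java after the child's code (written against
  // Spark 2.3-style ExprCode, where code/value are plain strings; newer versions differ).
  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    val childCode = child.genCode(ctx)
    val body      = generatedCode.gen(InputVariable(s"${childCode.value}"))
    ev.copy(code =
      s"""
         |${childCode.code}
         |$body
         |Object ${ev.value} = ${generatedCode.outputVariable};
       """.stripMargin)
  }
}
```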
If there's nothing to do, you get a very neat piece of Java code as a string, which says output = input; — don't forget the semicolon, it's Java. And if you've got code, then you call your string generator, your method, with the input variable you have, and it generates a big block of code that defines the privacy output variable, which you can finally assign to your result. That's the end; now let's go inside. Once again we separate the different cases: struct, array, values. The value case is practically the same as before: we check the schema against the privacy strategies. But remember, we are doing things on the driver side and we need them on the executor side, so you have to serialize, one way or another, the privacy strategies you have and send them to the executors. Luckily Spark has a reference mechanism that allows you to transfer objects from your world to the executor world; it gives you a variable name, a string you can use in your Java code. You carefully define your output type and output variable, cast — because that's what we do — and then apply the function in the executor world. That gives you your first CatalystCode.

For the array it's practically the same thing: we produce code based on whether there is a NoOp — that's to say no code was generated and there is nothing to do on the array — but if code was generated, you take that code and apply it in a very neat and very "type safe" Java for loop, a Java 1.2-style for loop with the most basic elements you can find: it fills a temporary array of objects and gives you a specific output. You're writing code blocks with string interpolation; it's not that bad. The struct is basically the same thing, on each and every field, with the same logic: if there is nothing to do on any of the fields, I'm good; if there is something to do on one field, then this time it's a mutable InternalRow, so you can sometimes just mutate only the part you need — sometimes, because Spark relies heavily on the unsafe API, and the unsafe API is memory not managed by the garbage collector: for fixed-size types you can be very optimized, but for arbitrary-sized types like strings you need to be more clever. So we are done: we have our final method that takes all the Java strings we nested neatly together and outputs them in a final code block that we hope is not too big. It's rough, but at least the data stays in the unsafe world when nothing needs to be done; it can even stay in the Tungsten data format when it's of fixed size. It's deeply integrated with Spark, I must admit, but it's not a hack: this is all public, if sparsely documented, API; it's not widely used, but you can do it.
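A sketch of that SchemaF-driven generator: an algebra from one schema layer to a CatalystCode, reusing the wrappers above. The generated Java is illustrative and pretends the data is plain Object arrays; the real engine reads Spark's InternalRow/ArrayData with the proper getters, uses ctx.freshName for variable names, and obtains strategyRef through Spark's object reference mechanism (addReferenceObj):

```scala
import RecursionSchemes._

object CodegenSketch {
  private var counter = 0
  private def freshVar(prefix: String): String = { counter += 1; s"${prefix}_$counter" }

  // strategyRef: the executor-side name of the serialized privacy strategy;
  // tagsToProtect: the tags that should trigger it (a single strategy, for brevity).
  def algebra(strategyRef: String, tagsToProtect: Seq[String]): Algebra[SchemaF, CatalystCode] = {

    // Leaf: either pass the value through, or call the shipped strategy on it.
    case ValueF(_, meta) =>
      val out = freshVar("value")
      if (meta.tags == tagsToProtect)
        CatalystCode(in => s"Object $out = $strategyRef.encrypt(${in.name});", out)
      else
        CatalystCode(in => s"Object $out = ${in.name};", out)

    // Array: loop over the elements and apply the element's code to each of them.
    // (A real engine emits a plain pass-through when the element code is a no-op.)
    case ArrayF(element, _) =>
      val out = freshVar("array")
      CatalystCode(in => s"""
        |Object[] $out = new Object[${in.name}.length];
        |for (int i = 0; i < ${in.name}.length; i++) {
        |  Object elem = ${in.name}[i];
        |  ${element.gen(InputVariable("elem"))}
        |  $out[i] = ${element.outputVariable};
        |}""".stripMargin, out)

    // Struct: read every field, run that field's generated code, write it back.
    case StructF(fields, _) =>
      val out = freshVar("struct")
      CatalystCode(in => {
        val perField = fields.zipWithIndex.map { case ((_, fieldCode), i) =>
          s"""
          |Object field_$i = ${in.name}[$i];
          |${fieldCode.gen(InputVariable(s"field_$i"))}
          |$out[$i] = ${fieldCode.outputVariable};""".stripMargin
        }.mkString("\n")
        s"""
        |Object[] $out = new Object[${fields.length}];
        |$perField""".stripMargin
      }, out)
  }

  // Fold the whole schema into one generator, once, on the driver.
  def generateFor(schema: Fix[SchemaF], strategyRef: String, tags: Seq[String]): CatalystCode =
    cata(schema)(algebra(strategyRef, tags))
}
```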
And the results are pretty cool. For a sample Apache Spark job on a Mesos cluster with 10 cores, 5 GB of heap and 5 GB of compressed data, the first engine takes roughly 70 minutes, the lambda engine is slightly better at 45 minutes, and the codegen engine is unbeatable — you can't beat that — at 21 minutes.

Data structures and algorithms are our patterns for solving problems, and we can come up with a good, elegant solution if we have a good design; in our case we used functional programming and the Scala library Matryoshka, and we came up with three engines to apply privacy with testable, maintainable code. If you are interested in the code you can check out our repository — the tests are implemented there as well — and you can also use Matryoshka and try it yourself; if you are interested in the idea of recursion schemes, check out the paper. We want to take this opportunity to thank Valentin Kasas for the foundations of this design, and our colleagues, especially Amine and Reza. Thank you all for your attention; you can follow us on Twitter.

[Applause]

Any questions? Okay, thank you, we're around. I'm sorry — go ahead.

Q: In your struct ADT I saw things like float and double, and there are also decimal types in Spark. The source schemas — say JSON Schema or Avro schemas — may have numeric fields where certain positions map to double and others to decimal with a different scale and precision. Does that affect what you've been doing?

A: Not much, to be honest. With this design, from an organizational standpoint, the first schemas were designed by data management teams in JSON Schema, then translated to this ADT, and then we define the transformations afterwards. Any data type that can be represented in Spark works — and it's not really a matter of Spark, we just happen to use Spark here — so if you want to define your precise decimal type or anything else, you can, and it doesn't change anything in the code that has been designed. The only thing that might change is your privacy strategies: we didn't go into much depth on their implementation, but as you can guess they need to be typed as well. You can't apply, say, fuzzying of GPS coordinates to an int, so your privacy strategies are type safe, checking the input types to verify that the data you apply them to is actually compatible. So it doesn't change the rationale; the more types you have, the more you need to take into account, and that's it. A value is a value: with the TSchema value, most of the time you don't need to know what kind of value it is, you just need to know that it's not a recursive data type, and that's the only thing Matryoshka needs, in a way. So it doesn't matter that much whether you've got decimal types or byte arrays.

Q: Is it possible to add a salt value to the encryption?

A: We didn't add it to the open source project, but in our implementation we have something called the privacy context. It depends mostly on the target you're going to write to, and it can contain, for example, salts for different business areas or different stakeholders, and your hash functions take this privacy context into account. So yes, you will need to add that context — we needed to have it.

Q: Thank you very much for the talk, it was very useful, and it's quite a nice application of recursion schemes and of the schema representations. One thing I have a little doubt about: on the schema representation and the schema calculation, are you actually using recursive types — fixed-point types — as in the first engine, or do the second and third solutions use something else?
A: First of all — can you go back a slide? Yeah, it depends. Okay, let's be clear on one thing: this design has the TSchema and TData, which are ADTs, but in real life, if you don't have an actual use for these ADTs as concrete types — for JSON serialization or schema registry serialization, say — you don't even need them: you can just keep the pattern functors, that's to say SchemaF and DataF. When you manipulate those as intermediary computation types that you're not going to use outside of the realm of Matryoshka, then to manipulate them and make the compiler happy you do need fixed points — a Fix of DataF, a Fix of SchemaF. To simplify the code in this presentation we kept the TSchema and the TData, but in a real-life application, if these types are not needed, you can just keep the pattern functors and use them as a vehicle for the concrete types you have, whether that's JSON Schema, Spark DataType, XML or a schema registry. This SchemaF, this vehicle, is what allows you — just like the HLists of shapeless let you transform between arbitrary formats — in the library that we built, to use them as a pivot format for going from JSON Schema to Spark DataType to Avro to Parquet types; we use them as the pivot format. And if you use SchemaF and DataF as pattern functors, then you need a fixed point, whether it's Fix or Mu — there are many fixed points you can use, and we used the simplest one. Did I answer your question? Okay, thank you. Thank you, have a nice day.

[Applause] [Music]
Info
Channel: Scala Days Conferences
Views: 633
Rating: 5 out of 5
Keywords: Wiem Zine El Abidine, ScalaDays, Lausanne, Apache Spark
Id: hh9SYl-IfIc
Length: 50min 43sec (3043 seconds)
Published: Thu Jul 11 2019