Overcoming the Data Glut with Snowflake, Databricks and Portworx

Video Statistics and Information

Captions
[Applause]

Alex Williams: Alright, hey everyone, how are you doing? I'm Alex Williams of The New Stack. Let's do a quick introduction: we have Benoit Dageville of Snowflake, Ali Ghodsi of Databricks, and Murli Thirumale of Portworx. How are you all doing? One of the things I really want to do here is help people understand the differences in how we think about data management and this whole concept of the data glut. Murli, you deal a lot with people in the DevOps space and with how technologies are adapting to new application architectures using containers, working with enterprise companies and helping them transform; the two of you bring a different perspective. So Murli, why don't we start with you: what is it that Portworx actually does?

Murli Thirumale: Sure. Portworx is a container data platform. What we're really doing is extending Kubernetes so that we can do storage and data management for Kubernetes. If you think about the layer cake of what customers are trying to do, take a customer like T-Mobile, one of our best customers: they're taking a lot of iPhone and Apple Watch information and onboarding customers onto a platform built on containers. Their onboarding software runs in containers, and the containers are orchestrated by Kubernetes. Now all of that data needs to be managed and orchestrated too, and that's the role Portworx has. We sit as an overlay on top of existing storage, and we ensure that all of the data SLAs for performance, high availability, backup, and disaster recovery are being met, so that those applications are always running and the data is always available.

Alex Williams: What Murli is really talking about is having to think about data in a wholly different way: it's always there. Ali, at Databricks you talk a lot about the way data is used to transform. How do you think about the transformation of data, and what are the requirements for it now? Is it batch, which is taking data in increments and then transforming it into something you need, or streaming, where you can stream the data all the way through? What is your take?

Ali Ghodsi: That's a great question; it depends on the use case. What we do at Databricks is enable companies that have massive amounts of data, way more than can fit on one machine or ten machines or a hundred machines, to do AI and machine learning on it. For instance, Regeneron is a big customer of ours. They've built a big database of the genomes of all their patients, and they also have a big database of all the diseases those patients have, and they use machine learning and AI to find out which gene sequences are responsible for those diseases, so they can develop drugs to attack them. That particular use case is much more batch: massive amounts of data, and they really want to find the genome responsible. We have another customer that has built a chat application, and they want to use AI in real time to detect, for instance, whether there are pedophiles on the network pretending to be fourteen-year-olds when they're not. That they want to do in real time; it's really important they can catch those people. So those are different use cases and we support both, but more and more we're seeing real time appear in our customer base.

Alex Williams: Benoit, at Snowflake it's a lot about data in the cloud, isn't it? What we're seeing is this tremendous generation of data.

Benoit Dageville: Correct. We have machine data, for example, coming out of applications that developers are monitoring; applications generate a lot of data themselves: web logs, IoT sensor data. This data raises two big challenges. One is the volume: there is a humongous amount of data, many petabytes. The other challenge is that the structure of the data is not simple tables with rows and columns. This is what's called big data, and the challenge of big data is very real. If you want a company to be really data-driven, you need to analyze that data together with your business data, so you need one system. That's what Snowflake is: reinventing data warehousing for both business data and large volumes of big data, in the cloud, as a service. And scale here is super important, not only because of the data but also because of the users. This is what we see from our customers: every company today wants to be data-driven; everyone wants to act like Google, look at data and make decisions based on it. That means potentially every employee of a company will access the data system to make decisions. The growth of data and the growth of users make this system very challenging, so being in the cloud, where you have almost unlimited resources, is a big plus. That's why we built Snowflake that way.

Ali Ghodsi: I just want to agree with Benoit here, especially for machine learning and AI, which are our use cases. People have been doing machine learning for 60 years; it wasn't working for the first 50. What changed is what Benoit is saying: massive amounts of data. When you throw that at modern hardware, you actually start getting superhuman results. Data is really, really important: if you have those massive datasets, you have an advantage; it's the new oil. If you don't have those datasets, you're at a disadvantage.

Alex Williams: Murli, for you, what's the data glut? You deal with people who are system administrators and IT operations teams; what's the data glut they face?

Murli Thirumale: The part of the industry that we're facilitating is really around agility. Today, not only is there a lot of data, but that data is coming at them in real time, and there are massive amounts of it. The role of the CIO has changed. About 10 to 12 years ago, the CIO was very focused on the infrastructure in the data center; that was their baby, the care and feeding of that infrastructure, using virtualization, maybe even renting infrastructure instead of owning it. The focus has now changed dramatically from infrastructure to apps and data. Apps and data are how the new CIO is going to win and help their enterprise win. There are different aspects of that, some of which we've been hearing from Ali and Benoit, but another aspect is that the applications themselves are changing dramatically. Let me give you an example: we've got DreamWorks as a customer. DreamWorks has people who are creating new kinds of content with new ways to process that data. They have lots of different applications, Maya and other things that do rendering, and they have people who contribute from all over the world. So in this case it's a proliferation of data coming from different content creators all around the world. How do they ensure that all of this can come to a central place and actually work, from people who are not experts in infrastructure? The way they do that is to containerize those applications. It's just like genomics: TGen is a customer who does that. Containers allow people to rapidly make changes in their applications and deploy them instantly, anywhere, with a guarantee that they're going to work.

Alex Williams: For those who are familiar with container technologies and Kubernetes: the container is essentially just a process that lives on top of the operating system, and Kubernetes is the orchestrator that orchestrates all the containers, so you can move code around very quickly and then call on the data.

Murli Thirumale: Exactly, Alex. You are containerizing your application and making it very mobile and portable. Then comes Kubernetes, which takes multiple of these containers: an application can have four, five, six, seven containers, and with hundreds and hundreds of users you have thousands of containers. What Kubernetes does is orchestrate where those containers land. What we do at Portworx is ensure that wherever the container lands, the data is available: we provide data persistence. The second thing is, if there's a failure in any part of the network, a container could fail, a node could fail; we ensure the data is available when the new container gets spawned: that's data availability. The third thing is disaster recovery: whole racks can fail, or sometimes parts of a data center or parts of a cloud fail; we ensure that data can be recovered and the application runs again instantly. So for us the idea is that data now is distributed, it's ubiquitous, it's real time, and it's massive amounts of data.
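Murli's description of persistence and availability maps onto standard Kubernetes objects: an application requests storage through a claim, and a Portworx-style storage class supplies replicated volumes so the data survives container and node failures. The sketch below expresses the manifests as plain Python dictionaries mirroring the usual YAML; the storage class name `portworx-sc`, the claim name, and the `repl` replication parameter are illustrative assumptions, not details from the talk.

```python
# A Portworx-backed StorageClass: the "repl" parameter asks the storage layer
# to keep multiple replicas of each volume for high availability.
# (Names and sizes here are hypothetical, for illustration only.)
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "portworx-sc"},
    "provisioner": "kubernetes.io/portworx-volume",
    "parameters": {"repl": "3"},  # keep 3 replicas of the data
}

# The application pod asks for storage via a PersistentVolumeClaim that
# references the storage class; wherever Kubernetes reschedules the pod,
# the claim resolves to the same replicated data.
volume_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "onboarding-data"},
    "spec": {
        "storageClassName": "portworx-sc",
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

print(volume_claim["spec"]["storageClassName"])  # portworx-sc
```

The key design point is the indirection: the pod never names a disk or a node, only the claim, which is what lets the orchestrator move containers freely while the data layer keeps the SLAs.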
Murli Thirumale: Orchestrating all of that at container scale is something the old storage paradigm just fails to do. Traditional storage is very siloed and very contained, and what we're doing is providing an overlay on top of it that ensures all of that can still work.

Alex Williams: Ali, when I hear Murli speaking, I'm also thinking about the amount of data that is created from all that; just scaling up creates its own data glut. But in your world you're transforming data at a rate that makes it manageable. How are you doing that? What makes your technology distinctive in the marketplace? You rely a lot on Spark; maybe you can explain what Spark is for people who may not be familiar with it, and how you think about the data glut itself and make it an asset, something of value.

Ali Ghodsi: Databricks actually started out of UC Berkeley research, where we developed three popular open-source projects that now have massive adoption on the planet; not everyone using them is a customer of ours. Those three technologies are Apache Spark, Delta Lake, and a project called MLflow. The Spark project enables you to take massive amounts of data and do processing on it, and the way it does that is by basically turning your data center into a computer. Previously we used a single computer to do data processing, but computers are not scaling anymore. What Spark does is turn your whole data center into one computer: you can treat it as if it were one giant virtual machine and process however much data you want with it, and if you want to go faster, you just add more machines. We do that in the cloud. The Delta technology we have then enables you to do this really reliably, so that you can start bringing operational requirements into it.
Ali Ghodsi: You can do it really, really fast, and you can connect other tools to it so you can leverage that data for applications like BI. Then the MLflow project enables you to do machine learning with it, so you can actually start doing AI: making predictions, building models that can predict the future for you. Those are the three technologies we combine, and we offer them in the cloud; the cloud is really essential to what we do. Container technology is really important for us too, because we leverage containers to be able to move between the clouds. And for our technology to work you need lots of data, and the cloud is really the only place where you can now go and buy datasets. You might have a dataset, you do AI on it, and your results are not that great; but in the cloud you can buy datasets from other data providers, combine them, and then you get really great results. So the cloud transition is really essential to what's happening here, and I think it's something that binds all three of us together.

Benoit Dageville: The cloud is a very critical aspect, and data is moving to the cloud; you could say most data is now born in the cloud, not on premise. The cloud offers a lot of possibilities: access to almost unlimited resources. As Ali was saying, your platform is now a full data center, a full cloud region, and the art is how to leverage those resources. Snowflake is a service that allows you to run as many workloads as you want. Traditionally, the challenge was this: before you can make data-driven decisions, you have to have access to the data, so the first challenge is how to centralize all the data. But when you centralize all the data, you also centralize all the queries, all the questions you have on that data, and compute resources become the bottleneck. In the cloud you can have independent compute clusters all accessing the same data at the same time, and you can scale: you can add more workloads, more use cases, against the same data. That scaling in the cloud is what is really transformational. Data sharing is another huge aspect. Getting all the data is challenging because sometimes it is not your data; you need data that was not created by your enterprise. The cloud is one single platform where you can share data, and Snowflake allows you to do that. We see our customers adopting data sharing more and more; different enterprises share data between them.

Alex Williams: What has been the adaptation among other technologies that correlate to what you provide? I think of databases, for example. What transformation are we seeing across the entire technology stack that relates to this data glut and to this transformation of the data itself?

Murli Thirumale: There are probably many things different folks could bring to mind, but one thing that is incredibly critical today is that all of the things Benoit and Ali have been talking about are impossible to do without automation. There's a three-step transformation going on. One is infrastructure being on demand: flexible, distributed, available everywhere, in real time. People talk a lot about ML and AI, but really, to do all of those things in real time, you need to automate all of the processes that handle that data, and the applications themselves. Benoit and I were actually talking about this on a different day, and it's a crawl-walk-run world. As founders, and particularly as people selling the Kool-Aid of each of our flavors, we tend to talk about a vision as if it's here now, but customers are making their way toward that vision. In today's world, a lot of what people are trying to do is just get the people and the cruft out of the system, so that some of this stuff can actually happen in real time. In our part of the world, that's the role Kubernetes has played. One of the reasons Kubernetes is such an amazing technology is that it automates the handling of applications in a way that was never possible before. Now you can automate thousands and thousands of containers. Why? Because Google was doing it on a project called Borg, the internal predecessor of Kubernetes, before they open-sourced it. What we're finding is that the same is true for data. Data comes in all types of forms; it's found everywhere; we're living in a distributed world. We have customers with sensor networks generating distributed data, and all of that needs to be moved. Providing data agility requires that it be completely automated, so automation is one of the key technologies that enables this.
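Benoit's model, independent compute clusters over one shared dataset and cross-enterprise data sharing, is expressed in Snowflake as virtual warehouses and shares. A hedged sketch of the SQL involved, held here as strings; all the warehouse, account, and share names are invented for illustration.

```python
# Hypothetical Snowflake-style SQL illustrating Benoit's two points:
# (1) independent compute clusters ("virtual warehouses") scaling against
#     the same data without copying it, and
# (2) a consumer mounting a share published by another account.
# Every object name below is made up for the example.

independent_compute = [
    "CREATE WAREHOUSE IF NOT EXISTS etl_wh WITH WAREHOUSE_SIZE = 'LARGE';",
    "CREATE WAREHOUSE IF NOT EXISTS bi_wh WITH WAREHOUSE_SIZE = 'SMALL';",
    # Both warehouses can query the same tables concurrently; adding a
    # workload means adding a warehouse, not duplicating the data.
]

data_sharing = [
    # The consumer sees the provider's share as a read-only database:
    "CREATE DATABASE partner_sales FROM SHARE partner_account.sales_share;",
]

for stmt in independent_compute + data_sharing:
    print(stmt)
```

The design point is the separation of storage from compute: because the warehouses are stateless compute, the query bottleneck Benoit describes disappears, and because shares reference data in place, sharing does not mean shipping copies between enterprises.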
Alex Williams: Ali, when you think of how people access that data, and these next generations of databases that are corollary to your technologies, what transformations are you seeing in the processes that technical teams use inside organizations?

Ali Ghodsi: Our view is that big enterprises already have lots of data in different places, and we don't think it's feasible to ask them to move it into a different place or change it. Our approach has been: we'll hook into whatever data you have, wherever you have it. Our goal, once we hook into it, is to democratize it, so that the whole organization can start asking questions of the data, including predictive questions it couldn't ask before. We're agnostic about where your data lies; we'll access it wherever it is, in whichever system it is, and we also let you mix it. You can have it in a traditional database, you can have it in a modern NoSQL database, you can scrape it off the web; we let you do all of that. That's what Spark really enables: combining and mixing and matching all the data you have. Then we enable the organization to ask questions of it: we have a lot of dashboards, reports, and visualization that make it really easy for people who are not super technical to get answers out of the data. If it's only for super technical people who can program, it won't actually change the world.

Benoit Dageville: The database is a very interesting concept; it has been there for many, many years, and a lot of technology has been built around databases: all the tools, visualization tools, BI tools. Everyone knows how to access data with a database; it has been normalized with SQL.
Benoit Dageville: When we founded Snowflake, we wanted it to be a new type of database: keep the interface, the SQL interface, and keep transactions. Transactions are very important for how you change data: having consistency in the changes, being able to roll back if you make a mistake, being able to go back to any point in time. That transactional aspect of changes is very important, and it is carried by database technology. But at the same time, traditional databases don't scale. If you look at technologies like Oracle or Teradata, they are single-cluster systems; they don't know how to deal with big data. So what we wanted to build was a new, modern system, really more a data platform, that has all the good aspects of databases, transactions and SQL, but at the same time scales and takes advantage of the cloud. That's very important. Yes, you don't want to force enterprises to centralize their data, but at the same time data silos have really hurt companies. We see a lot of our customers that have created zillions of silos, and they have no consistent view of their customers; their view is limited and fragmented, so they cannot make good decisions with data that lies everywhere and is hard to access. Having one single data source, one single data platform that can scale, is really important.

Alex Williams: We have less than three minutes left, so I'd like to quickly go to each of you: tell me how what you're talking about is a gateway to machine learning and AI. You use machine learning in your technologies. But this seems to be a kind of fuzzy stage; would you describe it that way, or do you have another perspective?

Ali Ghodsi: No, I think we're basically five years into a completely new revolution. For 60 years we've been trying to do AI, but we were doing it the wrong way: we were trying to mimic human brains, using logic and reasoning. We're five years into using massive datasets and statistical AI, and we're just scratching the surface. I mentioned the drugs being developed; one particular drug was for chronic liver disease. There are amazing use cases, and we're just a few years in, so fantastic things are going to happen in the next five to ten.

Murli Thirumale: Absolutely. The advent of on-demand infrastructure with automation of apps and data is really the precursor to being able to apply AI to massive amounts of data in real time. We're going to live in a real-time world; anybody who doesn't have real-time systems is going to lose. The fast are going to eat the slow. The precursor to all of this is having massive amounts of AI applied in real time.

Benoit Dageville: I agree with that. You have all this data and you can make decisions, but the human is very slow, so you need the machine to make sense of this data, and this is the statistical model. Sometimes you don't know what you are looking for, and AI is about finding correlations between things: seeing that when there are these differences in a dataset, it leads to this outcome. Finding that can be done much more efficiently by a machine than by a human being; looking at data and trying to find those correlations is hard for the human brain. So AI is opening up the automatic analysis of your data, finding interesting things you didn't know how to see.
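Ali's distinction between rule-based AI and statistical AI learned from data can be shown in a few lines: no hand-written logic, just a model fit to examples. A toy sketch using scikit-learn on synthetic data; the dataset, model choice, and threshold are illustrative assumptions, not anything from the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Statistical AI in miniature: the "rule" (label is 1 when the two features
# sum above 1.0) is never written into the model; it is learned from data.
rng = np.random.default_rng(0)
X = rng.random((5000, 2))
y = (X.sum(axis=1) > 1.0).astype(int)

model = LogisticRegression().fit(X, y)

# With enough examples the model recovers the underlying boundary almost
# perfectly, which is the "more data beats cleverer rules" point.
accuracy = model.score(X, y)
print(round(accuracy, 3))
```

The same shape of workflow, fit on a massive dataset and then predict, is what the panelists describe at petabyte scale; only the data volume and the infrastructure underneath change.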
Alex Williams: Wonderful. Now let me just wrap this up really quickly. We talked about the application layer and how application architectures are being transformed by container technologies; Ali, we talked about the transformation of data overall; and Benoit, what I really take away from you is the idea that we're talking about data in the cloud, at scale. I want to thank you all for participating in this interesting discussion.

All: Thank you. Thank you so much.

[Applause]
Info
Channel: TechCrunch
Views: 2,342
Rating: 4.9130435 out of 5
Keywords: tech, techcrunch, technology, newest technology, hottest technology, brand new tech, gadgets, technology gadgets, hottest gadgets 2019, 2019 tech picks, tech top picks, current news, hacker news, latest technology news, cool gadgets, enterprise, enterprise products, techcrunch enterprise, tcenterprise2019, enterprise19
Id: Xi70_iTY3FY
Length: 24min 42sec (1482 seconds)
Published: Sat Sep 07 2019