Best Practices for Multiplatform MLOps with Kubeflow and MLflow — MLOps NYC19 Conference Panel

Captions
Moderator: What we want to talk about today is machine learning frameworks and how they're served. I think there are roughly three different approaches represented here: we have MLflow coming from Databricks, we have Kubeflow, and David, who is also representing a cloud service provider and their platform. What we want to do is understand the differences between the platforms and where there is essentially room for collaboration. So let's start. Can each of you describe your platform in a few words? How did it come about, who is the customer, who are you targeting as your first-class citizen, and what are the advantages of what you've done?

Clemens: Thanks. My name is Clemens, and I'm currently at Databricks leading the product team for data science and machine learning. As mentioned, I was at Google before that, where I worked on TensorFlow and TFX and also collaborated on Kubeflow. Databricks really embraces open source: as you may know, the creators of Databricks created Spark, another open source project called Delta, and MLflow. The impetus for MLflow was very user driven: Matei, one of the co-founders, looked at some of the most pressing user needs in this space, and MLflow was conceived with three components in mind. The first was tracking. The biggest pain point was that data scientists and researchers weren't able to track all of the work they do for machine learning — if you're familiar with these workflows, in many cases people just use spreadsheets. The second was reproducibility: actually making your code, and the environment you used, reproducible in the future. And the third was the diversity of machine learning frameworks — from TensorFlow to PyTorch, scikit-learn, Spark MLlib and so on — and how you log those models and then deploy them. That was really the idea behind MLflow, and I think Databricks executed on it beautifully. The persona we had in mind first and foremost was the data scientist and the researcher using these tools, but because Databricks is a unified platform that brings together the different personas in this lifecycle, there was always a big consideration for the other personas, such as DevOps or the people who actually own the deployment of these models. The components of MLflow that let you reproduce your code more easily and deploy models are actually most beneficial to those DevOps and product engineers who have to apply these models. Did I forget any part of your question?

Thea: I'm Thea Lamkin. I do open-source strategy at Google and have been working on Kubeflow for about a year now, so I came a little late to the game — I'm sure David has a lot more context on what he was thinking when the project first started. I'm focused more on the open-source community and collaboration aspects of Kubeflow as a platform. One of the things the project is really trying to address from that perspective is acknowledging that there's a huge diversity of tools, interfaces, and standards that people are working with in the machine learning ecosystem. We're trying to provide a solid platform that allows people to leverage best practices and develop standards around how to use them — no matter whether they're using TensorFlow, XGBoost, or PyTorch, or what exactly their serving looks like — and to leverage the underlying Kubernetes standards that have come out of that community as much as possible, to make it easy to run machine learning workflows on a distributed architecture. The really great thing about Kubeflow is that we're trying to provide a platform where people can integrate their favorite tools almost immediately, without having to sacrifice the standards they've been working with in their own frameworks.
Moderator: David, I think you'll be able to cover some more of the technical ideas behind it.

David: Yeah — I'm David Aronchick, I co-founded Kubeflow. I'll extend exactly what Thea said, and also, interestingly, what Clemens said, because before Clemens went to Databricks he was the product manager for TFX. I come from a Kubernetes background — I was the first non-founding product manager on Kubernetes — and we saw, way back in the early days, that creating a container was very easy: you just do docker build, docker run, and you're off to the races. But taking it even one step further and running it in production in any way was actually quite hard: how do you distribute it, how do you restart it, what sort of policies do you put in place? What we saw in 2017, when we got this thing kicked off, was a very similar experience with machine learning. Getting TensorFlow running locally on your machine is quite easy — it's just a Python script, you use the SDK and you're off to the races. Doing anything more complicated than that was actually quite hard. We came from a distributed-systems background where Kubernetes was somewhat universally used as a distributed architecture, and literally the first thing we launched was a very simple CRD for getting TensorFlow up and running and then connecting to it from a Jupyter notebook. Kubeflow has done much, much more than that in the interim, but that's very much where I see a great collaboration and complement with things like MLflow, which give the data scientist a wonderful getting-started experience, connected to the distributed systems on the back end for how you might run and maintain this over time. Those two together are really two halves of the same coin. As you saw in Jeremy's demo earlier, there's a very clear play in Kubeflow where you click a button and you're presented with the same experience a data scientist might have on their local laptop — and I'd love for something like MLflow to be the great platform giving that data scientist a wonderful getting-started experience while still giving them all the power of the distributed systems behind the scenes.

Moderator: Do you think the fact that Kubeflow essentially came from Kubernetes, and also from TensorFlow — the name itself starts with "Kube" and ends with "flow" — impacted design decisions or perspective? Databricks has a different perspective because they come from data science and a Spark architecture.

David: I'll be the first to admit it: absolutely. Kubeflow is still at 0.7. My goal is to get to a place where a data scientist could remove the letter K from their keyboard and still use Kubeflow. They shouldn't even have to understand that it's running on Kubernetes — no kubectl, no kfctl, absolutely nothing. They should be presented with a data science experience; we were very much inspired by Netflix's Metaflow in that respect, by the way. And to be clear, Kubeflow was literally named by Clemens. I wanted to call it "Tensornetes"; he didn't call it "TensorKube" either — he called it Kubeflow, and I thought that was exactly right. Remember what TensorFlow is: it's the flowing of tensors; that's why all these machine learning things have "flow" in the name. Kubeflow is not TensorFlow-specific. On day zero we had an MXNet operator, and I believe a scikit operator and a PaddlePaddle operator, if I remember correctly — we certainly have those now; I don't remember exactly what we had then, but it wasn't just TensorFlow. But absolutely, there's no question Kubeflow came at it from "hey, let's get this up and running on a Kubernetes deployment," and that's not ideal. Our goal is to get to zero Kubernetes for the data scientist.
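The "very simple CRD" David mentions is what became the TFJob custom resource. A sketch of what such a manifest looks like, built here as a plain Python dict — the job name and image are placeholders, and the exact apiVersion has shifted across Kubeflow releases:

```python
def tfjob_manifest(name: str, image: str, workers: int = 2) -> dict:
    """Build a TFJob custom-resource manifest as a plain dict.

    The shape mirrors the Kubeflow training CRD: a map of replica
    types (e.g. Worker, PS, Chief) to ordinary Kubernetes pod templates.
    """
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "template": {
                        "spec": {
                            "containers": [
                                # Placeholder image for a training container.
                                {"name": "tensorflow", "image": image}
                            ]
                        }
                    },
                }
            }
        },
    }

job = tfjob_manifest("mnist-train", "example.com/mnist:latest", workers=2)
```

Submitted to a cluster (e.g. via kubectl apply), the operator turns this one object into the distributed pods, restarts, and wiring that David describes as "quite hard" to do by hand.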
Moderator: It does seem, as you mentioned, that there is some complementarity, because Kubeflow came from "here are containers, let's orchestrate them," while Databricks came from "users need some abstraction, they need tracking." Today I don't really see an implementation of MLflow on Kubernetes integrating with Kubeflow. Do you see those things coexisting going forward?

Clemens: Yes, I think we take a very pragmatic approach through the open-source community, and there is some overlap — Jeremy showed metadata management, for example. If you drew the block diagram of the different components of Kubeflow and MLflow, there's certainly more area that's non-overlapping than overlapping, but to your point there are some overlapping pieces. So we take a pragmatic approach with the open-source community and the users: we have community contributions to MLflow that make it easier to deploy MLflow on Kubernetes itself, and we also provide ways to package up models that are logged with MLflow and deploy them as Kubernetes pods. Now, is there an opportunity for tight integration between MLflow and Kubeflow? I think the answer is definitely yes. The question is just where those contributions come from, and who the users are who actually benefit and are asking for these types of integrations. We see a lot of customers and users who don't necessarily touch the Kubernetes layer at all, because — to your first question about personas — in most companies the people who care about cluster management and Kubernetes deployments are not the ones sitting in a data science department training machine learning models. If we find the right abstraction — to David's point, at some point you shouldn't have to care about Kubernetes when you use Kubeflow — then the personas and the use cases also start overlapping more.

Moderator: You're leading me into my next presentation, but anyway — and you also beat me to my next question, which was who invented the name Kubeflow; you took that out of the question, but never mind. One of the things each of you represents is also a managed platform — David, for instance, is from Azure. I'm a bit of a code geek; I know both code bases pretty well, I've committed patches myself, and you see a lot of opinionated or platform-specific views. MLflow is pretty nice when you download the open source, but if you want to really use it as a managed offering you go to a company called Databricks. It's layered nicely to allow for more vendors, but it's still part of a managed platform; if you want it with security and all of that, you probably go to Databricks. The same for Kubeflow: there are a lot of opinionated views around Google, a lot of areas in the code that say GCP. If an enterprise customer now wants to deploy these things at scale, with governance and authentication and all that, can it really work with the open source, or do they really have to go for the managed offering?

Thea: I'll be the first to admit that, as an open-source strategist, I'm not the best versed in our GCP offerings — which is probably something I should get more well-versed in. But I think that, just like with any open-source tool, Kubernetes itself included, in the enterprise you're always going to need a certain extra level of support around exactly how you implement and continue to run these things at scale. That could be support from your own team, if you have a giant team that wants to manage it for you.
Or maybe you want to leverage the particular tooling that some cloud providers offer. Here's my little spiel: having Kubeflow run really well on-prem and on GKE, for example, makes it really easy to develop and deploy it where you want, and then pull in some of the Google Cloud features — Cloud ML Engine, IAM, and things like that — when you want them. At a certain point you might want to pull on some of those strings from the cloud providers for the non-ML support things you need to do when you're getting really serious about using this at scale. But what we're trying to do with Kubeflow from an open-source perspective is not be opinionated about any particular cloud, and instead make sure it runs well on all the largest cloud providers, because ultimately we want Kubeflow to become a ubiquitous machine learning framework that runs in every data center.

Clemens: Echoing the first part of what Thea said: no open-source tool or offering will ever be enterprise-ready right out of the box. You don't git clone something and have it come with enterprise-level authentication, governance, and auditing. So I think we take a pretty open approach to this. Databricks of course provides one managed version of MLflow that's fully integrated into the platform and provides all of these enterprise-level features, such as authentication and security auditing; Databricks is deployed on multiple clouds, so it works on AWS and Azure right now. But there are other companies that take MLflow and provide a managed offering of it as well. Microsoft actually got behind MLflow and is now using the MLflow API on Azure ML — so there's another place where this is provided. We think that standardizing on the APIs and standardizing on the workflows is more important, such that you can actually run your workloads on a local machine using the MLflow APIs and then move that workload to a cloud, where the same APIs log to the managed version — which gives you the portability. I'd guess the intention is the same with Kubeflow: the ubiquity of these features and APIs makes the workloads more portable.

David: Just following on both points — and not just to stress it: with MLflow, Azure devs were the ones who contributed upstream, and that's great; that's what open source is all about. Azure also contributed the upstream components for Kubeflow to be able to call out to Azure, and AWS contributed the Kubeflow components for getting it running on SageMaker. But I think the most direct answer to your question is: all abstractions are leaky abstractions. Every platform will require some form of customization in order to work with a hosted or managed solution. The people who build it tend to know their own platforms first and best, and they're going to make sure it runs great with them. If it doesn't work great with some platform — and I hope I can speak for MLflow here as well as Kubeflow — it's not because those people are enemies. It's literally because the people who have the time to work on it either don't know about it or have other priorities they're aware of first. Both of these communities would love it if, when you have a specific user requirement, you detail it, submit an issue or a request — and ideally some code — and I'm sure they would accept it.

Thea: I've been smiling, because I have a funny story about a contributor who came to us, I think at a docs summit or maybe a Kubeflow summit, and said: "I noticed that you don't have any AWS docs on your Google — I mean Kubeflow — website."
"Is that because you won't accept my contributions if I contribute docs your way? Why are there only GCP docs?" And my answer was: because you haven't contributed the AWS docs yet. So I think there's strength in having each open-source machine learning framework be portable — that's probably the goal from both of our perspectives — and it's about which standards will emerge for the use cases that are stickiest with this community. And now we do have AWS docs.

Moderator: I think something is still missing, though — the clustering and scale-out aspects that Kubeflow brings; Kubeflow is very nice, but each side is missing something. So the question is: what is the governance model of your project, if I now come with an opinion? By the way, we created our own derivative, essentially a hybrid of the best of both — not because we want to maintain our own, but because we were missing the data science perspective in Kubeflow and we were missing the scale-out from MLflow. So what's the governance model if I come today and say, "you know what, I want to change all of this" — are you going to accept that, either of you? Even in Kubeflow you have to sign the Google CLA before you contribute. Can each of you talk about your governance model and how you serve change? Because I think every open-source project that wants to become really dominant has to spend time and energy on openness — it's not just lip service. It means you have to actually invest engineering resources, develop interfaces and abstractions so you're able to be more accommodating, and work on governance models.

Clemens: For MLflow specifically, the governance model right now is that it's basically still owned by Databricks. We haven't donated it to a foundation yet, but we're not excluding that option — because, as you know, Spark is owned by a foundation and Databricks is really just a committer there at this point. But I think we have a strong investment in the open-source community: we're advocates, we run meetups, and a lot of contributions come into MLflow. Last I checked on GitHub there were around 140 contributors to MLflow, and only a very small fraction of them are actually from Databricks. We get significant contributions: as was mentioned, Azure contributed the ONNX flavor, and we got a contribution to remotely execute MLflow Projects on Kubernetes clusters, which is now merged. So we are extremely open to these contributions. We just ask, in most cases, for a significant contribution, that people start talking about designs and design choices early enough — because in the open-source community you often see someone work on something in isolation and then just submit a big PR at the end, which may conflict with the roadmap. And to your point, I think any successful open-source project is only successful if there are more contributors than the original people who created it.

Moderator: But that's about strength of architecture. Taking something designed for local computation and making it work for cluster-scale computation sometimes requires fundamental changes to the architecture — you can't just add a parameter to one of the APIs.

Clemens: Yep. I'd say we're definitely open to discussing what those changes would be. Depending on how significant they are, they would probably land with a major version change, because we do semantic versioning — but that's something our engineering leadership would most likely engage on and discuss.
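The contribution Clemens mentions — remotely executing MLflow Projects on Kubernetes — landed as a Kubernetes project backend in later MLflow releases. A sketch of the backend configuration it expects; the context name, repository URI, and template path are placeholders, and the key names are my reading of later MLflow documentation, so treat them as an assumption:

```python
import json

# Hypothetical backend configuration for running an MLflow Project
# on a Kubernetes cluster (key names per later MLflow releases).
backend_config = {
    "kube-context": "my-cluster",                     # kubectl context to target
    "repository-uri": "example.com/mlflow-projects",  # registry for project images
    "kube-job-template-path": "k8s_job_template.yaml",
}

with open("kubernetes_config.json", "w") as f:
    json.dump(backend_config, f, indent=2)

# The actual run would then be (requires Docker and cluster access):
# import mlflow
# mlflow.projects.run("https://github.com/mlflow/mlflow-example",
#                     backend="kubernetes",
#                     backend_config="kubernetes_config.json")
```

The point of the design is exactly what the panel describes: the project definition stays the same, and only this small config decides whether it runs locally or as a Kubernetes Job.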
Moderator: Good. So let's talk for a minute about the managed cloud services, which are widely used: someone going to Amazon gets SageMaker and a bunch of other data-related products; someone going to Azure gets Azure ML; and with Google there are different flavors of ML and AI products. These platforms are dumbing down the level of abstraction, so people can just throw in a bunch of images and get back a model. Where do you see the differences, or the advantages, of using a managed cloud service — each of you has one — versus using these frameworks, which sometimes overlap in functionality?

David: I think every cloud is going to have an entire portfolio of services they want to offer their customers, and many times it will be a kind of monolithic whole that works as a single thing. But more often than not, what folks are looking for is to pick and choose. Without question, the majority of data science going on today is happening on-premises somewhere — that's where the data is being collected and stored — and that's just a microcosm of the fact that the majority of workloads, period, are still not in the cloud. There's about 1.7 trillion dollars in IT spend a year, and all the cloud providers added together make well south of a hundred billion dollars, so the majority of that spend is still not in the cloud, and the question is how it migrates over. At Azure we took a very SDK-first approach, so we're trying very hard to let you pick and choose. You just want to use data drift detection, or just model profiling? It's a single API call; you can call out and call back. Because we have that flexibility, you can deploy a full Kubeflow pipeline on-prem and call out to our service just for, say, managed inference or model profiling, and then call right back in. And the exact same thing holds with MLflow: if you're using MLflow and at the end you just want to use Azure Machine Learning for inference — great, it's a single API call, and you're able to do that.

Moderator: And the other way around — using a pipeline tool in Azure and calling out?

David: Absolutely. I didn't have enough time in my demo today, but I have a GitHub repo that does specifically that: Azure calls out to Kubeflow, in this case, and it could just as easily call out to on-prem or Spark-hosted MLflow, execute there, and then come back. I think that's how you'll see the majority of these workflows happening, because there are things around gravity — whether it's compliance gravity ("I need to be in Europe, that's where GDPR applies") or "my data is hosted in this data center" — so you'll have a lot of these cross-environment workflows that need centralization.

Clemens: I'll answer this at a higher altitude — maybe I'm avoiding the question. I worked at Google for a couple of years in infrastructure, and I've seen what's happened in open source and in the managed ML offerings on all of the clouds. I think we're at a stage in the lifecycle of these tools where we haven't yet come up with the right levels of abstraction, the right layers of technology, the right form factors for what a data science and machine learning platform should look like. There are opinionated versions of this on all of the cloud providers, but I think we're still in the expansion timeframe, where more and more startups are trying their own approaches, and at some point a consolidation is going to happen where we actually land on something that's the best way forward.
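The "single API call" pattern David describes is, under the hood, just an authenticated REST request to a hosted scoring endpoint. A standard-library-only sketch — the endpoint URL, token, and input schema here are entirely hypothetical:

```python
import json
import urllib.request

def build_scoring_request(url: str, token: str, rows: list) -> urllib.request.Request:
    """Build (but don't send) a scoring request for a hosted model endpoint.

    The JSON body shape ({"data": [...]}) is a common convention, not a
    specific service's contract.
    """
    payload = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_scoring_request(
    "https://example.com/score",  # placeholder endpoint
    "fake-token",                 # placeholder auth token
    [[0.1, 0.2, 0.3]],
)
# urllib.request.urlopen(req) would then return the model's predictions.
```

Because the call is plain HTTP, it can be made from an on-prem Kubeflow pipeline step, an MLflow run, or a laptop alike — which is the cross-environment point the panel is making.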
One of the examples I like to use: fifteen years ago or longer, when I was looking into CRM tools, there were all kinds of open-source CRM tools. I didn't like any of them, so I just developed my own with a bunch of people. And I think right now anyone would say that Salesforce has gotten to the best form factor — or at least it's the most common CRM solution out there. What Salesforce is to CRM — I want to see what that is for machine learning and data science platforms. I have an opinion and a conviction on this — admittedly a biased opinion, having joined Databricks — but I think that's really yet to be figured out. And any platform that narrowly adopts just one opinion is probably going to be a riskier choice than a set of tools that are more open and more portable and run on multiple clouds, where there's still optionality left in terms of where you take your workloads.

Moderator: So you're suggesting not to use SageMaker?

Clemens: No, I'm not suggesting that at all. A lot of our customers use SageMaker — Databricks customers deployed on AWS often use SageMaker for deploying their models. I'm just saying that all of them also have workloads they train on-prem, workloads they train on a laptop, workloads they train on Databricks. There's still so much diversity that there isn't yet a solution where you can consolidate all of your machine learning in one place.

Thea: So, to the last question — which was the only question I wanted to answer — I'm actually personally more interested in figuring out, when you talk about convergence, whether it's going to be a technology convergence or a convergence in the way data science is done within organizations. On the usefulness of a tool like Kubeflow: right now we're much more about providing an experience for data engineers and ML ops, and you're providing a really great experience for the data scientist. I'm wondering, by the time a tool is landed on, whether it will be about the evolution of the roles we're seeing within organizations: will data scientists learn more Kubernetes principles because they're expected to produce models that are performant in production environments, or will data engineers level themselves up and use these frameworks to provide really self-serve experiences for machine learning professionals?

Moderator: So essentially you're saying the abstraction always goes up and never goes down, right? Abstraction will go up, and data scientists will win.

David: Thea just made an absolutely salient point, which is that we've really abandoned data scientists in a bad place. This is just software engineering — software development — and we've given them an entirely new set of tools and an entirely new set of software practices, and kind of left them on an island without the basic frameworks that make software developers so productive today. And I love the idea Thea raised: how do we help them — not transform them, but bring their knowledge in and allow them to do the things software developers do today? That doesn't mean they need to go understand libraries and SDKs and traveling-salesman algorithms, but it does mean giving them the tools: standard linting, compilation frameworks — I don't know about object-oriented design, but you get the idea — higher-level tools that let them build things without having to guess whether a cell in a Jupyter notebook was executed twice or not.
Moderator: So maybe another question. We keep coming back to the same point: each of you had a different design perspective. Let's assume we now want to make peace and converge these things. First, are you open to essentially working together, or are there other challenges, bigger fish to fry? David has his own agenda around MLSpec — which I admire, by the way — where he's trying to standardize the way you store artifacts and deliver a common model. So how do we take all these great opinions? Even us: when we talk to customers, sometimes they say "give me MLflow on Kubeflow," and we say that doesn't really work. But we do understand there are different personas: maybe someone worked with Spark and is a fan of MLflow; another guy comes from ops, or from TensorFlow and deep learning, and is in love with Kubeflow. So how do we take all these efforts — MLSpec, MLflow, Kubeflow — and work together as a community? We can wait for the dust to settle, which is one way of doing it, or we can be a little more proactive, with each of the groups — including us and others — working together toward those goals, or at least bringing things closer, so that when I as a developer write code, I don't have to say which platform it's for; I can say it's pretty similar and put a thin wrapper around it. So let me put it on you, and on all of you out there.

David: You said it, and Clemens said it: all of these problems want to be customer driven. We would all love to work together — there's nothing stopping us, really, we promise — except clear customer use cases. So come say: "Databricks, you can sell to this customer if you do the following." "Google, Azure, you can sell to this customer if you do the following: bring these together, clean up the lines." We need the help of the crowd. Beat us up: "you guys absolutely need to collaborate more, here's our scenario" — whatever it is. We think we know, but we're not going to design in a vacuum. The reason each of us did this in the first place is that we all saw clear customer needs around these specific things, and we understood those customer needs in order to develop a solution; and once we were done, we walked out and asked, "hey, did we solve it? Yes? No? OK, we can make tweaks." We can't design in a vacuum.

Thea: I think that's one of the points about Kubeflow: it's very easy to try to boil the ocean and solve everything for every single framework. One of the ways we develop Kubeflow is by defining a core set of critical user journeys that define what the user experience should look like, and then we work back from that. One of the things we benefit hugely from is getting more feedback on those user experiences — more people coming to us with requirements about what needs to be able to plug in where in order for them to get any use out of the tool. Somebody just came up to me at the Kubeflow booth downstairs to chat about how Airflow could potentially fit into an end-to-end use case with Kubeflow, and that was the first time I'd heard anyone mention that; the first thing I wanted to do was say, please write down your use case and email it to me, and we'll start to collect these. As David mentioned, it's really easy to do nothing well if you're designing in a vacuum and just trying to make sure every potential use case is satisfied.

Clemens: David and Thea said it well, and I just want to make one final comment: you started your question with "what if we make peace," which implies that there is no peace.
is definitely peace. We all know each other, we're all good friends, and we communicate, so I think that's great; don't assume there's no peace. And the second thing I just wanted to mention, as a side note, but I'm actually going to do it: I think I'm going to register something like disambiguating-flows.com, because I was at Strata earlier today and someone asked me what the difference is between TensorFlow, Airflow, and Kubeflow. So I think all of the "flows" need to be disambiguated. And to the point about overlap, there's actually not that much overlap between all of these projects, but it needs to be clear what each one of them is trying to achieve. Okay, so final words: can each of you tell us what to expect? A final question for you all: the roadmap, what's coming up in the next few months for each framework? Yes, for MLflow, as mentioned earlier, we're getting a lot of contributions from the community. In terms of Kubernetes support, we are engaging a lot with people who have contributed support for running MLflow projects on Kubernetes and deploying models in containers on Kubernetes clusters; that's definitely something we're following down the road, and as mentioned, that's mostly community driven. In terms of the core components, there's going to be a bigger announcement in a couple of weeks in Amsterdam, so anyone who finds themselves in Amsterdam on October 15th, stop by at the Spark + AI Summit. We've mentioned in the past that we're working on a component called Model Registry that's going to extend support for the ML lifecycle, from logging models to managing the deployment lifecycle, all the way to deployment and then bringing it back; that's a big area of investment for us, and we're going to announce more in October. Yeah, and so we're coming
up soon on our 0.7 release for Kubeflow. I think anyone who saw Josh's talk today is probably up to date on the existing feature set; we added a lot with 0.6, artifact tracking and sub-pipeline updates, but 0.7 is essentially going to be our beta for considering a 1.0 version. The whole thought is: let's make sure that the APIs for our core components are stable, that we've really done the final polishing on some of the enterprise support features, making sure that the on-prem use case is really solid, along with support for multi-user, finishing up some of our kustomize work, oh gosh, what's the other one called, and also finalizing the Istio integrations. So not the most exciting things, but the exciting part is that we're getting ready to call Kubeflow and some of the core components, with the community's help, a solid 1.0 early next year. So please just try out Kubeflow 0.7 and let us know if it's ready. Yeah, please try it. Sure, so let's now open it up to the crowd; anyone want to ask them some hard questions? Anyone? Yep. So, I'm not sure I understood the question; I think the question was, these are both very complete solutions, and you're looking for just a point solution that solves an individual component; did I roughly get that right? Okay, so first let me be an advocate for MLflow: MLflow is one command and you're running on your local laptop, so it is a wonderful getting-started experience, there's no question. As much as I would love to say Kubeflow is there, it's not there; you have to install a local version of Kubernetes, which is tough, and then you subscribe to a whole bunch of components. Totally agree, it takes a good while, right, I get it. That said, many folks are using core components that are built for Kubeflow as point solutions. For example, I think we have more downloads of the TensorFlow job CRD from Kubeflow than we do for the Kube
flow platform as a whole, right? Because it is a very well tested, very well understood distributed platform for running TensorFlow. And I think you can pick and choose these kinds of things fairly easily from both of these platforms. It does require a little bit of understanding of the platform and its configuration in order to do it properly, so that you're not just selecting things and then it breaks. But part of what you're subscribing to with the whole platform is something that has been integration-tested for both of these platforms, and though it feels like a lot, I would probably bias towards downloading the whole thing and using just the components you want, rather than pulling out one component and going to town, because these are two teams working incredibly hard to make sure the end-to-end experience is good, and if you just peel out one piece, I think you might miss out on some of the goodness. On the Iguazio platform there is a managed Kubeflow as part of it, and we don't just take everything, because again it's too hard. There are also issues: if you are in, let's say, an enterprise, or you're building an enterprise solution, then you already have an API gateway with authentication and authorization, so you cannot just take the Kubeflow thing as is. Or if, for example, you want to provide managed notebooks or Jupyter, then maybe you have all sorts of ways of customizing images and so on that aren't covered. So what we're doing in our platform is that we don't take the whole thing; we break it into pieces, we productize each one independently, and we add our own glue. And when people ask about kfctl, I say that's nice for starters or beginners, but if you really want to adopt it as a platform for your organization, break it down into YAMLs and/or kustomize it. Now, any other questions? Yep. So, you have, again, an HPC cluster with GPUs, okay, and you cannot install Kubernetes on the
other cluster, the one with the GPUs; you cannot do that. Okay, any ideas? I mean, I think there need to be well-defined contracts across a pipeline, right, and I think the deployment case is pretty well defined in most cases. You can train your machine learning model on one platform that has GPUs enabled and serialize that model in whatever format is native to the framework, right? So let's say you use TensorFlow: you can serialize the model in the TensorFlow SavedModel format, and then the deployment of it can happen completely independently. Even if you don't have GPUs, you can build it into a Docker container and put it on Kubernetes. So I think that contract, where you train something and then pass off the artifact to be deployed, is well defined in most cases. There are other, more esoteric hybrid configurations you can think of, but this training-to-deployment case to me is pretty straightforward. And at least from the MLflow perspective, as mentioned earlier, we have use cases where you have your MLflow tracking server running in one place, and Databricks provides one managed version of this, and you can run your workloads locally and log to that server; you can run your workloads locally, on cloud, and on on-prem deployments, and log everything to a single place. So all of those hybrid configurations actually exist. Let me take this opportunity, just extending on Clemens's contract comment: we have a very, very nascent project underway right now called MLSpec, which is getting a lot of people working to establish some of these standard contracts. One of the first and most successful ones right now is a project called KFServing; it's got KF in the name, but we're trying to change that to ML serving or some neutral name, because it's not Kubeflow specific; there's an issue open, please vote on it. For example, MPIJob, which we use a lot, yeah,
it's part of the Kubeflow project, but in many cases we meet people who just take it and run it on its own. The idea is to establish what it would look like if, as an industry, we established contracts between all of these various steps. Google has published many, many papers on all the steps of an ML platform, certainly what I stole my slides from, and we would love to establish that as an industry, collaborate together and say: hey, you know what, for serving, here's a contract for you; for model packaging, here's a contract for you; for logging, here's a contract for you, and establish those things, and that then allows intercommunication between hybrid-style environments. Okay, for another
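The training-to-deployment contract described in the answer above (train on a GPU-enabled platform, serialize the model in a framework-native format, deploy the artifact on a separate CPU-only Kubernetes cluster) can be sketched in a framework-neutral way. This is only an illustration: the JSON artifact format and both function names are hypothetical stand-ins for something like a TensorFlow SavedModel export on one side and a serving container on the other.

```python
import json
import pathlib
import tempfile

# "Training" side (e.g. the GPU-enabled HPC cluster): export a model artifact.
# A real pipeline would call something like tf.saved_model.save() here.
def train_and_export(export_dir: pathlib.Path) -> pathlib.Path:
    model = {"weights": [2.0, -1.0], "bias": 0.5}  # stand-in for trained parameters
    artifact = export_dir / "model.json"
    artifact.write_text(json.dumps(model))
    return artifact

# "Deployment" side (e.g. a CPU-only Kubernetes cluster): it knows nothing
# about how training happened, only the agreed-upon artifact format.
def load_and_predict(artifact: pathlib.Path, features):
    model = json.loads(artifact.read_text())
    score = sum(w * x for w, x in zip(model["weights"], features))
    return score + model["bias"]

artifact = train_and_export(pathlib.Path(tempfile.mkdtemp()))
print(load_and_predict(artifact, [1.0, 3.0]))  # 2*1 + (-1)*3 + 0.5 = -0.5
```

The point the panelists make is that as long as both sides agree on the artifact contract, the training and serving environments can differ completely in hardware and orchestration.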
Info
Channel: Iguazio
Views: 1,206
Rating: 5 out of 5
Keywords: mlops, iguazio, kubeflow, mlflow
Id: TJx0d-pHyiM
Length: 46min 22sec (2782 seconds)
Published: Wed Oct 16 2019