H2O Live: Democratizing AI with AI Apps Recording

Captions
H2O AI Hybrid Cloud is an end-to-end platform that enables organizations to rapidly build world-class AI models and applications for virtually any use case. Building AI models used to take large teams of expert data scientists months or even years; with H2O AI Hybrid Cloud, an individual data engineer, developer, or data scientist can build world-class AI models and applications many times faster, rapidly putting AI into the hands of business users. The platform brings automation across the entire data science lifecycle, including connecting to and preparing data, building and explaining models, and deploying and operating them. It also makes it easy to publish and share AI applications across an entire organization. H2O AI Hybrid Cloud runs on Kubernetes, allowing customers to operate on any cloud or on-premise infrastructure. If you're looking for a solution that puts AI into the hands of every employee, get started with H2O AI Hybrid Cloud: one platform, thousands of use cases.

Welcome to H2O Live. We have several great speakers today, and the center of our attention is our AI App Store, which is the tireless creative work of several of my esteemed colleagues and incredible co-creation from our customers and community. What I'm showing here is that the power of AI is in its usage, not just in its incredible algorithms and ability to learn from data. One of the applications you're seeing here is made with Wave, our open-source low-code framework, which has been a prolific way to build applications. LIBOR, as many of you know, is going away, so one of our customers, a leading investment bank, ended up collaborating with us to figure out how many real entities are in a document and to pick out the work needed to replace those entities in the documents with the right alternatives. It's an NLP problem, of course, and one that could take years, or lots of humans, to master, but co-creating gives us the ability to shrink problems that would otherwise have taken years to solve. Here's another incredible piece that looks at an entire factory floor to understand why a particular engine is malfunctioning, and then surfaces the top five factors that matter: a Driverless AI (AutoML) engine behind the scenes predicts that the engine is likely to fail, and sensor 11 and sensor 4 show up among the top factors. The idea is to simplify and explain AI in ways our users can consume. To summarize, AI apps bring life to AI in ways one can truly understand and experience.

Let me walk you through why companies should care. Your existing business may be driving 100% of your revenue today, but the true assets of a company are its brand, its community, and its data, and if you can use AI to transform our companies into data companies, that's where the value is. By 2030, almost all companies will be AI companies, and it is our vision to make you all AI companies, with the kind of trillion-dollar app stores that are powering today's largest companies. That's the vision to democratize AI: make this your movement, not just our movement.
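For a sense of what the low-code framework mentioned above looks like in practice, here is a minimal sketch of a Wave app, assuming the open-source h2o-wave Python package; the route name and card contents are placeholders:

```python
from h2o_wave import main, app, Q, ui  # 'main' must be imported for the Wave runner

@app('/demo')  # hypothetical route, served by a running Wave server
async def serve(q: Q):
    # Render a single markdown card; real apps add forms, plots, and state.
    q.page['hello'] = ui.markdown_card(
        box='1 1 3 2',  # grid placement: column, row, width, height
        title='Hello, Wave',
        content='A minimal card rendered by the Wave server.',
    )
    await q.page.save()
```

Saved as app.py, this would typically be started with `wave run app` against a local Wave server.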
We are an AI software company. We partner deeply with our customers to make them AI-first, to make their data, their incredible assets, and their people pioneers in applied AI. This started years ago for us. ML was the beginning, the assembly language of AI, and H2O's core community of mathematicians, physicists, data scientists, and Kaggle Grandmasters enabled us to pick the best machine learning methods and scale them to billions of rows, not just millions, and millions of features, turning what would take days or hours into seconds and milliseconds. That ML engine, H2O-3, still powers 20,000 companies, and an incredibly vibrant, growing community of AI and machine learning practitioners and data scientists, across Python, Scala, and every flavor of machine learning framework, has embraced our innovation. Today you will find us embedded in almost all payment systems out there, in telcos, and in insurance and healthcare.

AutoML is the second generation, the leap we observed our community taking, and our open-source nature allowed us to truly get to that next phase: the ability to fine-tune best-of-breed methods, match them with the right recipe for the problem at hand, and continuously learn from data as it changes. You have experienced this through our platforms, H2O AutoML and Driverless AI. From there, the next leap is operationalizing applied AI: how do we bring it into the hands of end users? That is what's driving most of the transformation at our customers, and I think AI middleware is here; that's what we are calling H2O AI Hybrid Cloud. This entire innovation lets you create applications, transform and innovate inside and outside, build your own app stores, participate in the marketplaces you're building, monetize your data, and bring transformation inside your organization and your ecosystem. That's the vision we are building, and we are super excited to have you, the community, and the incredible power of our strong customers co-creating in this vision.

It starts with data, obviously. Making models, creating AI, is at the heart of this: AutoML and feature engineering, powering the feature stores that need to store those features; model validation, which is at the heart of safety in AI; explainability, which brings stronger transparency to models going into production; and end-to-end operationalization, which means monitoring and managing models continuously across their lifecycle as the data changes, and retraining them to go back into production again. App development is where a lot of creativity comes in; it's where the data scientists, the developers, and the DevOps of ML come to a single place. And it is nothing without incredible design. AI thrives on feedback; all AI applications thrive on feedback and user experience, and AI will be reinvented to make that experience ever more seamless for users. Empathy for the customers and users will lead to powerful applications, and our vision is that if we get model validation and the right data assets into play, we are able to bring incredible power to this space.
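As a point of reference for the open-source H2O AutoML engine mentioned above, model training is a few lines of Python. A minimal sketch, assuming an installed h2o package and a hypothetical CSV with a "churn" target column:

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # start or attach to a local H2O-3 cluster

# Hypothetical dataset; any tabular file with a target column works.
train = h2o.import_file("customers.csv")
train["churn"] = train["churn"].asfactor()  # mark the target as categorical

aml = H2OAutoML(max_models=10, seed=1)
aml.train(y="churn", training_frame=train)

print(aml.leaderboard)            # candidate models ranked by CV metric
preds = aml.leader.predict(train) # score with the best model found
```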
The freedom to innovate, the ability to democratize, the ability to bring collaboration within your company and across the ecosystem of your merchants, customers, suppliers, vendors, and partners: that's where the whole vision of AI apps and app stores comes together. H2O is pioneering these app stores, allowing you to build your own but also participate in ours. This will lead to true business expansion across the entire generation of economic transformation ahead of us. We think growth within your company is going to be directly limited by how you use your data, and unlocking that potential, from data to decisions, and becoming a truly AI-first organization, is going to be at the heart of the cultural transformation ahead for most of us, including our ecosystem.

Suffice to say, today we have a few great customers on stage who are going to talk about their experience: Aegon in insurance, Wells Fargo in finance, and BSMH in health. Starting from the customer, from the end user, how they are transforming, and how they aim to transform their user and customer experience, is going to be at the heart of how H2O evolves its app stores. The app stores are built on our incredible low-code framework, Wave, and powered by the AI engines we have historically been known for. And of course everything starts with domain and data, and here we have some incredible partners on stage later today who are powering the future of data, whether through data engineering like Snowflake, or data through the likes of Equifax, Experian, Infutor, and others. The way to look at this is end to end: data, to actual consumption of that data, to decisions, to insights, to the ability to transform your business; a holistic view of how AI becomes immersed in every single application in your ecosystem. At the heart of the middleware is explainability, the ability to bring that kind of transparency, and the ability to validate your models and keep them agile as data changes. We think this is how AI is going to be rolled out in every organization and across the rest of the ecosystem. We're super excited to unveil the AI App Store and to ask you to be participants in it.

Is there value in AI? One of our customers recently demonstrated that billions of dollars can be saved using ML and AutoML, and, more importantly, the features that come from them, to power the transformation ahead, and we expect the app stores built by our customer AT&T to reach their customers as well as ours in the future. Financial services has historically been the leader in transforming itself with AI, from subprime credit scoring to personalized recommendations, fraud prevention, and AML. We expect to build financial-crime app stores with some of our co-creators, among the largest banks in the world, and take them to other customers, and the leaders here are going to be participants, building a rich culture of their own AI applications. All our businesses are moving from asset-heavy to asset-light, and whatever was core to our business is going to be transformed by the software and AI movement ahead. We expect our customers to build app stores for themselves, but also app stores for their customers, and then build a rich ecosystem of suppliers.
In closing, the power of AI Hybrid Cloud is that it lets you go cloud-neutral, across every cloud or on-prem, build rich applications, whether in finance, telco, health, or insurance, bring them to the marketplace, and transform and monetize your data. We expect the maker culture to live not just within H2O but within our customers and community, and that maker culture transforms our customers to go power the future of AI and AI apps: make, operate, innovate, monetize, and transform. Our vision is to transform the universe with AI. H2O has a very iconic charter, we are still very early in that journey, and most of this transformation comes through the immense customer obsession we have, and through empathy for our customers and their journey in AI. Our app stores are here as a way to simplify, to create reusable recipes and componentry that allow people to build a very rich tapestry of applications and user experiences.

Let me show a few of them. We started with the LIBOR application earlier. One of the powerful applications the team has been co-creating with one of our customers, who will be speaking later today, is around building true explainability for deep black-box models. Here's an NLP example: a Yelp review sentiment analysis, a complete black-box model, historically very difficult to understand. What Agus and team, who will talk about this in a little more detail, built is a way to transform these sophisticated neural network models into simple locally linear models, and from the locally linear form you can absolutely pull out importances by filters. Historically, these filters were not broken down by sections of sentences, words, and n-grams; what you can pull out, almost for the first time, is the actual cluster of words whose meaning led to a bad review. People want to understand how their customers' experience was and how to change it, and here you're able to piece together a story from an NLP black-box model, triangulate, and apply it to your business.

One of the most popular examples, of course, is the Boston dataset. This is another app created with Wave; our team and the Wells Fargo team worked together to turn what was a fairly generalized paper into an accessible application, and the moment it becomes an application you can truly get a reaction from the end users and look at global performance versus local performance. The call to action today is to go to cloud.h2o.ai. You're going to see some applications from Infutor later today, and a powerful one is subprime credit scoring, which is a big use case for us. I'm going to walk you through how Driverless AI, your favorite AutoML engine, can be used to produce richer explainability for your machine learning. Instead of focusing on just AUC scores, people can level up and talk about credit scores, and that's where we want to take the conversation: toward the business value of AI, not just the deep complexity of its methodology.
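Credit-scoring explainability of this sort typically surfaces per-applicant "reason codes" from Shapley-style attributions. A hedged sketch of the generic technique, not the Driverless AI internals, using the open-source shap and xgboost packages on synthetic stand-in data:

```python
import numpy as np
import xgboost as xgb
import shap

# Synthetic stand-in for credit data: 6 applicant features, a default flag.
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)

model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one attribution per feature per row

# The largest-magnitude attributions for a row serve as its reason codes.
top = np.argsort(-np.abs(shap_values[0]))[:3]
print("top reason-code features for applicant 0:", top.tolist())
```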
And of course, most of this is running on the H2O AI Cloud, all of it. The problem we're solving in this demo is how to use graph data, cell phones and phone conversations, as a means to understand credit. The model was built using Driverless AI, and you can get the report on how the model was built, look at good credit and bad credit, and also look at fairness: understand gender disparity, and use explainability to understand the reason codes behind each prediction. Again, a very powerful demo for showing the end to end.

One thing customers are usually curious about is that after deploying their models, they want to understand whether the models are actually performing well, which brings us to validation. Let me just go to the store here. End-to-end validation of models is going to be a very powerful need; deployed models end up needing care as the data drifts, and you want to be able to rebuild a model on the fly. When you click Run here, you're actually launching a Kubernetes instance, a pod, a container, that essentially auto-scales, because you can launch as demand comes; all the various Kubernetes flavors, on-prem and in the cloud, are supported. So you get that end to end on the models, the ability to launch at will, and the rich user experience needed to transform.

Let me show one more piece and pick up a validation demo here. The idea is that as drift happens, you want your models to be rebuilt and managed: build adversarial testing and back-testing for your models, and let the system manage the models for you. Model validation is very different from QA, and monitoring your models, which MOJOs are a very powerful way to do, allows you to drive that same level of adoption across your enterprise. With adversarial similarity, you want to understand whether something in your data is different, and to track drift across your system.

Another incredible capability we can showcase is the integration between all the frameworks out there: the ability to build these apps with your open-source or closed-source frameworks, and to go beyond the traditional ways of understanding models to truly getting explainability for them. Here's a Shapley chart, as you would get from our Driverless AI AutoML framework, again built by one of our data scientists. The point is that building these apps is very easy, and most of the apps need explainability. Time-series explainability is another powerful example: how does one understand time-series models, both at a global level and locally; how to understand Shapley values better; how to relate the transformed features back to the actual features you're building on. Here you're looking at an exponentially weighted moving average, a transformation that is powering a lot of the variables that are important in your data. This is again about going beyond Shapley, beyond state-of-the-art explainability.
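For reference, the exponentially weighted moving average transformation mentioned just above is a one-liner in pandas. A minimal sketch with hypothetical readings:

```python
import pandas as pd

# Hypothetical time series, e.g. readings from "sensor 11" in the
# factory-floor example earlier.
s = pd.Series([10.0, 10.2, 9.8, 12.5, 13.1, 12.9, 15.0])

# EWMA with span=3: recent readings weigh more than older ones, which is
# why it often surfaces as an important engineered feature.
print(s.ewm(span=3).mean())
```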
In closing, I'm going to show how a data scientist can take a simple notebook and build applications. Wave is an incredible framework, available to our audience, and the App Store, as you can see, is powered by it and limited only by the innovation you can bring. This goes to show the prolific pace at which we expect this to happen: there are already a couple of hundred applications in our App Store, and we are inviting all of our audience, customers, and community to build and co-create with us and go to market with us. Through the rest of the hour you're going to see a lot more of these applications, which should give you the same excitement we have experienced with these models.

Thank you so much for joining us. I want to give way to some incredible speakers coming after me. The entire H2O team is behind this innovation, but it would be unwise of us not to credit the incredible community: a lot of our apps and applications have been co-created with our customers, who are nothing short of being part of our team. At the forefront of that is Agus Sudjianto, who heads model risk management at one of the largest banks in the world, Wells Fargo. More than that, he is an incredibly prolific publisher of ideas, on arXiv and elsewhere, and a phenomenal, what I call, street fighter with data and data science, with an incredible, highly collaborative team powering his experiments. I often call him the lord of all models, at Wells Fargo and across most of the banking industry. So without further ado, let me invite Agus.

Thank you, Sri, for the very kind introduction. I am going to share my slides and talk about a transformation we are embarking on that we call validation on demand, and how it connects with the journey Sri talked about. In financial institutions we use models very pervasively across our business, both traditional financial models and non-financial models, and when we use models we think not only about performance but also, very carefully and thoughtfully, about model safety, because models will fail, and when models fail they can create harm or unintended consequences for the user and for our customers. A financial model can create financial harm for the company or the customer; a non-financial model can create other types of harm, in terms of reputation, compliance, legal, and so on. So at the heart of our AI and model deployments is what we in the industry call model risk management, but at the end of the day it is really model safety: how we deploy models that are safe and sound for the institution and for customers. And we're dealing with very large scale: in our case, thousands of models in production running every aspect of the bank, and we have to make sure all of them are safe and sound in production and use.
So we have to design models that are very well understood, whose weaknesses are very well controlled. As Sri said, model validation is a very critical aspect of what we do. In a regulated industry like banking, we have teams that do model development and then independent teams that do model validation; they are separate in the reporting line, and these teams check and challenge each other. With that, when we deal with a lot of models and have to move very rapidly on deployment, with more than a thousand developers and validators, a thousand quants (we call ourselves quants) and data scientists, the coordination of those people is really critical, and the speed at which we get it done is very important.

So what we're thinking about is what we call validation on demand: a model developer can say "this is my model, test it," choosing a battery of tests that the model validators built, and vice versa, model validators can go at any given time to any model and test it as well. We dubbed this approach validation on demand, and it can be a very complex process in a large company, because models are developed and deployed in various different systems, and we have a platform for model validation as well, so the connection between model development and model validation across those platforms is very critical for us. Part of that is how we make it happen: model development lives in a model development platform, model deployment in a model deployment platform, and model risk, really model validation, in its own platform. As I put it in the picture here, we have a series of models, many models and many tests, done by various different groups, and they need to be able to connect and invoke testing from both sides. The model developer builds the application of the model itself, which needs to be tested; model validation builds the application for testing, which can be invoked by the model developer to run all the required tests so the model complies with our standards for model safety; and vice versa, model validation can go to any system, connect with a given application in model development, and do testing there as well.

One example of this is what Sri showed in his presentation, the Aletheia Wave app. Aletheia is basically about making a deep neural network an inherently interpretable, self-explanatory model: not explaining a deep network through a post-hoc explainability tool, but truly making the deep neural network itself a self-explanatory model. That is very important for model safety, because it goes beyond explainability: we have to understand in which regions the model will be wrong, in which areas the model has weaknesses, and whether it has sufficient exposure to training samples, and so on and so forth.
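The core trick behind such self-explanatory ReLU networks, as I understand the description here, is that a ReLU network is exactly linear within each activation region, so its local coefficients can be read off directly rather than approximated post hoc. A toy numpy sketch of that unwrapping idea (not the Aletheia code itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny ReLU MLP: 4 inputs -> 8 hidden -> 6 hidden -> scalar output.
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(6, 8)), rng.normal(size=6)
w3, b3 = rng.normal(size=6), rng.normal()

def forward(x):
    h1 = np.maximum(W1 @ x + b1, 0.0)
    h2 = np.maximum(W2 @ h1 + b2, 0.0)
    return w3 @ h2 + b3

def local_linear(x):
    """Return (w, b) such that the network equals w @ x + b everywhere
    in the activation region containing x."""
    d1 = (W1 @ x + b1 > 0).astype(float)   # which layer-1 ReLUs are active
    h1 = d1 * (W1 @ x + b1)
    d2 = (W2 @ h1 + b2 > 0).astype(float)  # which layer-2 ReLUs are active
    # Fold the fixed on/off pattern into plain affine maps.
    A1, c1 = d1[:, None] * W1, d1 * b1
    A2, c2 = d2[:, None] * W2, d2 * b2
    w = w3 @ A2 @ A1
    b = w3 @ (A2 @ c1 + c2) + b3
    return w, b

x = rng.normal(size=4)
w, b = local_linear(x)
assert np.isclose(forward(x), w @ x + b)  # exact within this region
print("local coefficients:", w)
```

Within the activation region containing x, the extracted coefficients reproduce the network exactly, which is what makes them interpretable importances rather than approximations.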
So we are working very extensively on this subject, how to come up with self-explanatory models, to move beyond explainability and assure model safety. Let me stop right there and turn it back for the next speaker. Thank you.

Thank you, Agus. Our next speaker is an incredible pioneer in healthcare. Bon Secours is one of the largest Catholic hospital chains in the world, if not in the United States, and Sameer, as chief data officer, drives transformation there. Sameer joins the ranks of our customers, and we're super excited to have him speak about the challenges and opportunities of transformation in healthcare, especially post-COVID.

Thanks, Sri, for the introduction. I just wanted to point out that we're partners in this, definitely not customers by any means; I think you and your team have been super instrumental in making us feel that way and setting us up for success. Give me a second while I share my screen. Can you see my screen? Great.

When you look at the potential of deploying AI in healthcare and the impact it can drive, the opportunities in our mind today are endless. We at Bon Secours, and I think healthcare in general, are very ripe in this space, and we see opportunities to leverage AI and drive impact across our care continuum; the optimism is immense. What is encouraging is that this sentiment is shared across the healthcare IT sector, and organizations both small and large are doing something about it, whether startups, where you see a lot of movement, or the tech giants too, who are doing quite a bit of work investing in healthcare IT, in AI, and in apps. There is a substantial amount of energy and investment going into this space, and we are starting to see that resonate in both the public and private sectors; as you can see on the screen, there are quotes issued in the public domain today that show AI is here to stay. So we at BSMH have made this a priority and made a commitment to embed AI in different ways into how we work, and to make it a key area for our growth.

While we are encouraged by the potential of AI, considering our mission and the non-profit posture we have in place, we put a lot of emphasis on making sure these programs and projects are set up for success. When we look at the bag of, for lack of a better term, AI failures, both internal to us and across the industry, we realized that for AI projects to be successful there has to be a bit of reshuffling around how much effort and investment goes into which part of these initiatives. While data and data science are really important to an AI project, more attention needs to be paid to the user and to the adoption of these capabilities. We have therefore pivoted quite a bit toward, as you can tell from the screen here, the UI/UX element and the adoption of AI capabilities into the workflow, because in our mind that's where the missing link is.
When we opened up that bag of AI failures, we started to see trends: the models were great, but they were not deployed; they weren't deployed because they weren't accepted by the business; and they weren't accepted by the business because there were no ways and means of fitting them into the workflow. That is where apps come in, building apps around these AI capabilities. In addition to that, and I'll get to what we're doing about those apps in my last slide, we also adopted a very methodical and prescriptive approach to AI projects. This allows us to make sure the prioritized, high-value use cases make it into the build sheet, which is really important; we felt at times that was not a focal point, people saw a good idea, an exciting thing for us builders to build, and just ran with it. So we wanted to make sure we followed a process that makes us think through the problem and confirm it is the problem we are trying to solve; that builds agile into how we run these projects, because the missing piece we saw was collecting feedback as projects get built and having a way to pivot and navigate the changes in the ask for things we didn't understand; and that finally focuses on the consumer at the end, on how the solution will fit into and benefit the workflow, in our case benefiting the patients and the caregivers. And that's where apps come in again: they are a way for us to bring this to the masses, to ingrain the AI capability into the hands of the folks that need it most.

So we at BSMH have partnered with H2O to develop our AI muscle, which includes developing the apps, developing the models, and finally handing it to the people that matter the most: our patients and our caregivers. We have a variety of clinical and non-clinical use cases with the potential to deliver impact, in the form of improving outcomes and the lives of the patients and communities we serve. This is a really important area for us; we do this today, we just want to do it in a different fashion, leveraging AI and the technology that exists today. H2O will not only bring us the platform where we do our data science, it will also help us democratize our AI capabilities through the building of apps, the app stores, and a few of the other features Sri talked about today.

We at BSMH are truly excited about this opportunity; I think this is a game changer for us. We believe the H2O team, their capabilities, their platforms, and their way of thinking through data science and AI problems will improve our speed to market, which in turn will let us deliver impact in a meaningful and efficient manner. In closing, what's important to recognize is that those of us who have been managing data and working on models have been doing this for a while, and I think have done a good job putting things together; what we haven't done a good job of is putting it into the hands of the consumer. That's why, when this was brought to me, I personally was very excited to talk about it.
I feel it's time for us to shine, time for us to take this capability and the amazing work we have done over the last 20-25 years, take it to the masses, make it meaningful, and make it count, so that people can say AI is actually something they use day in and day out, which at times they're starting to realize through other means, but at least in healthcare it's not yet as transparent and prevalent. I'll pause there for any reactions or questions you may have.

Thanks, Sameer. That's an incredible vision, and incredible execution ahead for us, together, to serve our communities better. This is our second foray into working very closely with another phenomenal community-first healthcare leader, and obviously we look forward to it. Non-profits have a unique charter, they want to do good, and that's just the philosophy with which we built H2O and our communities, so we're super excited to be a partner in your journey. You have an amazing team; we're really looking forward to working with you and extending this relationship. Don't go very far, because we'll see you at the panel.

Next, let me introduce Hike: healthcare, life insurance, fintech, health, and life, what I call the triumvirate, the holy trinity. One of the traits of life insurance is that you have such a long-term horizon and agenda, and so it is with great pleasure that I introduce Hike, chief data and analytics officer at Aegon and a leader with decades of experience practicing analytics. When he first invited me to The Hague to present at his conference, I always wanted to have him back at ours, so it's a great pleasure to invite Hike onto the stage.

Thank you, Sri, for the introduction. Can you see my screen? Perfect. Well, thanks, Sri, for having me here; it's a pleasure to present what we are doing at Aegon, with H2O but also, in broader terms, in our analytics journey. To start with, the journey we have been on for analytics for almost six years is presented on this slide. We really started with building the skills. For me it was important to start with capabilities and people before thinking about the technology, because in my experience it is ultimately the people who make sure there is adoption and that we can integrate analytics into an end-to-end journey; focusing only on technology doesn't help you that much. So we started to build skills, and we started an analytics academy to attract good talent and give them a place to work in a global community: although they work in a country unit, through the academy they travel to different country units (at least pre-COVID) and do assignments there, to really carry best practices from one country to another.

We saw that was good but not enough, and we started a center of excellence. In the center of excellence, which I am heading up, we focused on how a small team can make sure we leverage best practices and don't reinvent the wheel, because we recognized that if there is one area where something is truly global, it is data, analytics, and AI. Whether you are in the US (Transamerica, which is our brand name there), in the UK, in Europe, or in Asia, we use the same types of models and the same types of data, so it is, in principle, very easy to have a common way of working.
The next step was to recognize that we were focusing primarily on the marketing and sales areas, and of course the traditional actuarial domain, but there is much more: as we digitize the organization, we have data everywhere. So what I started to push is thinking about analytics across our whole value chain, which includes marketing and sales, but also commercial pricing, straight-through processing of underwriting, fraud detection (which was already mentioned a couple of times; we built a lot of fraud models), and straight-through processing of claims; it's better, when a claim comes in, to be able to process it with no human interaction. Also in the more adjacent fields of people analytics, risk analytics, and finance analytics, we were really able to push the envelope on using analytics to make better decisions.

That was great, but it still didn't bring all the change we could bring, so we started a big program called Analytics for Leaders, to really engage with leadership. Not that we expect leadership to program in Python or work with H2O; we want to make them aware of what the analytical process looks like and what their role is. One of those roles is, for instance, articulating a business problem clearly. You would say it's an easy one, but in reality it's difficult. How do you handle data, how do you deal with immaturity in the data, how do you stimulate your team to really become data-driven and work with use cases? That really boosted a lot of the data innovation in our organization, and what's good to see is that we now have both a top-down and a bottom-up approach to capturing the value of analytics and AI. What I purposely left out in the beginning was the work around data, but now we're pushing more and more on data governance and data management in order to productionalize AI in a better and faster way. At the same time, we are working on democratizing analytics, really AI for the business, and, as mentioned before, the AI App Store is a great way to do that, because on the one hand it makes it easier for the broader audience to adopt AI, while, as Sri already mentioned a couple of times, explainability and model governance remain essential. As was also said, in financial models this is well organized, but for non-financial models it's more difficult, so controlling the models when we put them in production, keeping an eye on performance, bias, and so on, is one of the things we are really pushing at the moment. So that's the journey, and you can see that the way we work, and the focus points for me and my team, have really changed over the years.

We have applied this to a large number of use cases in different places: cross-sell models and next-best-action models, but also things like email routing, NLP, call identification, commercial pricing, straight-through processing of underwriting, and so on. We've used it across the board in a large number of areas. It didn't come easy, I must say: first of all there is of course some resistance in the organization, and we have a tendency to be pretty risk-averse, as an insurance company should be, but that applies here too, and in the beginning there was some skepticism about machine learning and AI.
So what I really try to push in the organization is test, learn, and optimize, in a very quick way: create hypotheses, experiment with those hypotheses, share successes, but, also really important, share the failures. We have learned where it works well and where it doesn't, then adjust and scale up. In the beginning it was a slow cycle, and now we are getting better and better at running it, and platforms and workbenches like H2O enable the process, because it becomes easier to work with a larger group of people and to have the controls in place that you want in order to make this happen, in particular in highly regulated markets.

We got the boost through roughly three work streams. One is developing the skills in the AI space; in the beginning, and it's still going on, it was important to create a community of, let's say, the evangelists in this space, because the specialists were scattered across the firm, so we brought them together to really push the envelope on initiating projects, not waiting for the perfect solution but starting to try, and in a lot of projects running it simultaneously with the more traditional methods we use, like a GLM or whatever. So in the beginning it was a parallel stream; it is now becoming the leading stream.

What's ahead of us? I think faster adoption of successful use cases is important. We started the journey with perspective sharing, which was really, in an environment like this, presenting a case to the community; then someone says, "hey, that's interesting for me as well," picks it up, and you start to learn and, slowly, move it from one country to another. The next step we took is a use case library, where we collect all the use cases from the different country units and put them in a repository, what we call the library, with the contact person, often with code attached, and some more information. Still, while it moves faster than plain best-practice sharing, it is hard to scale up, so the internal app store Sri was talking about is, I think, the next step for us to make it easier to disseminate those successful use cases in a controlled way. That relates to the next point, model management for non-financial models: we have a pretty strong process for model validation and model management for financial models, but for non-financial models it's a little more the wild west, and we have to control it better. We are already in that process: from idea generation, to whether I am allowed to use the data from a privacy perspective and how I can use it from a security perspective, to who owns the model, how you modify it, and how you monitor it over time for bias but also for performance. Then it is about democratizing analytics: we are already working on a big training program, an accelerator, where we put in some AI elements. Still, some of the hesitation in the organization is this: now we have the tools in the hands of people who may feel comfortable with them, but we don't always feel comfortable that the controls are in place; here again, the app store would really help make things more secure. And something Sri mentioned in some of his use cases is explainability of AI for wider acceptance.
We have already come a long way on that, but it is still sometimes a blocker in the organization, so aspects like fairness, ethics, and contestability are really important for us to bring in. That is for internal purposes, so that we feel comfortable when we put out a model, say a fraud model, but it is also important that we offer our customers clarity on which models we use, and, as it's called, contestability: always have the option of a human on the line to explain what's going on and why someone did not get an immediate claim settlement or was not accepted for credit or life insurance. So, an exciting journey. I feel we have now done the first S-curve in our organization, going from a scattered set of data scientists to a more structured way of working with real impact, and the big change will hopefully come in the next S-curve. Thanks. Any comments or questions?

I'm pretty sure we're going to see a lot of questions in the panel session, Hike, so thank you so much for being part of the journey and part of our incredible speakers today; thanks for the insightful presentation. Our next speaker is Eric Gaswick, representing Infutor, a customer and partner, and joining him on stage will be Michelle Tanco, one of our phenomenal data scientists and the product owner of the AI App Store. It is a great honor to have Eric on stage talking about how Infutor can infuse rich data into applications; data fuels AI applications. So without further ado.

Thank you, Sri. Hi, everybody. As Sri mentioned, I'm Eric Gaswick, director of products and partnerships at Infutor. On the left you'll see that we're a company that's won awards for being a great company with great products; in the middle, some of our clients, Fortune 100 all the way down to high-growth startups and hundreds in between; and on the partner side, I'll touch on a number of partners throughout this presentation. Two in particular, AWS and Snowflake: we were happy to be early adopters of and participants in their data marketplaces, and both of them are speaking after me. And then of course H2O; full disclosure, we're a happy customer of H2O, building a bunch of our models with it, but also really happy to be partnered with them, and I'll certainly talk more about H2O throughout this presentation.

To kick us off: third-party data sets help enable AI in three main ways. First, improving model accuracy: during the model build process, having fresh, analytics-ready data improves model accuracy, and Michelle and her wonderful team at H2O have put together a live demo of this, leveraging Infutor data and other data they sourced, which we'll talk about more in a moment. Second, contextualizing and highlighting model drift. This is an interesting one: at Infutor we see billions of records from our clients, and from that we can ascertain that about 30% of client CRM data decays within 12 months. Obviously consumers are dynamic, and the pandemic was another unfortunate reminder that markets are also very dynamic. So alongside model accuracy and model drift, record accuracy and record drift come into play: are you running a current model on somebody based on a snapshot of who they were two years ago?
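One common way to make drift concrete, a generic technique rather than Infutor's or H2O's specific method, is adversarial validation: train a classifier to distinguish the data a model was built on from the data it now scores. A minimal sketch with synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drift_auc(reference, current):
    """Train a classifier to tell reference rows from current rows.
    AUC ~ 0.5: indistinguishable (no drift); near 1.0: heavy drift."""
    X = np.vstack([reference, current])
    y = np.r_[np.zeros(len(reference)), np.ones(len(current))]
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(2000, 8))  # training-time snapshot
cur = rng.normal(0.4, 1.0, size=(2000, 8))  # shifted "production" sample
print(f"drift AUC: {drift_auc(ref, cur):.3f}")  # well above 0.5 here
```

The further the score rises above 0.5, the more the scored population has moved away from the one the model learned.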
And if they are different today, what different decisions would that model make if you knew their correct demographics as of today rather than the past? That gets us into continuous improvement. As models make decisions, there is potential value left on the cutting-room floor, so you come back to decisions the models made and revisit them with refreshed data: maybe, because you refreshed your view of an individual and ran them through the model, you return a different result, because it's been six, twelve, or eighteen months and they've changed that much; or maybe the model has changed; or maybe both. And once you've done this and can move somebody through and promote them, you have the right activation information on that individual to target and engage them.

This slide graphically shows the accuracy comment from the previous slide. The first-party data we commonly see from our partners and clients looks like the left side: maybe just an email, maybe just a street address, maybe some light demographics, and you can imagine how accurate, and certainly how precise, a model built off a sample of data like that might be, versus what you see on the right: on the left you have the ability to link records via emails and phone numbers and dedupe them, and on the right you add additional variables on that individual to give a very clear picture of who they are and additional variables to include in your models.

I'll just mention quickly that privacy and security are at our core; my brevity does not suggest this is unimportant to us. It's critically important, and it is what allows us to have the largest, most authoritative identity graph. This slide serves two purposes: one, a quick eye check to see how your prescription looks, and two, to show that we've got a lot of data that can be sliced in a lot of different ways to meet your use case: 265 million US adult consumers across roughly 125 million households, linked together with names, addresses, referential addresses (where have they lived in the past), emails, phone numbers, and digital signals. That foundation allows us to build interesting things on top. I'll highlight a couple: demographic and lifestyle attributes, where another partner and friend of ours, Truthset.io, the unbiased third party that scores data, puts us at the top of the class for this type of data, because of the strong deterministic data from authoritative sources that we build on; and also things like segmentation and cluster analyses, where H2O helps us, as it does with things like AVM and home-equity scores and in-market auto models, just to name a few.

On how we build the graph: the point of this slide is that there are a number of steps, honestly very expensive steps, that we go through so that, through wonderful partners like AWS and Snowflake, our data can be just a couple of clicks away from you. So if you prefer to go through a marketplace and never have to talk to somebody like me, the data is there: we have full data sets as well as free trial data sets.
All of that is available on those marketplaces, and again, we're really happy to be early and successful members of both the AWS data marketplace and Snowflake's. But at the end of the day we're kind of a simple, you might even argue boring, data company, and it's really through partnerships with software companies like H2O, where you have cool and sexy software, that the value of this data gets unlocked. H2O is a wonderful thought leader, pushing and evangelizing AI and its democratization along with data, and it's that software that really helps unlock the value of data sets like the ones we offer.

Very quickly, I'll mention a use case, because I was just meeting with this client earlier this week and thought it was relevant before we get into a live demo. This client is one of the world's leading financial services companies; they handle millions of leads and customers each year, and they were struggling with model accuracy. This multi-billion-dollar company first looked to acquire referential data and build a matching solution in-house so they could handle it themselves, and they found that, both in time and in financial cost, the cost was too great. The solution: we were able to cleanse, link, and refresh the data they had to improve model accuracy, highlight some areas where models appeared to be drifting, and add more variables to tighten those models up. A good example for this financial services company is mortgage data. Going back before the Great Recession, the presence of a mortgage on an individual almost single-handedly made or broke that individual as the company handled them: if a record came in with a mortgage, it was almost all green lights in 2006-2007. Then 2008 hit, and if you had a mortgage you were seeing almost all red lights and being flagged for risk as this company weighed you. We've now added hundreds of new variables to that model; mortgage is still important to them, and we have strong mortgage data and metadata on that mortgage, but also really rich demographic information that has helped them tighten their models. And picking back up: as their models were making very strict and severe decisions on how a person was handled, coming back to those people as models change and markets change picks that value back up, with the data they need to re-engage those people, because the snapshot of a person today versus who they were 12, 18, or 24 months ago is very different; that has generated a lot of wonderful new value for them.

So again: third-party data sets help improve model accuracy and contextualize and highlight model drift, because it's important to think not just about model accuracy and model drift but also record accuracy and record drift. Are you running a really precise model on data that is two years old and underperforming because the person has in fact changed? And that gets into the third point, continuous improvement. With that, let's see it in action. The nice thing is you don't have to take my word for this: Michelle and her wonderful team at H2O took it upon themselves to pull in Infutor data and put together a great app and a great demo. So, Michelle, I will turn it over to Sri, who I think is going to introduce you.
I'm going to go ahead and jump right in, because Sri introduced me already and we're a little short on time today. Hi, everyone. We're going to look at a quick demo of what Eric just talked about. This is a demonstration you can try yourself today in the H2O AI Hybrid Cloud trial; you can make a free account at h2o.ai. We see our Infutor app here, and it runs on a hard-coded public data set; we're not going to be showing you any private customer data today. What this application does: we took a public Kaggle data set about customer churn and built a machine learning model on it. The model was pretty valuable; we have the costs associated with happy customers we lose and with unhappy customers we are not able to save, and we used Driverless AI to build an automated model that creates new features for predicting churn. We then went ahead and joined this data with Infutor's data sets. Because this is public data, we joined at the zip code level, but with your own data this could be joined at the individual or household level for even more insight. Our new model performed better: in this case we were able to save almost 200 more happy customers, which gives us strong business value here. This demonstration shows how we combine AutoML, the rapid prototyping of UI front ends, and Infutor's third-party data.

On the side here I have all the features that were built into my model. Some of these features are a little complicated, and some are new ones that came from Infutor, so let's look through those. Here I have partial dependence plots for each new feature from the Infutor data set; essentially, each plot tells me how my prediction, on the y-axis, changes as the feature changes. The average age of someone in a specific zip code has an impact on our predictions of whether customers are likely to churn (again, we join at the zip code level because it's a public data set). What we can find here are new features we maybe wouldn't have thought of on our own; for example, the age of cars in a zip code had an impact on whether our customers were likely to churn, and demographic information from the area can help us this way. In this application you can also go to the Infutor data exploration and look at the four data sets from Infutor that we used in this model: automotive profiles, customer insights, property profiles, and demographic information. There's no private information shown in those columns, but it gives you an idea of what type of data comes with Infutor and how you could join it to your own models to improve your accuracy. Thanks, everyone.
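As a rough sketch of the pattern the demo describes, a zip-level join of first-party churn data with third-party demographics, followed by partial dependence, here's a hedged example; the file and column names are hypothetical stand-ins, and the model is a plain scikit-learn classifier rather than Driverless AI:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

# Hypothetical frames standing in for the demo's data: first-party churn
# records keyed by zip code, plus a third-party demographic table.
churn = pd.read_csv("churn.csv")            # zip, churned, usage columns...
zips = pd.read_csv("zip_demographics.csv")  # zip, avg_age, avg_car_age...
df = churn.merge(zips, on="zip", how="left")

features = ["monthly_minutes", "support_calls", "avg_age", "avg_car_age"]
model = GradientBoostingClassifier().fit(df[features], df["churned"])

# Partial dependence: how the churn prediction moves as one feature moves.
PartialDependenceDisplay.from_estimator(model, df[features],
                                        features=["avg_age", "avg_car_age"])
plt.show()
```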
As you can imagine, a large part of that end-to-end innovation, especially on the data side, on the engineering side, and on the Kubernetes and cloud deployment side, comes from our partners, and today we have the opportunity to bring two of those partners to the fore and have them present how we are working together. It's a pleasure to introduce Snowflake's Miles Atkins, a specialist in financial markets who has been working very closely with H2O, and our prolific principal systems engineer and darling of our customers, Eric Gudgion.

Great, I'll just take over here. All right, there we go. Hello everybody, I'm Miles Atkins, a partner sales engineer here at Snowflake, responsible for our AI and machine learning partners, and obviously H2O is an elite partner with Snowflake. I want to give you a quick run-through. When you think of Snowflake, you traditionally think of us as a data warehouse, but we've made a ton of advancements and put out a bunch of new features to really help the data science and machine learning life cycle. We have six distinct workloads we can run for you on Snowflake: we can ingest a variety of data types, structured, semi-structured, and unstructured, from a ton of different data sources, and then make that data available to a variety of different data consumers. But I want to focus on our data science capabilities and what we've started to enable to push this data science workload forward.

You can actually do a great deal of the machine learning life cycle just inside Snowflake, and what customers really love is that the first features we put out were built-in security, governance, and query performance. As those topics come to light, with more big headlines about data security and the like, our customers really love that we built this sort of data bank: they know they can put their data in, and the entire suite of security and governance is built into the fabric of our offering.

When it comes to data prep, there's nothing stopping you from doing data preparation and feature engineering inside Snowflake. We just came out with what we call Snowpark and Java UDFs. Instead of having to write a lot of nuanced SQL, which can get pretty complicated when you think about nesting different queries together, Snowpark gives you the ability to program in your language of choice; right now we have Scala and Java available, where you can write in those languages and we convert your code into SQL that our engine can then run. With UDFs, we actually give you the ability to create your own custom Java code and execute it inside our SQL engine as well. This is opening up a whole new avenue for how you interact with our platform, which is meant to attract a lot more of the data engineering and data science personas. And then we have a bunch of other capabilities for pipelines: we have streams and tasks, so if you think of Apache Airflow or some sort of change-data-capture mechanism, we have all of that in-house, and you can manage and automate a lot of your processing, for example your scoring once you have a deployment.
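As a rough illustration of what "the code compiles to SQL and runs next to the data" means, here is a minimal Snowpark sketch. One hedge: at the time of this session Snowpark offered Scala and Java, and this sketch uses the later-released Python API purely to keep the examples in one language; the table, columns, and connection parameters are placeholders:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

# This filter/aggregate never leaves Snowflake: the dataframe operations
# compile to SQL and execute in the engine, next to the data.
churned_by_state = (
    session.table("CUSTOMERS")
    .filter(col("CHURNED") == 1)
    .group_by("STATE")
    .count()
)
churned_by_state.show()
```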
Then, of course, there is what we're known for: our third-party marketplace, where a bunch of our customers put their own data up, and Infutor is a great example of that. Instead of having to send FTP files or email CSVs, you can make your data available, or grab data, and basically see it right in your own sandbox when you go to think about enriching your data sets, which is awesome.

Now, what we don't cover, and why we partner deeply with H2O, is the model training side. We don't have any capability for you to train a model inside Snowflake, so with H2O we have a number of native connectors that can be created to do your model training. For deployment, with Java UDFs you can actually take an H2O model you created and bring it into Snowflake. Instead of having to move hundreds of thousands or millions of rows of data across an internet pipe to score against a model living in some other environment, you can just bring the model to the data and score there, and with that you get to leverage Snowflake's scale-up and scale-out capability: whether you're scoring 20 thousand rows or 20 million rows, you can get the compute capacity you need to make it an efficient job. If that doesn't work for you, we also have external functions: your data might live inside Snowflake while you manage your H2O model on a more intricate Kubernetes cluster, and we can set up an external function that calls your model's API, pipes your Snowflake data through, gets your model results, and sends them back in one seamless motion. And then, for model monitoring and management, we don't have out-of-the-box capability for that either, but there's a ton of capability where H2O uses Snowflake on the back end as the system of record, so you still get the scaling and sharing capabilities that let different teams across your organization see model performance, business impact, and that kind of thing. With that I'm going to throw it over to Eric, who can walk you through some of the Snowflake and H2O integrations we have. You're muted, Eric.

Oh, sorry about that. What we're looking at here is the Snowflake application built with Wave, which allows us to connect to Snowflake and use those powerful APIs Miles just talked about. This means business users can collect data, start training, or even score models directly. The interesting thing we're seeing here is that you can pick the model and score it directly in Snowflake, at a different time, or even using some of those external functions. What ends up happening is that as we train and score, we write monitoring records, so data scientists and business users can see whether models have finished training and whether there were any errors during the training session: very powerful stuff, and an easy way to consume it. On the next slide we see the ability to still use the traditional tools: as Miles was mentioning, we can leverage things like Java UDFs and Snowpark, which are amazing technologies for scale, and we will also auto-generate the notebooks and worksheets for Snowflake, so any tool and any user that wants to consume those models can still do it in exactly the same way as before, or through the Wave application.
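Once a model has been registered in Snowflake as a Java UDF, the "bring the model to the data" scoring path that Miles and Eric describe is plain SQL. A minimal sketch calling it from Python via the Snowflake connector; the UDF name H2O_CHURN_SCORE, the table, and the columns are hypothetical:

```python
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<db>", schema="<schema>",
)
cur = conn.cursor()
# The model runs inside Snowflake's engine; only scores leave the warehouse.
cur.execute("""
    SELECT customer_id,
           H2O_CHURN_SCORE(tenure, monthly_charges, avg_vehicle_age) AS churn_prob
    FROM churn_features
""")
for customer_id, churn_prob in cur.fetchall():
    print(customer_id, churn_prob)
```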
When we look at the deployment part, the idea behind Wave is to really unify the entire experience for every user of the technology. We might build a model outside of Snowflake and actually upload it, as we see here; then we can pick how we want to deploy that model, and it will automatically upload and deploy into Snowflake and be available immediately, so you can score any other data at scale. And of course, any of the reporting and any of the drift detection that we do with Snowflake as the system of record is picked up straight away. A very powerful combination together. With that, I'll pass it back to Kristof to talk about the AWS piece as well. Thank you. Welcome, Kristof.

Thank you, Sri; let me start my video as well. Welcome to H2O Live, good morning and good afternoon everyone. It's truly an honor to be on this journey in partnership with H2O, democratizing machine learning, and what I brought ties very well into this journey of putting AI into the hands of every customer. As was mentioned earlier, around 2017 customers started to become pretty good at putting things together: data science leaders made machine learning applicable for product managers, who built it into their products. But we now see an evolution unfolding. In the current wave, vertical leaders want to consume bespoke, tailored applications, and we also see line-of-business leaders, CFO automation, CMO automation, wanting to consume AI applications. That's why we are particularly excited to be in partnership with H2O, in which AWS simplifies deploying, managing, and connecting the data to the H2O AI Hybrid Cloud. Using Kubernetes on AWS is really taking off, easing operation and innovation across the machine learning life cycle; we can connect to our wonderful partner Snowflake, with whom we've co-built a lot; and with AWS's data lake and lakehouse architecture, which I'll expand on, customers can build a modern data architecture they can tap into with H2O, as well as ADX, the AWS Data Exchange, where, as we saw in several examples, customers who need additional accuracy can reach out and enrich the data on which they train their models.

In the past years, customers got exposed to an unseen and unforeseen amount of data, and it mostly sits in different silos. Those silos came with various problems: the data was unavailable, there were authorization problems, or it was not at the right price-performance level. Customers really wanted the ability to use the right tool, whether for log analytics, documents, or graph data, and to move data between those purpose-built stores at the periphery, from the periphery into the data lake, and from the data lake back out. That is the modern architecture we call the lakehouse architecture.
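The enrichment path Kristof mentions through ADX is exposed programmatically. A minimal sketch, assuming boto3 credentials are already configured, that lists the data sets an account is entitled to before pulling them into the lake; the region is a placeholder:

```python
import boto3

# AWS Data Exchange exposes subscribed third-party data sets via an API;
# "ENTITLED" returns the data sets this account has access to.
dx = boto3.client("dataexchange", region_name="us-east-1")
for ds in dx.list_data_sets(Origin="ENTITLED")["DataSets"]:
    print(ds["Name"], ds["Id"])
```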
In the lakehouse architecture, customers build scalable data lakes using these purpose-built data services, with seamless data movement and unified governance at the right price-performance ratio. With that ability, the machine learning capabilities of H2O tap into these established methodologies and infrastructures, manifesting the H2O and AWS partnership: fast and simple implementation using managed services, whether with the data or under the hood using Kubernetes and the entire AWS service suite. The third piece is the tremendous synergy in additional partner capabilities: as AI apps become more relevant for lines of business that require automation, connecting partners to partners, AWS has more than a hundred thousand partners, from RPA and intelligent process automation partners working with H2O to the system integrators bringing additional apps into the App Store, the opportunity is endless to deliver practical, useful, rich, and powerful solutions to increasingly diverse customer needs around machine learning. So we're really excited about customers onboarding and using more of H2O and Driverless AI, and I'm happy to answer questions in the chat during the panel discussion with my peers. Thank you.

Thank you, Kristof. It's a pleasure to see such incredibly insightful presentations preceding the panel, so let me invite the speakers to a panel which I expect to be a raucous discussion of points and counterpoints. For the folks who haven't met him: Dave Ferber, from Equifax, one of our earliest supporters and customers. Let me invite the team onto the Zoom stage; Agus, please turn on your videos, and we'll kick off a great discussion around data. Dave, tell us about the transformation happening at Equifax, obviously with machine learning, AI, and all the models. In many ways banks and data companies, the earliest data companies, came from the same genesis and roots as Equifax, and we'd love to hear your story, what you're doing with H2O, and the transformation there.

Thanks, Sri, I appreciate you having me. As a lot of you know, Equifax has been going through a major technology transformation: we've invested a lot of money to make many of our data platforms cloud-based, which enables a lot of additional capabilities we're really looking forward to taking advantage of. A couple of cool things we're working on: we're basically taking our analytical ecosystems and bolting them onto our decisioning platforms, enabling AI through that channel. Our customers can access our data, develop their decision strategies, and deploy those into production, kind of what you were talking about earlier, and as decisions run through those platforms they feed back into our analytical ecosystem, where we have tools like H2O, Driverless AI, and Wave funneling back into that data. They can then monitor those decisions over time, decide whether the decisions are optimized in the best way as they mature, allow the AI to modify those strategies, and redeploy, so we're constantly cycling through what we call the feedback loop.
It's really enabling: with data coming through the cloud, our customers' transactions, and any other data we can feed in, putting it into that environment allows the democratization, letting our customers and our internal users do more with that data and get those insights as fast and nimbly as possible. So, very exciting times at Equifax, with a lot of that technology transformation enabling our AI democratization.

Thank you so much. That kicks off the question: all models are wrong, some models are useful. What has data science got to do with this? It's incredible to see the rich tapestry here; many of you are customers of ours together, and a large chunk of the demos today ran on both Snowflake and AWS, so there's a really rich ecosystem here. Most of us can't hire all the Kaggle Grandmasters, because they're already hired. How does one go about building great models, deploying them, and taking them to production?

You employ half of them, that's the problem, Sri. For us, building a great model is probably not the most difficult part; it's probably among the easiest. Building a great model that will be safe when we deploy it, that's not easy, because, as some of you mentioned, there is drift, the environment changes. COVID-19 is an example: it changed everything. So understanding how the model will be wrong, what to do about it, how to manage and anticipate it, and testing for that when you design the model, that is really at the heart of what we do: deploying models that are safe. A few things. It starts from the data side: what data can we use? We talk about privacy, about bias in the data, about which fields we can and cannot use; for us in the banking industry, demographics, no, you cannot use demographics, because that's discriminatory. So there is the data governance piece, what you can do and what you cannot do. Then, on the model-building side, there's what can go wrong with the technique, the algorithms, and what kinds of bias the algorithm will make worse, and how to control that. And that's just the beginning: then you test for safety and soundness, for robustness, all of those things. At the heart of it, explainability is so important; transparency and interpretability of the model are the gateway to all the things we worry about in model safety. Then there's model deployment and monitoring: everything we worry about during model development needs to be done during model monitoring as well. And we have to teach people, call it street fighting: people come straight out of school having learned how to use algorithms and build models, and the real world is not like that; in the real world you deal with the end to end, with when and how the model will be wrong. Changing that mindset means getting people to think about model risk: it's not about AUC, it's not about prediction error; that's an outcome, and the easy part.
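One concrete way to act on the point that aggregate metrics are "the easy part" is to score the model segment by segment, so weak spots surface before they become production failures. A minimal sketch, assuming scikit-learn and a scored frame with hypothetical column names:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def auc_by_segment(scored: pd.DataFrame, segment_col: str) -> pd.Series:
    """AUC computed separately for each segment of the scored data."""
    return scored.groupby(segment_col).apply(
        lambda g: roc_auc_score(g["actual"], g["predicted_prob"])
        if g["actual"].nunique() > 1 else float("nan")  # skip single-class segments
    )

# Hypothetical usage: scored has columns actual, predicted_prob, region.
# print(auc_by_segment(scored, "region"))  # a weak segment is a failure mode
```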
What matters is anticipating the unintended consequences, doing a failure mode and effects analysis for model safety so we can deploy more safely for the customer. That's the kind of training we have to provide our people, beyond everything else. And a big part of our situation is that it's really multidisciplinary, because when it comes to deployment it's the software engineers, and we have a lot of problems when software engineers moonlight as data scientists, because they're not really trained in model safety. Those are the kinds of education we have to do.

A lot packed into that answer. I'm going to riff off the COVID point. One of the things we spoke about in a previous talk was that COVID exposed the lack of imagination in testing models, in validating models and looking for different, unique scenarios. Samir, COVID transformed practice, and that's probably a question across the board for most of us, but you specifically are in healthcare: how has that changed how you're driving transformation?

I personally think it was all about change. The problems that we saw, whether data-related or the speed at which we had to deliver analytics and data science in general, changed very quickly for us. We had to pivot very quickly out of things we did in factory mode and get into, to your point, Sri, innovation, into some uncharted territory. Data that we hadn't thought was important suddenly became really important, so making it available and quickly making meaningful use of it was, in my mind, the most challenging part. Like COVID itself, it all came out of nowhere, so the struggle for healthcare was really pivoting to that change and quickly reacting to what's needed now.

Mike, I know Aegon was preparing for this as a whole, with almost every early interest in this area. A two-part question: first, how have things changed, both with the vision of app stores and with apps and microservices for the organization? And there's a question from Vincent in the audience: what does this mean for the role of the data scientist in the next one to five years?

Good question. Maybe first, coming back to your first question: for me, apart from the few things already mentioned, the biggest challenge is really the end-to-end integration. That, for me, is the way to get models accepted, and one of the examples you mentioned in the beginning is to start with business rules, then a simple model, then your more complicated model, closing the feedback loop with the people who use the model; that is what eventually builds trust in the model. For instance, we have of course used fraud models, though I'd rather call them models that highlight suspicious claims, because if we flag a claim as fraud, we have to have someone look at it from a human perspective to see whether it's really there. So that's one addition to the things already mentioned.
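That pattern of flagging rather than deciding can be made explicit in the scoring path itself. A minimal sketch of routing a claim by model score, with hypothetical thresholds and a stand-in review queue; a real system would also record the reviewer's verdict to feed the learning loop:

```python
def route_claim(claim_id: str, fraud_prob: float, review_queue: list) -> str:
    """Route a scored claim: the model flags, a human decides."""
    if fraud_prob >= 0.90:
        review_queue.append(claim_id)   # high suspicion: human investigates first
        return "flagged_for_review"
    if fraud_prob >= 0.50:
        review_queue.append(claim_id)   # grey zone: human look, lower priority
        return "secondary_review"
    return "straight_through"           # low risk: straight-through processing

queue: list = []
print(route_claim("CLM-001", 0.95, queue), queue)
```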
In terms of the COVID part: we are in the life insurance business, and that has hurt us a little because of the higher death rate. On the other hand, on the commercial side, it has really accelerated a lot of the things we had already put in place. We had already pushed our digitization and the use of data quite a bit, but it was always difficult to convince the organization to work with it. With COVID, with the agents sitting at home, we were able to provide them next-best activities based on our models: these are the customers you should contact; lead prioritization, as leads come in, whom to follow up with as soon as possible and whom not; straight-through processing of underwriting. All kinds of things that were important but not critical suddenly became important, including, and this really goes back to the first question, the feedback loop. I can push a couple of next-best activities to my agents, but sometimes it feels like pushing into the void; you need a feedback loop showing that the agent actually called that particular person and followed up on the leads. We really pushed the envelope to make sure we got that feedback loop in there, and then you have the learning mechanism in place: the next-best activities were pushed into a system, and the agents called from that system, so we knew there was a follow-up and how it was followed up, even if not the content of the discussion, and eventually we could track the responses.

In terms of how this influences data scientists: I think it changes how our data scientists should operate in the organization. Ten years or so ago, data scientist was called the sexiest job in the world, and I do think there's still a lot of potential, but with solutions like these the role becomes much more about being the spider in the web: making sure we see the opportunities that exist with AI; thinking creatively about what data sources are out there, what third-party data we can leverage, what internal data sources we can leverage better; embedding it into the end-to-end process; and being the safeguard when someone in the organization builds a model, checking whether it's explainable, whether it's fair, all those kinds of things. That, much more than starting from scratch as I did in the past, writing my own code and redoing a model that has already been built many times; I think that part of the role will shrink across the board at Aegon, but a lot of other tasks will come to data scientists. I don't see any decreasing need for them, but it is a changing role.

I'm not sure if you all saw this, but at least in the models built during COVID I saw a lot of overfitting, and I'm seeing folks across the industry struggling to repurpose those models, because they were built for such a specific case and absorbed all the noise in the data; we had a limited data set available. So I'm actually seeing a lot of rework, or wastage, of things built during COVID that can't be repurposed.

Kristof, I'll let you comment here: there was an MIT Tech Review
analysis of all the models that went wrong during COVID, so it's a great point. Especially in banking and some other places, the traditional credit scores were no longer useful, so all the guideposts we historically produced were no longer available.

Thanks. Yes, and to the feedback loop that was mentioned, and to the model risk management and model failure handling that Agus was bringing up: we see an interesting pattern unfolding. First, model building is increasingly turning into data work: get the right data, with the right intuition, business logic, and data quality, to the model. But in failure handling there is a new human-in-the-loop role. Previously the human in the loop was primarily for increasing accuracy; we have A2I, a service where, if you're not satisfied with the accuracy of your model, data can be sent to labelers and improved with additional labeling. That was the classic human in the loop. Now there's an unfolding role of humans assisting the models: rapidly identifying, even in real time, in contact centers for example, when a model is wrong, and triggering a failure-handling mechanism where a human immediately jumps in, assists the model, and that feedback is captured and used to further train and improve the model. So model risk management and model failure handling are becoming more and more important. Naturally, there are industries where risk, regulation, and the stakes of the outcomes are much higher, and industries where it's easier to do, but this pattern of humans assisting models is unfolding in model risk and failure management.

Sri, it's not only monitoring when the model is wrong; it's anticipating, because sometimes you don't even see it in the output. When you monitor, you're looking in the rear-view mirror at the mistakes the model has already made. The human in the loop is so important because of anticipation of how the model will be wrong. This is why we cannot use black-box models: we need to be able to anticipate how the model will be wrong, and even create a playbook for what we're going to do when it is.

AI to predict AI, right. Now, Eric, alternative data sets play a huge role in unlocking more alpha, but of course they can bring unexpected bias along with them. How are your customers bringing in the right data sets while staying safe from independently introducing bias?

That's a good question, and I think it touches on the points Kristof mentioned: when a model produces an output, what action is then taken? We were talking before about the role of the data scientist, and I think we're seeing an increase in how a data scientist fits within a cross-functional team, helping contextualize the model through the lens of the marketer who receives the output and then has to actually go try to track this person down. Using the financial services example from earlier: where we previously talked a lot with people with the title of economist,
that has perhaps given way in recent years to data scientists, who have taken over working with the raw data and building the models on top. So as long as organizations and departments pursue what you might call the intra-organizational democratization of AI, getting the other, cross-functional teams up to speed and aware, those teams can also help, as was mentioned, be the human who makes that real-time decision and sets the playbook that informs the decisions being made.

Thanks. Snowflake is experiencing tremendous growth on all fronts. Miles, what are the key data and AI questions you're facing as you talk to your customers?

Yeah, the main one is: "We have a bunch of data inside our four walls, and Snowflake has this growing marketplace, weather data, a huge COVID data set; maybe our data isn't public, but we still want to be able to transact with our customers and partners and have our own little private data marketplace." What we're seeing is real demand for our customers to go to their customers and say: look, we have our data, go set up a Snowflake account so we can enable secure data sharing, because right now it's FTP files and CSV files. So we're seeing the creation of what we call a stable edge, where customers repeatedly go to the marketplace and grab a third-party or second-party data set, over and over again. That stable-edge number tells us it's not a one-time "let me see if this increases the accuracy of my model"; it's now part of the business process. So the stable edge is a key metric for us, showing that the marketplace and our data-sharing capability are really solidifying in the market.

Democratizing data is a precursor to democratizing AI; certainly data clouds precede the AI clouds, and AI clouds are where the road ahead leads, so we're super excited about this partnership. David, did you want to comment on the alternative data sets?

I was actually going to comment on Miles's point; I second it. I'm seeing our customers more and more come to us saying, hey, we're on Snowflake, can you get your stuff there so we can share this data a lot more easily? It's actually interesting to see the evolution of that and watch the demand grow, to your point, Sri. And as you know, alternative data is a core part of our business model; we work with Infutor and all the other sources there. Our clients are constantly looking for the edge or the advantage to make better decisions, and they're seeing some positives and some negatives in the alternative data space, so it's an interesting time to watch the process of figuring out what works for your market, your customer base, your risk level; it's a little different for every market.

Sorry, if David is seconding the point, I'll third it. The demo Michelle showed: Michelle and Eric at H2O were actually able, via Snowflake's data marketplace, to find trial data of ours, take it with just a couple of clicks, and
boom, they're improving a model. That one is matched at the zip code level; we could get down to the household or individual level and see really dramatic improvements. But it showcases what an individual developer, without the need for a budget, can do with just a couple of clicks and a bit of time.

There's a question from the audience: while transparency and explainability are admirable, necessary aspects of using models, what is the plan for communicating to the average consumer who is denied credit, insurance coverage, or health benefits what a complex model predicts, without disclosing proprietary IP? It's a good question, and I'll start answering it with one of the products on the App Store. Here is actually a Shapley cloud, if you will, of all the explanations. What the team has put together is essentially the ability to derive rules from a cluster: they cluster the Shapley values, so you can derive a rule from a cluster and score it, and you can also change the cluster boundaries, a way to see whether a threshold is really 60, or 60 point something, and what happens when you change it to 50. You're deriving rules to start understanding how the explanation works, so you can then go back to the business user. That is what happened in this particular churn-prevention use case: a simple Wave application with UMAP clustering on top of Shapley explainability. But it's a good question for both Agus and Mike as well.

We're probably close to the end of this incredible session; we could go on for a very long time discussing model validation and democratizing AI. Samir, do you have any thoughts on how the role of the chief data officer is transforming into more of a role right next to the CEO, making strategic decisions in the era of AI apps?

Absolutely. In my role today, and I believe there was a question about this in the chat as well, about how the chief data officer is involved in the deployment and adoption of these models, so it's close to home: my response is primarily that it's a team effort. A model, while it's built by data scientists, needs to be owned by the business, and I think that's where apps give the business that sense of ownership: this is built for them and used by them. My role, once it's in motion, is primarily providing the platform on which the model functions, and obviously the availability, quality, and data governance around it. That's where I see, I won't call it a change, but at least a change in the expectations of what a chief data officer does for an organization. It goes beyond just making raw data available; it carries some ownership of what that data means to the consumer, and goes a little into explaining why the data is creating a given output as a result of these models. So there is an ownership of these models that now exists within my domain that didn't quite exist in the past; at some point the data team used to wash their hands of what they did, and it was somebody else's problem.
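The Shapley-cloud app Sri showed a moment ago, UMAP clustering on top of Shapley explanations with rules derived per cluster, can be approximated in open source. A minimal sketch assuming shap, umap-learn, and scikit-learn, with a fitted tree-based model `model` and feature frame `X` assumed to exist; the surrogate-tree step is one common way to turn clusters into readable rules, not necessarily what the app itself does:

```python
import shap
import umap
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

# Per-row Shapley attributions for the fitted model.
shap_values = shap.TreeExplainer(model).shap_values(X)
if isinstance(shap_values, list):       # some binary classifiers return one array per class
    shap_values = shap_values[1]

# Embed rows in Shapley space, then cluster: rows in one cluster are
# "explained the same way" by the model.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(shap_values)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)

# A shallow surrogate tree on the original features predicts cluster
# membership, turning each cluster into a rule a business user can read.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, clusters)
print(export_text(surrogate, feature_names=list(X.columns)))
```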
Sri, I'd like to jump in on that. On the one hand I agree that the role of the data officer should span that broad perspective, from the defensive side to the offensive side of data. The risk is that it becomes such a massive job, particularly in a regulated market like mine, that the focus falls on the defensive side and the role you can play on the offensive side of data is limited. I'm still sometimes struggling with how best to organize it, because splitting the role might also be better: then you have one force constantly pushing the envelope on development and another force holding back a little, whereas having it all in one basket also makes it easier to push things forward. So it's never clear-cut; the challenge is, aren't we asking too much of a CDO? Go ahead, David.

Thank you; I was just going to go back to the question about the consumer understanding why they were denied credit, and some of the things happening in the industry now to help with that. Equifax is very proud of its explainable AI and of leveraging other data sources to capture payment behaviors that aren't traditional credit, like how I pay my telco or cell phone bill, and feeding that into more advanced modeling capabilities to get to the next level, where the risk is assessed much better. But the thing I wanted to highlight is one of the things we're working on that leverages AI: a cool app called Optimal Path. It allows a consumer to understand very easily and crisply: if you pay 150 on this card, this is what it will do; this is the optimal step toward improving your credit. It gives the consumer who doesn't know the credit industry or its details a really easy way to understand: this is the first step, the second step, the third step, in order, the optimal path to getting approved for credit. It's a pretty cool application of AI, and it really benefits the consumer out there.

In terms of actual transformation, this space is truly about bringing value from data, and I think Agus touched on it, and Samir as well, in those incredibly insightful conversations around the Zoom room today about mitigating risk. Where do you see our space going from here? Hyper-innovation is going to happen with app stores: it's a marketplace of the best ideas and the best models, so a model store, an app store, a data store, which is why we're having this incredible conversation between these different folks. What is the key call to action to mitigate risk?

On this one, from my side, Sri, it's testability. Whoever builds the model needs to invite scrutiny: you're not only delivering a model, you're also delivering an app through which your model can be tested very easily. I think that's very, very important; a model that is very testable, very easy to test, would be big progress if we can get there.

Absolutely; I think the feedback loop is critical, otherwise we end up building models for ourselves and not for the masses who actually use them. I would go one step further: feedback is gold. In fact, feedback triggers the new; it makes new data, and that's how we're in the business of making data, not just consuming data; of making AI, not just consuming AI.
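The Optimal Path idea David described, telling a consumer which action most improves a predicted outcome, is at heart a what-if search over candidate actions. A minimal greedy, one-step sketch; the model, the record, and the actions are all hypothetical placeholders:

```python
import pandas as pd

def best_next_action(model, record: pd.DataFrame, actions: dict) -> str:
    """Return the candidate action whose simulated record scores best."""
    base = model.predict_proba(record)[0, 1]        # e.g. P(approval) today
    gains = {}
    for name, apply_action in actions.items():
        simulated = apply_action(record.copy())     # what-if version of the record
        gains[name] = model.predict_proba(simulated)[0, 1] - base
    return max(gains, key=gains.get)                # the "optimal" first step

# Hypothetical usage: one candidate action that pays 150 down on a card.
# actions = {"pay_150_on_card": lambda r: r.assign(card_balance=r["card_balance"] - 150)}
# print(best_next_action(model, consumer_record, actions))
```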
I think that's the powerful theme. David: totally agree.

In closing, I would say one thing unites a lot of us on this panel: moving from improvising on data, to innovating with it, to combining it with purpose. A lot of what we're here for is to bring change to the world using data; data shines light where there's darkness and brings insights, and we seek that to change our lives and change the world. H2O is obviously a do-good-first company, and, as was mentioned, the Grandmasters and data scientists, the incredible culture around our community of mathematicians, physicists, and data practitioners, are here to power that change in the world. We are here because of the incredible love of our customers and community: Samir in healthcare, Mike in life insurance, Agus in finance, David in data at Equifax, and Eric at Infutor, thank you for joining us today, and Snowflake and AWS on the cloud.

We are very, very excited to be unleashing the H2O AI Cloud and the H2O AI App Store. These app stores are co-creations, so this is an open invitation: the call to action is cloud.h2o.ai. It's a co-creation maker space: be powered by H2O, powered by AI, and build a bigger economy of more equals. That's the vision. The next trillion-dollar company is going to be created with AI, there's no question about that, and it could be any of the folks participating in the cloud.h2o.ai ecosystem. So thank you so much for joining us at this incredible moment for us; we're super excited for the journey ahead. Thank you.
Info
Channel: H2O.ai
Views: 413
Rating: 5 out of 5
Keywords:
Id: 284RvU1Ydfw
Length: 124min 8sec (7448 seconds)
Published: Thu Jun 24 2021