#AIMI23 | Session 1: Responsible Implementation of AI in Health Care

Captions
Welcome back to this exciting day. My name is Natalie Pageler; I am the Chief Medical Information Officer at Stanford Children's Health, so anybody who is interested in applying all the great learnings from today to pediatrics and obstetrics, please come talk to me afterwards. I have the exciting privilege of leading this panel on the responsible implementation of AI in healthcare. We stand poised at a very exciting time to harness the power of AI and advanced data science in clinical medicine to advance our understanding of disease, our diagnosis, our interventions, and our promotion of well-being and health care overall. Or at least that's what ChatGPT told me to say. I have three amazing presenters today. They are each going to come up one at a time and give you a quick overview of some of their work in this important area, and at the end we'll have time for discussion and question and answer. If you have questions in the audience, you can come up and ask them; if you are in the virtual audience, please use Slido and we will ask them for you. And with that, let's kick it off with our first presenter, Dr. Nigam Shah, who is Professor of Medicine at Stanford as well as the new Chief Data Scientist for Stanford Health Care, and he's going to tell us a little bit about what that means and about his amazing work.

Thank you. All right, so for some of you, I met you yesterday in San Diego; this will be a different talk, so it won't be the same slides. What I'm going to talk to you about is our current thinking on how we create fair, useful, and reliable models in healthcare. This is a backronym because we wanted it to sound cool: FURM, F-U-R-M, fair, useful, reliable models. We set up a team inside Stanford Health Care to address a problem I'm sure a lot of you are familiar with: hundreds of models get built and almost none get used. In the past two and a half years, faculty on our campus have published more than 500 models, classifiers, and predictors. Two have been deployed in our healthcare system, and one of those isn't even published. So to say that a very small percentage get deployed for routine use is an understatement, and it's already becoming clear that realizing value from the use of AI to guide care remains elusive. We've been talking about it for quite some time; Natalie and I were just joking that we're so far up the hype cycle we might reach escape velocity and end up on the moon. So what do we do about it? It's easy to come up with fun things to complain about, so we set up this new team, which does four things, and we'll go through them one by one. The first is thought leadership in responsible AI in healthcare, and I say that because there's plenty of work building algorithms, burning GPUs, and emitting carbon; there's not enough on taking these things into the clinic. Second is FURM itself: how do we ensure models really are fair, useful, and reliable? We're setting up processes to do that. Third, we need the infrastructure and business processes around a thousand-person IT department and its interactions with another 14,000 human beings in the healthcare system to make sure we are AI-ready, because AI is very easy to spell and really hard to do. And fourth, we want to identify and execute projects that have enterprise value, because otherwise, if we just keep doing science projects, it's not sustainable; it has to lead to real benefit in quality of care, value of care, and other financial metrics.
This is our operating plan for the health system for FY23: quality, safety, and health equity at the top, followed by patient experience, employee engagement and wellness, and financial strength. Everything we do has to connect to one of those. So that's the overview; let's jump in. This is the Mercedes-Benz logo that I've used quite often to make the point that there's an interplay among the models, the capacity for taking actions, and the actions we take: model, policy, capacity, and action. I'll give you a concrete example. The pooled cohort equations are a model with about nine or ten inputs that produces a real-valued output: the probability of having a heart attack in the next 10 years. That's the model; when I say model, I mean an equation that takes in inputs and produces a real-valued output. Then there's a policy that says if the model's output is above 7.5 percent, thou shalt prescribe 40 milligrams of atorvastatin. We can pretend that our capacity to prescribe statins is unlimited, and the benefits and harms of the action, ingesting 40 milligrams of atorvastatin, have been studied to death in thousands of papers. But this is one model for which we have such a complete fleshing-out of all three parts; for most other things you get two out of the three. If you have a model that predicts transfer to the ICU, that's awesome, but if your ICU is full, you have no capacity, so it doesn't matter what you predict, how often you predict, or how good you are. We need to understand this interplay, because good AI-guided work happens at the intersection of all three of these things.
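To make that model-policy-capacity-action decomposition concrete, here is a minimal Python sketch. The `predicted_risk` field is a hypothetical stand-in for the pooled cohort equations' output (the real equations combine roughly nine or ten inputs); the 7.5 percent threshold and the statin policy are the ones named in the talk.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    patient_id: str
    predicted_risk: float  # hypothetical stand-in for the model's 10-year risk output

def policy(risk: float, threshold: float = 0.075) -> bool:
    """The 'policy': above 7.5% ten-year risk, prescribe 40 mg atorvastatin."""
    return risk > threshold

def act(patients: list, capacity: int) -> list:
    """The 'action', constrained by capacity: only `capacity` flagged
    patients can actually receive the intervention."""
    flagged = [p for p in patients if policy(p.predicted_risk)]
    return flagged[:capacity]

patients = [Patient("a", 0.03), Patient("b", 0.12), Patient("c", 0.09)]
# Statins: capacity is effectively unlimited, so every flag becomes an action.
print([p.patient_id for p in act(patients, capacity=10**6)])  # ['b', 'c']
# ICU-transfer analogue: a full ICU means zero capacity, so the model is moot.
print([p.patient_id for p in act(patients, capacity=0)])      # []
```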
We have about 27 or so papers studying this interplay, and based on that we developed a process to assess whether any AI-guided work situation, or the model behind it, will end up being fair, useful, and reliable. Sometimes I add an M, for fair, useful, reliable, and has monetary value; monetary value not in the sense of extracting a profit, but of being sustainable, because if it's not sustainable, it's not going to happen. There are a bunch of steps, color-coded, about 13 of them; you could make it 10 or 15, it really doesn't matter. The point is that we start with what the real need is, make sure we have an ethical-concerns review, and do the business case up front. It's one of the few situations where you'll see the words "ethics review" and "business case" next to each other, but they go hand in hand: if it's unethical, it's bad business, and if it's ethical but not sustainable, you're not going to be able to do it, so you have to do both up front. The first tranche of work is to figure out what we're doing and why, then figure out the how, and then be honest in our assessment of monitoring, making sure we answer the question: did it work? Because if it didn't, we have to shut it down. How many reports do your EHR systems generate that nobody looks at? For those of you in health IT, you probably know the number is in the thousands, not the hundreds. All of that is a cost to health care. Every model that runs, that someone in IT has to babysit, and that nobody follows up on is a cost to the health system, so we want to make sure we're spending our mental energy in a way that matters. And I want to highlight one particular step: people talk about IT integration and workflow integration all the time, but we add a third, which we call organizational integration, as in: which vice president is going to own this pony long term? When it breaks, you've got to fix it; when the workflow doesn't happen, you've got to go yell at somebody; and if it's not working, someone has to make the hard call of shutting it down. We want to identify that business leader up front. We've done about eight assessments to date, and each produces a written report and a recommendation saying: if you do this, here is what we anticipate will happen. We also run a little simulation, those green steps, which quantifies the following: if the model flags some number of patients, sampled from your own data, and this is your workflow, with so many steps and certain failure probabilities along the way, then over 500 simulated days, how much of the utility that's promised on paper can you actually realize? Often there's no staffing to follow up on all the flags and alerts you're throwing up, and then you're not going to realize anything. That in-silico assessment is something we're quite happy about, and the libraries to do it are published under a nice acronym, APLUS, so you should be able to find them. All right, I also talked about the processes and infrastructure for being AI-ready. The joke my friend Nikesh Kotecha makes is that we're searching for this elusive ikigai, the Japanese concept of the intersection of what you're good at, what you value, and what's possible; we have on-prem, AWS, Azure, and GCP, and we're searching for the elusive ikigai where we can do our work without any problem. And then there's the figure reproduced straight from the MLOps paper from the Google folks: the machine learning model is the tiny black box, and everything else around it is what's necessary to actually get value from having it. A lot of places have no processes in place for that bigger part of the solution; we get too excited about the black box. Finally, to close out: governance is essential for enterprise value. This is a zoom-in on that same FY23 operating plan, and the yellow box highlights our first project, which by pure dumb luck ended up as one of the bullet points on the operating plan. We're building predictors for who is likely to pass away in the next 12 months and using them to guide serious-illness conversation planning, which then results in documentation of goals of care and advance care planning. We were always puzzled about what was going on here, because everything we do leads to less revenue for the healthcare system, so why are people so excited about this? It turns out it's one of the five top-level goals under quality, safety, and health equity: we want to increase the level of advance care planning for our patients, because it's been proven in RCTs that when people get it, they live longer, cost less, and have a better quality of life. But how do we do that on a repeatable basis? I can't bank on luck every time to hit one of these bullet points. So Abby Pandya, who's here, is helping us figure out all of these processes, so that we have systematic needs-finding to pick the problems that matter, and governance on top of it to make sure that if there are eight good things and we can only do three, we pick the three that everybody agrees on.
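The in-silico assessment described above, simulating hundreds of days of a workflow to see how much of the on-paper utility survives staffing limits and hand-off failures, can be sketched as a small Monte Carlo. This is only a toy illustration of the idea, not the published APLUS library; the flag rate, follow-up capacity, and per-step success probabilities are invented parameters.

```python
import random

def simulate(days=500, flags_per_day=20, followup_capacity=8,
             step_success=(0.9, 0.8, 0.7), utility_per_success=1.0, seed=0):
    """Monte Carlo sketch of 'utility realized vs. utility promised'.

    Each flagged patient must survive every workflow step (alert seen,
    outreach made, intervention delivered), and only `followup_capacity`
    flags per day get worked at all.
    """
    rng = random.Random(seed)
    promised = days * flags_per_day * utility_per_success
    realized = 0.0
    for _ in range(days):
        worked = min(flags_per_day, followup_capacity)  # staffing bottleneck
        for _ in range(worked):
            if all(rng.random() < p for p in step_success):
                realized += utility_per_success
    return realized / promised

print(f"fraction of promised utility realized: {simulate():.1%}")
```

With these toy numbers, staffing alone caps you at 40 percent of the promised utility and the step failures roughly halve that again, which is exactly the kind of gap such a simulation is meant to surface before anything is deployed.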
So with that, I'll close out and hand it back to Natalie for our next speaker. [Applause]

Thanks, Nigam, that's a great start. Now to introduce our next speaker: Dr. Nina Kottler is a radiologist who has been in clinical practice for 18 years and was the first radiologist hired at Radiology Partners. She has served in multiple leadership roles there, currently as Associate Chief Medical Officer for clinical AI. Welcome, Dr. Kottler.

All right, thank you, everyone. I'm really excited to be here, and what I thought I would talk to you about today is how you evaluate all those many AI models. The data I have, most of the slides I have, and the things I'm going to talk about are based on our experience using and deploying AI in real practice. I know there aren't that many models utilized across the US, or even the world, but at our practice we use a lot of them, in real, day-to-day practice, and it's that experience that helps you understand where we need to go and how we get there. So let's get started; these are my disclosures. So why do I want to talk about creating standards? Right now it's a bit of the Wild West out there, whether it's creating AI, deploying AI, educating about AI, or monitoring it: there is no standard. If you don't recognize this slide, it's a Gartner hype cycle, which describes the maturity of any new technology, and really, the only way you get to extensive use, the plateau of productivity where it's really being used, is if you create best practices. People need things to be a little more plug-and-play, and that's what we're trying to do at our practice. So what kind of best practices do you need to create as a consumer or customer of AI? There'd be a whole different circle for creators, but mostly we're a customer, so I came up with a list of things I think we need standards for, so we can share them across the industry and make it easier for everyone to deploy these models. Because we only have a few minutes, I'm going to talk about one of them today: how to clinically evaluate an AI model. A lot of people talk about numbers, the ROC curve; they look at maybe one or two things, and that is not enough. What I'm going to give you is a step-by-step way to do a clinical, live evaluation of these models, so you know whether they're accurate enough on your own data to be worth deploying. There are five steps, and we'll walk through them in two use cases; we've done a bunch of these, but I'm just going to show you two today. Now, when you're evaluating using these steps, you have to decide what is driving your decision, and for me, and the radiologist from the VA who spoke about this earlier was exactly on target, when you deploy these models, rads do not always use them. So what is my measuring stick? Radiologist acceptance and engagement, because that's what's going to get my models deployed when I'm talking about radiology. All right, let's start with the first example. We were looking at a brain-aneurysm model, which looks for brain aneurysms on CTAs of the head.
Step number one is to evaluate the model performance. There are the FDA model-performance statistics, and a lot of people will look at those; we do our own, for a couple of reasons. First, notice that we analyze more cases, and we look at real-world prevalence. Prevalence really matters; it has a significant effect on some of the other numbers, so if you're just looking at the FDA clearance statistics, you're going to be skewed in what you expect to happen. Second, the FDA clearance data tells you nothing about what the model will look like on your data. As a radiologist, you could go down the street and read a head CTA just the same even if the data is different, but it doesn't always work that way with AI. Another thing: a lot of people talk about the sensitivity and specificity of a model. Those are great, but remember, the yardstick we're using is radiologist acceptance. Does the radiologist know the sensitivity and specificity? Is that what they feel when they use these models? No. It's related, but what they feel are the positive predictive value and the negative predictive value; those are the numbers that matter for your radiologist, or whoever the user of the tool is. Positive predictive value: when your AI model says there is a positive result, how often is it actually right? In this case, the positive predictive value was 52 percent, meaning only slightly more than half the time when the model tells you there's a positive result is it actually right, so you're going to get a lot of false positives; that's important. Negative predictive value is actually a really interesting one. The prevalence of disease is never what they show you in the FDA statistics; it's much less, and in the second example you'll see even more of that. Here the prevalence in our large practice, and we do 11 to 12 percent of the imaging in all of the US, so we have a very big practice, was five percent. That's not high, which means most of your cases are not going to have a brain aneurysm. So when you as a radiologist are looking at your head CTA and the model tells you it's negative, if that negative-predictive-value number is high, you can be very confident it's negative, and we found that's the key to rads being more efficient: if they can trust the study when it's negative, that outweighs any effect of how fast they go on a positive study. Now, what do you have to know about these numbers? They have a strong relationship with prevalence. If your prevalence of disease is low, your positive predictive value is going to be low and your negative predictive value is going to be high. You have to keep that in mind when you're evaluating positive and negative predictive value. So a positive predictive value of 52 percent with a disease prevalence of five percent is actually very good. Why? Because positive predictive value also depends on specificity, and in this case the specificity of the model was very high, even higher than in the FDA statistics, and that raised the positive predictive value in a setting where disease prevalence is low or low-ish, which is essentially always going to be the case. Negative predictive value is related to sensitivity; it's hard to see much change in negative predictive value when your prevalence is low. The sensitivity here was a little less than in the FDA clearance data, but not too much less.
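The PPV/NPV-versus-prevalence relationships she describes follow directly from Bayes' rule, so they are easy to sanity-check in code. A minimal sketch; the 5 percent prevalence matches the talk, but the sensitivity and specificity values below are illustrative stand-ins, not the model's actual figures.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """P(disease | positive flag): true positives over all positives."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

def npv(sens: float, spec: float, prev: float) -> float:
    """P(no disease | negative flag): true negatives over all negatives."""
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)

# Illustrative numbers: at 5% prevalence, a very high specificity is what
# keeps PPV usable, and NPV stays high almost regardless of sensitivity.
print(f"PPV: {ppv(sens=0.85, spec=0.96, prev=0.05):.0%}")  # ~53%
print(f"NPV: {npv(sens=0.85, spec=0.96, prev=0.05):.0%}")  # ~99%
```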
Okay, so that's step number one, evaluate the model performance, and we can talk afterwards, if you want, about how we specifically do that. Step number two is a really important one, probably one of the most important steps: did the AI enhance detection? People talk all the time about miss rates, as in, what did the rad miss? But rads are the standard of care; AI is not the standard of care. We're using AI to augment the radiologist, to elevate the standard of care, so we should not be talking about rad misses; we should be talking about what we call the enhanced detection rate. You can see here that the rads detected 41 aneurysms in this almost-thousand-case study, and the AI detected 38, but if you combine the two, 51 were detected by the rad and the AI together. So the enhanced detection rate for the radiologist is 24 percent. Bank that number: 24 percent more patients will be identified who have brain aneurysms needing follow-up. That's a fantastic result. And I will say, if you look at this data and ask how you go from 38 to 51, the rads would have improved the AI by 34 percent, so it goes in both directions. Step number three: find what we call the wow cases, the cases that we as radiologists think are really instrumental in making us trust the model. There's a long trust curve; it usually takes about six weeks for a rad to trust something new with these AI models. We want to shrink that trust curve and make sure the rads are engaging, so let's find the cases that make a radiologist feel confident about the capability. This is a 74-year-old female who came in with an acute neuro deficit, stroke suspected. Here's the axial CTA, and we're looking at the supraclinoid ICA; it's hard to see anything because the vessel is in plane, and in real life you're mostly looking at, or at least starting with, the axials. The AI, which also uses the axial series, found the aneurysm here, right here on this image. It doesn't look like an aneurysm; you have to check over on the sagittal. Hopefully we're all using those, but when there's so much data in an image, it helps to have your attention directed to where the pathology is. These are findings we found rads miss quite frequently: supraclinoid ICA aneurysms that are in plane on the axial image. Here's a 47-year-old male with subarachnoid hemorrhage and headache. Subarachnoid hemorrhage in a young person: you're going to look for an aneurysm. What's the most common place for an aneurysm? The circle of Willis. Where are we here? The skull base, not a common place for an aneurysm, but guess what: a left PICA five-millimeter aneurysm, and we found two of these in our study. This was a common miss, because it's not a common location for us as radiologists to look. Step number four: categorize the false positives. With a 52 percent positive predictive value, there are going to be false positives, but can you put them into a few categories? Set the expectation for your radiologists that they're going to see them; that makes them easier to dismiss. If you don't tell them the model is going to call a calcified dural plaque an aneurysm, the rad is going to look at it and think, this is a stupid model, it's really not smart enough, and they're going to dismiss it. You don't want the rads to dismiss it, because there's so much value. So categorize the false positives.
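The enhanced-detection arithmetic from step two is simple set math over reader-alone, AI-alone, and combined detections; plugging in the counts quoted above reproduces the 24 and 34 percent figures.

```python
rad_only, ai_only, combined = 41, 38, 51  # detections in the ~1,000-case aneurysm study

enhanced_detection_rate = (combined - rad_only) / rad_only  # AI's lift over the rad
ai_improvement_by_rads = (combined - ai_only) / ai_only     # the rad's lift over the AI

print(f"rad + AI over rad alone: {enhanced_detection_rate:.0%}")  # ~24%
print(f"rad + AI over AI alone:  {ai_improvement_by_rads:.0%}")   # ~34%
```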
Now, it's really about the balance. It's not any single number: if the positive predictive value is low, it has to be outweighed by something else. You have to be able to predict the false positives, categorize them, set expectations, and also have enough wow cases. Then summarize and decide. So, step one: real-world performance was pretty good. The AI significantly enhanced the detection of brain aneurysms, 24 percent, which is amazing. We had a lot of wow cases, and when we showed them to our radiologists, it helped them feel the model was useful. And the false positives: they were fairly common, though not common relative to the prevalence, but they were easy for rads to dismiss, and we could teach them about it. So what did we decide? Move forward with this deployment. All right, let's go through example number two to show you something different. This was a pneumothorax model on chest X-ray. Now, chest X-ray is half of what we do; we do 13 million chest X-rays a year in my practice, so having a chest X-ray model was going to be really important for us, and I was very excited about this. Step one, evaluate model performance. In this case, again, we analyzed way more cases, a massive number of cases. Why? Because the prevalence of disease is really small. We're looking at all of our patients, outpatients, inpatients, and ED, combined, so the actual prevalence is super small. What is that going to do? It's going to affect your positive predictive value. A 34 percent positive predictive value is actually not too bad in the setting of such a low prevalence, but remember, it is also affected by your sensitivity, and the sensitivity was even lower than in the FDA statistics. Before we get to step two, here's a super fascinating thing we found. Remember that prevalence has such an effect; what if you split your data into different categories? Here's the FDA data in the first column; the second column is the data I just showed you. Now split that into your ED patients, your outpatients, and your inpatients. When you do that, prevalence is the second row: look at the difference in prevalence between ED, outpatient, and inpatient, and with that, look at your inpatient numbers: a way better positive predictive value, way better numbers. This is why you'll see some AI vendors selling a model for only a certain portion of your population, and you can do this too if you want to improve your outcome and your rad acceptance. All right, step two, enhanced detection. In this case: nothing, none. There was not a single case where the AI found something the rad hadn't, and there were actually a bunch that the rad detected and the AI did not. Wow cases? Generally your wow cases come from your enhanced detection, and here there was no enhanced detection, so no wow cases. Now look at your scale: really not doing so well, with nothing to balance out your low positive predictive value, because there are no wow cases. Now let's talk about the pitfalls. Earlier I only talked about categorizing false positives, but there are also going to be false negatives, and here are the false negative cases. Missing 17 pneumothoraces is a lot; it's hard to convince rads that your model is good enough when they see this, and in fact one of them was really big. Here's that case; it may be hard to see in this light, but here's the CT. No rad is going to trust a model that misses something like that; the sensitivity just wasn't good enough in this case.
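The ED/outpatient/inpatient effect is the same Bayes-rule behavior as in the earlier PPV sketch: hold sensitivity and specificity fixed and PPV swings with each subpopulation's prevalence. The subgroup prevalences below are hypothetical, chosen only to show the shape of the effect, not her actual figures.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Same Bayes formula as in the earlier sketch."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# Hypothetical subgroup prevalences; sensitivity/specificity held fixed.
for group, prev in [("outpatient", 0.001), ("ED", 0.005), ("inpatient", 0.03)]:
    print(f"{group:>10}: prevalence {prev:.1%} -> PPV {ppv(0.80, 0.97, prev):.0%}")
# outpatient ~3%, ED ~12%, inpatient ~45%: the same model can look unusable
# in one slice of the population and respectable in another, which is why
# some vendors sell only into the high-prevalence slice.
```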
All right, we were able to categorize our false positives, but there were a lot of them, and some of them, which we can talk about some other time, were orchestration errors, and those are hard for a rad to dismiss. Orchestration is sort of a background process that most people don't know about, really important to accuracy, but it's hard to explain to radiologists why the model is missing those. And some of the things, like skin folds, we have issues with in real life too, so the model wasn't really helping; it was actually really hurting. So, for this case, summarize and decide: real-world performance was unfortunately far inferior to the FDA clearance data; the AI didn't enhance detection, and remember, use the enhanced detection rate instead of rad misses; there were no wow cases, nothing to balance out that low positive predictive value; and there were AI pitfalls, false positives and false negatives, that were just clinically unacceptable, where no matter how many wow cases you had, you probably wouldn't be able to balance something like this out. So what did we decide? Not worthy of clinical deployment. This model is FDA cleared, and we did not deploy it in our practice, because we went through this five-step process. All right, so what did I talk about today? Number one, creating best practices is really essential for us to move AI toward standard of care. This is just one of many, but it's important for all of us to think about how we're going to do this so people can more easily plug and play their models; we shouldn't have everyone reinventing the wheel, and we need to share this information with the rest of the profession. Number two, we talked about a five-step model for clinical AI accuracy evaluation, and again, it's not based on any single number; it's based on a few factors. So take this five-step model: evaluate performance on your data, calculate your enhanced detection rate, identify your wow cases, categorize your pitfalls, and then summarize and decide. With that, I'd like to say thank you.

Thank you, Dr. Kottler, that was incredible. All right, we have one more speaker before we break into the question section. And Muhammad, I have to use my phone to read your title, which is very long: Dr. Muhammad Mamdani is Vice President of Data Science and Advanced Analytics at Unity Health Toronto and Director of the University of Toronto's Temerty Centre for Artificial Intelligence Research and Education in Medicine. T-CAIREM, is that how we say it? Wonderful. Welcome.

Thank you, thanks. It's a pleasure to be here, and I have the really unfortunate task of following Nina, so my goal is just to keep you awake; hopefully that'll be successful. I come from a place called Toronto, in Canada. It's a really nice city in the summer; I don't advise you to come in the winter, you can if you like, but the summers are wonderful. I'm at the University of Toronto, but also Unity Health Toronto, which is St. Joseph's, St. Michael's, and Providence Health in Toronto, and I'm going to focus on our St. Michael's site. It's an inner-city teaching hospital; we take care of the sick and the poor in Toronto. Just to give you a few stats: we have about 500 beds, about 80,000 emergency department visits a year, about half a million ambulatory visits a year, and about 25,000 inpatient visits a year, with over 6,000 staff, over 900 physicians, and over 1,600 nurses.
Now, something unique about our place is that we're the only hospital in the country that has declared AI a strategic pillar. We have eight strategic pillars and three core pillars, and AI is one of our core pillars. What that means is that the leadership has bought in, the organization has bought in, and I get a team. My team is about 30 data scientists, a fairly sizable team, and I'll tell you why it's so big: we've developed and deployed, not just written papers about, but actually deployed, over 40 AI solutions that are live and running in our hospital right now. So let me tell you a little bit about how we're structured. The first team is our data integration and governance team. Why do we have one? We will be joining the Epic army soon, so basically we will become an Epic hospital; right now we are what I think they call "best in class," which basically means a disaster. We have over 50 source systems that don't talk to each other; it's pretty comical, actually. How do you get all these data points to talk to each other and integrate? How do you do it in real time, because a lot of these solutions are only effective if you leverage data in real time? How do you clean the data in real time, feed it to ML algorithms that spit out results, and then feed those back to clinicians in real time? That's a lot of data engineering, so we have data engineers, ETL developers, those sorts of folks in this group, responsible for grabbing the data, scrubbing and cleaning it in real time, feeding it into our ML algorithms, and maintaining the pipelines. Then there's the second team. This is the fun team: they get to build the machine learning models, the optimization algorithms, that sort of stuff, so that's where the cool model building happens. Then, of course, we say, all right, that's great, but how do you communicate the results? Do you give percentages? Do you deliver it every 10 minutes, or do they only round once a day? What's the workflow like? So we have a whole team dedicated to product development, and these folks are headed up by an artist. He's a musician, and I don't know if he had a midlife crisis, but he went and did a Master of Science in Applied Computing, and his whole passion is human-AI interaction and design: he wants to know about fonts and colors and solutions that people will actually use. In this group we have human-factors experts, designers, and software developers who build the solution after understanding the workflow and what people want to see. And then, of course, the last team, which is really core. I don't know about you, but I'm an academic myself, and I'll tell you, I can't manage to save my life. So we need really good, disciplined project managers to keep data scientists going, with timelines, milestones, and deliverables, but we also need people who understand human behavior and change management: people who can go in and say, I know you don't like this, but tell me why, or, is this where it fits into your workflow, let me make sure it's actually working for you, and, of course, I'm going to bring muffins to your meetings because I know people are really stressed out right now. These are the folks who do the human side of things and really make sure we deploy properly. So what does our model look like? It's very simple: our data scientists do not get to ask the questions. That's rule number one. The questions have to come from the ground; they have to come from our front-line staff.
We can have senior-leadership buy-in, that's fine, but that's not where the action happens. If we don't have an engaged clinician, we do not do a project, because the clinicians become the champions. Anyone can ask a question, and we have an intake form: if you ask a question, you have to go through this process, and it is priority-driven. We thought nobody would take the time to fill out these forms, but it's not that long; it basically asks what issue you're trying to solve, what you're doing now, and where you expect a solution may be able to help you in the future. But we also have a little box, and this is the box we're a bit famous for: the outcomes box. You have to fill it out, and the choices are very simple. What are you going to impact? Your options are death, readmission, length of stay, human effort, cost, and other, and if you click "other" and that's the only box you click, you're automatically de-prioritized. So it's fairly clear what we want to impact. We also ask by how much you expect to improve things, and you have to give us a number. If you say, I'm going to reduce mortality by 10 percent, great, we're going to hold you to it, because after we deploy, we're going to monitor the change in outcomes, and if we don't meet that metric, we have a discussion about shutting it down or keeping it going, as Nigam said. So again, we're very process-driven. Then we say, all right, if you propose the idea, you have to spend a lot of time with us, and by the way, it's not just you on the intake form; there are signatures: you as the lead, but also your program director, your medical director, and your division or department head. The whole community has to buy in and say this is a problem for all of us, not just for this one person. Then we say, we're going to meet every two weeks for the next six months, or however long it takes to develop the solution, and you have to be at every single one of those meetings; in fact, you're driving most of them, because you own this. The other thing we ask, and this sounds a little awkward and strange, is: tell us about the influencers in your group. Because if you're young and up-and-coming with brilliant ideas, and there's a senior person who says, no, I don't like it, it's not going to happen. Whoever those influencers are, we need them to be part of the team, because they're the ones who drive change and adoption; they're the people everyone looks to. Then we go into this intensive exercise where data scientists, clinicians, our engineers, all sorts of folks meet regularly to develop the solution under the direction of the end users. And then, of course, we deploy, and that's where we have fairly rigorous evaluation and maintenance processes. You can imagine, when we first started off, we were an academic group, a bunch of data scientists playing at building models, and then by accident we realized, wait a minute, these are in production: if something goes down, we have to be on call. So our teams actually take call, because if something breaks at 2 a.m., somebody has to respond. If it's infrastructure, that will be our IT team, but if something happens with the algorithm, our folks have to respond. People have to take on that responsibility, knowing it's not just a research project you can work on in the daytime; it's an ongoing initiative.
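As a sketch of how intake rules like these could be encoded, here is a hypothetical version in Python. The field names, the four-signature rule, and the readiness logic are illustrative guesses based only on what the talk describes, not Unity Health's actual form logic.

```python
from dataclasses import dataclass, field

MEASURABLE = {"death", "readmission", "length_of_stay", "human_effort", "cost"}

@dataclass
class IntakeRequest:
    issue: str
    clinical_champion: str
    signoffs: list  # lead, program director, medical director, division head
    outcomes: dict = field(default_factory=dict)  # e.g. {"death": 0.10} = 10% reduction

    def deprioritized(self) -> bool:
        """Auto-deprioritize if 'other' is the only outcome claimed."""
        return set(self.outcomes) == {"other"}

    def ready_for_review(self) -> bool:
        # Community buy-in plus at least one measurable, quantified target
        # ('we're going to hold you to it' after deployment).
        has_target = any(k in MEASURABLE and v > 0 for k, v in self.outcomes.items())
        return len(self.signoffs) >= 4 and has_target and not self.deprioritized()

req = IntakeRequest(
    issue="early detection of deterioration on the internal medicine unit",
    clinical_champion="Dr. X",
    signoffs=["lead", "program director", "medical director", "division head"],
    outcomes={"death": 0.10},
)
print(req.ready_for_review())  # True
```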
So here are some examples of the types of AI solutions we deploy; I'm just going to give you three of the more than 40 we have. I'm going to stay away from medical imaging examples, because I think we've had quite a few and that's been the focus of this session; we're just getting warmed up in the medical imaging space, with a couple of examples there, so I'll focus on the non-imaging ones. The first one is fairly simple. Our nursing team came to us and said, humor us, it's going to sound boring, but it's really not; it's one of our top three stressors. It's staffing allocation, and it drives us nuts. We said, okay, what's the issue? They said, when we assign our nurses to the different zones in the emergency department, we have all sorts of rules: a junior nurse has to be with a senior nurse, someone has to be with a team lead, you can't work in the same zone with the same person over a 48-hour period. And it drives us nuts, because as soon as somebody calls in sick, we go through the whole thing all over again. It takes us between two and four hours every day, and our repeat or error rate is around 21 percent. We said, okay, that's pretty ridiculous. So we created a fairly sophisticated optimization algorithm where, apart from entering a new nurse whenever there's a new hire, you don't have to do anything. It tracks all the nurses, so it knows who works where and with whom, and you click a button and it allocates the next four days on your allocation schedule. If somebody calls in sick, you just type in "sick," click, and it redoes the whole thing. Now, there are all sorts of social issues too, because somebody will say, well, I want to work with my friend who's working here, so there's a little bit of that. But what we've noticed after deployment is that we went from two to four hours every day to under 15 minutes, and the repeat or error rate went from 21 percent to under five percent. So again, real value to these nurses, and this is the one tool out of everything we've created where, if we take it down for maintenance or upgrades, we hear screaming; they do not want us to touch this thing.
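A toy version of the zone-assignment problem can be posed as constraint checking with random restarts. The production tool is described only as a fairly sophisticated optimization algorithm; this sketch just shows the shape of the pairing and no-repeat constraints, with invented names and zones.

```python
import random
from itertools import combinations

NURSES = {"alice": "senior", "bob": "junior", "cara": "senior", "dev": "junior"}
ZONES = ["zone1", "zone2"]

def valid(assignment: dict, recent_pairs: set) -> bool:
    """Check the pairing rules for one day's assignment."""
    for zone, staff in assignment.items():
        # Rule: a junior nurse must be paired with a senior nurse.
        if all(NURSES[n] == "junior" for n in staff):
            return False
        # Rule: no repeating a pairing within the look-back window (48h).
        if any(frozenset(p) in recent_pairs for p in combinations(staff, 2)):
            return False
    return True

def assign(available: list, recent_pairs: set, rng, tries: int = 1000):
    """Random-restart search; the real tool uses proper optimization."""
    for _ in range(tries):
        pool = rng.sample(available, len(available))
        assignment = {z: pool[i::len(ZONES)] for i, z in enumerate(ZONES)}
        if valid(assignment, recent_pairs):
            return assignment
    return None  # infeasible under these constraints

rng = random.Random(1)
print(assign(list(NURSES), recent_pairs={frozenset({"alice", "bob"})}, rng=rng))
```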
Then there's another one; I'll use a natural-language-processing example. It's called MuSRAT, the multiple sclerosis reporting and analytics tool. We have the largest multiple sclerosis clinic in Canada, so we see a lot of multiple sclerosis patients, and these are chronic conditions, so you've got tons of information to go through. Our junior folks, our residents and fellows in particular, some of our attendings, and our drug navigators as well, would complain about this. They'd say, I just finished with a patient at 4 p.m. and I'm seeing another patient at 4:10; this person has seven years of history to go through. How am I going to get through all of it? It's hard. So we sat down with them, and they said, here's what we'd like to see: have NLP go through the seven years of history and just summarize it for us on a timeline. And that's basically what the visual is: a timeline. It says, over the past seven years this patient had a relapse here, here, and here, so it's very visual; they had a new T2 lesion on their MRI here and here; their EDSS score is this; they tried this drug four years ago, switched to this, and now they're on these medications. And if you're interested, you can say, I wonder what happened with this T2 lesion here, and it will pull up the radiology report for you; or you click on something else and it pulls up the note from that visit. It saves an incredible amount of time. That's been deployed for a few years now, and it goes through seven years of history in maybe two seconds, so it's very quick, and on the variables the clinicians wanted, we've shown well over 90 percent accuracy. The last one I'll tell you about is our clinical prediction tool, our early warning system. Our internists came to us and said, roughly 1 in 12 patients die in our internal medicine unit, and our problem is early detection. Now, a lot of people have done this; the UK has the NEWS algorithm, so we had to step back and ask: how are we going to add value over all the stuff that's already out there, and how are we going to gain your trust? Because mortality is a big thing. And they said, yeah, those are good questions. So we spent several months on the floors asking our doctors, nurses, and residents, because there's a clinical validation that is far more important than your AUC metrics, as Nina pointed out. We collected over 3,000 clinician predictions, in real time, by asking: do you think this patient is going to die or go to the ICU? So we knew what the baseline was. And I think as clinicians we're sometimes a bit arrogant, thinking we know a lot better than we do; we're awful at prognostication, so we didn't have that high a bar to clear. But what we noticed is that our clinicians beat NEWS pretty easily, so we would never deploy NEWS; our clinicians simply do better. So we built this algorithm; it's now based on an XGBoost model, it runs every hour on the hour, ingesting data, and it's been trained on over 20,000 patients' worth of data. It predicts whether a patient is going to die or go to the ICU in the next 48 hours, and it's automated now to page the medical team; our protocol is that as soon as the medical team is paged, they have to see the patient within two hours. It was deployed in October of 2020, during the pandemic, because we saw our mortality rates go up, and after about two years of deployment, we're just about to write the paper now, there's about a 26 percent reduction in mortality, which has been really encouraging for us. The other thing we noticed, a side effect, is that the nurses were actually using it. We didn't anticipate that, but they said, some of our nurses get stuck with a lot of high-risk patients while others have it easy, so now they actually allocate staffing according to the risk levels; the rule is that a nurse cannot have more than two high-risk patients. Better for staffing, and not something we anticipated.
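A skeletal version of that hourly scoring loop might look like the following. The feature pipeline, the paging hook, and the 20 percent threshold are placeholders, and the model object would be the already-trained classifier; none of this is their production code.

```python
import time
import xgboost as xgb  # the deployed model is described as an XGBoost classifier

PAGE_THRESHOLD = 0.2  # illustrative; the real operating point is not public

def fetch_live_features() -> dict:
    """Placeholder for the real-time pipeline that assembles cleaned
    labs/vitals per admitted patient."""
    return {"patient-123": [98.6, 88.0, 0.9]}  # {patient_id: feature vector}

def page_team(patient_id: str, risk: float) -> None:
    print(f"PAGE: {patient_id} 48h death/ICU risk {risk:.0%} "
          f"-- protocol: assess within 2 hours")

def hourly_scoring_loop(model: xgb.XGBClassifier) -> None:
    """Runs roughly hourly; `model` must already be trained."""
    while True:
        feats = fetch_live_features()
        ids = list(feats)
        risks = model.predict_proba([feats[i] for i in ids])[:, 1]
        for pid, risk in zip(ids, risks):
            if risk >= PAGE_THRESHOLD:
                page_team(pid, risk)  # automated page to the medical team
        time.sleep(3600)
```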
So, some considerations around implementation. I'll preface this by saying there are a lot of papers out there on responsible deployment of AI and that sort of thing; I don't find them very crunchy, so I'm just going to go over a few facets. I love Nigam's slide; it was much more comprehensive than what I'm going to show you, but these are some of the things we've struggled with as a group that's really learning as we go. One is bias assessment: really, really important to make sure you actually have fairness in your algorithms. Here's the challenge, though: at what point do you say it's enough? I'll tell you, we did look at age, young and old; we looked at sickness, sick and not so sick; we looked at sex, males and females. Did we look at race? No; we don't have race data. Did we look at gender? No; we have sex data but not gender data. So is it unethical for us to deploy an algorithm without looking at race disparities? I would argue we don't have that data, and we could pull the algorithm and give up that 26 percent reduction in mortality. Is that ethical? So there's this balance around how far you go. The next thing is ethics assessment. Notice how many of our algorithms actually require an ethics consultation, and in many cases we involve patients as well. For example, for the early warning system, we brought in patients because we wanted to understand: if this thing triggers, do you want to know about it? And they told us, no, we don't want to know unless you're changing management; if you change management, we need a conversation. So that's our protocol. But the ethics around some of the other issues, for example identifying IV drug users in our emergency department, people who don't want to be identified, you can imagine the ethical issues there, means some projects really involve quite a bit of thought. Communication and clinical validation are so critical that before you even think about deploying, you engage your clinicians and ask whether this is something they want, because if they don't want it, you just don't do it; it's not going to work. And clinical validation is about understanding a benchmark, not the AUC. When somebody comes and says, this algorithm has 96 percent accuracy, I can say that's garbage, because our clinicians have 98 percent accuracy; whereas somebody else might come in and say, I have 75 percent accuracy, and I would say that's great, because our clinicians are at 54, so it really depends on what you're benchmarking against.
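The subgroup checks he lists (age, acuity, sex) amount to slicing the evaluation set and recomputing a metric per stratum. A minimal sketch with synthetic records, using sensitivity as the example metric; his point stands that axes absent from the data, like race and gender here, simply cannot be audited this way.

```python
from collections import defaultdict

def sensitivity_by_group(records: list, group_key: str) -> dict:
    """records: dicts with ground truth, model flag, and demographics."""
    tp, pos = defaultdict(int), defaultdict(int)
    for r in records:
        if r["truth"]:
            g = r[group_key]
            pos[g] += 1
            tp[g] += r["flag"]
    return {g: tp[g] / pos[g] for g in pos}

records = [  # synthetic illustration only
    {"truth": 1, "flag": 1, "sex": "F", "age_band": "<65"},
    {"truth": 1, "flag": 0, "sex": "F", "age_band": "65+"},
    {"truth": 1, "flag": 1, "sex": "M", "age_band": "65+"},
    {"truth": 1, "flag": 1, "sex": "M", "age_band": "<65"},
]
print(sensitivity_by_group(records, "sex"))       # {'F': 0.5, 'M': 1.0}
print(sensitivity_by_group(records, "age_band"))  # {'<65': 1.0, '65+': 0.5}
```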
A lot of people talk about external validation, and I'll say it's very important, but for us, at our centre, do I really care if our algorithm works well in another hospital? Not really; I need it to work in mine. When you're going to deploy in other places, sure, that becomes relevant, but for a single-centre place like ours, not really. Explainability is wonderful, but I'm not a huge fan of going too deep into it, and I'll tell you why. We produced Shapley values and did a bunch of things around saying, all right, here are some of the key drivers of why the prediction is what it is, and because that view is so singular, univariate, and linear in its thinking, the clinicians would tell us: I already know this. The magic is not in those singular things; it's in all the complex relationships, where the model triggers and they don't understand why; that's where the magic is with these algorithms. And when people say, no, I'm going to push you on that, I say, okay, great: who here uses Tylenol? Does anyone give acetaminophen to your patients without knowing its full mechanism of action? We don't fully understand it, but we use it; I would ask why this should be different. Then, of course, we get into soft launches and a lot of silent testing. For the early warning system example I gave you, I think we spent almost nine months in silent testing, making sure the model performance was good, the pipelines didn't fail, and gathering all sorts of learnings and feedback about how clinicians would react if we did X or Y. Having a very solid evaluation framework is really, really important for fair and responsible AI, making sure you do a proper evaluation. We actually had an independent statistician come in and evaluate the outcomes of our early warning system, because we didn't want to bias the approach or the findings. And then implementation and post-implementation: communicating, reiterating expectations, making clinicians understand this thing is not perfect. It's going to miss things that you catch, and it's going to catch things you miss, but together you're going to do much better than either one alone, as Nina suggested. It's really important that they understand that, so they don't expect miracles. And of course monitoring, evaluation, and maintenance: understanding this is not a one-time thing; it's ongoing, and it's your responsibility to look for things like data shift and model degradation that can happen two years from now. We're really struggling now with things like feedback-loop issues: when do you retrain, and how do you retrain on data that has now been intervened on and tainted? That's going to be tough. The other point I'll make is that when we started, we started with an AI lab, like a lot of academic centres do: cool compute environments, clean datasets, and so on. Then we quickly realized, to both our presenters' points, that the action is in the MLOps environment, not in the compute environment alone. There's so much more that goes into MLOps, and a progressive data-governance framework that allows our data scientists to access really sensitive data is really important; I would argue we have the most progressive data-governance framework in the country, because we need it to enable AI. And of course engaging our end users is incredibly key, so the concept of a living lab is far more important than a simple AI lab.
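Monitoring for the data shift and model degradation he mentions often starts with something as simple as a population stability index over each input feature. A minimal sketch; the quantile binning and the 0.2 alert level are common conventions, not Unity Health's actual monitoring stack.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between training-era and live feature values."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])  # fold outliers into end bins
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train_lactate = rng.normal(1.5, 0.5, 5000)  # feature distribution at training time
live_lactate = rng.normal(2.0, 0.5, 5000)   # simulated upward drift in production
score = psi(train_lactate, live_lactate)
print(f"PSI {score:.2f} -> {'investigate' if score > 0.2 else 'ok'}")
```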
And the final thing I'll say is about the AI accelerator. What do I mean by that? We learned, very painfully, how not to do things when we wanted to take things to other hospitals. I'll give an example: our emergency department volume forecasting tool. We have an algorithm that grabs our emergency department data regularly and looks for patterns; I think it uses four years of historical data. It scrapes weather data: is there a snowstorm happening in Toronto tomorrow night? You don't have to worry about that here, but in Toronto we do. Are the Raptors playing at Scotiabank Arena tomorrow? Is there a marathon on Lake Shore Boulevard on Sunday? It grabs all of that, and it tells us our volumes seven days in advance. So today is Thursday; we can tell you that tomorrow from noon to six there will be 82 patients waiting, 10 of them with mental health issues, 12 of them high acuity, and the rest easier to manage. Our accuracy is between 94 and 96 percent, which is wonderful, because it helps with staffing. Now, I presented this at a conference once, and about a dozen hospitals came up to me afterwards and said, we want this. I said, absolutely, you can have it. Six of them followed up, and we wrote back saying, we'll give you the code, and we immediately got responses saying, our team doesn't know Python. I said, okay, it's timestamped data that we use; why don't you send us your data every day, we'll run it through the algorithm and send you the output. They said great, so we did that for a few hospitals, and the response was, we don't understand your output, because we don't have high-quality data scientists here. We jokingly said, do you want us to create the dashboard for you too? And they said, yeah, could you? And we quickly realized we're in no position to deploy outside of our own hospital. So we entered into an agreement with a startup called Signal 1, who would come in and focus on the other hospitals, because clearly we're not very good at that, and I would wholeheartedly agree. To get this to other places, we need private-sector engagement, in my opinion, to bring solutions from hospitals like ours and yours to the rest of the world, and that's how I think we're going to deliver better care for all of us. Thank you.
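The volume-forecasting tool described above joins historical ED counts with calendar, weather, and local-event signals to predict seven days ahead. Here is a sketch of what that feature assembly could look like; the specific features, lags, and gradient-boosting choice are assumptions for illustration, since the talk does not detail the actual design.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def build_features(visits: pd.DataFrame, weather: pd.DataFrame,
                   events: pd.DataFrame) -> pd.DataFrame:
    """visits: daily ED arrival counts with a DatetimeIndex;
    weather/events: exogenous signals indexed by the same dates."""
    df = visits.copy()
    df["dow"] = df.index.dayofweek                  # strong weekly seasonality
    for lag in (7, 14, 21, 28):                     # recent history as lag features
        df[f"count_lag_{lag}"] = df["count"].shift(lag)
    df = df.join(weather[["snow_cm"]])              # e.g. snowstorm tomorrow night
    df = df.join(events[["arena_event", "marathon"]])
    return df.dropna()

# Usage sketch: train on history, then score a future date whose weather
# forecast and event calendar are known in advance.
# feats = build_features(visits, weather_forecast, event_calendar)
# model = GradientBoostingRegressor().fit(feats.drop(columns="count"), feats["count"])
```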
Awesome. All right, come on up to these chairs. Thank you all for such incredible talks; I think you really highlighted how complex it is to do this in the real world, so these are really incredible lessons learned from some extensive experience already. We have time for about 30 minutes of questions; like I said, if you have questions, please feel free to come forward, but I'm going to start. One of the things we're all talking a lot about is health equity. You all touched on ethical considerations and concerns around bias and started talking a little bit about how you're evaluating that. Can we start there: how are you creating the structure for that evaluation, how are you making it a formal process, and what lessons have you learned from your initial ethical evaluations?

Okay. Lesson number one is that ethics is much larger than just fairness. We typically use those terms synonymously, fairness, equity, and ethics, and I've used them synonymously myself, but ethics is an umbrella term that encompasses other things, like fiduciary responsibility, the doctor-patient relationship, the degree of autonomy you grant to our algorithmic agents, and so on. So ethics is much broader, and it's very hard to solve for ethics. One of the things we have found, in partnership with Danton Char, who is an ethicist on our campus, is that we typically do this using interviews, 45 minutes to an hour, with multiple stakeholders, and his insight is that if we can identify areas where stakeholder values mismatch or collide, it gives us an early warning of where the ethical flashpoints are going to be. So those are two lessons: first, fairness is the smaller thing and ethics is much bigger; second, we can be clued into problems by looking for value mismatches. The third lesson is: when do we stop? As Muhammad was saying, we can keep doing intersectional and subgroup analyses for three years and not be done. Here I'm a huge fan of the work of Professor Sharad Goel, who's now at Harvard, and the point he makes is: look at the consequences. When we talk about fairness, that word has two parts. Part one is, is there a systematic difference in the algorithm's output for people who share certain characteristics, as in, does it produce a different number for me than for you? Part two is, as a result of that difference, is there a systematic difference in the allocation or accrual of some benefit based on those same characteristics? We usually care about the latter: we want to make sure there's no systematic difference in the accrual of benefit. We don't care as much about a systematic difference in the algorithm's output, yet that's where we tend to spend our energy. The pooled cohort equations that I mentioned are systematically biased for 40 million Americans, but nobody gets denied a statin because of it, so you'll never see a headline saying the pooled cohort equations are racist. So when we look at fairness, we look at the consequences and ask: by protected characteristic, given the model's calibration and what I will do under my policy, is there a difference in the consequences, in the accrual of benefit? And if we see no difference, we say, all right, good enough to use.
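Sharad Goel's two-part distinction can be operationalized: rather than only comparing score distributions across groups, compare what the policy does, who accrues the benefit, at the operating threshold. A minimal sketch on synthetic data, where "benefit" just means being selected for the intervention.

```python
from collections import defaultdict

def benefit_rate_by_group(people: list, threshold: float = 0.075) -> dict:
    """Part two of the fairness question: does the downstream action
    (here, statin-style selection) differ systematically by group?"""
    selected, total = defaultdict(int), defaultdict(int)
    for p in people:
        total[p["group"]] += 1
        selected[p["group"]] += p["score"] > threshold
    return {g: selected[g] / total[g] for g in total}

people = [  # synthetic: group scores differ slightly, selection rates may not
    {"group": "A", "score": 0.08}, {"group": "A", "score": 0.05},
    {"group": "B", "score": 0.09}, {"group": "B", "score": 0.04},
]
print(benefit_rate_by_group(people))  # {'A': 0.5, 'B': 0.5}
# A score offset between groups that never flips anyone across the threshold
# changes part one but not part two; that's the pooled cohort equations
# example: biased outputs, but nobody denied a statin because of it.
```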
I can go next. I think an outcome-driven attention to this is really helpful, because ethics is a massive, massive arena: there's the ethics of the data, the ethics of the model, and the ethics of how you're implementing the model, and we have to look at all of those differently. I'll talk first about the ethics of data, and I'll say that one of the great things Stanford has done is make data publicly available so that anyone can use it to create these models. At my practice, many groups have come to us and said, we want to buy your data, and I had a lot of influence in discussing this with my practice. You think, oh, well, you could make money, and money could help fund other efforts, right? But is it ethical to sell your data, data that has already been used for its primary use case? If you look at what Dr. Levin wrote about it, it should be used for the common good. So you have to think about those things, and with that you have to know who owns the data and who has access to it, so there are a lot of legal implications as well. We've taken a stance in our practice against selling data, because I just don't think it's the right thing to do. I think you want to be able to create models that are going to be useful for a large population. The idea of creating one model that works for everyone is what we do right now only because that's how, at least in the US, models can get cleared and sold; if you could create 10 models for all these different populations, you know what, it's going to work better, because I know those populations. (So you're saying you only need to create models that work in your institution?) That's right, and that's really important. In theory we should be doing that, because you're going to get a better output on the 24-year-old African-American female than you would if you combined her with all the other patients in the population. However, in this industry, which is business-led, it costs money to get an FDA clearance: it takes time, thousands, potentially hundreds of thousands, of dollars, and many months if you wanted to create all of those models. I think this is going to evolve, and it will get easier over time to create these models, maybe not to get them all FDA-cleared, but hopefully we will move away from trying to create one model for them all. But if you are in that situation, you have to look somewhat at your patient data, and most vendors do not provide any subset analysis, so you need at least some understanding: is my patient population going to be similar to the training data? Because otherwise the model is not going to work very well, and if that affects the outcome, that's going to be important.

Yeah, from my perspective, and I'm going to butcher this because I'm not an ethicist, but we've had enough conversations with ethicists to come to a couple of points. The first one is around principles. We have an in-house ethicist who's fantastic, he has a small team, and there's the Joint Centre for Bioethics at the University of Toronto, which is really starting to focus on AI. What we're finding is that the academic ethicists are wonderful, but it's a bit of a different ball game when you're actually deploying and getting your feet dirty on the ground, and our folks typically tell us, look, I understand the principles, but I don't understand exactly what you do. So there's a bit of an education that goes both ways: the first time our ethicists had heard terms like precision and recall was when we started talking with them, but it's important for them to understand the implications, to your point, in terms of net benefit versus harm. The second point is what they keep coming back to, and I'll focus on just a few things. One is net benefit versus harm: are we seeing a net benefit? The other is equity: I want to make sure we're going to be equitable to different groups. And where we struggle is that it's not going to be a perfect art or science, because there will be a lot of gaps, and you will have to make human judgment calls, and those judgment calls will vary. What we've experienced is that if we go to one ethicist who's fairly well informed, they'll give us a certain opinion, but another ethicist, depending on who you pick and how deep they are in this area, may give you a different opinion. So my sense is it's evolving: the more education there is both ways, where we learn about ethics and the ethicists learn about AI, the more we'll move toward consistency, but I don't think we're there yet.
Yeah — it's possible to do multiple hypothesis testing with an ethicist until you get the opinion you want. I'm saying that half in jest, for the very simple reason that we've been talking about learning health systems and things like that for quite a while now, 15 or 20 years, and one of the provocative ethicists from Johns Hopkins, Ruth Faden, makes the argument that if you as an individual want to benefit from the collective data of others, is it not your duty to share your own? From a pure moral-principle standpoint, if I want to benefit from millions of other people's data, it is my duty to share my own. That is my usual opening statement about ethics in the data space. And the second thing is, here in the U.S. — we're in North America, in the Western Hemisphere — we tend to follow European ethics for the most part. We forget that about half of humanity does not subscribe to the same ethical principles, and we just pretend that what the Western philosophers came up with in medieval Europe is the right ethics. So just ponder that for a minute.
Yeah, very valid points — some super easy problems to solve; we've got it all done, right?
Muhammad, I'm going to pick up a little bit on your point about educating the ethicists, because I was fascinated by how you were talking about the role you expect the clinicians to play, and if ethicists are hard to educate, I find the clinicians even more of a challenge. So talk a little bit about that piece. You expect your clinicians to play a huge role in putting together the use case and the implementation and understanding it — how do you approach the education that needs to happen for those clinicians to be able to do all of that?
Yeah, great question. We absolutely expect that, and if there's not that level of clinician engagement, we will not do the project. Initially we thought nobody was going to go for this, but we now have a backlog, a wait list of projects — the engagement has been fantastic. But we approach things from a place of humility, because we have to go in saying we're not going to know the clinical realm and they're not going to know the data science realm. So the way things are typically structured is that in the first meeting we tell our clinicians: give us a mini med school. We spend a little bit of time where you teach our data scientists what a pneumothorax is, why it's important, how you look for it; tell us what your workflow is like when you look; tell us all those things from a clinical and operations perspective so we can understand. The data science team typically walks away really enamored — oh my gosh, we're half doctors now — and feeling so much better that they know more about this. Then, once we kind of understand what the clinical question is and we say, you know what, maybe this approach might work, and we home in on that, the next session is time for us to give an education. The clinicians sit there and we say: here's a mini boot camp on neural networks — or, since we tend to do a lot of XGBoost because of the time-series nature of our data, here's what XGBoost is, here's what it does, here are the assumptions, here are the hyperparameters and how we tune things. And so the clinicians understand: oh, that's the process. And they quickly realize — so let's talk more about the features in the data that go into this algorithm and understand how you're putting those in. And you do realize that we actually look for all these sorts of parameters when we make decisions — are you sure that you've got those in right? Because that's how the algorithm is then going to process it.
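For readers unfamiliar with the "mini boot camp" material described here, a minimal sketch of what such a walkthrough might show — a gradient-boosted tree classifier on tabular features, with a few of the hyperparameter "knobs" called out — follows. The feature names and synthetic data are invented; this is not the team's actual model.

```python
# Sketch of a clinician-facing XGBoost walkthrough. Features and labels
# are synthetic, vitals-style inventions for illustration only.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(80, 15, n),    # mean heart rate over a time window
    rng.normal(97, 2, n),     # mean SpO2
    rng.normal(120, 20, n),   # mean systolic blood pressure
])
# Synthetic outcome loosely tied to the features, plus some noise.
y = ((X[:, 0] > 95) & (X[:, 1] < 96)).astype(int) | (rng.random(n) < 0.05).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The hyperparameters are the "knobs" clinicians hear about in the boot camp:
model = XGBClassifier(
    n_estimators=200,     # number of trees
    max_depth=3,          # how complex each individual tree can be
    learning_rate=0.1,    # how much each new tree corrects the last
    eval_metric="logloss",
)
model.fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```

The point of showing this to clinicians is less the code than the feature list: once they see which inputs the model consumes, they can say "we also look at X when we decide" — exactly the conversation described above.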
So then they go out educated, understanding the method you're using and all the things they have to be thinking about. And it's almost magic — uniformly, I don't think we have a single exception — the next sessions are just so much easier, because we start talking the same language. It's effective that way.
Yeah, that's a great approach, and we've had the same experience with the first part, which is creating the model: you need to get that engagement. We created our first AI model in 2017, when there wasn't anyone out there creating anything like it. It did natural language processing as the radiologist dictated, to provide clinical decision support aids for follow-up best practices: I say there's a four-centimeter AAA, and it tells me to get a follow-up with a vascular surgeon and a one-year follow-up. So how do you do that? Well, we engaged an external company, because at the time we didn't have any data scientists, and they were really smart people — in fact three of them, half the team, were literally rocket scientists. When I told them what we wanted, they created something and brought it back, and I thought: this is terrible, we're going to have to fire them, they have no idea what we want to do. And then I thought, well gosh, maybe it's not them, maybe it's me. So we changed how we did everything. I flew out there for three days: the first day we went through the anatomy of different parts of the body, talked about physiology, talked about the terminology we use and why the best practices were created; the second day, exactly as you said, they went through their side — they ran the model on different reports and came back to me — and we had this wonderful collaboration. All of a sudden the model output was phenomenal. That's a really important part of creation, and I don't know that it's always happening, because at your own institution you get that ability, but a vendor may not be doing it, so you have to make sure you're working closely with your vendor.
On the other side, which is deployment — and you're doing this as well — you need to make sure that the user is going to use your tool. I'll tell you a story. When I was at an institution where we were deploying a tool, we flew out to educate the way we always do — we have a whole process, because it's really hard — and the team there said: hey, while you're here at our hospital educating the radiologists, we have two other AI tools that we bought as a hospital that the users are not using; can you teach them those two, because we want them to use those. It's a very big problem. You have to have a process around managing that, because no human likes change — change makes us nervous, and in healthcare the consequences are very high. So you have to have a really strong process there.
Do you want to add anything? No — whatever Muhammad said. What they said. Yeah, good call.
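The follow-up recommender described above was a machine-learning NLP system; purely as a rough illustration of the shape of the task, here is a deliberately simplified rule-based stand-in that scans report text for an AAA measurement and suggests follow-up. The regex, helper name, and size thresholds are all illustrative inventions, not clinical guidance and not the actual product.

```python
# Deliberately simplified, rule-based stand-in for the report follow-up
# idea described above (the real system used machine learning).
# Size thresholds below are illustrative only, not clinical guidance.
import re

AAA_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(cm|mm)\s+(?:infrarenal\s+)?(?:AAA|abdominal aortic aneurysm)",
    re.IGNORECASE,
)

def aaa_followup(report_text: str) -> str | None:
    """Return a follow-up suggestion if the report mentions an AAA with a size."""
    m = AAA_PATTERN.search(report_text)
    if not m:
        return None
    size_cm = float(m.group(1)) / (10 if m.group(2).lower() == "mm" else 1)
    if size_cm >= 5.5:
        return "Consider vascular surgery referral for repair evaluation."
    if size_cm >= 4.0:
        return "Suggest vascular surgery follow-up; surveillance imaging in ~1 year."
    return "Suggest surveillance imaging per local AAA guidelines."

print(aaa_followup("There is a 4.0 cm infrarenal AAA."))
```

Even this toy version hints at why the clinician boot camp mattered: the thresholds, phrasing variants, and follow-up pathways all come from clinical knowledge, not from the code.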
It's fascinating — clearly you are all echoing the same thoughts, and these are clearly very important steps. However, your average physician or clinician is already very busy, so getting the significant amount of time you're talking about can be challenging. I have my own biases here, which I'm trying not to inject — I'm trying to lead you down this path. So how are you finding this? Are physicians and clinicians doing this after hours? Are they using their research time? How are you getting the time that you need to get the engagement and really create these partnerships?
Yeah, maybe I can start. I actually find the physicians not as tough as the nurses, because for us — we're an academic teaching hospital — a lot of our clinician-scientists have protected time, so they use that protected time to do research and then do their clinical work. My value proposition when we first started, to get people excited, was really easy. What I told them was: you can go and apply for a grant, spend a year doing that, get rejected, address the comments, go another year and maybe get your funding — and of course you're going to be under-budgeted, and you're going to struggle with your grant. Or: I have a team of 30 people, we can get started today, and you can get a lot of great publications out of it, because if we do something novel and show impact, there will be good publications. That was fairly easy. Then it snowballed, and people started realizing there's this great resource — I don't have to worry about writing grants; I mean, I'll still write grants because my CV needs to be built, but I get these free resources to build things that I'm passionate about. Other clinicians started saying, wait a minute, this is actually happening and it's working, so the ones without as much protected time started coming in too. And the nurses — we just have to accommodate their schedules, so we literally have people coming in before their shift starts, spending an hour with us because they're passionate about it. So, using their free time, which I think is a huge issue.
Have you guys found a miracle? A version of one, for us. Because we're a private practice — we do have academic medical centers, but the majority of our practice is private-practice physicians whose job is to consult with the referring clinicians, to read studies, to do interventions. Their job is not to help with AI, and anything that takes them away from that makes their job harder. So what we've done in our practice, recognizing this — because you can't do this as a second job and have it scale in any way — is that we have days where we give people time: we actually pay people to join our team and help do this, to make it easier. These best practices were created by a team — I have a team of 12 people, many of whom are paid not to read cases but to help create and deploy AI in a best-practice way that's easier for the rest of the practice to consume. I think that's really important. Later on, when we don't have to do all of this work and it's more standard of care, you won't need that, but right now, if early adopters want to do this, they're going to need to dedicate time.
We have a little bit of the opposite problem, in that our data science team is five FTEs in an academic medical center where everybody has seen the Gartner hype cycle and wants to be on the escape-velocity trajectory. So the question becomes: what is the value proposition that is the fastest to show, so that we can go from 5 to 30?
So it's a little bit of a different problem — there's so much excitement, but not enough of a workforce to go from idea to value so that we can sustain it long term.
Just to push on that a bit — you're really looking at the entire multidisciplinary workflow. How do you get the nurse's time, the RT's time, the palliative care consultant's time?
Yeah — and again, this is an n of one or two — there are a whole bunch of programs for quality improvement and so on, and we piggyback onto all of those, because they've already figured out a way to get the right stakeholders at the table. They're already trying to solve a problem, and they have nursing, they have the clinician lead, they have the operational leaders, all of them — and then we send somebody to that team. That's a way to short-circuit it; otherwise, with a team of five, you just can't go around doing all of those meetings yourself.
Yeah, it's great. I think what you're speaking to, in my mind, is that there are new models of care and new meaning to clinician work. It is no longer just one-on-one work with the patient; it is also being part of — whether you call it quality improvement or clinical informatics or implementation of data science — a job that needs to be protected. And there are ways: you've got creative ways of turning it into scholarly output and getting clinicians credit that way, or tying it into quality improvement, which is already reimbursed. But I think we need to rethink the workforce and what it means to be a clinician in it.
So I'll ask you a question — I know, that's not allowed. There was a time when CMIO was a part-time job, and now most health systems have a CMIO full time, or at 80 percent. What was that journey like, and what could the AI folks learn from it?
Yeah — that's exactly the point. I'm a CMIO, and I'm also program director for the clinical informatics fellowship across Stanford Medicine, and in my mind the partnership that creates these learning health systems is between the data scientists and the clinical informaticists, whose job it is — and who are paid to do it, because there's value in it now — to understand the clinical workflows, evaluate them, re-engineer them, and change the processes to support the implementation, do the change management, and do the communication. I think it's important that systems recognize the value of compensating people for this work. They have for CMIOs, and I think they're starting to for some of these other roles, but it can no longer be: ask a clinician, after their 80-hour work week, to come put in some time with the data scientists.
And at the other end of that — we talk about being on the hype cycle, and there's a hype cycle for potential good and a hype cycle for the end of the universe. One of the things people are worried about is: will AI replace people's jobs, and what does this mean for me? Is that something you're encountering? Is it something you're educating toward? How do you see people's roles evolving in this day and age?
I can start. I think you have to have a vision for where you want to go, and then you drive toward that vision; you don't make a decision without your vision.
And people don't spend enough time, I think, thinking about a vision; they try to go immediately to "what are we going to do, let's just start working on it." Having been in radiology for so many years, I feel very passionate about this specialty — the reason I went into it was that we have the opportunity to educate our fellow clinicians; we're the doctor's doctor. So at my practice I had to create a vision for what the future is going to be for a radiologist, and I thought further and further ahead. There is so much data coming into the system: we're already looking at morphologic imaging and some physiologic imaging with the studies we're reading, but now we've got AI producing structured data out of the 97 percent that's unstructured, and we've got molecular imaging, radiomics, genomics, proteomics — all of this massive amount of information coming into the system. Who is going to take that information, contextualize it, and pass it back to the referring clinician? In my mind, the perfect person is the radiologist; that's our job. But if we're going to take on all of this new work — we're slammed now, we can't even get through the morphologic imaging because there aren't enough radiologists — we need something to take away some of the effort and elevate us so that we can do more. It doesn't mean it's going to replace us, but you have to have a vision of where you're going, and in that future vision, if you can do something in 15 minutes that used to take three hours, as the keynote speaker was saying, you're saving time to elevate your role and add in more information. So we're taking a lot of time at our practice to make sure people understand AI, because you can't be in the driver's seat unless you're an expert or have some expertise in it. And as we move forward, we'll start adding on more things, because I think that's what's going to improve patient care in the future: taking all of this information and contextualizing it.
Absolutely. Anyone else?
Yeah — to Nina's point, you're already working 80-hour work weeks, right? So what I often tell people is that I actually think AI is going to make care more human, not less human, because the hope is that we're going to be able to take out some of the work you don't like doing. One of the things we're thinking about now — and I know there are all sorts of private-sector folks into this — is how to get generative AI to write your admission and discharge notes, your progress notes, the stuff you don't like doing, so you can spend more time with the patient, or cut your 80-hour week to maybe 60 hours. If this were a profession with a typical 40-hour work week, then yeah, I'd be a little worried that the question would become how to replace you — but you're not going to be replaced anytime soon. There's so much extra work that has to be covered that, if anything, I think AI is going to help physicians decrease their workload, I hope, and spend more time with patients.
Just to add one more comment quickly: in all of our evaluations over many years of deploying, we have unanimously found that the user — in our case the radiologist — plus AI is better than either one alone. So why would you be simplistic and say "I'm just going to replace you" when you can augment and elevate the standard of care?
Yeah, I think the fears of AI replacing humans are greatly exaggerated, particularly in the medical profession. As our AIMI director, Curt Langlotz, said several years ago, radiologists who use AI will replace radiologists who don't — and I think that generalizes to all providers, and it still holds. I've been quoting Curt a lot lately.
So, you all gave us really great principles for how you evaluate, and going back to the idea of limited resources — limited clinician resources, limited data scientist resources — those principles for how you evaluate and prioritize become incredibly important, and you gave us some really nice structures for doing that. Unfortunately, in the real world it never works out as well as we design it. So I'd love to hear some of your lessons learned: which algorithms seemed high priority, seemed to have all the right factors, and didn't work? What were the lessons learned, and how have you changed your prioritization schema? And did you actually shut them down? Because we always say we will, but then they go on forever.
Not my team yet, but Stanford has shut down the sepsis BPA alert that was launched many years ago.
Yep. We have shut one down — and as an institution we chose not to launch another, because we did our evaluation first. Right now we're deploying five AI computer vision models out across our practice, and I think they will be reviewing 27 million exams by October. What we said is: we need to make sure there's real value creation. We think there will be, based on everything we did, but you have to determine that there's value creation, and if there's not — if the clinicians don't like it, if our users don't like it — then absolutely you need to take it away, so you can spend your time on something more important. That's a really important principle. I will say it takes a lot of time to make sure you're going in the right direction, and you need clinician effort working on that with you, but there's value in figuring it out quickly. We used to run a pilot and it would take us six months to figure it out; now we do the work I just showed in two weeks. You can't take six months — there are too many models coming out.
That's the accuracy part. There are two other parts that are really important and probably even bigger differentiators. The first is the business case: when we evaluate an AI model, we look at how clinically accurate it is, but these things also cost money, and we have no money in the healthcare system — how are you going to pay for it? As clinicians we don't like to think about that; we want to be altruistic, that's why we came into medicine, and just deploy things that will help patients. But it has to be sustainable, so you have to think about the business model, and if there's not an ROI for someone to buy it, you're not going to be able to use it. The second is the technical capability of the actual team — whether the model you're buying can deploy in your environment. The AI model itself is such a small component — I think there was a slide on this — and most of the difficulty in deploying AI, and a lot of the reason it's not out there, is that it is so hard to take that crazy amount of data you have in all those silos, move it to the AI, create a system that can orchestrate it, make sure the AI reviews it and sends it back, and get those results back before the rad opens the study. There's so much complexity in that.
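As a toy picture of the orchestration plumbing just described — watch for new studies, run inference, post results back before the read — here is a minimal sketch. Every function and queue name is invented; a real deployment would be event-driven with proper monitoring rather than a polling loop.

```python
# Toy sketch of the orchestration loop described above: watch for new
# studies, run inference, and post results back before the radiologist
# opens the study. All function names here are hypothetical.
import time

def orchestrate(get_new_studies, fetch_pixels, run_model, post_results):
    """Poll for new studies and push model output back to the worklist."""
    seen = set()
    while True:
        for study in get_new_studies():
            if study.id in seen:
                continue  # idempotency: never score the same study twice
            try:
                result = run_model(fetch_pixels(study))
                post_results(study.id, result)  # must land before read time
            except Exception as exc:
                # A real system needs monitoring here: silent pipeline
                # failures are exactly the process metric discussed below.
                print(f"inference failed for {study.id}: {exc}")
            seen.add(study.id)
        time.sleep(5)  # polling interval; event-driven is better at scale
```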
We have unanimously found that plumbing to be one of the biggest differentiators right now.
Absolutely. And I think we also have a question from an online participant, Dr. Hicks: the speakers this morning presented on the importance of choosing tools carefully to provide real value. How short are your cycles of evaluation for deciding whether to continue a model evaluation? Do you use sprint cycles to get to decisions quickly, or a decision tool to decide how long to evaluate a model? So, continuing along the same theme — go for it, Muhammad.
All right, terrific. Just on the previous point: we're actually in the process of shutting down two projects — and if you're watching, you know who you are. We do have a fairly extensive feasibility assessment. If I were to peg our success-to-failure ratio, it would probably be 70 to 30 — 70 success, 30 failure — and that failure rate is pretty high, I would think. We do cut off projects that seem like a great idea but just don't make it through feasibility. And — sorry, can we reiterate the last question?
Basically, along the same lines: what are your cycles for evaluating? Are you doing sprints? How long do you evaluate?
Yeah, so we have a few things that we do. There are process metrics that we take on, especially during silent testing — we'll look at things like how often the pipeline fails and whether the model performs as well as we're expecting it to. Then, for the actual evaluation post-deployment, the time frame really depends on the question you're asking, because if your outcome doesn't happen very often, you're going to have an extended time frame. For example, with our early warning system, we needed enough deaths to say whether we were actually making a reduction in mortality, and to accumulate the appropriate number of deaths — we did a sample size calculation and all that sort of stuff — it was about 18 months, because we're just one center. So it took a while to do a proper evaluation. Other things, like the nurse assignment tool, you flick the switch and within a few weeks you just know whether you're hitting the performance. So evaluation can vary from days to weeks to a year or two, depending on the outcomes you're looking at. The other thing I'll highlight is that it's really important that people consider mixed methodologies: there are a lot of times when the numbers tell you one story, but the stories you hear from clinicians, and from patients in particular, may tell you an entirely different story — and those stories may actually trump the numbers you see.
All right, I think we are out of time — I'm not doing a great job of watching the clock — but thank you all for a lovely conversation; I really learned a lot.
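For the sample size calculation mentioned in the early-warning-system example, a minimal sketch of the kind of two-proportion power analysis involved might look like the following. The baseline mortality, expected reduction, and monthly volume are all invented numbers, not the panelist's actual figures.

```python
# Sketch of the kind of sample size calculation mentioned above, using
# a two-proportion power analysis. Baseline mortality, expected
# reduction, and monthly volume are invented for illustration.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control = 0.030   # assumed baseline ward mortality
p_treated = 0.022   # hoped-for mortality with the early warning system

effect = proportion_effectsize(p_control, p_treated)
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"patients per arm: {n_per_arm:,.0f}")

monthly_admissions = 350  # invented single-center volume
months = 2 * n_per_arm / monthly_admissions
print(f"≈ {months:.0f} months of accrual at one center")
```

With these made-up inputs the accrual works out to roughly a year and a half, which illustrates why a rare outcome like mortality forces a long evaluation window at a single center, while a workflow tool with a frequent, directly observable outcome can be judged in weeks.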
Info
Channel: Stanford AIMI
Views: 7,781
Id: RCIleZj3rp8
Length: 83min 42sec (5022 seconds)
Published: Tue Jun 20 2023