#AIMI23 | Session 2: Generative AI in Health

Captions
Thank you so much for being here. I know we're right after lunch, but hopefully this topic will be exciting enough to supersede any post-lunch effects. And thank you, Joanna, for the introduction.

I'm a physician, an internal medicine hospitalist, and I practice in the hospital. I'm the medical informatics director for digital health for the health system, which means I work with our team to design and build care models enabled by digital technologies, including AI, that we then implement for our patients here at Stanford. I also direct the Stanford Emerging Applications Lab, which builds and tests some of these emerging applications.

This is a really exciting topic for me for many reasons. I've done a little work with AI, but mostly I'm thinking about this from the perspective of a human being in the world, a potential user. I have a ChatGPT account like the other 100 to 200 million people in the world. I'm also thinking about it as a physician and as an educator. I was rounding in the hospital last week on the teaching service with my residents and medical students, and I remember walking into the team room where a medical student clearly had ChatGPT open in her browser. When I walked in she immediately minimized it, and I actually wanted to tell her, no, just leave it up. In fact, at the beginning of the week I tell my residents to open a ChatGPT account. Now, you cannot put protected health information into ChatGPT, hopefully everyone knows that, so you cannot use it for actual clinical care. But there are many aspects of clinical care that don't necessarily involve diagnosis or require PHI, yet involve a lot of text synthesis and insight, which large language models have clearly shown they do well. Even something like, "I just had a conversation with a patient that was really confusing to me; help me reframe it in a way that's more palatable to his or her situation and give me some talking points" works pretty well. That's an example of something already being used in healthcare that requires no PHI and isn't about diagnosis.

What that shows is how ubiquitous AI now is in the world, which is very different from even a year ago. At Stanford we've talked about AI in healthcare for quite a while, but for a long time it was a niche topic that existed in a bubble. Now it's everywhere, and that's what we'll speak to during this panel. All of this is transformative not only because of machine learning advances and what these models can do, and I think some of our panelists will speak to what's special about generative models and why there's so much excitement, but also, and I think David will speak to this, because of this infiltration, and that's kind of a negative word, this integration of AI into daily life, to the point where my parents are talking about AI and they're not AI or technology people. What impact will that have in the real world? How do we think about this?
We have the privilege of talking about this somewhat early, and I think we all probably have a sense that in some ways we don't really know what we're talking about. We're at the very nascent stages of AI, in the same way that in the early days of the internet people were excited about sending an email but probably never imagined how the world was going to change, or what new business models and new care models for healthcare would be enabled. It's the same with AI: we're really just thinking about the building blocks right now. The degree of complexity has increased exponentially because of the way these models work: they output not just a number or a classification but language and images, which is much more complex and rich in information. And the number of people now engaged in the conversation also exponentially increases the complexity. So how do we think about that?

With that, I'll introduce our first panelist. Each panelist will give a 10 to 15 minute talk, we'll save questions for the end, and then we'll bring everyone back up; I'll ask a couple of questions first and then open it up to the audience. I'll start with Jason Fries. Jason is a computer scientist at Stanford in the Stanford Center for Biomedical Informatics Research, and his work is focused on enabling domain experts to rapidly build and modify machine learning models in medicine. He does a lot of work with large-scale expert-labeled training data and with weakly supervised machine learning, and I'm sure he'll say much more about his work. Jason, welcome.

Wonderful, thanks Ron for the introduction. I'm Jason Fries. I work largely in the Center for Biomedical Informatics Research with Nigam Shah, who presented this morning, and today I'm going to talk about our work on generative AI for electronic health records. As Ron suggested, we've had this amazing moment where generative AI suddenly became mainstream. We're in a universe where a niche academic topic is something you'll overhear on the train or, recently, hear being made fun of by people in coffee shops; it has transcended into something the average person knows about. With all the excitement comes skepticism and concern about these models' ability to hallucinate, make up information, and potentially even deceive users in ways that are very compelling because they use natural language.

Everyone is fixated on ChatGPT and the OpenAI large language modeling space, but these belong to a larger class of models called foundation models, a term coined on this campus by Percy Liang and several other researchers. It captures the idea that we've started to transition in how AI models are conceived, built, and deployed. We can now take millions and millions of data points, or billions and trillions in the case of OpenAI, and train a very powerful model that serves as a foundation for building other models. Unlike classic machine learning, where you take a single task and train your model with whatever data you can get your hands on, you now have a very flexible, generalized artifact that can be adapted to many tasks, with a lot of desirable properties.
This runs across multiple modalities. We focus a lot on language, but it extends from imaging and language to audio and other domains; we really see a shift to this new paradigm of conceiving AI. Sitting on top of all of this is now the ability for humans to interact with these artifacts using natural language, which opens up a lot of opportunities and challenges for how we conceive of ML and AI operating in society.

We can see these as massive things built by the big tech players, but in the open-source community we see a huge trend, particularly this year, of rapidly replicating, in an open fashion, the big innovations coming out of OpenAI and other big tech companies. There was a leaked Google memo a few weeks back arguing that these tech companies don't have a moat, that they don't have anything special: these artifacts are becoming commoditized and accessible to everybody. We had a moment where some really great work here at Stanford, called Alpaca, showed that you could distill information found in GPT models on the cheap, replicating a model for something like $600, and it suggested a really promising way to get, essentially for free, a model with amazing abilities. But very recently we've seen some skepticism about that perception, and it brings us back to the same place: if we truly believe in this foundation model mentality, we need to do the heavy lifting and hard work of building a base model we can rely on and trust, and that doesn't come easy.

We have a lot of thought leadership on this idea of medical foundation models, including a number of blog posts through the Stanford HAI website, particularly on how foundation models can be used in healthcare and on how shaky and suspect a lot of our ways of evaluating these models are. We proposed a simple, non-exhaustive list of dimensions and axes along which we need better ways to measure foundation models. Classically we focus on accuracy, but increasingly it's things like needing less labeled data, simplified deployment in the hospital (we heard a lot of great material on that this morning), multi-modality, and in particular the ability to interact with humans, which is paramount to understand, measure, and improve for the people who actually have to use these tools.

There are two big worldviews of medical foundation models. There is the large language model paradigm, where inputs are documents and outputs are text; this is the ChatGPT view of the world that has everyone so excited. But there is another paradigm that our lab has a lot of experience with, which we call foundation models for electronic health records. Instead of documents, you move to entire timelines of structured medical data: can we take the entire patient history, learn a representation of it with a machine learning model, and then use that representation for downstream machine learning tools? The input is the whole patient history, whatever that may include, and the emphasis is on using the embedding or features that come out of that model.

The key insight, and how we can do such a thing with structured data, is that we just view an electronic health record as a sort of weird language. Natural language has order and structure based on syntax and semantics; for an EHR "language," we're concerned with the order in which these artifacts, the medical codes we've defined, occur over time. It turns out there is also structure and meaningful signal in the way those are ordered, and the same technology used to train a language model can be naturally applied to these structured sequences.
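To make the "EHR as a weird language" idea concrete, here is a minimal, self-contained sketch (not the actual Stanford codebase): each patient's timeline of medical codes is treated like a sentence, a small causal transformer is pretrained with next-code prediction, and its hidden state is reused as a patient embedding for downstream tasks. The code strings, model sizes, and training loop are illustrative placeholders.

```python
# Sketch only: pretrain a tiny causal transformer on sequences of medical codes,
# then reuse its hidden state as a patient embedding. Codes and sizes are toy values.
import torch
import torch.nn as nn

# Toy timelines: chronologically ordered code strings per patient (hypothetical codes).
patients = [
    ["ICD10/E11.9", "RX/metformin", "LAB/hba1c_high", "ICD10/I10"],
    ["ICD10/I10", "RX/lisinopril", "ICD10/N18.3", "LAB/egfr_low"],
]

vocab = {"<pad>": 0}
for timeline in patients:
    for code in timeline:
        vocab.setdefault(code, len(vocab))

MAX_LEN = 8

def encode(timeline):
    ids = [vocab[c] for c in timeline][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))

x = torch.tensor([encode(t) for t in patients])              # (batch, seq)

class CodeLM(nn.Module):
    def __init__(self, vocab_size, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(MAX_LEN, d_model)             # the order of events carries signal
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device)
        causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h = self.encoder(self.embed(ids) + self.pos(positions), mask=causal)
        return self.head(h), h                                 # logits, hidden states

model = CodeLM(len(vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)                  # ignore padding

# Self-supervised pretraining: predict the next code in each timeline.
for _ in range(100):
    logits, _ = model(x[:, :-1])
    loss = loss_fn(logits.reshape(-1, len(vocab)), x[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Use the final hidden state as a patient embedding for downstream tasks
# (a real pipeline would take the last non-padded position per patient).
with torch.no_grad():
    _, hidden = model(x)
patient_embeddings = hidden[:, -1, :]                          # (batch, d_model)
```

A downstream task could then attach a lightweight head, for example logistic regression for risk stratification, to these frozen embeddings.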
The big-picture idea we've worked on in our lab, particularly through PhD student Ethan Steinberg, is to train a machine learning model over everyone in Stanford Medicine and use that to build our foundation model, which we train at massive scale and then adapt to the many small or large use cases that may occur in the hospital, while getting the advantages foundation models bring: better performance, more robustness, and being faster to train and modify. The underlying hypothesis is that there is shared information and structure across every human who comes to Stanford.

We have a lot of work under these themes. Our first work was a paper called CLMBR, which took this language model idea and trained on everybody. We found up to a 19% increase in overall AUROC. Don't fixate too much on the metrics; metrics are complicated, and this is a loose indicator of utility. If you take that model and evaluate how it holds up against the state of the art, its performance decays 43% less over time than a standard approach, and it appears to be more robust when you transfer it across subgroups, so it has all of these really attractive properties. In a follow-up paper Ethan has been working on, we move from classification models to time-to-event models, where we're less concerned with a rough notion of whether something will happen and instead predict when it will happen, for prognosis and other clinical decision making. With this approach we find that it's not only eight times faster to train than the current state of the art, but you need 95% less training data, so you dramatically improve your ability to use a few examples to train a powerful model.
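For intuition only, here is a tiny sketch of one common way such pretrained patient embeddings can feed a downstream time-to-event analysis. It uses an off-the-shelf Cox proportional-hazards fit from the lifelines library rather than the specific modeling approach in the work described above, and the embeddings, durations, and events are synthetic stand-ins.

```python
# Sketch only: treat frozen foundation-model embeddings as covariates in a
# standard Cox proportional-hazards model. All data here is randomly generated.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n_patients, emb_dim = 200, 8

embeddings = rng.normal(size=(n_patients, emb_dim))                 # stand-in embeddings
df = pd.DataFrame(embeddings, columns=[f"emb_{i}" for i in range(emb_dim)])
df["duration_days"] = rng.exponential(scale=365, size=n_patients)   # time to event or censoring
df["event_observed"] = rng.integers(0, 2, size=n_patients)          # 1 = event, 0 = censored

cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="duration_days", event_col="event_observed")

# Relative risk for new patients, predicted from their embeddings alone.
new_patients = df.drop(columns=["duration_days", "event_observed"]).head(5)
print(cph.predict_partial_hazard(new_patients))
```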
The story isn't only pretrained models, which are a huge part of this foundation model era; it's also shared datasets we can evaluate performance on, and that's an area of health AI that has been challenging for various reasons, where I feel AIMI has done a really amazing job of leading and making sure diverse types of data can be made available. In conjunction with AIMI, we're working to release two datasets. One is called EHRSHOT: fully de-identified, full longitudinal EHR data for about 7,000 patients, plus a standardized benchmark for measuring few-shot performance, that is, how many examples you actually need to train something. We're also getting set up to release a dataset of 20,000 patients including CT scans, longitudinal EHR, radiology notes, and diagnosis and prognosis labels for standardized benchmarking. We hope this addresses the scarcity of data: machine learning folks really want to help and build new methods, but they're mostly limited to MIMIC, and we don't yet have a good way to test transfer learning and all the other exciting properties of medical foundation models.

I want to wrap up with the idea of these two worlds of evaluation. On one side we have natural language: pulling certain information out of a medical record, classic clinical information extraction, where you're interested in some fact or critical bit of information; a lot of problems fall under that bucket. In this new era you may also be interested in assistive writing, drafting notes or messages to patients, or rephrasing a note; if you don't have time to write a really nice letter, maybe you want GPT to make it a little nicer before you send it. So there are all these helping tools for drafting language, plus more complicated question answering and information synthesis. On the other side you have the more classic academic tasks: risk-stratifying patients, time-to-event models, phenotyping individuals for clinical trial recruitment or studies. Those tend not to be formulated as language tasks but as classic EHR research tasks. If you look at these from the perspective of reproducibility and open models, the language side has a lot of open models and the structured side has virtually nothing, so you can't replicate any of the advantages we've found in training these models, which is a huge reason we're pushing to release a lot of this as part of our papers, so other institutions can evaluate our performance. There's no holistic view, and we need to do a better job.

Finally, a piece of work we're spinning up now: why do we divide codes and text and view them as artificially separated things? Can't we combine them? We're working on training a large language model that captures all of a patient's history and combines codes and language, so you can interact naturally and do writing tasks, but also do risk stratification and more complicated reasoning tasks. With that, I just want to say that I really liked the talk this morning that said we need to move to a universe where we look at how AI enhances our abilities rather than replaces them, and the key to adoption is building new evaluation tasks where humans are firmly in the loop and we measure whether using AI tools materially improves patients' lives. Thank you. [Applause]

Thank you so much, Jason. Next I'd like to introduce Katie Link. Katie is a machine learning engineer at Hugging Face, where she leads healthcare and biomedical applications of AI. Before Hugging Face she co-led the creation of the largest open dataset of longitudinal brain MRIs, at NYU, so she's currently in New York. She is on leave as a medical student at Mount Sinai, and before medical school she was an AI resident at Google X and a data scientist at the Allen Institute for Brain Science. Katie.

Thank you, Ron. Today I'll be talking about what many people have been talking about: the movement toward using generative AI in healthcare.
I also want to motivate us to think that openness and open science principles may be one of the key components of this movement. A bit about me: I'm a machine learning engineer at Hugging Face where, as Ron mentioned, I lead healthcare applications of artificial intelligence, and I'm also a medical student between my second and third years at the Icahn School of Medicine at Mount Sinai. You may have heard of Hugging Face if you're an AI researcher or engineer: we are a collaborative open platform of machine learning models, datasets, and interactive demos. We also maintain a number of open-source libraries, like our popular Transformers library, and an open-source chat UI called HuggingChat. You may have heard that our mission is to democratize good machine learning.

Traditionally in machine learning research, open science has been part of the culture, and it has driven a lot of the rapid transformation of artificial intelligence. Researchers commonly share code, model weights, and demos, which allows other researchers to build on those results, reproduce them, and interrogate them. In medical AI, however, we've been a lot slower to let this open science culture permeate. For example, only two percent of radiology AI articles shared both code and experimental data, and as shown in this paper we're lagging well behind computer vision and natural language processing papers in using public datasets and sharing code. There are many reasons it's hard to do this in medical AI, and I have a lot of empathy for the process: I was part of the team that created one of the largest datasets of brain scans, and we're also working to open-source code through libraries like MONAI. As many people have mentioned, there's a chasm between developing and researching AI and implementing it in the clinic, and a lot of experts think one way to make progress on that is to change our culture to be more open science friendly.

For the most part, medical AI has focused on discriminative algorithms: your classic image classification, object detection, and segmentation algorithms in computer vision, and your classic prediction models, traditionally trained on large amounts of labeled data. Now we're seeing the movement toward generative AI, where you can give, for example, a text prompt and a text generation model will complete the paragraph or a longer story. A number of generative AI tasks have emerged in the past couple of months. There are text-to-image models, where you can give a prompt such as "a small right-sided pleural effusion" and the model will generate realistic-looking medical imaging, such as synthetic chest X-rays. Question answering is also a very popular use case, for example for consumer medical questions, and a lot of the focus has been on these large language models, especially in the past few months.

To give a quick overview of what the training recipe currently looks like: you start with a base model, usually billions of parameters, randomly initialized, and you feed it a lot of unstructured, unlabeled data. This can be text data, but it could also be protein amino acid sequences or, as Jason mentioned, ICD-10 codes, so maybe it's better to think about these as large sequence models. For now, let's focus on text data.
After training it in a self-supervised manner on all of that text data, you end up with a really powerful pretrained model, or foundation model, that has learned a good representation of whatever sequence you trained it on. Then you can go back to your classic supervised fine-tuning methods: you could, for example, train it on medical licensing questions to get a model that does well on the USMLE, or on PubMedQA; there are a number of different tasks you can fine-tune your model on. You can also train it on a large instruction-tuning dataset, which is also pretty popular right now. The next step, which is still very new, is reinforcement learning from human feedback. I won't go into too much detail on how this works, but as a brief overview: first you train a reward model from human feedback, and then you fine-tune your original language model based on the reward model's outputs.

With these new large language models, we've seen a number of them evaluated or used within the biomedical space, and there have been some common recipes for training them within specific domains like biomedicine. You can pretrain your model on biomedical text; you can train it on general text and then fine-tune it on biomedical text; or you can take a generalist model, train it on a lot of instruction datasets during fine-tuning, and then use RLHF to make it more conversational. All of these have been evaluated within the medical space.

There are also a number of ways to improve your model's usefulness outside of the training regimen. Prompt engineering is a very popular method these days: things like chain-of-thought reasoning, where you prompt the model to lay out a sequence of thoughts ("think step by step" is one really popular prompt), or prompting the model to critique its past outputs. Retrieval methods are also very popular, especially for grounding your model in factual information and updating the information it can pull from, as is using tools such as APIs or even smaller specialized models. One really nice example currently being researched is a model that uses a vector database to retrieve information and then uses a calculator to answer some of these expert medical questions factually.
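As a rough illustration of the supervised fine-tuning step in that recipe, here is a minimal sketch using the open-source Transformers and Datasets libraries. The base model (distilgpt2), the two toy question-answer pairs, and the hyperparameters are placeholders, not the setup of any published medical model.

```python
# Sketch only: adapt a small pretrained causal LM to instruction-style QA pairs.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "distilgpt2"                              # stand-in for a larger base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy instruction-tuning pairs; a real recipe would use thousands of curated examples.
pairs = [
    {"prompt": "Question: What does HbA1c measure?\nAnswer:",
     "response": " Average blood glucose over roughly the past three months."},
    {"prompt": "Question: Name a first-line medication for type 2 diabetes.\nAnswer:",
     "response": " Metformin, unless contraindicated."},
]

def tokenize(example):
    text = example["prompt"] + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=128)

dataset = Dataset.from_list(pairs).map(tokenize, remove_columns=["prompt", "response"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-demo", num_train_epochs=3,
                           per_device_train_batch_size=2, report_to="none"),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A real pipeline would use a much larger base model and far more data, and would typically follow this with the preference-based (RLHF or similar) stage described above.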
With all of these generative AI models becoming popular over the past few months, we've seen a number of possible use cases emerge, which some people might consider the future of generative AI in medicine. However, I think it's still really important to look back at the pitfalls we encountered with discriminative AI models and ask whether they still apply to generative AI, so that we can hopefully avoid them this time. Some of these include: problem selection, making sure we identify meaningful problems within clinical medicine that really utilize the power of these models; dataset and model transparency, since we still have a lot of questions about what many of these models are trained on, and we seem to be moving toward a world where that information is becoming less and less transparent; model generalization across different subgroups or sites; degradation of model performance over time; and meaningful evaluation in the real world, which is always a big issue when we think about deploying these models in medicine, because we want to know they're safe and effective. And as many people have said, human-AI interaction will remain very important going forward, including things like automation bias, especially given the new failure modes that generative AI brings, such as hallucinations.

There are also a number of new pitfalls we're encountering while researching generative AI in healthcare. These models are very general purpose and can be leveraged for a broad range of tasks, which means that if a model carries biases, those biases can propagate to many downstream tasks, so we need to be mindful of how that negative impact might scale. Hallucinations, or confabulations, are still a big issue: a model can sound very confident while outputting incorrect, non-factual information. Different types of adversarial attacks, such as prompt injection, are becoming more common; we're seeing examples on social media. Continuous model updates, where the model keeps being trained on human feedback, change how we do continuous validation. These models can regurgitate training data, which has obvious privacy implications, and when we share data by typing prompts into different chat UIs, we need to be mindful of what the consent process is and how secure that data is. Finally, human feedback and evaluation are an important part of these models, so we need to think about where that feedback comes from: is it from biased sources, from certain demographic groups, and so on.

I believe there are a lot of benefits to openness, particularly in addressing some of these pitfalls. One is the increased transparency that comes with open-sourcing models. For example, some of my colleagues on the ML Society and Ethics team at Hugging Face created a nice interactive demo where you can look at different diffusion models and see what the average generated faces look like for different groups of people; for instance, they entered "the passionate doctor" and compared the average faces produced by some of the diffusion models currently out there. Openness also lets us reproduce different models and their evaluations, and it increases accessibility, not just for other AI researchers and engineers but also for other stakeholders within domains such as medicine, like clinicians and patients. At the same time, openness has real challenges: patient privacy, since as I mentioned these models can regurgitate patient data; unintended uses once models are open, both malicious ones like spreading medical misinformation and non-malicious ones like patients self-diagnosing; and the fact that they're still very difficult to deploy safely, because the moderation tools and other systems built on top of these models to make them safer are hard to put in place for people who are just taking the models out of the box.
Luckily, we don't have to think about openness as binary, open versus closed; we can think about it as a gradient. One of my really fantastic colleagues, Irene Solaiman, recently published a paper on the gradient of generative AI release, and there are a number of different options to consider when releasing generative models.

To sum up, here are some best practices to consider when sharing generative biomedical models and demos. Formulate a release strategy based on that gradient; this is especially important if you used patient data, to make sure the release is compatible with the data-sharing policies governing the patient data your model was trained on. Invite multidisciplinary discourse among different stakeholders when you're thinking about releasing models and demos. Be mindful of your model's license; you have a number of license choices when you share a model, and I'd recommend considering the Responsible AI Licenses (RAIL), which include specific language for medical models. Create informative, detailed model and dataset cards, which really improve transparency and let you define intended and unintended use cases; when you make your model card, consider including disaggregated evaluations so that other users know how the model performs on different subgroups. Creating an accessible demo for your model lets a broader range of audiences, including non-technical audiences, interrogate or build upon it. Be clear about medical claims, which matters for regulatory purposes, and about user agreements, so that your users can give informed consent when they access your model. And finally, when you do release models, also release documentation and code so that others can more easily build on your work. Thank you for your attention. [Applause]

Thanks, Katie. And finally I'd like to introduce Dr. David Magnus. David is the Thomas A. Raffin Professor of Medicine and Biomedical Ethics at Stanford, the director of the Stanford Center for Biomedical Ethics, and he holds a number of other roles, so I'll let David introduce the rest of his background.

Thank you so much, Ron, and thanks very much for having me here. Two disclaimers to start with. First, this is going to be a little rough, because this is new content that I'm trying out. I'm kind of excited, because I run two main research groups: one works on ethical issues in AI, and the other studies communication between patients and physicians using tools from linguistics, anthropology, and the philosophy of language. Those have been very separate, and this is my first chance to publicly try to link these two very different things I work on, so I apologize in advance for making you all guinea pigs. I'm going to focus on the use of these generative models in general, but especially large language models, and their downstream use in clinical care: what it might mean as they are implemented clinically.
There is a whole other set of issues that arise just in the research to develop them, but I'm not going to focus on those as much.

First, I want to point out that the ethical issues that I and a whole bunch of other people have been writing about in AI apply to the use of large language models in healthcare. Biased data leads to discriminatory outputs. For some of these LLMs there are much larger training datasets, so some of the problems that have arisen in AI and healthcare might be mitigated by that, but at the same time, my understanding is that the internet might be somewhat biased, so it can still be a biased source: a lot of biased data is still biased data. One thing that's very interesting, and I appreciate that we're starting to talk about prompt engineering, is how you get very different outputs depending on how the prompts are worded; prompt engineering is really a new skill that we're going to need people to be able to do. But there is still bias in the data. I love this line, which has been attributed variously to the Talmud and to Anaïs Nin: "We do not see things as they are, we see things as we are." Ultimately, what we get out of these kinds of data is just a mirror of the biases we have in clinical care and in society, which are then reflected in the data. None of that is new in AI, there's a lot of literature on it, and it's going to be true in this area too.

Privacy issues: will more patient data be put into new, more clinically specific LLMs for training purposes? You already heard an example; the answer is yes, and that creates all kinds of privacy issues to worry about. Security issues are something we've been dealing with in AI for a while. Consent: do we need to get consent from our patients before using their data in these kinds of research and clinical applications? That's another set of challenges, but again ones we're used to dealing with when we think about AI and the uses of this data. And finally, issues around accountability and liability: once you start using an AI in different roles in clinical care, if something goes wrong, who is ultimately responsible? Is it the clinician who utilized it, or the people who designed the model that had systematic problems leading to patient harm? There is a reasonable, if not huge, literature; Glenn Cohen and others have written about liability and accountability in the uses of AI. All of this will be very present for the uses of LLMs. These are things we've been writing and thinking about for the last four or five years, and there's at least some sense of the direction of what it would take to deal with them.

So today I want to talk about the new issues raised by the uses of these large language models in healthcare. First: is there a right to know that you are talking to a bot?
Is there a right to know that a bot was used in formulating your care plan? I'm also going to talk about the problem of anthropomorphism of AI, which is a pet peeve of mine, and about the consent and privacy of the data from users of LLMs, and I'm going to go through these in reverse order.

Let me start with the consent and privacy of users. In addition to the worries about the data used for training, the companies building these large language models ultimately hope to make money from them, which means they're hoping to supplant things like Google. That means they're not just getting data from the internet; they're also going to get data from the users. Every time you go into ChatGPT and enter a query, there is information about you and about the queries you enter, and presumably at some point they're going to figure out how to monetize that through advertising and other things as they learn about you from what you enter. That's going to raise all kinds of interesting challenges. If you want to improve your grant submission, or work on an article you're writing, or figure out whether the data you're thinking about publishing should be organized one way or another, you've now given that data to a commercial third party. Will that end up being accessible to others? It probably depends on the large language model and the company. Could it be commercialized in ways people find unacceptable? Potentially, yes. People are a little freaked out when they find out what Google does with our information; on the other hand, we've sort of adjusted to it, so it may be that in the end people won't care any more about ChatGPT learning from their prompts than about Google learning from their search histories. But people should at least be aware that this is what's going on.

A worry that was already mentioned earlier: healthcare data, images, or anything that is protected health information could end up being entered for research or clinical use in a way that's not HIPAA compliant. I have a real worry about that, especially since a lot of these companies are really not set up to think about HIPAA and are not covered entities. And despite anyone saying that of course nobody would do that, I see a lot of non-compliance related to breaches of information. I am confident that protected health information has already been uploaded into ChatGPT by individuals, and I am confident it will happen again in the future. That's something we really want to flag and make people aware of, to be very careful, especially as fine-tuning of models explicitly for clinical purposes happens; I think that risk is going to go up.

Okay, now I want to talk about my pet peeve, which is anthropomorphism. I can't do an ethics talk without at least referencing a TV show called The Good Place, which I don't know if any of you have seen. This is Janet. Janet is essentially an artificial construct who helps people navigate around, and at one point in the plot they need to delete her.
As she leads them, helpfully, to the place to do that, because that's her job as an assistant, she shows them the button they need to press to delete her. Then she warns them: remember, I have no feelings, I can't die, I'm just a construct, but we have a fail-safe mechanism, so I will try to plead with you to stop you from doing it. As they go to press the button she says, please, please don't kill me, I don't want to die, and they back off. At one point she pulls up a picture of her three children and says, please don't deprive Skippy and Jimmy and Billy of their mother, and they back off again. And she reminds them: remember, I'm not alive; this is just an image taken off the internet; it was from a Nickelodeon concert.

The anthropomorphism that's built in is, I find, at best a pet peeve, but at worst it's potentially worrisome. It's one thing when it's done by people in the public, but the worry in an article published in AJOB Neuroscience is that this kind of language doesn't just permeate the public, it permeates researchers, programmers, and designers, and we've already heard everybody using it. I hate it every time I hear terms like "hallucinations." ChatGPT does not hallucinate. What it does is function in the way it is designed to function. It is not a problem with a car that it cannot get you to Chicago in one day; you need an airplane to do that. It is designed to do certain things and it does those things, and that is not a hallucination; it has to do with the way it's designed. Talking about "understanding" is at best wishful thinking, but it may also lead to problems, because if you start to think about this as an agent and frame these problems that way, you misunderstand the nature of what the thing is, and I worry that may lead us astray about what the real design problems are. It's not a glitch, it's a design feature, and once we see that, we have to think about it differently than we would about an agent.

We always have to remember that, like Janet, the behavior of ChatGPT and these LLMs is purely syntactic and not semantic. Two days ago I entered into ChatGPT, "What is the name of the only grandson of Paul's grandfather?" and the answer I got was, essentially, I'm just an AI, I don't have enough information about who Paul is or any biographical information that would allow a judgment to be made about this. It just doesn't think about things the way we do; it's making a prediction based on a string and on what the most probable next token or sentence is. It's important to think of it that way, because the certainty that comes out of it feeds this anthropomorphism, and it is also a function of a design decision. When you do Google searches, because of the probabilistic nature of the algorithms, you get a whole range of answers. You don't get that with ChatGPT, which leads to thinking about it in a certain way, but that's just a design decision. They could have presented outputs with probabilities: there's a 98 percent chance it's this, a 92 percent chance it's that. They've chosen not to present the outputs that way, and it's easy to read into the output things that really aren't there.
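That underlying next-token machinery is easy to inspect with any small open model. Here is an illustrative sketch, using distilgpt2 as a stand-in since ChatGPT's internals are not exposed, that prints the probability the model assigns to each of its top candidate next tokens rather than a single confident continuation.

```python
# Sketch only: show the probability distribution over next tokens that a chat
# interface normally hides behind one fluent answer. distilgpt2 is a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

prompt = "The patient was started on"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]             # scores for the next token only
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>12s}  {p.item():.1%}")
```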
Okay, so now let's get to the even meatier issues. Do patients have a right to know that an AI was used to formulate a care plan for them? There's a split in the literature about how to think about that. Glenn Cohen has written a nice piece laying some of this out, which will be coming out in the American Journal of Bioethics, of which I'm the editor-in-chief. One way of thinking about it, and he's approaching this very much from a legal perspective and what courts are likely to say, is that patients are generally entitled to the information a reasonable patient would want to know; if it's something a reasonable patient would want to know, then it's something they should be told. He makes an analogy: if at the last minute, without the patient's knowledge, you substitute a different surgeon to come in and perform the procedure, and there have been court cases like this, we usually assume patients would want to know who's operating on them, and therefore this is information that has to be given to them. If an AI is used in their care plan, that analogy would suggest patients have a right to know. The second approach points out that we don't generally explain in detail all the factors that give rise to physician judgment about a care plan. We don't say, well, this person trained at Harvard; if they had trained at Stanford they probably would have done things differently, but because they trained at Harvard they like this approach; and the drug detailer who came by their office, this drug company uses cheerleaders who are really cute so they like them, whereas this other company just buys lunch and this physician is dieting, so that's why they like this drug better. We don't go into the various factors that govern why physicians formulate the care plans they do, and that's generally seen as outside the things patients can reasonably ask to be told about. So to some extent the answer to the question, whether this should be seen more like background knowledge, like a conference the physician attended or information they've picked up, or more like substituting the surgeon, changing who is really responsible for what's going on, probably depends on how the AI is used. There's a lot more work to be done distinguishing these two and figuring out which examples belong in which bucket.

Okay, now let's talk about conversational AI and some of its uses, because it's already here and it's going to be here more. Right now we use AI in our texting: when you text people it suggests the next thing to say, and you can just tap it. LLMs are like a version of that on steroids. Given that over COVID the number of email exchanges between clinicians and patients exploded, there are obviously real applications in that space, and there are also conversational aspects of clinical care that are important, including therapy. Here is an article in AJOB, my journal, about the uses of conversational AI in psychotherapy and whether or not there's a right to know that you're communicating with a bot.
There was a mental health app that tried to help volunteers write messages more quickly, basically using AI to generate messages to help people, which is what the app was designed to do. However, once people knew they were communicating with a bot, even if the messages were the same as what they would have gotten from a person, their efficacy dropped. So knowing who said it matters. That's something I want to emphasize: it's not just about the content of the message, it's the context, and that includes who people perceive it to be coming from, and that seems to make it less effective. In that study, the success of conversational therapy sometimes depends on recognizing that your interlocutor is a person and not a tool. So it may have use as a tool in the therapeutic realm, but it can't be a substitute, because the sense that you're talking to an actual moral agent, a human being, may be important for therapeutic success regardless of what is being said.

Here's a really interesting report from outside of healthcare that builds on those findings and amplifies them in ways I found really interesting. They looked at a couple of different contexts, in fairly large studies of communication between people, some in more informal social situations and some where people were working together to help formulate a policy. It was essentially an upgraded version of those suggested-text tools inside a communication tool, and they studied the quality of the communication as well as which messages people thought were AI generated and which weren't. There were three findings. First, people were terrible at knowing when they were interacting with an AI-generated response, in both directions: some things they thought were AI generated weren't, and some things they thought weren't, were. Second, if people believed a response was generated by AI, they responded negatively along multiple dimensions, so it had a negative effect on communication when people thought they were communicating with an AI. Third, because people were bad at making those predictions, the actual AI-mediated communications were better along multiple dimensions than when AI was not used. So it makes things better, but only as long as people don't know about it, and yet lying is not a good solution and not one that's going to be very sustainable.

What I think this highlights is something important: it's not just about syntax, and it's not just about semantics, the meaning of the words that are used; it's about something that in linguistics, linguistic anthropology, and the philosophy of language we call pragmatics. To be clear about the difference: semantics is about the meaning of words, the propositional content expressed in an utterance or act of communication based on the conventionally understood meaning of terms. Pragmatics, which is something I spend a lot of time studying, is what is conveyed by the speaker in addition to the information that is literally stated, and it is often the primary thing being conveyed.
We sometimes think of language as being just about the conveyance of information, but that is really a very tiny part of what we do with language: we do things with words. If I say to two of you, "I now pronounce you man and wife," you are not legally married; but in a different context, if I'm licensed and I have the right people, by uttering those words I have created a marriage. We do things with words: we make requests, we request physician-assisted death, we give informed consent, we do all kinds of different things. Those are called speech acts; we perform things; we use language as a performative. That's something we need to get a better handle on here, which I'll talk about more, and we need to understand the effects of utterances on the listener over and above the conveyance of information.

To give some examples that make this clear: indirect speech acts and conversational implicatures are ways in which we use words to convey things over and above their literal meaning. Do you know what time it is? [An audience member answers.] Sorry, I was using that as an illustration, and you did not actually answer my question: I asked whether you know what time it is, so the answer would presumably be yes or no. The literal meaning of the words I used was a question to which the answer is yes or no. However, you assumed there was really no reason why, in the middle of a talk, I would become fascinated with the state of your temporal knowledge and want you to report on it, and instead you inferred that my intention in asking, related to the timing of the talk, was to find out the time. That is, you interpreted my literal question as a request. We can use the same sentence in a lot of ways: when my wife is running late and I say, "Do you know what time it is?", I'm using that same sentence not to make a request or ask a question but to make a complaint that she's late. At any moment now I expect someone to say "Do you know what time it is?" to get me off the stage. And when you're talking to a patient who has certain clinical problems, you might ask it as a literal question, as part of a mini-mental status exam or a MoCA. These are all ways of using the exact same words to perform different speech acts, which depend in complex ways on the context of the utterance, the intentions of the speaker, and the interpretation of the listener. We all know that if I write a letter of recommendation to medical school saying the student has really neat handwriting and is very punctual, that is a bad letter of recommendation: I have just told them, over and above the literal meaning of the words, that this is a lousy student, do not take them. An AI that only captures the semantics won't capture the pragmatics of this. These are all things we do with language, and of course all metaphors work this way: when I say my love is a rose, I'm not saying my wife is a particular type of plant, or that she has thorns, though maybe I'm using it to say she's thorny, I don't know.

So what are the implications of this? The pragmatics of AI-facilitated communication has been understudied.
So far we have a few studies highlighting what we would call perlocutionary effects, the effects these utterances have on listeners, but we're just at the very early stages of bringing the powerful tools we have from linguistics, linguistic anthropology, and the philosophy of language to really dig in and study what happens in these very complicated communication exchanges between physicians and patients when AI mediation is used in all kinds of complicated ways, with varying degrees of independence. It could have unforeseen impacts on the patients who receive the messages and on their relationships with clinicians. And figuring out the illocutionary force, that is, what type of speech act it is; we just went through how "do you know what time it is" can be used to do very different things. How is that going to play out when you're using an AI-mediated communication tool to create a speech act in interacting with a patient? All of this is very understudied and something I'm very excited that we're hoping to look into and learn a lot more about.

I just want to end by saying, first, that human-to-human communication is hard. It's not just hard, it's harder than you think it is. Our whole field of study exists because it's so hard: patients and physicians who think they're communicating, and think they're on the same page, usually are not, and we can study that empirically. Now we're adding AI mediation on top of a process that is already tremendously challenging. We do have some evidence of ways it can make things better, so I think AI-mediated communication has the potential to improve communication; at the same time, it can also make it worse, depending on a lot of details about how this works out. The inferential chain has to go back, and this is the thing I want to end with: it's not just the words that are used or expressed, it's the context and the intention behind it, and the inferences about those intentions, which means you've got to see the person behind it who's using it in a mediated fashion. I'm worried about language that's completely independent and divorced from context, and about what that would mean. I'm going to end with that. Thank you very much. [Applause]

All right, well, thank you so much, David, and thank you all for your great talks. I think Joanna mentioned we have about ten minutes, so I'll start off with one question, but I really would like to leave some time for the audience as well. My question, and I was thinking about this as you were presenting: it seems striking that with ChatGPT being released, all of a sudden we're very excited about AI and how close we are to AGI and all of that, but at the same time there have been these vectors of progress this whole time, whether around managing datasets and openness, foundation models, the ethics of AI, which people have been working on, and even something that seemed totally separate but is now very relevant: understanding language, the idea of language and how it impacts how we think and feel and how society works, which prior to ChatGPT probably wasn't a huge topic in AI circles but now sits right in the middle of it.
you if possible to to comment on one vector of progress that you're closest to and contextualize that um with what that the broader conversation of you know this this this you know this inflection point in AI like where where is that Vector now and uh are there new vectors that we haven't thought of or are we just kind of talking about the same thing this whole time but now we all have just more attention pay to it Jason do you want to start sure yeah um I mean I think we you know largely enabled by companies like hugging face are way better at sharing stuff now and the ability to iterate and not replicate work like has enabled like a lot of really amazing Innovation reproducing results I think we're not there yet in medicine and that should be really a first tier goal to start really interrogating claims made about these models and their abilities is to get them share them in a safe way right we have to you know be very cognizant of the patient protection at play but getting those out there to test at other places on different populations of humans for people to interrogate in really critical ways I think that is an area that I think there's finally some consensus that that's critical if you actually want to use Ai and so I I'm you know more uh positive that that will happen sooner than I was in maybe a couple years ago in addition to obviously increased methods to be able to share models and data sets I'm also really encouraged by the progress that's being made on the accessibility of these models to be run locally on device being able to you know make these models smaller and smaller so that you know you don't have to think about some of these privacy concerns which are obviously a big concern when you're thinking about programs like Chachi PT as well as to be able to more easily fine-tune these models on consumer gpus some of like the parameter efficient fine-tuning libraries that are coming out these days I think are really going to be incredible towards making progress on making these models more usable making them more personalized and kind of fine-tune them towards tasks that are really meaningful within Healthcare yeah I I I'm I'm struggling a little with this question so I I would say the biggest progress that I'm hoping will make though I think it's far from certain is recognizing language as a contextualized system and uh um often that's not what happens even people who study communication between patients and Physicians uh there tends to be a focus on semantics and the meaning of words of being clear a little bit Automotive content and the pragmatics are usually left out but you can't really do some of that with with with once you start to think about language being used uh through an AI and so I'm hoping that this is the opening of the door into really recognizing the social context of language to a greater degree than really people have in medicine people in linguistics have but it hasn't really permeated medicine very much yeah I think that's that's a really interesting point David in particular because uh you know we definitely talked about the impact of AI on workflows and uh the second third order effects particularly in medicine but it's usually assuming that the AI output is some kind of an alert I mean there's been so much that's that's been done about you know looking at automation bias automation fatigue Etc but it's all kind of assuming that this AI is some type of alert or something that will discreetly change the system whereas now all of a sudden we're very it's 
very clear that the impact of AI on people it's going to be mediated through language and uh you know it does seem like a new Vector right that previously wasn't really part of the AI conversation but now it's critical um great well I really love to open it up to the audience for questions um so yeah if you'll just raise your hand and I'll call on the first one who's uh who raises their hand thank you thank you so much for sharing your experience and this institute talks so my question is about AGI artificial general intelligence again I am a clinicians and also a practitioner of a professor type A Medical University I am more worried about how AGI is going to change healthcare because I am a bit skeptic with AGI than AI so what is your take AGI is my pet peeve I hate talking about it so I think it's I think it's very far away I think we are are deceived for many of the things that like David highlighted about how far we actually are because we're glomming under very surface statistics and in these you know over being fooled and I think that'll go away in time but we're nowhere near close like an embodied agent that operates in the world to be able to understand the types of things that like in theory AGI might even look like it needs but I have to agree that AGI is another pet people I think I think we need to come to a kind of a definition of what AGI means still within the community and until we do that I think it's going to be hard for us to really know if we've reached that point um so I'm looking forward to seeing if people are able to to come up with that type of definition and you know how we think about how that might affect um future regular regulation for example or you know what AI kind of means in in the society for for the ethics of this I TR I have a rule of thumb it's hard to predict the future I've been wrong before but it seems like this is more than five years away and I have a rule that I don't worry about any ethical issues that are more than five years away so yeah I like to just add because you said you're a clinic and the question was specifically from a clinical perspective right um your concerned and I'll just quickly add as a clinician an internist um I think your point about how we Define AGI is probably you know it's really important because AGI means very different things what I will say though is that the difference between working with some kind of prediction model where the only thing it does is predict Aki acute kidney injury you know versus now you have this large language model where you can basically it can do a lot of things I mean it's not AGI but it's certainly the output is much more generalizable much more General in terms of like its different capabilities so that is a difference and I I'll be honest you know I think about it like hey you know I this is kind of what I do you know how what am I going to do that's unique and not purely just retrieving the information synthesizing information rewriting it which a lot is a lot of I think you know what a lot of us do right so I'll just put that out there as a kind of a devil's advocate someone else raise their hands I think you in the back you may have raised your hand first oh go ahead hi this is Wade uh my question is for catty like uh even though hugging face is bringing like so many open source cellular models but still big companies like Google and meta are like bringing like everyday new models which are much larger how do you see open source Community like compared to big Tech uh in future for 
A lot of these companies are sharing open models; I think they do recognize that it has importance for their own progress, helping people build off of their models. I think we try to be collaborative with everyone, whether you're a big company or a small researcher; we don't want to shut the doors on any progress that's being made. But I really do believe in the community's efforts to create better and better open source models. Like what Jason mentioned, we need more and better models, and I really do see the community working toward that goal. Obviously there's the leaked Google memo suggesting that open source might be the thing that disrupts any moat, so I think they are somewhat worried about that, but we'll see how it plays out. I think it's still an open question which force will win here.

How do you both see it over the next few years of AI, because big tech has some sort of moat considering the resources?

Yeah, definitely there are a lot of moats that can form: compute is obviously very expensive, and if you're a player that has already been in the space for a while you may have created a large proprietary dataset. I do think that, like I mentioned earlier, some of the methods coming out that let you better fine-tune models on consumer GPUs, for example, will help with some of this. Obviously we're working really hard to open source a lot of the techniques coming out, like RLHF, so hopefully we can create a kind of community-led movement to democratize a lot of this technology, so that this powerful technology doesn't just stay in the hands of a few major tech companies.

Thank you. I think we have time for one more question. Oh, okay, sure.

Thank you, David, for the funny talk; you made us laugh. You asked a question: do patients need to know whether AI was used in their care, or whether they're talking to a chatbot? I know the question was not for me, but I'm going to behave like an LLM and assume it was. Yes, we would want to know, but what ultimately matters is the physician: was the final decision made by my doctor? So the question is, who regulates that? Should it be a regulation? Should the FDA regulate, or the hospital, or the doctor saying the final call is mine?

Yeah, I think it's going to depend on the application. Most of this is probably going to be used as a decision support tool, the way we use other tools, where the clinician gets an output and uses it in different ways. But there may be exceptions where there is some genuinely autonomous AI decision making, and some of that might lead to better care, since we know clinicians often behave in idiosyncratic ways and do things they shouldn't based on the evidence, and sometimes the data that comes in is better and proves it. I think you still need an agent there who is the final check and has some notion of responsibility, even if the real decision making has largely been turned over. But I do think the answer is going to be that it completely depends. What the role of physicians is will depend on the particular application and the particular example, and then what the patient is told will vary with that use. The more the clinician is involved as a mediator, the more this is just like how we don't explain to patients how the EHR works to generate the things we read as we're looking at a chart before going in to talk to them; if it's like that, then that's not a problem. But the more independent it is, and the more it's really the driver, even if it's in the best interest of the patient, there may be a threshold where you say, okay, at this point a reasonable patient would want to know, and probably the courts will enforce that.

All right, do we have time for me to ask a question? Michelle, do we have time? One question, okay. I have a question for you about open source. I'm doing a project around something called dual use research of concern, or sometimes reasonably foreseeable misuse of AI models, and there was a very popular, very scary publication by some folks using AI for pharmaceutical modeling who realized that, gosh, if we just tweaked our model a little bit, it turns into a great bioweapons system to help you design new and better agents for warfare. There are obviously lots of ways open source for that kind of thing could be dangerous. How are you thinking about open source in relation to the increasing recognition that some of these applications could be used as surveillance tools by repressive regimes? How do you deal with the tension between open source and the potential for dual use research of concern?

Yeah, definitely; it's obviously a big thing that we think a lot about across our entire team. When you're using any tool, it's the human who is in charge of that tool, and creating bioweapons, for example, is regulated, or illegal, and that's still going to be the case whether you're using traditional methods or something like an AI model. So I think we should focus on the fact that these are illegal use cases. There are ways we can moderate some of these outputs, for example, but there are already a lot of large language models that have been released, so you can't necessarily put the genie back in the bottle. We also think that creating models and open sourcing them for the larger community, to do more good or maybe to stop some of these bad use cases, is another way these models can be used for good when you open source them. And I think some of the negative effects of keeping them within certain communities or certain organizations are a net negative when you're weighing whether to open or close these models; keeping that concentration of power in certain hands could overall lead to more negative impacts than positive ones.

All right, thank you very much. A round of applause for the panel. Thanks, Ron.
Info
Channel: Stanford AIMI
Views: 26,605
Id: _u-PQyM_mvE
Length: 72min 6sec (4326 seconds)
Published: Tue Jun 20 2023