AWS re:Invent 2018: [NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398)

Captions

Hello everyone, thank you so much for your time today. My name is Taha Kass-Hout. I'm a senior leader at Amazon focused on healthcare and AI-related initiatives, including Amazon Comprehend Medical, which we're going to be discussing today. It's a great opportunity: we hope to have a bit of a deep dive with you today and also get the opportunity to listen to some of our customers.

One really important thing over the last decade, here in the United States and globally, is the speedy adoption of electronic health records. Basically, your records are now digitized, as opposed to the reams of paper that we all had to carry around from one place to another. Now, as with any kind of data, if you're trying to extract insights and actionable information, you need to have access to structured data. While the electronic medical record holds quite a bit of structured data, unfortunately the information you need in order to act is oftentimes missing from those structured fields, and oftentimes the treasure is within the unstructured part of the record.

The unstructured part is the most active part of the record. This is basically when you have an encounter with the healthcare system, with the patient and the doctor or the caregiver oftentimes documenting that experience; or when you transfer from one place to another, that record travels with you. It has a lot of information about your medical conditions, the history of your condition and those of your family, as well as the medications you're on or have stopped, along with other nuances that might be related to this. For example, this drug might be related to this condition, or this medication might have been stopped in the past, but there is some other information about it that might be relevant to you, and so on. And the challenge, as you use natural language processing (those of us who have been in the field for a long time know this), has been that this
experience of how you capture this information is so broad and complex that you cannot really capture it with simple rules, so oftentimes a lot of manual effort goes into capturing this information. This is really where the learning part is very, very important: you learn from many, many interactions and encounters as you look at the variety of these medical notes, whether it's from your primary care doc, or from someone in the hospital or in the ICU, or related to pathology, or related to X-ray, or related to a triage note. You want to be able to capture all of that in ways that let you build something that can extract that information and structure it in meaningful ways, so you can start deriving the number of use cases that you're going to hear about from our customers today.

Some of these use cases, as you'll hear from Fred Hutchinson, which is a research institution, are about how I can match a patient with the right medication, or to a new therapy that might be more beneficial to them. As you'll also hear today from Roche: how can I look at the longitudinal health journey of an individual, and then be able to stitch all of these together and aggregate, so I can have a better picture of the population that I'm evaluating? And then how can you do this at scale, with regard to accuracy, with regard to volume, and with regard to the wide breadth of things I can extract from these medical notes?

With that, some of the things that we've been hearing from our customers are quite varied. Some of these customers are hospital systems or healthcare providers that are looking at how to identify patients that might be at risk, and/or might benefit from a different treatment. Others are insurers, billers, and coders that are looking at: did I capture all the treatments and procedures accurately, so I can bill for them accurately, and can I do this across the entire record? Or, as you look at pharmaceuticals, the recruitment of
patients into the right clinical trials in a way that's accurate and precise. So without further ado, I'd like to introduce you now to the most honorable Mr. Arun Ravi, who's been the product manager for Comprehend Medical. He is going to introduce you to the service and start diving deep, as well as introduce you to the customers. Thank you very much.

Thank you, thank you. Thanks. I'm really happy to be here and excited to talk about what we built with Amazon Comprehend Medical. Amazon Comprehend Medical is a new HIPAA-eligible service that uses machine learning to extract medical information with high accuracy, reducing the cost, time, and effort of processing large amounts of unstructured medical text.

The way I want to take the next 20 to 30 minutes is, first, to reiterate the customer problem and really focus on what we're trying to solve. Then: what is the service, and what does it do? We've put a lot of thought and effort into putting the right features together to give the output that we think will give customers the most benefit. Secondly, where can you use this? We see this as a really interesting service that you can consume in many different ways: it fits with existing applications and can help you build newer applications. Then we'll have two of our great customers come on stage and talk about some of the things that Taha mentioned. And I usually like to end with a call to action: essentially, what we would like to see, what feedback we would like to have from customers, and how we believe we can take this further.

So again, drilling down on this: 1.2 billion unstructured clinical documents are created per year, according to the HIMSS Health Story Project. Not all documents are created equal. Size varies: in oncology you have longer notes; you may have admission notes, SOAP notes. There's a huge variety of data that's created, and in that there's so much critical information that's actually trapped in these
documents. Even if you just take a step back and try to get the insight out of, say, a discharge note: how do you do that at scale, how do you do that consistently, and how do you do that with an accuracy that actually helps your applications as you build on it? These problems are, honestly, not going away. There's more data being created, and a lot of the retrospective look at data is based on billing data, not on patient data. In the end, patient data is the source of truth, and that's what we want to unlock with our customers, so they can see what more they can do with their applications.

So, looking at what we've launched: we have two APIs. The first is the NERe API, and then the PHId API. We have five entities, or categories, that we extract, and each of these entities has subtypes as well; I'll talk about how we tie that in. We have medication; medical condition; test, treatment, and procedure; anatomy; and PHI, protected health information.

Now, once you've extracted, okay, that's cool, that helps. But how do you tie the related entities together? Medication could have dosage, route or mode, strength, frequency. So we do something very unique once we extract these entities, and that is relationship extraction. We basically tie medication to dosage; another example could be test and result. There are many more, because there are different subtypes associated with medication and with test, treatment, and procedure, and those are the two entities that we actually do that for.

But we don't stop there, because we realize there's context in the note, and there's a lot of that which we're trying to surface through something we're calling entity traits. An example of an entity trait is negation, which we've launched with. Negation is, say, "patient denies taking this medication", and I have a few examples. And negation is not just "denies" or "not taking"; there are variations around that.
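As a rough illustration of the entity categories and the negation trait described above, the following Python sketch walks a hand-written response. The field names (Entities, Category, Type, Traits, Name, Score) follow the response shape of the service's DetectEntities operation as exposed through the AWS SDKs, as best understood here. The sample text, offsets, and scores are made up for illustration and are not real service output, and the helper names are hypothetical.

```python
from collections import defaultdict

TEXT = "Patient denies taking Lipitor. History of hypertension."

# Hand-written sample (an assumption, not real service output). With AWS
# credentials configured, a real response would come from a call such as
#   boto3.client("comprehendmedical").detect_entities(Text=TEXT)
SAMPLE_RESPONSE = {
    "Entities": [
        {
            "Id": 0, "Text": "Lipitor", "Category": "MEDICATION",
            "Type": "BRAND_NAME", "Score": 0.99,
            "BeginOffset": 22, "EndOffset": 29,
            "Traits": [{"Name": "NEGATION", "Score": 0.95}],
            "Attributes": [],
        },
        {
            "Id": 1, "Text": "hypertension", "Category": "MEDICAL_CONDITION",
            "Type": "DX_NAME", "Score": 0.98,
            "BeginOffset": 42, "EndOffset": 54,
            "Traits": [],
            "Attributes": [],
        },
    ]
}


def group_by_category(response):
    """Map each entity category to the list of extracted texts."""
    groups = defaultdict(list)
    for entity in response.get("Entities", []):
        groups[entity["Category"]].append(entity["Text"])
    return dict(groups)


def has_trait(entity, name, min_score=0.5):
    """True if the entity carries the named trait at or above min_score."""
    return any(
        trait["Name"] == name and trait["Score"] >= min_score
        for trait in entity.get("Traits", [])
    )


def negated_entities(response, min_score=0.5):
    """Texts of entities flagged as negated (e.g. 'denies taking ...')."""
    return [
        entity["Text"]
        for entity in response.get("Entities", [])
        if has_trait(entity, "NEGATION", min_score)
    ]


if __name__ == "__main__":
    print(group_by_category(SAMPLE_RESPONSE))
    print(negated_entities(SAMPLE_RESPONSE))
```

The min_score threshold is a design choice left to the caller: a stricter value trades recall for fewer false negation flags.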
We're very happy with the accuracy of the results that we're getting, and with providing that to customers in an easy-to-consume format. The second trait is whether it's a diagnosis, sign, or symptom. I'll talk a little more about it and what the differentiation is, but being able to understand that is extremely critical in building smarter applications. And what I'd like you to take away from this, if anything, is that we want to distill a complex process into a very simple API call. I'll also add that the service is HIPAA eligible and it's stateless, which means that by default we don't store any customer data that flows through the service, so no training will be done on any data that flows through the service.

The second API is the PHId API, and that is just extracting PHI. The reason we separated it is that there are a lot of compliance, data security, and data privacy use cases where our customers have come to us and said, "We would like to do this at scale," so we felt it was appropriate to provide it as a separate API.

So let's go step by step, so I can explain what the API actually does, and let's see how it goes from one end to the other. Let's start with the entities. So there's simple text there: Mr.
Smith is a 63-year-old gentleman with coronary artery disease and hypertension. Current medications: taking a dose of Lipitor, 20 milligrams once daily. Let's start with the extraction part. We extract PHI: the name and the age. We have anatomy, which is coronary artery, so that's a system organ site. If you look at medical condition, coronary artery disease is a diagnosis name and hypertension is a diagnosis name. For the medication, Lipitor, we look at the dosage, which is 20 milligrams, and the frequency, which is once daily. As you can also see, the API is structured to be really easy to use, basically from the CLI; it is very easy for you to run this. And when I do the demo of the JSON output, you'll see how we put a little more thought into it to make the output of the API usable right away. That's something we're really focused on now, and in the future as well, as we keep iterating on this product.

Now, moving on to the next step, which is relationship extraction. Let's take a subset of that and look at it. When you take "current medications: taking a dose of Lipitor, 20 milligrams once daily", what we do is tie the "once daily" and the dosage to the brand name, which here is Lipitor. We do the same thing for test, treatment, and procedure: you could have a certain test and a result, and we tie the result to the test. In the JSON output, which I'll show a little later, we nest it within that parent category. The advantage is that if you take it and put it into a database, you can sift and sort immediately. Those are some of the minor changes we've made that we think will really help customers, not only in using this in current applications but in building on it as well.

Now, entity traits. What is an entity trait? It's more contextual information about what has been extracted. The ones that we have at launch start with negation.
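To make the "put it into a database and sift and sort" idea from the relationship extraction step concrete, here is a minimal Python sketch. It assumes a response in which related attributes are nested under their parent entity in an Attributes list, each with a Type, Text, and RelationshipScore. The sample below is hand-written for the Lipitor sentence, not real service output, and the table and function names are hypothetical.

```python
import sqlite3

# Hand-written sample (an assumption, not real service output): dosage and
# frequency are nested under the medication they relate to.
SAMPLE_RESPONSE = {
    "Entities": [
        {
            "Id": 0, "Text": "Lipitor", "Category": "MEDICATION",
            "Type": "BRAND_NAME", "Score": 0.99,
            "Attributes": [
                {"Id": 1, "Type": "DOSAGE", "Text": "20 milligrams",
                 "Score": 0.98, "RelationshipScore": 0.99},
                {"Id": 2, "Type": "FREQUENCY", "Text": "once daily",
                 "Score": 0.97, "RelationshipScore": 0.98},
            ],
        },
        {
            "Id": 3, "Text": "hypertension", "Category": "MEDICAL_CONDITION",
            "Type": "DX_NAME", "Score": 0.98,
            "Attributes": [],
        },
    ]
}


def attribute_rows(response):
    """Yield one flat row per nested attribute, keyed by its parent entity."""
    for entity in response.get("Entities", []):
        for attr in entity.get("Attributes", []):
            yield (
                entity["Text"],
                entity["Category"],
                attr["Type"],
                attr["Text"],
                attr["RelationshipScore"],
            )


def load_rows(rows):
    """Load flattened rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_attribute ("
        "parent TEXT, category TEXT, attr_type TEXT, "
        "attr_text TEXT, relationship_score REAL)"
    )
    conn.executemany("INSERT INTO entity_attribute VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def frequency_of(conn, medication):
    """Return the frequency strings tied to the given medication."""
    cursor = conn.execute(
        "SELECT attr_text FROM entity_attribute "
        "WHERE parent = ? AND attr_type = 'FREQUENCY'",
        (medication,),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    connection = load_rows(attribute_rows(SAMPLE_RESPONSE))
    print(frequency_of(connection, "Lipitor"))
```

Keeping the relationship score in its own column means a query can also filter out weakly linked attributes before they reach a downstream application.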
As you can see in the example here, there's "discontinued", "denies taking", "not taking", "stopped": there's a variety of different ways that clinicians actually document this, and we had to make sure that we covered a wide variety of them to show that, okay, this is actually negated. That information is extremely important, especially, as Taha talked about earlier, in some of the use cases where you're looking at patient history, where you're looking over a long timeline. This is critical information. Even when you get into clinical trial recruitment, you need to know what medication they may have taken: Avastin, and now they've stopped. There are different things that come into understanding what the patient is taking at any given point in time, over a period of time. That is critical to capture, and we're able to do that with Comprehend Medical.

Now, when you go into diagnosis, sign, or symptom, the reason for the differentiation is that a symptom is something that's patient reported, a sign is physician reported, and a diagnosis is the cause behind the symptoms, which could be determined through physical findings, lab reports, or radiology reports. So in the end, this differentiation is very important to know. You may want to create an application that just pulls all the self-reported outcomes from a patient, versus what is physician reported. There are different ways that you can now sift and sort through that data, which we believe is really useful for, again, a lot of the applications that we're seeing.

With the PHId API, again, it's a very straightforward API: it extracts PHI. The NERe API also extracts PHI, so you get it with that, but we've also separated it out so that it focuses just on this model, on this extraction. And as you can see here, we don't just extract PHI or tell you there's PHI; we actually identify it, and we're doing it by
the relevant patient identifiers as described in the HIPAA Safe Harbor method of de-identification. As you can see in this example, it actually tells you the subtypes: name, age, address, and, for example, teacher. It's a modification of the previous example, but he currently lives in Seattle and works as a teacher; his PCP, Dr. John, works at the University of Washington. So essentially we capture the name, and we capture the University of Washington as an address. We see this as a really important use case, and I'll jump into the use cases later and talk about where we see our customers using this and where we believe it can go as well. Being able to do this at scale provides something easily deployable. Whether it's HL7 messages, or anything that's going between different healthcare systems, or within an existing system between different hospitals, there are easy ways to deploy this. We see a lot of our customers excited about it and about what they can do, because this has traditionally been a very manual process.

So I'm just going to switch into the demo quickly. This is the live console. I don't know if you've seen it, but we're pretty excited about how we've represented this information, because more important than just returning an output, we want to make sure that you can actually see the relationships, see what we're extracting, and understand that visually. In the end, you're trying to break out these notes and put them into applications, and it helps if you really understand how the service works before doing that. So this is a de-identified clinical note, pretty straightforward. Let's analyze it. These are the insights, so let's deep dive into this. You can see here that we pull the age and the profession, and this is PHI. We have sleeping trouble, which is
a symptom, so it's under medical condition, but it's labeled as a symptom. You can also see how we capture other symptoms here, and with the system organ site you have face [unclear]. We also separate it out to make sure it's easy to read, because clinical notes tend to be very dense, so it's important to really show those relationships in a unique way.

Now, if you look down: I like focusing on relationship extraction, because the extraction is one part, but really being able to relate these entities is critical, because in the end it allows for a lot more structured queries. If you want to search for how many of my patients are taking a certain dosage, or how many patients are taking a certain dosage at a certain frequency, those are things that are extremely valuable, and you can do those searches right off the bat. We really see that as important for a lot of the use cases we talked about, but clinical decision support, we feel, is something this can impact immediately. And just being able to do this in one API, we see that as a lot of the value that we're providing to our customers. Thanks.

So let's go into Vyvanse, the medication. If you look here, Vyvanse is the medication, then 15 milligrams, the route or mode, and then you have "at breakfast daily". What we do is tie that together through relationship extraction. And, as I was saying, it gets complicated: a lot of medications are represented in many different ways, and the fact that we catch this at high accuracy is kudos to the team. I think it also ties in with working backwards from the customer. We do that for test, treatment, and procedure as well. There's not an example here, but essentially, if you
have temperature or weight and you have the actual results, it's tied to that. Notes may have multiple of these. You can be taking many medications: you look at patients with chronic conditions taking multiple medications, or you look at patients that are taking a depression drug while going through a chronic disease. There's such variability between the conditions that patients are going through, and being able to separate that gets much more complex. That's the thing we're really trying to break down in a simple-to-see UI, as you can see here. Again, when you look through the note, we've identified all the other entities in it, so you can see what's covered. Lungs would be a system organ site, but "lungs clear" is a medical condition, a diagnosis. So in the end we capture it in a way that's really relevant.

When I get into the JSON output, I like showing that part to customers, because we really made it very simple. I recommend you try it right away, because it's really simple and you can use it off the bat. It's not going to require all the ML training that you would otherwise need to do this, and you can do those applications or those searches right away. So I definitely recommend you look into it as soon as possible.

We also help sort it, and sometimes viewing this in a list view also makes sense in relation to what's being extracted. For example, let's go into Vyvanse again. This is the brand name, and if you open it, we have a drop-down of all the entities that were extracted. Again, this is relationship extraction: the category is medication, but it is tied to Vyvanse. We do that for clonidine as well. And again, this is just a very quick way for customers to really view
what we've done. Going back, I actually want to go back up and pick up negation. Here we have, for example, "no oropharyngeal lesion". It is a diagnosis name, and it's negated. That's important: we've recognized it as a diagnosis, but we're saying "no", so that's negated, right? As you can see here, here's the entity that was extracted, with the score. We also provide a confidence score, and the reason that's important is that we want to make sure you also see the fidelity in the data. For data where the confidence score isn't high enough, please give us feedback, or at the same time you can push that off to an application, or have someone look into it to make sure that it fits your need. If you look here in terms of the negation, this is where we put the traits, on the right-hand side. You can see that, first of all, it's a sign, versus a symptom or diagnosis, and that it's negated, with a fairly high score. We provide this so that for any of these extracted diagnoses you have this extra information to put it in context, and that's really important in healthcare in general.

Now, this is the JSON output, so let's look into this. It's pretty straightforward: there's an ID; there's an offset, which is where it's located in the document; and we have the score, the text, the category, and the type. And this is where the traits would be. Let's jump into some of the examples I was just talking about, to see how they're represented here. Here we have the medication. If you look at Vyvanse, you have the category and you have the brand name, and in the attributes we've basically nested this output, so you can actually see what's
related here. And again, each attribute will have its own structure, with the type, the score, and the relationship score. We also provide how confident we are that there is a relationship between these two entities, so there are different ways that you can sort through this data to see if it's actually relevant, and that is extremely valuable. We're not just extracting, and we're not just doing relationships; we're also making sure that the entity traits are there. How do we make sure we provide this in an easy-to-consume format, so you can actually sift through it? This is a great way of doing it, especially when it's nested, because it's in relation to what the parent category is.

Moving down, it's the same thing with clonidine, and again you can see where we nest it here. You may have negation on a non-nested entity as well. For example, here with the boggy inferior turbinates, you can see that it's a sign, and here's the confidence score as well. So we do it with the individual entities too; it's not just something that exists with relationship extraction. It's across the entities that we extract; in this case, it's medical condition.

All right, with that I am going to switch back to the presentation. I like focusing on use cases, on how our customers are actually using this, so I'll deep dive into a few of them. Some of them capture some of the issues in healthcare where we can help accelerate. At the same time, NLP in a lot of these systems is currently built in a way that is not easy to deploy across various products in your organization, or in any organization. In the end, that's something we really want to address, in a way that we feel can not just help you
extend the NLP you may currently be using, but maybe in some ways allow you to build something new. Take patient and population health analytics, for example, and I'll deep dive into a particular use case that I like. It's fairly broad, right? But let's take clinical decision support out of that. How do you do anything that works across a hospital? How do you do something that can create that single lens? That's an example of something that works right off the bat.

Now, revenue cycle management. Revenue cycle is a very long process: you have to look at insurance, you have to do coding, you have to do claims management, you have to work with the insurer. There are different aspects of that, but medical coding is an extremely difficult part. It's become extremely siloed as well: you'll have coders that are very good at cardiology, and you'll have coders that are very good at oncology. How do you actually work across that? That's something I think about thoughtfully, because the more they're stretched, the more they're working tons of shifts, and hospitals are three to six months behind on their coding tasks. How do we help, obviously not replace them, but give them a superpower? That's the way I look at it: how do I give them a superpower to be more efficient? How can I let them do their job better? Can I really support that? Because we all know, especially if you're in the space, that everyone's getting burned out. There's just too much going on, and we really need to help them be more effective. That's where we see the solution fitting in.

Now, pharmacovigilance: essentially, tracking a drug post-launch. It's truly important, and there are a lot of factors in there, but I'll talk a little bit about how much information there is and where it actually comes from, and a lot of it is in text, so
how do you go through that? PHI compliance is the use case I've talked about a lot. There's a lot of information flowing through different systems; how do you keep track of it? In the end, just telling you there's PHI is not useful. You actually need to say what sort of PHI it is, and then, are there ways that you can do more processing on it? Does this help accelerate your move to the cloud? There are so many things that you can do if you can really identify that at high accuracy.

Now, clinical trial management. Again, management is a broad term. There are three phases, or, if you take 2A and 2B, there are four phases that you need to go through, and the patient population increases, so the amount of documentation per phase increases. How are you keeping track of that? A lot of that is unstructured. So in the end there are so many of these different areas that need NLP, where the data is really unstructured. What can we do to help? What can you do to accelerate? That's really the way that we're approaching it. But I like putting "what else", because what else can you do now that you have a service that you can deploy with ease, with the accuracy, the scalability, and the pay-as-you-go approach that you expect out of an AWS or Amazon service?

So I'm going to talk a little bit more about the use cases. I like some of these cases because they talk not necessarily about the difficulties of healthcare, but about the difficulties that physicians, providers, clinicians, nurses, and lab techs are actually going through. That's something we really want to help with, and we feel a lot of our customers can help with that as well. So again, unstructured data is extremely difficult to mine. Take clinical teams: if you just look at a patient coming to the ER, going into the ICU, going into the general ward. Let's
take that workflow, right? Now, you have 120 clinical decisions made per day in the ICU. They usually work in two shifts, 12-hour shifts; sometimes they're eight hours, it really depends. Now, how do teams actually transfer information? They're trying to capture this through progress notes, they really are. Whether they're using [unclear], or whether they're using speech-to-text, or whether they're documenting it, they're really trying to document this. But the problem is that maybe within the ICU you have control, but if you're the ICU of a Beth Israel, it's going to be kind of crazy. How do you do that? The second thing is, how do you make sure this follows the patient? That's critical. When you're in the ICU or in the ER, decisions are being made in a split second, and it's through the training of these great clinicians that they're able to do that. How do you capture that for someone downstream? Those things are extremely important. So in the end, can you create a single lens for your patient, from when they initially come in to when they're discharged? Is that single lens useful? Extremely useful when you're in the hospital, but how about after? How do you keep track of a patient as they go through the system and meet many physicians? Again, these are ways that we feel this solution can help: you can extract, create a timeline, and literally just scroll through it if you'd like. There are so many ways that you can use this now that you have a service like this, and that's something we're truly excited about, because we feel it allows a lot of our customers to do these things easily, and that's what we're really aiming for.

Now let's go into medical coding, the process of coding. We all know the stats, right? The ICD-9 to ICD-10 switch. Let's take ICD-10-CM, the clinical modification codes. It went from 13,000, I
think 13,000, to 69,000, right? That's not even including the procedural codes. I think overall, if you include those, it's something like 17,000 to 140,000. I mean, that's crazy, right? What's interesting with the shift is that there's depth now. When you create something hierarchical, similar to SNOMED or any other ontology like that, you need accuracy. It is extremely important that you get exactly what you need. Going back to that, the process is complex, but how do you enable medical coders to work in different areas of a hospital? They're very specific to their area, but if you can deploy them in different ways, it becomes a lot easier to improve coding efficiency, which is a huge need, but also to reduce the burden. I like bringing that up a lot, because in the end that's really what we're trying to do with this: we want to reduce the burden on the system and allow them to be effective in the different ways that they're deployed.

So, clinical trial management again, and identifying the right patients. Say you look at a patient who has come through, maybe drug-resistant, with non-small cell lung cancer, stage 3B, on gemcitabine and cisplatin, taking Avastin. That's a lot of things to capture, and that's a very hard search. How do you actually identify the right patients, but also, more importantly, how do you identify the patients quickly? There are huge issues in clinical trials in the US, where they're not able to find enough patients, and the process is insane if you think about it. Let's say I'm a PI. I'm going to annotate the patients I see. I'm only going to focus on my area; I know what I do, and I know how to annotate that. So you may create a pipeline there, but what if you could do it across areas, in how patients could be found? All of us have had that kind of personal experience: we've gone
to ClinicalTrials.gov and looked at that data. But how do you actually help find patients for those clinical trials? That is a way that you can really accelerate the process. And if you look at the way that we're trying to apply the service, and in general our tenets, it's really focused on that: how do we accelerate that? Even if you're looking at accurate indexing across large patient populations, there's a lot of data in clinical trials that is just unstructured data. How do you know how patients are doing? How do you know how they're going through therapy? Oncology may be different, because if you take a late-stage cancer, with immunotherapy the trial may be six months, right? It may be a smaller trial. But how do you do it for the longer trials? How do you do it even beyond that? These are areas where we feel you can deploy Comprehend Medical very easily, get that data, and put it into a clinical trial management system or anything like that, to accelerate this process. There are different processes and different parts of a trial, and as you go further along in the trial, the population increases and it becomes a lot more complex. It's a lot of data.

Now, pharmacovigilance. Again, there are so many different avenues for reporting an adverse drug event or an adverse drug reaction. It could be nurse calls, it could be emails, it could be faxes, it could be chats, it could be Facebook posts, it could be anything. Now, how do these teams actually go through this data? It's very manual, unless it's a nurse call and she transcribes it and then you capture it. So there's a huge manual process here. Can we accelerate that? Let me just put this out there: what if you could create an adverse drug event database immediately post-launch, where you essentially look at everything that's been
extracted, you put it in a database, and as you go post phase four of a trial, post launch, actually create a corpus of what all the ADEs could be? You can do that. Those are things that really can help the ecosystem and push it further, and we see that as, again, decreasing burden. There's a theme, right? The simplicity of the API, the use cases that we feel very passionate about and are targeting, but also decreasing that burden on staff. If you've seen every slide, on the right side, we know that outcomes can be improved with throughput, but how do we decrease that burden? That's something that is really important to us. So, PHI compliance. To maintain HIPAA compliance and the technical requirements for PHI as information goes between systems is really difficult. There's data privacy, there are ways of doing maybe a PHI repository, but not accurate enough where you can actually point to the data and understand what is there. Then once you extract it, can you change things? Can you maybe resynthesize the data? Can you maybe accelerate moving that data to the cloud, use SageMaker? Why not, right? There are so many different ways that you can now take it, because you're able to deal with PHI in an effective way. And whether it's identifying or masking for data security, those are really important areas. I don't say it's a boring use case, a mundane use case, but this is bread and butter. This has to happen, and when we created this service that was something we needed to target right away, because in the end, if we don't have that, for everything else there's always going to be a question. So we feel very confident and really happy with how this service has evolved, and again we're
following all the patient identifiers that are described in the HIPAA Safe Harbor method of de-identification. And truthfully, like any other AWS service, it's truly just getting better. That's again another tenet of the team: just keep improving, keep iterating, keep making it better, so our customers can make really impactful solutions. So with that I'd like to introduce Matthew. I'm very excited, you guys will love what he has, so I'll stop there. Thank you.

Thank you, Arun. I get to talk to you a little bit about cancer research and cancer, which is not as much of a fun topic as many of the things that we've had the chance to talk about at this conference. But it's a very interesting time in cancer right now. The conversation is starting to shift over the last few years because of advances in novel therapies, particularly the range of immunotherapies, using a patient's own immune system, that Arun referred to. We've started to talk about curing cancer, not just treating cancer, and this is a tremendous shift for the industry. This has moved us into a period where we're really focused on time. It's not a question now of whether we can develop curative approaches to cancer, but when, how quickly. At the Fred Hutch Cancer Research Center, where I come from, we've put a stake in the ground: we think this needs to be done by 2025. So what is it going to take to get us to 2025? Fred Hutch, for those that don't know, is a forty-three-year-old cancer research center based in Seattle, which together with our clinical partners, the University of Washington, Seattle Cancer Care Alliance and Seattle Children's Hospital, forms the Seattle cancer consortium. Fred Hutch is best known for having pioneered bone marrow transplant as a treatment for blood cancers. This was done decades ago. It earned a Nobel Prize, but more importantly it has saved upwards of a million lives globally so far, and we
remain pioneers in this space of immunotherapy research. So we have our researchers that serve as physicians at our clinical partners, we drive clinical trials, and clinical trials is where I want to focus, because this is how research gets to patients. In the immunotherapy space there are things that are a little more complicated: where we're working to re-engineer a patient's own immune system, we have to have advanced laboratory processes to make this work. So let's talk a little bit about how we move from research into a therapy, into a medication that you can take. The process is pretty complex. As a general cartoon: there's a period of research, basic science discovery. The only way to translate that into clinical care is through clinical trials, and clinical trials are a very complex, time-consuming process. That is something that we are focused on: how can we make this part of the pipeline go faster? And then finally there's a regulatory approval phase to make sure that everything that we do is safe, to the extent that we can make it so. Some things have happened in recent years. One is that we've actually gotten a little better on the research side. We've had new technologies that have come in to the research side: genomics, single-cell sequencing, advanced imaging. We've actually started to speed up the research engine to the point now where we have clinical trials that are pooled up waiting to start. So why can't we start those? Why can't we move those into action so that we can bring this research into the clinic? The answer is that clinical trials are hard. If we look across the patient population, many cancer patients are being treated successfully by conventional therapies, but many are not, and for those that are not, clinical trials are really the only option that most of these patients have. Today it's estimated that about 5% of patients who should be on clinical trials are actually on
clinical trials. So it's actually pretty tremendous to think about how this slows research, let alone how this impacts the individual patient. So clinical trials are just hard. When we start a clinical trial, trying to find the right patients takes time; it typically takes more time than we think it should. Most clinical trials actually fail to find enough patients, and this is a problem that just gets worse as we move into this era of precision medicine, where, as we focus the therapies more and more, finding the specific patients that match a clinical trial becomes harder and harder. And it turns out that a substantial number of clinical trials fail to enroll any patients at all. These may be trials that have therapies that would benefit not just individual patients but broader populations, and it's not happening. This is a young crowd here, this is interesting, but I'd like to invite you to take a look at your two nearest neighbors. Statistically, one of the three of you will be diagnosed with cancer in your lifetime. We have to make this engine go faster. So how do we make it go faster? Well, what's hard about clinical trials, and Arun referred to this early on, is that most of the information that we need on the research side to identify patients, to understand what clinical trials are feasible, lives in unstructured text documents, in clinical notes, in the narrative that physicians and specialists create around a patient. The way we have access to this today is through clinical data experts who will sit and read through these notes. In a recent project that we did with professional abstractors, people who spend their whole days doing this work, it took a few hours to go through a patient record. This was actually for a relatively simple condition. For some of the more precise of these immunotherapy clinical trials, the abstraction piece might be 24 hours, 36 hours, a substantial amount of time. But in this case a single abstractor can only look at 3
patients a day. We have tens of thousands of clinical trials that are waiting to start. It's been estimated that if we were to try to clear this backlog of clinical trials, and this is not just in cancer now, this is across the whole space, we would have to identify about a million patients. So even at this modest two and a half hours, it's about four million hours of abstraction, almost five hundred years' worth of abstraction, just to move forward the clinical trials that are waiting today. So this is why we're really excited about Comprehend Medical. We now can start from the same place and, with simple API calls, extract information specific to the patient: their disease, their treatment, the drugs that they've been taking, the dosage regimen. All of this is the information that an abstractor needs in order to make the assessment of how to understand the patient record and whether they can connect it to a particular clinical trial. This gives us, from the electronic medical record, a computable, more or less comprehensive view of the patient. So right now, from a research perspective, probably three-quarters of the information that we're interested in is locked away in those clinical notes. But having a structured framework like this means that we can now search across the patient population. It means that across our entire population we can now surface these connections, these similarities among patients, to make the process of what we call cohort selection, how to identify the possible population of people that might participate in a clinical trial, much faster. So from our standpoint as a research organization, we take data from our clinical partners for research use, and we can now stream that through Comprehend Medical such that, as we build our clinical research data warehouse, these data are now annotated. They carry all of this rich annotation that comes from Comprehend Medical, which gives
our researchers the opportunity to interact much more directly and comprehensively with the data. If we just look at the annotation process, pulling out these specific entities, a really advanced abstractor might be able to do a couple of notes an hour. In a recent pilot study that we did with Comprehend Medical, this is where we are, and of course this is completely scalable. So this is removing an enormous and really unnecessary bottleneck in this clinical trial process. This means that we can shift our focus from just those few patients to whom we can apply this very manual process of deep annotation and abstraction, and look across the whole patient population, potentially connecting patients to life-saving clinical trials. This is really a question of time, both for researchers but really for patients, and something like Comprehend Medical gives us time that will help us get to our goal of 2025. So thank you very much. [Applause] With that I'd like to introduce Anish Kejariwal from Roche Diagnostics.

Hi everybody, my name is Anish Kejariwal. I'm a director of the analytics engineering group in Roche Diagnostics Information Solutions. For those of you who don't know Roche, Roche has been around for 120 years and it's a leader in both pharmaceuticals and diagnostics, and as part of this it's really a leader in looking at how to use science to figure out how to treat patients better. One of the key concepts that a lot of people talk about are things like clinical decision support or precision medicine, and really the idea here is: how do you take all this information about a patient and use that as a way to figure out how to best treat the patient, and really arm physicians and care teams with all of this information so that they have the right options in mind? As part of Roche's strategy we launched this platform, the NAVIFY decision support portfolio, which really looks at how to
enable physicians with this data and this information. When a patient has cancer, time is of the essence, as we were just hearing, and really what a patient wants to hear is that the physician has all the information at hand, that they can make these decisions quickly, and that they're making sound judgments. What we want to do is really build digital dashboards that bring all this information together, and we then build products on top of it to really enable physicians to understand this data. So one of the challenges in hospitals is that this patient information is distributed across a lot of different systems. You could have data in electronic medical records, you can have radiological images, you can have laboratory information, you can have pathology reports, radiology reports. It's so much different information, and this data is often siloed based on a particular specialty. So for a physician and a care team to be able to look at this data, they often have to look at all these different systems. And then the other challenge is, as we've been talking about, a lot of this data isn't structured. So what we do as part of the NAVIFY decision support portfolio is we build integrations with hospital systems, we bring this data in, we standardize on things like FHIR, we bring this data into our platform, and we build these products to really enable care teams with actionable insights and analytics. So we've been talking about unstructured data, right? For us one of the key things here is: how do we unlock all of this unstructured data so that we can have a comprehensive view of the patient? But not just a comprehensive view of the patient, a view that really follows the patient over time, from the first time they come into the hospital, to the time they leave, to the time they come back, to all the treatment decisions, all
the medications that they've been given. We really need to be able to have this longitudinal view of the patient, and again to enable physicians to understand the data the best. But as part of our NAVIFY portfolio we also have some additional challenges. We're running a multi-tenant platform that's distributed across the world, and as part of that we have all these additional challenges with unstructured data. We have this diverse set of customers distributed across the world, which means there are so many different languages that we need to support. We have all these different diseases that we will be supporting. Right now we're focused on oncology, but different diseases mean different terminologies. Each hospital, each specialty could have different report formats, and all of these could be using different terminologies. This makes unstructured data, and just healthcare data in general, a very challenging problem, but I'd also say a very interesting problem too. So here is a sample TCGA report. For those of you who don't know, TCGA is The Cancer Genome Atlas. This is a public data source, a joint effort between the NCI and NHGRI, and it's really an amazing data set, so if you're not familiar with it I definitely recommend you look at it. This here is a very simple report. If you look, you'll see there's such a wide diversity of reports: you'll see reports with tables, reports with sections, key-value pairs. It makes this whole problem very challenging. Like I said, this is a very simple report, and it looks short, you see two paragraphs, but it's actually very densely populated with information. And you also see there are of course some handwritten notes too, which makes things fun. So we had one of the data scientists on my team go through and really just annotate and curate it to see what's in there, and you can see a bunch of scribbling. This is what, you know, an
abstractor might do; this is what a physician or a nurse might be doing in their head. You can see there's a lot of information there: there are these entities and the relationships between these entities. The current process is often to manually curate this, and that's obviously very time-consuming, it's expensive, but it's also prone to error. Often you'll have to have multiple abstractors really look at this data to come up with some consensus. Here's what it would look like if you took that data and just really simply extracted the entities, and I think you would agree this is a lot easier to manage. You now can look at this and understand the characteristics of the patient, the cancer, and ultimately their diagnosis. So this is easier for users to understand, but not only that, this is something that could then be fed into a machine. This is really important because this is what we can use as a way to surface in our products, and this is what we can use to power our analytics. So as we've been on this journey, our team really identified two needs, and I think it's obvious based on what I've said. One need is NLP: really, how do we unlock the unstructured data? But the key part here is that just general NLP isn't enough. This has to be NLP that's specialized for medical data, that's been trained on medical data, otherwise it just doesn't work. Medical data is so nuanced, the language is ultimately like a different language, and it's a lot of work to have to train all these NLP models. The other part of this is that for a lot of these documents that we get, sometimes we get nice plain text, like physician notes, but often we get documents that are scanned, and ultimately they come to us as PDFs. As I said before, there are a lot of tables and a lot of structure to them. So this is where we really look at how do we extract the data from these documents
and make it machine readable, but also how do we retain that structure so that we can optimize NLP? Just pulling out the text kind of randomly, losing the structure, is not sufficient. You really have to retain that structure. And we're really excited to have this partnership with AWS, really excited to be here on stage, and really the timing of both of these services: Amazon Comprehend Medical, which of course we're here talking about, and then the other one, I don't know if you guys know about it, is Amazon Textract, which is really what they're calling the OCR++ service. They're perfectly timed as we embark on this journey of how to unlock unstructured data. As we thought about what the criteria are for choosing this, a lot of these are standard criteria that I think all of you think about. We obviously want things that follow the applicable privacy laws, things like HIPAA. We want things that integrate well with AWS services. Another thing is we really love services that are serverless, which means we don't have to manage servers. And then for us scalability is a big thing; as I said, our platform's distributed across the world and we really need to be able to handle large volumes of these reports. Here's a conceptual overview of how we are thinking about how these services will really enable this data flow. On the left there are the hospitals, and on the right are the databases and the analytics that would really power the NAVIFY portfolio. If we take the simplest example, which is on the bottom there, we're taking the machine-readable structured text that's coming from hospital EMR systems. We store it on S3, hopefully we can do some pretty minimal processing of that, and we can then stick it in a
relational database like Aurora, we can make it searchable on something like Elasticsearch, and we can do our analytics and machine learning in the various services. But then at the top, this is where these services become really enabling. How do we take something like scanned documents or raw PDFs? We would bring these in from the hospitals and store them on S3. We would then want to apply Textract to pull that information out. We would then store that extracted text, while retaining the structure, on S3. Then we would run NLP, such as through Comprehend Medical, and then again store that structured data on S3. So now we have the structured data that came from these PDFs as well as the structured data that came directly from the hospitals. We can bring this information together, and now we have a true comprehensive view of the patient that the physicians can use to really understand the patients and know what the options are for best treating those patients. NLP is really going to be a journey for us. It's not like we're just going to turn it on and all of a sudden everything comes through automated, especially in this field, where you can't have risks of data being incorrect, especially for what we're doing. So we'll start with trying to automate as much as possible. Right now physicians will be reviewing pathology reports; how can we make that easier? Then we'll focus on how we can extract entities and their entity relationships that really have high confidence. These would be things that we determine to be low-hanging fruit, things that have high confidence values, and we start to automate them. And then as we gain more confidence, as we understand what the low-hanging fruit is, we will automate this more and more, bring this together, and we will then have more structured data
which ultimately improves our data quality, which we can then use to better enable clinical decision support. I really appreciate you guys listening to me, I've enjoyed talking to you guys. I think this is a hard and challenging problem, and if you'd like to join us on this journey of trying to solve it, please reach out to me, go check out this website, and we'd love to have you be part of it. Thank you very much.

Oh, sorry. With that, a call to action: this is where you can find Comprehend Medical. A blog is out about it as well, so please check that out. One thing I'd like to say: we're a very customer-obsessed team, so email me, ask me questions. We really want to work backwards from your needs and really solve the hard problems, so feel free to reach out to me; we're excited to talk to you guys. Also, please complete the session survey when you can. All right, we'll be taking questions off stage if you guys are interested. Thank you. [Applause]
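
The speakers describe two steps that can be illustrated in a few lines: masking PHI once it has been detected, and automating only the entities that come back with high confidence while routing the rest to a human reviewer. The Python below is a minimal sketch of those two steps, not the code used by any of the presenters. It assumes the boto3 `comprehendmedical` client and its `detect_phi` operation, whose response lists entities with `BeginOffset`, `EndOffset`, `Score` and `Type` fields. The helper names (`call_detect_phi`, `mask_phi`, `split_by_confidence`), the 0.5 and 0.9 thresholds, and the sample note are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Tuple


def call_detect_phi(text: str) -> Dict[str, Any]:
    """Call the Comprehend Medical DetectPHI operation (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers below run without AWS access

    client = boto3.client("comprehendmedical")
    return client.detect_phi(Text=text)


def mask_phi(text: str, entities: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each detected PHI span with a [TYPE] placeholder.

    min_score is an assumed cut-off, not a recommended value.
    """
    # Work from the end of the text so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] >= min_score:
            text = text[: ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text


def split_by_confidence(
    entities: List[Dict[str, Any]], threshold: float = 0.9
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Partition entities into (auto_accept, needs_review) by their Score."""
    auto = [e for e in entities if e["Score"] >= threshold]
    review = [e for e in entities if e["Score"] < threshold]
    return auto, review


# Hand-written sample in the shape of a DetectPHI response; not real service output.
sample_text = "John Smith, 52, was started on imatinib 400 mg daily."
sample_phi = [
    {"BeginOffset": 0, "EndOffset": 10, "Score": 0.99, "Type": "NAME", "Text": "John Smith"},
    {"BeginOffset": 12, "EndOffset": 14, "Score": 0.97, "Type": "AGE", "Text": "52"},
]

print(mask_phi(sample_text, sample_phi))
```

With real credentials, `call_detect_phi(note)["Entities"]` could be passed to `mask_phi` in place of the hand-written sample. The same `split_by_confidence` helper would apply to the medical entities returned by the entity-detection operation, since they carry a `Score` field as well; where to set the threshold is a clinical and regulatory decision that this sketch does not address.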

Info

Channel: Amazon Web Services

Views: 6,847

Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, AIM398, reinvent 2018, re:Invent 2018

Id: cJ3eUPOXV4Q

Length: 55min 36sec (3336 seconds)

Published: Fri Dec 07 2018