

Conner [00:00:04] Yesterday, Kristen walked us through all of her research findings with Sydnor and Andrew. So now we're moving at about double speed to design a solution that improves the experience for the patient after discharge: when I no longer have reliable access to the doctor or the healthcare staff, I'm overwhelmed with information, and I need an easy way to interact with what was said in that session. We looked at an idea of recording the sessions themselves, so that you get a transcript of the conversation you had with the doctor while you were exhausted, while you were ready to get out of the hospital. Then I can interact with an AI assistant to get access to both that information and doctor-curated medical information. It has the context of my own medical background and any medications I might be taking, so it can provide advice that's as useful and as relevant to me as possible.
Nish [00:00:54] We understand a lot about the technology and what it would take to build something like this. Where things get a lot more complicated is making sure that the system is private and trustworthy.
Christopher [00:01:09] Responsible AI is really important. As we go through the process of creating our projects through brainstorming and ideation, it is an essential tenet to make sure that we're delivering safe, high-quality software.
Nish [00:01:22] By introducing some of these new and amazing GenAI systems, we think we can build something really, really cool that patients will love.
Conner [00:01:31] So what I'm hearing is that for each of these different components, we need to build really robust evaluation datasets that are informed by humans who are telling us: here's what this looks like in real life. So if a doctor sees the AI system hallucinate, there's some mechanism by which they can highlight the chat bubble and say, "hey, that's wrong," or something like that. And that's informing how we train the system over time.
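A minimal sketch of the flagging mechanism Conner describes, assuming a simple append-only store; every name here is hypothetical, as the conversation doesn't specify an implementation:

```python
# Sketch of the "highlight the chat bubble" feedback loop.
# All identifiers are illustrative placeholders, not the team's design.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class HallucinationFlag:
    message_id: str      # which chat bubble the doctor flagged
    model_output: str    # what the assistant actually said
    reviewer_note: str   # e.g. "dosing interval is wrong"
    flagged_at: datetime

flags: list[HallucinationFlag] = []

def flag_message(message_id: str, model_output: str, note: str) -> None:
    """Record a clinician's 'hey, that's wrong' on a specific response.

    Flagged examples can later be exported as negative cases in the
    evaluation dataset that informs how the system is trained over time.
    """
    flags.append(HallucinationFlag(message_id, model_output, note,
                                   datetime.now(timezone.utc)))

flag_message("msg-123", "Take 500mg every hour.", "dosing interval is wrong")
```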
Nish [00:01:54] These systems are augmenting the doctors' roles. The doctors are accountable for what is released to the patient. So it's accountable, it's explainable, and most importantly, that's how you de-risk the system and make sure it's safe for everyone.
Christopher [00:02:10] The high-level architecture that Nish and I were looking at for this app centers on integrating speech-to-text (STT): taking a doctor and patient's interaction, transcribing and synthesizing it, and letting the patient themselves query it and ask questions.
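One way that flow could be wired together, as a sketch: the STT and LLM steps here are placeholders, and all function and field names are illustrative rather than the team's actual design:

```python
# Sketch of the high-level flow: record the visit, transcribe it,
# then answer patient questions grounded in that transcript plus
# the patient's own context. Real STT and LLM calls would replace
# the placeholder bodies below.
from dataclasses import dataclass, field

@dataclass
class PatientContext:
    medications: list[str] = field(default_factory=list)
    history_summary: str = ""

def transcribe(audio_path: str) -> str:
    """Placeholder for the speech-to-text (STT) step over the
    recorded doctor-patient conversation."""
    return f"[transcript of {audio_path}]"

def build_prompt(question: str, transcript: str, ctx: PatientContext) -> str:
    """Ground the assistant in the visit transcript, doctor-curated
    references, and the patient's own medical context."""
    return (
        "Answer using only the visit transcript and approved references.\n"
        f"Patient medications: {', '.join(ctx.medications) or 'none listed'}\n"
        f"Visit transcript:\n{transcript}\n\n"
        f"Patient question: {question}"
    )

transcript = transcribe("visit_recording.wav")
prompt = build_prompt("When can I put weight on my leg?",
                      transcript,
                      PatientContext(medications=["ibuprofen"]))
print(prompt)  # a real system would send this to an LLM
```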
Christopher [00:02:32] One of the big things we want to test with these models is their handling of medical terminology. I'll have you speak into it, and we'll take a look at the transcription.
Conner [00:02:42] Cool, well, let's try it out.
Conner [00:02:44] "The patient presents with an open fracture of the subtercan-- subtercantric? -- section of the femur." I guess a real test would need somebody who can definitely pronounce this stuff right.
Nish [00:02:54] In the field, in real life, people make mistakes, people will pronounce things either wrong or sometimes with their own flair on top of it. And so those are some of the challenges that, in building a comprehensive eval suite for something like this, you want to keep in mind.
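A sketch of one slice of such an eval suite: a clinically correct reference phrase paired with the text an STT model might produce from an imperfect pronunciation, scored by word error rate (WER). The phrase and case are illustrative, not from the project:

```python
# Score STT output against a reference transcript using word error
# rate (WER), computed as Levenshtein edit distance over words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Each case: the correct reference vs. what the model transcribed
# from a speaker who stumbled over the terminology.
cases = [
    ("open fracture of the subtrochanteric section of the femur",
     "open fracture of the subtercantric section of the femur"),
]
for reference, stt_output in cases:
    print(f"WER = {wer(reference, stt_output):.2f}  {stt_output!r}")
```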
Conner [00:03:13] Too often when we're building proofs of concept in the AI industry, you get the output, you look at it, and you're like, "that looks good to me!" But what we do is a lot more scientifically rigorous. We're talking about experimentation across thousands, maybe tens of thousands, of observations, so that when we tell our clients, "Hey, the AI system performs at this level," we can be absolutely confident that's true.
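To give a sense of what rigor at that scale might look like, here is an illustrative sketch: score many graded observations and report accuracy with a simple 95% confidence interval rather than eyeballing a handful of outputs. The numbers below are synthetic:

```python
# Aggregate thousands of graded eval observations into an accuracy
# estimate with a normal-approximation 95% confidence interval.
import math
import random

random.seed(0)
# Placeholder for real graded results: 1 = output judged correct.
results = [1 if random.random() < 0.93 else 0 for _ in range(10_000)]

n = len(results)
acc = sum(results) / n
half_width = 1.96 * math.sqrt(acc * (1 - acc) / n)  # 95% CI half-width
print(f"accuracy = {acc:.3f} +/- {half_width:.3f} over {n} observations")
```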
Conner [00:03:34] It's great that the Data and AI Research Team -- DART -- was able to do all this foundational research last year on how to evaluate subjective criteria in LLMs. Then, toward the end of last year, you did all of this research on speech-to-text models, making sure we had foundational knowledge there. Maintaining that baseline understanding of what's cutting edge, what the best practices are, and what the common design patterns are seems to have allowed you to really accelerate the specific research you needed for this project.
Christopher [00:04:03] The real aim of DART and the research that we do is to empower not only our projects, but also to empower our clients.
Conner [00:04:11] You guys have done great work, and I have everything I need to hand this off to Design. We can start getting some mock-ups made and some interactive design prototypes put together.
Conner [00:04:23] This solution, once it's been designed and built, provides enormous value to the patient. When you're coming out of a hospital stay, you've potentially gone through a traumatic experience and you're exhausted, so having the ability to interface with trustworthy, accurate information through a virtual assistant that is personable, friendly, prompt, and accessible to you 24/7 is absolutely critical.