What Medicine needs to get right about AI

Exploring the critical intersection of artificial intelligence and human-centered design in healthcare
Categories: AI, HCI

Author: Deepak RJ

Published: December 24, 2024

1 The AI Revolution in Medicine: Beyond the Hype

The transformer architecture has revolutionized AI, enabling systems to capture complex non-linear relationships in vast datasets. In medicine, this has led to remarkable capabilities:

Current Applications
  • Clinical Communication: When applied to medical language, AI systems now understand medical context and can answer patient questions at a level comparable to, or exceeding, that of doctors
  • Administrative Efficiency: When applied to human conversations, we can now automate clinical scribing and the drafting of medical letters
  • Workflow Enhancement: When applied to the EMR, with text-to-action and computer-use capabilities, we could even automate tedious EMR navigation
  • Research Advancement: When applied to massive multi-omic biological data in data-rich fields like oncology, AI foundation models will aid the next biomedical breakthroughs

2 The Implementation Challenge

We clinicians will soon be using AI tools at work, or already are. It’s crucial that we, as a field, speak the same language as those implementing these tools, both to ensure patient safety (the cautionary tale of Epic’s sepsis model) and to use the tools properly. They are quite good, and we should make the most of them.

2.1 Understanding AI: Models vs Products

A crucial distinction often missed is that an AI model itself is not a product. Take OpenAI as an example: while they excel at building powerful models, their success with ChatGPT comes from transforming that model into a helpful assistant. As highlighted in this brilliant Stanford talk, considering the specific context and software surrounding the model allows us to be both imaginative and practical.

2.2 The Clinical Decision Support Dilemma

Consider clinical decision support in radiology. Companies focus on creating high-performance diagnostic models, yet the pathway into clinical practice remains murky. There is practical use in screening and in translating reports for patient understanding, but little clarity beyond that.

Currently, the main product being built on these models is one that generates imaging reports. Here are some options for how it could be deployed:

2.3 Implementation Models

  1. Human & AI Case Collaboration
    • Clinician works on the case at the same time as the AI
    • The AI report is visible for the clinician to use as desired
  2. AI-First Verification
    • AI generates initial report
    • Clinician reviews and validates
  3. Human-First Verification
    • Clinician writes initial report
    • AI system performs error check
    • Discrepancies trigger senior clinician review
  4. AI as a Co-Worker
    • AI handles routine cases and calculates confidence/complexity metrics
    • Complex cases are routed to a senior clinician where appropriate (a minimal routing sketch follows this list)
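
To make option 4 concrete, here is a minimal routing sketch. Everything in it is an assumption for illustration: the `DraftReport` shape, the idea that the model emits calibrated confidence and complexity scores, and the threshold values, which are not clinically validated.

```python
from dataclasses import dataclass

# Hypothetical output of a report-generating model that also emits
# calibrated confidence and case-complexity estimates.
@dataclass
class DraftReport:
    text: str
    confidence: float  # 0.0-1.0, assumed to be calibrated
    complexity: float  # 0.0-1.0, e.g. derived from anomaly count and priors

CONFIDENCE_FLOOR = 0.95    # illustrative thresholds only,
COMPLEXITY_CEILING = 0.30  # not clinically validated values

def route_case(draft: DraftReport) -> str:
    """Decide who sees the case next under the 'AI as a co-worker' model."""
    if draft.confidence >= CONFIDENCE_FLOOR and draft.complexity <= COMPLEXITY_CEILING:
        # Routine case: the AI report goes to a standard review queue.
        return "routine_queue"
    # Anything the model is unsure about, or that looks complex,
    # is escalated to a senior clinician.
    return "senior_clinician"
```

The hard part is not the routing logic but producing confidence and complexity numbers that are actually calibrated; without them, option 4 quietly degenerates into option 2.
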
Current models are like GPT-4o

Current models lack intelligent clinician-AI interaction. For instance, one obvious interaction to improve would be showing clinicians a tree-of-thought reasoning trace, making the model’s clinical reasoning transparent. As of writing, such interfaces are not the norm. For the rest of this section, assume we’re talking about your run-of-the-mill GPT-4o fine-tuned on radiology data, generating reports.
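
As a sketch of what surfacing a reasoning trace could look like, assume a hypothetical `complete()` wrapper around whichever model is in use; the point is the structure of the request and the display, not any particular API.

```python
import json

def complete(prompt: str) -> str:
    """Hypothetical wrapper around the report-generating model; returns raw text."""
    raise NotImplementedError  # stand-in, not a real API

def report_with_trace(study_description: str) -> dict:
    # Ask for a structured reasoning trace alongside the report, so the
    # clinician can inspect *why* a finding was called, not just *what*.
    prompt = (
        "Draft a radiology report for the study described below.\n"
        "Return JSON with two keys: 'reasoning' (a list of steps, each naming "
        "an image feature and the inference drawn from it) and 'report' "
        "(the final report text).\n\n" + study_description
    )
    return json.loads(complete(prompt))

# A viewer could then render result['reasoning'] as an expandable trace
# beside result['report'], instead of presenting the report alone.
```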

Without sufficient thought to human-computer interaction, it’s looking pretty bleak.

Options 1, 2 and likely 3 leave radiologists time-poor and stressed. Option 1’s ‘helpful’ reporter product is like a genius who sometimes gets the hardest question right and sometimes the easiest question wrong. In a healthcare setting this has limited value: more time will be spent on every discordant case, which may not even translate into better clinical performance. Option 2 is option 1 in disguise; you risk over-reliance on the AI, or ignoring its useful outputs. Option 3 is more useful because it sets clear boundaries on the human-AI relationship. By surfacing the AI only on discordant cases, it may serve as a good tool to ‘triage’ scans up the chain of experience. However, you run into the same ‘Who is right?’ dilemma.

Financially, only option 4 makes sense to radiology practices and hospitals. Ide & Talamas describe this as an autonomous agent replacing routine work, displacing humans towards more specialised problem-solving. If this leads to better patient outcomes, we must choose this option. However, we must also face significant restructuring of training programs and the retraining of displaced early-career specialists.

3 Breaking Free from False Assumptions

Our limited options stem from several unfortunate assumptions and starting points:

  • The best way to help radiologists is to diagnose for them
  • The best way to help radiologists is to write reports for them
  • AI is a black box that cannot truly reason, so we cannot truly understand it
  • Therefore, as long as we have high-quality training data of prior reports, we can generate high-quality reports and trust them

Reading medical imaging is itself a process. Why haven’t we asked questions like these:

  • How can we automatically identify and show the radiologist the key references (Radiopaedia/StatDx) they would need to solve this case? (A minimal retrieval sketch follows this list.)
  • Can we automatically show the patient’s last 5 chest X-rays (CXRs), process them, and identify exactly where changes have evolved?
  • Considering the speed of system 1 thinking, how can we best display anomaly detection with attached tree-of-thought reasoning traces while still enabling a clinician’s systematic read of an image?
  • During dictation, can we let the radiologist think out loud in a very unstructured manner, offering real-time reasoning feedback as well as scribing a high-quality radiology report?
  • Can we automate and adapt reporting for specific protocolised research guidelines?
  • Can we use LLMs to enhance inter-radiologist communication and get rapid second opinions from leading experts?
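
To show the first question is tractable with off-the-shelf pieces, here is a minimal retrieval sketch based on sentence-embedding similarity. The tiny in-line corpus, the model name and the whole setup are illustrative assumptions; a real system would index licensed reference content such as Radiopaedia or StatDx articles.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

# Stand-in reference corpus; in practice, indexed summaries of licensed articles.
REFERENCE_ARTICLES = [
    "Pneumothorax: visceral pleural line with absent peripheral lung markings.",
    "Lobar pneumonia: consolidation with air bronchograms confined to a lobe.",
    "Congestive cardiac failure: cardiomegaly, upper lobe diversion, Kerley B lines.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder
corpus_vecs = model.encode(REFERENCE_ARTICLES, normalize_embeddings=True)

def key_references(case_findings: str, top_k: int = 2) -> list[str]:
    """Return the reference entries most similar to the current case."""
    query = model.encode([case_findings], normalize_embeddings=True)
    scores = corpus_vecs @ query[0]            # cosine similarity (vectors normalised)
    best = np.argsort(scores)[::-1][:top_k]    # highest-scoring articles first
    return [REFERENCE_ARTICLES[i] for i in best]

print(key_references("Lucency at the right apex with no lung markings beyond a thin line"))
```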

4 Why are we here?

Outside of resource-poor settings, there is little unmet clinical need for an autonomous radiologist agent. The explosion of AI, the abundance of radiology reports and the monetary value of creating a high-quality autonomous agent all culminate in foundation models that perform exceptionally well.

However, given their training on human-labelled reports and diagnoses, I question whether we can truly grow in medicine with these types of models. Can we get closer to ‘perfect medicine’ with models that talk and breathe our biases?

Here is a direction I think would be more fruitful: we already have high-quality, intelligent staff, so why not empower them to perform efficiently and improve to be their best? All six questions I’ve posed above, which aim to directly augment a radiologist’s work, are tractable now. Note that they call for useful products, not necessarily new models (Section 2.1).

Unsupervised, data-driven approaches can teach us so much about biomedicine; medicine will look incredibly different in the coming decades. We need nimble, well-supported staff, with both autonomous AI and better non-autonomous copilots, to maximise their clinical impact.

We’ll explore non-autonomous copilots and autonomous AI in more detail here, including specifics of how we can think about human-AI interaction.