Headlines such as these that tout the abilities of artificial-intelligence tools to surpass the skills of physicians are becoming more common.
But an advanced large language model (LLM) beating a physician at a single task doesn’t necessarily mean that AI is ready to take over medicine in the real world. Nature spoke to researchers studying the use of AI in health care to understand which ‘AI doctors’ have shown the most promise so far, and when such tools might take command of medical diagnoses. Some scientists point out that various AI systems are already handling simple medical tasks, such as taking notes and even renewing prescriptions, but they say that physicians can never fully be replaced by machines.
“Medicine is messy and patients don’t always have textbook stories to tell,” says David Wu, a resident physician who studies AI at Harvard Medical School in Boston, Massachusetts. “I don’t think we’ve proven that these systems can handle that mess.”
Still, a few demonstrations have got researchers excited about the AI revolution brewing in medicine. One study, published in April in the journal Science, concluded that an advanced LLM performed better than physicians when evaluating the conditions of people visiting the emergency department at a Boston hospital. When the AI model — called o1 and developed by OpenAI in San Francisco, California — reviewed the information recorded by hospital staff members during a visit, it got the diagnosis correct or almost correct in 67% of cases, compared with around 50–55% for the two human doctors who participated in the experiment.
Because the study used real-world data, it marks an evolution for AI tools, which have in the past been tested on simulated patient scenarios or neatly curated medical cases, say researchers who spoke to Nature. But it’s still a long way from emulating what goes on in a real emergency department, they say. For example, neither the AI model nor the doctors in the study had the opportunity to interact with the patients.