
Generative AI could revolutionize health care


https://www.nature.com/articles/d41586-023-03803-y

Large language models such as that used by ChatGPT could soon become essential tools for diagnosing and treating patients. To protect people’s privacy and safety, medical professionals, not commercial interests, must drive their development and deployment.


ChatGPT was released by the technology company OpenAI for public use on 30 November 2022. GPT-4, the large language model (LLM) underlying the most advanced version of the chatbot [1], and others, such as Google's Med-PaLM [2], are poised to transform health care.

The possibilities — such as LLMs producing clinical notes, filling in forms for reimbursement and assisting physicians with making diagnoses and treatment plans — have captivated both technology companies and health-care institutions (see ‘Betting on AI for health care’).

Earlier this year, the tech giant Microsoft began discussions with Epic, a major provider of the software used for electronic health records, about how to integrate LLMs into health care. Thanks to the two companies' collaboration, initiatives are already under way at the University of California San Diego Health system and at Stanford University Medical Center in California. Also this year, Google announced partnerships with the Mayo Clinic, among other health-care organizations. In July, Amazon Web Services launched HealthScribe, a generative artificial intelligence (AI) clinical documentation service. And venture-capital firms have invested US$50 million in a US start-up called Hippocratic AI, which is developing an LLM for health care.

In the rush to deploy off-the-shelf proprietary LLMs, however, health-care institutions and other organizations risk ceding the control of medicine to opaque corporate interests. Medical care could rapidly become dependent on LLMs that are difficult to evaluate, and that can be modified or even taken offline without notice should the service be deemed no longer profitable — all of which could undermine the care, privacy and safety of patients.

[Chart: 'Betting on AI for health care'. Source: Artificial Intelligence Index Report 2023 (Stanford Institute for Human-Centered Artificial Intelligence, 2023)]

Although technology companies dominate in terms of resources and processing power, health-care systems hold a powerful asset — vast repositories of clinical data. Also, thousands of hospitals and institutions worldwide are now investing millions of dollars in disparate efforts to integrate AI into medical care. In an executive order on AI that US President Joe Biden signed last month, several organizations, including the US Department of Health and Human Services and the US Department of Veterans Affairs, have been tasked with investigating how to safely implement AI in health care [3]. In the United Kingdom, the National Health Service has allocated more than £123 million ($153 million) to the development and evaluation of AI, and a further £21 million to its deployment. Similarly, in June, the European Union allocated €60 million ($65 million) to research for AI in health care and its deployment.

By pooling their resources and expertise, such organizations could develop LLMs that can be transparently evaluated and that meet local institutional needs — even if they are also working with corporations. Specifically, these organizations could develop open-source models and software tailored for health care, and then fine-tune these base models to create privacy-compliant, locally refined models that incorporate privately held data. In other words, carefully governed open collaboration between diverse stakeholders could steer the development and adoption of LLMs so that AI enhances medicine rather than undermines it.

The promise and pitfalls

Typically, the first step in training an LLM involves feeding the model massive text-based data sets from the Internet, to produce a base model. This initial training period requires considerable engineering expertise and vast computing power. The pre-trained model is then trained further on higher-quality curated data sets, and specialists assess the model’s output to ensure that it is accurate and aligns with relevant safety protocols and ethical norms. This expert feedback can even be used to train the model further. For example, ChatGPT has been fine-tuned to give users the experience of having a human-like conversation.
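To ground the two training stages just described, the following is a minimal sketch, in Python with the open-source Hugging Face libraries, of the second stage: further training a pre-trained base model on a smaller, curated data set. The tiny base model ('gpt2') and the file 'curated_notes.txt' are placeholders chosen for illustration, and the expert-feedback stage (for example, reinforcement learning from human feedback) is omitted; no particular provider's recipe is implied.

```python
# Supervised fine-tuning of a pre-trained base model on curated text.
# 'gpt2' and 'curated_notes.txt' are illustrative placeholders only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "gpt2"  # stand-in for a much larger pre-trained base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# The higher-quality, curated data set used for the second stage.
data = load_dataset("text", data_files={"train": "curated_notes.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fine_tuned_model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the pre-trained model is trained further, as described above
```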

Some LLMs have shown impressive capabilities in the medical domain [2,4,5]. In March this year, Microsoft researchers described how GPT-4, which has no medical-specific training, can pass certain medical tests, including the United States Medical Licensing Examination [5]. In July, two of us (A.T. and B.W.) co-authored a study in which we found that clinicians often preferred clinical notes that were generated by GPT-4 to those generated by physicians [6]. Other work has shown that GPT-4 can pass examinations in some specialist areas, such as neurosurgery [7] and medical physics [8]. Studies have also demonstrated the impressive abilities of LLMs in diagnosing challenging cases [9,10] and in translating complex surgical consent forms into language that can be easily understood by patients [11].

Yet, despite the promise of LLMs to improve the efficiency of clinical practice, enhance patients’ experiences and predict medical outcomes, there are significant challenges around deploying them in health-care settings.

[Image: Some large language models have shown impressive capabilities when it comes to taking clinical notes. Credit: Jim Varney/SPL]

LLMs often generate hallucinations — convincing outputs that are false [12]. If circumstances change — for example, because a new virus emerges — it is not yet clear how a model's knowledge base (a product of its training data) can be upgraded without expensive retraining. If people's medical records are used to train the model, it is possible that with the relevant prompts, the model could recreate and leak sensitive information [13] — particularly if it is trained on data from people with a rare combination of medical conditions or characteristics.

Because the models are products of the vast swathes of data from the Internet that they are trained on, LLMs could exacerbate biases around gender, race, disability and socioeconomic status [14]. Finally, even when those studying LLMs have access to the base models and know what training data were used, it is still not clear how best to evaluate the safety and accuracy of LLMs. Their performance on question-answering tasks, for example, provides only a superficial measure that doesn't necessarily correlate with their usefulness in the real world [15].

Safe integration

As long as LLMs are developed in relative secrecy, it is especially difficult to envision how this technology could be safely integrated into health care.

Many LLM providers, including OpenAI, use a closed application programming interface (API). This means the instruction from the user (to produce a clinical note from a transcribed conversation between a patient and a physician, for example) and the data from the user (the transcribed conversation) are sent to an external server. The model's outputs are then returned to the user. With this approach, users often do not know the exact model or method that is processing their request. Typically, the user does not know what data the model was trained on or whether the model was modified between their uses of it [16]. In some cases, it is unclear what happens to the data provided by the user and how those data are protected from being accessed or misused by others.
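To illustrate the closed-API pattern concretely, the sketch below sends a transcribed conversation to an external server using the OpenAI Python SDK; the transcript and prompt here are invented for illustration. The comments mark what remains opaque to the user: only text comes back, and nothing about the model behind the identifier can be inspected.

```python
# Sending data to a closed API: the request leaves the user's control.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

transcript = "Doctor: What brings you in today? Patient: ..."  # placeholder

response = client.chat.completions.create(
    model="gpt-4",  # an identifier, not an inspectable artefact: the
                    # provider can change what it points to without notice
    messages=[
        {"role": "system",
         "content": "Produce a structured clinical note from this transcript."},
        {"role": "user", "content": transcript},  # patient data sent off-site
    ],
)
print(response.choices[0].message.content)  # only the output is returned
```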

Partly in response to complaints from users, OpenAI stated in March that it would keep any given version of its LLMs available for three months, so that users can have consistent access to the same model for at least that period. What other providers are doing concerning model updates is unclear. Moreover, many models might have been trained on the very questions that are later used to evaluate them. Yet, because the developers of many proprietary models do not share the data sets their models are trained on, the degree to which this kind of 'contamination' is occurring is unknown.

Another problem specific to proprietary LLMs is that companies’ dependency on profits creates an inherent conflict of interest that could inject instability into the provision of medical care. This was demonstrated recently by the UK health-tech company Babylon Health, which promised to combine “an artificial-intelligence-powered platform with best-in-class, virtual clinical operations” for patients.

When it went public in 2021, Babylon Health was valued at more than $4 billion. After complaints about its services and other problems, and after reportedly costing the UK National Health Service more than £26 million in 2019, the company filed for bankruptcy protection for two of its US subsidiaries in August this year.

All in all, it is hard to see how LLMs that are developed and controlled behind closed corporate doors could be broadly adopted in health care without undermining the accountability and transparency of both medical research and medical care.

Open models

What’s needed is a more transparent and inclusive approach.

Health-care institutions, academic researchers, clinicians, patients and even technology companies worldwide must collaborate to build open-source LLMs for health care — models in which the underlying code and base models are easily accessible.

What we’re proposing is similar to the Trillion Parameter Consortium (TPC) announced earlier this month — a global consortium of scientists from federal laboratories, research institutes, academia and industry to advance AI models for scientific discovery (see go.nature.com/3strnsu). In health care, such a consortium could pool computational and financial resources as well as expertise and health-care data.

This consortium could build an open-source base model using publicly available data. Consortium members could then share insights and best practices when fine-tuning the model on patient-level data that might be privately held in a particular institution. Alternatively, to save the considerable costs associated with the first phase of training LLMs, consortium members could work together to improve open models that have already been built by corporations.
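As a hedged sketch of what this could look like in practice, the code below downloads an openly released base model and prepares it for local fine-tuning with a parameter-efficient method (LoRA, via the Hugging Face peft library), so that raw records never leave the institution and only a small adapter would be shared. The model choice and file names are assumptions for illustration, not a consortium standard.

```python
# Local, parameter-efficient fine-tuning of an open base model.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"  # an openly downloadable base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Train only small low-rank adapter matrices; the shared base stays frozen.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# ...train as in the earlier fine-tuning sketch, on institution-held data...

# Only the adapter is written out; neither the data nor the full model
# needs to be exchanged with other consortium members.
model.save_pretrained("hospital_adapter")
```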

It is encouraging that some organizations have committed to making their LLMs more accessible. For example, for both LLaMA (Large Language Model Meta AI) [17], which was publicly released by the technology company Meta in February (although some debate its 'open-source' status), and Mistral 7B [18], an LLM released by the French start-up Mistral AI in September, users can download the models and fine-tune them using their own data sets. This means that users can probe the performance of the models at a deeper level than is currently possible with closed LLMs such as GPT-4.
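One concrete example of that deeper probing, assuming one of the openly released models mentioned above is available locally: with the weights downloaded, a user can inspect the full next-token probability distribution behind an output, rather than receiving only a finished string from a closed API. A minimal sketch (in practice a model of this size needs a GPU):

```python
# Inspecting a model's internals, which a closed API does not expose.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)

inputs = tok("Common causes of chest pain include", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # the full distribution, not just text

probs = torch.softmax(logits[0, -1], dim=-1)  # next-token probabilities
top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(int(i)):>15s}  {float(p):.3f}")
```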

Some people might question whether a global consortium would have enough resources to build LLMs from scratch. The computing time needed to build GPT-3, a precursor to GPT-4, is estimated to have cost around $4.6 million. But the potential cost savings from AI in the US health-care sector alone is projected to be between $200 billion and $360 billion annually. Also, thanks to advances in hardware and techniques, the cost of training high-quality models is rapidly falling.

And with their access to vast troves of clinical data, health-care institutions, governments and other consortium members have a significant advantage over technology companies. This, combined with the fact that such data are easier to use for non-commercial purposes, means that consortium members are well positioned when it comes to curating high-quality clinical data that could be used to improve LLMs.

Such an open consortium-led approach provides several advantages over the development of proprietary LLMs for medicine. First, testing LLMs across multiple consortium organizations would help to ensure their reliability and robustness. In principle, clinicians, machine-learning specialists and patients could collectively and transparently contribute to the evaluation of models — similar to how volunteers contribute to editing entries of the free online encyclopedia Wikipedia or how researchers contribute to the review of scientific papers.

A future ideal would be for consortium members to share any patient-specific data that they use to fine-tune LLMs, should they find ways to do so safely. In the meantime, with local institutional control over data, it will be easier to ensure that patient-privacy and other requirements are met. By coordinating efforts, LLMs can be integrated into electronic health-record systems, such as health-care company Oracle Cerner's platform, Epic and other systems that are already widely used by hospitals and health-care institutions. Also, designers and engineers can optimize the models, the methods used to evaluate them and the user interfaces without reinventing the wheel each time.

Up for debate

All sorts of issues need thrashing out. To protect patient privacy, stringent guidelines for how clinical data can be used and measures to prevent data leaks will be crucial. LLMs must be adjusted to reflect variations in institutional requirements and varying health-care practices and regulations across different countries and regions. Steps will need to be taken to guard against LLMs being used to exacerbate inequity, and to mitigate harm from inappropriate use of LLMs, such as for self-diagnosis and treatment.

At least in relation to data sharing, various efforts offer some guidance. The MIMIC (Medical Information Mart for Intensive Care) database contains de-identified information from people admitted to a medical centre in Boston, Massachusetts. External researchers can use the data if they complete a training course in human-subjects research and sign a data-use agreement. Other successful platforms for sharing health data include the UK Biobank, a biomedical database containing genetic and health information from half a million UK participants. In some cases, federated learning, a method in which groups enhance a shared AI model using their data without exchanging it, could be instrumental [19].
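For intuition, here is a minimal sketch of federated averaging (FedAvg), one common federated-learning scheme, with synthetic data and a toy linear model standing in for an LLM. Each site fits the shared model to its own private data; only the parameters are pooled, never the records.

```python
# Federated averaging: sites share model updates, never their data.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])

def make_site(n=50):
    """Synthetic private data held by one site (never shared)."""
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

sites = [make_site() for _ in range(4)]

def local_update(w_global, X, y, lr=0.1, steps=10):
    """One site's gradient steps on its own data (least-squares loss)."""
    w = w_global.copy()
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y) / len(y))
    return w

w = np.zeros(3)
for _ in range(20):  # communication rounds
    local = [local_update(w, X, y) for X, y in sites]
    w = np.mean(local, axis=0)  # the server averages parameters only

print(w)  # converges towards true_w although no raw data was exchanged
```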

But for many of these challenges, a range of strategies will need to be considered. In fact, it is precisely because the use of LLMs in medicine poses such formidable challenges around safety, privacy and equity that those at the front line of care should drive the development and deployment of the models. Whereas transparent efforts could provide a solid foundation for AI in medicine, building medicine on top of proprietary, secretive models is akin to building on a house of cards.

Nature 624, 36-38 (2023)

doi: https://doi.org/10.1038/d41586-023-03803-y



