London24NEWS

The AI doctor will see you now: ChatGPT passes gold-standard US medical exam — as researchers hail moment as ‘milestone for artificial intelligence’

ChatGPT has passed the gold-standard exam required to practice medicine in the US, amid rising concerns that AI could put white-collar workers out of jobs.

The artificial intelligence program scored between 52.4 and 75 percent across the three parts of the United States Medical Licensing Exam (USMLE). The passing threshold is around 60 percent each year.

Researchers from tech company AnsibleHealth, who conducted the study, said: ‘Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation.’

It comes after DailyMail.com revealed the five professions at most risk from the AI revolution, according to experts.

ChatGPT, a new artificial intelligence (AI) system, scored at or close to the passing threshold on the United States Medical Licensing Exam (USMLE) required to practice medicine in the US.

The full findings, which were made available as a preprint a few weeks ago, have now been peer-reviewed and published in the journal PLOS Digital Health.

Developed by OpenAI, ChatGPT (short for ‘Chat Generative Pre-trained Transformer’) is a language-based bot that can generate human-like responses. 

The technology has already been put to the test, passing exams at a business school (the University of Pennsylvania’s Wharton School of Business) and a law school (the University of Minnesota).

In the latest study, researchers tested the software on 350 questions from the June 2022 USMLE.

The test, which has been used since 1992, assesses the knowledge of medical students and physicians in training across most medical disciplines.

USMLE Step 1 is usually taken at the end of the second year of medical school, Step 2 is taken in the fourth year, and Step 3 is taken after completing med school (which lasts four years) and the first year of residency. 

More than 100,000 students and graduate students take the test annually. 

The exam includes open-ended and multiple-choice questions. Two doctors evaluated the results, and discrepancies were reviewed by a third expert.

ChatGPT also produced ‘at least one significant insight’ that was ‘new, non-obvious, and clinically valid’ for 88.9 percent of its responses.

The results exceeded the performance of PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored 50.8 percent on an older dataset of USMLE-style questions.

The authors believe their findings suggest ChatGPT may become a valuable tool in medical education. 

The AI bot ‘possesses the partial ability to teach medicine by surfacing novel and nonobvious concepts that may not be in the learners’ sphere of awareness,’ the authors wrote.

The study adds that ‘AIs are now positioned to soon become ubiquitous in clinical practice, with diverse applications across all healthcare sectors.’

The AnsibleHealth team has so much confidence in ChatGPT that clinicians at the company have begun experimenting with using it as part of their workflows to rewrite jargon-heavy reports.

Even the study’s team used ChatGPT to write up their findings.

‘ChatGPT contributed substantially to the writing of [our] manuscript,’ said author Dr Tiffany Kung. ‘We interacted with ChatGPT much like a colleague, asking it to synthesize, simplify, and offer counterpoints to drafts in progress … All of the co-authors valued ChatGPT’s input.’

Still, some experts highlight limitations to the study’s results and the use of AI.

Nello Cristianini, Professor of Artificial Intelligence at the University of Bath, said: ‘This does not remotely suggest that ChatGPT has any comparable knowledge to a human … we are in the presence of a statistical mechanism trained to generate text (new but “similar” to the one it was trained upon), in the right context and way, so we should not talk about understanding or related concepts.’

Even OpenAI acknowledges the tool’s tendency to respond with ‘plausible-sounding but incorrect or nonsensical answers,’ an issue it considers challenging to fix.

AI technology can also perpetuate societal biases like those around race, gender and culture. 

Tech giants, including Google and Amazon, have previously acknowledged that some of their projects that experimented with AI were ‘ethically dicey’ and had limitations. At several companies, humans had to step in and address these issues.

Nonetheless, Dr Stuart Armstrong, Co-Founder and Chief Researcher at Aligned AI, believes ‘this is an impressive performance, and we should expect to see more such successes in AI in the future.

‘There are many areas where humans are much more effective than AIs … This human superiority won’t last forever, though. One day, AIs will be better than us at almost every task.’

Andrew Karolyi, dean of Cornell University’s SC Johnson College of Business, recently told the Financial Times: ‘One thing we all know for sure is that ChatGPT is not going away. If anything, these AI techniques will continue to get better and better. Faculty and university administrators need to invest to educate themselves.’