Natural language processing is a branch of artificial intelligence that aims to enable computers to process, understand, interpret and manipulate human language. After decades of exploration, current state-of-the-art NLP solutions are based on deep learning models implemented using various types of neural networks.
THE PROBLEM
“Among the deep learning models, the transformer-based models implemented using a self-attention mechanism, such as BERT and GPT, are the current state-of-the-art solutions,” explained Yonghui Wu, director of NLP at the Clinical and Translational Science Institute at Gainesville-based University of Florida Health and an assistant professor in the department of health outcomes and biomedical informatics at the University of Florida.
“The transformer-based NLP models split the training procedure into two stages, pretraining and fine-tuning,” he continued. “In pretraining, the transformer models adopt unsupervised learning strategies to train language models from large unlabeled corpora (for example, Wikipedia, PubMed articles and clinical notes).”
In fine-tuning, the pretrained models are adapted to specific downstream tasks using supervised learning.
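To make the idea of training on unlabeled text more concrete, the sketch below shows one common pretraining objective for BERT-style models, masked language modeling, in a deliberately simplified form. The example sentence, the function name `make_masked_example` and the masking rate are illustrative assumptions, not details of the UF Health pipeline, and real implementations add further rules (such as occasionally substituting random tokens).

```python
import random

MASK = "[MASK]"  # placeholder token, as used by BERT-style vocabularies


def make_masked_example(tokens, mask_prob=0.15, seed=0):
    """Turn an unlabeled token list into a (masked_input, labels) pair.

    labels[i] holds the original token wherever a mask was applied and
    None elsewhere, so the "labels" come from the text itself.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # the model must predict it back
        else:
            masked.append(tok)
            labels.append(None)   # no loss is computed at this position
    return masked, labels


# Made-up sentence for illustration only; not taken from any real record.
sentence = "the patient was started on metformin for type 2 diabetes".split()
masked, labels = make_masked_example(sentence, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Fine-tuning, by contrast, would pair each input with a human-provided label (for example, an entity tag per token), which is why it needs far less data but does need annotation.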
“The key step is pretraining, where transformer-based models learn task-independent linguistic knowledge from massive text data, which can be applied to solve many downstream NLP tasks,” Wu said. “However, to be effective, the transformer-based models are usually very large, with billions of parameters. They cannot fit into a single GPU’s memory, cannot be trained on a single computer node, and cannot use traditional training strategies.
“Training these large models requires massive computing power, efficient memory management and advanced distributed training techniques such as data and/or model parallelism to reduce training time,” he added. “Therefore, even though there are big transformer models in the general English domain, there are no comparable transformer models in the medical domain.”
For example, if an organization trained a BERT model with 345 million parameters on a single GPU, it would take months to complete.
“Models like GPT-2 with billions of parameters are not even able to fit into a single GPU’s memory for training,” Wu said. “Thus, for a long time, we could not take advantage of big transformer models even though we have massive medical text data at UF Health.”
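A rough back-of-the-envelope calculation illustrates the memory problem. The sketch below assumes mixed-precision training with the Adam optimizer, which is commonly estimated at about 16 bytes of training state per parameter; that figure, the 80 GB GPU size and the function names are assumptions for illustration, not numbers from UF Health. The 9-billion-parameter size is the one reported for GatorTron later in this article.

```python
# Assumed per-parameter training state for mixed-precision Adam:
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4) = 16 bytes.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4

GPU_MEMORY_GB = 80  # hypothetical single-GPU memory size


def training_state_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Estimate weights + gradients + optimizer state in gigabytes."""
    return num_params * bytes_per_param / 1e9


def fits_on_one_gpu(num_params, gpu_memory_gb=GPU_MEMORY_GB):
    """True if the estimated training state fits in one GPU's memory."""
    return training_state_gb(num_params) <= gpu_memory_gb


for name, n in [("345M-parameter BERT", 345_000_000),
                ("9B-parameter model", 9_000_000_000)]:
    print(name, round(training_state_gb(n), 2), "GB, fits:", fits_on_one_gpu(n))
```

Under these assumptions the smaller model fits on one GPU (and is limited by training time instead), while the larger one does not. The estimate leaves out activations and temporary buffers, so real requirements are higher still.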
PROPOSAL
For software, NLP vendor Nvidia developed the Megatron-LM package, which adopted an efficient intra-layer model parallel approach that can significantly reduce distributed training communication time while keeping the GPUs compute-bound, said Jiang Bian, associate director of the biomedical informatics program at the Clinical and Translational Science Institute at UF Health and an associate professor in the department of health outcomes and biomedical informatics at the University of Florida.
“This model parallel approach is orthogonal to data parallelism, which enables us to benefit from distributed training with both model parallelism and data parallelism,” Bian explained. “Further, Nvidia also developed and provided a conversational AI toolkit, NeMo, for using these large language models for downstream tasks. These software packages greatly simplified the steps in building and using large transformer-based models like our GatorTron.
“For hardware, Nvidia provided the HiPerGator AI NVIDIA DGX A100 SuperPod cluster, recently deployed at the University of Florida, featuring 140 Nvidia DGX A100 nodes with 1,120 Nvidia Ampere A100 GPUs,” he continued. “The software solved the bottleneck in distributed training algorithms and the hardware solved the bottleneck in computing power.”
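Intra-layer model parallelism means that a single layer’s weight matrix is divided across GPUs, each of which computes part of the layer’s output. The toy sketch below simulates a column-wise split of one linear layer on two “GPUs” using plain Python lists. It is a simplified illustration of the general idea, not Megatron-LM code; the matrices and helper names are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_columns(w, parts):
    """Split matrix w into `parts` equal column blocks, one per simulated GPU."""
    width = len(w[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in w]
            for i in range(parts)]


def concat_columns(blocks):
    """Gather per-GPU outputs by joining matching rows side by side."""
    return [sum(rows, []) for rows in zip(*blocks)]


x = [[1, 2], [3, 4]]              # activations: 2 samples x 2 features
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # full layer weights: 2 x 4

shards = split_columns(w, parts=2)                # each "GPU" holds half of w
partial = [matmul(x, shard) for shard in shards]  # computed independently
combined = concat_columns(partial)

# The sharded computation reproduces the single-device result.
assert combined == matmul(x, w)
print(combined)
```

Each simulated GPU stores only half of the weights, which is what lets a layer too large for one device be trained at all. The cost is the communication step needed to gather the partial outputs.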
MEETING THE CHALLENGE
The team at UF Health developed GatorTron, the world’s largest transformer-based NLP model in the medical domain, with around 9 billion parameters, and trained it using more than 197 million notes with more than 3 billion sentences and more than 82 billion words of clinical text from UF Health.
“GatorTron adopted the architecture of Megatron-LM, the software provided by Nvidia,” Wu said. “We trained GatorTron using the HiPerGator AI NVIDIA DGX A100 SuperPod cluster, recently deployed at the University of Florida, featuring 140 Nvidia DGX A100 nodes with 1,120 Nvidia Ampere A100 GPUs. With the HiPerGator AI cluster, the computing resource is no longer a bottleneck.
“We trained GatorTron using 70 HiPerGator nodes with 560 GPUs, with both data and model parallel training strategies,” he added. “Without Nvidia’s Megatron-LM, we would not be able to train such a large transformer model in the medical domain. We also leveraged Nvidia’s NeMo toolkit, which provides the flexibility to fine-tune GatorTron for various NLP downstream tasks using easy-to-use application programming interfaces.”
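Combining the two strategies amounts to arranging the GPUs in a grid: GPUs in the same model parallel group share one copy of the model, and GPUs in the same data parallel group hold the same slice of different copies. The sketch below lays out such a grid for 560 GPUs (70 nodes with 8 GPUs each, per the figures above). The model parallel size of 8 and the function name are hypothetical; the article does not state the configuration UF Health actually used.

```python
def build_parallel_groups(world_size, model_parallel_size):
    """Return (model_groups, data_groups) as lists of GPU rank lists.

    Consecutive ranks form a model parallel group (one model copy);
    ranks at the same position across copies form a data parallel group.
    """
    if world_size % model_parallel_size != 0:
        raise ValueError("world_size must be divisible by model_parallel_size")
    model_groups = [list(range(start, start + model_parallel_size))
                    for start in range(0, world_size, model_parallel_size)]
    data_groups = [list(range(offset, world_size, model_parallel_size))
                   for offset in range(model_parallel_size)]
    return model_groups, data_groups


WORLD_SIZE = 70 * 8       # 70 nodes x 8 GPUs per node = 560 GPUs
MODEL_PARALLEL_SIZE = 8   # hypothetical: one model copy per node

model_groups, data_groups = build_parallel_groups(WORLD_SIZE, MODEL_PARALLEL_SIZE)
print(len(model_groups), "model copies of", len(model_groups[0]), "GPUs each")
print(len(data_groups), "data parallel groups of", len(data_groups[0]), "GPUs each")
```

Because every GPU belongs to exactly one group of each kind, the two forms of parallelism can be tuned independently, which is the sense in which they are orthogonal.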
GatorTron is currently being evaluated in a research setting on downstream tasks such as named-entity recognition, relation extraction, semantic textual similarity, and question answering with electronic health record data. The team is working to apply GatorTron to real-world healthcare applications such as patient cohort identification, text de-identification and information extraction.
RESULTS
UF Health evaluated the GatorTron model on four important NLP tasks: clinical concept extraction, medical relation extraction, medical natural language inference, and medical question answering.
“For clinical concept extraction, the GatorTron model achieved state-of-the-art performance on all three benchmarks, including the publicly available 2010 i2b2, 2012 i2b2 and 2018 n2c2 datasets,” Bian noted. “For relation extraction, GatorTron significantly outperformed other BERT models pretrained in the clinical or biomedical domain such as clinicalBERT, BioBERT and BioMegatron.
“For medical natural language inference and question answering, GatorTron achieved new state-of-the-art performance on both benchmark datasets, MedNLI and emrQA,” he added.
ADVICE FOR OTHERS
There is growing interest in applying NLP models to help extract patient information from clinical narratives, where state-of-the-art pretrained language models are key components.
“A well-trained large language model could improve many downstream NLP tasks through fine-tuning, such as medical chatbots, automated summarization, medical question answering, and clinical decision support systems,” Wu advised. “When developing large transformer-based NLP models, it’s recommended to explore various model sizes (number of parameters) based on their local clinical data.
“When applying these large transformer-based NLP models, healthcare providers need to think about the real-world configurations,” he concluded. “For example, these large transformer-based NLP models are very powerful solutions for high-performance servers, but are not feasible to deploy on personal computers.”
Twitter: @SiwickiHealthIT
Email the writer: bsiwicki@himss.org
Healthcare IT News is a HIMSS Media publication.