Artificial intelligence (AI) models, just like people, need to hit the gym to be at their best at discovering drugs, Insilico Medicine reasons.
So the newly public AI-based drug developer has launched Science MMAI Gym, a domain-specific training infrastructure designed to whip any “causal” large language model (LLM)—one that predicts the next token in a sequence—or cutting-edge “frontier” LLM into shape for drug discovery and development.
Science MMAI Gym is designed to adapt general-purpose LLMs—including ChatGPT, Claude, Gemini, Grok, Llama, and Mistral—to carry out tasks in medicinal chemistry, biology, and clinical development, with the aim of achieving the precision required for biopharma R&D.
“Insilico is using the ‘gym’ as a metaphor for a structured, curriculum-based training environment,” Alex Zhavoronkov, PhD, Insilico’s founder, chairman, executive director, and CEO, told GEN. “Models undergo ‘workouts’ or ‘trainings’—weeks to months of supervised and reinforcement fine-tuning; are evaluated with objective benchmarks; and ‘emerge’ stronger, paired with a membership-style business model where partners bring a model in and receive an upgraded version out.”
The gym is needed, Zhavoronkov said, because general-purpose or “frontier” LLMs often fail or underperform at key drug discovery tasks, such as predicting drug metabolism and pharmacokinetics (DMPK) and toxicity endpoints that include blockade of the human Ether-à-go-go-Related Gene (hERG) channel, drug-induced liver injury risk, and median lethal dose (LD50). Insilico says its benchmarks show that without specialized training, general models often produce vague or chemically implausible reasoning, even with advanced prompting.
“The gap isn’t solved by prompting or light fine-tuning. The gym is positioned as a systematic, repeatable training-and-benchmarking environment that can reliably adapt any causal or frontier LLM into a pharmaceutical-grade scientific engine with measurable gains on relevant benchmarks,” Zhavoronkov added.
Scientific reasoning
Rather than treating drug discovery as a simple natural language processing (NLP) benchmark, Insilico says, Science MMAI Gym teaches LLMs domain-specific scientific reasoning—the language, formats, and conceptual chains used by chemists and biologists.
The gym’s training applies a multi-stage curriculum.
First, Insilico exposes the base model to high-quality, domain-specific reasoning datasets spanning medicinal chemistry optimization and synthesis/retrosynthesis, DMPK and toxicity endpoints, and 3D structure-grounded information.
Second, the gym performs multi-task supervised fine-tuning, followed by reinforcement fine-tuning using domain-specific reward models. Third, the gym applies data “decontamination” aimed at preventing leakage, then evaluates the result on public and internal out-of-distribution benchmarks.
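Insilico has not published the mechanics of its decontamination step, but a common way to prevent leakage between training data and evaluation benchmarks is n-gram overlap filtering: any training example sharing a long token n-gram with a benchmark item is dropped. A minimal sketch of the idea in Python (function names and the whitespace tokenization are illustrative, not Insilico’s pipeline):

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-gram tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_set, benchmark_set, n=8):
    """Drop training examples that share any n-gram with a benchmark item.

    Examples are plain strings; splitting on whitespace stands in for a
    real tokenizer here.
    """
    # Collect every n-gram that appears anywhere in the benchmark.
    contaminated = set()
    for text in benchmark_set:
        contaminated |= ngrams(text.split(), n)
    # Keep only training examples with no n-gram overlap.
    return [
        text for text in train_set
        if not (ngrams(text.split(), n) & contaminated)
    ]
```

A real pipeline would typically run this at scale with hashed n-grams and the model’s own tokenizer, but the filtering logic is the same.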
Insilico has opened the “gym” to external companies as well as using it for its own models. The company says it has already held conversations with potential clients about the training program but is not commenting on which companies are interested in using the gym.
“MMAI Gym is the next step, a partner-facing training environment that can take any customer-owned, open-source, or proprietary causal LLM and run it through a standardized regimen: curated scientific reasoning datasets, multi-task supervised fine-tuning, reinforcement learning with domain reward models, data decontamination, and robust public plus in-house benchmarking,” Zhavoronkov explained.
The outcome: An upgraded model is returned to the drug developer, along with benchmark reports.
Success story
Zhavoronkov cited Qwen3-14B as a successful example of an MMAI Gym-trained model. Qwen3-14B is part of the latest generation of large language models in the Qwen3 model family developed by Alibaba Cloud, the cloud computing division of Chinese tech giant Alibaba.
Qwen3-14B has 14 billion parameters (hence its “14B” suffix), supports 119 languages and dialects, and is optimized for coding and agentic capabilities. But the open-source causal LLM failed on 70% of medicinal chemistry tasks before entering the gym, from which it emerged as a “single-model-does-it-all” chemistry engine.
The MMAI Gym-trained variant, called Qwen3-14B-MMAI, achieved a 10-fold increase in performance, passing 95%+ of benchmarks after a two-week gym membership. That included state-of-the-art (SOTA) or near-SOTA performance on multiple absorption, distribution, metabolism, excretion, and toxicity (ADMET) tasks, plus SOTA success rates on five optimization tasks in the MuMO-Instruct benchmark, matching or exceeding strong category-specific generalist models, according to Insilico.
Two other Alibaba Cloud-developed Qwen variants showed improvement after going through the gym. One is Qwen3-4B, a 4-billion-parameter lightweight language model designed for advanced reasoning, coding, and multilingual tasks (100+ languages).
Qwen3-4B showed improvement on ClinBench, a benchmark designed to assess LLMs on real-world clinical reasoning, following supervised fine-tuning and reinforcement training using DeepSeek’s Group Relative Policy Optimization (GRPO) algorithm. Specifically, its F1 score for predicting Phase II trial outcomes rose enough to outperform a broad set of frontier LLMs.
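GRPO’s central idea is to score each sampled response relative to the other responses generated for the same prompt, using the group’s own statistics as a baseline instead of a separately trained value model. A toy illustration of the advantage calculation (a conceptual sketch, not DeepSeek’s or Insilico’s code):

```python
def group_relative_advantages(rewards):
    """Normalize each reward against its sampling group's mean and std.

    In GRPO, several responses are sampled per prompt; each response's
    advantage is its reward minus the group mean, divided by the group's
    standard deviation. The group itself serves as the baseline, so no
    separate critic network is needed.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0:
        # All responses scored identically: no preference signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

In training, these per-response advantages weight the policy-gradient update, pushing the model toward responses the domain reward model scored above its group’s average.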
The other Qwen variant to show improvement was Qwen3-1.7B, a 1.7-billion-parameter causal language model designed for high efficiency in desktop, mobile, and edge applications that include document analysis, coding, and multilingual support. Qwen3-1.7B features 28 layers, supports a 32k context window, and offers dual-mode operation designed for both rapid responses and advanced reasoning. Qwen3-1.7B improved on the Target-Bench open-source AI evaluation benchmark after undergoing supervised fine-tuning (SFT) and GRPO training at the MMAI Gym.
External users of the gym can sign up for chemical superintelligence (CSI), biology/clinical superintelligence (BSI), or pharmaceutical superintelligence (PSI) memberships tailored to their R&D pipelines: CSI for chemistry-centric workflows, BSI for biology workflows, and PSI for both pillars integrated. Organizations may choose a track depending on immediate goals, internal capabilities, scope, or the type of upgrade they want in a given training cycle.
Insilico is not disclosing the price of its gym memberships, which Zhavoronkov said will vary with the length of engagement, ranging from two-week or one-month sprints to multi-month programs. When partners provide their base model, the gym returns a CSI/BSI/PSI-enhanced version that promises up to 10x performance improvement over baseline, along with detailed benchmark reports and optional wet-lab validation through Insilico’s automated assay platforms.
Pipeline progress
Before creating the gym, Insilico built and operated its own specialized foundation models. Insilico has applied more than a decade of AI research to develop its own internal pipeline of 27 preclinical candidates, more than 10 molecules with investigational new drug (IND) clearance to start clinical trials, and multiple Phase I and Phase IIa clinical trials that have either been completed or are ongoing.
Last week Insilico’s latest candidate won IND clearance—ISM8969, an oral NLRP3 inhibitor targeting inflammation and neurodegenerative disorders starting with Parkinson’s disease, for which the drug will be assessed in a Phase I trial. The study will be designed to evaluate the safety, tolerability, and pharmacokinetics of ISM8969 in healthy volunteers, as well as identify an optimal dose level or levels to be recommended in future studies.
Insilico holds 50% of global rights to ISM8969; the other 50% is held by Hygtia Therapeutics under an exclusive license and co-development collaboration announced January 19. Insilico will lead the IND submission and Phase I trial of ISM8969, after which Hygtia Therapeutics will oversee further clinical development, regulatory filings, and commercialization. Insilico could receive up to $66 million in upfront and milestone payments under the agreement with Hygtia, an incubatee of Shenzhen Pengfu Fund of Fosun Health Capital and Fosun Pharma.
Insilico’s pipeline is led by its TNIK-inhibitor rentosertib (ISM001-055), set to launch a Phase IIb/III study in China for idiopathic pulmonary fibrosis (IPF) in the first half of 2026. Insilico also plans to file IND applications with the FDA to launch trials of inhalable rentosertib in IPF and small molecule rentosertib in kidney fibrosis, both in the first half of the year.
In December, Insilico raised HKD 2.277 billion (about $292 million) on the Hong Kong Exchange through an initial public offering of 94.69 million shares at HKD 24.05 ($3.08) each. Insilico’s shares since then have more than doubled, zooming 148% to HKD 59.70 ($7.65) as of Wednesday.
The post No Pain, No Gain: Insilico ‘Gym’ Gets AI Models Into Shape appeared first on GEN – Genetic Engineering and Biotechnology News.



