Scientists at the Icahn School of Medicine at Mount Sinai have created a new artificial intelligence (AI) model that helps reveal how genes function together inside human cells, offering a powerful new way to understand biology and disease.

The study, published in the May 21 online issue of Patterns, a Cell Press journal [https://doi.org/10.1016/j.patter.2026.101565], introduces a gene set foundation model (GSFM) designed to learn patterns in how genes are grouped and function across thousands of biological contexts.

The work draws inspiration from advances in large language models (LLMs) such as ChatGPT, which learn how words gain meaning depending on their context. In a similar way, a GSFM learns how genes behave differently depending on their cellular “context.”

Could AI learn the ‘meaning’ of genes?

“Genes rarely act alone. Instead, they participate in multiple biological processes, forming different molecular groupings depending on where and when they are active in the cell. A single gene can play different roles in different settings, much like a word can have different meanings in different sentences,” says senior corresponding author Avi Ma’ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai.

“Just as modern language models learn the meaning of words from context, we asked whether AI could learn the ‘meaning’ of genes in the same way. Our GSFM was designed to do exactly that.”

The model provides a new way to understand the structural and functional organization of genes and their products inside human cells. This improved understanding could eventually support the development of better diagnostics, biomarkers, and therapies.

Mapping how genes relate across many biological situations

By mapping how genes relate to one another across many biological situations, the GSFM creates a reference framework that can help scientists interpret complex multi-omics datasets more effectively, say the investigators.

“The organization of genes within cells remains one of the major unsolved questions in biology. The GSFM helps address this by learning from millions of gene groupings derived from published research and gene expression datasets,” says Dr. Ma’ayan.

The model can:

  • Help identify the function of poorly understood genes without immediate laboratory experiments
  • Highlight genes involved in disease processes
  • Suggest potential new drug targets and biomarkers
  • Provide a reusable knowledge system for many types of biomedical research data analysis tasks, such as improved gene set enrichment analysis

GSFM offers a new “map” of how genes work together

In essence, say the investigators, GSFM offers a new “map” of how genes work together in different contexts.

To build the model, the researchers compiled millions of gene sets from published scientific studies and gene expression datasets. In total, the system learned from hundreds of thousands of independent research efforts.

The AI model was trained in a way similar to solving a puzzle: it was given part of a gene set and asked to predict the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact.
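
The puzzle-style training described above resembles the masked-prediction objectives used for language models. The sketch below is purely illustrative and is not the GSFM implementation: it assumes a toy library of gene sets, and the helper names `mask_gene_set` and `complete_gene_set` are hypothetical. It hides part of a gene set, then ranks candidate genes by simple co-occurrence counts, a stand-in for the neural network the researchers actually trained.

```python
import random
from collections import Counter
from typing import Iterable


def mask_gene_set(genes: list[str], mask_fraction: float, seed: int = 0) -> tuple[set[str], set[str]]:
    """Split one gene set into visible context genes and hidden target genes."""
    if not 0.0 < mask_fraction < 1.0:
        raise ValueError("mask_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    shuffled = sorted(set(genes))  # sort first so a given seed is reproducible
    rng.shuffle(shuffled)
    n_hidden = max(1, int(len(shuffled) * mask_fraction))
    hidden = set(shuffled[:n_hidden])
    visible = set(shuffled[n_hidden:])
    return visible, hidden


def complete_gene_set(visible: set[str], library: Iterable[set[str]], top_k: int = 3) -> list[str]:
    """Guess missing genes from their co-occurrence with the visible genes.

    Stand-in scoring rule (NOT the GSFM model): each library set adds its
    overlap size with `visible` to every non-visible gene it contains.
    """
    scores: Counter[str] = Counter()
    for gene_set in library:
        overlap = len(visible & gene_set)
        if overlap == 0:
            continue  # sets sharing no context genes contribute nothing
        for gene in gene_set - visible:
            scores[gene] += overlap
    # Highest score first; ties broken alphabetically for a deterministic order
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_k]]
```

For example, with a hypothetical library in which TP53 appears alongside MDM2 in two sets, `complete_gene_set({"TP53"}, library)` ranks MDM2 first. A learned model replaces the counting step with representations that can generalize beyond pairs seen together verbatim.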

The AI model was then benchmarked against other approaches and demonstrated strong performance, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally.

To evaluate this, the model was trained using gene sets from publications up to a defined cutoff date, and then tested on whether it could predict discoveries reported in studies published after that cutoff date.
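
The cutoff-date evaluation is a temporal holdout. The minimal sketch below, assuming a hypothetical `GeneSetRecord` schema with a publication date per gene set, splits records at a cutoff and lists the gene pairs that first appear together only after it, which are the kind of "future discoveries" a model trained on the earlier data could be scored against. It is an illustration of the general idea, not the authors' benchmark code.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class GeneSetRecord:
    """One gene set plus the publication date of its source (hypothetical schema)."""
    name: str
    genes: frozenset[str]
    published: date


def temporal_split(records: list[GeneSetRecord], cutoff: date) -> tuple[list[GeneSetRecord], list[GeneSetRecord]]:
    """Training data: published on or before the cutoff. Test data: published after it."""
    train = [r for r in records if r.published <= cutoff]
    test = [r for r in records if r.published > cutoff]
    return train, test


def co_occurring_pairs(records: list[GeneSetRecord]) -> set[tuple[str, str]]:
    """All unordered gene pairs that appear together in at least one set."""
    pairs: set[tuple[str, str]] = set()
    for record in records:
        # Sorting makes each pair's order canonical, e.g. ("BAX", "TP53")
        pairs.update(combinations(sorted(record.genes), 2))
    return pairs


def novel_pairs(train: list[GeneSetRecord], test: list[GeneSetRecord]) -> set[tuple[str, str]]:
    """Pairs first reported together after the cutoff: the targets to predict."""
    return co_occurring_pairs(test) - co_occurring_pairs(train)
```

Keeping every post-cutoff record out of training is what makes the test meaningful: the model never sees the later studies it is asked to anticipate.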

“Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information. This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology,” says Dr. Ma’ayan.

GSFMs could enhance existing bioinformatics tools and improve interpretation of omics data

GSFMs could enhance existing bioinformatics tools and improve the interpretation of data collected with omics technologies. One immediate application is in gene set enrichment analysis, a widely used method in molecular biology research. By improving how scientists interpret gene groupings, the model may help uncover new biological insights from both existing and future datasets.
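
Enrichment analysis comes in several forms, including rank-based GSEA and simpler over-representation analysis. The sketch below illustrates only the classic over-representation test: a one-sided hypergeometric p-value for whether a list of genes of interest overlaps a library gene set more than chance would predict. It uses the Python standard library only; it is not the GSFM approach, and the function names and the `universe_size` background parameter are assumptions for illustration.

```python
from math import comb


def hypergeom_enrichment_p(query: set[str], gene_set: set[str], universe_size: int) -> float:
    """One-sided P(overlap >= observed) under the hypergeometric null.

    Assumes both `query` and `gene_set` are drawn from a background of
    `universe_size` genes.
    """
    n_query, n_set = len(query), len(gene_set)
    if n_query > universe_size or n_set > universe_size:
        raise ValueError("query and gene set must fit inside the background universe")
    overlap = len(query & gene_set)
    total = comb(universe_size, n_query)
    # Sum the probabilities of every overlap at least as large as the observed one
    tail = sum(
        comb(n_set, k) * comb(universe_size - n_set, n_query - k)
        for k in range(overlap, min(n_query, n_set) + 1)
    )
    return tail / total


def rank_gene_sets(query: set[str], library: dict[str, set[str]], universe_size: int) -> list[tuple[str, float]]:
    """Score every library set against the query; most enriched (smallest p) first."""
    scored = [(name, hypergeom_enrichment_p(query, genes, universe_size)) for name, genes in library.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))
```

In practice, p-values from many library sets would also be corrected for multiple testing (for example, with the Benjamini-Hochberg procedure), which this sketch omits.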

The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is to integrate it with language-based models to generate natural-language explanations of gene functions.

Another future direction is combining GSFM with drug-focused AI models, with the long-term aim of predicting how drugs interact with cells and supporting the design of new therapeutics.

The gene pages and the GSFM model are accessible at https://gsfm.maayanlab.cloud and https://github.com/MaayanLab/gsfm.

The paper is titled “GSFM: A Gene Set Foundation Model Pre-Trained on a Massive Collection of Diverse Gene Sets.”

The study’s authors, as listed in the journal, are Daniel J. B. Clarke, Giacomo B. Marino, and Avi Ma’ayan.

This work was partially funded by NIH grants OT2OD036435, OT2OD030160, U24CA264250, U24CA271114, R01DK131525, RC2DK131995.

About the Icahn School of Medicine at Mount Sinai

The Icahn School of Medicine at Mount Sinai is internationally renowned for its outstanding research, educational, and clinical care programs. It is the sole academic partner for the seven member hospitals* of the Mount Sinai Health System, one of the largest academic health systems in the United States, providing care to New York City’s large and diverse patient population.

——————————————————-

* Mount Sinai Health System member hospitals: The Mount Sinai Hospital; Mount Sinai Brooklyn; Mount Sinai Morningside; Mount Sinai Queens; Mount Sinai South Nassau; Mount Sinai West; and New York Eye and Ear Infirmary of Mount Sinai. 

Courtesy: NewsWise
