Show HN: AIDO.ModelGenerator and Expanded Foundation Model Suite

We’re excited to announce the latest release for AIDO (AI-driven Digital Organism). This release includes a set of foundation models, AIDO.ModelGenerator benchmark datasets, utilities, and hands-on tutorials, all of which are now live on Hugging Face and GitHub.

Foundation models are the backbone of AIDO, but applying them across different biological domains and scales can be tricky. To help scientists and engineers use such models with their own data, we built AIDO.ModelGenerator, a software package that enables researchers to rapidly adapt, fuse, and gain insights from pre-trained foundation models. Whether you’re working on genome analysis, protein folding, multi-molecule structure prediction, single-cell simulation, whole-tissue analysis, or fine-tuning models across multiple biological data types, this release is built to help you use all public data, move faster, and go deeper across scales.

In addition to providing access to GenBio AI’s suite of SOTA foundation models, AIDO.ModelGenerator now also allows users to utilize and benchmark open-source foundation models (FMs), such as the Enformer and ESM models, to select the best one for their task. These models are highly configurable, allowing both from-scratch and from-pretrained inference and finetuning, and will automatically have access to the wide range of relevant datasets supported for all models in AIDO.ModelGenerator. In addition, AIDO.ModelGenerator is rigorously tested to ensure efficiency and correctness for all FMs, datasets, and tasks, allowing scientists to skip setup and run models directly on their data.

The current version of AIDO.ModelGenerator is open source and ships with many pre-made experiments and tutorials, serving as a starting point for developing new and highly customizable workflows. It uses PyTorch for optimization and leverages HuggingFace for data management. While the current version is command-line (CLI) based, future releases will provide a fully supported user interface (UI) so that non-technical users can apply AIDO.ModelGenerator to their own data processing and analysis needs.

In addition to the general package, this release of AIDO provides several new and updated foundation models that allow us to connect more biological modalities and scales. These include:

  • AIDO.StructurePrediction: a unified structure prediction model that handles all biomolecules, including proteins, DNA, RNA, ligands, and multi-molecular complexes.
  • AIDO.Protein-RAG: a protein model that connects MSA and structure tokens.
  • AIDO.Tissue: a context-aware model trained on spatial transcriptomics data, which connects single-cell and tissue modeling.


The release also contains a number of use cases that demonstrate how a combination of the different FMs can be used to address real biological problems in drug design, mRNA vaccine optimization, and target identification.

Specifically, the new release contains the following updates:

New Foundation Models 

AIDO.StructurePrediction

Predicted molecular structures across six key interaction types, including antibodies, nanobodies, RNA, and ligand-bound proteins. Try it now in the tutorial.

A multi-molecule structure prediction model designed for complex biological systems, which achieves SOTA performance for immunology problems. Inspired by AlphaFold3, this model supports a wide range of data types, including proteins, DNA, RNA, ligands, antibodies, nanobodies, and complex multi-molecule interactions.

For more information, please visit:

AIDO.Protein-RAG

Comparison of supervised fitness prediction on the ProteinGym benchmark across different protein property prediction models.

A next-generation protein foundation model that goes beyond sequences, using evolutionary data across species (MSA) and structure information from AIDO.StructureTokenizer to better understand and predict protein behavior. The model is derived from the AIDO.Protein base model by training on a further 180 billion tokens from public protein datasets. On the ProteinGym benchmark, AIDO.Protein-RAG achieves state-of-the-art results, outperforming previous models.

To learn more, please visit:

AIDO.Tissue

A SOTA context-aware cellular language model evolved from AIDO.Cell through spatial pre-training to understand each cell’s role in complex tissues.

Additional information below:

  • Overview
  • Finetuning
  • Results
  • How to use

Open-source Models

AIDO.ModelGenerator provides standardized benchmarking and rapid prototyping with biological foundation models. In addition to the SOTA AIDO models, it now supports open-source models from the community, so users can benchmark candidates and select the best model for their use case.

  • This includes direct support for the popular Enformer, Borzoi, scFoundation, and ESM2 models, as well as the ability to directly work with any compatible model across HuggingFace.


These are intended to work independently and together, which is part of our long-term goal of building a cohesive system of interoperable biological models and a thriving research community.
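
As a rough illustration of the "benchmark several models, then pick the best" workflow described above, here is a minimal, library-free Python sketch. It does not use the AIDO.ModelGenerator API: the predictor functions, the toy dataset, and the names `benchmark` and `select_best` are all hypothetical stand-ins, and the metric (mean squared error) is just one possible choice.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" here is any callable mapping a sequence string to a score.
# These are hypothetical stand-ins, not AIDO, ESM2, or Enformer.
Predictor = Callable[[str], float]


def gc_fraction(seq: str) -> float:
    """Toy predictor: fraction of G/C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def length_scaled(seq: str) -> float:
    """Toy predictor: sequence length capped at 10 and scaled into [0, 1]."""
    return min(len(seq), 10) / 10


def mean_squared_error(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Average squared difference between predictions and targets."""
    return mean((p - t) ** 2 for p, t in zip(preds, targets))


def benchmark(
    models: Dict[str, Predictor], dataset: List[Tuple[str, float]]
) -> Dict[str, float]:
    """Score every model on the same dataset with the same metric."""
    inputs = [x for x, _ in dataset]
    targets = [y for _, y in dataset]
    return {
        name: mean_squared_error([model(x) for x in inputs], targets)
        for name, model in models.items()
    }


def select_best(scores: Dict[str, float]) -> str:
    """Return the name of the model with the lowest error."""
    return min(scores, key=scores.get)


# Toy benchmark: (sequence, target value) pairs, invented for illustration.
dataset = [("GGCC", 1.0), ("ATAT", 0.0), ("GCAT", 0.5), ("GGAT", 0.5)]
models = {"gc_fraction": gc_fraction, "length_scaled": length_scaled}

scores = benchmark(models, dataset)
best = select_best(scores)
print(scores, best)
```

The point of the sketch is that every candidate sees identical inputs and is scored by an identical metric, which is what makes the comparison meaningful. In practice the callables would wrap real pre-trained models and the dataset would be a held-out benchmark split.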

New Benchmark Datasets

Visualization from the new target identification tutorial using AIDO.Cell

Models are only as good as the data and tasks on which we evaluate them. We’re also preparing to release some of the largest benchmark datasets harmonized under our AIDO.ModelGenerator framework:

These benchmarks are designed to be practical, diverse, and representative of real-world biological complexity—giving developers, researchers, and collaborators a common ground for measuring performance.

Available Utilities 

Alongside models and datasets, we’re also packaging a few utilities that help with experimentation and scaling. These include:

Use Cases, Not Just Releases

To make it easier to get started, we’ll also be rolling out a set of tutorials and demos—short walkthroughs that show how these models can be combined to address end-to-end real-world problems in biology and medicine. Topics include:


Whether you’re experimenting with a new use case or evaluating integration into your pipeline, these examples will offer practical starting points.

This is only part of what’s to come, and we’re looking forward to seeing what the community builds with AIDO.ModelGenerator. The models and resources are released openly on Hugging Face and GitHub. To stay tuned, follow us on YouTube, LinkedIn, and X for updates.
