
AI21 Labs’ Jamba Reasoning 3B is a powerful tiny model that promises to transform AI economics
Generative artificial intelligence developer AI21 Labs Inc. says it wants to bring agentic AI workloads out of the data center and onto users’ devices with its newest model, Jamba Reasoning 3B.
Launched today, Jamba Reasoning 3B is one of the smallest models the company has ever released, the latest addition to the Jamba family of open-source models available under an Apache 2.0 license. It’s a small language model, or SLM, built atop AI21 Labs’ own hybrid SSM-transformer architecture, setting it apart from most large language models, which are built on transformers alone.
SSM stands for “state space model,” a class of highly efficient algorithms for sequential modeling that track a current state and predict what the next state will be.
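To make that concrete, the recurrence at the heart of a state space model can be written as a linear state update applied once per token. Below is a minimal NumPy sketch of the idea; the matrices and dimensions are hypothetical toys, far simpler than anything inside Jamba, where such parameters are learned and input-dependent.

```python
import numpy as np

# Minimal linear state space recurrence. The model keeps a fixed-size
# hidden state and updates it once per token, so cost grows linearly
# with sequence length, unlike full attention, which grows quadratically.
# A, B and C are hypothetical parameters; real SSMs such as Mamba learn
# them and make them depend on the input.
state_dim, input_dim = 4, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(state_dim, state_dim)) * 0.1  # state transition
B = rng.normal(size=(state_dim, input_dim))        # input projection
C = rng.normal(size=(input_dim, state_dim))        # output projection

def ssm_scan(inputs):
    """Run x_t = A @ x_{t-1} + B @ u_t, emitting y_t = C @ x_t."""
    x = np.zeros(state_dim)
    outputs = []
    for u in inputs:           # one step per token
        x = A @ x + B @ u      # update the current state
        outputs.append(C @ x)  # predict from the new state
    return np.array(outputs)

tokens = rng.normal(size=(10, input_dim))  # a toy 10-token sequence
print(ssm_scan(tokens).shape)              # (10, 3)
```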
Jamba Reasoning 3B combines the transformer architecture with Mamba, a state space neural network architecture, and boasts a context window of 256,000 tokens, with the ability to handle up to 1 million. The company says it delivers two to five times the efficiency of similar lightweight models.
In a blog post, the company explained that Jamba Reasoning 3B uses RoPE scaling, a rotary position embedding technique that stretches its attention mechanism, allowing it to handle long-context tasks with much less compute power than larger models.
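The post doesn’t spell out the mechanics, but RoPE scaling is commonly implemented as position interpolation: position indices are compressed by a constant factor so that tokens far beyond the original training range map back onto rotation angles the model has already seen. Here is a rough sketch of that idea; the dimensions, scale factor and function names are illustrative assumptions, not AI21 Labs’ actual implementation.

```python
import numpy as np

# Rough sketch of RoPE position interpolation, a common way to stretch a
# rotary-embedding model's context window. Everything here is illustrative.
head_dim = 8
base = 10000.0
inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)

def rope_angles(positions, scale=1.0):
    """Rotation angles per position; scale > 1 compresses position indices
    so values beyond the original training range map back into it."""
    return np.outer(np.asarray(positions) / scale, inv_freq)

def apply_rope(x, positions, scale=1.0):
    """Rotate query/key vector pairs by their position-dependent angles."""
    ang = rope_angles(positions, scale)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.default_rng(1).normal(size=(4, head_dim))  # 4 toy queries
# With scale=4.0, position 200,000 is treated like position 50,000,
# keeping angles within the range the model saw during training.
print(apply_rope(q, positions=[0, 1, 100_000, 200_000], scale=4.0).shape)
```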
AI21 Labs highlighted its impressive performance, with a “combined intelligence” and “output tokens per second” ratio that surpasses similarly sized models such as Alibaba Cloud’s Qwen3 4B, Google LLC’s Gemma 3 4B, Meta Platforms Inc.’s Llama 3.2 3B, IBM Corp.’s Granite 4.0 Micro and Microsoft Corp.’s Phi-4 Mini. That evaluation was based on a series of benchmarks, including IFBench, MMLU-Pro and Humanity’s Last Exam.
AI21 Labs believes there will be a big market for tiny language models such as Jamba Reasoning 3B, which is designed to be customized using retrieval-augmented generation techniques that provide it with more contextual knowledge.
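In practice, retrieval-augmented generation means fetching the most relevant documents at query time and prepending them to the prompt, so the model answers from supplied context rather than from its weights alone. The sketch below shows that flow; the embed and generate callables are hypothetical stand-ins for whatever embedding model and language model a deployment actually uses.

```python
# Minimal retrieval-augmented generation loop. The embed and generate
# functions are hypothetical stand-ins, not a real AI21 Labs API; only
# the overall flow is the point here.
from typing import Callable

def rag_answer(
    question: str,
    documents: list[str],
    embed: Callable[[str], list[float]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    # Score each document against the question (dot product of embeddings).
    q_vec = embed(question)
    def score(doc: str) -> float:
        return sum(a * b for a, b in zip(q_vec, embed(doc)))
    # Keep the best matches and stuff them into the prompt as context.
    context = "\n\n".join(sorted(documents, key=score, reverse=True)[:top_k])
    prompt = f"Use only this context to answer.\n\n{context}\n\nQ: {question}\nA:"
    return generate(prompt)
```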
The company cites research showing that 40% to 70% of enterprise AI tasks can be handled efficiently by smaller models, which can cut companies’ costs by a factor of 10 to 30. “On-device SLMs like Jamba Reasoning 3B enable cost-effective, heterogeneous compute allocation — processing simple tasks locally while reserving cloud resources for complex reasoning,” the company explained.
SLMs can also power most AI agents, which perform tasks autonomously on behalf of human workers, with a high degree of efficiency, the company said. In agentic workflows, Jamba Reasoning 3B can act as an “on-device controller” that orchestrates agents’ operations, activating cloud-based LLMs only when extra compute power is needed for more sophisticated tasks. That means SLMs can potentially power much lower-latency agentic workflows, with additional benefits such as offline resilience and enhanced data privacy.
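The controller pattern the company describes boils down to a router: the local model triages each task and escalates only what it can’t handle. A minimal sketch of that control flow follows, with local_slm and cloud_llm as hypothetical stand-in callables rather than any real AI21 Labs API.

```python
# Hypothetical on-device controller pattern: handle routine requests with
# the local small model and escalate only hard ones to a cloud LLM.
# local_slm and cloud_llm are stand-in callables, not a real API.
from typing import Callable

def handle_request(
    task: str,
    local_slm: Callable[[str], str],
    cloud_llm: Callable[[str], str],
) -> str:
    # Ask the local model to triage its own ability first.
    triage = local_slm(f"Answer SIMPLE or COMPLEX only. Task: {task}")
    if "SIMPLE" in triage.upper():
        return local_slm(task)  # low latency, private, works offline
    return cloud_llm(task)      # reserve cloud compute for hard cases
```

Routing on the small model’s own triage keeps simple requests entirely on the device, which is where the latency, privacy and offline-resilience benefits come from.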
“This ushers in a decentralized AI era, akin to the 1980s shift from mainframes to personal computers, empowering local computation while seamlessly integrating cloud capabilities for greater scalability,” the company wrote.
AI21 Labs co-Chief Executive Ori Goshen told VentureBeat in an interview that SLMs like Jamba Reasoning 3B can free up data centers to focus only on the hardest AI problems and help to solve economic challenges faced by the industry. “What we’re seeing right now in the industry is an economics issue, where there are very expensive data center buildouts, and the revenue that is generated [from them] versus the depreciation rate of all their chips shows that the math doesn’t add up,” he said.
The company provided a number of examples of where AI is better processed locally by SLMs. Contact centers, for instance, can run customer service agents on small devices that handle customer calls and decide whether to resolve an issue themselves, pass it to a more powerful model, or escalate it to a human agent.
Futurum Group analyst Brad Shimmin told AI Business that the theory behind state space models is an old one, but the technology to build them didn’t exist until recently. “Now you can use this state space model idea because it scales really well and is extremely fast,” he said.
Images: AI21 Labs