Pushed into the background by the sudden transatlantic breakthrough of DeepSeek, the new generation of Chinese AI, the presentation on January 30 of Small-3, the new generative AI model designed by the French startup Mistral, is nonetheless a major event. Its performance is impressive and promising, and its technical choices, which break with the dominant models, are more virtuous, opening the doors to a rapidly expanding market. Mistral has also just signed a partnership with AFP, which will greatly improve the relevance and timeliness of its responses.

Breaking with convention at every level, for greater efficiency

The designers of Small-3 chose to limit the model to 24 billion parameters, where traditional competitors work from bases of 500 billion and more, with the notable exception of DeepSeek, which activates only 37 billion parameters per query.

The other strong point of Small-3 follows from these combined choices: very high responsiveness. It is in fact up to three times faster than the competition, without any loss in the relevance or consistency of its responses, quite the contrary.

Its light structure also makes it usable on ordinary machines such as a MacBook (32GB of RAM) or a PC with an RTX 4090, with no need for ultra-powerful or next-generation processors. That is enough to open the market wide to individuals as well as businesses and local authorities.
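To make the hardware claim concrete, here is a minimal sketch of what running such a model locally could look like with the Hugging Face transformers library. It is illustrative only: the repository ID, precision and prompt are assumptions rather than Mistral's official instructions, and on a 32GB MacBook or a 24GB RTX 4090 the weights would in practice need to be quantized (to 4-bit, for example) to fit in memory.

```python
# Illustrative sketch: loading a ~24B-parameter model locally with Hugging Face transformers.
# The repository ID below is an assumption; check Mistral's official release for the exact name.
import torch
from transformers import pipeline

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"  # assumed repository ID

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,  # ~48 GB of weights at 16-bit; quantize to 4-bit to fit 24-32 GB machines
    device_map="auto",           # spread the layers across the available GPU/CPU memory
)

prompt = "Explain in two sentences why a 24-billion-parameter model can run on consumer hardware."
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```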

Another differentiating point, and a significant one for the development and distribution of Small-3, is that it is an open-source model, which anyone can therefore use and adapt to their needs.

For Arthur Mensch, who co-founded Mistral AI in April 2023, it is simple: "Mistral Small-3 complements large open-source reasoning models such as the recent versions of DeepSeek and can serve as a solid base model for bringing out reasoning capabilities."

Model           | Parameters | Accuracy (MMLU) | Speed (tokens/s) | Open source? | Recommended hardware
Mistral Small-3 | 24B        | 81%             | 150              | Yes          | MacBook (32GB RAM) or RTX 4090
Llama 3.3       | 70B        | 85%             | 50               | No           | Dedicated infrastructure
Qwen-2.5        | 32B        | 83%             | 60               | No           | Dedicated infrastructure
GPT-4o-mini     | 30B        | 82%             | 70               | No           | Dedicated infrastructure

How to read the table: Small-3 has a smaller parameter base, but its answers are as relevant as the competition's and are delivered faster. The model is freely modifiable and is compatible with "ordinary" hardware, such as MacBooks.

Faster, more flexible, more economical, therefore… better

Technically, Small-3 is quite similar to its Chinese competitor DeepSeek in that it relies on an optimized "Mixture of Experts" type architecture, which could be compared, in the business world, to using specialized subcontractors for each task, the difficulty being to properly separate and "route" the tasks.
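To illustrate the "specialized subcontractors" analogy, here is a toy sketch, in PyTorch, of how mixture-of-experts routing works in principle: a small gating network scores the experts and each token is sent only to the best-placed ones, so most of the parameters stay idle for any given token. It is a didactic example with arbitrary sizes, not Mistral's or DeepSeek's actual architecture.

```python
# Toy mixture-of-experts layer: a gating network "routes" each token to its top-k experts.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # the "router": scores each expert for each token
        self.k = k

    def forward(self, x):                            # x: (tokens, d_model)
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # keep only the k best experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                   # only k experts actually run for each token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(5, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([5, 64])
```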

To do this, Mistral has opted for a limited number of layers, as indicated above, which reduces latency and speeds up processing while maintaining high accuracy: Small-3 posts a score of 81% on the MMLU benchmark.

  • Blazing speed. Small-3 processes up to 150 tokens per second, up to three times faster than its competitors, significantly outperforming them in the frequent scenarios that demand quick responses, such as chatbots and conversational assistants (see the table above and the short calculation after this list).
  • Freely adaptable model. Unlike proprietary solutions such as GPT-4o-mini, Small-3 is freely modifiable and adaptable, making it attractive to companies seeking transparency and flexibility at lower cost. A huge market, then.
  • Better energy efficiency. These significant efficiency gains also come with much lower energy consumption, a direct result of Mistral AI's technical choices: a restricted parameter base and responses generated via Mixture of Experts, as seen above. This matters because the exponential energy consumption of AI is one of the main fears attached to its development, and to its cost.
  • Transparent learning. Unlike some competitors such as DeepSeek R1, Small-3 is not trained through reinforcement learning and therefore does not use synthetic data, which promotes greater transparency in its training and limits the risk of reproducing biases.
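As a rough, back-of-the-envelope illustration of what those throughput figures mean in practice, the sketch below converts the tokens-per-second values from the table above into wall-clock time per answer; the 400-token reply length is an arbitrary assumption.

```python
# Back-of-the-envelope latency comparison using the tokens/s figures from the table above.
# The 400-token reply length is an arbitrary illustrative assumption.
throughput = {"Mistral Small-3": 150, "Llama 3.3": 50, "Qwen-2.5": 60, "GPT-4o-mini": 70}  # tokens/s
reply_tokens = 400

for model, tps in throughput.items():
    print(f"{model:>15}: {reply_tokens / tps:4.1f} s to generate a {reply_tokens}-token answer")
```

At those rates, Small-3 would deliver the answer in under three seconds, versus five to eight seconds for the other models in the table, which is consistent with the roughly threefold gap in responsiveness described above.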

In summary, Small-3 shares with DeepSeek R1 a philosophy focused on energy efficiency and highly relevant responses. Both models favor a compact architecture to reduce energy consumption while maintaining high performance. Small-3, however, stands out for its transparency, thanks to its open-source license and its avoidance of synthetic data, offering a robust alternative to competing, so-called proprietary, solutions.

Until the wind turns in Mistral AI's favor and propels it to the top of the download charts…