Tensoria
AI Tools By Anas R.

Mistral Small 4, the Multimodal AI Model That Replaces Three Models at Once

Lire cet article en français →

Until now, handling diverse AI tasks meant juggling multiple models: one for complex reasoning, another for image analysis, a third for code. Each with its own API, its own costs, its own quirks. For a business, that is unnecessary complexity.

Mistral Small 4 solves this. Launched in March 2026, it is Mistral AI's first model to unify reasoning, vision, and code in a single package. With 119 billion total parameters but only 6.5 billion active per request, it delivers the performance of a massive model at the cost of a lightweight one. Here is what that means in practice.

What Mistral Small 4 Actually Changes

Mistral Small 4 is the result of merging three of Mistral AI's specialized models into one:

  • Magistral: deep reasoning, logical analysis, complex problem-solving
  • Pixtral: image understanding, OCR, chart analysis, and visual document processing
  • Devstral: code generation, debugging, development agents

Previously, if you wanted to analyze a PDF containing charts (vision), draw conclusions from it (reasoning), and generate a script to automate the processing (code), you had to chain multiple API calls to different models. With Small 4, a single call handles all of it.

Specification Mistral Small 4
Total parameters 119 billion
Active parameters per token 6.5 billion (~22B at inference)
Architecture Mixture-of-Experts (128 experts, 4 active per pass)
Context window 256,000 tokens
Inputs Text + images
Capabilities Reasoning, vision, code, instruction following, agents
API price (input) $0.15/million tokens
API price (output) $0.60/million tokens
vs Small 3 40% lower latency, 3x throughput

The Mixture-of-Experts Architecture, Simply Explained

Mistral Small 4 uses a MoE (Mixture-of-Experts) architecture. This is what makes it both powerful and economical. Here is the principle in plain terms.

How MoE works

Imagine a firm of 128 specialists. For every question, only the 4 most relevant experts are called in — the other 124 stay on standby. The result: you get access to the knowledge of 128 experts but only pay for the work of 4. That is exactly what MoE does: 119 billion parameters of knowledge, but only 6.5 billion computations per token.

What this means in practice for a business:

  • Lower cost: you pay the price of a 6.5B model, not a 119B model
  • Speed: 40% faster than Mistral Small 3, with 3x the requests-per-second throughput
  • Quality: responses benefit from the depth of 119B parameters, not just 6.5B

Benchmarks: Where Mistral Small 4 Stands

Benchmarks are not the whole story, but they give a useful reference. Here is how Mistral Small 4 performs compared to models in the same price bracket.

Benchmark Mistral Small 4 Observation
AA LCR 0.72 (1.6K chars) Comparable to Qwen models that need 3.5–4x more tokens
LiveCodeBench Outperforms GPT-OSS 120B With 20% fewer output tokens
AIME 2025 Comparable to GPT-OSS 120B High-level mathematical reasoning
Latency vs Small 3 -40% Measured in latency-optimized configuration
Throughput vs Small 3 3x Measured in throughput-optimized configuration

The key point: Mistral Small 4 produces shorter, more precise responses than its competitors. On AA LCR it scores 0.72 using 1.6K characters, while Qwen models need 5.8–6.1K characters for a comparable result. Fewer output tokens means lower cost and more direct answers.

6 Concrete Enterprise Use Cases

1. Document analysis with charts and tables

A CFO receives a 50-page quarterly report with charts, tables, and narrative text. With Mistral Small 4, they can upload the document, ask questions about the charts, request a trend analysis, and get an executive summary — all in a single call. The 256K-token context window handles long documents without splitting.

2. Business code automation

An operations manager describes a data processing workflow in plain language. Mistral Small 4 generates the corresponding Python or SQL script, debugs it if needed, and suggests optimizations. The inherited Devstral capabilities make it a competent development assistant, not just a snippet generator.

3. Invoice extraction and structuring

The combination of vision and reasoning is particularly powerful for extracting data from scanned documents. Supplier invoices, purchase orders, technical data sheets: Small 4 reads the document (vision), extracts the relevant information (reasoning), and can structure it as JSON or CSV for injection into your ERP.

4. Tier-2 technical support

A support agent receives a ticket with an error screenshot. Mistral Small 4 analyzes the image, identifies the error message, reasons about likely causes, and proposes a resolution. It is an assistant that sees and reasons, not just a text-based chatbot.

5. Reasoning over complex data

Time-series analysis, experiment result interpretation, logistics planning: the reasoning capabilities inherited from Magistral handle problems that require multi-step thinking, not just text completion.

6. Multi-step AI agents

Mistral Small 4 is built to work as an autonomous agent. It can call tools (function calling), chain reasoning steps, and self-correct. It is the ideal building block for automated workflows with n8n or custom agent pipelines.

Want to integrate Mistral Small 4 into your business processes?

A free 30-minute diagnostic to identify the highest-ROI use cases for your company.

Book a call

Pricing and Real-World Costs

Mistral Small 4 is one of the most price-competitive models on the market for its level of performance.

Model Input ($/M tokens) Output ($/M tokens) Max context
Mistral Small 4 $0.15 $0.60 256K
GPT-4o mini $0.15 $0.60 128K
Claude Haiku 4.5 $0.80 $4.00 200K
GPT-4o $2.50 $10.00 128K
Mistral Large $2.00 $6.00 128K

What it actually costs in practice

A 10-page document (~5,000 tokens) analyzed by Mistral Small 4 costs roughly $0.001 in input. Even at 1,000 documents per month with detailed responses, the API budget stays under €50/month. That is the cost of a software subscription, not an AI project.

Mistral Small 4 vs the Alternatives

Criterion Mistral Small 4 GPT-4o mini Claude Haiku 4.5
Multimodal (vision) Yes Yes Yes
Advanced reasoning Yes (built-in reasoning mode) Limited Basic
Code generation Excellent (Devstral heritage) Good Good
Context window 256K tokens 128K tokens 200K tokens
Self-hosting Yes (open-weight) No No
Data sovereignty France / self-hosted USA USA
Input price $0.15/M $0.15/M $0.80/M
Multilingual quality Excellent (native French) Good Good

Our take: at the same price as GPT-4o mini, Mistral Small 4 offers double the context (256K vs 128K), significantly stronger reasoning, and the option to self-host. Compared to Claude Haiku 4.5, it is 5x cheaper with comparable performance. For European companies, the data sovereignty advantage and native French support are meaningful differentiators.

Deployment: API or Self-Hosted

Via the Mistral API (recommended for SMEs)

The simplest path. Create an account on console.mistral.ai, get an API key, and integrate Mistral Small 4 into your applications. Data is processed on Mistral's servers in France.

Self-hosting (for mid-market and large enterprises)

The model weights are available on Hugging Face. The required infrastructure is substantial:

Configuration Minimum Recommended
NVIDIA HGX H100 4x 4x
NVIDIA HGX H200 2x 4x
NVIDIA DGX B200 1x 2x

This is clearly reserved for companies with existing GPU infrastructure or a substantial cloud budget. For SMEs, the API is the pragmatic path. For full data-sovereignty needs with a lighter model, Ministral (3B, 8B) can be self-hosted on far more modest hardware.

If you need intermediate self-hosting performance, Mistral Small 3 (24B) remains an excellent option, deployable on a single GPU. For fine-tuning, Mistral Forge allows model customization without managing infrastructure.

How to Get Started with Mistral Small 4

  1. Test via Le Chat: Mistral's Le Chat uses Small 4 as its default model. Upload a document with charts and ask questions about it to judge quality
  2. Create an API account: on console.mistral.ai, get a key and test with the integrated playground. Try instruction, reasoning, and vision modes
  3. Prototype a use case: pick a concrete business process (invoice extraction, report analysis, technical support) and measure output quality against your real data
  4. Track real costs: monitor your usage on the Mistral dashboard for 2 weeks to project a realistic monthly budget
  5. Integrate into production: the Mistral API is compatible with the OpenAI format — migrating from GPT-4o mini is nearly seamless

Limitations to Know

Self-hosting is GPU-intensive

With 119B total parameters, self-hosting Mistral Small 4 requires significant GPU infrastructure (minimum 4x H100). This is not a model you can run on a laptop. For lightweight self-hosting, stick with Mistral Small 3 (24B) or the Ministral variants.

No image generation

Mistral Small 4 understands images but does not generate them. It can analyze a chart, read a scanned invoice, or interpret a technical diagram — but it does not create visuals. For image generation, Le Chat integrates Flux Ultra separately.

Still early in production

Launched in March 2026, the model has only a few weeks of production track record. Feedback on large-scale reliability, edge-case handling, and stability on long contexts is still limited. Test against your own data before deploying in critical workflows.

Creative writing and long-form content

Like earlier Mistral versions, Small 4 is optimized for efficiency and precision, not style. For creative writing, copywriting, or storytelling, GPT-4o and Claude remain generally stronger.

Frequently Asked Questions

Mistral Small 4 is a multimodal AI model from Mistral AI that combines deep reasoning, image understanding, and code generation in a single model. It uses a Mixture-of-Experts architecture with 119 billion total parameters but only 6.5 billion active per token, making it highly cost-efficient and fast.
Mistral Small 4 costs $0.15 per million input tokens and $0.60 per million output tokens via the Mistral API. It is one of the most price-competitive models at this performance tier — significantly cheaper than GPT-4o or Claude Sonnet.
Yes. Mistral Small 4 incorporates the multimodal capabilities of Pixtral. It can analyze images, extract text via OCR, interpret charts, and read technical diagrams or blueprints. This is particularly useful for companies processing visual documents such as invoices, architectural plans, or reports containing charts.
Yes. The model weights are available on Hugging Face. Self-hosting requires significant GPU infrastructure: a minimum of 4x NVIDIA H100, 2x H200, or 1x DGX B200. This is realistic for mid-market companies and large enterprises, but less practical for SMEs who will typically prefer the API.
Mistral Small 4 delivers 40% lower latency and 3x the throughput compared to Small 3. More importantly, it consolidates three of Mistral's specialized models into one: Magistral for reasoning, Pixtral for vision, and Devstral for code. One model instead of three.
Yes, via the API. At $0.15 per million input tokens, it is one of the most accessible models at this performance level. An SME processing 1,000 documents per month will spend a few tens of euros. Self-hosting, on the other hand, is reserved for companies with substantial GPU infrastructure.

Put Mistral Small 4 to work in your business

Mistral Small 4 is the most capable model in its price bracket. Integrating it into real business workflows to generate measurable ROI is our specialty.

Book a Free AI Audit

Further Reading

Go Further

Explore our LLM integration service or our AI audit offering, or get in touch to discuss your specific use case.

Anas Rabhi, data scientist specializing in generative AI and LLM systems
Anas Rabhi Data Scientist & Founder, Tensoria

I am a data scientist specializing in generative AI, with a focus on LLM fine-tuning, NLP, and production RAG systems. I build custom AI solutions that integrate into existing workflows and deliver concrete, measurable results: document intelligence, internal assistants, and process automation.