Long Context vs RAG for AI Tender Response

You want to deploy an AI agent to accelerate your tender responses. You have read that Claude can ingest 200,000 tokens in a single request. You have also read that RAG is the reference pattern for enterprise document bases. Which one should you choose? The honest answer: it depends on your situation, and the choice has concrete consequences for draft quality, cost per submission, and data confidentiality.

This article is not about RFP workflow orchestration (we covered that in detail in our n8n agent guide) nor about the overall ROI of AI on tenders. This is an architecture article, written for teams that have already decided to build and want to avoid ending up with a system that hallucinates references or costs ten times too much in tokens.

We will walk through the three available approaches, their respective limits, the decision framework we use at Tensoria for clients in construction, engineering firms, and IT services companies, and the pitfalls that are most expensive to fix mid-project.

Figure: architecture decision diagram, long context vs RAG for an AI tender response agent. Long context or RAG is an architecture choice that determines draft quality, cost per submission, and data sovereignty.

In brief

Long context: load the full specification document into the prompt. Effective under 80 pages, zero information loss, higher cost per call.
RAG: semantic search over your winning reference base. Essential when the historical corpus exceeds 50 documents. Requires a high-quality structured base.
Hybrid architecture: long context on the incoming specification, RAG on the internal base. The recommended production pattern for engineering firms, IT services companies, and SMEs in construction.
Non-negotiable prerequisite: the internal reference base (project sheets, winning proposals, team CVs) must exist before deploying anything.
Confidentiality: Mistral on-premises for sensitive sectors (defense, critical infrastructure, strict client NDA), Claude or GPT-4o on sovereign cloud for everyone else.

1. Why this architecture choice is not trivial

A tender response AI agent actually works with two very different types of corpus, and this is where most projects take the wrong approach.

The first corpus is the incoming tender specification document: the technical requirements document, the consultation rules, the bill of quantities, the administrative clauses, and sometimes technical drawings. This document is new with every tender. It cannot be indexed in advance, and a single missed requirement in the analysis can eliminate your submission before anyone reads the technical content. Precision is critical.

The second corpus is your internal reference base: winning responses from previous years, project delivery sheets, team CVs, certifications, and proprietary methodologies. This corpus grows over time, can reach several hundred documents, and is the raw material the agent will draw from to build reusable content for the new draft.

These two corpora have different characteristics. They do not call for the same architecture. Conflating them in a single approach, whether "put everything in context" or "route everything through RAG," is the primary source of disappointment during pilot phases.

2. Approach 1: long context, its strengths and its real limits

How it works on a specification document

With Claude 3.5 Sonnet (200,000 tokens) or GPT-4o (128,000 tokens), you can load a complete tender dossier in a single request. The agent receives the entire consultation document, and you instruct it to extract the key requirements, the scoring criteria and their weightings, format constraints, and differentiating points to address.

The advantage is real: no information is lost through chunk splitting. With RAG, a poorly delimited chunk can split a requirement in two, lose a criterion weighting, or miss a cross-reference between documents. On a specification where every requirement counts, a full model read is more reliable than fragmented retrieval.

The limits the benchmarks do not show

The first limit is what researchers call "lost in the middle": LLMs are less precise on information located in the middle of a very long context than on information placed at the beginning or end. On a 150-page specification where technical requirements are distributed throughout the document, this degradation becomes noticeable.

The second limit is cost. An 80-page specification runs to roughly 90,000 input tokens. At 3 EUR per million input tokens (Claude Sonnet 4 pricing), that is approximately 0.27 EUR per analysis. Negligible across 40 tenders per year. But if you re-run the model multiple times to refine sections, which happens consistently in practice, the cost per submission climbs fast. And long context does not solve the internal reference corpus problem: you cannot put 300 past technical proposals into the same context window.
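
The arithmetic above can be reproduced with a minimal sketch. It counts input tokens only (output tokens are ignored), uses the 3 EUR per million figure quoted in this article, and the five re-runs per tender are a hypothetical value chosen for illustration.

```python
def analysis_cost_eur(input_tokens: int, price_per_million_eur: float = 3.0, runs: int = 1) -> float:
    """Input-token cost in EUR of analysing one specification `runs` times."""
    return input_tokens / 1_000_000 * price_per_million_eur * runs

single = analysis_cost_eur(90_000)               # one pass over an 80-page specification
annual = single * 40                             # 40 tenders per year, one pass each
refined = analysis_cost_eur(90_000, runs=5) * 40 # hypothetical: 5 passes per tender

print(f"{single:.2f} EUR per analysis, {annual:.2f} EUR per year, {refined:.2f} EUR per year with re-runs")
```

The per-call cost stays small, but it scales linearly with the number of refinement passes, which is why re-runs dominate the bill in practice.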

The third limit is latency. Generating a response from a 100,000-token prompt takes several tens of seconds. Not a blocker for asynchronous use, but worth planning for if you want near-real-time feedback in your interface.

When long context alone is sufficient

Long context alone works well in two specific scenarios. The first is when your firm responds to highly standardized tenders (same contract type, same document structure) and most content is produced by the expert rather than retrieved from historical records. The second is a POC phase with few historical responses available, where long context lets you start quickly without having to build a vector database.

3. Approach 2: RAG over the internal reference base

RAG solves the historical corpus problem

When your reference base exceeds around fifty documents, RAG becomes essential. The principle: each past winning response, each project sheet, each CV and each certification is split into semantic chunks, encoded as vectors, and indexed in a vector store (Qdrant, Weaviate, or Chroma depending on your stack).

When the agent works on a new tender, it extracts keywords from the specification requirements, queries the vector store, and retrieves the most relevant passages from your history. It then builds the draft for each section from that content, adapted to the context of the new contract.
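
The retrieval step can be sketched in a few lines. This is a toy illustration only: `embed` is a bag-of-words stand-in for a real embedding model, the in-memory list replaces the vector store, and the sample chunks are invented for the example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and keep the top_k best."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

# Invented sample chunks standing in for an indexed reference base.
chunks = [
    "Project sheet: structural design of a timber school building, 14 months.",
    "Methodology: phased migration of an ERP system with weekly steering committees.",
    "CV: senior engineer, certified in seismic design of school buildings.",
]
results = retrieve("seismic design for a school building", chunks, top_k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

In a real deployment the scoring would be delegated to the vector store, but the shape of the call (query in, ranked passages with scores out) stays the same.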

This is the pattern we deploy for engineering firms and IT services companies that respond to high volumes of tenders with a rich reference base. The reuse rate of sections retrieved by RAG sits between 70 and 85 percent when the base is homogeneous and well structured.

The RAG limits to plan for in advance

The first limit is the most critical in a tender context: the risk of reference hallucination. If the retrieved chunk mentions a project figure (budget, user count, mission duration) and the model reuses it with a slight distortion to fit the new context better, you end up with an incorrect reference in your technical proposal. In a public tender where references can be verified by the contracting authority, that is an elimination risk. RAG must be configured in strictly sourced mode: the agent cites exact excerpts; it does not freely paraphrase factual data.
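
One way to catch distorted figures is a post-generation check that every number in the draft also appears in the retrieved chunks. The sketch below is an assumption about how such a guard could look, not a feature of any RAG framework; the function name and the sample texts are hypothetical.

```python
import re

# Matches integers and grouped or decimal figures such as 18, 1,200 or 3.5
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def unsourced_numbers(draft: str, source_chunks: list[str]) -> set[str]:
    """Return figures that appear in the draft but in none of the retrieved chunks."""
    sourced: set[str] = set()
    for chunk in source_chunks:
        sourced.update(NUMBER.findall(chunk))
    return set(NUMBER.findall(draft)) - sourced

# Invented example: the draft silently changed the user count.
chunks = ["Mission for a regional hospital: 18 months, budget 450,000 EUR, 1,200 users."]
draft = "We delivered a comparable mission over 18 months for 1,500 users."
flagged = unsourced_numbers(draft, chunks)
print(flagged)
```

A non-empty result does not prove a hallucination, but it tells the reviewing expert exactly which figures to verify first.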

The second limit is chunking quality. An 80-page technical document split into 512-token chunks without respecting the logical structure of the document will produce chunks that cut tables in half, separate a requirement from its weighting, or mix two different work packages in the same fragment. The data preparation phase, including PDF parsing, semantic chunking, and metadata enrichment, typically represents 40 percent of total project effort.
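
As an illustration of structure-aware splitting, the sketch below groups paragraphs under their heading and never cuts inside a paragraph. The heading pattern (numbered headings such as "3.2 Work package B") and the word-count budget used in place of a token count are assumptions made for the example.

```python
import re

# Assumed heading style: a section number followed by a title, e.g. "3.2 Work package B"
HEADING = re.compile(r"^\d+(\.\d+)*\s+\S")

def chunk_by_structure(text: str, max_words: int = 200) -> list[dict]:
    """Group paragraphs under their heading; never split inside a paragraph."""
    chunks, heading, buffer, size = [], "", [], 0

    def flush():
        nonlocal buffer, size
        if buffer:
            chunks.append({"heading": heading, "text": "\n\n".join(buffer)})
        buffer, size = [], 0

    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        if HEADING.match(para):
            flush()              # close the chunk of the previous section
            heading = para
            continue
        words = len(para.split())
        if buffer and size + words > max_words:
            flush()              # budget reached: start a new chunk, same heading
        buffer.append(para)
        size += words
    flush()
    return chunks

spec = (
    "1 Scope\n\nThe contractor shall deliver A.\n\nWeighting: 40 percent.\n\n"
    "2 Schedule\n\nDelivery within 12 months."
)
parts = chunk_by_structure(spec)
```

Here the requirement and its weighting stay in the same chunk, and each chunk carries its heading as metadata for later filtering.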

The third limit: RAG does not solve the analysis of the incoming specification. You cannot index in advance a tender document you have not received yet. For analyzing the requirements of the new contract, you need a different strategy, and that is where long context comes in.

What the reference base needs to contain for RAG to work

An effective RAG base for tenders is structured around three document types:

Delivered project sheets (one to two pages per project): client context, mission performed, measurable outcomes, technologies or methods used, duration, budget. This is the content the model will adapt to draft the "Similar References" sections of your responses.
Validated methodology sections: project approach, mission organization, standard deliverables, standard timeline, by contract type. These sections are the most stable and the most reusable from one tender to the next.
Structured team profiles and CVs: skills, past experience by sector, certifications. Essential for the "Proposed Team" sections of technical proposals in IT services and engineering firms.

The golden rule: if your team has not formalized this base before starting the AI project, building the documentation base will be the real project, and the technical development will be its extension.

4. Approach 3: the hybrid architecture recommended for production

The principle: two corpora, two strategies

The architecture we recommend for organizations responding to more than 20 tenders per year combines both approaches, applying each to its natural corpus:

Long context on the incoming specification: the full consultation dossier for the new tender is loaded into Claude's context window. The agent extracts requirements, scoring criteria and weightings, format constraints, and differentiating points to address. Zero information loss on the document that determines whether your submission qualifies or gets eliminated.
RAG on the internal reference base: for each section of the response to build, the agent queries the vector store and retrieves the most relevant passages from your historical content. It adapts that content to the context of the new contract without starting from scratch.

Both results are combined in the generation prompt for each section: the agent has both the precise requirements of the current specification and the relevant reference content from your history. This is the "context engineering" pattern: giving the model exactly the information it needs at the right moment, nothing more, nothing less.
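
A minimal sketch of that per-section prompt assembly follows. The function name, the `source_id` field, and the sample values are hypothetical; the point is only that requirements and retrieved passages arrive as two clearly separated blocks.

```python
def build_section_prompt(section: str, requirements: list[str],
                         passages: list[dict], format_rules: str) -> str:
    """Combine specification requirements and retrieved passages for one section."""
    req_block = "\n".join(f"- {r}" for r in requirements)
    src_block = "\n".join(f"[{p['source_id']}] {p['text']}" for p in passages)
    return (
        f"Section to draft: {section}\n\n"
        f"Requirements from the current specification:\n{req_block}\n\n"
        f"Reference passages from the internal base (cite by id):\n{src_block}\n\n"
        f"Format constraints: {format_rules}\n"
    )

# Invented sample inputs for illustration.
prompt = build_section_prompt(
    "Similar References",
    ["At least two comparable projects in the last 5 years"],
    [{"source_id": "REF-012", "text": "Hospital network redesign, 18 months."}],
    "Maximum 2 pages",
)
print(prompt)
```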

The complete processing flow

Here is how this flow materializes in practice, in a LangGraph or n8n stack:

Tender dossier ingestion: parsing PDF and DOCX files via PyMuPDF or Azure Document Intelligence, structured text extraction.
Specification analysis in long context: sending the complete dossier to Claude with a structured extraction prompt (mandatory requirements, scoring criteria, weightings, expected format, submission deadline). Output: a structured JSON of requirements.
Section decomposition: the agent plans the response sections based on the specification structure and extracted requirements.
RAG per section: for each section, a semantic query over the internal vector base. Metadata filtering (contract type, sector, similar size). Retrieval of the 3 to 5 most relevant chunks with their relevance scores.
Generation of each section: a prompt combining the section requirements (from the specification), the historical passages (from RAG), and format constraints. The model produces a sourced draft.
Compliance checklist: automated verification that every mandatory requirement from the specification is addressed in the draft.
Export and submission to the expert: DOCX generation via python-docx or Google Docs API, notification with the draft and the checklist.
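
The structured JSON of requirements and the compliance checklist can be sketched as follows. The JSON schema (`id`, `mandatory`, `keywords`) is an assumption for this example, and keyword matching is a deliberately naive stand-in for whatever verification the production agent uses.

```python
import json

# Hypothetical shape of the requirements extracted from the specification.
requirements_json = """
[
  {"id": "R1", "text": "ISO 9001 certification", "mandatory": true, "keywords": ["ISO 9001"]},
  {"id": "R2", "text": "Delivery schedule under 12 months", "mandatory": true, "keywords": ["schedule", "12 months"]},
  {"id": "R3", "text": "Optional training plan", "mandatory": false, "keywords": ["training"]}
]
"""

def compliance_checklist(requirements: list[dict], draft: str) -> dict[str, bool]:
    """Mark each mandatory requirement as addressed if all its keywords occur in the draft."""
    text = draft.lower()
    return {
        r["id"]: all(k.lower() in text for k in r["keywords"])
        for r in requirements
        if r["mandatory"]
    }

draft = "Our firm holds ISO 9001 certification. The proposed schedule spans 10 months."
checklist = compliance_checklist(json.loads(requirements_json), draft)
missing = [rid for rid, ok in checklist.items() if not ok]
print(missing)
```

The list of missing requirement ids is what gets attached to the notification sent to the expert alongside the draft.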

5. The decision framework by document volume

Here is the framework we use in the first client session to guide the architecture choice:

Situation	Recommended architecture	Reference stack
Specification under 50 pages, reference base under 30 docs	Long context only	Claude Sonnet + Make / n8n
Specification 50-120 pages, base 30-200 docs	Hybrid (recommended)	Claude + Qdrant + LangGraph or n8n
Specification over 120 pages or multi-lot tender dossier	RAG on spec + RAG on base	LangGraph + Qdrant + semantic chunking
Maximum confidentiality (critical infrastructure, defense, strict NDA)	Hybrid on-premises	Mistral Large (Ollama/vLLM) + Qdrant on-premises
Base over 300 docs, multi-sector	RAG with metadata filtering	Qdrant or Weaviate + sector/size filters
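
The table can be read as a small decision function. The sketch below is one possible encoding: the order in which the rows are tested, and the fallback to the hybrid pattern for combinations the table does not list, are assumptions rather than part of the framework itself.

```python
def recommend_architecture(spec_pages: int, base_docs: int,
                           max_confidentiality: bool = False) -> str:
    """Map a situation to a row of the decision table (row precedence is assumed)."""
    if max_confidentiality:
        return "Hybrid on-premises"
    if spec_pages > 120:
        return "RAG on spec + RAG on base"
    if base_docs > 300:
        return "RAG with metadata filtering"
    if spec_pages < 50 and base_docs < 30:
        return "Long context only"
    return "Hybrid"  # covers the 50-120 page row and any unlisted combination

print(recommend_architecture(80, 100))
```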

6. Choosing the LLM for your context

Claude 3.5 Sonnet for long context and writing quality

For sections requiring high writing quality (methodology, project approach presentation, argumentation on qualitative criteria), Claude 3.5 Sonnet remains the reference model in 2026. Its 200,000-token context window covers virtually all specification documents encountered in French public procurement. It handles long instructions and complex format constraints well.

The limitation: it is a cloud model. Every document you send to it transits through Anthropic's servers. For most engineering firms and construction SMEs responding to standard public contracts, this is not a problem. For IT services companies with strict client NDAs, or organizations working on defense or critical infrastructure contracts, it is a dealbreaker.

Mistral on-premises for confidentiality

Mistral Large, deployed locally via Ollama (for low volumes) or vLLM on GPU (for production), offers a 128,000-token context window and writing quality sufficient for the standard sections of technical proposals. No document leaves your infrastructure.

Infrastructure cost must be factored into the TCO: an A100 or H100 GPU on cloud rental costs between 2 and 4 EUR per hour. For intermittent use (no real-time processing), on-demand instances can remain economical. For intensive use, the calculation must be done case by case.
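
To make the intermittent versus intensive contrast concrete, here is a small sketch using the 2 to 4 EUR hourly range quoted above. The 20 hours per month workload is a hypothetical figure, and storage, networking, and engineering time are left out.

```python
def monthly_gpu_cost_eur(hours: float, rate_low: float = 2.0,
                         rate_high: float = 4.0) -> tuple[float, float]:
    """Low and high monthly rental cost for a given number of GPU hours."""
    return hours * rate_low, hours * rate_high

intermittent = monthly_gpu_cost_eur(20)      # hypothetical: 20 GPU hours per month, on demand
always_on = monthly_gpu_cost_eur(24 * 30)    # instance left running for a 30-day month

print(f"On demand: {intermittent[0]:.0f} to {intermittent[1]:.0f} EUR per month")
print(f"Always on: {always_on[0]:.0f} to {always_on[1]:.0f} EUR per month")
```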

GPT-4o is a viable third option, with a 128,000-token window and quality comparable to Claude on shorter sections. It is the natural choice if your stack is already integrated into the Microsoft Azure ecosystem (Azure OpenAI Service, SharePoint, Teams).

7. Integrations: ERP systems, CRM, n8n

The agent does not live in isolation. The organizations that extract the most value are those that integrate it into their existing business tools rather than running it as a parallel tool that no one actually uses.

For construction SMEs using ERP or estimating software: the agent can read existing cost elements (unit price schedules, unit rates) to pre-fill the bill of quantities and generate the technical proposal in line with the financial offer. Integration typically goes through CSV or Excel exports from these ERPs, which generally lack native APIs.
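
Reading such an export can be as simple as the sketch below. The semicolon delimiter, the column names, and the sample rows are all hypothetical; real exports differ from one ERP to another and usually need a mapping step.

```python
import csv
import io

# Stand-in for a CSV export file; columns and rows are invented for the example.
export = io.StringIO(
    "item_code;description;unit;unit_price_eur\n"
    "C-101;Reinforced concrete slab;m2;85.50\n"
    "C-205;Interior partition wall;m2;42.00\n"
)

def load_unit_prices(handle) -> dict[str, dict]:
    """Index the unit price schedule by item code."""
    reader = csv.DictReader(handle, delimiter=";")
    return {
        row["item_code"]: {
            "description": row["description"],
            "unit": row["unit"],
            "unit_price_eur": float(row["unit_price_eur"]),
        }
        for row in reader
    }

prices = load_unit_prices(export)
line_total = prices["C-101"]["unit_price_eur"] * 120  # 120 m2 is a made-up quantity
print(line_total)
```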

For IT services and consulting firms on HubSpot or Salesforce: the agent can feed CRM opportunities (automatic creation of the tender record, attachment of the generated draft, status tracking). It can also pull from the CRM any information about existing contacts within the buying organization, which is valuable context for personalizing the introduction of the technical proposal.

For orchestration: n8n remains the reference stack for organizations that want a visual, self-hostable workflow system integrable with their tools without heavy backend development. LangGraph is the right choice when you need complex persistent state (multiple analysis passes, section refinement loops, granular error handling). Make is an alternative to n8n if your team is comfortable with no-code tools and tender volume stays under 10 per month.

8. The internal reference base: the real project, not the AI

This is the point where all our tender projects hit a wall at some stage, and the one we consistently raise during the scoping phase.

The agent is only as good as the base it draws from. If your project references are not formalized, if your winning proposals are DOCX files scattered across network folders with no structure, if project sheets run to 20-page reports instead of one-page summaries, if team CVs have not been updated in 18 months, then RAG will surface unusable content.

The most common consequence: the model fills in the gaps with free generation. It invents plausible figures for missing references, distorts results to fit the new contract context, or merges two distinct projects into one that never existed. In a public tender where references can be verified, this is an elimination risk and potentially a legal one.

The right approach: before starting agent development, run a reference base structuring workshop. Define the standard project sheet format (context, mission, deliverables, results, team, budget, duration, technologies). Ask each project manager to retroactively fill in the last 10 to 15 delivered projects. This work takes 2 to 4 weeks but determines the agent's quality far more than the choice between Claude and Mistral.
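
The standard project sheet format listed above maps naturally onto a small data structure with a completeness check. This is a sketch under the assumption that every field is free text; the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class ProjectSheet:
    """One delivered project, using the fields of the standard sheet format."""
    context: str = ""
    mission: str = ""
    deliverables: str = ""
    results: str = ""
    team: str = ""
    budget: str = ""
    duration: str = ""
    technologies: str = ""

def missing_fields(sheet: ProjectSheet) -> list[str]:
    """List the fields a project manager still has to fill in."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name).strip()]

# Invented, partially filled sheet.
sheet = ProjectSheet(context="Regional hospital", mission="Network redesign", duration="18 months")
todo = missing_fields(sheet)
print(todo)
```

Running this check over the whole base before indexing gives a quick view of how much structuring work remains.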

9. Costs and timelines: POC, MVP, annual TCO

The figures below are based on projects we have delivered or scoped in 2025-2026 for engineering firms, IT services companies, and construction SMEs between 20 and 200 employees:

POC: 6,000 to 10,000 EUR. 6 to 8 weeks. Specification analysis in long context, RAG on a base of 30 to 80 references, generation of 3 to 4 key sections, basic compliance checklist. No automated tender monitoring.
Production MVP: 15,000 to 28,000 EUR. 3 months. Complete reference base with ingestion pipeline, full response generation across all sections, hybrid long context plus RAG architecture, CRM or ERP integration, monitoring dashboard.
Annual TCO: 15,000 to 35,000 EUR. Of which LLM API 3,000-8,000 EUR, vector store hosting 1,000-3,000 EUR, tender monitoring aggregator 2,000-6,000 EUR, maintenance and enhancements 5,000-10,000 EUR, and observability.
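
As a quick check on the annual TCO breakdown, the itemized components can be summed and compared with the stated range. The sketch uses only the figures given above; treating the difference as the share left for observability and other unlisted items is an assumption, since no figure is given for them.

```python
# Itemized annual components in EUR (low, high), as listed in the TCO breakdown.
components = {
    "LLM API": (3_000, 8_000),
    "vector store hosting": (1_000, 3_000),
    "tender monitoring aggregator": (2_000, 6_000),
    "maintenance and enhancements": (5_000, 10_000),
}
stated_total = (15_000, 35_000)

low = sum(lo for lo, _ in components.values())
high = sum(hi for _, hi in components.values())
unitemized = (stated_total[0] - low, stated_total[1] - high)

print(f"Itemized: {low} to {high} EUR, not itemized: {unitemized[0]} to {unitemized[1]} EUR")
```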

What consistently extends the timeline: building the reference base (unformalized project sheets, winning responses scattered across folders), calibrating writing quality (multiple iterations on reference tenders to tune prompts), and integrating with tender monitoring APIs that are sometimes poorly documented or unstable.

10. Four pitfalls that cost the most to fix

Pitfall 1: starting development before the reference base exists

Without a structured base, the POC will "work" on the 5 or 10 test tenders you provide manually and collapse the day you try to scale it to your real historical corpus. Build the base before development starts or, at the latest, in parallel with it, never after.

Pitfall 2: not configuring RAG in strictly sourced mode

The default configuration of most RAG frameworks allows the model to "complete" missing information with free generation. In a tender context, this is unacceptable for factual data (reference figures, durations, budgets, headcounts). Explicitly configure the system prompt so the model cites exact excerpts and states "information not available in the base" when it cannot find something, rather than inventing it.
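
A minimal sketch of such a configuration, assuming the model is asked to put factual excerpts in double quotes: the system prompt wording is illustrative, and `non_verbatim_quotes` is a hypothetical helper that checks afterwards whether each quoted span really exists in the retrieved passages.

```python
import re

NOT_FOUND = "information not available in the base"

# Illustrative wording; tune it to your model and document types.
SYSTEM_PROMPT = (
    "You draft tender response sections from the reference passages provided.\n"
    "For factual data (figures, durations, budgets, headcounts), quote the passage "
    "verbatim inside double quotes and give its source id.\n"
    f'If a fact is not in the passages, write "{NOT_FOUND}" instead of estimating it.'
)

def non_verbatim_quotes(answer: str, passages: list[str]) -> list[str]:
    """Return quoted spans in the answer that match no passage exactly."""
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes if q != NOT_FOUND and not any(q in p for p in passages)]

# Invented example passage and model answer.
passages = ["The mission lasted 18 months with a team of 6 consultants."]
answer = (
    'Duration: "The mission lasted 18 months" [REF-1]. '
    'Budget: "information not available in the base".'
)
bad = non_verbatim_quotes(answer, passages)
print(bad)
```

The prompt sets the rule and the check enforces it: any quoted span that does not match a passage is sent back for regeneration or flagged to the reviewer.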

Pitfall 3: underestimating human validation time

Teams expecting to divide their tender response time by five are often disappointed. The realistic gain is 40 to 60 percent of writing time. The draft reduces effort but does not eliminate it: the expert must validate, refine, and intellectually own each response. A poor-quality draft creates more correction work than it saves. Set this expectation clearly during the scoping phase to avoid disillusionment later.

Pitfall 4: choosing the cheapest model for technical sections

Methodology, mission organization, and technical differentiation sections are the ones that determine whether a tender is won or lost. This is not the right place to save on the model. A methodology section draft produced by a low-quality model requires as much correction time as manual writing. Reserve lightweight models for extraction and classification tasks; use Claude Sonnet or GPT-4o for generating the high-value sections.

11. Required safeguards before going to production

Three non-negotiable rules before putting this type of agent into production for a client:

Mandatory human review on all factual data: project references, figures, dates, client names, results. No factual data leaves without expert validation. The draft is a starting point, not a final output.
Systematic source citation in the draft: each section of the draft must reference the internal records it was drawn from. This lets the expert verify the relevance of the match and correct quickly if RAG retrieved an off-topic reference.
Feedback loop on tender outcomes: systematically track whether tenders where the agent draft was used were won or lost, and why. This is the only way to measure real impact and identify sections where the prompt or the RAG requires adjustment.

If you have strong confidentiality requirements (defense contracts, critical infrastructure clients, or strict NDA clauses on your project references), review our guide on self-hosted RAG architectures before choosing your stack.

FAQ: Long Context vs RAG for tender response agents

Long context is the right call when the specification document is under 50 pages (roughly 50,000 to 60,000 tokens, at the same density as the 80-page estimate) and you need to guarantee no requirement is lost through chunk splitting. Beyond 80 pages, or when your internal reference base exceeds 200 documents, RAG becomes necessary for searching your historical content. The hybrid combination of both approaches remains the reference architecture for production deployments.

An 80-page specification runs to roughly 90,000 input tokens. At approximately 3 EUR per million tokens (Claude Sonnet 4), that is about 0.27 EUR per full analysis. Across 40 tenders per year, the LLM cost for specification analysis is negligible. Cost increases if you re-run the model multiple times to refine sections.

A RAG system over a tender reference base starts performing well at around 20 winning responses, provided they are homogeneous (same market type, same vertical). Below that threshold, semantic retrieval returns passages that are too generic. The optimal performance range is 50 to 300 structured reference sheets. Beyond that, RAG remains effective but requires metadata filtering (sector, contract size, geography) to avoid relevance dilution.

Yes. Mistral Large deployed via Ollama or vLLM on GPU covers specifications up to 128,000 tokens without sending any document to a third-party cloud. Writing quality is slightly lower than Claude on complex sections, and GPU infrastructure cost must be factored into the TCO.

The n8n agent article covers workflow orchestration: how to chain the steps via n8n. This article covers the upstream architecture decision: which information access strategy to choose (long context, RAG, or hybrid) based on your document volumes and cost and confidentiality constraints. The two topics are complementary.

The technical POC takes 6 to 8 weeks. What consistently extends the timeline is building the internal reference base: project sheets that were never formalized, winning responses scattered across network folders, consultant CVs that have not been updated. Budget 4 to 8 additional weeks for documentation structuring work, often running in parallel with development. Full production deployment typically lands between 3 and 5 months.