MCP addition to RAG workflow project
- Ken Munson
- Mar 2
Updated: Mar 31
From RAG to MCP: Turning a Document Search System into a Composable AI Capability
Executive Summary
In this project, I extended an earlier Retrieval-Augmented Generation (RAG) system into a fully operational MCP (Model Context Protocol) server, exposing document search as a standardized, inspectable AI capability.
The result was not just a working tool — it was a deeper understanding of modern AI architecture: separating model reasoning from orchestration, retrieval, and execution.
This post documents:
The original RAG system (Vertex AI + FAISS)
Why MCP matters
How I wrapped RAG behind an MCP server
What I learned about orchestration, context, and AI systems design
At a super high level, here is how to think about MCP architecture:
Phase 1: Building a Production-Style RAG System
The original project implemented a structured RAG pipeline:
Architecture Overview
Documents → Chunking → Embeddings → FAISS Vector Store
↓
Query Embedding
↓
Similarity / MMR Search
↓
Context Injection → LLM
Stack
Embeddings: Vertex AI (text-embedding-004)
Vector store: FAISS (local persistent index)
Chunking strategy: 600–1000 token segments with overlap
Search modes: Similarity + MMR
Interface: CLI + optional FastAPI
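The overlapping-chunk strategy listed above can be sketched as a sliding window. This is a minimal illustration, not the project's actual chunker: it assumes whitespace-separated words stand in for tokens (a real pipeline would count tokens with a proper tokenizer), and the function name `chunk_tokens` and the 100-token overlap are made up for the example.

```python
def chunk_tokens(tokens, chunk_size=800, overlap=100):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Whitespace splitting is a stand-in for a real tokenizer (assumption).
words = ("lorem ipsum " * 1000).split()  # 2000 pseudo-tokens
chunks = chunk_tokens(words, chunk_size=800, overlap=100)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.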
What This Solved
Scalable retrieval from large PDFs
Context-aware answers grounded in source documents
Reduced hallucination risk
Full control over chunking, ranking, and injection
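To make the ranking part concrete: the two search modes differ in that plain similarity returns the k closest chunks, while MMR (Maximal Marginal Relevance) also penalizes chunks that repeat what has already been selected. The sketch below is a pure-Python illustration of the standard MMR scoring rule, not the project's code; the toy 2-D vectors, the function names, and the `lambda_mult` default are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors (made up): docs 0 and 1 point in almost the same direction.
query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
picked_mmr = mmr(query, docs, k=2, lambda_mult=0.5)  # balances relevance and diversity
picked_sim = mmr(query, docs, k=2, lambda_mult=1.0)  # relevance only
```

With `lambda_mult=1.0` the redundancy term vanishes and the function behaves like plain similarity search; lowering it pushes the second pick away from near-duplicates of the first.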
This project established something important:
The model does not “know” your documents. The application layer retrieves and injects relevant context.
That insight set up the next phase.
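As a rough illustration of what "retrieves and injects" means in practice, here is a minimal prompt-assembly sketch. The prompt wording, the `build_prompt` name, and the sample chunks are all hypothetical; in the real pipeline the chunks would come from the FAISS search.

```python
def build_prompt(question, chunks):
    """Assemble a grounded prompt: numbered source excerpts followed by the question."""
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return (
        "Answer using only the sources below. "
        "If the answer is not in them, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved chunks, standing in for real search results.
retrieved = [
    "Refunds are issued within 14 days.",
    "Shipping takes 3 to 5 business days.",
]
prompt = build_prompt("How long do refunds take?", retrieved)
```

Everything the model can use to ground its answer is in that one string, which is why chunking and ranking quality matter so much.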
Phase 2: Understanding the Missing Layer — Orchestration
During the RAG build, a larger architectural realization emerged:
The LLM is stateless.
The LLM never accesses external systems directly.
The application/orchestration layer:
Maintains context
Decides when retrieval is needed
Calls tools
Injects results into the prompt
This led directly into MCP.
Phase 3: Wrapping RAG Behind MCP
Instead of treating RAG as an internal pipeline, I exposed it as a standardized capability using MCP.
What MCP Adds:
MCP is a protocol that allows AI clients to:
Discover tools
Read resources
Execute structured operations
Maintain strict boundaries between reasoning and execution
Instead of hard-wiring document search into an application, I defined:
MCP Resources
askyourdocs://stats
Returns FAISS index metadata (vector count, mapped chunks)
MCP Tools
search_docs
Parameters:
query
k
method (similarity or MMR)
Returns structured search results
The RAG logic remained intact — but now it was accessible through a clean, inspectable protocol boundary.
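To show what that boundary looks like as data, the sketch below writes out a tool descriptor and a resource descriptor in the general shape MCP servers advertise to clients (a tool has a `name`, a `description`, and a JSON Schema `inputSchema`; a resource has a `uri`). An MCP SDK typically generates these from function signatures, so this is not the actual server.py. The descriptions, the defaults (`k=4`, `similarity`), and the hand-rolled `validate_search_args` helper are assumptions for illustration.

```python
SEARCH_DOCS_TOOL = {
    "name": "search_docs",
    "description": "Search the indexed documents and return the top-k chunks.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "k": {"type": "integer", "minimum": 1},
            "method": {"type": "string", "enum": ["similarity", "mmr"]},
        },
        "required": ["query"],
    },
}

STATS_RESOURCE = {
    "uri": "askyourdocs://stats",
    "name": "FAISS index stats",
    "mimeType": "application/json",
}


def validate_search_args(args):
    """Apply assumed defaults and check arguments against the schema above."""
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    k = args.get("k", 4)  # default of 4 is an assumption
    if isinstance(k, bool) or not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    method = str(args.get("method", "similarity")).lower()
    if method not in ("similarity", "mmr"):
        raise ValueError("method must be 'similarity' or 'mmr'")
    return {"query": query, "k": k, "method": method}


clean = validate_search_args({"query": "refund policy", "method": "MMR"})
```

Because the schema is declared up front, a client can discover what `search_docs` accepts without reading the server's source, and malformed requests can be rejected before they reach the RAG engine.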
The Architecture After MCP
User
↓
Application / Orchestration Layer
↓
LLM
↓ (structured tool request)
MCP Client
↓
MCP Server (my server.py)
↓
RAG Engine (FAISS + embeddings)
↓
Results injected back into LLM
Important:
The model never touches FAISS.
The model never executes search.
The model emits structured intent.
The application layer executes the request.
This separation is critical in enterprise environments.
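For a sense of what crosses the client/server boundary: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below builds such a request and unpacks a result by hand. The helper names and the sample response text are invented for the example; a real client library would handle this framing for you.

```python
import json


def make_tool_call(request_id, name, arguments):
    """Build the JSON-RPC 2.0 request an MCP client sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def extract_text(response):
    """Join the text items of a tools/call result into one string for the prompt."""
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "\n".join(
        item["text"] for item in result["content"] if item.get("type") == "text"
    )


request = make_tool_call(
    1, "search_docs", {"query": "refund policy", "k": 3, "method": "similarity"}
)
wire = json.dumps(request)  # the serialized message sent to the server

# Hand-written sample response in the shape used for tool results.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Refunds are issued within 14 days."}],
        "isError": False,
    },
}
context = extract_text(response)
```

Nothing in this exchange involves the model: the request is built and sent by application code, which is what makes the call easy to log, audit, and restrict.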
What I Learned (The Real Value)
1. RAG Is About Knowledge
RAG solves:
Large unstructured documents or data (the new buzzword is corpora)
Context window limits
Scalable semantic recall
It is a retrieval strategy, not a protocol.
2. MCP Is About Capability Boundaries
MCP solves:
Standardized tool exposure
Clean separation of reasoning vs execution
Multi-tool composability
Auditable, controlled system integration
It is a control-plane protocol, not a retrieval engine.
3. The LLM Is Not the System
This project reinforced something fundamental:
The LLM is a reasoning engine. The application layer is the system.
The intelligence of the overall solution depends more on:
Context construction
Tool exposure
Retrieval strategy
Guardrails
Orchestration logic
than on the raw model itself.
4. MCP Does Not Replace RAG
At first glance, it might seem MCP makes RAG unnecessary. It does not.
MCP standardizes how tools are called
RAG determines what knowledge is retrieved
In practice, MCP often wraps RAG.
Operational Lessons
Dev Mode vs Production Mode
The MCP Inspector:
Launches the server
Proxies stdio traffic
Allows manual tool invocation
Is a development and validation client
It is not part of production runtime.
Structured Tool Calls
The model does not execute search.
It emits:
{ "tool": "search_docs", "arguments": { ... } }
The orchestration layer:
Parses this
Executes the tool
Injects results into the model prompt
The model only ever sees text.
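The parse, execute, inject steps above can be sketched as a small dispatcher. This is an assumption-laden illustration: `search_docs` here is a stand-in that returns canned results instead of querying FAISS, and the registry and error text are made up.

```python
import json


def search_docs(query, k=4, method="similarity"):
    """Stand-in for the real tool; returns canned results instead of querying FAISS."""
    return [
        {"rank": i + 1, "text": f"result {i + 1} for {query!r} via {method}"}
        for i in range(k)
    ]


TOOLS = {"search_docs": search_docs}  # registry of tools the application allows


def handle_model_output(raw):
    """Parse the model's structured request, run the tool, and return text for the prompt."""
    request = json.loads(raw)
    name = request["tool"]
    if name not in TOOLS:
        return f"Error: unknown tool {name!r}"
    results = TOOLS[name](**request.get("arguments", {}))
    # Serialize the results, since they reach the model as text in the prompt.
    return json.dumps(results, indent=2)


model_output = '{"tool": "search_docs", "arguments": {"query": "refund policy", "k": 2}}'
tool_text = handle_model_output(model_output)
```

The registry doubles as an allowlist: a request for a tool the application has not registered is answered with an error string rather than executed.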
Why This Matters
This project demonstrates competency in:
Vector-based RAG design
Embedding workflows
Chunking strategy
Tool schema design
MCP protocol integration
Context orchestration
AI systems architecture
More importantly, it shows understanding of:
Separation of concerns
Capability boundaries
Model vs control-plane architecture
Secure tool invocation patterns
This is not just “using AI.” This is building AI systems correctly.
Final Reflection
The most valuable realization from this project was not technical — it was architectural:
Models reason. Applications orchestrate. Protocols expose capabilities. Retrieval selects knowledge.
Once this framework clicked, MCP, RAG, and tool use all became coherent parts of the same system.



