MCP addition to RAG workflow project

Ken Munson
Mar 2
3 min read

Updated: Mar 31

From RAG to MCP: Turning a Document Search System into a

Composable AI Capability.

Executive Summary

In this project, I extended an earlier Retrieval-Augmented Generation (RAG) system into a fully operational MCP (Model Context Protocol) server, exposing document search as a standardized, inspectable AI capability.

The result was not just a working tool — it was a deeper understanding of modern AI architecture: separating model reasoning from orchestration, retrieval, and execution.

This post documents:

The original RAG system (Vertex AI + FAISS)
Why MCP matters
How I wrapped RAG behind an MCP server
What I learned about orchestration, context, and AI systems design

At a super high level, here is how to think about MCP architecture:

Phase 1: Building a Production-Style RAG System

The original project implemented a structured RAG pipeline:

Architecture Overview

Documents → Chunking → Embeddings → FAISS Vector Store                                     ↓                              Query Embedding                                     ↓                           Similarity / MMR Search                                     ↓                           Context Injection → LLM

Stack

Embeddings: Vertex AI (text-embedding-004)
Vector store: FAISS (local persistent index)
Chunking strategy: 600–1000 token segments with overlap
Search modes: Similarity + MMR
Interface: CLI + optional FastAPI

What This Solved

Scalable retrieval from large PDFs
Context-aware answers grounded in source documents
Reduced hallucination risk
Full control over chunking, ranking, and injection

This project established something important:

The model does not “know” your documents. The application layer retrieves and injects relevant context.

That insight set up the next phase.

Phase 2: Understanding the Missing Layer — Orchestration

During the RAG build, a larger architectural realization emerged:

The LLM is stateless.
The LLM never accesses external systems directly.
The application/orchestration layer:
- Maintains context
- Decides when retrieval is needed
- Calls tools
- Injects results into the prompt

This led directly into MCP.

Phase 3: Wrapping RAG Behind MCP

Instead of treating RAG as an internal pipeline, I exposed it as a standardized capability using MCP.

What MCP Adds:

MCP is a protocol that allows AI clients to:

Discover tools
Read resources
Execute structured operations
Maintain strict boundaries between reasoning and execution

Instead of hard-wiring document search into an application, I defined:

MCP Resources

askyourdocs://statsReturns FAISS index metadata (vector count, mapped chunks)

MCP Tools

search_docs
- Parameters:
  - query
  - k
  - method (similarity or MMR)
- Returns structured search results

The RAG logic remained intact — but now it was accessible through a clean, inspectable protocol boundary.

The Architecture After MCP

User  ↓Application / Orchestration Layer  ↓LLM  ↓ (structured tool request)MCP Client  ↓MCP Server (my server.py)  ↓RAG Engine (FAISS + embeddings)  ↓Results injected back into LLM

Important:

The model never touches FAISS.
The model never executes search.
The model emits structured intent.
The application layer executes the request.

This separation is critical in enterprise environments.

What I Learned (The Real Value)

1. RAG Is About Knowledge

RAG solves:

Large unstructured documents or data (the new buzz word is corpora)
Context window limits
Scalable semantic recall

It is a retrieval strategy, not a protocol.

2. MCP Is About Capability Boundaries

MCP solves:

Standardized tool exposure
Clean separation of reasoning vs execution
Multi-tool composability
Auditable, controlled system integration

It is a control-plane protocol, not a retrieval engine.

3. The LLM Is Not the System

This project reinforced something fundamental:

The LLM is a reasoning engine. The application layer is the system.

The intelligence of the overall solution depends more on:

Context construction
Tool exposure
Retrieval strategy
Guardrails
Orchestration logic

than on the raw model itself.

4. MCP Does Not Replace RAG

At first glance, it might seem MCP makes RAG unnecessary. It does not.

MCP standardizes how tools are called
RAG determines what knowledge is retrieved

In practice, MCP often wraps RAG.

Operational Lessons

Dev Mode vs Production Mode

The MCP Inspector:

Launches the server
Proxies stdio traffic
Allows manual tool invocation
Is a development and validation client

It is not part of production runtime.

Structured Tool Calls

The model does not execute search.

It emits:

{  "tool": "search_docs",  "arguments": { ... }}

The orchestration layer:

Parses this
Executes the tool
Injects results into the model prompt

The model only ever sees text.

Why This Matters

This project demonstrates competency in:

Vector-based RAG design
Embedding workflows
Chunking strategy
Tool schema design
MCP protocol integration
Context orchestration
AI systems architecture

More importantly, it shows understanding of:

Separation of concerns
Capability boundaries
Model vs control-plane architecture
Secure tool invocation patterns

This is not just “using AI. ”This is building AI systems correctly.

Final Reflection

The most valuable realization from this project was not technical — it was architectural:

Models reason. Applications orchestrate. Protocols expose capabilities. Retrieval selects knowledge.

Once this framework clicked, MCP, RAG, and tool use all became coherent parts of the same system.

Governing the Machine