Turning My Personal OneNote Into a Searchable RAG System

Ken Munson
Mar 22

Until recently, I had been curious about RAG systems in the abstract. I understood the basic promise: take a body of documents, convert them into embeddings, store them in a vector index, and then retrieve the most relevant pieces when a question comes in. But that was still fairly theoretical to me. About two months ago, I did my first RAG project, which I wrote up in a prior post on this site.

But lately, something came to me: what if I could load my own OneNote into a RAG-type system?

I have used OneNote personally and professionally for about 15 years as a kind of personal knowledge base. My personal OneNote file holds over 50 GB of notes, copied excerpts, project thoughts, architectural ideas, technical concepts, and half-finished lines of reasoning. In other words, it is exactly the kind of material I would want a retrieval system to search on my behalf.

So instead of treating RAG as a generic AI pattern, I decided to test it against something concrete: my own notes.

The test case: my personal OneNote, starting with the AI section

My main notebook is very large, so I did not try to ingest everything at once. The first practical decision was to use only one part of the notebook as a pilot. I chose the AI section as the test case.

That turned out to be the right move. It kept the scope manageable, gave me a realistic technical corpus to work with, and let me test the workflow without committing to a giant ingestion job on day one.

So the project became:

Take the AI section from my personal OneNote, export it, chunk it, embed it, place it into FAISS, and test whether retrieval actually works.

That is the essence of this experiment.

What RAG actually means in plain English

Before this project, I think I understood RAG okay because of the first RAG project I did. But now I would say I could adapt this process to a new corpus without asking for you-know-who's help.

Here is the simplest way I would describe it now - still very similar to my initial understanding:

A RAG system does not rely on a model already “knowing” your documents. Instead, it works more like this:

take your source material
break it into chunks
convert each chunk into a numerical vector called an embedding (Vertex AI Embeddings)
store those vectors in a searchable index (FAISS)
when you ask a question, convert the question into an embedding too (new understanding!)
find the chunks whose embeddings are closest to the question embedding
return those chunks as the context for answering

So the system is not memorizing my OneNote. It is retrieving the most relevant parts of it at query time.
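
To make those steps concrete, here is a toy sketch that runs with nothing but the Python standard library. The big assumption: `toy_embed` is a bag-of-words counter standing in for a real embedding model, so it matches on shared words rather than on meaning, and the three chunks are made-up examples rather than my notes. Only the shape of the flow (embed the chunks, embed the question, rank by closeness) carries over to the real pipeline.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], vectors: list[Counter], k: int = 2) -> list[str]:
    """Embed the question, score every chunk against it, return the k closest."""
    q = toy_embed(question)
    ranked = sorted(zip(chunks, vectors), key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Source material, already broken into chunks (made-up examples).
chunks = [
    "FAISS is a library for similarity search over dense vectors.",
    "Chunking splits long notes into coherent passages.",
    "OneNote pages can be exported as HTML through Microsoft Graph.",
]

# Embed each chunk and keep the vectors in a simple in-memory "index".
vectors = [toy_embed(chunk) for chunk in chunks]

# Embed the question too, find the closest chunk, and return it as context.
top = retrieve("Which library does similarity search over vectors?", chunks, vectors, k=1)
print(top[0])
```

Nothing here is "learned" from the chunks. The lookup happens fresh for every question, which is the point of the distinction above.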

That distinction matters a lot.

Step 1: export the OneNote content

The first major step was getting the note content out of OneNote in a usable form.

That was done through the Microsoft Graph API (who knew this was even a thing) using delegated interactive authentication. Once that was working, I was able to:

identify my notebook
identify the AI section
list pages in that section
download each page’s HTML
convert the HTML to plain text
save text plus metadata locally
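
For anyone curious what the Graph side of that list looks like, here is a rough sketch of the listing logic, not my actual export script. It assumes the Microsoft Graph v1.0 OneNote endpoints and a caller-supplied `get_json(url)` function that performs an authenticated GET and returns parsed JSON. The helper names are hypothetical, and section groups (sections nested inside folders) are not handled.

```python
from typing import Callable, Iterable, Iterator, Optional

GRAPH = "https://graph.microsoft.com/v1.0"


def iter_collection(get_json: Callable[[str], dict], url: Optional[str]) -> Iterator[dict]:
    """Yield every item of a Graph collection, following @odata.nextLink paging."""
    while url:
        payload = get_json(url)
        yield from payload.get("value", [])
        url = payload.get("@odata.nextLink")


def find_by_name(items: Iterable[dict], name: str) -> dict:
    """Return the first notebook or section whose displayName matches."""
    for item in items:
        if item.get("displayName") == name:
            return item
    raise LookupError(f"nothing named {name!r} found")


def list_section_pages(get_json: Callable[[str], dict], notebook: str, section: str) -> list[dict]:
    """Identify the notebook, then the section, then list the pages in that section."""
    nb = find_by_name(iter_collection(get_json, f"{GRAPH}/me/onenote/notebooks"), notebook)
    sec = find_by_name(
        iter_collection(get_json, f"{GRAPH}/me/onenote/notebooks/{nb['id']}/sections"), section
    )
    return list(iter_collection(get_json, f"{GRAPH}/me/onenote/sections/{sec['id']}/pages"))


def page_content_url(page_id: str) -> str:
    """URL that returns one page's HTML body."""
    return f"{GRAPH}/me/onenote/pages/{page_id}/content"


# One possible get_json for delegated interactive auth (assumes `pip install msal requests`
# and an app registration of your own; the client id below is a placeholder):
#   import msal, requests
#   app = msal.PublicClientApplication(
#       "<client-id>", authority="https://login.microsoftonline.com/consumers")
#   token = app.acquire_token_interactive(scopes=["Notes.Read"])["access_token"]
#   get_json = lambda url: requests.get(
#       url, headers={"Authorization": f"Bearer {token}"}, timeout=30).json()
```

Passing `get_json` in as a parameter keeps the listing logic separate from authentication, so it can be exercised with canned responses before touching a real account.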

The export gave me a local corpus made up of:

.html files
.txt files
.json sidecar metadata
a manifest
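
A minimal sketch of how one page could become those files, using only the standard library. The HTML handling is deliberately simple (block-level tags become line breaks, scripts and styles are dropped), and the sidecar field names and the `manifest.json` layout are illustrative guesses rather than the exact format my export produced. It also assumes `page_id` is safe to use as a file name.

```python
import json
from html.parser import HTMLParser
from pathlib import Path

BLOCK_TAGS = {"title", "p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collect visible text, inserting a line break around block-level tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page text as non-empty lines with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def save_page(out_dir: Path, page_id: str, title: str, html: str) -> dict:
    """Write <id>.html, <id>.txt and a <id>.json sidecar; return the sidecar dict."""
    out_dir.mkdir(parents=True, exist_ok=True)
    text = html_to_text(html)
    (out_dir / f"{page_id}.html").write_text(html, encoding="utf-8")
    (out_dir / f"{page_id}.txt").write_text(text, encoding="utf-8")
    meta = {"page_id": page_id, "title": title, "text_file": f"{page_id}.txt", "char_count": len(text)}
    (out_dir / f"{page_id}.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta


def write_manifest(out_dir: Path, entries: list[dict]) -> Path:
    """Write one manifest.json that lists every exported page."""
    path = out_dir / "manifest.json"
    path.write_text(json.dumps({"page_count": len(entries), "pages": entries}, indent=2), encoding="utf-8")
    return path
```

Keeping the raw `.html` next to the `.txt` means the text conversion can be redone later without calling the API again.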

That was the point where the project stopped being “an idea” and became a usable document pipeline.

Step 2: chunk the exported notes

Once the text was exported, the next question was how to break it into chunks.

This turned out to matter a lot more than I initially appreciated.

At first, the chunker created far too many fragments. The output technically worked, but many of the chunks were tiny and ugly. Some had more metadata than actual text. That would have made retrieval noisy and much less useful.

So the chunking logic was refined until it produced chunks that were:

larger
more coherent
more readable
better aligned with actual ideas instead of random formatting breaks

That was one of the biggest learning points in the whole exercise: chunking quality strongly affects retrieval quality.
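
Here is one way that kind of refinement can look, as a sketch rather than the chunker I actually ended up with. It packs consecutive non-empty lines into chunks up to a target size and folds a too-small trailing fragment into the previous chunk. The `target_chars` and `min_chars` values are arbitrary assumptions, and a single line longer than the target is kept whole rather than split.

```python
def chunk_text(text: str, target_chars: int = 1200, min_chars: int = 200) -> list[str]:
    """Greedily pack lines into chunks of roughly target_chars characters."""
    units = [" ".join(line.split()) for line in text.splitlines()]
    units = [unit for unit in units if unit]

    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current}\n{unit}" if current else unit
        if current and len(candidate) > target_chars:
            # Adding this line would overshoot: close the current chunk, start a new one.
            chunks.append(current)
            current = unit
        else:
            current = candidate

    if current:
        if chunks and len(current) < min_chars:
            # Avoid a tiny trailing fragment by merging it into the previous chunk.
            chunks[-1] = f"{chunks[-1]}\n{current}"
        else:
            chunks.append(current)
    return chunks


# Eleven 50-character lines with a 120-character target: pairs of lines per chunk,
# and the leftover eleventh line is folded into the last chunk instead of standing alone.
demo = chunk_text("\n".join(["x" * 50] * 11), target_chars=120, min_chars=60)
print([len(chunk) for chunk in demo])
```

The tiny-fragment merge is the part that addresses the "more metadata than actual text" problem: no chunk is allowed to end up smaller than `min_chars` unless the whole page is that small.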

In the end, the pilot chunk file was good enough to move forward, even though it was not perfect.

Step 3: generate embeddings with Vertex AI

After chunking, the next stage was to convert each chunk into an embedding.

For this project I used Vertex AI text embeddings, specifically gemini-embedding-001, which Vertex documents as a text embedding model. Vertex also supports retrieval-oriented task types such as RETRIEVAL_DOCUMENT for corpus chunks and RETRIEVAL_QUERY for incoming user questions, which is exactly what I used in this flow. The default embedding size for gemini-embedding-001 is 3072 dimensions, so each chunk of my OneNote text was converted into a vector of 3072 numbers. That gives a new meaning to high-dimensional vector space!
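
A sketch of what the embedding call can look like. This assumes the `google-genai` Python SDK pointed at Vertex AI, which may not be the client library you use. The function takes the client as a parameter and sends one text per request, which is the conservative choice. The project id and region in the usage comment are placeholders.

```python
from typing import Any

MODEL_NAME = "gemini-embedding-001"


def embed_texts(client: Any, texts: list[str], task_type: str) -> list[list[float]]:
    """Embed texts one request at a time and return one vector per text.

    `client` is assumed to be a google-genai Client (or anything exposing the
    same `models.embed_content` method). Use task_type "RETRIEVAL_DOCUMENT"
    for corpus chunks and "RETRIEVAL_QUERY" for questions.
    """
    vectors: list[list[float]] = []
    for text in texts:
        response = client.models.embed_content(
            model=MODEL_NAME,
            contents=text,
            config={"task_type": task_type},
        )
        vectors.append(list(response.embeddings[0].values))
    return vectors


# Real usage (needs `pip install google-genai` and Google Cloud credentials):
#   from google import genai
#   client = genai.Client(vertexai=True, project="my-project", location="us-central1")
#   chunk_vectors = embed_texts(client, chunks, "RETRIEVAL_DOCUMENT")
#   query_vector = embed_texts(client, ["What did I note about FAISS?"], "RETRIEVAL_QUERY")[0]
```

The only difference between indexing and querying is the task type, so the same function serves both sides of the pipeline.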

Those numbers are not human-readable, of course. But they capture the semantic meaning of the text in a way that lets similar ideas sit near each other in vector space.

So when I embedded my OneNote chunks, I was effectively turning my notes into points in a mathematical space that can be searched by meaning rather than simple keyword matching.

That is a pretty amazing sentence to be able to write about one’s own notes.

Step 4: build a FAISS index

Once the embeddings were created, they were loaded into FAISS, which is a local vector search library developed by Meta for similarity search over dense vectors. FAISS supports indexes that can search for the nearest vectors to a query vector, which makes it a very common choice for local RAG experiments and prototypes.

In plain language, FAISS became the searchable vector memory for my chunked OneNote notes.

So at that point the pipeline looked like this:

Personal OneNote → exported text → cleaned/chunked notes → Vertex embeddings → FAISS index

That is the core RAG workflow.

Step 5: ask questions against my own notes

This was the point where the project became genuinely satisfying.

I embedded a query using Vertex AI with the retrieval query task type, searched the FAISS index, and got back the top matching chunks from my own OneNote-derived corpus.

And the results were good.

Not in a vague “AI demo” way. In a practical way.

The top five chunks were enough to answer the kinds of questions I would naturally ask of this material, without overwhelming me. They were readable in a couple of minutes and clearly relevant.

That was the moment the full workflow really clicked for me.

This is not magic. It is not the model “remembering” my notebook. It is a structured retrieval pipeline built over my own notes.

And it worked.

So where is the “inference” happening?

This was one of my questions too, and it is worth explaining clearly.

There are really two different kinds of model activity in this workflow.

1. Embedding inference

When I sent my note chunks to Vertex AI, Vertex ran the embedding model on them and returned vectors. That model computation happens in Google’s Vertex AI service. Likewise, when I typed a question, Vertex embedded the question too. That is part of the Vertex AI component.

So the answer to “where is the model?” is, in part, Vertex AI in Google Cloud: that is where the embedding model actually runs.

2. Vector search

The actual retrieval over those embeddings was done locally in FAISS on my machine. FAISS is not an LLM. It is the similarity search engine over the vectors.

So the retrieval flow is split:

Vertex AI: converts text and questions into embeddings
FAISS: searches those embeddings locally

If I wanted a full question-answering system afterward, I could take the retrieved chunks and send them to a generative model for summarization or synthesis. But the pilot I built here was focused first on retrieval quality.
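
I did not build that generation step in this pilot, but the hand-off is simple enough to sketch. The function below only assembles a prompt from retrieved chunks. The wording of the instructions and the `max_chars` budget are assumptions, and the call to an actual generative model is left out.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 6000) -> str:
    """Assemble a grounded prompt from retrieved chunks, best match first."""
    context_parts: list[str] = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        block = f"[{number}] {chunk.strip()}"
        if used + len(block) > max_chars:
            break  # stop before the context budget is exceeded
        context_parts.append(block)
        used += len(block)

    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the numbered notes below. "
        "If the notes do not contain the answer, say so.\n\n"
        f"Notes:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is FAISS used for?",
    ["FAISS is a library for similarity search over dense vectors.", "Chunking affects retrieval."],
)
print(prompt)
```

Numbering the chunks gives the model something to cite, which makes it easier to check an answer against the original note.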

That was the right decision.

Why using my personal OneNote mattered

This was not just a generic RAG demo over random PDFs.

The input was my personal OneNote, and that made all the difference.

Why?

Because the results felt useful in a deeply personal and practical way. I was not asking, “Can this system answer trivia?” I was asking, “Can this system find what I previously wrote or collected about this topic?”

That is a much more compelling use case.

And because I used the AI section as the pilot corpus, the questions were naturally aligned with the content: embeddings, vector search, OneNote as a knowledge base, RAG architecture, Obsidian comparisons, knowledge graphs, and so on.

It was the right slice of the notebook to test first.

What I learned

If I had to summarize the biggest lessons from this experiment, they would be these.

1. RAG is easier to understand when the corpus is personal

Once I used my own notes, the workflow stopped being abstract.

2. Chunking matters a lot

Bad chunks produce bad retrieval. Good-enough chunks produce surprisingly good results.

3. Metadata matters too

Page title, section, source file, and chunk index all make the system much more usable.
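
As an illustration, here is a sketch of a chunk record carrying those four fields, plus a formatter for search hits. The class name, the output format, and the example values are hypothetical. The one real constraint is that records are stored in the same order as their vectors, so a row id from the index doubles as a list position.

```python
from dataclasses import dataclass


@dataclass
class ChunkRecord:
    page_title: str
    section: str
    source_file: str
    chunk_index: int
    text: str


def format_hit(rank: int, score: float, record: ChunkRecord, preview_chars: int = 80) -> str:
    """Render one search hit as a single readable line with its provenance."""
    preview = record.text[:preview_chars].replace("\n", " ")
    return (
        f"{rank}. [{score:.3f}] {record.section} / {record.page_title} "
        f"({record.source_file}#{record.chunk_index}): {preview}"
    )


records = [
    ChunkRecord("Vector search", "AI", "vector-search.txt", 3, "FAISS stores dense vectors."),
]

# Suppose the index returned row id 0 with a similarity score of 0.8123.
print(format_hit(1, 0.8123, records[0]))
```

With the page title and source file on every hit, a result is not just a snippet of text: it points back to the exact page to open in OneNote.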

4. Vertex + FAISS is a very workable combination

Vertex handled embeddings. FAISS handled vector search. That division of labor was simple and effective. Vertex’s task-type support for retrieval use cases was particularly helpful.

5. Small pilots are the right way to start

Beginning with one bounded part of the notebook, in this case the AI section, was absolutely the right choice.

Where this could go next

Now that the pilot is working, the obvious next move is to scale beyond the sample data I used and likely ingest the entire AI area of my OneNote.

There is also cleanup work still worth doing, especially removing some formatting artifacts left behind by the HTML-to-text conversion. But that is refinement, not reinvention.
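
A possible starting point for that cleanup, with the caveat that the artifacts targeted here (non-breaking spaces, zero-width characters, runs of spaces and blank lines) are guesses at what HTML-to-text conversion typically leaves behind, not a list of what is actually in my export. Note that NFKC normalization also folds other compatibility characters such as ligatures and full-width letters.

```python
import re
import unicodedata

# Zero-width characters mapped to None so str.translate deletes them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def clean_text(text: str) -> str:
    """Normalize whitespace-style artifacts without touching the wording."""
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> plain space
    text = text.translate(ZERO_WIDTH)           # drop zero-width characters
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)        # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)      # at most one blank line in a row
    return text.strip()


print(repr(clean_text("Dense\xa0vector\u200b  search \n\n\n\nNext\tline ")))
```

Running this before chunking, rather than after, keeps the chunk sizes honest, since stray whitespace would otherwise count toward each chunk's length.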

The core pipeline is already proven.

And that is the important part.

Final thought

For me, the most exciting part of this project is that it takes something I already built over years, namely my own OneNote knowledge base, and gives it a new interface.

I am not replacing my note system.

I am making it searchable by meaning.

Governing the Machine