March 5, 2026

Build a Local/Private RAG System with LlamaIndex and Python

A practical guide to implementing RAG on local documents using LlamaIndex. Load files, create embeddings, build a vector index, and query your data with Python.

πŸ“Œ Goal: Build a local RAG (Retrieval-Augmented Generation) system over files using LlamaIndex.
RAG sounds complicated. In reality it’s simple. You give the system documents. It chops them into pieces, finds the relevant ones, and the model answers using those pieces. No guessing. No hallucinating wild stuff. Just answers grounded in your files. πŸ“‚
Perfect for contracts, internal docs, research notes, or anything that lives in folders on your machine.

🧰 Setup

Install Python 🐍

Download Python from the official site and install it.
Check it in terminal:
python --version
If it prints a version number, you’re good. If not, your terminal is silently judging you.

Create a project folder πŸ“

mkdir rag-project
cd rag-project
This keeps everything clean instead of scattering Python files around your computer like digital confetti.

Create a virtual environment πŸ§ͺ

python -m venv venv
Activate it.
macOS / Linux
source venv/bin/activate
Windows
venv\Scripts\activate
Your terminal should now show (venv). Congratulations, you entered Python’s tiny universe.

Add your documents πŸ“„

Create a folder called data.
Put your PDF, TXT, DOCX files there.
rag-project
├── data
│   ├── contract.pdf
│   ├── notes.txt
│   └── handbook.docx
These files form the knowledge base your RAG system will search.
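Before indexing anything, it can help to sanity-check what is actually in the folder. A minimal sketch in plain Python; the `list_knowledge_base` helper is my own, not part of LlamaIndex:

```python
from pathlib import Path

def list_knowledge_base(folder: str = "./data") -> list[str]:
    """Hypothetical helper: list the files your RAG system will index."""
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"Create the folder first: {folder}")
    return sorted(p.name for p in path.iterdir() if p.is_file())
```

Call it once before building the index; an empty list usually means the documents landed in the wrong place.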

Optional: Add an API key πŸ”‘

If you plan to use hosted models like OpenAI/Anthropic, set the key.
macOS / Linux
export OPENAI_API_KEY="your_api_key"
Windows
setx OPENAI_API_KEY "your_api_key"
Note that setx only takes effect in new terminal sessions, so open a fresh terminal afterwards.
If you plan to run fully local models later, you can skip this.
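Inside your script, it's worth failing fast with a readable message if the key is missing, instead of hitting a cryptic error mid-run. A small sketch; `require_api_key` is my own helper name, not a LlamaIndex function:

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail with a clear message."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it, or skip hosted models entirely."
        )
    return key
```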

Install dependencies πŸ“¦

pip install llama-index openai
Two packages and you’re already building a document AI system. Five years ago this needed a research lab.

🧱 Build the Index

Create app.py.
Load documents and build a vector index.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
That’s it. Your documents are now searchable by meaning instead of keywords.

πŸ”Ž Ask Questions

Now query the data.
response = query_engine.query("Summarize the contract terms")
print(response)
The system retrieves the relevant chunks and feeds them to the model. The answer comes directly from your documents.
No guessing. No Wikipedia energy.

🧠 What happens behind the scenes

  1. Documents are split into chunks
  2. Chunks are converted into embeddings
  3. At query time, the chunks most similar to your question are retrieved
  4. The LLM answers using those chunks
This is the core idea of Retrieval-Augmented Generation.
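The four steps above can be sketched in plain Python. This is only a toy illustration of the idea, not how LlamaIndex implements it: real systems use neural embedding models, while here the "embedding" is a simple word-count vector and a chunk is just a sentence.

```python
import math
from collections import Counter

def chunk(text: str) -> list[str]:
    # 1. Split the document into chunks (here: one sentence per chunk).
    return [s.strip() for s in text.split(".") if s.strip()]

def embed(text: str) -> Counter:
    # 2. Toy "embedding": a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Similarity between two count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc = ("The contract term is 12 months. "
       "Payment is due within 30 days. "
       "Either party may terminate with notice.")
chunks = chunk(doc)
vectors = [embed(c) for c in chunks]

# 3. Retrieve the chunk most similar to the question.
query = embed("when is payment due")
best = max(range(len(chunks)), key=lambda i: cosine(query, vectors[i]))

# 4. The LLM would answer from this context; here we only assemble the prompt.
prompt = f"Context: {chunks[best]}\nQuestion: when is payment due?"
```

Running this, `chunks[best]` is the payment sentence, which is the same grounding behavior the real pipeline gives you, just with far better embeddings.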

πŸŽ› Control embeddings

You can control which embedding model is used.
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Good embeddings = better retrieval.
Bad embeddings = your AI suddenly forgetting where things are.

πŸ’Ύ Persist the index

Rebuilding an index every run is annoying. Save it once.
index.storage_context.persist(persist_dir="./storage")
Reload later.
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
Now your RAG system boots instantly.

βœ… Where this is useful

Document AI shows up everywhere:
  • internal knowledge assistants
  • contract review
  • research search tools
  • support documentation bots
Clean chunking and good embeddings matter more than complicated pipelines.
Everything else is just plumbing.