When you ask a well-tuned RAG system to synthesize five research papers, five competitor analyses, or five quarters of internal reports, something quietly expensive happens: the model starts from zero. It scans chunks. It reconstructs relationships. It re-reasons through context it already processed last week.
You pay for that compute again. And again. And again.
This isn't a retrieval problem. It's a memory problem. And the industry has been mistaking one for the other.
## The Thing RAG Was Never Designed to Do
Traditional RAG was built for lookup, not accumulation. It treats every document as immutable reference material, synthesizes on the fly at query time, and forgets everything the moment the session ends. For static documentation ("What does clause 4.2 say?", "Where can I find our company's employee handbook?"), it's exactly the right tool.
But organizations don't run on lookups. They run on compounding knowledge. Due diligence that builds over months. Competitive landscapes that shift weekly. Research pipelines where paper 47 changes the interpretation of papers 1 through 46. For that kind of work, RAG is a just-in-time compiler that throws away the binary every time it runs.
The brutal accounting: you're paying token costs to rediscover connections your system already made three queries ago.
## The Pattern: LLM Wiki Architecture
The LLM Wiki pattern doesn't replace RAG. It reframes what knowledge infrastructure actually means.
Instead of retrieving raw chunks at query time, you use the LLM to incrementally compile source material into a persistent, interlinked knowledge base — a git-tracked directory of structured markdown. The LLM maintains entity pages, updates topic summaries, flags contradictions, and cross-references concepts as new sources arrive. At query time, the model reads the compiled wiki. Not raw documents. Not chunks. The already-organized output of prior reasoning.

The mental model shift: stop treating LLMs as librarians asked to quote passages. Start treating them as analysts asked to maintain institutional memory.
This isn't a product. It's an architectural pattern, and it's been quietly working for teams doing serious research work while everyone else was optimizing their chunk size.
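One way to picture the compiled artifact is as a plain directory tree (an illustrative layout; the names match the implementation below, but any convention works):

```text
project/
├── SCHEMA.md          # conventions the maintainer LLM must follow
├── log.md             # append-only audit trail of ingests and lint runs
├── raw/               # original source documents, untouched
└── wiki/              # the compiled knowledge base
    ├── index.md       # auto-generated table of contents
    ├── sources/       # one summary page per ingested document
    ├── entities/      # people, companies, tools
    └── concepts/      # cross-cutting ideas and themes
```

Because it's all markdown in git, every change the LLM makes is diffable and revertible.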
## Where This Pays Off — and Where It Doesn't
Strong fit:

- Long-horizon research: months-long due diligence, literature reviews, competitive tracking
- Evolving domains where new data revises old conclusions — not just adds to them
- Synthesis-heavy workflows connecting dozens of sources into coherent, durable positions

Poor fit:

- High-velocity, low-latency queries: customer support bots, real-time dashboards
- Strict source attribution requirements — wiki content is synthesized, not verbatim quoted
- Very large corpora (10,000+ documents) without additional indexing infrastructure on top
## The Implementation: Three Layers, No Vector DB Required
The architecture is deliberately lean. Python, LangChain, a git-tracked markdown directory. At moderate scale — hundreds of sources — no vector database is needed.
```python
from pathlib import Path
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
import json
from datetime import datetime


class LLMWiki:
    def __init__(self, base_path: str, model: str = "gpt-4o-mini"):
        self.base = Path(base_path)
        self.raw_dir = self.base / "raw"
        self.wiki_dir = self.base / "wiki"
        self.schema_file = self.base / "SCHEMA.md"
        self.raw_dir.mkdir(parents=True, exist_ok=True)
        self.wiki_dir.mkdir(parents=True, exist_ok=True)
        self.llm = ChatOpenAI(model=model)

    def _read_schema(self) -> str:
        if self.schema_file.exists():
            return self.schema_file.read_text()
        return ("# Default Schema\n"
                "Maintain entity pages in /entities/, concepts in /concepts/, sources in /sources/.")

    def _update_index(self):
        # Rebuild index.md from every page's title line (skip the index itself)
        pages = [p for p in self.wiki_dir.rglob("*.md") if p.name != "index.md"]
        index_content = ["# Wiki Index\n"]
        for page in sorted(pages):
            text = page.read_text()
            first_line = text.split('\n')[0] if text else "No title"
            index_content.append(f"- [[{page.stem}]]: {first_line.replace('# ', '')}")
        (self.wiki_dir / "index.md").write_text("\n".join(index_content))
```
The schema file is your system prompt for the wiki maintainer — page types, linking conventions, update rules. Treat it like an architectural decision record: it governs how the LLM files knowledge, so it compounds correctly over time.
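A starter schema might look like this (illustrative; the page types match the code in this article, but the rules should be adapted to your domain):

```markdown
# Wiki Schema

## Page types
- /sources/<name>.md — one page per ingested document: summary, key entities, concepts
- /entities/<Name>.md — one page per person, company, or tool; chronological "Mentions" section
- /concepts/<name>.md — one page per recurring idea; link every supporting source

## Linking
- Always use [[WikiLinks]], never bare file paths
- Every new page must be reachable from at least one existing page

## Update rules
- Never delete prior claims; mark them superseded with a date
- Record contradictions both on the affected page and in log.md
```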
### Layer 1: Ingest — The Compilation Step
This is where the economics change. Instead of embedding chunks, ingest compiles each new source into the existing knowledge graph — touching entity pages, concept pages, source summaries, and the contradiction log in a single pass.
````python
# Method of LLMWiki. Note: schema and content are passed as template
# variables (not f-string interpolated), so braces in their text can't
# break the prompt template.
def ingest(self, source_path: str, source_content: str):
    schema = self._read_schema()
    source_name = Path(source_path).stem

    analyze_prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a wiki maintainer. Follow this schema: {schema}\n"
                   "Analyze the source and produce:\n"
                   "1. A summary page (markdown)\n"
                   "2. A list of entities mentioned (people, companies, tools) with their relationships\n"
                   "3. A list of concepts discussed\n"
                   "4. Any contradictions with existing wiki content\n"
                   "Format as JSON with keys: summary, entities, concepts, contradictions"),
        ("human", "Source content:\n{content}")
    ])
    analysis = self.llm.invoke(
        analyze_prompt.format_messages(schema=schema, content=source_content))

    try:
        result = json.loads(analysis.content)
    except json.JSONDecodeError:
        # Fall back to extracting a fenced ```json block from the reply
        import re
        json_match = re.search(r'```json\n(.*?)\n```', analysis.content, re.DOTALL)
        result = json.loads(json_match.group(1)) if json_match else {}

    source_page = self.wiki_dir / "sources" / f"{source_name}.md"
    source_page.parent.mkdir(parents=True, exist_ok=True)
    source_page.write_text(f"""# {source_name}
*Ingested: {datetime.now().isoformat()}*

## Summary
{result.get('summary', 'No summary generated')}

## Key Entities
{chr(10).join([f"- [[{e}]]" for e in result.get('entities', [])])}

## Concepts
{chr(10).join([f"- [[{c}]]" for c in result.get('concepts', [])])}
""")

    for entity in result.get('entities', []):
        entity_path = self.wiki_dir / "entities" / f"{entity.replace(' ', '_')}.md"
        if entity_path.exists():
            current = entity_path.read_text()
            update_prompt = ChatPromptTemplate.from_messages([
                ("system", "Update this entity page with new information. "
                           "Maintain chronological order. Note contradictions."),
                ("human", "Current page:\n{current}\n\nNew source mentions:\n{snippet}...")
            ])
            updated = self.llm.invoke(update_prompt.format_messages(
                current=current, snippet=source_content[:2000]))
            entity_path.write_text(updated.content)
        else:
            entity_path.parent.mkdir(parents=True, exist_ok=True)
            entity_path.write_text(f"""# {entity}

## Mentions
- [[{source_name}]] ({datetime.now().year})

## Description
Mentioned in source material. Details to be expanded.
""")

    log_entry = (f"\n## [{datetime.now().isoformat()}] ingest | {source_name}\n"
                 f"- Created: sources/{source_name}.md\n"
                 f"- Updated entities: {', '.join(result.get('entities', []))}\n")
    with open(self.base / "log.md", "a") as f:
        f.write(log_entry)

    self._update_index()
    return f"Ingested {source_name}, touched {len(result.get('entities', []))} entities"
````
One source typically touches 10–15 wiki pages. The upfront token cost at ingestion is real. It pays back on the third query — when you're not re-synthesizing the same relationships from raw documents for the third time.
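The "pays back on the third query" intuition can be sanity-checked with a toy cost model. Every number below is an illustrative assumption (token counts, price), not a benchmark; the point is the shape of the curves, not the values:

```python
# Toy cost model: RAG re-reads raw chunks on every query; the wiki pays
# a one-time compilation cost, then reads compact compiled pages.
# All token counts and the price below are illustrative assumptions.

PRICE_PER_1K_TOKENS = 0.01   # assumed blended input price, USD


def rag_cost(n_queries: int, chunk_tokens_per_query: int = 12_000) -> float:
    """Every query re-synthesizes from raw chunks."""
    return n_queries * chunk_tokens_per_query / 1000 * PRICE_PER_1K_TOKENS


def wiki_cost(n_queries: int, compile_tokens: int = 24_000,
              wiki_tokens_per_query: int = 3_000) -> float:
    """Pay compilation once, then read compact wiki pages per query."""
    return (compile_tokens + n_queries * wiki_tokens_per_query) / 1000 * PRICE_PER_1K_TOKENS


def break_even_query() -> int:
    """First query count at which the wiki is strictly cheaper than RAG."""
    n = 1
    while wiki_cost(n) >= rag_cost(n):
        n += 1
    return n
```

With these assumptions the wiki overtakes RAG on the third query, and the gap widens from there; under different token ratios the break-even point shifts, but the one-time-cost-versus-per-query-cost structure doesn't.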
### Layer 2: Query — Reading Compiled Knowledge
The query layer navigates structured wiki pages rather than running semantic search over raw chunks. At moderate scale, the index file acts as the query router. No embeddings, no vector DB.
```python
# Method of LLMWiki.
def query(self, question: str, output_format: str = "markdown") -> dict:
    index_path = self.wiki_dir / "index.md"
    index_content = index_path.read_text() if index_path.exists() else "No index"

    # Route: ask the model which pages in the index are relevant
    route_prompt = ChatPromptTemplate.from_messages([
        ("system", "Given the wiki index, list the 3-5 most relevant page titles "
                   "to answer the question. Return as a comma-separated list."),
        ("human", "Index:\n{index}\n\nQuestion: {question}")
    ])
    routed = self.llm.invoke(
        route_prompt.format_messages(index=index_content, question=question))
    relevant_pages = [p.strip().replace('[[', '').replace(']]', '')
                      for p in routed.content.split(',')]

    # Gather: read the routed pages from disk
    context_parts = []
    for page_title in relevant_pages:
        for subdir in ['entities', 'concepts', 'sources']:
            page_path = self.wiki_dir / subdir / f"{page_title.replace(' ', '_')}.md"
            if page_path.exists():
                context_parts.append(f"--- {page_title} ---\n{page_path.read_text()[:1500]}...")
                break

    # Answer: synthesize strictly from the compiled pages
    answer_prompt = ChatPromptTemplate.from_messages([
        ("system", "You are answering based on a compiled knowledge wiki.\n"
                   "Use only the provided wiki pages.\n"
                   "Cite sources using [[PageName]] format.\n"
                   "If information is missing, say so explicitly."),
        ("human", "Wiki content:\n{context}\n\nQuestion: {question}\n\nFormat: {output_format}")
    ])
    answer = self.llm.invoke(answer_prompt.format_messages(
        context="\n".join(context_parts), question=question,
        output_format=output_format))

    return {
        "answer": answer.content,
        "sources_used": relevant_pages,
        "suggested_wiki_page": f"Could be filed as wiki/analyses/{question.replace(' ', '_')[:30]}.md"
    }
```
The LLM uses the index the way a senior analyst uses a well-maintained file system — it knows where things live, it drills into the right pages, it synthesizes from pre-organized context rather than raw noise.
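The routing step trusts the model to return clean titles. A deterministic fallback that greps the index by word overlap is a cheap safety net when the router returns nothing usable. This is a sketch, not part of the class above, and `fallback_route` is a name introduced here:

```python
def fallback_route(index_text: str, question: str, max_pages: int = 5) -> list[str]:
    """Rank index entries by naive word overlap with the question.
    A deterministic backup for when the LLM router fails."""
    # Keep only meaningful question words (longer than 3 chars)
    q_words = {w.lower().strip("?.,:") for w in question.split() if len(w) > 3}
    scored = []
    for line in index_text.splitlines():
        if not line.startswith("- [["):
            continue  # only consider index entries
        title = line.split("[[", 1)[1].split("]]", 1)[0]
        score = sum(1 for w in q_words if w in line.lower())
        if score:
            scored.append((score, title))
    scored.sort(key=lambda t: -t[0])
    return [title for _, title in scored[:max_pages]]
```

It's crude on purpose: it only has to be better than returning an empty context, and it costs zero tokens.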
### Layer 3: Lint — Keeping the Knowledge Honest
Wikis rot from staleness. This is what kills human-managed knowledge bases — not the initial effort, but the maintenance nobody does after month two.
```python
# Method of LLMWiki.
def lint(self):
    all_pages = list(self.wiki_dir.rglob("*.md"))
    page_contents = {str(p.relative_to(self.wiki_dir)): p.read_text() for p in all_pages}

    lint_prompt = ChatPromptTemplate.from_messages([
        ("system", "Analyze this wiki for:\n"
                   "1. Contradictions between pages (same entity, different facts)\n"
                   "2. Orphan pages (no [[links]] pointing to them)\n"
                   "3. Missing pages (links to non-existent pages)\n"
                   "4. Stale claims that need verification\n"
                   "Return structured findings."),
        ("human", "Pages: {pages}\n\nContent sample: {sample}")
    ])
    findings = self.llm.invoke(lint_prompt.format_messages(
        pages=str(list(page_contents.keys())),
        sample=str(list(page_contents.items())[:3])))

    # Lint results go into the same append-only log as ingests
    with open(self.base / "log.md", "a") as f:
        f.write(f"\n## [{datetime.now().isoformat()}] lint | maintenance\n{findings.content}\n")
    return findings.content
```
Run this weekly on active projects. The LLM surfaces when a 2024 conclusion contradicts a 2026 source — the kind of catch that only happens in human-managed systems when someone notices, which is rarely. The maintenance burden that kills wikis isn't a burden for the model. It doesn't get bored. It doesn't skip the filing.
## The Decision Framework
| Dimension | Traditional RAG | LLM Wiki | Hybrid |
|---|---|---|---|
| Query Latency | Low | Higher | Cache wiki pages in a vector DB |
| Synthesis Depth | Shallow (limited context) | Deep (pre-compiled relationships) | Wiki for complex queries only |
| Maintenance Cost | Zero (immutable chunks) | Active (requires linting) | Periodic re-compilation |
| Source Attribution | Exact (verbatim chunks) | Synthesized (refs in YAML frontmatter) | Store source refs per page |
| Scale Ceiling | 10k+ docs | ~1k pages (without search infra) | BM25/qmd for wiki search at scale |
| Best For | Support docs, static FAQs | Research, evolving analysis | Enterprise knowledge bases |

Choose RAG when queries are simple lookups against stable documentation, you need legal-grade verbatim attribution, or latency is the primary constraint.
Choose LLM Wiki when you're processing 50 documents over three months for a research or diligence project, you need to track how understanding evolves — contradictions, reversals, refinements — or you're building knowledge that should compound, not just answer.
Choose hybrid when you have a large static corpus alongside active analysis work. RAG for the reference layer. Wiki for the synthesis layer. Most serious production teams will end up here.
## The Real Lesson
The expensive part of knowledge work has never been the reading. It's the organizing and the cross-referencing, the filing, the updating of 15 pages when a new source arrives, the flagging of a contradiction that only becomes visible after you've read papers 1 and 47.
That's the work humans abandon. That's the work LLMs don't get bored doing.
RAG was never going to solve this. It was designed for retrieval, not accumulation. Treating it as the default architecture for knowledge-intensive work is like using a search engine to maintain an org's institutional memory: technically functional, practically insufficient.
The wiki becomes a compounding asset. RAG is a rental.
Pay the compilation cost once. Let every subsequent query be cheaper, faster, and deeper. That's a different kind of leverage than most teams are currently building toward.
## Resources
- Obsidian — Local markdown editor with graph view; essential for navigating wiki structure visually
- qmd — Local hybrid search (BM25 + vector) for when the index-file approach hits its ceiling
- LLM-WIKI by Karpathy — The original gist that made this pattern legible
Thanks for reading.
— Rakesh's Newsletter


