When you ask a well-tuned RAG system to synthesize five research papers, five competitor analyses, or five quarters of internal reports, something quietly expensive happens: the model starts from zero. It scans chunks. It reconstructs relationships. It re-reasons through context it already processed last week.

You pay for that compute again. And again. And again.

This isn't a retrieval problem. It's a memory problem. And the industry has been mistaking one for the other.

The Thing RAG Was Never Designed to Do

Traditional RAG was built for lookup, not accumulation. It treats every document as immutable reference material, synthesizes on the fly at query time, and forgets everything the moment the session ends. For static documentation ("what does clause 4.2 say?", "where can I find the employee handbook?") it's exactly the right tool.

But organizations don't run on lookups. They run on compounding knowledge. Due diligence that builds over months. Competitive landscapes that shift weekly. Research pipelines where paper 47 changes the interpretation of papers 1 through 46. For that kind of work, RAG is a just-in-time compiler that throws away the binary every time it runs.

The brutal accounting: you're paying token costs to rediscover connections your system already made three queries ago.

The Pattern: LLM Wiki Architecture

The LLM Wiki pattern doesn't replace RAG. It reframes what knowledge infrastructure actually means.

Instead of retrieving raw chunks at query time, you use the LLM to incrementally compile source material into a persistent, interlinked knowledge base — a git-tracked directory of structured markdown. The LLM maintains entity pages, updates topic summaries, flags contradictions, and cross-references concepts as new sources arrive. At query time, the model reads the compiled wiki. Not raw documents. Not chunks. The already-organized output of prior reasoning.

The mental model shift: stop treating LLMs as librarians asked to quote passages. Start treating them as analysts asked to maintain institutional memory.

This isn't a product. It's an architectural pattern, and it's been quietly working for teams doing serious research work while everyone else was optimizing their chunk size.
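Concretely, the compiled artifact is just a directory of markdown pages. A minimal sketch of one plausible layout (the folder and file names here are illustrative assumptions, not something the pattern prescribes):

```python
from pathlib import Path
import tempfile

# Illustrative layout only -- names are examples, not requirements
LAYOUT = [
    "index.md",                    # navigation root: one line per page
    "entities/Acme_Corp.md",       # one page per person, company, or tool
    "concepts/vendor_lock_in.md",  # one page per recurring idea
    "sources/q3_report.md",        # summary + [[links]] per ingested doc
    "contradictions.md",           # conflicts flagged between sources
]

def scaffold(wiki_dir: Path) -> list[Path]:
    """Create empty placeholder pages so the structure is visible."""
    created = []
    for rel in LAYOUT:
        page = wiki_dir / rel
        page.parent.mkdir(parents=True, exist_ok=True)
        page.touch()
        created.append(page)
    return created

wiki_dir = Path(tempfile.mkdtemp()) / "wiki"
pages = scaffold(wiki_dir)
```

Because it's all plain markdown in one directory, `git init` gives you history, blame, and diffs over your knowledge base for free.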

Where This Pays Off — and Where It Doesn't

Strong fit:

  • Long-horizon research: months-long due diligence, literature reviews, competitive tracking

  • Evolving domains where new data revises old conclusions — not just adds to them

  • Synthesis-heavy workflows connecting dozens of sources into coherent, durable positions

Poor fit:

  • High-velocity, low-latency queries: customer support bots, real-time dashboards

  • Strict source attribution requirements — wiki content is synthesized, not verbatim quoted

  • Very large-scale corpora (10,000+ documents) without additional indexing infrastructure on top

The Implementation: Three Layers, No Vector DB Required

The architecture is deliberately lean. Python, LangChain, a git-tracked markdown directory. At moderate scale — hundreds of sources — no vector database is needed.

import json
import re
from datetime import datetime
from pathlib import Path

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class LLMWiki:
    def __init__(self, base_path: str, model: str = "gpt-4o-mini"):
        self.base = Path(base_path)
        self.raw_dir = self.base / "raw"
        self.wiki_dir = self.base / "wiki"
        self.schema_file = self.base / "SCHEMA.md"

        self.raw_dir.mkdir(parents=True, exist_ok=True)
        self.wiki_dir.mkdir(parents=True, exist_ok=True)

        self.llm = ChatOpenAI(model=model)

    def _read_schema(self) -> str:
        if self.schema_file.exists():
            return self.schema_file.read_text()
        return "# Default Schema\nMaintain entity pages in /entities/, concepts in /concepts/, sources in /sources/."

    def _update_index(self):
        pages = list(self.wiki_dir.rglob("*.md"))
        index_content = ["# Wiki Index\n"]
        for page in sorted(pages):
            if page.name == "index.md":
                continue  # don't index the index itself
            rel_path = page.relative_to(self.wiki_dir)
            text = page.read_text()  # read once, reuse for the title
            first_line = text.split('\n')[0] if text else "No title"
            index_content.append(f"- [[{rel_path.stem}]]: {first_line.replace('# ', '')}")
        (self.wiki_dir / "index.md").write_text("\n".join(index_content))

The schema file is your system prompt for the wiki maintainer — page types, linking conventions, update rules. Treat it like an architectural decision record: it governs how the LLM files knowledge, so it compounds correctly over time.
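What goes in that file is up to you. One plausible minimal SCHEMA.md, written as plain markdown (every convention below is an example, not a requirement of the pattern):

```python
from pathlib import Path
import tempfile

# One possible SCHEMA.md -- the rules here are illustrative conventions
SCHEMA = """\
# Wiki Schema

## Page types
- entities/: one page per person, company, or tool; the H1 is the canonical name
- concepts/: one page per recurring idea; open with a one-line definition
- sources/: one page per ingested document; summary first, then [[links]]

## Linking
- Link every entity mention as [[Entity Name]]
- State each fact on exactly one page; link to it everywhere else

## Updates
- Append new facts chronologically; never silently overwrite old ones
- Conflicting facts go to the contradiction log with both sources cited
"""

base = Path(tempfile.mkdtemp())  # stand-in for your project root
(base / "SCHEMA.md").write_text(SCHEMA)
```

Because the schema is read on every ingest, tightening a rule here changes how all future sources get filed, without touching the code.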

Layer 1: Ingest — The Compilation Step

This is where the economics change. Instead of embedding chunks, ingest compiles each new source into the existing knowledge graph — touching entity pages, concept pages, source summaries, and the contradiction log in a single pass.

    def ingest(self, source_path: str, source_content: str):
        schema = self._read_schema()
        source_name = Path(source_path).stem

        analyze_prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a wiki maintainer. Follow this schema: {schema}

Analyze the source and produce:
1. A summary page (markdown)
2. A list of entities mentioned (people, companies, tools) with their relationships
3. A list of concepts discussed
4. Any contradictions with existing wiki content
Format as JSON with keys: summary, entities, concepts, contradictions"""),
            ("human", "Source content:\n{content}")
        ])

        # Pass the schema as a template variable rather than an f-string, so
        # braces inside the schema file aren't parsed as template slots
        analysis = self.llm.invoke(analyze_prompt.format_messages(
            schema=schema, content=source_content))

        try:
            result = json.loads(analysis.content)
        except json.JSONDecodeError:
            import re
            json_match = re.search(r'```json\n(.*?)\n```', analysis.content, re.DOTALL)
            result = json.loads(json_match.group(1)) if json_match else {}

        source_page = self.wiki_dir / "sources" / f"{source_name}.md"
        source_page.parent.mkdir(exist_ok=True)
        source_page.write_text(f"""# {source_name}

*Ingested: {datetime.now().isoformat()}*

## Summary
{result.get('summary', 'No summary generated')}

## Key Entities
{chr(10).join([f"- [[{e}]]" for e in result.get('entities', [])])}

## Concepts
{chr(10).join([f"- [[{c}]]" for c in result.get('concepts', [])])}
""")

        for entity in result.get('entities', []):
            entity_path = self.wiki_dir / "entities" / f"{entity.replace(' ', '_')}.md"
            if entity_path.exists():
                current = entity_path.read_text()
                update_prompt = ChatPromptTemplate.from_messages([
                    ("system", "Update this entity page with new information. Maintain chronological order. Note contradictions."),
                    ("human", "Current page:\n{current}\n\nNew source mentions:\n{snippet}")
                ])
                updated = self.llm.invoke(update_prompt.format_messages(
                    current=current, snippet=source_content[:2000]))
                entity_path.write_text(updated.content)
            else:
                entity_path.parent.mkdir(exist_ok=True)
                entity_path.write_text(f"""# {entity}

## Mentions
- [[{source_name}]] ({datetime.now().year})

## Description
Mentioned in source material. Details to be expanded.
""")

        log_entry = f"\n## [{datetime.now().isoformat()}] ingest | {source_name}\n- Created: sources/{source_name}.md\n- Updated entities: {', '.join(result.get('entities', []))}\n"
        with open(self.base / "log.md", "a") as f:
            f.write(log_entry)

        self._update_index()
        return f"Ingested {source_name}, touched {len(result.get('entities', []))} entities"

One source typically touches 10–15 wiki pages. The upfront token cost at ingestion is real. It pays back on the third query — when you're not re-synthesizing the same relationships from raw documents for the third time.
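A back-of-envelope model of that payback, with made-up but plausible token counts (all three numbers below are assumptions, not measurements):

```python
# Hypothetical per-query costs: re-synthesizing from raw chunks vs.
# reading a few compiled wiki pages. Numbers are illustrative only.
RAW_TOKENS_PER_QUERY = 40_000   # scan chunks, rebuild relationships
WIKI_TOKENS_PER_QUERY = 6_000   # read pre-organized pages
COMPILE_TOKENS = 80_000         # one-time ingestion cost for the corpus

def cumulative_cost(queries: int, per_query: int, upfront: int = 0) -> int:
    """Total tokens spent after a given number of synthesis queries."""
    return upfront + queries * per_query

# Smallest query count where the compiled wiki is cheaper overall
breakeven = next(
    n for n in range(1, 1000)
    if cumulative_cost(n, WIKI_TOKENS_PER_QUERY, COMPILE_TOKENS)
    < cumulative_cost(n, RAW_TOKENS_PER_QUERY)
)
print(breakeven)  # 3 with these numbers: payback on the third query
```

Past the break-even point the gap only widens, because every additional query pays the small wiki-read cost instead of the full re-synthesis cost.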

Layer 2: Query — Reading Compiled Knowledge

The query layer navigates structured wiki pages rather than running semantic search over raw chunks. At moderate scale, the index file acts as the query router. No embeddings, no vector DB.

    def query(self, question: str, output_format: str = "markdown") -> dict:
        index_content = (self.wiki_dir / "index.md").read_text() \
            if (self.wiki_dir / "index.md").exists() else "No index"

        route_prompt = ChatPromptTemplate.from_messages([
            ("system", "Given the wiki index, list the 3-5 most relevant page titles to answer the question. Return as comma-separated list."),
            ("human", "Index:\n{index}\n\nQuestion: {question}")
        ])

        relevant_pages = self.llm.invoke(route_prompt.format_messages(
            index=index_content, question=question)).content.split(',')
        relevant_pages = [p.strip().replace('[[', '').replace(']]', '') for p in relevant_pages]

        context_parts = []
        for page_title in relevant_pages:
            for subdir in ['entities', 'concepts', 'sources']:
                page_path = self.wiki_dir / subdir / f"{page_title.replace(' ', '_')}.md"
                if page_path.exists():
                    content = page_path.read_text()
                    context_parts.append(f"--- {page_title} ---\n{content[:1500]}...")
                    break

        answer_prompt = ChatPromptTemplate.from_messages([
            ("system", """You are answering based on a compiled knowledge wiki.
Use only the provided wiki pages.
Cite sources using [[PageName]] format.
If information is missing, say so explicitly."""),
            ("human", "Wiki content:\n{context}\n\nQuestion: {question}\n\nFormat: {fmt}")
        ])

        answer = self.llm.invoke(answer_prompt.format_messages(
            context="\n".join(context_parts), question=question, fmt=output_format))

        return {
            "answer": answer.content,
            "sources_used": relevant_pages,
            "suggested_wiki_page": f"Could be filed as wiki/analyses/{question.replace(' ', '_')[:30]}.md"
        }

The LLM uses the index the way a senior analyst uses a well-maintained file system — it knows where things live, it drills into the right pages, it synthesizes from pre-organized context rather than raw noise.
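The glue holding that navigation together is the [[Page Name]] convention. A stdlib sketch of the two helpers such a system implicitly relies on (the function names are mine, not part of the implementation above):

```python
import re

# Matches [[Page Name]] wiki links; the pipe guard skips aliased links
WIKI_LINK = re.compile(r"\[\[([^\]|]+)\]\]")

def extract_links(markdown: str) -> list[str]:
    """Every [[Page Name]] target referenced by a page."""
    return WIKI_LINK.findall(markdown)

def title_to_filename(title: str) -> str:
    """Mirror the write-time naming convention: spaces become underscores."""
    return title.strip().replace(" ", "_") + ".md"

links = extract_links("Compare [[Acme Corp]] against [[RAG]] baselines.")
print(links)                        # ['Acme Corp', 'RAG']
print(title_to_filename(links[0]))  # Acme_Corp.md
```

The important design point is that the filename convention is applied identically at write time and at read time; if the two ever drift, the router starts missing pages that exist.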

Layer 3: Lint — Keeping the Knowledge Honest

Wikis rot from staleness. This is what kills human-managed knowledge bases — not the initial effort, but the maintenance nobody does after month two.

    def lint(self):
        all_pages = list(self.wiki_dir.rglob("*.md"))
        page_contents = {p: p.read_text() for p in all_pages}

        lint_prompt = ChatPromptTemplate.from_messages([
            ("system", """Analyze this wiki for:
1. Contradictions between pages (same entity, different facts)
2. Orphan pages (no [[links]] pointing to them)
3. Missing pages (links to non-existent pages)
4. Stale claims that need verification

Return structured findings."""),
            ("human", f"Pages: {list(page_contents.keys())}\n\nContent sample: {str(list(page_contents.items())[:3])}")
        ])

        findings = self.llm.invoke(lint_prompt.format())

        with open(self.base / "log.md", "a") as f:
            f.write(f"\n## [{datetime.now().isoformat()}] lint | maintenance\n{findings.content}\n")

        return findings.content

Run this weekly on active projects. The LLM surfaces when a 2024 conclusion contradicts a 2026 source — the kind of catch that only happens in human-managed systems when someone notices, which is rarely. The maintenance burden that kills wikis isn't a burden for the model. It doesn't get bored. It doesn't skip the filing.
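Two of the four lint checks (orphan pages and missing pages) don't actually need an LLM: they're pure graph bookkeeping, and a deterministic pass is cheaper and exact. A sketch reusing the [[link]] convention (the function name and report shape are illustrative):

```python
import re
from pathlib import Path

LINK = re.compile(r"\[\[([^\]|]+)\]\]")

def link_report(wiki_dir: Path) -> dict[str, list[str]]:
    """Find orphan pages (no inbound links) and dangling [[links]]."""
    pages = {p.stem: p.read_text() for p in wiki_dir.rglob("*.md")}
    # Normalize link titles to the on-disk naming convention
    linked = {t.replace(" ", "_") for body in pages.values()
              for t in LINK.findall(body)}
    existing = set(pages)
    return {
        "orphans": sorted(existing - linked - {"index"}),
        "missing": sorted(linked - existing),
    }
```

Running the cheap deterministic pass first and handing only the genuinely semantic checks (contradictions, staleness) to the model keeps lint costs proportional to what actually requires judgment.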

The Decision Framework

| Dimension | Traditional RAG | LLM Wiki | Hybrid |
| --- | --- | --- | --- |
| Query latency | Low | Higher | Cache wiki pages in a vector DB |
| Synthesis depth | Shallow (limited context) | Deep (pre-compiled relationships) | Wiki for complex queries only |
| Maintenance cost | Zero (immutable chunks) | Active (requires linting) | Periodic re-compilation |
| Source attribution | Exact (verbatim chunks) | Synthesized (refs in YAML frontmatter) | Store source refs per page |
| Scale ceiling | 10k+ docs | ~1k pages (without search infra) | BM25/qmd for wiki search at scale |
| Best for | Support docs, static FAQs | Research, evolving analysis | Enterprise knowledge bases |

Choose RAG when queries are simple lookups against stable documentation, you need legal-grade verbatim attribution, or latency is the primary constraint.

Choose LLM Wiki when you're processing 50 documents over three months for a research or diligence project, you need to track how understanding evolves — contradictions, reversals, refinements — or you're building knowledge that should compound, not just answer.

Choose hybrid when you have a large static corpus alongside active analysis work. RAG for the reference layer. Wiki for the synthesis layer. Most serious production teams will end up here.

The Real Lesson

The expensive part of knowledge work has never been the reading. It's the organizing and the cross-referencing, the filing, the updating of 15 pages when a new source arrives, the flagging of a contradiction that only becomes visible after you've read papers 1 and 47.

That's the work humans abandon. That's the work LLMs don't get bored doing.

RAG was never going to solve this. It was designed for retrieval, not accumulation. Treating it as the default architecture for knowledge-intensive work is like using a search engine to maintain an organization's institutional memory: technically functional, practically insufficient.

The wiki becomes a compounding asset. RAG is a rental.

Pay the compilation cost once. Let every subsequent query be cheaper, faster, and deeper. That's a different kind of leverage than most teams are currently building toward.

Resources

  • Obsidian — Local markdown editor with graph view; essential for navigating wiki structure visually

  • qmd — Local hybrid search (BM25 + vector) for when the index-file approach hits its ceiling

  • LLM-WIKI by Karpathy — The original gist that made this pattern legible

Thanks for reading.

— Rakesh's Newsletter
