LEANN: The World’s Most Lightweight Semantic Search Backend for RAG Everything 🎉


Introducing our team’s latest creation – a revolutionary approach to local RAG applications

TL;DR: We built LEANN, the world’s most “lightweight” semantic search backend that achieves 97% storage savings compared to traditional solutions while maintaining high accuracy and performance. Perfect for privacy-focused RAG applications on your local machine.

🚀 Quick Start

Want to try it right now? Run this single command on your MacBook:

uv pip install leann
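Once it's installed, building and querying an index might look roughly like the sketch below. Note that the class and method names here (LeannBuilder, LeannSearcher, add_text, build_index, search) are assumptions based on a typical builder/searcher interface – check the repository's README for the actual API.

    # Minimal sketch of indexing and querying with LEANN.
    # The class/method names are assumptions, not the confirmed API --
    # see the repository for the real interface.
    from leann import LeannBuilder, LeannSearcher  # assumed entry points

    # Build an index from a few text chunks
    builder = LeannBuilder(backend_name="hnsw")  # assumed backend option
    builder.add_text("LEANN trades embedding storage for recomputation at query time.")
    builder.add_text("Berkeley EECS freshmen typically take three to four courses.")
    builder.build_index("./my_notes.leann")

    # Query it semantically
    searcher = LeannSearcher("./my_notes.leann")
    for hit in searcher.search("How many courses do EECS freshmen take?", top_k=2):
        print(hit)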

📚 Repository & Paper

What is RAG Everything?

RAG (Retrieval-Augmented Generation) has become the first true “killer application” of the LLM era. It seamlessly integrates private data that wasn’t part of the training set into large model inference pipelines.

Privacy scenarios are absolutely the most important deployment direction – especially for your personal data and in highly sensitive domains like healthcare and finance.

RAG Everything starts from the most essential needs of personal laptops. We natively support a bunch of out-of-the-box scenarios (currently macOS and Linux; Windows users need WSL):

🔍 Supported Applications

1. File System RAG

Replace Spotlight Search entirely. Spotlight not only eats disk space with its index, it is also limited to keyword matching. We turn file search into a semantic search powerhouse.

2. Apple Mail RAG

Easily find answers to personal questions (like “How many courses should Berkeley EECS freshmen take in their first semester?”).

3. Google Chrome Browsing History RAG

Track down the pages from your browsing history that you only half-remember – the ones you have nothing more than a fuzzy impression of.

4. WeChat Chat History RAG

This is what I use most! I’ve used LEANN to summarize conversations with friends and turn them into research ideas and slides. We implemented a small hack to work around WeChat’s database encryption and extract chat records – don’t worry, everything stays local with zero leakage.

5. Claude Code Semantic Search Enhancement 🔥

One of Claude Code’s biggest pain points is that it’s always grepping and finding nothing. LEANN is one of the first open-source projects to bring true semantic search to Claude Code through an MCP server – enabling it with just one line of code.

These are just the scenarios we think have the most “potential” – we’ll keep integrating features based on user feedback until LEANN becomes a personalized local agent that maintains long-term LLM memory and has command of all your private data.

Why LEANN? The Technical Deep Dive

The Problem with Current Vector Databases

Current mainstream vector databases excel at latency – most queries complete within 10-100ms even with millions of data points. In RAG’s search + generation pipeline, search time is far below generation time, especially with reasoning models and long chain-of-thought outputs.

Latency isn’t the bottleneck in RAG – storage is.

The most important RAG deployment scenario is privacy, especially on personal computers where resources are naturally scarce. Consider this reality check:

For high recall in text RAG you need fine-grained chunks, so embedding storage grows to 3-10x the size of the original text. Real example: 70GB of raw data becomes 220GB+ of index storage.
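To see where numbers like these come from, here is a back-of-envelope calculation with assumed parameters (roughly 256-token chunks and 768-dimensional float32 embeddings); the exact ratio depends on your chunker and embedding model.

    # Back-of-envelope estimate of embedding storage vs. raw text.
    # All parameters below are illustrative assumptions, not LEANN measurements.
    raw_text_gb = 70          # size of the raw corpus
    avg_chunk_chars = 1000    # ~256 tokens per chunk (assumed)
    embedding_dim = 768       # e.g. a small sentence-transformer model
    bytes_per_float = 4       # float32

    num_chunks = raw_text_gb * 1e9 / avg_chunk_chars
    embedding_gb = num_chunks * embedding_dim * bytes_per_float / 1e9

    print(f"chunks: {num_chunks:,.0f}")                 # 70,000,000
    print(f"embedding storage: {embedding_gb:.0f} GB")  # ~215 GB, roughly 3x the raw text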

Our Solution: Trade Storage for Compute

LEANN makes a bold design choice: replace storage with recomputation.

Core Innovation

Key observation: in graph-based indices, a single query actually touches only a small fraction of nodes – so why store every embedding?

Our pipeline:

  1. Build a normal vector store
  2. Delete all embeddings, keeping only the Proximity Graph to record relationships between data chunks
  3. Convert memory loading to recomputation during inference
  4. Leverage lightweight embedding models for efficient graph-based recomputation
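A heavily simplified sketch of this recompute-at-query-time idea (not LEANN's actual implementation): a best-first traversal over the proximity graph that embeds a node's text on the fly whenever a distance is needed, instead of loading a stored vector. The graph, texts, and embed function are placeholders you would supply.

    # Simplified best-first search over a proximity graph with on-the-fly
    # embedding recomputation. Illustrative only -- not LEANN's real code path.
    import heapq
    import numpy as np

    def recompute_search(graph, texts, embed, query, entry, k=3, max_visits=32):
        """graph: node_id -> neighbor ids (the only structure kept on disk);
        texts: node_id -> chunk text; embed: text -> np.ndarray."""
        q = embed(query)

        def dist(node):
            # Recompute the node's embedding instead of reading it from storage.
            return float(np.linalg.norm(q - embed(texts[node])))

        visited = {entry}
        frontier = [(dist(entry), entry)]  # min-heap keyed by distance to query
        scored = []
        while frontier:
            d, node = heapq.heappop(frontier)
            scored.append((d, node))
            for nb in graph[node]:
                if nb not in visited and len(visited) < max_visits:
                    visited.add(nb)
                    heapq.heappush(frontier, (dist(nb), nb))
        return sorted(scored)[:k]

The key point is that only the adjacency lists need to persist; embeddings are recomputed for just the handful of nodes the traversal actually touches.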

Graph Structure Pruning

We observed that node visits in RNG-pruned proximity graphs are heavily skewed. Our pruning strategy:

  • Keep high-degree nodes to ensure connectivity
  • Limit out-edges for low-degree nodes while allowing unlimited in-edges
  • Use heuristics to preserve only essential high-degree nodes
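A rough sketch of what such degree-aware pruning could look like on a plain adjacency-list graph; the hub threshold and out-edge cap below are made-up parameters, not the values from the paper.

    # Degree-aware pruning sketch: keep high-degree "hub" nodes intact and
    # cap the out-edges of low-degree nodes. Parameters are illustrative only.
    def prune_graph(graph, hub_degree=32, max_out_edges=8):
        """graph: node_id -> list of out-neighbors, assumed sorted by distance."""
        pruned = {}
        for node, nbrs in graph.items():
            if len(nbrs) >= hub_degree:
                # Hub node: keep every edge to preserve graph connectivity.
                pruned[node] = list(nbrs)
            else:
                # Low-degree node: keep only its nearest few out-edges.
                # In-edges pointing at this node from elsewhere are untouched.
                pruned[node] = list(nbrs)[:max_out_edges]
        return pruned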

Results That Matter

  • 97%+ reduction in index size
  • <2 seconds retrieval time on 3090-class hardware
  • 90%+ Top-3 recall on real RAG benchmarks
  • Zero stored embeddings – even for corpora whose full embedding set would exceed 200GB

Note: at this compression rate, PQ, OPQ, and even the state-of-the-art RaBitQ cannot maintain high accuracy – as we show in our paper.

Performance Optimizations

  • Adaptive pipeline combining coarse-grained and accurate search
  • Efficient GPU batching for better utilization
  • ZMQ communication that ships distances instead of full embeddings (see the sketch after this list)
  • CPU/GPU overlapping
  • Selective caching of high-degree nodes
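To give a flavor of the “distances instead of embeddings” point: in the sketch below, a toy embedding worker receives a query plus candidate chunk texts over ZMQ and replies with scalar distances only, so full vectors never cross the socket. This is an illustrative pattern, not LEANN's actual server protocol.

    # Toy ZMQ embedding worker that returns distances, not embeddings.
    # Illustrative pattern only -- not LEANN's actual protocol.
    import zmq
    import numpy as np

    def serve_distances(embed, addr="tcp://127.0.0.1:5555"):
        ctx = zmq.Context()
        sock = ctx.socket(zmq.REP)
        sock.bind(addr)
        while True:
            req = sock.recv_json()          # {"query": str, "texts": [str, ...]}
            q = embed(req["query"])
            vecs = np.stack([embed(t) for t in req["texts"]])
            dists = np.linalg.norm(vecs - q, axis=1)
            # Only a handful of floats go back over the wire, never the vectors.
            sock.send_json({"distances": dists.tolist()})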

The Vision: RAG Everything

We’re continuously maintaining this open-source project at Berkeley SkyLab with full-stack optimization across algorithms, applications, system design, vector databases, and kernel acceleration.

Our Goals

🎯 Seamlessly connect all your private data

🧠 Build long-term local AI memory and agents

💻 Zero cloud dependency, low-cost operation

Technical Details & Future Work

If you want to dive deeper into implementation details, check our arXiv paper and repository. I can write a follow-up post covering all implementation specifics if there’s interest.

We hope LEANN inspires more vector search researchers to think about vector databases from a different angle, especially in popular RAG settings. We were fortunate to discuss our work at SIGMOD/ICML vector search workshops this year and received great recognition from the community.

Get Involved

  • Star our repository
  • 🤝 Contribute to the project
  • 🔗 Join our Berkeley SkyLab team

Ready to transform your local machine into a RAG powerhouse?

uv pip install leann

What private data would you want to RAG first? Drop a comment below! 👇

Tags

#rag #vectordatabase #semanticsearch #privacy #opensource #machinelearning #ai
