Introducing our team’s latest creation – a revolutionary approach to local RAG applications
TL;DR: We built LEANN, the world's most lightweight semantic search backend: it achieves 97% storage savings compared to traditional solutions while maintaining high accuracy and performance. Perfect for privacy-focused RAG applications on your local machine.
🚀 Quick Start
Want to try it right now? Run this single command on your MacBook:
uv pip install leann
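Once installed, the basic flow is: build an index, then query it. The sketch below is illustrative only – the entry-point names here are a rough outline and may lag behind the repo, so treat the README as the source of truth:

```python
# Illustrative indexing + search flow. Entry-point names are a sketch
# and may differ from the current release -- see the README for the
# authoritative API.
from leann import LeannBuilder, LeannSearcher

builder = LeannBuilder(backend_name="hnsw")
builder.add_text("Berkeley EECS freshmen typically take 3-4 courses per semester.")
builder.build_index("./demo.leann")

searcher = LeannSearcher("./demo.leann")
for hit in searcher.search("How many courses do EECS freshmen take?", top_k=3):
    print(hit)
```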
📚 Repository & Paper
- GitHub: https://github.com/yichuan-w/LEANN ⭐ (Star us!)
- Paper: Available on arXiv
What is RAG Everything?
RAG (Retrieval-Augmented Generation) has become the first true “killer application” of the LLM era. It seamlessly integrates private data that wasn’t part of the training set into large model inference pipelines.
Privacy-sensitive scenarios are the most important deployment direction – especially for your personal data and in highly sensitive domains like healthcare and finance.
RAG Everything starts from the most essential needs on a personal laptop. We natively support a range of out-of-the-box scenarios (currently macOS and Linux; Windows users need WSL):
🔍 Supported Applications
1. File System RAG
Replace Spotlight search entirely. Spotlight consumes disk space yet does only keyword matching; LEANN turns file search into a true semantic search powerhouse.
2. Apple Mail RAG
Easily find answers to personal questions (like “How many courses should Berkeley EECS freshmen take in their first semester?”).
3. Google Browser History RAG
Track down browsing history you only half-remember – the pages you have a fuzzy impression of but could never find again by keyword.
4. WeChat Chat History RAG
This is what I use most! I've used LEANN to summarize conversations with friends and to distill research ideas and slides from them. We implemented a small hack to get past WeChat's database encryption and extract chat records – don't worry, everything stays local with zero leakage.
5. Claude Code Semantic Search Enhancement 🔥
One of Claude Code’s biggest pain points is that it’s always grepping and finding nothing. LEANN is one of the first open-source projects to bring true semantic search to Claude Code through an MCP server – enabling it with just one line of code.
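For example, registering the server through Claude Code's MCP command is a one-liner (the server name and command below are illustrative – check the repo for the exact invocation):
claude mcp add leann-server -- leann_mcp  # illustrative names; see the repo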
These are just the scenarios we think have the most potential – we'll keep integrating features based on user feedback until LEANN becomes a personalized local agent that holds your LLM's long-term memory and has command of all your private data.
Why LEANN? The Technical Deep Dive
The Problem with Current Vector Databases
Current mainstream vector databases excel at latency – most queries complete within 10-100ms even with millions of data points. But in RAG's search + generation pipeline, search time is far below generation time, especially with reasoning models and their long chains of thought.
Latency isn’t the bottleneck in RAG – storage is.
The most important RAG deployment scenario is privacy, especially on personal computers where resources are naturally scarce. Consider this reality check:
For high recall in text RAG, you need fine-grained chunks, which makes embedding storage balloon to 3-10x the size of the original text. A real example: 70GB of raw data turned into 220GB+ of index storage.
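To make that concrete, here's a back-of-envelope estimate with illustrative numbers (chunk size, embedding dimension, and precision all vary by setup):

```python
# Back-of-envelope embedding-storage estimate (illustrative parameters).
raw_text_bytes = 70e9          # 70 GB of raw text
chunk_bytes = 1024             # ~256 tokens per chunk for fine-grained recall
embedding_dim = 768            # a small BERT-class embedding model
bytes_per_vector = embedding_dim * 4   # float32

num_chunks = raw_text_bytes / chunk_bytes
index_bytes = num_chunks * bytes_per_vector
print(f"{num_chunks / 1e6:.0f}M chunks -> ~{index_bytes / 1e9:.0f} GB of embeddings")
# ~68M chunks -> ~210 GB of embeddings, i.e. ~3x the raw text,
# before any graph overhead is even counted
```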
Our Solution: Trade Storage for Compute
LEANN makes a bold design choice: replace storage with recomputation.
Core Innovation
Key observation: in a graph-based index, a single query actually visits only a tiny fraction of the nodes. So why store all the embeddings?
Our pipeline (see the sketch after this list):
- Build a normal vector store
- Delete all embeddings, keeping only the proximity graph that records the relationships between data chunks
- Replace loading embeddings from memory with recomputing them at query time
- Use a lightweight embedding model to make graph-guided recomputation fast
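Conceptually, the search loop looks like the sketch below. This is a minimal illustration of graph traversal with on-the-fly recomputation, not LEANN's actual implementation; `embed` stands in for the lightweight embedding model, and `graph`/`chunks` are the pruned proximity graph and the original text chunks:

```python
import heapq
import numpy as np

def search(query_vec, graph, chunks, embed, entry_id, top_k=3, beam=32):
    """Best-first search over a proximity graph. Instead of loading stored
    embeddings from disk, every visited chunk is re-embedded on the fly --
    and a query only ever visits a tiny fraction of the nodes."""
    def distance(node_id):
        vec = embed(chunks[node_id])    # recomputation replaces storage
        return float(np.linalg.norm(query_vec - vec))

    visited = {entry_id}
    frontier = [(distance(entry_id), entry_id)]   # min-heap by distance
    best = list(frontier)                          # running beam of results

    while frontier:
        d, node = heapq.heappop(frontier)
        # stop once the closest unexplored node can't improve the beam
        if len(best) >= beam and d > max(b[0] for b in best):
            break
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                dn = distance(nbr)
                heapq.heappush(frontier, (dn, nbr))
                best.append((dn, nbr))
        best = heapq.nsmallest(beam, best)

    return [node for _, node in heapq.nsmallest(top_k, best)]
```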
Graph Structure Pruning
We observed heavily skewed node-visit patterns in graphs built with RNG-style (relative neighborhood graph) pruning. Our strategy (sketched after this list):
- Keep high-degree nodes to ensure connectivity
- Limit out-edges for low-degree nodes while allowing unlimited in-edges
- Use heuristics to preserve only essential high-degree nodes
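A minimal sketch of this kind of degree-based pruning (the parameter values and the hub-selection rule here are illustrative, not LEANN's exact heuristic):

```python
def prune_graph(graph, degree_cap=32, hub_fraction=0.02):
    """Cap out-degree everywhere except at high-degree hubs, which keep
    all their edges to preserve connectivity. Only out-edge lists are
    capped; no limit is imposed on how many edges may point *into* a node.
    `graph` maps node id -> list of (neighbor id, distance) out-edges."""
    n_hubs = max(1, int(len(graph) * hub_fraction))
    hubs = set(sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:n_hubs])

    pruned = {}
    for node, edges in graph.items():
        if node in hubs:
            pruned[node] = list(edges)                   # hubs keep everything
        else:
            nearest = sorted(edges, key=lambda e: e[1])  # keep closest neighbors
            pruned[node] = nearest[:degree_cap]
    return pruned
```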
Results That Matter
✅ 97%+ reduction in index size
✅ <2 seconds retrieval time on 3090-level hardware
✅ 90%+ Top-3 recall on real RAG benchmarks
✅ Zero embedding storage – even for corpora whose full embeddings would exceed 200GB
Note: at this compression rate, PQ, OPQ, and even the state-of-the-art RaBitQ cannot maintain high accuracy – see the comparisons in our paper.
Performance Optimizations
- Adaptive pipeline combining coarse-grained and accurate search
- Efficient GPU batching for better utilization
- ZMQ communication that ships distances instead of full embeddings
- CPU/GPU overlapping
- Selective caching of high-degree nodes
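Two of these – selective caching and batched recomputation – fit in a short sketch (illustrative, not LEANN's actual code):

```python
import numpy as np

class SelectiveCache:
    """Keep the embeddings of frequently-visited high-degree nodes in RAM,
    and recompute each search hop's cache misses in one batched model call
    to keep the GPU well utilized."""

    def __init__(self, chunks, embed_batch, hot_ids):
        self.chunks = chunks                 # node id -> chunk text
        self.embed_batch = embed_batch       # list[str] -> np.ndarray [n, d]
        hot_ids = list(hot_ids)              # e.g. the highest-degree nodes
        self.cache = dict(zip(hot_ids, embed_batch([chunks[i] for i in hot_ids])))

    def get(self, node_ids):
        """Return embeddings for one hop's candidate nodes."""
        misses = [i for i in node_ids if i not in self.cache]
        fresh = {}
        if misses:   # one batched forward pass instead of per-node calls
            fresh = dict(zip(misses, self.embed_batch([self.chunks[i] for i in misses])))
        return np.stack([self.cache[i] if i in self.cache else fresh[i]
                         for i in node_ids])
```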
The Vision: RAG Everything
We’re continuously maintaining this open-source project at Berkeley SkyLab with full-stack optimization across algorithms, applications, system design, vector databases, and kernel acceleration.
Our Goals
🎯 Seamlessly connect all your private data
🧠 Build long-term local AI memory and agents
💻 Zero cloud dependency, low-cost operation
Technical Details & Future Work
If you want to dive deeper into implementation details, check our arXiv paper and repository. I can write a follow-up post covering all implementation specifics if there’s interest.
We hope LEANN inspires more vector search researchers to think about vector databases from a different angle, especially in popular RAG settings. We were fortunate to discuss our work at SIGMOD/ICML vector search workshops this year and received great recognition from the community.
Get Involved
- ⭐ Star our repository
- 🤝 Contribute to the project
- 🔗 Join our Berkeley SkyLab team
Ready to transform your local machine into a RAG powerhouse?
uv pip install leann
What private data would you want to RAG first? Drop a comment below! 👇
Tags
#rag
#vectordatabase
#semanticsearch
#privacy
#opensource
#machinelearning
#ai