Built this because every “which LLM should I self-host on my [hardware]”
thread ends with “depends” without anyone actually doing the math.
You tell it:
- Platform (NVIDIA, AMD, Apple Silicon, Intel Arc, CPU-only)
- Available VRAM or unified memory
- Use case (chat, code, long-context, math)
- License preference (any vs permissive-only)
You get a ranked list of open-weight models that actually fit in your
memory budget with a 15% safety margin, the right GGUF quantization
picked automatically, and copy-paste install commands for Ollama or
llama.cpp.
The picker runs entirely in your browser — nothing sent to a server.
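For the curious, the fit check boils down to something like this (a
simplified TypeScript sketch; the quant list and bytes-per-param figures
are approximations, not the exact values the picker uses):

```typescript
// Pick the highest-quality GGUF quant whose weights fit the budget.
// Approximate bytes per parameter for common quant levels (sketch values).
type Quant = { name: string; bytesPerParam: number };

const QUANTS: Quant[] = [
  { name: "Q8_0", bytesPerParam: 1.07 },   // ~8.5 bits/param
  { name: "Q6_K", bytesPerParam: 0.86 },
  { name: "Q5_K_M", bytesPerParam: 0.69 },
  { name: "Q4_K_M", bytesPerParam: 0.60 },
];

const SAFETY_MARGIN = 0.85; // keep 15% headroom for KV cache + overhead

function pickQuant(paramsBillions: number, memGiB: number): Quant | null {
  const budgetBytes = memGiB * 1024 ** 3 * SAFETY_MARGIN;
  for (const q of QUANTS) {
    if (paramsBillions * 1e9 * q.bytesPerParam <= budgetBytes) return q;
  }
  return null; // doesn't fit at any supported quant
}

console.log(pickQuant(8, 12)?.name); // an 8B model on a 12 GB card -> "Q8_0"
```

The headroom matters because the weights aren't the whole story: the
KV cache grows with context length, so filling memory to the brim with
weights alone falls over on longer prompts.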
The site also has:
- A curated model directory with colour-coded license labels
  (permissive / open-weight / non-commercial)
- Three install guides: Ollama, llama.cpp, and LM Studio
- A glossary in plain English for newcomers
- A live trending section from Hugging Face, refreshed weekly via a
GitHub Action that commits the snapshot back to the repo (full diff
history in git)
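The fetch side of that is tiny. Something like this is all the Action
needs (sketch only; the sort/filter params are assumed from the public
Hugging Face Hub API, and the real script lives in the repo):

```typescript
// Weekly snapshot fetch, run by the Action and committed to the repo.
const API_URL =
  "https://huggingface.co/api/models?filter=gguf&sort=downloads&direction=-1&limit=20";

interface HFModel {
  id: string;
  downloads: number;
  likes: number;
  lastModified: string;
}

async function snapshotTrending(): Promise<void> {
  const res = await fetch(API_URL);
  if (!res.ok) throw new Error(`HF API returned ${res.status}`);
  const models = (await res.json()) as HFModel[];
  // Write this to a file the site reads; committing it means every
  // weekly refresh shows up as a plain diff in git history.
  console.log(JSON.stringify(models, null, 2));
}

snapshotTrending().catch(console.error);
```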
Source code is MIT, content is CC BY 4.0. No accounts, no analytics,
no ads, no affiliate links.
Picker: https://runlocal.blog/picker
Site: https://runlocal.blog
Feedback welcome, especially on the memory estimates and the picker
scoring formula (downloads + likes + recency, weighted). If a model
you’d want is missing from the catalog, drop the name in the comments.
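For reference, the scoring is in the spirit of this (toy sketch; the
weights and the recency decay are placeholders, not the production
values):

```typescript
// downloads + likes + recency, each normalized then weighted.
interface ModelStats {
  id: string;
  downloads: number;
  likes: number;
  lastModified: string; // ISO date
}

const W = { downloads: 0.5, likes: 0.3, recency: 0.2 }; // made-up weights

function score(m: ModelStats): number {
  // Log-scale the counts so one mega-popular model doesn't drown the rest.
  const d = Math.log10(1 + m.downloads);
  const l = Math.log10(1 + m.likes);
  // Recency: 1.0 today, decaying linearly to 0 over roughly a year.
  const ageDays = (Date.now() - Date.parse(m.lastModified)) / 86_400_000;
  const r = Math.max(0, 1 - ageDays / 365);
  return W.downloads * d + W.likes * l + W.recency * r;
}

// models.sort((a, b) => score(b) - score(a));
```

If the log-scaling or the decay curve seems wrong-headed, that's
exactly the kind of feedback I'm after.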