How to Run LLMs Locally with Ollama — A Developer’s Guide

April 17, 2026

You don’t need an API key or a cloud subscription to use LLMs. Ollama lets you run models locally on your machine — completely free, completely private. Here’s how to set it up and start building with it.

What is Ollama?

Ollama is a tool that downloads, manages, and serves LLMs locally. It exposes an OpenAI-compatible API at localhost:11434 (under the /v1 path), so most code written for the OpenAI chat completions API works with Ollama once you change the base URL and the model name.

Installation

# Linux / WSL
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows
# Download from https://ollama.com/download

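Start the server if it is not already running (the Linux install script and the desktop apps normally run it as a background service):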

ollama serve

Pick a Model

# Code-focused (best for dev tools)
ollama pull qwen2.5-coder:7b      # 4.7GB, good balance
ollama pull qwen2.5-coder:1.5b    # 1.0GB, fast, good enough for many tasks
ollama pull deepseek-coder-v2      # 8.9GB, top quality

# General purpose
ollama pull llama3.1:8b            # 4.7GB, Meta's Llama 3.1
ollama pull mistral:7b             # 4.1GB, fast and capable

My recommendation: start with qwen2.5-coder:1.5b for speed, upgrade to 7b when you need quality.
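Before wiring a model into code, it helps to confirm that it has actually been pulled. Here is a minimal sketch that asks Ollama's native /api/tags endpoint for the list of local models. The helper names (withDefaultTag, hasModel, checkModel) are my own, and the response type only declares the fields this sketch reads.

```typescript
// Subset of the /api/tags response that this sketch relies on.
interface TagsResponse {
  models: { name: string; size: number }[];
}

// Ollama lists untagged pulls under the ":latest" tag,
// so "deepseek-coder-v2" shows up as "deepseek-coder-v2:latest".
function withDefaultTag(name: string): string {
  return name.includes(":") ? name : `${name}:latest`;
}

// Pure check, kept separate from the network call so it is easy to test.
function hasModel(tags: TagsResponse, name: string): boolean {
  const wanted = withDefaultTag(name);
  return tags.models.some((m) => m.name === wanted);
}

// Hypothetical helper: returns true if the model is available locally.
async function checkModel(name: string): Promise<boolean> {
  const response = await fetch("http://localhost:11434/api/tags");
  if (!response.ok) {
    throw new Error(`Ollama responded with HTTP ${response.status}`);
  }
  const tags = (await response.json()) as TagsResponse;
  return hasModel(tags, name);
}
```

If checkModel("qwen2.5-coder:1.5b") resolves to false, run the matching ollama pull command first.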

Your First API Call

Ollama serves an OpenAI-compatible endpoint. Here’s a call with plain fetch:

const response = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder:7b",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain what a closure is in JavaScript." },
    ],
    temperature: 0,
    stream: false,
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);

That’s it. No API key, no SDK, no account.
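The call above waits for the whole reply. If you would rather show tokens as they arrive, the same endpoint accepts stream: true and sends server-sent events in the OpenAI chunk format. The sketch below is one way to read that stream with plain fetch. The function names parseStreamLine and streamChat are illustrative, and the parsing only looks at choices[0].delta.content.

```typescript
// Extract the text delta from one "data: ..." line, or null if the line has none.
function parseStreamLine(line: string): string | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("data:")) return null;
  const payload = trimmed.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return null;
  const chunk = JSON.parse(payload);
  return chunk.choices?.[0]?.delta?.content ?? null;
}

// Hypothetical helper: streams a reply and hands each text fragment to onToken.
async function streamChat(prompt: string, onToken: (token: string) => void): Promise<void> {
  const response = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5-coder:7b",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep a partial trailing line for the next read
    for (const line of lines) {
      const token = parseStreamLine(line);
      if (token !== null) onToken(token);
    }
  }
  // Flush whatever is left once the stream has ended.
  const last = parseStreamLine(buffer);
  if (last !== null) onToken(last);
}
```

Keeping the line parser separate from the network loop makes it easy to check against sample lines without a running server.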

Structured Output (JSON Mode)

The key to building real tools with LLMs is getting structured output. Tell the model to respond with JSON:

const response = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder:7b",
    messages: [
      {
        role: "system",
        content: `Respond with ONLY valid JSON matching this schema:
        { "summary": "string", "topics": ["string"], "difficulty": "beginner|intermediate|advanced" }`,
      },
      {
        role: "user",
        content: "Analyze this article topic: Building REST APIs with Express.js",
      },
    ],
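    temperature: 0,
    stream: false,
    // JSON mode: asks the server to constrain the reply to a JSON object
    response_format: { type: "json_object" },
  }),
});

// The JSON arrives as a string in the message content
const data = await response.json();
const result = JSON.parse(data.choices[0].message.content);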

Tip: always validate the response with Zod or a similar schema validator. Smaller models sometimes return invalid JSON.
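To make the validation step concrete, here is a small dependency-free sketch for the schema used above. It is a hand-written check standing in for Zod, not a replacement for it. The names ArticleAnalysis and parseAnalysis are illustrative, and taking the text between the first "{" and the last "}" is just one simple way to cope with replies that wrap the JSON in extra text.

```typescript
type Difficulty = "beginner" | "intermediate" | "advanced";

interface ArticleAnalysis {
  summary: string;
  topics: string[];
  difficulty: Difficulty;
}

const DIFFICULTIES: readonly string[] = ["beginner", "intermediate", "advanced"];

// Returns the parsed object, or null if the reply is not valid for the schema.
function parseAnalysis(raw: string): ArticleAnalysis | null {
  // Keep only the outermost {...} so code fences or chatter around it are ignored.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let value: unknown;
  try {
    value = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;

  const v = value as Record<string, unknown>;
  if (typeof v.summary !== "string") return null;
  if (!Array.isArray(v.topics) || !v.topics.every((t) => typeof t === "string")) return null;
  if (typeof v.difficulty !== "string" || !DIFFICULTIES.includes(v.difficulty)) return null;

  return {
    summary: v.summary,
    topics: v.topics as string[],
    difficulty: v.difficulty as Difficulty,
  };
}
```

A null result is a reasonable trigger for a retry or for falling back to a larger model.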

Building a Provider Abstraction

If you want your app to work with both Ollama (local) and Claude/OpenAI (cloud), create a simple interface:

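// Minimal message shape shared by all providers
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface LlmProvider {
  chat(system: string, messages: Message[]): Promise<string>;
}

class OllamaProvider implements LlmProvider {
  constructor(private model: string) {}

  async chat(system: string, messages: Message[]): Promise<string> {
    const response = await fetch("http://localhost:11434/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: this.model,
        // The system prompt goes first, followed by the conversation so far
        messages: [{ role: "system", content: system }, ...messages],
        temperature: 0,
        stream: false,
      }),
    });
    // Surface HTTP errors (for example, a model that was never pulled)
    if (!response.ok) {
      throw new Error(`Ollama request failed: HTTP ${response.status}`);
    }
    const data = await response.json();
    return data.choices[0].message.content;
  }
}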

Now your code doesn’t care where the model runs. Swap OllamaProvider for AnthropicProvider with a flag.
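One way to do the swap is a single "provider:model" flag, in the style of ollama:qwen2.5-coder:1.5b. The sketch below is an assumption about how such a flag could be parsed, not how any particular tool does it. parseModelFlag, createProvider, and the registry shape are hypothetical names, and the interfaces are repeated so the block stands on its own.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface LlmProvider {
  chat(system: string, messages: Message[]): Promise<string>;
}

// Maps a provider name to a function that builds it for a given model.
type ProviderFactory = (model: string) => LlmProvider;

// Split on the first colon only, because model names can contain colons too.
function parseModelFlag(flag: string): { provider: string; model: string } {
  const idx = flag.indexOf(":");
  if (idx <= 0 || idx === flag.length - 1) {
    throw new Error(`Expected "<provider>:<model>", got "${flag}"`);
  }
  return { provider: flag.slice(0, idx), model: flag.slice(idx + 1) };
}

// Look up the provider named in the flag and construct it with the model name.
function createProvider(flag: string, registry: Record<string, ProviderFactory>): LlmProvider {
  const { provider, model } = parseModelFlag(flag);
  const factory = registry[provider];
  if (!factory) {
    throw new Error(`Unknown provider "${provider}"`);
  }
  return factory(model);
}
```

With the class above, the registry could be { ollama: (m) => new OllamaProvider(m) }, plus an entry for whichever cloud provider class you write. The rest of the app only ever sees LlmProvider.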

Performance Tips

First call is slow — the model loads into memory. Subsequent calls are fast.
Keep the server running — don’t start/stop per request.
Use smaller models for dev — 1.5b for iteration, 7b for production quality.
Set temperature: 0 for near-deterministic output (important for structured responses), and add a fixed seed if you need reproducible runs.
Add a timeout — local models on CPU can take minutes for long prompts.
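The timeout tip is the one that most often gets skipped, so here is a rough sketch of it using the standard AbortSignal.timeout. The helper names buildChatInit and chatWithTimeout and the two-minute default are my own choices, so adjust them to your hardware.

```typescript
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
}

// Build the fetch options, including a signal that aborts after timeoutMs.
function buildChatInit(request: ChatRequest, timeoutMs: number): RequestInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...request, temperature: 0, stream: false }),
    signal: AbortSignal.timeout(timeoutMs),
  };
}

// Hypothetical helper: rejects if the model has not answered within timeoutMs.
async function chatWithTimeout(request: ChatRequest, timeoutMs = 120_000): Promise<string> {
  const response = await fetch(
    "http://localhost:11434/v1/chat/completions",
    buildChatInit(request, timeoutMs),
  );
  if (!response.ok) {
    throw new Error(`Ollama request failed: HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

When the timer fires, the pending fetch rejects, so wrap the call in try/catch if you want to retry with a shorter prompt or a smaller model.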

When to Use Local vs Cloud

Use Case	Local (Ollama)	Cloud (Claude/GPT)
Development	Great	Expensive
Privacy-sensitive data	Required	Risky
Production quality	Good (7b+)	Best
Speed	Depends on hardware	Fast
Cost	Free	Per-token

What I Built With It

spectr-ai — an AI smart contract auditor that works with both Claude and Ollama. The --model ollama:qwen2.5-coder:1.5b flag runs everything locally, free, no API key.

Local LLMs are good enough for real developer tools. The quality gap is closing fast.
