Local LLM

Planted 02025-07-08

On macOS, Ollama makes it easy to run LLMs locally.

As of writing, small LLMs that fit in 16GB of RAM have limited coding ability; compared to current models from Anthropic, they are not worth using for that. As a form of text compression, however, small LLMs are very useful to have around, with the added benefit of working offline.

Download Ollama

brew install ollama

Start Ollama server

ollama serve
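To confirm the server is up, you can hit the root endpoint, which responds with a short status string:

```shell
# Check that the local Ollama server is listening on its default port
curl -s http://localhost:11434/
```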

Download a model

ollama pull llama3.2:3b
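You can confirm the download finished with `ollama list`, which prints the models stored locally along with their size:

```shell
# List locally available models (name, ID, size, modified date)
ollama list
```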

Run a model in the command line

ollama run llama3.2:3b
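Besides the interactive REPL, `ollama run` accepts the prompt as an argument, which makes the offline summarization use case scriptable. A sketch, assuming a hypothetical `notes.txt` file:

```shell
# One-shot, non-interactive use: pass the prompt as an argument
# instead of opening the REPL (notes.txt is an illustrative file).
ollama run llama3.2:3b "Summarize the following in three bullet points: $(cat notes.txt)"
```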

Opencode

For actual use, you will want to change Ollama's default limits (most importantly the context window) by using a Modelfile.

Example modelfile (this can be named anything)

FROM llama3.2:3b

# Increase context window for larger codebases (32K tokens)
PARAMETER num_ctx 32768

# Lower temperature for more focused, deterministic generation
PARAMETER temperature 0.1

# Reduce repetition
PARAMETER repeat_penalty 1.1

# Look back further to avoid repetitive patterns
PARAMETER repeat_last_n 128

# More conservative sampling
PARAMETER top_p 0.9
PARAMETER top_k 40

# System message
SYSTEM """Insert your system prompt here"""

Create the model from the Modelfile

ollama create <new_model_name> -f <path_to_model_file>
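You can check that the parameters were applied by asking Ollama to print the stored Modelfile back:

```shell
# Print the Modelfile Ollama stored for the new model, including
# the PARAMETER lines set above.
ollama show <new_model_name> --modelfile
```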

Connect a model to opencode

{
	"$schema": "https://opencode.ai/config.json",
	"provider": {
		"ollama": {
			"npm": "@ai-sdk/openai-compatible",
			"options": {
				"baseURL": "http://localhost:11434/v1"
			},
			"models": {
				"llama3.2": {}
			}
		}
	}
}
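If you built a custom model with `ollama create`, list it under `models` instead of (or alongside) the base model; the key must match the name you passed to `ollama create` (the name below is hypothetical):

```
"models": {
	"llama3.2-code": {}
}
```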

Making the most out of small models

A useful pipeline, where each stage feeds the next: your prompt > LLM prompt rewriting > search engine results for the rewritten query > LLM summarization / tool calls.
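The first two stages can be sketched as a two-step shell chain, assuming a running Ollama server and the `llama3.2:3b` model (the prompts are illustrative):

```shell
# Stage 1: have the model rewrite a terse prompt into a richer one.
refined=$(ollama run llama3.2:3b "Rewrite this as a detailed, specific question: fix date parsing bug in js")

# Stage 2: feed the rewritten prompt back in for the actual answer.
ollama run llama3.2:3b "$refined"
```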

See How to fix your context.

Prompts

Strategies

Claude models have "think" directives.