Local LLM
Planted 02025-07-08
On macOS, Ollama makes it easy to run LLMs locally.
As of writing, small LLMs that can run in 16 GB of RAM have limited coding ability; compared to current models from Anthropic, they are not worth using for coding. However, as a form of text compression, small LLMs are very useful to have, and they come with the added benefit of working offline.
Download Ollama
brew install ollama
Start Ollama server
ollama serve
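Ollama listens on http://localhost:11434 by default; a quick sanity check is to hit the tags endpoint, which returns the models installed locally:
curl http://localhost:11434/api/tags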
Download a model
ollama pull llama3.2:3b
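To confirm the pull worked, list what's installed:
ollama list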
Run a model in the command line
ollama run llama3.2:3b
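ollama run also takes a one-shot prompt as an argument, which is handy for scripting; for example:
ollama run llama3.2:3b "Summarize this in one sentence: Ollama serves local models over an HTTP API on port 11434."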
Opencode
For actual use, you will want to raise Ollama's default prompt limits (most importantly the context window) by using a Modelfile.
Example Modelfile (the file can be named anything)
FROM llama3.2:3b
# Increase context window for larger codebases (32K tokens)
PARAMETER num_ctx 32768
# Lower temperature for more focused, deterministic generation
PARAMETER temperature 0.1
# Reduce repetition
PARAMETER repeat_penalty 1.1
# Look back further to avoid repetitive patterns
PARAMETER repeat_last_n 128
# More conservative sampling
PARAMETER top_p 0.9
PARAMETER top_k 40
# System message
SYSTEM """Insert your system prompt here"""
ollama create <new_model_name> -f <path_to_model_file>
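For example, assuming the Modelfile above is saved as ./Modelfile and picking llama3.2-tuned as an arbitrary name:
ollama create llama3.2-tuned -f ./Modelfile
ollama run llama3.2-tuned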
Connect a model to opencode
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "llama3.2": {}
      }
    }
  }
}
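This goes in opencode.json at the project root (or in the global opencode config file). One caveat: the keys under models are passed to Ollama as model names, so they should match a name from ollama list; a model created from the Modelfile above would be listed under its new name rather than llama3.2.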
Making the most out of small models
A chain that works well: your prompt > LLM prompt optimization > LLM-generated search engine query and its results > LLM summarization / tool calls. Each stage feeds the next, so the small model acts as a translator between you and better sources of truth rather than as the source of answers itself.
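A minimal sketch of that chain in shell, reusing the llama3.2:3b model from above; the search step is a hypothetical placeholder, since any engine or CLI could fill that slot:
# Steps 1-2: have the local model sharpen a vague prompt into a search query
QUERY=$(ollama run llama3.2:3b "Rewrite as a concise web search query, output only the query: why does my sourdough not rise in a cold kitchen")
# Step 3: run the query through a search tool of your choice (search-cli is hypothetical)
RESULTS=$(search-cli "$QUERY")
# Step 4: compress the results back down with the same model
echo "$RESULTS" | ollama run llama3.2:3b "Summarize the key points of these search results:"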
Prompts
Strategies
Claude models have "think" directives: in Claude Code, words like "think", "think hard", and "ultrathink" in a prompt allocate progressively larger thinking budgets.
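For example, a Claude Code prompt like the following nudges the model into extended thinking:
Think hard about the edge cases here before proposing a fix.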