Overview
Ollama allows you to run open-source AI models locally on your own infrastructure. Perfect for privacy-sensitive applications, development, and cost optimization.
Available Models
Llama 3.3 (70B) Local
2 credits • Powerful local model
- 128K context window
- Excellent reasoning for a self-hosted model
- Speed: Medium • Cost: Very Low (local)
- Best for: Local powerful performance
Llama 3.1 (405B) Local
4 credits • Massive local model
- 128K context window
- Exceptional capabilities for a self-hosted model
- Speed: Slow • Cost: Very Low (local)
- Best for: Maximum local intelligence
Llama 3.1 (70B) Local
2 credits • Balanced local model
- 128K context window
- Excellent for self-hosted apps
- Speed: Medium • Cost: Very Low (local)
- Best for: Strong local performance
Llama 3.1 (8B) Local
1 credit • Efficient local model
- 128K context window
- Good for basic local tasks
- Speed: Fast • Cost: Very Low (local)
- Best for: Light local processing
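To see which of these models are already pulled on your Ollama host, you can query Ollama's local model list. The sketch below calls Ollama's GET /api/tags endpoint directly; the host URL and the exact model tags are assumptions that depend on your setup.

```typescript
// Minimal sketch: list models installed on a local Ollama server via its
// GET /api/tags endpoint. Host and model tags are assumptions; adjust them
// to match your setup and the tags you actually pulled.
const OLLAMA_HOST = "http://localhost:11434";

async function listInstalledModels(): Promise<string[]> {
  const res = await fetch(`${OLLAMA_HOST}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = (await res.json()) as { models: { name: string }[] };
  return data.models.map((m) => m.name);
}

listInstalledModels().then((names) => {
  // e.g. ["llama3.1:8b", "llama3.3:70b", ...] depending on what you pulled
  console.log("Installed models:", names);
});
```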
Setup
Prerequisites
Configure BoostGPT
- Dashboard Setup
- Core SDK
- Router SDK
Navigate to Integrations
Go to app.boostgpt.co and select Integrations
Configure Host
Enter your Ollama host URL (default: http://localhost:11434)
Select which agents will use Ollama
Hardware Requirements
| Model | Min VRAM | Recommended RAM | CPU Cores | Best Hardware |
|---|---|---|---|---|
| Llama 3.1 (8B) | 8GB | 16GB | 4+ | Gaming PC, M1 Mac |
| Llama 3.1 (70B) | 40GB | 64GB | 8+ | Workstation, A100 |
| Llama 3.3 (70B) | 40GB | 64GB | 8+ | Workstation, A100 |
| Llama 3.1 (405B) | 200GB+ | 256GB+ | 16+ | Multi-GPU server |
Best Practices
Using provider_host for Ollama
When using the Core SDK chat method, specify the Ollama host with provider_host:
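The snippet below is an illustrative sketch only: aside from the chat method and the provider_host and max_reply_tokens options named in these docs, the import path, client construction, and other field names are assumptions rather than the documented BoostGPT API.

```typescript
// Hedged sketch, not the documented API: only `chat`, `provider_host`, and
// `max_reply_tokens` are named in these docs; everything else is assumed.
import { BoostGPT } from "boostgpt"; // hypothetical import path

const client = new BoostGPT({ apiKey: process.env.BOOSTGPT_API_KEY ?? "" });

async function askLocalLlama(question: string) {
  const reply = await client.chat({
    model: "llama3.1:8b",                    // one of the local Ollama models above
    provider_host: "http://localhost:11434", // where your Ollama instance is running
    max_reply_tokens: 512,                   // keep replies short on modest hardware
    messages: [{ role: "user", content: question }],
  });
  console.log(reply);
}

askLocalLlama("Summarize the benefits of running models locally.");
```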
The provider_host parameter is required when using Ollama with the Core SDK chat method, as it tells BoostGPT where your Ollama instance is running.
Model Selection for Hardware
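As a rough guide, the hardware table above can be translated into a model choice. The helper below is a hypothetical illustration whose thresholds simply mirror the table's "Min VRAM" column; it is not part of the BoostGPT SDK, and the model identifiers are examples.

```typescript
// Hypothetical helper: map available VRAM (GB) to the smallest capable model
// from the hardware table above. Thresholds mirror the "Min VRAM" column;
// model identifiers are illustrative.
function pickLocalModel(vramGb: number): string {
  if (vramGb >= 200) return "llama3.1:405b"; // multi-GPU server
  if (vramGb >= 40) return "llama3.3:70b";   // workstation / A100 class
  if (vramGb >= 8) return "llama3.1:8b";     // gaming PC, M1 Mac
  throw new Error("Not enough VRAM for the local Llama models listed above");
}

console.log(pickLocalModel(48)); // "llama3.3:70b"
```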
Production Deployment
Troubleshooting
Connection refused
Cause: Ollama server not running, or a firewall is blocking the connection
Solutions:
- Run `ollama serve` to start the server
- Check firewall allows port 11434
- Verify host URL in configuration
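A quick way to confirm the host is reachable is to hit Ollama's version endpoint from the machine where BoostGPT's requests will originate. The host value below is an assumption; use the URL you configured.

```typescript
// Minimal connectivity check against the configured Ollama host using the
// GET /api/version endpoint. Replace the host with the URL you configured.
const OLLAMA_HOST = "http://localhost:11434";

async function checkOllama(): Promise<void> {
  try {
    const res = await fetch(`${OLLAMA_HOST}/api/version`);
    const body = (await res.json()) as { version: string };
    console.log(`Ollama is reachable, version ${body.version}`);
  } catch (err) {
    console.error("Connection refused or host unreachable:", err);
  }
}

checkOllama();
```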
Model not found
Cause: Model not pulled locally
Solutions:
- Run `ollama pull <model-name>`
- Verify model name matches exactly
- Check `ollama list` for available models
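If you prefer to script the fix, a model can also be pulled through Ollama's HTTP API instead of the CLI. The sketch below is an assumption-laden example: the host and model tag depend on your setup, and current Ollama releases accept a model field (older releases used name).

```typescript
// Sketch: pull a model through Ollama's POST /api/pull endpoint instead of
// the CLI. Host and model tag are assumptions; current Ollama releases accept
// the `model` field (older releases used `name`).
const OLLAMA_HOST = "http://localhost:11434";

async function pullModel(tag: string): Promise<void> {
  const res = await fetch(`${OLLAMA_HOST}/api/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: tag, stream: false }), // wait for completion
  });
  if (!res.ok) throw new Error(`Pull failed with status ${res.status}`);
  console.log(`Pulled ${tag}:`, await res.json());
}

pullModel("llama3.1:8b");
```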
Slow responses
Cause: Insufficient hardware resources
Solutions:
- Use smaller model (8B instead of 70B)
- Add more RAM/VRAM
- Reduce max_reply_tokens
- Close other GPU-intensive applications
Out of memory errors
Cause: Model too large for available VRAM
Solutions:
- Switch to smaller model
- Reduce context window
- Use CPU fallback (slower but works)
- Upgrade hardware
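If you are calling Ollama directly (outside BoostGPT), the context window and GPU offload can be reduced per request through Ollama's options. The sketch below uses Ollama's POST /api/chat endpoint with the num_ctx option; setting num_gpu to 0 forces CPU-only inference, which is slower but avoids VRAM limits. The host and model tag are assumptions.

```typescript
// Sketch: ask Ollama directly with a reduced context window so the model fits
// in less memory. `num_ctx` shrinks the context window; `num_gpu: 0`
// (commented out) would force CPU-only inference.
const OLLAMA_HOST = "http://localhost:11434";

async function chatSmallContext(prompt: string): Promise<string> {
  const res = await fetch(`${OLLAMA_HOST}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1:8b",
      messages: [{ role: "user", content: prompt }],
      stream: false,
      options: {
        num_ctx: 4096, // much smaller than the 128K maximum
        // num_gpu: 0, // uncomment for CPU fallback (slower but works)
      },
    }),
  });
  const data = (await res.json()) as { message: { content: string } };
  return data.message.content;
}

chatSmallContext("Give me one sentence about local inference.").then(console.log);
```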