Overview

Ollama lets you run open-source AI models locally on your own infrastructure, making it a good fit for privacy-sensitive applications, development, and cost optimization.

Available Models

Llama 3.3 (70B) Local

2 credits • Powerful local model
  • 128K context window
  • Excellent reasoning for self-hosted
  • Speed: Medium • Cost: Very Low (local)
  • Best for: Local powerful performance

Llama 3.1 (405B) Local

4 credits • Massive local model
  • 128K context window
  • Exceptional capabilities self-hosted
  • Speed: Slow • Cost: Very Low (local)
  • Best for: Maximum local intelligence

Llama 3.1 (70B) Local

2 credits • Balanced local model
  • 128K context window
  • Excellent for self-hosted apps
  • Speed: Medium • Cost: Very Low (local)
  • Best for: Strong local performance

Llama 3.1 (8B) Local

1 credit • Efficient local model
  • 128K context window
  • Good for basic local tasks
  • Speed: Fast • Cost: Very Low (local)
  • Best for: Light local processing

Setup

Prerequisites

1. Install Ollama

Download and install Ollama from ollama.com
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download installer from ollama.com

2. Pull a Model

# Pull Llama 3.1 8B (recommended for testing)
ollama pull llama3.1:8b

# Or pull Llama 3.3 70B for production
ollama pull llama3.3:70b

3. Start Ollama Server

# Ollama runs on localhost:11434 by default
ollama serve
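
Before pointing BoostGPT at the server, you can confirm it is reachable. Below is a minimal sketch, assuming Node.js 18+ (for built-in fetch) and the default host; it calls Ollama's /api/version endpoint.
// check-ollama.js - confirm the local Ollama server is reachable
// Assumes Node.js 18+; override OLLAMA_HOST if your server runs elsewhere
const host = process.env.OLLAMA_HOST || 'http://localhost:11434';

fetch(`${host}/api/version`)
  .then((res) => res.json())
  .then((data) => console.log('Ollama is running, version:', data.version))
  .catch((err) => console.error('Ollama is not reachable:', err.message));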

Configure BoostGPT

1. Navigate to Integrations

Go to app.boostgpt.co and select Integrations

2. Select Ollama

Find and click on the Ollama provider

3. Configure Host

Enter your Ollama host URL (default: http://localhost:11434) and select which agents will use Ollama

4. Save Configuration

Click save to apply your Ollama configuration

Hardware Requirements

Model            | Min VRAM | Recommended RAM | CPU Cores | Best Hardware
Llama 3.1 (8B)   | 8GB      | 16GB            | 4+        | Gaming PC, M1 Mac
Llama 3.1 (70B)  | 40GB     | 64GB            | 8+        | Workstation, A100
Llama 3.3 (70B)  | 40GB     | 64GB            | 8+        | Workstation, A100
Llama 3.1 (405B) | 200GB+   | 256GB+          | 16+       | Multi-GPU server

Start with Llama 3.1 (8B) for development and testing. It runs well on consumer hardware and M-series Macs.

Best Practices

Using provider_host for Ollama

When using the Core SDK chat method, specify the Ollama host with provider_host:
// Chat with local Ollama instance
const chatResponse = await client.chat({
  bot_id: 'your-bot-id',
  provider_host: 'http://localhost:11434', // Required for Ollama
  message: 'Analyze this code for bugs'
});

// Use custom Ollama host (e.g., remote server)
const remoteResponse = await client.chat({
  bot_id: 'your-bot-id',
  provider_host: 'http://192.168.1.100:11434', // Custom host
  message: 'Hello!'
});
The provider_host parameter is required when using Ollama with the Core SDK chat method, as it tells BoostGPT where your Ollama instance is running.
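
If the host URL is wrong or the Ollama server is down, the chat call will fail, so it is worth handling that case. A minimal sketch, assuming the same client and bot_id as the examples above (the error handling is generic, not a specific SDK error type):
// Wrap the call so a misconfigured or offline Ollama host fails loudly
try {
  const response = await client.chat({
    bot_id: 'your-bot-id',
    provider_host: 'http://localhost:11434',
    message: 'Summarize the latest support ticket'
  });
  console.log(response);
} catch (error) {
  // Typical causes: Ollama not running, wrong host URL, or the model not pulled
  console.error('Ollama request failed:', error.message);
}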

Model Selection for Hardware

# For MacBook Pro M1/M2 (16GB RAM)
ollama pull llama3.1:8b

# For workstation with 40GB+ VRAM (e.g. A100)
ollama pull llama3.3:70b

# For multi-GPU server (200GB+ VRAM)
ollama pull llama3.1:405b

Production Deployment

1. Use Docker

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

2. Configure Firewall

Ensure Ollama port (11434) is accessible to your BoostGPT application

3. Monitor Resources

Watch GPU/CPU usage and scale hardware as needed

4. Set Up Load Balancing

For high volume, run multiple Ollama instances behind a load balancer
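
Because provider_host is set per request, a simple alternative to a dedicated load balancer is rotating hosts in application code. A minimal round-robin sketch, assuming the same client as above; the host addresses are placeholders for your own instances:
// Sketch: round-robin chat requests across multiple Ollama instances
const OLLAMA_HOSTS = [
  'http://10.0.0.11:11434', // placeholder instance 1
  'http://10.0.0.12:11434'  // placeholder instance 2
];
let nextHost = 0;

async function chatWithOllama(message) {
  const provider_host = OLLAMA_HOSTS[nextHost];
  nextHost = (nextHost + 1) % OLLAMA_HOSTS.length;
  return client.chat({
    bot_id: 'your-bot-id',
    provider_host, // each request can target a different instance
    message
  });
}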

Troubleshooting

Cause: Ollama server not running or firewall blocking
Solutions:
  • Run ollama serve to start server
  • Check firewall allows port 11434
  • Verify host URL in configuration

Cause: Model not pulled locally
Solutions:
  • Run ollama pull <model-name>
  • Verify model name matches exactly
  • Check ollama list for available models

Cause: Insufficient hardware resources
Solutions:
  • Use smaller model (8B instead of 70B)
  • Add more RAM/VRAM
  • Reduce max_reply_tokens
  • Close other GPU-intensive applications

Cause: Model too large for available VRAM
Solutions:
  • Switch to smaller model
  • Reduce context window
  • Use CPU fallback (slower but works)
  • Upgrade hardware
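
To narrow down whether the problem is the server or a missing model, you can query Ollama directly. A short diagnostic sketch, assuming Node.js 18+ with built-in fetch, the default host, and an example model name:
// diagnose-ollama.js - check the server is up and the expected model is pulled
const host = process.env.OLLAMA_HOST || 'http://localhost:11434';
const expectedModel = 'llama3.1:8b'; // example model name

async function diagnose() {
  const res = await fetch(`${host}/api/tags`); // lists locally pulled models
  if (!res.ok) {
    console.error(`Ollama responded with HTTP ${res.status}`);
    return;
  }
  const { models } = await res.json();
  const names = models.map((m) => m.name);
  console.log('Pulled models:', names);
  if (!names.includes(expectedModel)) {
    console.log(`Model missing - run: ollama pull ${expectedModel}`);
  }
}

diagnose().catch((err) => console.error('Ollama is not reachable:', err.message));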

Next Steps