Overview
Ollama allows you to run open-source AI models locally on your own infrastructure. Perfect for privacy-sensitive applications, development, and cost optimization.
Available Models
Llama 3.3 (70B) Local
2 credits • Powerful local model
- 128K context window
- Excellent reasoning for a self-hosted model
- Speed: Medium • Cost: Very Low (local)
- Best for: Powerful local performance
Llama 3.1 (405B) Local
4 credits • Massive local model
- 128K context window
- Exceptional capabilities for a self-hosted model
- Speed: Slow • Cost: Very Low (local)
- Best for: Maximum local intelligence
Llama 3.1 (70B) Local
2 credits • Balanced local model
- 128K context window
- Excellent for self-hosted apps
- Speed: Medium • Cost: Very Low (local)
- Best for: Strong local performance
Llama 3.1 (8B) Local
1 credit • Efficient local model
- 128K context window
- Good for basic local tasks
- Speed: Fast • Cost: Very Low (local)
- Best for: Light local processing
Setup
Prerequisites
- Ollama installed and running on your infrastructure (`ollama serve`)
- At least one supported model pulled locally (`ollama pull <model-name>`)
Configure BoostGPT
Ollama can be configured through any of the following:
- Dashboard Setup
- Core SDK
- Router SDK
The steps below cover the Dashboard setup.
1. Navigate to Integrations: Go to app.boostgpt.co and select Integrations.
2. Select Ollama: Find and click on the Ollama provider.
3. Configure Host: Enter your Ollama host URL (default: `http://localhost:11434`) and select which agents will use Ollama; a quick connectivity check is sketched after these steps.
4. Save Configuration: Click save to apply your Ollama configuration.
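Before saving, it can help to confirm that the host URL actually points at a running Ollama server. A minimal sketch using only the Python standard library; the host value is whatever you entered in step 3, and `/api/tags` is Ollama's endpoint for listing locally pulled models:

```python
import json
from urllib.request import urlopen

OLLAMA_HOST = "http://localhost:11434"  # the host URL you entered in step 3

def check_ollama(host: str = OLLAMA_HOST) -> None:
    """Confirm the Ollama server is reachable and print its local models."""
    try:
        with urlopen(f"{host}/api/tags", timeout=5) as resp:
            models = json.load(resp).get("models", [])
    except OSError as exc:  # connection refused, timeout, bad host, etc.
        raise SystemExit(f"Ollama is not reachable at {host}: {exc}")
    print(f"Ollama is up at {host} with {len(models)} model(s) pulled:")
    for model in models:
        print(" -", model["name"])

if __name__ == "__main__":
    check_ollama()
```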
Hardware Requirements
| Model | Min VRAM | Recommended RAM | CPU Cores | Best Hardware |
|---|---|---|---|---|
| Llama 3.1 (8B) | 8GB | 16GB | 4+ | Gaming PC, M1 Mac |
| Llama 3.1 (70B) | 40GB | 64GB | 8+ | Workstation, A100 |
| Llama 3.3 (70B) | 40GB | 64GB | 8+ | Workstation, A100 |
| Llama 3.1 (405B) | 200GB+ | 256GB+ | 16+ | Multi-GPU server |
Best Practices
Using provider_host for Ollama
When using the Core SDK chat method, specify the Ollama host with `provider_host`:
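The exact Core SDK client shape isn't shown on this page, so the snippet below is only a sketch: the `boostgpt` import, the client constructor, and the `chat()` signature are assumptions, while `provider_host` and the local model names are the parts this page actually describes.

```python
# Sketch only: the import path, client constructor, and chat() signature are
# assumptions, not the documented Core SDK surface. provider_host is the
# parameter described on this page.
from boostgpt import BoostGPT  # hypothetical import

client = BoostGPT(api_key="YOUR_API_KEY")

reply = client.chat(
    agent_id="YOUR_AGENT_ID",                 # an agent configured to use Ollama
    message="Summarize this document.",
    model="llama3.1:8b",                      # any model pulled on your Ollama host
    provider_host="http://localhost:11434",   # where your Ollama instance is running
)
print(reply)
```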
The `provider_host` parameter is required when using Ollama with the Core SDK chat method, as it tells BoostGPT where your Ollama instance is running.
Model Selection for Hardware
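As a rough guide to which model fits your machine, the minimums from the Hardware Requirements table above can be folded into a small helper. This is a hypothetical illustration, not part of any SDK, and the model tags are examples of how these models are commonly named in Ollama:

```python
def pick_local_model(vram_gb: float) -> str:
    """Return the largest local Llama model that fits the given VRAM,
    using the minimums from the Hardware Requirements table above."""
    if vram_gb >= 200:
        return "llama3.1:405b"
    if vram_gb >= 40:
        return "llama3.3:70b"   # llama3.1:70b has the same footprint
    if vram_gb >= 8:
        return "llama3.1:8b"
    raise ValueError("At least 8 GB of VRAM is recommended for the smallest model.")

print(pick_local_model(48))  # -> llama3.3:70b
```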
Production Deployment
1. Use Docker: Run Ollama in a container for consistent, reproducible deployments.
2. Configure Firewall: Ensure the Ollama port (11434) is accessible to your BoostGPT application.
3. Monitor Resources: Watch GPU/CPU usage and scale hardware as needed.
4. Set Up Load Balancing: For high volume, run multiple Ollama instances behind a load balancer (a health-check sketch follows this list).
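For the load-balancing step, a simple health check against each instance is often enough to decide which hosts should receive traffic. A minimal sketch; the instance URLs are placeholders, and `/api/tags` is the same model-listing endpoint used earlier:

```python
import json
from urllib.request import urlopen

# Placeholder instance URLs; replace with your own Ollama hosts.
OLLAMA_INSTANCES = [
    "http://10.0.0.11:11434",
    "http://10.0.0.12:11434",
]

def healthy_instances(hosts: list[str]) -> list[str]:
    """Return the Ollama hosts that answer /api/tags with valid JSON."""
    up = []
    for host in hosts:
        try:
            with urlopen(f"{host}/api/tags", timeout=3) as resp:
                json.load(resp)  # valid JSON means the server is serving
            up.append(host)
        except (OSError, ValueError):
            pass  # unreachable, timed out, or returned a bad response
    return up

print(healthy_instances(OLLAMA_INSTANCES))
```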
Troubleshooting
Connection refused
Cause: Ollama server not running or a firewall blocking the port
Solutions:
- Run `ollama serve` to start the server
- Check that your firewall allows port 11434
- Verify the host URL in your configuration
Model not found
Cause: Model not pulled locally
Solutions:
- Run `ollama pull <model-name>`
- Verify the model name matches exactly
- Check `ollama list` for available models
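Since the most common cause is a name that almost matches, it can also help to compare your configured model against exactly what the server reports. A small sketch using the same `/api/tags` endpoint; the model tag shown is only an example:

```python
import json
from urllib.request import urlopen

def model_is_available(name: str, host: str = "http://localhost:11434") -> bool:
    """Check whether a model tag is already pulled on the Ollama host."""
    with urlopen(f"{host}/api/tags", timeout=5) as resp:
        pulled = {m["name"] for m in json.load(resp).get("models", [])}
    return name in pulled

# The tag must match exactly, including any size suffix.
print(model_is_available("llama3.1:8b"))
```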
Slow responses
Cause: Insufficient hardware resources
Solutions:
- Use a smaller model (8B instead of 70B)
- Add more RAM/VRAM
- Reduce `max_reply_tokens`
- Close other GPU-intensive applications
Out of memory errors
Cause: Model too large for available VRAM
Solutions:
- Switch to smaller model
- Reduce context window
- Use CPU fallback (slower but works)
- Upgrade hardware