Administration

AI Services & Providers

Configure workspace-scoped AI providers (Ollama, OpenAI) and manage connection settings

Overview

AI providers are configured per workspace, giving you complete flexibility in how each team or project uses AI models.

Each workspace can have its own Ollama and OpenAI configurations, allowing you to mix providers based on privacy requirements, performance needs, or budget constraints.

Workspace Scoping

Provider configurations are isolated per workspace. Switching workspaces shows different available models based on each workspace's configured providers.
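
As a mental model, you can picture this as a map from workspace ID to that workspace's own provider entries. The sketch below is illustrative only; the type and field names are assumptions, not George AI's actual schema.

```typescript
// Illustrative only: the real schema is internal to George AI.
// Each workspace owns its own provider list, so switching workspaces
// changes which providers (and therefore which models) are visible.
type ProviderType = "ollama" | "openai";

interface ProviderConfig {
  type: ProviderType;
  name: string;     // e.g. "Production Ollama"
  baseUrl: string;  // e.g. "http://ollama:11434"
  apiKey?: string;  // optional for Ollama, required for OpenAI
}

// Provider configs keyed by workspace ID: nothing is shared between workspaces.
type WorkspaceProviders = Record<string, ProviderConfig[]>;

const providers: WorkspaceProviders = {
  "research-team": [
    { type: "ollama", name: "Lab GPU Server", baseUrl: "http://ollama:11434" },
  ],
  "marketing-team": [
    { type: "openai", name: "OpenAI Production", baseUrl: "https://api.openai.com/v1", apiKey: "sk-..." },
  ],
};
```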

Accessing AI Services Admin

Navigation

1. Switch to target workspace
   Use the workspace switcher in the top navigation.

2. Open Settings menu → Admin → AI Services
   URL: /admin/ai-services

3. Configure providers for this workspace
   Changes only affect the current workspace.

Permissions Required: You must be a Workspace Admin or Owner to configure AI providers.

Configuring AI Providers

Ollama (Local)

Self-hosted AI models for privacy and offline use

  • Complete data privacy
  • No API costs
  • Offline operation
  • GPU required for performance

OpenAI (Cloud)

Cloud-based AI models from OpenAI

  • Latest GPT models
  • High performance
  • No infrastructure needed
  • Pay per use (API costs)

Configuring Ollama Provider

Required Settings:

  • Name: Descriptive name (e.g., "Production Ollama")
  • Base URL: Ollama server address (e.g., http://ollama:11434)
  • API Key: Optional; only needed if Ollama authentication is enabled
  • VRAM (GB): Total GPU memory available (helps with load balancing)

Connection Testing:

Click "Test Connection" to verify the configuration before saving. This will:

  • Check network connectivity to Ollama server
  • Verify API authentication (if enabled)
  • List available models on the server
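
What "Test Connection" does can be approximated with a single request against the Ollama HTTP API. The sketch below is a simplified illustration, not George AI's actual implementation; it uses Ollama's GET /api/tags endpoint, which lists the models pulled on that server.

```typescript
// Minimal sketch of an Ollama connectivity check (illustrative only).
// GET /api/tags on an Ollama server returns the models it has pulled.
async function testOllamaConnection(baseUrl: string, apiKey?: string): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`, {
    // Only needed if the server sits behind an authenticating proxy.
    headers: apiKey ? { Authorization: `Bearer ${apiKey}` } : undefined,
  });
  if (!res.ok) {
    throw new Error(`Ollama unreachable or rejected the request: HTTP ${res.status}`);
  }
  const body = (await res.json()) as { models: { name: string }[] };
  return body.models.map((m) => m.name); // e.g. ["llama3:8b", "nomic-embed-text"]
}

// Usage: testOllamaConnection("http://ollama:11434").then(console.log);
```
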
After configuring Ollama, navigate to Admin → AI Models and click "Sync Models" to discover available models.

Configuring OpenAI Provider

Required Settings:

  • Name: Descriptive name (e.g., "OpenAI Production")
  • Base URL: API endpoint (default: https://api.openai.com/v1)
  • API Key: Your OpenAI API key (starts with sk-...)

Security:

  • API keys are encrypted in the database
  • Keys are never exposed to the frontend (shown masked as sk-...xy)
  • Can test connection without re-entering key
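
As an illustration of the masking behaviour, the stored key can be reduced to a display-safe form before it is ever sent to the browser. The helper below is hypothetical, not part of George AI.

```typescript
// Hypothetical helper: produce a display-safe version of a stored key.
// The full key stays server-side; only this masked form reaches the UI.
function maskApiKey(key: string): string {
  if (key.length <= 6) return "***";
  return `${key.slice(0, 3)}...${key.slice(-2)}`; // e.g. "sk-1234567890xy" -> "sk-...xy"
}
```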

Connection Testing:

Click "Test Connection" to verify your API key and check available models.

Cost Monitoring: OpenAI charges per token. Monitor usage in Admin → AI Models to track costs.

Provider Performance

George AI caches workspace provider configurations for optimal performance:

  • Cache TTL: 60 seconds (provider configuration is cached for 1 minute)
  • Auto Invalidation: the cache is cleared whenever provider settings change
  • Performance Gain: 10x faster model queries
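
The behaviour above can be pictured as a small in-memory TTL cache keyed by workspace ID. The class and method names below are made up for illustration; George AI's internal implementation may differ.

```typescript
// Illustrative 60-second TTL cache for per-workspace provider configs.
// Entries expire after TTL_MS and are explicitly invalidated on provider changes.
const TTL_MS = 60_000;

interface CacheEntry<T> { value: T; expiresAt: number }

class ProviderConfigCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  get(workspaceId: string, load: () => T): T {
    const hit = this.entries.get(workspaceId);
    if (hit && hit.expiresAt > Date.now()) return hit.value;  // fast path: cached
    const value = load();                                     // slow path: reload from the database
    this.entries.set(workspaceId, { value, expiresAt: Date.now() + TTL_MS });
    return value;
  }

  // Called whenever a provider is created, updated, or deleted in a workspace.
  invalidate(workspaceId: string): void {
    this.entries.delete(workspaceId);
  }
}
```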

Advanced: Multi-Instance Ollama Load Balancing

Advanced Configuration

This section covers Ollama clustering with multiple GPU servers. Most users can skip this.

George AI can distribute AI processing across multiple Ollama servers, automatically balancing load based on each server's capabilities.

This ensures reliable, high-performance AI processing even under heavy workload by using all available GPU resources intelligently.

How It Works

Intelligent Routing

George AI monitors each Ollama server and routes requests based on:

  • Available GPU memory
  • Current load (requests in progress)
  • GPU processing speed
  • Which models are loaded on each server
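
A rough sketch of this routing logic is shown below. The field names and the scoring formula are assumptions for illustration; George AI's actual weighting is internal.

```typescript
// Illustrative routing sketch (field names are assumptions, not George AI's schema).
interface OllamaServer {
  name: string;
  loadedModels: string[];  // models currently available on this server
  freeVramGb: number;      // available GPU memory
  activeRequests: number;  // current load
  maxConcurrent: number;
  relativeSpeed: number;   // 1.0 = baseline, 2.0 = twice as fast
}

// Pick a server that has the model loaded and spare capacity,
// preferring fast servers with low load and plenty of free VRAM.
function pickServer(servers: OllamaServer[], model: string): OllamaServer | undefined {
  const candidates = servers.filter(
    (s) => s.loadedModels.includes(model) && s.activeRequests < s.maxConcurrent,
  );
  return candidates
    .map((s) => ({
      server: s,
      score: s.relativeSpeed * (1 - s.activeRequests / s.maxConcurrent) + s.freeVramGb / 100,
    }))
    .sort((a, b) => b.score - a.score)[0]?.server;
}
```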

Automatic Failover

If a server goes offline or becomes unresponsive, requests are automatically routed to available servers

Model-Aware Distribution

Each server can run different models. George AI only sends requests to servers that have the required model loaded

Adding an Ollama Server

1. Navigate to AI Services
   Admin Panel → AI Services → Add Server

2. Enter Server Details
   Name: Descriptive name (e.g., "GPU Server 1 - NVIDIA A100")
   URL: Server address (e.g., "http://ollama-server-1:11434")
   API Key: If authentication is enabled

3. Configure Capabilities
   GPU Memory: Total GPU VRAM (e.g., "80GB")
   Relative Speed: Performance multiplier (1.0 = baseline, 2.0 = twice as fast)
   Max Concurrent: Maximum simultaneous requests (default: 4)

4. Test Connection
   George AI will verify connectivity and detect available models.

Monitoring Server Health

The AI Services dashboard shows real-time status:

  • Server Status: Online (last heartbeat: 2 seconds ago)
  • Current Load: 3/4 concurrent requests
  • GPU Memory: 65% (52GB / 80GB used)
  • Requests Today: 1,247 (average response time: 95ms)
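
As a rough sketch, the dashboard figures above can be derived from per-server heartbeat data along these lines. The field names and the staleness threshold are assumptions for illustration.

```typescript
// Illustrative shape of the per-server health data behind the dashboard.
interface ServerHealth {
  lastHeartbeat: Date;
  activeRequests: number;
  maxConcurrent: number;
  usedVramGb: number;
  totalVramGb: number;
}

// A server is treated as online if its heartbeat is recent enough
// (the 10-second threshold here is an assumption).
const isOnline = (h: ServerHealth, staleAfterMs = 10_000) =>
  Date.now() - h.lastHeartbeat.getTime() < staleAfterMs;

const gpuMemoryPercent = (h: ServerHealth) =>
  Math.round((h.usedVramGb / h.totalVramGb) * 100); // e.g. 52GB / 80GB -> 65%
```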

Load Balancing Strategies

Round Robin (Default)

Distributes requests evenly across all available servers

Best for: Balanced workloads with similar server capabilities

Least Loaded

Sends requests to the server with the lowest current load

Best for: Mixed workloads with varying request complexity

Weighted by Speed

Faster servers receive proportionally more requests based on their speed rating

Best for: Clusters with different GPU generations (e.g., mixing A100 and V100)
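
For the weighted strategy, the intuition is that a server rated 2.0 should receive roughly twice as many requests as one rated 1.0. A minimal sketch of weighted random selection is shown below; it is illustrative only, not George AI's implementation.

```typescript
// Illustrative weighted-random pick: a server with relativeSpeed 2.0
// is chosen roughly twice as often as one with relativeSpeed 1.0.
// Assumes a non-empty server list.
function pickWeightedBySpeed<T extends { relativeSpeed: number }>(servers: T[]): T {
  const total = servers.reduce((sum, s) => sum + s.relativeSpeed, 0);
  let r = Math.random() * total;
  for (const s of servers) {
    r -= s.relativeSpeed;
    if (r <= 0) return s;
  }
  return servers[servers.length - 1]; // fallback for floating-point edge cases
}
```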

Best Practices

Start Small, Scale Up

Begin with 2-3 servers and add more as needed based on usage patterns

Keep Models Consistent

Load the same models on all servers for the best distribution; servers running different models leave fewer routing options

Monitor GPU Memory

If servers frequently hit 100% GPU memory, reduce max_concurrent or add more servers

Troubleshooting

| Issue | Possible Cause | Solution |
| --- | --- | --- |
| Server shows "Offline" | Network connectivity or Ollama not running | Check the server URL and verify the Ollama service is running |
| Slow processing | All servers at max capacity | Add more servers or increase max_concurrent carefully |
| Requests failing | Model not available on any server | Pull the required model on at least one server |
| Uneven distribution | Server speed ratings incorrect | Adjust speed multipliers based on actual performance |