AI Services & Providers
Configure workspace-scoped AI providers (Ollama, OpenAI) and manage connection settings
Overview
AI providers are configured per workspace, giving you complete flexibility in how each team or project uses AI models.
Each workspace can have its own Ollama and OpenAI configurations, allowing you to mix providers based on privacy requirements, performance needs, or budget constraints.
Workspace Scoping
Provider configurations are isolated per workspace. Switching workspaces shows different available models based on each workspace's configured providers.
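Conceptually, each provider entry belongs to exactly one workspace. The sketch below illustrates that scoping; the field names are illustrative assumptions, not George AI's actual schema.

```typescript
// Illustrative sketch of a workspace-scoped provider configuration.
// Field names are assumptions, not George AI's actual schema.
type ProviderKind = "ollama" | "openai";

interface ProviderConfig {
  workspaceId: string; // owning workspace; configs are never shared across workspaces
  kind: ProviderKind;
  name: string;        // e.g. "Production Ollama"
  baseUrl: string;     // e.g. "http://ollama:11434" or "https://api.openai.com/v1"
  apiKey?: string;     // stored encrypted; optional for Ollama
  vramGb?: number;     // Ollama only: total GPU memory, used for load balancing
}

// Model lookups always start from the current workspace's own configs:
function configsForWorkspace(all: ProviderConfig[], workspaceId: string): ProviderConfig[] {
  return all.filter((c) => c.workspaceId === workspaceId);
}
```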
Accessing AI Services Admin
Navigation
- Switch to the target workspace using the workspace switcher in the top navigation
- Open the Settings menu → Admin → AI Services (URL: /admin/ai-services)
- Configure providers for this workspace; changes only affect the current workspace
Configuring AI Providers
Ollama (Local)
Self-hosted AI models for privacy and offline use
- Complete data privacy
- No API costs
- Offline operation
- GPU required for performance
OpenAI (Cloud)
Cloud-based AI models from OpenAI
- Latest GPT models
- High performance
- No infrastructure needed
- Pay per use (API costs)
Required Settings (Ollama):
- Name: Descriptive name (e.g., "Production Ollama")
- Base URL: Ollama server address (e.g., http://ollama:11434)
- API Key: Optional (only needed if Ollama authentication is enabled)
- VRAM (GB): Total GPU memory available (helps with load balancing)
Connection Testing:
Click "Test Connection" to verify the configuration before saving. This will:
- Check network connectivity to Ollama server
- Verify API authentication (if enabled)
- List available models on the server
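The check can be pictured as a plain HTTP call to Ollama's model-listing endpoint (`/api/tags`); the snippet below is a minimal sketch of such a test, not the exact code George AI runs.

```typescript
// Minimal sketch of an Ollama connection test: reach the server,
// authenticate if a key is configured, and list the available models.
async function testOllama(baseUrl: string, apiKey?: string): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`, {
    headers: apiKey ? { Authorization: `Bearer ${apiKey}` } : {},
  });
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const body = (await res.json()) as { models: { name: string }[] };
  return body.models.map((m) => m.name); // e.g. ["llama3:8b", "nomic-embed-text"]
}

// testOllama("http://ollama:11434").then(console.log);
```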
Required Settings (OpenAI):
- Name: Descriptive name (e.g., "OpenAI Production")
- Base URL: API endpoint (default: https://api.openai.com/v1)
- API Key: Your OpenAI API key (starts with sk-...)
Security:
- API keys are encrypted in the database
- Keys are never exposed to the frontend (shown masked as sk-...xy)
- Connections can be tested without re-entering the key
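The masking can be thought of as keeping only the key's prefix and its last two characters; a minimal illustrative sketch (not George AI's actual code):

```typescript
// Illustrative masking: keep only the "sk-" prefix and the last two
// characters; the full key is never sent to the browser.
function maskApiKey(key: string): string {
  if (key.length <= 5) return "sk-...";
  return `${key.slice(0, 3)}...${key.slice(-2)}`;
}

// maskApiKey("sk-abc123xy") === "sk-...xy"
```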
Connection Testing:
Click "Test Connection" to verify your API key and check available models.
Provider Performance
George AI caches workspace provider configurations so they do not have to be re-read on every request.
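Such a cache can be pictured as a per-workspace map with a short time-to-live; the sketch below (with an assumed 60-second TTL and the illustrative ProviderConfig type from above) only shows the idea, not the actual implementation.

```typescript
// Illustrative per-workspace cache with a short TTL, so repeated requests
// do not re-read the provider configuration every time.
const TTL_MS = 60_000; // assumed value, for illustration only
const cache = new Map<string, { configs: ProviderConfig[]; expires: number }>();

async function getConfigs(
  workspaceId: string,
  load: (id: string) => Promise<ProviderConfig[]>,
): Promise<ProviderConfig[]> {
  const hit = cache.get(workspaceId);
  if (hit && hit.expires > Date.now()) return hit.configs;
  const configs = await load(workspaceId);
  cache.set(workspaceId, { configs, expires: Date.now() + TTL_MS });
  return configs;
}
```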
Advanced: Multi-Instance Ollama Load Balancing
Advanced Configuration
This section covers Ollama clustering with multiple GPU servers. Most users can skip this.
George AI can distribute AI processing across multiple Ollama servers, automatically balancing load based on each server's capabilities.
This keeps AI processing reliable and fast even under heavy workloads by making full use of all available GPU resources.
How It Works
Intelligent Routing
George AI monitors each Ollama server and routes requests based on:
- Available GPU memory
- Current load (requests in progress)
- GPU processing speed
- Which models are loaded on each server
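A simplified version of this kind of scoring-based router is sketched below; the field names and weights are assumptions chosen for illustration, not George AI's actual heuristics.

```typescript
// Illustrative routing sketch: consider only servers that are online,
// have the model loaded, and still have request headroom, then pick the
// best one from free VRAM, current load, and speed. Weights are examples.
interface OllamaServer {
  name: string;
  vramGb: number;          // total GPU memory
  usedVramGb: number;      // memory currently in use
  activeRequests: number;  // requests in progress
  maxConcurrent: number;
  relativeSpeed: number;   // 1.0 = baseline, 2.0 = twice as fast
  loadedModels: string[];
  online: boolean;
}

function pickServer(servers: OllamaServer[], model: string): OllamaServer | undefined {
  const candidates = servers.filter(
    (s) =>
      s.online &&                       // automatic failover: skip offline servers
      s.loadedModels.includes(model) && // model-aware distribution
      s.activeRequests < s.maxConcurrent,
  );
  const score = (s: OllamaServer) =>
    (s.vramGb - s.usedVramGb) +                      // free GPU memory
    (1 - s.activeRequests / s.maxConcurrent) * 10 +  // load headroom
    s.relativeSpeed * 5;                             // speed rating
  return candidates.sort((a, b) => score(b) - score(a))[0];
}
```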
Automatic Failover
If a server goes offline or becomes unresponsive, requests are automatically routed to available servers
Model-Aware Distribution
Each server can run different models. George AI only sends requests to servers that have the required model loaded
Adding an Ollama Server
Navigate to AI Services
Admin Panel → AI Services → Add Server
Enter Server Details
Name: Descriptive name (e.g., "GPU Server 1 - NVIDIA A100")
URL: Server address (e.g., "http://ollama-server-1:11434")
API Key: If authentication is enabled
Configure Capabilities
GPU Memory: Total GPU VRAM (e.g., "80GB")
Relative Speed: Performance multiplier (1.0 = baseline, 2.0 = twice as fast)
Max Concurrent: Maximum simultaneous requests (default: 4)
Test Connection
George AI will verify connectivity and detect available models
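Filled in, a registered server with its declared capabilities might look like this (example values only; field names are illustrative):

```typescript
// Example only: a registered server with its declared capabilities.
const gpuServer1 = {
  name: "GPU Server 1 - NVIDIA A100",
  url: "http://ollama-server-1:11434",
  vramGb: 80,         // "GPU Memory: 80GB"
  relativeSpeed: 2.0, // roughly twice the baseline server
  maxConcurrent: 4,   // default
};
```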
Monitoring Server Health
The AI Services dashboard shows real-time status for each configured server.
Load Balancing Strategies
- Distributes requests evenly across all available servers. Best for: balanced workloads with similar server capabilities
- Sends requests to the server with the lowest current load. Best for: mixed workloads with varying request complexity
- Sends proportionally more requests to faster servers based on their speed rating. Best for: clusters with different GPU generations (e.g., mixing A100 and V100)
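Speed-proportional distribution can be pictured as drawing servers with probability proportional to their speed rating; the snippet below is a rough illustration, not the actual scheduler.

```typescript
// Illustrative speed-weighted pick: a server rated 2.0 receives roughly
// twice as many requests as one rated 1.0.
function weightedPick<T extends { relativeSpeed: number }>(servers: T[]): T {
  if (servers.length === 0) throw new Error("no servers available");
  const total = servers.reduce((sum, s) => sum + s.relativeSpeed, 0);
  let r = Math.random() * total;
  for (const s of servers) {
    r -= s.relativeSpeed;
    if (r <= 0) return s;
  }
  return servers[servers.length - 1];
}
```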
Best Practices
Start Small, Scale Up
Begin with 2-3 servers and add more as needed based on usage patterns
Keep Models Consistent
Load the same models on all servers for best distribution. Different models = fewer routing options
Monitor GPU Memory
If servers frequently hit 100% GPU memory, reduce max_concurrent or add more servers
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| Server shows "Offline" | Network connectivity or Ollama not running | Check server URL, verify Ollama service is running |
| Slow processing | All servers at max capacity | Add more servers or increase max_concurrent carefully |
| Requests failing | Model not available on any server | Pull required model on at least one server |
| Uneven distribution | Server speed ratings incorrect | Adjust speed multipliers based on actual performance |