
AI Models & Providers

Manage AI models from multiple providers and configure them for embeddings, chat, and document processing

Overview

George AI supports multiple AI providers simultaneously, giving you flexibility to choose the best models for your use case.

All providers are optional: you can run George AI with Ollama only, OpenAI only, both, or neither (the app runs, but AI features are disabled until a provider is configured).

At a glance:

  • Supported providers: 6+ (2 stable, 4 planned: Anthropic, Google, Azure, Hugging Face)
  • Auto-detection: model capabilities are detected automatically on sync

Supported Providers

| Provider | Status | Capabilities | Best For |
|---|---|---|---|
| Ollama (local models) | Stable | Chat, Embedding, Vision | Privacy, offline use, self-hosted |
| OpenAI (API-key models) | Stable | Chat, Embedding, Vision, Function Calling | Performance, reliability, latest models |
| Anthropic (Claude 3.5 Sonnet/Haiku) | Planned | Chat, Vision, Function Calling | Long context (200K), reasoning |
| Google AI (Gemini 2.0 Flash, 1.5 Pro) | Planned | Chat, Embedding, Vision, Audio, Video | Multimodal, long context (2M), cost-effective |
| Hugging Face (open model hub) | Planned | Chat, Embedding, Vision, Specialized | Open models, experimentation, custom fine-tuning |
| Azure OpenAI (enterprise cloud) | Planned | Chat, Embedding, Vision, Function Calling | Enterprise compliance, regional data residency |

See also:

  • OpenAI Models Reference: 40+ OpenAI models with detailed capabilities
  • Open Source Models Reference: 30+ OSS models (Llama, Gemma, Qwen, DeepSeek, Phi, Mistral)

Provider Support Roadmap

George AI is expanding support for multiple AI providers to give you flexibility, cost optimization, and access to the best models for your use case.

Tier 1: High Priority

Anthropic (Claude) #866

  • Models: Claude 3.5 Sonnet, Claude 3.5 Haiku
  • Capabilities: Chat, Vision, Function Calling
  • Strengths: Long context (200K tokens), excellent reasoning, enterprise adoption

Google AI (Gemini) #867

  • Models: Gemini 2.0 Flash, Gemini 1.5 Pro
  • Capabilities: Chat, Embedding, Vision, Audio, Video
  • Strengths: Multimodal leader, ultra-long context (2M tokens), cost-effective

Hugging Face #868

  • Models: 500,000+ open models
  • Capabilities: Chat, Embedding, Vision, Specialized
  • Strengths: Open models, custom fine-tuning, domain-specific models (legal, medical, code)

Azure OpenAI #869

  • Models: GPT-4o, GPT-4 Turbo, GPT-3.5
  • Capabilities: Chat, Embedding, Vision, Function Calling
  • Strengths: Enterprise compliance (HIPAA, SOC2), regional data residency, Microsoft ecosystem

Tier 2: Medium Priority

Mistral AI #870

  • Models: Mistral Large 2, Mistral Small
  • Capabilities: Chat, Embedding, Function Calling
  • Strengths: European provider, GDPR-compliant, high-performance open models

Cohere #871

  • Models: Command R+, Embed v3
  • Capabilities: Chat, Embedding, Reranking
  • Strengths: RAG specialists, multilingual (100+ languages), unique reranking capabilities

Tier 3: Lower Priority

AWS Bedrock #872

  • Models: multi-provider platform
  • Capabilities: Chat, Embedding
  • Strengths: AWS-native, unified access to Anthropic, Meta, Cohere

Groq #873

  • Capabilities: Chat (ultra-fast inference)
  • Strengths: 500+ tokens/sec, real-time applications, low latency

Together.ai #874

  • Models: 100+ open models
  • Capabilities: Chat, Embedding, Fine-tuning
  • Strengths: Custom model deployment, bleeding-edge releases

Want to Contribute?

Provider implementation is a great way to contribute to George AI! Check the linked GitHub issues for implementation details, or suggest a new provider we should add.

Managing AI Models

  1. Navigate to AI Models: go to Admin Panel → AI Models (URL: /admin/ai-models).
  2. Sync Models from Providers: click the "Sync Models" button to discover available models. This will:
     • Connect to all configured providers (Ollama, OpenAI)
     • Discover available models
     • Auto-detect capabilities (embedding, chat, vision, function calling)
     • Update existing models (if already synced)
  3. Enable/Disable Models: toggle models on/off to control which ones appear in selection dropdowns. Disabled models are hidden from users but remain in the database for historical usage tracking.
  4. View Statistics: see usage statistics for each model:
     • Total requests
     • Total tokens processed
     • Used by (Libraries, Assistants, List Fields)

Workspace-Scoped Model Availability

Model dropdowns automatically filter based on your current workspace's configured AI providers.

This means users only see models from providers that are enabled in their current workspace, ensuring proper access control and preventing confusion.

How It Works

When you select a model for libraries, assistants, or list fields, George AI automatically shows only models from providers configured in your current workspace. Switching workspaces updates available models in real-time.
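The filtering rule can be sketched as a simple predicate over the model list. This is an illustrative sketch only; the names (`Model`, `models_for_workspace`, the provider IDs) are hypothetical and not George AI's actual API.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    provider_id: str   # which configured provider instance the model came from
    enabled: bool      # toggled in Admin → AI Models

def models_for_workspace(models, workspace_provider_ids):
    """Return only enabled models whose provider is configured in the workspace."""
    return [m for m in models
            if m.enabled and m.provider_id in workspace_provider_ids]

models = [
    Model("gpt-4o", "openai-prod", True),
    Model("qwen3", "ollama-gpu1", True),
    Model("gemma3", "ollama-gpu1", False),  # disabled, so never shown
]

# A workspace with only the OpenAI provider configured sees only "gpt-4o":
visible = models_for_workspace(models, {"openai-prod"})
```

Switching workspaces simply changes the provider-ID set, which is why the dropdowns update without any extra configuration.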

Benefits

  • Zero configuration - filtering happens automatically
  • Users can't accidentally select unavailable models
  • Each workspace can use different AI providers
  • Real-time updates when switching workspaces
  • Applies to all model selections (embedding, chat, OCR)

Example

Workspace A: Configured with OpenAI

Users see: gpt-4o, gpt-4o-mini, text-embedding-3-small, etc.

Workspace B: Configured with Ollama

Users see: qwen3, gemma3, nomic-embed-text, etc.

Checking Available Models

  1. Switch to your target workspace: use the workspace switcher in the top navigation.
  2. Open any model selection dropdown: library embedding models, assistant language models, etc.
  3. View workspace-filtered models: only models from workspace providers are shown.

Note: If no providers are configured in a workspace, model dropdowns will be empty. Admins should configure at least one provider via AI Services.

For detailed information about workspaces and provider configuration, see the Workspace Documentation.

Understanding Model Capabilities

Each model is automatically tagged with capabilities based on its name and provider information:

Chat Completion

For Assistants - conversational AI, question answering

Examples: gpt-4o, qwen3, gemma3

Embeddings

For Libraries - semantic search, vector generation

Examples: text-embedding-3-small, nomic-embed-text

Vision/OCR

For Libraries - image processing, document OCR

Examples: qwen3-vl, gemma3, gpt-4o

Function Calling

For Lists - structured data extraction from documents

Examples: gpt-4o, qwen3, gemma3
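Name-based tagging of this kind can be sketched as a small heuristic function. The patterns below are illustrative assumptions, not George AI's real detection logic (which also uses provider metadata):

```python
def detect_capabilities(model_name: str) -> set[str]:
    """Hedged sketch: guess capabilities from a model's name."""
    name = model_name.lower()
    caps = set()
    if "embed" in name:                 # e.g. text-embedding-3-small, nomic-embed-text
        caps.add("embedding")
    else:
        caps.add("chat")                # assume non-embedding models can chat
    if "-vl" in name or "vision" in name or name.startswith("gpt-4o"):
        caps.add("vision")              # e.g. qwen3-vl, llama3.2-vision, gpt-4o
    return caps

assert "embedding" in detect_capabilities("text-embedding-3-small")
assert detect_capabilities("qwen3-vl") >= {"chat", "vision"}
```

Because detection is heuristic, always verify a model's tags in Admin → AI Models before relying on it for a capability-specific task.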

Where to Select Models

Library Settings - Embedding Model

Used for generating vector embeddings to enable semantic search

Where: Library Settings → Embedding Model dropdown

Required Capability: Embeddings

Recommended: text-embedding-3-small (OpenAI), nomic-embed-text (Ollama)

💡 Tip: Changing the embedding model requires reprocessing all documents

Library Settings - OCR Model

Used for extracting text from images and scanned PDFs via Optical Character Recognition

Where: Library Settings → Advanced → OCR Model dropdown

Required Capability: Vision

Recommended (Ollama): qwen2.5-vl (3B/7B/72B), qwen3-vl (2B/32B), gemma3 (1B/4B/12B/27B)

Recommended (OpenAI): gpt-4o-mini (cost-effective), gpt-4o, o3 (reasoning)

💡 Tip: For Ollama models, choose size (e.g., 7B, 32B) based on your GPU memory

Assistant Settings - Chat Model

Used for conversational AI when chatting with your Assistant

Where: Assistant Settings → Language Model dropdown

Required Capability: Chat Completion

Recommended (Ollama): qwen3 (4B/8B/14B/32B), qwen2.5 (3B/14B/32B), gemma3 (1B/4B/12B/27B)

Recommended (OpenAI): gpt-4o-mini (fast/cheap), gpt-4o, o3 (reasoning)

💡 Tip: Larger models (32B, 27B) provide better answers but need more GPU memory

List Field Settings - Enrichment Model

Used for extracting structured data from documents (e.g., extract invoice numbers, dates, amounts)

Where: List → Field Settings → Language Model dropdown

Required Capability: Function Calling (preferred) or Chat Completion

Recommended (Ollama): qwen3 (4B/8B/14B/32B), qwen2.5 (3B/14B/32B), llama3.3 (70B), mistral (7B), deepseek-r1 (reasoning)

Recommended (OpenAI): gpt-4o-mini (fast/cheap), gpt-4o, o3 (reasoning)

💡 Tip: Use qwen3/qwen2.5 for best Ollama function calling. Models with native tool support provide better structured extraction.
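To see why native tool support matters, here is what a function-calling ("tool") schema for invoice extraction looks like in the widely used OpenAI tools format. The field names are hypothetical examples; George AI derives the actual schema from your List field definitions.

```python
# Illustrative tool definition: the model is asked to "call" this function,
# so its output must conform to the JSON schema instead of free-form text.
invoice_tool = {
    "type": "function",
    "function": {
        "name": "extract_invoice_fields",        # hypothetical name
        "description": "Extract structured fields from an invoice document",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "invoice_date": {"type": "string", "description": "ISO 8601 date"},
                "total_amount": {"type": "number"},
            },
            "required": ["invoice_number"],
        },
    },
}
```

A model with native tool support returns arguments matching this schema, which is far more reliable than parsing field values out of conversational output.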

Monitoring Usage & Costs

George AI automatically tracks usage for all AI models to help you understand costs and optimize performance.

Example of a monthly usage summary:

  • Total Requests: 5,247 (across all models this month)
  • Total Tokens: 2.4M (input + output)
  • Estimated Cost: $12 (OpenAI models only)

Usage Tracking Details

View detailed usage per model in the AI Models page. Usage includes request count, token usage, and average processing time.
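The tracked token counts also let you do a back-of-envelope cost estimate. The per-million-token rates below are placeholders; check your provider's current pricing page for real numbers.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  rate_in_per_m: float, rate_out_per_m: float) -> float:
    """Estimated cost in dollars, given per-1M-token input/output rates."""
    return (input_tokens / 1_000_000) * rate_in_per_m \
         + (output_tokens / 1_000_000) * rate_out_per_m

# e.g. 2.0M input + 0.4M output tokens at hypothetical $0.15 / $0.60 per 1M:
cost = estimate_cost(2_000_000, 400_000, 0.15, 0.60)
# → 0.30 + 0.24 = 0.54 dollars
```

Embedding-heavy workloads are dominated by input tokens, which is one reason cheap embedding models (or local Ollama models) cut costs so effectively.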

Configuring AI Providers

Required Setup

Every workspace needs at least one AI provider (Ollama or OpenAI) configured to use AI features like embeddings, chat, and enrichments.

How to configure providers for your workspace:

  1. Go to Settings → AI Services in your workspace
  2. Click "Add Provider"
  3. Configure the provider details (see below)
  4. Click "Save" and then "Sync Models" to discover available models

Ollama Configuration

Provider: Select "Ollama"

Name: Descriptive name (e.g., "GPU Server 1")

Base URL: API endpoint (e.g., http://ollama:11434)

API Key: Optional (leave empty if not required)

VRAM: GPU memory in GB (e.g., 32) - used for load balancing
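Before saving, you can sanity-check that the Base URL points at a reachable Ollama instance: Ollama's `GET /api/tags` endpoint lists the locally installed models. The helper below parses that response shape (the live-server call is shown commented out, since the hostname is just the example from the form above):

```python
import json
from urllib.request import urlopen

def installed_models(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Abbreviated example of the /api/tags response shape:
sample = '{"models": [{"name": "qwen3:8b"}, {"name": "nomic-embed-text:latest"}]}'
print(installed_models(sample))   # ['qwen3:8b', 'nomic-embed-text:latest']

# Against a live server (uncomment and use your provider's Base URL):
# body = urlopen("http://ollama:11434/api/tags").read().decode()
# print(installed_models(body))
```

If this call fails, "Sync Models" will fail for the same reason; see Troubleshooting below.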

OpenAI Configuration

Provider: Select "OpenAI"

Name: Descriptive name (e.g., "OpenAI Production")

Base URL: Optional (defaults to https://api.openai.com/v1)

API Key: Required (your OpenAI API key)

Pro Tip: Multiple Instances

You can add multiple provider instances to the same workspace for load balancing and high availability. See details below.

Advanced: Multiple Instances for Load Balancing

For high-load scenarios, you can add multiple instances of the same provider type to distribute AI processing across multiple servers.

Benefits of Multiple Instances:

  • Load Distribution: Spread AI requests across multiple GPU servers
  • High Availability: Automatic failover if one server goes offline
  • Scalability: Add more capacity as your workload grows

How to Add Multiple Instances:

  1. Click "Add Provider" for each additional Ollama server you have
  2. Give each instance a unique name (e.g., "GPU Server 1", "GPU Server 2")
  3. Configure the base URL pointing to each server
  4. Set the VRAM value for each instance (used for intelligent routing)
  5. After adding all instances, click "Sync Models" to discover available models

⚙️ Automatic Load Balancing (Ollama Only)

When multiple Ollama instances are configured, George AI automatically:

  • Routes requests to the instance with the most available GPU memory
  • Checks which instances have the requested model loaded
  • Performs automatic failover if an instance goes offline
  • Deduplicates models (same model on multiple instances = one entry in dropdowns)
  • Monitors instance health and load in real-time

💡 Note: Load balancing is Ollama-specific. For OpenAI, you typically only need one provider instance per workspace.
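The routing behavior described above can be sketched as a pure selection function. This is an illustrative model of the idea, not George AI's actual scheduler; all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    free_vram_gb: float
    loaded_models: set
    online: bool = True

def pick_instance(instances, model: str):
    """Prefer online instances that already have the model loaded, then the
    one with the most free GPU memory; fall back to any online instance."""
    candidates = [i for i in instances if i.online]
    warm = [i for i in candidates if model in i.loaded_models]
    pool = warm or candidates   # failover: a cold instance beats no instance
    return max(pool, key=lambda i: i.free_vram_gb) if pool else None

fleet = [
    Instance("GPU Server 1", free_vram_gb=6.0, loaded_models={"qwen3:8b"}),
    Instance("GPU Server 2", free_vram_gb=20.0, loaded_models=set()),
    Instance("GPU Server 3", free_vram_gb=40.0, loaded_models={"qwen3:8b"}, online=False),
]
chosen = pick_instance(fleet, "qwen3:8b")
# → "GPU Server 1": the only online instance with the model already loaded
```

Preferring "warm" instances avoids the load latency of pulling a large model into GPU memory; the VRAM values you enter in the provider form feed this kind of decision.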

Troubleshooting

| Issue | Possible Cause | Solution |
|---|---|---|
| No models appear after sync | No providers configured | Configure at least one provider (Ollama or OpenAI) and restart the application, then sync models. |
| Sync fails for OpenAI | Invalid API key or network issue | Check Settings → AI Services and verify the OpenAI provider's API key. Check logs for error details. |
| Sync fails for Ollama | Ollama not running or incorrect URL | Check Settings → AI Services and verify the Ollama provider's Base URL is correct and reachable. Ensure the Ollama service is running. |
| Model not available in dropdown | Model is disabled or lacks the required capability | Go to Admin → AI Models and enable the model. Verify it has the required capability (e.g., embeddings, chat). |
| OCR not working on images | No vision-capable model selected | Select a vision model in Library Settings → OCR Model (e.g., gpt-4o, llama3.2-vision). |
| High OpenAI costs | Expensive models used for all operations | Use smaller models for embeddings (text-embedding-3-small) and consider Ollama for high-volume operations. |

Best Practices

Mix Providers for Cost Optimization

Use Ollama for high-volume operations (embeddings) and OpenAI for quality-critical tasks (chat, OCR)

Sync Models Regularly

When you add new models to Ollama, or when OpenAI releases new models, re-sync to make them available

Don't Disable Models in Active Use

Check usage statistics before disabling. If a model is used by Libraries/Assistants/Lists, they'll need reconfiguration.
