AI Models & Providers
Manage AI models from multiple providers and configure them for embeddings, chat, and document processing
Overview
George AI supports multiple AI providers simultaneously, giving you flexibility to choose the best models for your use case.
All providers are optional: you can run George AI with Ollama only, OpenAI only, both, or neither (the app still runs, but AI features are disabled until a provider is configured).
Supported Providers
| Provider | Status | Capabilities | Best For |
|---|---|---|---|
| Ollama (local models) | Stable | Chat, Embedding, Vision | Privacy, offline use, self-hosted |
| OpenAI (API-key models) | Stable | Chat, Embedding, Vision, Function Calling | Performance, reliability, latest models |
| Anthropic (Claude 3.5 Sonnet/Haiku) | Planned | Chat, Vision, Function Calling | Long context (200K), reasoning |
| Google AI (Gemini 2.0 Flash, 1.5 Pro) | Planned | Chat, Embedding, Vision, Audio, Video | Multimodal, long context (2M), cost-effective |
| Hugging Face (open model hub) | Planned | Chat, Embedding, Vision, Specialized | Open models, experimentation, custom fine-tuning |
| Azure OpenAI (enterprise cloud) | Planned | Chat, Embedding, Vision, Function Calling | Enterprise compliance, regional data residency |
Provider Support Roadmap
George AI is expanding support for multiple AI providers to give you flexibility, cost optimization, and access to the best models for your use case.
Tier 1: High Priority
Anthropic (Claude) #866
Claude 3.5 Sonnet, Claude 3.5 Haiku
Long context (200K tokens), excellent reasoning, enterprise adoption
Google AI (Gemini) #867
Gemini 2.0 Flash, Gemini 1.5 Pro
Multimodal leader, ultra-long context (2M tokens), cost-effective
Hugging Face #868
500,000+ open models
Open models, custom fine-tuning, domain-specific models (legal, medical, code)
Azure OpenAI #869
GPT-4o, GPT-4 Turbo, GPT-3.5
Enterprise compliance (HIPAA, SOC2), regional data residency, Microsoft ecosystem
Tier 2: Medium Priority
Tier 3: Lower Priority
Want to Contribute?
Provider implementation is a great way to contribute to George AI! Check the linked GitHub issues for implementation details, or suggest a new provider we should add.
Managing AI Models
Navigate to AI Models
Go to Admin Panel → AI Models
URL: /admin/ai-models
Sync Models from Providers
Click the "Sync Models" button to discover available models
This will:
- Connect to all configured providers (Ollama, OpenAI)
- Discover available models
- Auto-detect capabilities (embedding, chat, vision, function calling)
- Update existing models (if already synced)
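Under the hood, a sync amounts to asking each provider for its model list. Here is a minimal sketch using Ollama's GET /api/tags and OpenAI's GET /v1/models endpoints (both real APIs); the `syncModels` helper and the `DiscoveredModel` shape are illustrative, not George AI's actual code:

```typescript
// Illustrative sketch of a provider sync step (not George AI's actual code).
// Ollama exposes GET /api/tags; OpenAI exposes GET /v1/models.

interface DiscoveredModel {
  provider: "ollama" | "openai";
  name: string;
}

async function syncModels(
  ollamaBaseUrl: string, // e.g. "http://ollama:11434"
  openAiApiKey?: string,
): Promise<DiscoveredModel[]> {
  const models: DiscoveredModel[] = [];

  // Ollama: list locally available models.
  const tags = await fetch(`${ollamaBaseUrl}/api/tags`).then((r) => r.json());
  for (const m of tags.models ?? []) {
    models.push({ provider: "ollama", name: m.name });
  }

  // OpenAI: list models the API key can access.
  if (openAiApiKey) {
    const res = await fetch("https://api.openai.com/v1/models", {
      headers: { Authorization: `Bearer ${openAiApiKey}` },
    }).then((r) => r.json());
    for (const m of res.data ?? []) {
      models.push({ provider: "openai", name: m.id });
    }
  }

  return models;
}
```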
Enable/Disable Models
Toggle models on/off to control which ones appear in selection dropdowns
View Statistics
See usage statistics for each model:
- Total requests
- Total tokens processed
- Used by (Libraries, Assistants, List Fields)
Workspace-Scoped Model Availability
Model dropdowns automatically filter based on your current workspace's configured AI providers.
This means users only see models from providers that are enabled in their current workspace, ensuring proper access control and preventing confusion.
How It Works
When you select a model for libraries, assistants, or list fields, George AI automatically shows only models from providers configured in your current workspace. Switching workspaces updates available models in real-time.
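The filtering rule itself is simple. The sketch below uses hypothetical `Model` and `Workspace` shapes, but the behavior matches what is described above:

```typescript
// Hypothetical data shapes; the rule mirrors the documented behavior:
// only models whose provider is configured in the current workspace
// (and that have the needed capability) are offered in dropdowns.

interface Model {
  name: string;
  provider: string;       // e.g. "ollama" or "openai"
  capabilities: string[]; // e.g. ["chat", "embedding"]
}

interface Workspace {
  name: string;
  configuredProviders: string[];
}

function modelsForWorkspace(all: Model[], ws: Workspace, capability: string): Model[] {
  return all.filter(
    (m) =>
      ws.configuredProviders.includes(m.provider) &&
      m.capabilities.includes(capability),
  );
}
```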
Benefits
- Zero configuration - filtering happens automatically
- Users can't accidentally select unavailable models
- Each workspace can use different AI providers
- Real-time updates when switching workspaces
- Applies to all model selections (embedding, chat, OCR)
Example
Workspace A: Configured with OpenAI
Users see: gpt-4o, gpt-4o-mini, text-embedding-3-small, etc.
Workspace B: Configured with Ollama
Users see: qwen3, gemma3, nomic-embed-text, etc.
Checking Available Models
Switch to your target workspace
Use the workspace switcher in the top navigation
Open any model selection dropdown
Library embedding models, assistant language models, etc.
View workspace-filtered models
Only models from workspace providers are shown
For detailed information about workspaces and provider configuration, see the Workspace Documentation.
Understanding Model Capabilities
Each model is automatically tagged with capabilities based on its name and provider information:
Chat Completion
For Assistants - conversational AI, question answering
Examples: gpt-4o, qwen3, gemma3
Embeddings
For Libraries - semantic search, vector generation
Examples: text-embedding-3-small, nomic-embed-text
Vision/OCR
For Libraries - image processing, document OCR
Examples: qwen3-vl, gemma3, gpt-4o
Function Calling
For Lists - structured data extraction from documents
Examples: gpt-4o, qwen3, gemma3
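The capability tags above are derived heuristically from model names. A simplified sketch of what such name-based detection might look like (George AI's actual rules may differ):

```typescript
// Illustrative name-based heuristic, not George AI's actual detection code.
// Patterns reflect common model naming conventions.

type Capability = "chat" | "embedding" | "vision" | "function-calling";

function detectCapabilities(modelName: string): Capability[] {
  const name = modelName.toLowerCase();
  const caps: Capability[] = [];

  if (/embed/.test(name)) {
    caps.push("embedding"); // e.g. text-embedding-3-small, nomic-embed-text
  } else {
    caps.push("chat");      // most non-embedding models support chat
  }
  if (/vision|-vl|gpt-4o/.test(name)) {
    caps.push("vision");    // e.g. qwen3-vl, llama3.2-vision, gpt-4o
  }
  if (/gpt-4|qwen|gemma/.test(name)) {
    caps.push("function-calling"); // models with native tool support
  }
  return caps;
}
```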
Where to Select Models
Embedding Model (Libraries)
Used for generating vector embeddings to enable semantic search
Where: Library Settings → Embedding Model dropdown
Required Capability: Embeddings
Recommended: text-embedding-3-small (OpenAI), nomic-embed-text (Ollama)
💡 Tip: Changing the embedding model requires reprocessing all documents, since vectors produced by different models are not comparable
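For context, an embedding call against OpenAI's real POST /v1/embeddings endpoint looks like the sketch below; George AI makes equivalent calls internally when indexing documents, though this is not its actual code:

```typescript
// Minimal embedding call against OpenAI's real POST /v1/embeddings endpoint.
async function embed(text: string, apiKey: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  const data = await res.json();
  return data.data[0].embedding; // 1536-dimensional vector for this model
}
```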
OCR Model (Libraries)
Used for extracting text from images and scanned PDFs via Optical Character Recognition
Where: Library Settings → Advanced → OCR Model dropdown
Required Capability: Vision
Recommended (Ollama): qwen2.5-vl (3B/7B/72B), qwen3-vl (2B/32B), gemma3 (1B/4B/12B/27B)
Recommended (OpenAI): gpt-4o-mini (cost-effective), gpt-4o, o3 (reasoning)
💡 Tip: For Ollama models, choose size (e.g., 7B, 32B) based on your GPU memory
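For reference, Ollama's real POST /api/chat endpoint accepts base64-encoded images on a message, which is how a vision model can be used for OCR. The model name and prompt below are illustrative:

```typescript
import { readFileSync } from "node:fs";

// Conceptual OCR call against Ollama's real POST /api/chat endpoint.
// Model tag and prompt are examples, not George AI's internal values.
async function ocrImage(path: string, baseUrl = "http://ollama:11434"): Promise<string> {
  const image = readFileSync(path).toString("base64");
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5vl:7b",
      stream: false,
      messages: [
        {
          role: "user",
          content: "Extract all text from this image.",
          images: [image], // base64-encoded image data
        },
      ],
    }),
  });
  const data = await res.json();
  return data.message.content;
}
```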
Language Model (Assistants)
Used for conversational AI when chatting with your Assistant
Where: Assistant Settings → Language Model dropdown
Required Capability: Chat Completion
Recommended (Ollama): qwen3 (4B/8B/14B/32B), qwen2.5 (3B/14B/32B), gemma3 (1B/4B/12B/27B)
Recommended (OpenAI): gpt-4o-mini (fast/cheap), gpt-4o, o3 (reasoning)
💡 Tip: Larger models (32B, 27B) provide better answers but need more GPU memory
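For reference, a bare chat completion against OpenAI's real POST /v1/chat/completions endpoint; George AI's assistants add retrieval context on top of a call like this:

```typescript
// Minimal chat call against OpenAI's real POST /v1/chat/completions endpoint.
async function ask(question: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: question }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```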
Language Model (List Fields)
Used for extracting structured data from documents (e.g., extract invoice numbers, dates, amounts)
Where: List → Field Settings → Language Model dropdown
Required Capability: Function Calling (preferred) or Chat Completion
Recommended (Ollama): qwen3 (4B/8B/14B/32B), qwen2.5 (3B/14B/32B), llama3.3 (70B), mistral (7B), deepseek-r1 (reasoning)
Recommended (OpenAI): gpt-4o-mini (fast/cheap), gpt-4o, o3 (reasoning)
💡 Tip: Use qwen3/qwen2.5 for best Ollama function calling. Models with native tool support provide better structured extraction.
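Structured extraction rides on the provider's function calling API. The sketch below uses OpenAI's real `tools` parameter with a made-up invoice schema matching the example above:

```typescript
// Structured extraction via OpenAI's real function calling ("tools") API.
// The record_invoice schema is a made-up example for the List-field use case.
async function extractInvoiceFields(documentText: string, apiKey: string) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: documentText }],
      tools: [
        {
          type: "function",
          function: {
            name: "record_invoice",
            description: "Record fields extracted from an invoice",
            parameters: {
              type: "object",
              properties: {
                invoiceNumber: { type: "string" },
                date: { type: "string" },
                amount: { type: "number" },
              },
              required: ["invoiceNumber", "date", "amount"],
            },
          },
        },
      ],
      // Force the model to call the extraction function.
      tool_choice: { type: "function", function: { name: "record_invoice" } },
    }),
  });
  const data = await res.json();
  // The model returns the arguments as a JSON string on the tool call.
  return JSON.parse(data.choices[0].message.tool_calls[0].function.arguments);
}
```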
Monitoring Usage & Costs
George AI automatically tracks usage for all AI models to help you understand costs and optimize performance.
Usage Tracking Details
View detailed usage per model in the AI Models page. Usage includes request count, token usage, and average processing time.
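A hypothetical usage row might look like the following; note that average processing time is derived from totals rather than stored:

```typescript
// Hypothetical shape of a per-model usage row, matching the metrics the
// AI Models page reports (request count, tokens, average processing time).
interface ModelUsage {
  model: string;
  requests: number;
  totalTokens: number;
  totalProcessingMs: number;
}

// Average processing time is derived, not stored:
const avgMs = (u: ModelUsage) => (u.requests ? u.totalProcessingMs / u.requests : 0);
```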
Configuring AI Providers
Required Setup
Every workspace needs at least one AI provider (Ollama or OpenAI) configured to use AI features like embeddings, chat, and enrichments.
How to configure providers for your workspace:
- Go to Settings → AI Services in your workspace
- Click "Add Provider"
- Configure the provider details (see below)
- Click "Save" and then "Sync Models" to discover available models
Ollama Configuration
Provider: Select "Ollama"
Name: Descriptive name (e.g., "GPU Server 1")
Base URL: API endpoint (e.g., http://ollama:11434)
API Key: Optional (leave empty if not required)
VRAM: GPU memory in GB (e.g., 32) - used for load balancing
OpenAI Configuration
Provider: Select "OpenAI"
Name: Descriptive name (e.g., "OpenAI Production")
Base URL: Optional (defaults to https://api.openai.com/v1)
API Key: Required (your OpenAI API key)
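Put together, a workspace's provider configuration conceptually looks like this (field names are illustrative, mirroring the form fields above):

```typescript
// Hypothetical provider records mirroring the configuration forms above;
// field names are illustrative, not George AI's actual schema.
const providers = [
  {
    provider: "ollama",
    name: "GPU Server 1",
    baseUrl: "http://ollama:11434",
    apiKey: null,  // optional for Ollama
    vramGb: 32,    // used for load balancing
  },
  {
    provider: "openai",
    name: "OpenAI Production",
    baseUrl: "https://api.openai.com/v1",  // the default
    apiKey: process.env.OPENAI_API_KEY,    // required
  },
];
```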
Pro Tip: Multiple Instances
You can add multiple provider instances to the same workspace for load balancing and high availability. See details below.
For high-load scenarios, you can add multiple instances of the same provider type to distribute AI processing across multiple servers.
Benefits of Multiple Instances:
- Load Distribution: Spread AI requests across multiple GPU servers
- High Availability: Automatic failover if one server goes offline
- Scalability: Add more capacity as your workload grows
How to Add Multiple Instances:
- Click "Add Provider" for each additional Ollama server you have
- Give each instance a unique name (e.g., "GPU Server 1", "GPU Server 2")
- Configure the base URL pointing to each server
- Set the VRAM value for each instance (used for intelligent routing)
- After adding all instances, click "Sync Models" to discover available models
⚙️ Automatic Load Balancing (Ollama Only)
When multiple Ollama instances are configured, George AI automatically:
- Routes requests to the instance with the most available GPU memory
- Checks which instances have the requested model loaded
- Performs automatic failover if an instance goes offline
- Deduplicates models (same model on multiple instances = one entry in dropdowns)
- Monitors instance health and load in real-time
💡 Note: Load balancing is Ollama-specific. For OpenAI, you typically only need one provider instance per workspace.
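Conceptually, the routing rule looks like the sketch below, assuming hypothetical per-instance stats: prefer instances that already have the model loaded, fail over to any online instance, and pick the one with the most free VRAM:

```typescript
// Illustrative routing rule with hypothetical per-instance stats;
// George AI's actual scheduler may differ.
interface InstanceStats {
  name: string;
  online: boolean;
  freeVramGb: number;
  loadedModels: string[];
}

function pickInstance(instances: InstanceStats[], model: string): InstanceStats | undefined {
  const online = instances.filter((i) => i.online);
  // Prefer instances that already have the model loaded...
  const warm = online.filter((i) => i.loadedModels.includes(model));
  // ...otherwise fail over to any online instance.
  const pool = warm.length > 0 ? warm : online;
  // Route to the instance with the most available GPU memory.
  return pool.sort((a, b) => b.freeVramGb - a.freeVramGb)[0];
}
```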
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| No models appear after sync | No providers configured | Configure at least one provider (Ollama or OpenAI) and restart the application. Then sync models. |
| Sync fails for OpenAI | Invalid API key or network issue | Check Settings → AI Services and verify the OpenAI provider's API Key is correct. Check logs for error details. |
| Sync fails for Ollama | Ollama not running or incorrect URL | Check Settings → AI Services and verify the Ollama provider's Base URL is correct and accessible. Ensure Ollama service is running. |
| Model not available in dropdown | Model is disabled or doesn't have required capability | Go to Admin → AI Models and enable the model. Verify it has the required capability (e.g., embeddings, chat). |
| OCR not working on images | No vision-capable model selected | Select a vision model in Library Settings → OCR Model (e.g., gpt-4o, llama3.2-vision) |
| High OpenAI costs | Using expensive models for all operations | Use smaller models for embeddings (text-embedding-3-small) and consider Ollama for high-volume operations. |
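When an Ollama sync fails, a quick reachability check narrows the problem down. The sketch below uses Ollama's real GET /api/version endpoint; run it from the machine (or container network) where George AI runs, since "localhost" inside a container is not the host:

```typescript
// Quick connectivity check for an Ollama provider, using the real
// GET /api/version endpoint.
async function checkOllama(baseUrl: string): Promise<void> {
  try {
    const res = await fetch(`${baseUrl}/api/version`);
    const { version } = await res.json();
    console.log(`Ollama reachable at ${baseUrl} (version ${version})`);
  } catch (err) {
    console.error(`Ollama NOT reachable at ${baseUrl}:`, err);
  }
}

checkOllama("http://ollama:11434");
```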
Best Practices
Mix Providers for Cost Optimization
Use Ollama for high-volume operations (embeddings) and OpenAI for quality-critical tasks (chat, OCR)
Sync Models Regularly
Re-sync whenever you pull new models into Ollama or OpenAI releases new ones, so they become available for selection
Don't Disable Models in Active Use
Check usage statistics before disabling. If a model is used by Libraries/Assistants/Lists, they'll need reconfiguration.