AI Models & Providers
Manage AI models from multiple providers and configure them for embeddings, chat, and document processing
Overview
George AI supports multiple AI providers simultaneously, giving you flexibility to choose the best models for your use case.
All providers are optional: you can run George AI with Ollama only, OpenAI only, both, or neither (the app still runs, but AI features are disabled until a provider is configured).
Supported Providers
| Provider | Status | Capabilities | Best For |
|---|---|---|---|
| Ollama (local models) | Stable | Chat, Embedding, Vision | Privacy, offline use, self-hosted |
| OpenAI (API-key models) | Stable | Chat, Embedding, Vision, Function Calling | Performance, reliability, latest models |
| Anthropic (Claude 3.5 Sonnet/Haiku) | Planned | Chat, Vision, Function Calling | Long context (200K), reasoning |
| Google AI (Gemini 2.0 Flash, 1.5 Pro) | Planned | Chat, Embedding, Vision, Audio, Video | Multimodal, long context (2M), cost-effective |
| Hugging Face (open model hub) | Planned | Chat, Embedding, Vision, Specialized | Open models, experimentation, custom fine-tuning |
| Azure OpenAI (enterprise cloud) | Planned | Chat, Embedding, Vision, Function Calling | Enterprise compliance, regional data residency |
Provider Support Roadmap
George AI is expanding support for multiple AI providers to give you flexibility, cost optimization, and access to the best models for your use case.
Tier 1: High Priority
Anthropic (Claude) #866
Claude 3.5 Sonnet, Claude 3.5 Haiku
Long context (200K tokens), excellent reasoning, enterprise adoption
Google AI (Gemini) #867
Gemini 2.0 Flash, Gemini 1.5 Pro
Multimodal leader, ultra-long context (2M tokens), cost-effective
Hugging Face #868
500,000+ open models
Open models, custom fine-tuning, domain-specific models (legal, medical, code)
Azure OpenAI #869
GPT-4o, GPT-4 Turbo, GPT-3.5
Enterprise compliance (HIPAA, SOC2), regional data residency, Microsoft ecosystem
Tier 2: Medium Priority
Tier 3: Lower Priority
Want to Contribute?
Provider implementation is a great way to contribute to George AI! Check the linked GitHub issues for implementation details, or suggest a new provider we should add.
Managing AI Models
Navigate to AI Models
Go to Admin Panel → AI Models
URL: /admin/ai-models
Sync Models from Providers
Click the "Sync Models" button to discover available models
This will:
- Connect to all configured providers (Ollama, OpenAI)
- Discover available models
- Auto-detect capabilities (embedding, chat, vision, function calling)
- Update existing models (if already synced)
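Under the hood, a sync amounts to asking each provider for its model list. Here is a minimal sketch using Ollama's GET /api/tags and OpenAI's GET /v1/models endpoints (both real APIs); the `syncModels` helper and the `DiscoveredModel` shape are illustrative, not George AI's actual code:

```typescript
// Illustrative sketch of a provider sync step (not George AI's actual code).
// Ollama exposes GET /api/tags; OpenAI exposes GET /v1/models.

interface DiscoveredModel {
  provider: "ollama" | "openai";
  name: string;
}

async function syncModels(
  ollamaBaseUrl: string, // e.g. "http://ollama:11434"
  openAiApiKey?: string,
): Promise<DiscoveredModel[]> {
  const models: DiscoveredModel[] = [];

  // Ollama: list locally available models.
  const tags = await fetch(`${ollamaBaseUrl}/api/tags`).then((r) => r.json());
  for (const m of tags.models ?? []) {
    models.push({ provider: "ollama", name: m.name });
  }

  // OpenAI: list models the API key can access.
  if (openAiApiKey) {
    const res = await fetch("https://api.openai.com/v1/models", {
      headers: { Authorization: `Bearer ${openAiApiKey}` },
    }).then((r) => r.json());
    for (const m of res.data ?? []) {
      models.push({ provider: "openai", name: m.id });
    }
  }

  return models;
}
```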
Enable/Disable Models
Toggle models on/off to control which ones appear in selection dropdowns
View Statistics
See usage statistics for each model:
- Total requests
- Total tokens processed
- Used by (Libraries, Assistants, List Fields)
Workspace-Scoped Model Availability
Model dropdowns automatically filter based on your current workspace's configured AI providers.
This means users only see models from providers that are enabled in their current workspace, ensuring proper access control and preventing confusion.
How It Works
When you select a model for libraries, assistants, or list fields, George AI automatically shows only models from providers configured in your current workspace. Switching workspaces updates available models in real-time.
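The filtering rule itself is simple. The sketch below uses hypothetical `Model` and `Workspace` shapes, but the behavior matches what is described above:

```typescript
// Hypothetical data shapes; the rule mirrors the documented behavior:
// only models whose provider is configured in the current workspace
// (and that have the needed capability) are offered in dropdowns.

interface Model {
  name: string;
  provider: string;       // e.g. "ollama" or "openai"
  capabilities: string[]; // e.g. ["chat", "embedding"]
}

interface Workspace {
  name: string;
  configuredProviders: string[];
}

function modelsForWorkspace(all: Model[], ws: Workspace, capability: string): Model[] {
  return all.filter(
    (m) =>
      ws.configuredProviders.includes(m.provider) &&
      m.capabilities.includes(capability),
  );
}
```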
Benefits
- Zero configuration - filtering happens automatically
- Users can't accidentally select unavailable models
- Each workspace can use different AI providers
- Real-time updates when switching workspaces
- Applies to all model selections (embedding, chat, OCR)
Example
Workspace A: Configured with OpenAI
Users see: gpt-4o, gpt-4o-mini, text-embedding-3-small, etc.
Workspace B: Configured with Ollama
Users see: qwen3, gemma3, nomic-embed-text, etc.
Checking Available Models
Switch to your target workspace
Use the workspace switcher in the top navigation
Open any model selection dropdown
Library embedding models, assistant language models, etc.
View workspace-filtered models
Only models from workspace providers are shown
For detailed information about workspaces and provider configuration, see the Workspace Documentation.
Understanding Model Capabilities
Each model is automatically tagged with capabilities based on its name and provider information:
Chat Completion
For Assistants - conversational AI, question answering
Examples: gpt-4o, qwen3, gemma3
Embeddings
For Libraries - semantic search, vector generation
Examples: text-embedding-3-small, nomic-embed-text
Vision/OCR
For Libraries - image processing, document OCR
Examples: qwen3-vl, gemma3, gpt-4o
Function Calling
For Lists - structured data extraction from documents
Examples: gpt-4o, qwen3, gemma3
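The capability tags above are derived heuristically from model names. A simplified sketch of what such name-based detection might look like (George AI's actual rules may differ):

```typescript
// Illustrative name-based heuristic, not George AI's actual detection code.
// Patterns reflect common model naming conventions.

type Capability = "chat" | "embedding" | "vision" | "function-calling";

function detectCapabilities(modelName: string): Capability[] {
  const name = modelName.toLowerCase();
  const caps: Capability[] = [];

  if (/embed/.test(name)) {
    caps.push("embedding"); // e.g. text-embedding-3-small, nomic-embed-text
  } else {
    caps.push("chat");      // most non-embedding models support chat
  }
  if (/vision|-vl|gpt-4o/.test(name)) {
    caps.push("vision");    // e.g. qwen3-vl, llama3.2-vision, gpt-4o
  }
  if (/gpt-4|qwen|gemma/.test(name)) {
    caps.push("function-calling"); // models with native tool support
  }
  return caps;
}
```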
Where to Select Models
Embedding Model (Libraries)
Used for generating vector embeddings to enable semantic search
Where: Library Settings → Embedding Model dropdown
Required Capability: Embeddings
Recommended: text-embedding-3-small (OpenAI), nomic-embed-text (Ollama)
💡 Tip: Changing the embedding model requires reprocessing all documents, since vectors produced by different models are not comparable
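For context, an embedding call against OpenAI's real POST /v1/embeddings endpoint looks like the sketch below; George AI makes equivalent calls internally when indexing documents, though this is not its actual code:

```typescript
// Minimal embedding call against OpenAI's real POST /v1/embeddings endpoint.
async function embed(text: string, apiKey: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  const data = await res.json();
  return data.data[0].embedding; // 1536-dimensional vector for this model
}
```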
OCR Model (Libraries)
Used for extracting text from images and scanned PDFs via Optical Character Recognition
Where: Library Settings → Advanced → OCR Model dropdown
Required Capability: Vision
Recommended (Ollama): qwen2.5-vl (3B/7B/72B), qwen3-vl (2B/32B), gemma3 (1B/4B/12B/27B)
Recommended (OpenAI): gpt-4o-mini (cost-effective), gpt-4o, o3 (reasoning)
💡 Tip: For Ollama models, choose size (e.g., 7B, 32B) based on your GPU memory
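For reference, Ollama's real POST /api/chat endpoint accepts base64-encoded images on a message, which is how a vision model can be used for OCR. The model name and prompt below are illustrative:

```typescript
import { readFileSync } from "node:fs";

// Conceptual OCR call against Ollama's real POST /api/chat endpoint.
// Model tag and prompt are examples, not George AI's internal values.
async function ocrImage(path: string, baseUrl = "http://ollama:11434"): Promise<string> {
  const image = readFileSync(path).toString("base64");
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5vl:7b",
      stream: false,
      messages: [
        {
          role: "user",
          content: "Extract all text from this image.",
          images: [image], // base64-encoded image data
        },
      ],
    }),
  });
  const data = await res.json();
  return data.message.content;
}
```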
Language Model (Assistants)
Used for conversational AI when chatting with your Assistant
Where: Assistant Settings → Language Model dropdown
Required Capability: Chat Completion
Recommended (Ollama): qwen3 (4B/8B/14B/32B), qwen2.5 (3B/14B/32B), gemma3 (1B/4B/12B/27B)
Recommended (OpenAI): gpt-4o-mini (fast/cheap), gpt-4o, o3 (reasoning)
💡 Tip: Larger models (32B, 27B) provide better answers but need more GPU memory
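For reference, a bare chat completion against OpenAI's real POST /v1/chat/completions endpoint; George AI's assistants add retrieval context on top of a call like this:

```typescript
// Minimal chat call against OpenAI's real POST /v1/chat/completions endpoint.
async function ask(question: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: question }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```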
Language Model (List Fields)
Used for extracting structured data from documents (e.g., extract invoice numbers, dates, amounts)
Where: List → Field Settings → Language Model dropdown
Required Capability: Function Calling (preferred) or Chat Completion
Recommended (Ollama): qwen3 (4B/8B/14B/32B), qwen2.5 (3B/14B/32B), llama3.3 (70B), mistral (7B), deepseek-r1 (reasoning)
Recommended (OpenAI): gpt-4o-mini (fast/cheap), gpt-4o, o3 (reasoning)
💡 Tip: Use qwen3/qwen2.5 for best Ollama function calling. Models with native tool support provide better structured extraction.
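Structured extraction rides on the provider's function calling API. The sketch below uses OpenAI's real `tools` parameter with a made-up invoice schema matching the example above:

```typescript
// Structured extraction via OpenAI's real function calling ("tools") API.
// The record_invoice schema is a made-up example for the List-field use case.
async function extractInvoiceFields(documentText: string, apiKey: string) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: documentText }],
      tools: [
        {
          type: "function",
          function: {
            name: "record_invoice",
            description: "Record fields extracted from an invoice",
            parameters: {
              type: "object",
              properties: {
                invoiceNumber: { type: "string" },
                date: { type: "string" },
                amount: { type: "number" },
              },
              required: ["invoiceNumber", "date", "amount"],
            },
          },
        },
      ],
      // Force the model to call the extraction function.
      tool_choice: { type: "function", function: { name: "record_invoice" } },
    }),
  });
  const data = await res.json();
  // The model returns the arguments as a JSON string on the tool call.
  return JSON.parse(data.choices[0].message.tool_calls[0].function.arguments);
}
```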
Monitoring Usage & Costs
George AI automatically tracks usage for all AI models to help you understand costs and optimize performance.
Usage Tracking Details
View detailed usage per model in the AI Models page. Usage includes request count, token usage, and average processing time.
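A hypothetical usage row might look like the following; note that average processing time is derived from totals rather than stored:

```typescript
// Hypothetical shape of a per-model usage row, matching the metrics the
// AI Models page reports (request count, tokens, average processing time).
interface ModelUsage {
  model: string;
  requests: number;
  totalTokens: number;
  totalProcessingMs: number;
}

// Average processing time is derived, not stored:
const avgMs = (u: ModelUsage) => (u.requests ? u.totalProcessingMs / u.requests : 0);
```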
Configuring AI Providers
Required Setup
Every workspace needs at least one AI provider (Ollama or OpenAI) configured to use AI features like embeddings, chat, and enrichments.
How to configure providers for your workspace:
- Go to Settings → AI Services in your workspace
- Click "Add Provider"
- Configure the provider details (see below)
- Click "Save" and then "Sync Models" to discover available models
Ollama Configuration
Provider: Select "Ollama"
Name: Descriptive name (e.g., "GPU Server 1")
Base URL: API endpoint (e.g., http://ollama:11434)
API Key: Optional (leave empty if not required)
VRAM: GPU memory in GB (e.g., 32) - used for load balancing
OpenAI Configuration
Provider: Select "OpenAI"
Name: Descriptive name (e.g., "OpenAI Production")
Base URL: Optional (defaults to https://api.openai.com/v1)
API Key: Required (your OpenAI API key)
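Put together, a workspace's provider configuration conceptually looks like this (field names are illustrative, mirroring the form fields above):

```typescript
// Hypothetical provider records mirroring the configuration forms above;
// field names are illustrative, not George AI's actual schema.
const providers = [
  {
    provider: "ollama",
    name: "GPU Server 1",
    baseUrl: "http://ollama:11434",
    apiKey: null,  // optional for Ollama
    vramGb: 32,    // used for load balancing
  },
  {
    provider: "openai",
    name: "OpenAI Production",
    baseUrl: "https://api.openai.com/v1",  // the default
    apiKey: process.env.OPENAI_API_KEY,    // required
  },
];
```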
Pro Tip: Multiple Instances
You can add multiple provider instances to the same workspace for load balancing and high availability. See details below.
For high-load scenarios, you can add multiple instances of the same provider type to distribute AI processing across multiple servers.
Benefits of Multiple Instances:
- Load Distribution: Spread AI requests across multiple GPU servers
- High Availability: Automatic failover if one server goes offline
- Scalability: Add more capacity as your workload grows
How to Add Multiple Instances:
- Click "Add Provider" for each additional Ollama server you have
- Give each instance a unique name (e.g., "GPU Server 1", "GPU Server 2")
- Configure the base URL pointing to each server
- Set the VRAM value for each instance (used for intelligent routing)
- After adding all instances, click "Sync Models" to discover available models
⚙️ Automatic Load Balancing (Ollama Only)
When multiple Ollama instances are configured, George AI automatically:
- Routes requests to the instance with the most available GPU memory
- Checks which instances have the requested model loaded
- Performs automatic failover if an instance goes offline
- Deduplicates models (same model on multiple instances = one entry in dropdowns)
- Monitors instance health and load in real-time
💡 Note: Load balancing is Ollama-specific. For OpenAI, you typically only need one provider instance per workspace.
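Conceptually, the routing rule looks like the sketch below, assuming hypothetical per-instance stats: prefer instances that already have the model loaded, fail over to any online instance, and pick the one with the most free VRAM:

```typescript
// Illustrative routing rule with hypothetical per-instance stats;
// George AI's actual scheduler may differ.
interface InstanceStats {
  name: string;
  online: boolean;
  freeVramGb: number;
  loadedModels: string[];
}

function pickInstance(instances: InstanceStats[], model: string): InstanceStats | undefined {
  const online = instances.filter((i) => i.online);
  // Prefer instances that already have the model loaded...
  const warm = online.filter((i) => i.loadedModels.includes(model));
  // ...otherwise fail over to any online instance.
  const pool = warm.length > 0 ? warm : online;
  // Route to the instance with the most available GPU memory.
  return pool.sort((a, b) => b.freeVramGb - a.freeVramGb)[0];
}
```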
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| No models appear after sync | No providers configured | Configure at least one provider (Ollama or OpenAI) and restart the application. Then sync models. |
| Sync fails for OpenAI | Invalid API key or network issue | Check Settings → AI Services and verify the OpenAI provider's API Key is correct. Check logs for error details. |
| Sync fails for Ollama | Ollama not running or incorrect URL | Check Settings → AI Services and verify the Ollama provider's Base URL is correct and accessible. Ensure Ollama service is running. |
| Model not available in dropdown | Model is disabled or doesn't have required capability | Go to Admin → AI Models and enable the model. Verify it has the required capability (e.g., embeddings, chat). |
| OCR not working on images | No vision-capable model selected | Select a vision model in Library Settings → OCR Model (e.g., gpt-4o, llama3.2-vision) |
| High OpenAI costs | Using expensive models for all operations | Use smaller models for embeddings (text-embedding-3-small) and consider Ollama for high-volume operations. |
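When an Ollama sync fails, a quick reachability check narrows the problem down. The sketch below uses Ollama's real GET /api/version endpoint; run it from the machine (or container network) where George AI runs, since "localhost" inside a container is not the host:

```typescript
// Quick connectivity check for an Ollama provider, using the real
// GET /api/version endpoint.
async function checkOllama(baseUrl: string): Promise<void> {
  try {
    const res = await fetch(`${baseUrl}/api/version`);
    const { version } = await res.json();
    console.log(`Ollama reachable at ${baseUrl} (version ${version})`);
  } catch (err) {
    console.error(`Ollama NOT reachable at ${baseUrl}:`, err);
  }
}

checkOllama("http://ollama:11434");
```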
Best Practices
Mix Providers for Cost Optimization
Use Ollama for high-volume operations (embeddings) and OpenAI for quality-critical tasks (chat, OCR)
Sync Models Regularly
Re-sync whenever you pull new models into Ollama or OpenAI releases new ones, so they become available for selection
Don't Disable Models in Active Use
Check usage statistics before disabling. If a model is used by Libraries/Assistants/Lists, they'll need reconfiguration.