Open Source Models Reference

A best-effort snapshot of ~30 prominent open-source / open-weight models and their native capabilities.

Covers the Llama, Gemma, Qwen, DeepSeek, Phi, and Mistral families, among others.

Legend & Important Notes
✅ = Supported · ❌ = Not supported
Vision In (OCR): Model can READ/analyze images (for vision understanding, OCR)
Vision Out (Gen): Model can GENERATE images
Tools/Function Calls: Model explicitly documents tool/function-calling (not just JSON prompting)
Embedding: Only dedicated embedding models are marked ✅ (any LLM's hidden states can be repurposed as embeddings)

Model Families Overview

Llama (Meta)

Open Weights Tool Calling

Llama 3.1 text models (8B/70B/405B), Llama 3.2 vision models (11B/90B), Llama 4 Scout for fast reasoning

Gemma 2 & 3 (Google)

Open Weights Multimodal

Gemma 2 (2B/9B/27B) compact text models; Gemma 3 adds native multimodality (text+image), and Gemma 3n extends it with audio and video input

Qwen2.5 / Qwen3 (Alibaba)

Open Weights Omni

Qwen2.5 text + VL, Qwen3-Omni unified text-image-audio-video with real-time voice I/O and agent focus

DeepSeek (V3 / R1 / Janus)

Open Weights Reasoning

V3 MoE text, R1 reasoning with "thinking" traces, Janus-Pro unified text+image understanding AND generation

Phi-4 (Microsoft)

Open Weights Compact

Phi-4 compact high-quality text, Phi-4-multimodal adds image+audio input in a single network

Mistral / Mixtral / Pixtral

Apache 2.0 MoE

Mistral-7B with function calling, Mixtral MoE (8x7B/8x22B), Pixtral-12B Apache-2.0 VLM

Top ~30 Open Source Models - Capabilities Matrix

Capabilities shown are for the released base/instruct models. Specific checkpoints and sizes may vary.

| # | Model | Type / Family | Vision In (OCR) | Vision Out (Gen) | Audio In | Audio Out | Chat | Tools | Reasoning |
|---|-------|---------------|-----------------|------------------|----------|-----------|------|-------|-----------|
| 1 | Llama 4 Scout | Meta, text SLM | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 2 | Llama 3.2 90B Vision | Meta, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 3 | Llama 3.1 70B Instruct | Meta, chat | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 4 | Gemma 2 27B | Google, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 5 | Gemma 2 9B | Google, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 6 | Gemma 3 12B VL | Google, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 7 | Gemma 3n | Google, multimodal | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ |
| 8 | Qwen2.5 72B Instruct | Alibaba, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 9 | Qwen2.5-VL 32B | Alibaba, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 10 | Qwen2-Audio 7B | Alibaba, audio | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ |
| 11 | Qwen3 235B | Alibaba, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 12 | Qwen3-Omni-30B | Alibaba, omni | ✅ | ❌ | ✅ | ✅ Full duplex | ✅ | ✅ | ❌ |
| 13 | DeepSeek-V3 | DeepSeek, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 14 | DeepSeek-R1 | DeepSeek, reasoning | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ Thinking |
| 15 | DeepSeek Janus-Pro-7B | DeepSeek, vision+gen | ✅ | ✅ Generate | ❌ | ❌ | ✅ | ❌ | ❌ |
| 16 | Phi-4 | Microsoft, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 17 | Phi-4-multimodal | Microsoft, multimodal | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ |
| 18 | Mistral-7B-Instruct-v0.3 | Mistral, chat | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 19 | Mixtral-8x7B-Instruct | Mistral, MoE | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 20 | Mixtral-8x22B-Instruct | Mistral, MoE | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 21 | Pixtral-12B | Mistral, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 22 | Falcon 2 11B | TII, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 23 | Falcon 2 11B VLM | TII, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 24 | Yi-1.5-34B-Chat | 01.AI, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 25 | Yi-VL-34B | 01.AI, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 26 | GLM-4-9B-Chat | Zhipu, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| 27 | GLM-4.1V-9B-Thinking | Zhipu, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ Thinking |
| 28 | Nemotron-4-340B-Instruct | NVIDIA, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
| 29 | GPT-OSS-120B | OpenAI, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 30 | Jamba-1.5-Large | AI21, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
Note: Only DeepSeek Janus-Pro-7B natively supports both Vision In (understanding images) AND Vision Out (generating images) in a unified model. Qwen3-Omni-30B is the standout for full duplex voice (audio in + out).

How to Read This / Practical Use

🎨 Best OSS Multimodal (Vision)

Models with Vision In (OCR) capabilities (a request sketch follows the list):

  • Llama 3.2 90B Vision
  • Gemma 3 12B VL, Gemma 3n
  • Qwen2.5-VL 32B, Qwen3-Omni-30B
  • DeepSeek Janus-Pro-7B (also generates!)
  • Phi-4-multimodal
  • Pixtral-12B (Apache 2.0)
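
A minimal sketch of a vision-in request against a local Ollama instance, assuming one of the models above has been pulled; the model tag, host, and file path are illustrative examples, not fixed George AI configuration:

```typescript
// Minimal sketch: OCR / image understanding via Ollama's /api/chat.
// Assumes a vision-capable model (example tag below) is available locally.
import { readFile } from "node:fs/promises";

async function describeImage(path: string): Promise<string> {
  const image = (await readFile(path)).toString("base64");
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2-vision:90b", // example tag; check `ollama list` for yours
      stream: false,
      messages: [{
        role: "user",
        content: "Extract all text from this image, then summarize it.",
        images: [image], // Ollama accepts base64-encoded images on the message
      }],
    }),
  });
  const { message } = await res.json();
  return message.content;
}

console.log(await describeImage("./invoice.png"));
```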

🔧 Best for Agents / Tools

Models with documented function/tool calling (see the request sketch after this list):

  • Llama 3.1/4 Scout
  • Gemma 2/3
  • Qwen2.5/Qwen3/Qwen3-Omni
  • GLM-4
  • Mistral/Mixtral
  • Phi-4
  • GPT-OSS-120B
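
A minimal sketch of documented tool calling through Ollama's /api/chat; the get_weather tool is a hypothetical function your application would implement, and the model tag is just an example from the list above:

```typescript
// Minimal sketch: function calling with a tools-capable model via Ollama.
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5:72b", // example; any tools-capable model above should work
    stream: false,
    messages: [{ role: "user", content: "What's the weather in Berlin?" }],
    tools: [{
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool, implemented by your application
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    }],
  }),
});
const { message } = await res.json();
// A tools-capable model emits structured calls instead of prose when appropriate.
for (const call of message.tool_calls ?? []) {
  console.log(call.function.name, call.function.arguments); // get_weather { city: "Berlin" }
}
```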

🎤 Audio In/Out

Realistic options if you want a fully open stack (a hedged sketch follows the list):

  • Qwen3-Omni-30B - Full duplex voice I/O
  • Qwen2-Audio 7B - Speech in, text out
  • Phi-4-multimodal - Audio input focus
  • Gemma 3n - Multimodal with audio+video
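
Ollama does not currently expose audio input, so a sketch here has to assume a different server. The example below assumes a vLLM-style OpenAI-compatible endpoint hosting Qwen2-Audio and uses its audio_url content part; the URL, port, and payload shape are assumptions to verify against your deployment:

```typescript
// Hedged sketch: speech in, text out, assuming a vLLM-style OpenAI-compatible
// server for Qwen2-Audio. Endpoint and content shape may differ in your setup.
const res = await fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "Qwen/Qwen2-Audio-7B-Instruct",
    messages: [{
      role: "user",
      content: [
        { type: "audio_url", audio_url: { url: "https://example.com/clip.wav" } },
        { type: "text", text: "Transcribe this clip, then summarize it in one sentence." },
      ],
    }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content); // transcript + summary as plain text
```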

🧠 Reasoning Models

Models with explicit reasoning capabilities (a parsing sketch follows the list):

  • DeepSeek-R1 - "Thinking" traces
  • GLM-4.1V-9B-Thinking - VLM with reasoning
  • Llama 4 Scout - Fast long-context reasoning
  • GPT-OSS-120B - High reasoning, tool-use
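
Reasoning models such as DeepSeek-R1 return their chain of thought inline, wrapped in <think>…</think> tags. A small sketch for separating the trace from the final answer; note that some runtimes already split the trace out for you, so check the raw response shape first:

```typescript
// Minimal sketch: split a DeepSeek-R1-style response into thinking trace + answer.
function splitThinking(raw: string): { thinking: string; answer: string } {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  return {
    thinking: match?.[1].trim() ?? "",
    answer: raw.replace(/<think>[\s\S]*?<\/think>/, "").trim(),
  };
}

const { thinking, answer } = splitThinking(
  "<think>The user wants 12 * 12, which is 144.</think>12 × 12 = 144."
);
console.log(thinking); // "The user wants 12 * 12, which is 144."
console.log(answer);   // "12 × 12 = 144."
```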

Recommended Open Source Models for George AI

💬 Chat Assistants

General purpose chat

  • Qwen2.5 72B - Excellent quality
  • Llama 3.1 70B - Meta flagship
  • Gemma 2 27B - Compact, efficient

👁️ Vision In (OCR)

Reading/analyzing images

  • Qwen2.5-VL 32B - Best quality
  • Llama 3.2 90B Vision - Meta VLM
  • Pixtral-12B - Apache 2.0

🎨 Vision Out (Gen)

Generating images

  • Janus-Pro-7B - The only OSS model here that both understands and generates images
  • Most OSS models don't generate images

✨ Function Calling

Structured data extraction (see the sketch after this list)

  • Qwen2.5/Qwen3 - Best tools support
  • Llama 3.1/4 - Meta tools
  • Gemma 2/3 - Compact + tools
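
Beyond tool calls, Ollama can constrain a response to a JSON schema via its format parameter, which is often the simplest route for structured extraction; the schema, prompt, and model tag below are illustrative:

```typescript
// Minimal sketch: schema-constrained extraction via Ollama's `format` parameter.
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5:72b", // example tag
    stream: false,
    messages: [{ role: "user", content: "Extract fields from: 'Invoice #42, total 19.99 EUR'" }],
    format: { // JSON schema the model's output must conform to
      type: "object",
      properties: {
        invoiceNumber: { type: "integer" },
        total: { type: "number" },
        currency: { type: "string" },
      },
      required: ["invoiceNumber", "total", "currency"],
    },
  }),
});
const { message } = await res.json();
console.log(JSON.parse(message.content)); // { invoiceNumber: 42, total: 19.99, currency: "EUR" }
```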

💡 Tip: Most open source models focus on Vision In (understanding) rather than Vision Out (generation). For image generation, consider using OpenAI's dall-e-3 or gpt-image-1, or the specialized OSS model DeepSeek Janus-Pro-7B.
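
If you do route image generation to a hosted API as the tip suggests, it is a single request; a sketch against OpenAI's Images API (requires OPENAI_API_KEY in the environment, prompt is an example):

```typescript
// Sketch: hosted image generation, since the OSS models above are vision-in only.
const res = await fetch("https://api.openai.com/v1/images/generations", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-image-1",
    prompt: "A minimalist logo for a document-search assistant",
  }),
});
const data = await res.json();
console.log(data.data[0].b64_json?.slice(0, 32)); // gpt-image-1 returns base64 image data
```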

Other Notable Models

George AI's model classifier automatically detects 100+ open source models beyond this reference. Below are commonly used models not in the top 30 above.

Auto-Detection
If you have a model in your Ollama instance that's not listed here, George AI will still attempt to classify it automatically based on naming patterns. For the complete list of supported patterns, see the model-classifier.ts source code.
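
To give a feel for how naming-pattern classification works, here is a purely illustrative sketch; the real patterns and categories live in model-classifier.ts and will differ:

```typescript
// Illustrative sketch only: NOT the actual model-classifier.ts patterns.
type Capability = "embedding" | "vision" | "audio" | "reasoning" | "chat";

function classifyByName(model: string): Capability {
  const name = model.toLowerCase();
  if (/(embed|bge|minilm|arctic)/.test(name)) return "embedding"; // hypothetical patterns
  if (/(vision|-vl|llava|pixtral|moondream)/.test(name)) return "vision";
  if (/(audio|omni)/.test(name)) return "audio";
  if (/(-r1|thinking|gpt-oss)/.test(name)) return "reasoning";
  return "chat"; // sensible default for unknown names
}

console.log(classifyByName("qwen2.5-vl:32b"));   // "vision"
console.log(classifyByName("nomic-embed-text")); // "embedding"
console.log(classifyByName("llama3.1:70b"));     // "chat"
```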

🔍 Embedding Models (OSS)

Note: The top 30 table focuses on chat/vision models. Here are popular open source embedding models for semantic search:

| Model | Provider | Typical Size | Notes |
|-------|----------|--------------|-------|
| nomic-embed-text | Nomic AI | 137M params | Very popular for Ollama, efficient |
| mxbai-embed-large | MixedBread AI | 335M params | High quality, Apache 2.0 |
| bge-large | BAAI (Beijing Academy of AI) | 335M params | Excellent quality, widely used |
| all-minilm-l6-v2 | Sentence Transformers | 22M params | Very fast, good for large-scale use |
| snowflake-arctic-embed | Snowflake | 335M params | Optimized for retrieval |
| granite-embedding | IBM | 278M params | Enterprise-focused |
💡 Recommendation: For George AI, use nomic-embed-text (fastest, good quality) or bge-large (best quality). Avoid tiny models (<100M params) for production.
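
A minimal sketch of semantic similarity with one of these models served by Ollama's /api/embed endpoint; the model tag and texts are examples:

```typescript
// Minimal sketch: embed two texts with Ollama and compare them by cosine similarity.
async function embed(texts: string[]): Promise<number[][]> {
  const res = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", input: texts }),
  });
  const { embeddings } = await res.json();
  return embeddings;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const [query, doc] = await embed([
  "how do I reset my password?",
  "Password reset guide for new users",
]);
console.log(cosine(query, doc)); // closer to 1 = more semantically similar
```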

🌟 Other Popular Models

Additional models commonly used in the open source community:

👁️ Vision Models

  • LLaVA (7B-34B)
    First major open VLM, very popular in community. Good quality but now surpassed by newer models like Llama 3.2 Vision.
  • MiniCPM-V (8B)
    Compact multimodal model from OpenBMB. Efficient for edge deployment.
  • InternVL (multiple sizes)
    Shanghai AI Lab's vision-language model. Strong performance on VQA tasks.
  • CogVLM (17B)
    Tsinghua's cognitive VLM with strong reasoning capabilities.
  • Moondream (1.6B)
    Tiny vision model for resource-constrained environments. Surprisingly capable for its size.

💬 Chat Models

  • CodeLlama (7B-70B)
    Meta's code-specialized Llama variant. Excellent for code generation and understanding.
  • TinyLlama (1.1B)
    Compact model trained on 3T tokens. Good for edge devices and testing.
  • Vicuna (7B-33B)
    UC Berkeley's chat model fine-tuned from Llama. Early popular model, now less common.
  • Alpaca (7B-65B)
    Stanford's instruction-following model. Historical significance but now superseded.
  • Dolphin (multiple sizes)
    Fine-tuned models with reduced censorship. Popular in community but use with caution.
⚠️ Note: Models like Alpaca, Vicuna, and LLaVA were groundbreaking when released but are now superseded by newer models in the top 30 table. They're listed here for reference and because they're still widely deployed.
See Also
Need hosted API access? Check out the OpenAI Models Reference for 40+ models including GPT-4o, GPT-5, o-series reasoning models, embeddings, and image generation.