Open Source Models Reference
Best-effort snapshot of ~30 prominent open-source / open-weight models and their native capabilities
Includes the Llama, Gemma, Qwen, DeepSeek, Phi, and Mistral families, and more
- Vision Out (Gen): model can GENERATE images
- Tools/Function Calls: model explicitly documents tool/function calling (not just JSON prompting)
- Embedding: only dedicated embedding models are marked ✅ (any LLM can be repurposed via its hidden states)
Model Families Overview
Llama (Meta)
Llama 3.1 text models (8B/70B/405B), Llama 3.2 vision models (11B/90B), Llama 4 Scout natively multimodal MoE with long context
Gemma 2 & 3 (Google)
Gemma 2 (2B/9B/27B) compact text models; Gemma 3 adds native image input, with Gemma 3n extending to audio and video
Qwen2.5 / Qwen3 (Alibaba)
Qwen2.5 text + VL, Qwen3-Omni unified text-image-audio-video with real-time voice I/O and agent focus
DeepSeek (V3 / R1 / Janus)
V3 MoE text, R1 reasoning with "thinking" traces, Janus-Pro unified text+image understanding AND generation
Phi-4 (Microsoft)
Phi-4 compact high-quality text, Phi-4-multimodal adds image+audio input in a single network
Mistral / Mixtral / Pixtral
Mistral-7B with function calling, Mixtral MoE (8x7B/8x22B), Pixtral-12B Apache-2.0 VLM
Top ~30 Open Source Models - Capabilities Matrix
Capabilities shown are for the released base/instruct models. Specific checkpoints and sizes may vary.
| # | Model | Type / Family | Vision In (OCR) | Vision Out (Gen) | Audio In | Audio Out | Chat | Tools | Reasoning |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Llama 4 Scout | Meta, multimodal MoE | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 2 | Llama 3.2 90B Vision | Meta, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 3 | Llama 3.1 70B Instruct | Meta, chat | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 4 | Gemma 2 27B | Google, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 5 | Gemma 2 9B | Google, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 6 | Gemma 3 12B VL | Google, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 7 | Gemma 3n | Google, multimodal | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ |
| 8 | Qwen2.5 72B Instruct | Alibaba, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 9 | Qwen2.5-VL 32B | Alibaba, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 10 | Qwen2-Audio 7B | Alibaba, audio | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ |
| 11 | Qwen3 235B | Alibaba, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 12 | Qwen3-Omni-30B | Alibaba, omni | ✅ | ❌ | ✅ | ✅ Full duplex | ✅ | ✅ | ✅ |
| 13 | DeepSeek-V3 | DeepSeek, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 14 | DeepSeek-R1 | DeepSeek, reasoning | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ Thinking |
| 15 | DeepSeek Janus-Pro-7B | DeepSeek, vision+gen | ✅ | ✅ Generate | ❌ | ❌ | ✅ | ❌ | ✅ |
| 16 | Phi-4 | Microsoft, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 17 | Phi-4-multimodal | Microsoft, multimodal | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ |
| 18 | Mistral-7B-Instruct-v0.3 | Mistral, chat | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 19 | Mixtral-8x7B-Instruct | Mistral, MoE | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 20 | Mixtral-8x22B-Instruct | Mistral, MoE | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 21 | Pixtral-12B | Mistral, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 22 | Falcon 2 11B | TII, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| 23 | Falcon 2 11B VLM | TII, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| 24 | Yi-1.5-34B-Chat | 01.AI, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| 25 | Yi-VL-34B | 01.AI, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| 26 | GLM-4-9B-Chat | Zhipu, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 27 | GLM-4.1V-9B-Thinking | Zhipu, VLM | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ Thinking |
| 28 | Nemotron-4-340B-Instruct | NVIDIA, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| 29 | GPT-OSS-120B | OpenAI, text | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| 30 | Jamba-1.5-Large | AI21, text | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
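For programmatic filtering, the matrix maps naturally onto a small lookup table. Below is a minimal sketch of encoding it in Python; the `Capabilities` fields mirror the table columns, while the model identifiers and the `models_with` helper are hypothetical conveniences, not part of any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capabilities:
    """Capability flags for one released base/instruct model, mirroring the table columns."""
    vision_in: bool = False   # Vision In (OCR)
    vision_out: bool = False  # Vision Out (Gen)
    audio_in: bool = False
    audio_out: bool = False
    chat: bool = True
    tools: bool = False
    reasoning: bool = True

# Subset of the matrix above; extend with the remaining rows as needed.
MODELS: dict[str, Capabilities] = {
    "llama-4-scout": Capabilities(vision_in=True, tools=True),
    "qwen3-omni-30b": Capabilities(vision_in=True, audio_in=True, audio_out=True, tools=True),
    "deepseek-r1": Capabilities(),  # reasoning model, no documented tool calling
    "janus-pro-7b": Capabilities(vision_in=True, vision_out=True),
}

def models_with(**required: bool) -> list[str]:
    """Return model ids whose flags match all required capability values."""
    return [
        name for name, caps in MODELS.items()
        if all(getattr(caps, flag) == want for flag, want in required.items())
    ]

print(models_with(vision_in=True, tools=True))  # -> ['llama-4-scout', 'qwen3-omni-30b']
```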
How to Read This / Practical Use
🎨 Best OSS Multimodal (Vision)
Models with Vision In (OCR) capabilities (see the request sketch after this list):
- Llama 4 Scout, Llama 3.2 90B Vision
- Gemma 3 12B VL, Gemma 3n
- Qwen2.5-VL 32B, Qwen3-Omni-30B
- DeepSeek Janus-Pro-7B (also generates!)
- Phi-4-multimodal
- Pixtral-12B (Apache 2.0)
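All of these accept images through the OpenAI-compatible chat format (base64 data URLs in `image_url` content parts), which vLLM and similar servers implement. A minimal sketch, assuming a local vLLM instance serving Pixtral-12B; the base URL and model identifier are deployment-specific placeholders.

```python
import base64
from openai import OpenAI  # pip install openai

# Point the client at a local OpenAI-compatible server (e.g., vLLM serving
# Pixtral-12B); base_url, api_key, and the model name depend on your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("invoice.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="mistralai/Pixtral-12B-2409",  # deployment-specific identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this image (OCR)."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```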
🔧 Best for Agents / Tools
Models with documented function/tool calling (see the call sketch after this list):
- Llama 3.1/4 Scout
- Gemma 2/3
- Qwen2.5/Qwen3/Qwen3-Omni
- GLM-4
- Mistral/Mixtral
- Phi-4
- GPT-OSS-120B
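These models emit structured tool calls when the request includes a `tools` array in the OpenAI-compatible format that most local servers (vLLM, Ollama) support. A minimal sketch, assuming an Ollama instance hosting a Qwen2.5 checkpoint; the endpoint, model tag, and `get_weather` tool are illustrative placeholders.

```python
import json
from openai import OpenAI  # pip install openai

# Assumes a local OpenAI-compatible server (e.g., Ollama) hosting one of the
# tool-capable models above; base_url and the model tag are placeholders.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen2.5:72b-instruct",  # deployment-specific tag
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# A tool-capable model returns a structured call instead of prose.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```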
🎤 Audio In/Out
Realistic fully-open choices:
- Qwen3-Omni-30B - Full duplex voice I/O
- Qwen2-Audio 7B - Speech in, text out
- Phi-4-multimodal - Audio input focus
- Gemma 3n - Multimodal with audio+video
🧠 Reasoning Models
Models with explicit reasoning capabilities (see the trace-parsing sketch after this list):
- DeepSeek-R1 - "Thinking" traces
- GLM-4.1V-9B-Thinking - VLM with reasoning
- Llama 4 Scout - Fast long-context reasoning
- GPT-OSS-120B - High reasoning, tool-use
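DeepSeek-R1's raw completions wrap the reasoning trace in `<think>...</think>` tags before the final answer (some serving stacks strip these into a separate field instead). A minimal sketch of separating trace from answer, assuming the tags arrive verbatim in the output text:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate a DeepSeek-R1-style '<think>...</think>' trace from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()  # no trace present, e.g. already stripped by the server
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

raw = "<think>17 is odd and not divisible by 3 or 5, so it is prime.</think>Yes, 17 is prime."
trace, answer = split_thinking(raw)
print(answer)  # -> Yes, 17 is prime.
```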
Recommended Open Source Models for George AI
💬 Chat Assistants
General purpose chat
- Qwen2.5 72B - Excellent quality
- Llama 3.1 70B - Meta flagship
- Gemma 2 27B - Compact, efficient
👁️ Vision In (OCR)
Reading/analyzing images
- Qwen2.5-VL 32B - Best quality
- Llama 3.2 90B Vision - Meta VLM
- Pixtral-12B - Apache 2.0
🎨 Vision Out (Gen)
Generating images
- Janus-Pro-7B - Only OSS unified model
- Most OSS models don't generate images
✨ Function Calling
Structured data extraction
- Qwen2.5/Qwen3 - Best tools support
- Llama 3.1/4 - Meta tools
- Gemma 2/3 - Compact + tools
💡 Tip: Most open source models focus on Vision In (understanding) rather than Vision Out (generation). For image generation, consider using OpenAI's dall-e-3 or gpt-image-1, or the specialized OSS model DeepSeek Janus-Pro-7B.
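For reference, a minimal sketch of generating an image through OpenAI's Images API (requires an `OPENAI_API_KEY`); `gpt-image-1` returns base64-encoded image data:

```python
import base64
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the environment

client = OpenAI()

result = client.images.generate(
    model="gpt-image-1",
    prompt="A minimalist logo of a lighthouse, flat vector style",
    size="1024x1024",
)

# Decode the base64 payload and save it to disk.
with open("logo.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```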
Other Notable Models
George AI's model classifier automatically detects 100+ open source models beyond this reference. Below are commonly used models not in the top 30 above.
🔍 Embedding Models (OSS)
Note: The top 30 table focuses on chat/vision models. Here are popular open source embedding models for semantic search:
| Model | Provider | Typical Size | Notes |
|---|---|---|---|
| nomic-embed-text | Nomic AI | 137M params | Very popular for Ollama, efficient |
| mxbai-embed-large | MixedBread AI | 335M params | High quality, Apache 2.0 |
| bge-large | BAAI (Beijing Academy of AI) | 335M params | Excellent quality, widely used |
| all-minilm-l6-v2 | Sentence Transformers | 22M params | Very fast, good for large-scale |
| snowflake-arctic-embed | Snowflake | 335M params | Optimized for retrieval |
| granite-embedding | IBM | 278M params | Enterprise-focused |
Recommendation: nomic-embed-text (fastest, good quality) or bge-large (best quality); avoid tiny models (<100M params) for production. A usage sketch follows.
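A minimal sketch of semantic search with one of the models above, using the sentence-transformers library and the 22M-parameter all-MiniLM-L6-v2 checkpoint; the example documents are illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Small, fast checkpoint from the table; swap in a larger one such as
# BAAI/bge-large-en-v1.5 for higher retrieval quality.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Pixtral-12B is an Apache-2.0 vision-language model.",
    "DeepSeek-R1 emits explicit thinking traces.",
]
query = "Which model reads text from images?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
scores = np.dot(doc_vecs, query_vec)
print(docs[int(np.argmax(scores))])  # prints the best-matching document
```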
🌟 Other Popular Models
Additional models commonly used in the open source community:
👁️ Vision Models
- LLaVA (7B-34B) - First major open VLM, very popular in the community. Good quality but now surpassed by newer models like Llama 3.2 Vision.
- MiniCPM-V (8B) - Compact multimodal model from OpenBMB. Efficient for edge deployment.
- InternVL (multiple sizes) - Shanghai AI Lab's vision-language model. Strong performance on VQA tasks.
- CogVLM (17B) - Tsinghua's cognitive VLM with strong reasoning capabilities.
- Moondream (1.6B) - Tiny vision model for resource-constrained environments. Surprisingly capable for its size.
💬 Chat Models
- CodeLlama (7B-70B) - Meta's code-specialized Llama variant. Excellent for code generation and understanding.
- TinyLlama (1.1B) - Compact model trained on 3T tokens. Good for edge devices and testing.
- Vicuna (7B-33B) - UC Berkeley's chat model fine-tuned from Llama. Early popular model, now less common.
- Alpaca (7B-65B) - Stanford's instruction-following model. Historical significance but now superseded.
- Dolphin (multiple sizes) - Fine-tuned models with reduced censorship. Popular in the community but use with caution.