3.1 Overview of Mainstream Large Language Models (LLMs)

A Crowded Field: Scanning Mainstream Large Language Models (LLMs)

Large Language Models (LLMs) are the core engines of the current wave of artificial intelligence (AI). Like intelligent “central processing units,” they drive a remarkable range of applications, from fluent conversation and astonishing text generation to complex logical reasoning, code writing, and even cross-modal understanding. For legal professionals, whether you use public chatbot tools like ChatGPT, Claude, or DeepSeek directly, or encounter them indirectly through specialized Legal Tech products that integrate LLM capabilities, understanding the characteristics, strengths, weaknesses, technical focuses, and potential application scenarios of the dominant mainstream LLMs has become critically important.

This knowledge not only helps you choose the tools best suited to your needs but also allows you to set reasonable expectations for their capabilities and remain vigilant about potential risks (such as “hallucinations,” bias, and data privacy). This section aims to provide a “map of the contenders” in the current LLM landscape, surveying, comparing, and analyzing several prominent mainstream LLM series, considering both domestic (China-based) and international players, as well as the significant open-source forces.

1. Representative Domestic Models (China): Rooted Locally, Competing Fiercely

China’s AI field is booming with intense competition. Large Language Models (LLMs) developed by several Chinese companies now rival top global models in performance. They potentially possess inherent advantages in Chinese language processing and understanding China’s specific national conditions and legal environment.

  • Key Representative Models (Listed illustratively, not ranked):

    • DeepSeek Series: Developed and open-sourced by the AI company DeepSeek, this model series is renowned for its outstanding performance in code generation and mathematical reasoning. DeepSeek V3 and R1 have achieved or surpassed the levels of mainstream commercial models in multiple benchmarks. Its open-source strategy, especially including the 671B parameter version, directly empowers enterprises, research institutions, and developers, making it a top choice for building custom AI systems. DeepSeek’s breakthroughs in performance and efficiency have garnered significant attention in the open-source community.
    • Alibaba Tongyi Qianwen (Qwen): A general large language model developed by Alibaba’s DAMO Academy, known for its robust performance and wide influence. The Tongyi Qianwen series (including Qwen2.5, vision-language model Qwen-VL, reasoning model Qwen-QwQ, and multimodal model Qwen-QvQ) are all open-sourced, holding a significant position in both domestic and international open-source communities. Qwen models excel in tasks like text generation, image understanding, and complex reasoning, widely used in enterprise scenarios.
    • Doubao: Developed by ByteDance, this large language model leads the domestic market with its core strengths in user experience and ease of use. Leveraging ByteDance’s deep expertise in content recommendation, search, and multimodal data processing, Doubao supports text generation, conversational interaction, content creation, and multimodal tasks (like image generation and understanding). Its uniqueness lies in its profound understanding and optimization for the Chinese context, particularly excelling in scenarios like social media content generation, short video scriptwriting, and personalized recommendations. Doubao offers various versions, including lightweight models (e.g., Doubao Lite) and enterprise-grade high-performance models (e.g., Doubao Pro), catering to diverse needs from individual developers to large corporations. Recently, Doubao introduced voice interaction capabilities, further enhancing its potential in intelligent assistants and customer service. According to ByteDance officials, Doubao achieved hundreds of millions of daily active users in 2024, becoming one of China’s most popular AI tools.
    • Baidu Wenxin Yiyan (ERNIE Bot): Leveraging Baidu’s technological foundation in Chinese NLP (ERNIE pre-training models), knowledge graphs, and search engines, ERNIE Bot demonstrates excellence in Chinese understanding and generation as well as answering questions related to China-specific knowledge. Its rapid iteration capability facilitates wide application in education, search, and enterprise services. ERNIE Bot also supports multimodal interaction, integrating image and voice input to enhance user experience.
    • iFlytek Spark Cognitive Large Model: Building on iFlytek’s technological advantages in intelligent speech and cognitive intelligence, the Spark model performs well not only in text processing but also emphasizes its cross-modal interaction capabilities (e.g., speech, image, code, math). Spark is widely used in vertical domains like education, healthcare, and intelligent customer service, holding a significant advantage in voice-driven interaction scenarios.
    • Zhipu AI ChatGLM / GLM Series: Originating from technology developed at Tsinghua University’s Knowledge Engineering Group (KEG), Zhipu AI’s ChatGLM and open-source GLM models (like GLM-130B, ChatGLM2/3-6B, and the latest GLM-4) have a broad user base within China’s AI community. GLM-4 performance rivals GPT-4, supporting complex reasoning, dialogue, and content generation, widely applied in academic research and commercial settings. Its open-source strategy significantly lowers the barrier for developers.
    • Moonshot AI Kimi Chat: Founded by prominent entrepreneurs in the AI field, Kimi Chat’s main selling point and technological breakthrough is its support for ultra-long context input (claiming up to 2 million Chinese characters at launch). It performs exceptionally well in long document reading, analysis, summarization, and Q&A, making it particularly suitable for scenarios requiring comprehension of entire lengthy texts.
    • Other Important Players: This includes Tencent’s Hunyuan, Huawei’s Pangu, SenseTime’s Ririxin, Baichuan Intelligence’s Baichuan series (also open-source), 01.AI’s Yi series (also open-source), and others. The market is fiercely competitive with rapid technological iteration.
  • Common Features and Advantages of Domestic Models:

    • Excellent Chinese Processing Capability: Typically optimized for the linguistic features, grammatical structures, cultural nuances, and internet slang of the Chinese language, often providing more “native” performance on Chinese tasks.
    • Deep Understanding of China-Specific Knowledge: May outperform international models trained primarily on English corpora in understanding China’s legal system, judicial practices, policy environment, social culture, historical context, etc.
    • Easier Compliance with Domestic Regulations: Domestic providers are generally more familiar with and better able to comply with relevant Chinese laws and regulations, such as the Cybersecurity Law, Data Security Law, Personal Information Protection Law, and the Interim Measures for the Management of Generative Artificial Intelligence Services, especially regarding cross-border data transfer restrictions, content security audits, and service filing (备案) requirements.
    • Localized Services and Ecosystem: Offer more convenient Chinese interfaces, documentation, technical support, and ecosystem applications tailored to domestic user needs.
  • Potential Limitations:

    • Gap with Top International Models: Despite rapid progress, there might still be a gap compared to flagship models from OpenAI, Google, Anthropic in terms of maximum model scale, cutting-edge general intelligence (especially in non-Chinese tasks, complex scientific reasoning, global knowledge coverage), and potentially original innovation (this gap is narrowing).
    • Diversity of the Open-Source Ecosystem: While China has significant open-source contributions (e.g., Qwen, GLM, Baichuan, 01.AI, DeepSeek), the overall size, activity, and resource richness of its open-source community might still be developing compared to the international ecosystem often centered around models like Llama.
  • Relevance to Legal Scenarios:

    • Potential First Choice for Chinese Legal Affairs: For law firms, legal departments, courts, arbitration institutions, etc., whose primary business is in China, deal with large volumes of Chinese legal documents, and require a deep understanding of the Chinese legal environment, using mainstream domestic LLMs is often a more natural, efficient, and potentially more compliant choice.
    • Specific Advantages for Long Text Processing: Models like Kimi Chat, with their ability to handle ultra-long contexts (e.g., directly uploading and analyzing a Chinese contract, judgment, prospectus, or research report hundreds or thousands of pages long), offer significant value in the legal domain.
    • Code and Logical Reasoning Capabilities: For scenarios involving formal analysis of legal statutes, smart contract code generation, or assisting complex logical reasoning, models like DeepSeek that excel in code and math might be advantageous.
    • Compliance and Data Security: Choosing models and services deployed within China and compliant with domestic regulations helps mitigate data security and compliance risks.

2. Leading International Models: Spearheading the Frontier with a Global Perspective

Globally, LLMs developed by several tech giants and top AI research institutions represent the cutting edge of current technology. They continually push boundaries in general capabilities, multilingual processing, complex reasoning, and multimodal integration. Among the most prominent are the OpenAI GPT series, Anthropic Claude series, Google Gemini series, and xAI Grok series.

2.1 OpenAI GPT Series: Pioneer of the Wave and Continuous Leader

OpenAI and its GPT (Generative Pre-trained Transformer) series are key forces that ignited the global LLM boom and continue to lead technological development.

  • Core Model Family (Approx. chronological order & capability evolution):

    • GPT-3: Released 2020, 175B parameters, stunned the world with zero-shot/few-shot learning.
    • InstructGPT / GPT-3.5: Aligned via instruction fine-tuning and RLHF, formed the basis for early ChatGPT, significantly improved conversational ability.
    • GPT-4: Released March 2023, flagship model, supports visual input (GPT-4V), major improvements in complex reasoning, knowledge accuracy, instruction following, and long context (32k tokens).
    • GPT-4 Turbo: Released Nov 2023, an optimized GPT-4 with the context window expanded to 128k tokens, knowledge updated to April 2023, and lower cost.
    • GPT-4o (“omni”): Released May 2024, multimodal flagship, supports real-time text, audio, vision interaction, faster, cheaper, enhanced multilingual and vision capabilities. Updated Nov 2024, max output tokens increased to 16,384.
    • GPT-4.1 Series: Released April 2025, includes GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, optimized for coding and instruction following, 1M token context window, particularly suited for complex software engineering tasks. Performance slightly below Gemini 2.5 Pro and Claude 3.7 Sonnet on some coding benchmarks.
    • GPT-4.5 / GPT-5: GPT-4.5 shipped as a research preview in February 2025 as a transitional model; GPT-5, expected later in 2025, is slated to combine traditional LLMs with reasoning models, further enhancing complex analysis capabilities.
  • Core Technical Features:

    • Based on Transformer decoder architecture.
    • Large-scale pre-training combined with instruction fine-tuning, RLHF/RLAIF alignment.
    • Multimodal capabilities since GPT-4 (text, image; GPT-4o adds audio).
    • GPT-4.1 optimized front-end coding, format adherence, and tool use consistency.
  • Advantages:

    • Top-tier general intelligence, excels in reasoning, programming, creative writing, and multimodal tasks.
    • User-friendly experience; ChatGPT has reportedly reached 400M weekly active users and roughly 60% of the chatbot market share.
    • Strong developer ecosystem, broad API support, integrations (e.g., Stripe).
    • Rapid iteration, e.g., GPT-4o’s structured output support (JSON schema).
  • Limitations and Risks:

    • Hallucination risk persists, especially in long context tasks; GPT-4.1 accuracy drops to 50% at 1M tokens.
    • Knowledge cutoff latest is June 2024, needs external search.
    • High cost: GPT-4.1 is priced at $2 per million input tokens and $8 per million output tokens.
    • Closed source, reliant on OpenAI services.
    • Data privacy concerns, requires Enterprise version or API for security assurances.
  • Relevance to Legal Scenarios:

    • Huge potential: Assisting legal research, document drafting, contract analysis, preliminary Q&A.
    • Long text advantage: GPT-4 Turbo/4o/4.1’s 128k to 1M context windows suit long legal documents.
    • Multimodal potential: GPT-4V/4o can analyze image/text evidence; GPT-4.1 achieves 72% accuracy on Video-MME.
    • Core Prerequisite: Must rigorously verify output accuracy and protect client confidential information.
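
The structured-output support mentioned above (asking the model to return machine-readable JSON) can be sketched roughly as follows. This is a minimal illustration, not an official recipe: the model name, prompt wording, and expected fields are assumptions chosen for a contract-review flavor.

```python
# Hypothetical sketch: requesting JSON-mode output from the OpenAI
# chat.completions API for clause extraction. Prompt and field names
# are illustrative assumptions.

def build_extraction_request(contract_text: str) -> dict:
    """Assemble keyword arguments for a JSON-mode extraction call."""
    return {
        "model": "gpt-4o",
        "response_format": {"type": "json_object"},  # constrain reply to valid JSON
        "messages": [
            {"role": "system",
             "content": "You are a contract-review assistant. Reply with a JSON "
                        'object of the form {"parties": [...], "termination_clause": "..."}.'},
            {"role": "user", "content": contract_text},
        ],
    }

request = build_extraction_request("Sample agreement between Party A and Party B ...")

# Actual call (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**request)
# import json; data = json.loads(response.choices[0].message.content)

print(request["response_format"]["type"])
```

Because the reply is constrained to JSON, downstream legal-tech tooling can parse it directly rather than scraping free text; the accuracy caveat above still applies to the extracted values themselves.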

2.2 Anthropic Claude Series: Safety-First Expert in Long Document Processing

Founded by former OpenAI researchers, Anthropic emphasizes AI safety, ethics, and controllability. Its Claude series is known for long context processing and the Constitutional AI safety mechanism.

  • Core Model Family:

    • Claude 1 / Claude Instant: Early versions, laid the foundation.
    • Claude 2 / 2.1: Performance improvements, Claude 2.1 supported 200k token context window.
    • Claude 3 Series (Released March 2024):
      • Haiku: Fastest, most affordable, for lightweight tasks.
      • Sonnet: Balances speed and performance, targeted at enterprise applications.
      • Opus: Flagship model, rivals GPT-4, top performer on complex tasks.
    • Claude 3.7 Sonnet (Released February 2025): Latest iteration, further enhances coding and reasoning; SWE-bench score 62.3%, close to Gemini 2.5 Pro’s 63.8%.
    • Common Features: All support vision understanding, standard 200k token context window, capable of handling 1M tokens for some tasks, excels in “Needle In A Haystack” tests.
  • Core Technical Features:

    • Based on Transformer architecture.
    • Constitutional AI training method, using preset principles to improve safety and honesty.
    • Optimized for long context processing, supporting multi-turn dialogue and complex document analysis.
  • Advantages:

    • Leading long context window, ideal for processing extremely long legal documents.
    • Safety and ethical design, reduces risk of harmful outputs.
    • Top-tier performance, Claude 3 Opus and 3.7 Sonnet excel in complex tasks.
    • Multimodal capability, supports image input and text output.
    • API provided, enhancing integration.
  • Limitations and Risks:

    • Ecosystem is smaller, but expanding via platforms like Vertex AI.
    • Closed source, reliant on Anthropic services.
    • High cost, especially for Opus and 3.7 Sonnet.
    • Knowledge cutoff limitations, requires external data supplementation.
    • Hallucination risk still needs attention.
  • Relevance to Legal Scenarios:

    • Long document processing: Ideal choice for reviewing lengthy contracts, trial transcripts, due diligence files.
    • Safety and Compliance: Constitutional AI makes it more trustworthy for handling sensitive information.
    • Complex Analysis: Opus and 3.7 Sonnet are suitable for legal argumentation and risk assessment.
    • Core Prerequisite: Need to verify accuracy and protect confidentiality.
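
The long-document workflow above can be sketched with the Anthropic Messages API. This is an illustrative assumption-laden sketch: the model identifier, token limit, and the document-wrapping prompt convention are choices made here, not a prescribed Anthropic pattern.

```python
# Hypothetical sketch: sending a lengthy legal document to the Anthropic
# Messages API for review. Model id, max_tokens, and prompt layout are
# illustrative assumptions.

def build_review_request(document_text: str, question: str) -> dict:
    """Assemble keyword arguments for a long-document review call."""
    return {
        "model": "claude-3-opus-20240229",
        "max_tokens": 1024,
        "messages": [
            {"role": "user",
             "content": f"<document>\n{document_text}\n</document>\n\n{question}"},
        ],
    }

request = build_review_request(
    "Full text of a lengthy services agreement ...",
    "List the obligations of each party and any unusual indemnity terms.",
)

# Actual call (requires the `anthropic` package and an API key):
# import anthropic
# client = anthropic.Anthropic()
# message = client.messages.create(**request)
# print(message.content[0].text)
```

The 200k-token context window means a long contract can often be passed in whole rather than chunked, though outputs still require the verification noted above.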

2.3 Google Gemini / PaLM Series: The Search Giant’s Multimodal Ambition

Leveraging its deep technical expertise, Google’s Gemini series is its flagship native multimodal strategic model, replacing the PaLM series.

  • Core Model Family:

    • LaMDA: Early conversational model, replaced by Gemini.
    • PaLM / PaLM 2: General models, previously powered Bard; PaLM 2 supported multilingual and code generation.
    • Gemini 1.0 Series (Released Dec 2023):
      • Ultra: Flagship, rivals GPT-4, surpassed GPT-4 on 30/32 benchmarks at launch.
      • Pro: Balanced performance, powered Bard (now renamed Gemini).
      • Nano: Efficient on-device model, integrated into Pixel 8 Pro etc.
    • Gemini 1.5 Series (Released May 2024):
      • Pro: 1M token context window, enhanced code generation and logical reasoning.
      • Flash: Lightweight version, optimized for latency and cost.
    • Gemini 2.0 Series (Released Dec 2024):
      • 2.0 Flash: Supports native image and audio output, emphasizes “reasoning” capabilities.
    • Gemini 2.5 Pro (Released March 2025): 1M token context window, SWE-bench score 63.8%, leading GPT-4.1 and Claude 3.7 Sonnet.
    • PaliGemma 2 (Released March 2025): Focused on vision-language tasks, scales from 3B to 28B parameters.
  • Core Technical Features:

    • Based on Transformer architecture, optimized on Google TPUs.
    • Native multimodal design, supporting text, code, audio, image, video.
    • Integrates Google Search for real-time information.
    • Supports Google ecosystem (Workspace, Android, YouTube, etc.).
  • Advantages:

    • Multimodal capabilities, Gemini 2.0 supports native image/audio output; 2.5 Pro excels in interactive simulations and coding.
    • Real-time information, access to latest data via search integration.
    • Ecosystem integration, supports Gmail, Docs, YouTube, etc., with up to 10 min context memory.
    • High performance, Gemini 2.5 Pro leads on several benchmarks.
  • Limitations and Risks:

    • Early factuality concerns were raised, now significantly improved.
    • API ecosystem still maturing.
    • Closed source, reliant on Google services.
    • Data privacy concerns, especially regarding training on public data.
  • Relevance to Legal Scenarios:

    • Real-time information: Suitable for analysis tasks requiring the latest laws, regulations, or case law.
    • Multimodal potential: Handling mixed-media evidence (e.g., video, images).
    • Workflow integration: Convenient for Google Workspace users.
    • Core Prerequisite: Need to verify output and protect confidentiality.

2.4 xAI Grok Series: An Open-Source Explorer on a Mission to Accelerate Scientific Discovery

Developed by xAI, the Grok series aims to “accelerate human scientific discovery,” combining multimodal capabilities and an open ecosystem to provide practical and transparent AI solutions.

  • Core Model Family:

    • Grok 1: Released 2023, xAI’s first model, emphasized general conversation and scientific reasoning, performance close to GPT-3.5.
    • Grok 2: Released Aug 2024, significantly improved reasoning and multimodal capabilities, approaching GPT-4 level, surpassing Claude 3 Sonnet on some benchmarks.
    • Grok 3: Released Feb 2025, xAI’s flagship model, supports text and image input, 128k token context window. Rivals GPT-4o and Gemini 2.5 Pro on complex reasoning, coding, and multimodal tasks. Offers DeepSearch mode (iterative web search) and Think mode (deep reasoning) to enhance accuracy and complex problem-solving.
  • Core Technical Features:

    • Based on Transformer architecture, optimized for multimodal processing.
    • Open-source inclination, some model weights released publicly, fostering community development.
    • Real-time information integration, via DeepSearch mode.
    • Deep integration with X platform, supporting social media content analysis.
  • Advantages:

    • Multimodal capabilities, Grok 3 supports image input and text output, suitable for mixed data analysis.
    • Real-time information, DeepSearch mode provides up-to-date data.
    • User-friendly, free access (limited quota), SuperGrok subscription offers higher limits.
    • Science-oriented, excels in reasoning in fields like physics and mathematics.
    • Ecosystem potential, expanding applications via X platform and API.
  • Limitations and Risks:

    • Relatively new ecosystem, developer community still growing.
    • Partially closed source, Grok 3 full model not entirely open.
    • Hallucination risk, requires output verification.
    • Privacy concerns, need to ensure data security.
  • Relevance to Legal Scenarios:

    • Real-time information: DeepSearch mode is suitable for retrieving the latest legal developments.
    • Multimodal potential: Analyzing image/text evidence like scanned contracts.
    • Scientific reasoning: Applicable to technology-related legal analysis (e.g., patents).
    • Core Prerequisite: Need to verify accuracy and protect confidentiality.

3. The Power of Open-Source Models: Openness, Customization, and Community Driven

Beyond the major closed-source commercial models, the open-source Large Language Model (LLM) community is flourishing, offering numerous high-performance, freely usable (subject to license agreements), and customizable models. This significantly lowers the barrier to accessing advanced AI technology. Chinese open-source LLMs play a vital role in the global ecosystem, particularly with strengths in Chinese language processing and specific domains like code and math. Meanwhile, international open-source models like Meta’s Llama series and those from Mistral AI further enrich the landscape.

3.1 Chinese Open-Source Models: Local Strengths and Global Influence

Chinese AI companies have made outstanding contributions to open-source LLMs, releasing several powerful models that excel, especially in Chinese understanding, code generation, mathematical reasoning, and multimodal tasks. These models not only serve domestic needs but also hold significant positions in the global open-source community.

3.1.1 DeepSeek Series: Efficient Innovation and Open-Source Pioneer

Developed by Hangzhou DeepSeek AI, this series is known for efficient training and an open-source strategy, aiming to tackle cutting-edge AI challenges.

  • Core Model Family:

    • DeepSeek-LLM (2023): Initial general models, 7B/67B parameters, rivaling Llama 2 70B, strong in Chinese, code, and math.
    • DeepSeek-Coder (2023): Focused on code generation, HumanEval Pass@1 score 73.78, surpassing peers.
    • DeepSeek-MoE (Jan 2024): First Chinese open-source MoE model, efficient inference via Mixture-of-Experts.
    • DeepSeek V2/V2.5 (2024): Introduced Multi-Head Latent Attention (MLA), significantly reducing memory needs, performance near GPT-4.
    • DeepSeek V3 (Dec 2024): 671B parameter MoE model, training cost ~$5.58M on 14.8T tokens, rivals commercial models like GPT-4/Claude 3.7 series, strong on benchmarks like Aider Polyglot and Codeforces.
    • DeepSeek-R1 (Jan 2025): Reasoning model based on V3, uses Chain-of-Thought (CoT) and RL, rivals/surpasses OpenAI o1 on AIME/MATH, training cost ~$6M, MIT license.
  • Core Technical Features:

    • Based on Transformer, optimized with MoE and MLA mechanisms.
    • Open-source strategy: Provides model weights, some training code, supports community development.
    • Efficient training: Uses mixed precision (FP8/BF16) and load balancing, reducing costs.
  • Advantages:

    • High cost-effectiveness: V3 training cost much lower than Western models (e.g., GPT-4’s estimated $100M+).
    • Chinese capability: Excels in Chinese understanding/generation, suitable for localized apps.
    • Community support: Widely distributed via Hugging Face, growing developer ecosystem.
    • Domain strengths: Outstanding in code (leading LiveCodeBench score) and math (Math-500 score 90.2).
  • Limitations and Challenges:

    • Censorship: Content filtering for sensitive topics.
    • Hardware requirements: High-parameter models need powerful GPUs for local deployment.
    • Ecosystem maturity: Toolchains still maturing compared to Llama.
  • Relevance to Legal Scenarios:

    • Custom legal models: Locally deployable for sensitive legal data, aids contract analysis/research.
    • Chinese advantage: Superior handling of Chinese legal documents (contracts, regulations).
    • Cost-effectiveness: Open-source reduces long-term operational costs, viable for smaller firms.
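
The local-deployment option above can be sketched with Hugging Face `transformers`. This is a minimal sketch under stated assumptions: the repo id is one of DeepSeek’s published chat checkpoints, the prompt-flattening helper is a deliberate simplification (in practice the tokenizer’s chat template handles formatting), and larger variants need substantial GPU memory.

```python
# Hypothetical sketch: running an open-weight DeepSeek chat model locally.
# Repo id and generation settings are illustrative assumptions.

MODEL_ID = "deepseek-ai/deepseek-llm-7b-chat"  # assumed Hugging Face repo id

def build_chat_prompt(messages: list) -> str:
    """Naive chat-prompt flattening, for illustration only; real code
    should use tokenizer.apply_chat_template instead."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"

prompt = build_chat_prompt([
    {"role": "user", "content": "Summarize the key risks in this clause: ..."},
])

# Actual local run (requires `transformers`, `torch`, and suitable hardware):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# output = model.generate(**inputs, max_new_tokens=256)
# print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Running the model on in-house hardware keeps sensitive client documents off third-party servers, which is the main compliance argument for this route.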

3.1.2 Alibaba Qwen Series: Benchmark for Multimodal and Enterprise Applications

Alibaba’s Qwen (Tongyi Qianwen) series represents Chinese open-source LLMs with its wide parameter range and open-source strategy, widely used in enterprise settings.

  • Core Model Family:

    • Qwen 1.5 (2024): Models from 1.8B to 72B params, outperforms Llama 2, strong in Chinese, code, multilingual tasks.
    • Qwen 2 (2024): Further optimized 7B/72B models, supports multimodality (Qwen-VL), competitive in VQA/image understanding.
    • Qwen 2.5-Max (Jan 2025): Pre-trained on >20T tokens, MoE architecture, 32k context, surpasses DeepSeek V3 on Arena-Hard, LiveBench.
    • QwQ-32B (Mar 2025): Focused on math reasoning and code generation, rivals DeepSeek R1 with lower compute needs, Apache 2.0 license.
  • Core Technical Features:

    • Based on Transformer, integrates MoE and vision-language capabilities.
    • Open-source ecosystem: Available via Hugging Face, ModelScope, Alibaba Cloud API, serving >90k enterprises.
    • Multimodal support: Qwen-VL handles image input, suitable for mixed data.
  • Advantages:

    • Wide applicability: Models for various compute needs.
    • Enterprise integration: Deep ties with Alibaba Cloud, supports private deployment.
    • Multimodal capability: Processes mixed text/image data for evidence analysis.
    • Community support: Active open-source community, high download counts.
  • Limitations and Challenges:

    • Data transparency: Limited disclosure of training data details impacts research.
    • Censorship issues: Some models restrict sensitive content.
    • Performance controversies: Declined performance on some benchmarks (e.g., Hungarian Math Exam) raised data contamination concerns.
  • Relevance to Legal Scenarios:

    • Multimodal applications: Analyzing image/text legal evidence (e.g., scanned contracts).
    • Enterprise deployment: Suitable for large law firms’ private deployment needs.
    • Chinese optimization: Efficiently processes Chinese legal texts.

3.1.3 Zhipu AI GLM Series: Fusion of Multimodality and Efficient Inference

Zhipu AI’s GLM series excels in multimodal capabilities and efficient inference, gradually closing the gap with top international models.

  • Core Model Family:

    • ChatGLM3 (2023): 6B params, strong Chinese capability, MBPP code score 52.4, near DeepSeek 67B.
    • GLM-4 (2024): Performance approaches GPT-4, supports multimodality (image input), improved inference speed.
    • GLM-Z1 Series (Apr 2025): Includes inference (GLM-Z1-Air/AirX), rumination (GLM-Z1-Rumination), and base models. AirX is 8x faster than DeepSeek R1 at 1/30th the price, supports autonomous tool use and self-verification.
  • Core Technical Features:

    • Based on Transformer, optimized for multimodal and long context.
    • Partially open-source: E.g., GLM-4 weights released, encouraging community development.
    • Efficient inference: GLM-Z1-AirX optimized for low latency.
  • Advantages:

    • Multimodal capability: Supports images and text for complex tasks.
    • High cost-effectiveness: GLM-Z1-Air offers free API access, lowering usage barriers.
    • Chinese specialization: Excels in Chinese semantic understanding (e.g., semantic dependency graphs).
  • Limitations and Challenges:

    • Tool use stability: GLM-4 shows instability in complex tool-calling scenarios.
    • Lack of transparency: Fewer details on training data/processes disclosed.
    • Censorship: Content filtering for sensitive topics.
  • Relevance to Legal Scenarios:

    • Multimodal processing: Analyzing mixed legal evidence.
    • Efficient inference: Suitable for real-time legal consultation scenarios.
    • Chinese advantage: Optimized handling of Chinese legal documents.

3.1.4 Baichuan Intelligence Baichuan Series: Open-Source Choice for Enterprise Applications

Baichuan Intelligence’s Baichuan series targets enterprise applications, balancing open-source availability with commercialization.

  • Core Model Family:

    • Baichuan2 (2023): 7B/13B params, outperforms Llama 2, strong Chinese capability.
    • Baichuan3 (2024): Further improvements, among China’s leading open-source models, suitable for enterprise customization.
    • Baichuan4 (2025): Latest iteration, optimized multimodal and reasoning capabilities, parameters undisclosed.
  • Core Technical Features:

    • Based on Transformer, supports multimodal extensions.
    • Dual open-source/commercial strategy: Offers free weights and paid API/private cloud deployment.
    • Enterprise optimization: Supports private deployment, meeting compliance needs.
  • Advantages:

    • Enterprise-friendly: Offers fine-tuning services and private deployment for sensitive data scenarios.
    • Chinese capability: Stable performance on Chinese tasks.
    • Community support: Open-source versions foster developer ecosystem growth.
  • Limitations and Challenges:

    • Open-source transparency: Limited details on training data/processes.
    • Domain focus shift: Paused pre-training mid-2024 to focus on verticals like medical AI.
    • Censorship: Content filtering for sensitive topics.
  • Relevance to Legal Scenarios:

    • Private deployment: Suitable for law firms handling sensitive legal data.
    • Enterprise support: Meets compliance and customization needs.
    • Chinese optimization: Efficiently processes Chinese legal documents.

3.1.5 01.AI Yi Series: Open-Source Powerhouse with Multilingual and High Performance

01.AI’s Yi series is known for multilingual capabilities and high performance, contributing to the global open-source ecosystem.

  • Core Model Family:

    • Yi-34B (2023): 34B params, performance near GPT-3.5, balanced Chinese/English capabilities.
    • Yi-6B/9B (2024): Lightweight models, optimized inference efficiency for edge devices.
    • Yi-Large (2025): Latest model, parameter scale undisclosed, strong in multilingual and code generation, partially open-source.
  • Core Technical Features:

    • Based on Transformer, optimized for multilingual processing.
    • Open-source strategy: Provides model weights, supports community fine-tuning.
    • Multilingual support: Strong bilingual (Chinese/English) capabilities, expanding to other languages.
  • Advantages:

    • Multilingual capability: Suitable for cross-border legal scenarios.
    • High performance: Yi-34B performs well on multiple benchmarks.
    • Active community: Rapidly growing open-source ecosystem.
  • Limitations and Challenges:

    • Parameter scale: Far smaller than frontier models such as DeepSeek V3 (671B total params), limiting capacity on the most complex tasks.
    • Data transparency: Limited disclosure of training data details.
    • Censorship: Content filtering for sensitive topics.
  • Relevance to Legal Scenarios:

    • Multilingual support: Suitable for handling legal documents in multiple languages.
    • Local deployment: Ensures data privacy.
    • Cost-effectiveness: Open-source models reduce operational costs.

3.2 International Open-Source Models: Cornerstones of the Global Ecosystem

Section titled “3.2 International Open-Source Models: Cornerstones of the Global Ecosystem”

International open-source LLMs, led by Meta’s Llama series and models from Mistral AI, provide global developers with high-performance, customizable solutions.

3.2.1 Meta Llama Series: The Open-Source Flagship Challenger

Section titled “3.2.1 Meta Llama Series: The Open-Source Flagship Challenger”

Meta’s Llama series, with its high performance and relatively permissive license, has significantly propelled the global open-source LLM ecosystem.

  • Core Model Family:

    • Llama (Feb 2023): First generation, research license only; showed that smaller models trained on more data can rival much larger ones.
    • Llama 2 (Jul 2023): 7B/13B/70B params, conditional commercial use, strong performance, lowered LLM barrier.
    • Llama 3 (Apr 2024): 8B/70B params, instruction-tuned versions outperform peers, significant gains in reasoning, code, instruction following.
    • Llama 3.1 (Jul 2024): Added 405B model, rivals GPT-4 performance, 128k context window, multimodal support planned.
    • Llama 3.3 (Dec 2024): 70B model tuned to approach Llama 3.1 405B quality at a fraction of the inference cost.
    • Llama 4 (Apr 2025): Latest generation of natively multimodal MoE models; Llama 4 Scout (claimed 17B active parameters across 16 experts) is said to run on a single H100 GPU and offers a 10M-token context window. Performance claims are currently debated.
  • Core Technical Features:

    • Based on Transformer decoder, continuously optimized architecture.
    • Massive training data, combined with SFT and RLHF/RLAIF alignment.
    • Open-source: Provides weights, supports local deployment and fine-tuning.
  • Advantages:

    • Openness and control: Local deployment ensures privacy, supports deep customization.
    • Cost-effectiveness: Avoids API dependency, economical long-term.
    • Thriving ecosystem: Community offers rich tools and optimized versions.
    • High performance: Llama 3.1 405B rivals closed-source models on many benchmarks.
  • Limitations and Challenges:

    • Technical barrier: Deployment/fine-tuning requires expertise and resources.
    • License restrictions: Commercial use subject to Meta’s terms.
    • Safety responsibility: Users manage ethical/safety risks.
    • Gap at the top: Slightly lags GPT-4o in maximum scale/cutting-edge capabilities.
  • Relevance to Legal Scenarios:

    • Autonomy and control: Supports building dedicated legal LLMs for sensitive data.
    • Long text processing: 128k context window suitable for long legal documents (Llama 4 Scout aims for 10M).
    • Privacy assurance: Local deployment meets strict compliance needs.
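
Even a 128k-token window does not remove the need for chunking once a matter exceeds it. A minimal sketch of overlapping-window chunking follows; token counts are approximated here by whitespace-separated words, whereas a real pipeline would count with the model's own tokenizer:

```python
def chunk_document(text: str, max_tokens: int = 128_000,
                   overlap: int = 1_000) -> list[str]:
    """Split a long document into overlapping windows that fit a context limit.

    The overlap preserves some shared context across window boundaries so
    that clauses straddling a split are seen whole at least once. Token
    counts are approximated by whitespace words for illustration only.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return [" ".join(words)]
    chunks = []
    step = max_tokens - overlap  # each window shares `overlap` words with the last
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # final window already reaches the end of the document
    return chunks
```

For example, a 250-word document with a 100-word limit and 20-word overlap yields three windows, each starting 80 words after the previous one.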

3.2.2 Mistral AI Series: Paragon of Efficiency and Sparse Architecture

Section titled “3.2.2 Mistral AI Series: Paragon of Efficiency and Sparse Architecture”

Mistral AI (France) has gained prominence in the open-source community through efficient and innovative architectures.

  • Core Model Family:

    • Mistral 7B (2023): 7B params, rivals Llama 2 13B performance, highly efficient inference.
    • Mixtral 8x7B (Dec 2023): Sparse MoE architecture, ~47B total params (the eight experts share attention layers, so the total is less than 8×7B) with ~13B active per token, fast inference, performance near GPT-3.5.
    • Mixtral 8x22B (Apr 2024): ~141B total params (~39B active per token), performance approaches Llama 3 70B; MoE keeps inference cost well below dense models of similar quality.
    • Mistral Large (Feb 2024) / Mistral Large 2 (Jul 2024): Flagship models, strong in multilingual tasks and code generation; Large 2 weights are released under a non-commercial research license.
  • Core Technical Features:

    • Based on Transformer, MoE architecture optimizes inference efficiency.
    • Open-source weights: Some models (e.g., Mixtral 8x7B) fully open.
    • Multilingual support: Covers multiple European languages.
  • Advantages:

    • Efficient inference: MoE architecture reduces compute requirements.
    • High performance: Mixtral 8x22B excels on several benchmarks.
    • Community support: Active open-source ecosystem, rich integrations.
  • Limitations and Challenges:

    • Training transparency: Limited disclosure on data and training details.
    • Hardware requirements: Large models need powerful GPUs.
    • Commercial restrictions: Some models (e.g., Mistral Large) have stricter commercial licenses.
  • Relevance to Legal Scenarios:

    • Efficient deployment: Suitable for firms with limited resources.
    • Multilingual support: Handling cross-border legal documents.
    • Customization potential: Supports development of specialized legal models.
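
The efficiency claim behind MoE rests on routing: a small gating network evaluates only a few experts per token, so the active parameter count per forward pass is a fraction of the total (in Mixtral 8x7B, 2 of 8 experts per token). A toy sketch of top-2 gating in plain Python follows; the tiny dimensions, random router weights, and hand-written experts stand in for the learned layers of a real model:

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(token, router_weights, experts, top_k=2):
    """Route one token vector through only the top-k experts.

    Only the selected experts are evaluated at all, which is why a model
    like Mixtral 8x7B activates roughly 13B of its ~47B parameters per token.
    """
    # Router: one score per expert (dot product with that expert's gate vector).
    scores = [sum(t * w for t, w in zip(token, wvec)) for wvec in router_weights]
    # Keep the top-k experts; the rest are skipped entirely.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    gate = softmax([scores[i] for i in top])  # renormalise over selected experts
    # Output is the gate-weighted sum of the selected experts' outputs.
    out = [0.0] * len(token)
    for g, i in zip(gate, top):
        expert_out = experts[i](token)
        out = [o + g * e for o, e in zip(out, expert_out)]
    return out, top

# Toy demo: 8 "experts" that each just scale the token by a fixed factor.
random.seed(0)
dim, n_experts = 4, 8
router = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [lambda t, s=s: [x * s for x in t] for s in range(1, n_experts + 1)]
out, chosen = moe_layer([0.5, -0.2, 0.1, 0.9], router, experts)
```

In a real model the router is a learned linear layer and each expert is a full feed-forward block, but the routing logic is the same: most experts are simply never touched for a given token.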

Both Chinese and international open-source LLMs collectively lower the barrier to AI technology, fostering technological democratization and rapid innovation. Domestic series like DeepSeek, Qwen, GLM, Baichuan, and Yi enrich the global ecosystem with Chinese language strengths and domain expertise (e.g., code, math), while international models like Meta’s Llama and Mistral AI drive broad adoption through high performance and openness. These models offer the following core values for the legal field:

  • Autonomy and Control: Local deployment ensures data privacy and meets compliance needs, especially crucial for legal practitioners.
  • Customization: Enables development of specialized legal models, optimizing terminology understanding and task handling.
  • Cost-Effectiveness: Reduces reliance on APIs, lowering long-term operational costs.
  • Community Driven: Global open-source communities provide abundant tools and support, fostering innovation.

In legal scenarios, open-source models are particularly well-suited for tasks requiring strong privacy protection, specialized terminology understanding, and localized deployment, such as contract analysis, regulatory research, and evidence processing. However, users must weigh the technical barriers, the need to verify outputs, and ongoing safety management to ensure effective and compliant use.
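
Part of that output verification can be mechanised: before a lawyer relies on a model's answer, the authorities it cites can be extracted for manual checking against an authoritative source. A minimal sketch follows; the regex covers only one illustrative English-language citation shape and is not a complete citator:

```python
import re

# Illustrative pattern only: matches citations shaped like
# "Article 52 of the Contract Law" or "Section 230 of the Communications
# Decency Act". A real verifier would need jurisdiction-specific patterns
# and a lookup against an authoritative legal database.
CITATION_RE = re.compile(
    r"(?:Article|Section)\s+\d+[a-z]?\s+of\s+the\s+[A-Z][\w\s]*?(?:Law|Act|Code)"
)

def extract_citations(model_output: str) -> list[str]:
    """Pull candidate legal citations out of LLM output for human review."""
    return CITATION_RE.findall(model_output)
```

Extraction does not validate anything by itself; its value is producing a short checklist of every authority the model invoked, so none slips past review unexamined.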

Conclusion: A Hundred Boats Compete - The Dynamic and Evolving Intelligence Landscape

Section titled “Conclusion: A Hundred Boats Compete - The Dynamic and Evolving Intelligence Landscape”

The field of Large Language Models (LLMs) is currently experiencing an unprecedented era of high-speed development and fierce competition. New models, technologies, and application paradigms emerge constantly, creating a dynamic landscape where “a hundred boats compete.” From the rapid rise and localized strengths of domestic players in China, to the continued leadership and technological breakthroughs of international giants, and the vibrant innovation driven by the open-source community, each force is actively shaping the future of AI.

For legal professionals, it is crucial to maintain a basic understanding and dynamic awareness of the major players and their representative products in this field. Recognizing their respective technical characteristics, core advantages, potential limitations, and associated legal and ethical risks (especially concerning data privacy, output accuracy, intellectual property, and bias) is fundamental for making informed technology choices, effectively leveraging LLMs to empower legal work, and prudently avoiding potential pitfalls. Understanding these differences will help you better grasp opportunities and navigate challenges in this era driven by AI transformation.