In-depth Review of Mainstream Large Language "Reasoning Models": ChatGPT vs Grok3 vs Claude3.7 vs Deepseek-R1 vs Gemini 2.0 Pro

I. Introduction

AI is developing rapidly, and large language models are updated constantly. In this review we evaluate five leading models in depth: ChatGPT o3-mini, Grok3 thinking, Claude3.7 thinking, Deepseek-r1, and Gemini-2.0-Pro, and compare how they perform across a range of scenarios.

II. In-depth evaluation and comparative analysis

We asked each model the same question separately through ShirtAI. ShirtAI offers free, unlimited access to GPT Plus, Claude Pro, Grok Super, and the full version of Deepseek; the official website is www.lsshirtai.com

Question 1: Workers in a tea factory need to pack rectangular tea boxes, each 20 cm long, 20 cm wide, and 10 cm high, into a cubic carton with an edge length of 30 cm (measured from the inside). What is the maximum number of tea boxes that fit in one carton, and how should they be arranged?

Conclusion: The answer is 6 boxes. Claude3.7 thinking won clearly, answering both quickly and correctly. Deepseek-r1 was the slowest but also reached the correct answer, while Grok3 thinking and o3-mini gave wrong answers.
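The answer of 6 to the tea-box problem can be checked mechanically. The sketch below is not taken from any model's output. It first computes the volume bound (27000 / 4000 = 6.75, so at most 6 boxes), then searches for an actual arrangement by backtracking. It assumes the boxes are placed axis-aligned on a 10 cm grid, which is enough to show that 6 fits; the names `placements` and `pack` are illustrative.

```python
from itertools import product

N = 3  # carton edge length in 10 cm units (30 cm)
# A 20 x 20 x 10 cm box is 2 x 2 x 1 grid units; the thin side can face any axis.
ORIENTATIONS = [(2, 2, 1), (2, 1, 2), (1, 2, 2)]


def placements():
    """List every axis-aligned position of one box inside the carton."""
    result = []
    for dx, dy, dz in ORIENTATIONS:
        for x, y, z in product(range(N - dx + 1), range(N - dy + 1), range(N - dz + 1)):
            cells = frozenset(
                (x + i, y + j, z + k)
                for i in range(dx) for j in range(dy) for k in range(dz)
            )
            result.append(((x, y, z), (dx, dy, dz), cells))
    return result


def pack(target, options, start=0, used=frozenset(), chosen=()):
    """Backtracking search for `target` non-overlapping boxes; returns None if impossible."""
    if len(chosen) == target:
        return list(chosen)
    for idx in range(start, len(options)):
        origin, size, cells = options[idx]
        if used.isdisjoint(cells):
            found = pack(target, options, idx + 1, used | cells, chosen + ((origin, size),))
            if found is not None:
                return found
    return None


# Upper bound from volume alone: floor(30^3 / (20 * 20 * 10)).
volume_bound = (30 ** 3) // (20 * 20 * 10)
solution = pack(volume_bound, placements())

if __name__ == "__main__":
    print("volume bound:", volume_bound)
    for origin, size in solution:
        print("corner (x10 cm):", origin, "size (x10 cm):", size)
```

Each printed line gives the corner and the orientation of one box in units of 10 cm. Because the volume bound already rules out 7 boxes, finding one valid arrangement of 6 settles the question.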

Question 2: Given the function $$f(x) = e^x + ax^2 - x,$$ (1) discuss the monotonicity of $f(x)$ when $a = 1$; (2) if $f(x) \geq \frac{1}{2}x^3 + 1$ for all $x \geq 0$, find the range of values of $a$.

Conclusion: All the models gave the correct answer, but o3-mini was the fastest.
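For reference, here is a sketch of one standard solution to the calculus question above. It is our own working, not a transcript of any model's answer.

Part (1): with $a = 1$,
$$f'(x) = e^x + 2x - 1, \qquad f''(x) = e^x + 2 > 0,$$
so $f'$ is strictly increasing, and $f'(0) = 0$. Hence $f$ is decreasing on $(-\infty, 0)$ and increasing on $(0, +\infty)$.

Part (2): at $x = 0$ the inequality reads $1 \geq 1$, which holds for every $a$. For $x > 0$ it is equivalent to
$$a \geq g(x) = \frac{\frac{1}{2}x^3 + x + 1 - e^x}{x^2}, \qquad g'(x) = \frac{(2 - x)\left(e^x - \frac{1}{2}x^2 - x - 1\right)}{x^3}.$$
Since $e^x - \frac{1}{2}x^2 - x - 1 > 0$ for $x > 0$, $g$ increases on $(0, 2)$ and decreases on $(2, +\infty)$, so
$$g_{\max} = g(2) = \frac{7 - e^2}{4}, \qquad \text{and therefore} \qquad a \geq \frac{7 - e^2}{4}.$$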

In addition, we conducted other tests with the following results:

Test scenario	ChatGPT o3-mini	Grok3 thinking	Claude3.7 thinking	Deepseek-r1	Gemini-2.0-Pro
Complex math problem (Bayes' theorem)	Clear basic explanation, but lacking depth and detail; examples are simple	Vivid explanation with intuitive visual analogies, but the rigorous derivation is slightly lacking	Most systematic proof, with in-depth explanation of concepts, a detailed medical screening case, and clear calculations	Most rigorous mathematical derivation with well-typeset formulas, but the case explanations are relatively academic	Balances theory and practice, but weaker than Claude and Deepseek on specific details
Coding (Quicksort)	Basic functionality implemented correctly, but efficiency and edge-case handling are poor	Correct algorithm, slightly redundant code structure, practical optimization suggestions	Clear, readable code with detailed comments, a step-by-step explanation of the approach, and a comprehensive complexity analysis	Most streamlined and efficient code, with the best edge-case handling and an in-depth complexity analysis	Provides multiple implementations, including in-place sorting and a functional style, but overlooks some edge cases
Creative writing (2050)	The story flows well but is rather bland, and the futuristic technology relies on common imagery	Good at building a grand world, bold portrayal of technology, slightly weak portrayal of character emotion	Rich and vivid plot, three-dimensional characters, technological details that are both forward-looking and plausible, with emotional elements woven in	Accurate but slightly stereotypical technical details, weak storytelling	Complete narrative structure, technology and social issues well integrated, slightly lacking in originality
Logical reasoning (Prisoner's Dilemma)	Accurate explanation of the underlying concepts, but the analysis lacks depth	Most in-depth analysis, bringing in an evolutionary game theory perspective to discuss equilibrium strategies in repeated games	Clearest theoretical explanation, rigorous logical derivation, and real-life examples from multiple fields	Most rigorously constructed mathematical model, but the examples are slightly academic	Balances theory and practical application, with a wide variety of case studies
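To make the Quicksort row concrete, the sketch below shows the kind of edge cases the coding test probes: empty input, single elements, many duplicates, and already-sorted data. It is our own illustration, not code produced by any of the models. It assumes an in-place variant with a random pivot and three-way partitioning.

```python
import random


def quicksort(items, lo=0, hi=None):
    """Sort `items` in place (ascending) and return it."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        # A random pivot avoids the quadratic worst case on sorted input.
        p = random.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot = items[hi]
        # Three-way partition: [lo, lt) < pivot, [lt, i) == pivot, (gt, hi] > pivot.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth at O(log n).
        if lt - lo < hi - gt:
            quicksort(items, lo, lt - 1)
            lo = gt + 1
        else:
            quicksort(items, gt + 1, hi)
            hi = lt - 1
    return items
```

Grouping elements equal to the pivot keeps inputs with many duplicates from degrading performance, which is one of the boundary conditions a basic two-way partition handles poorly.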

Overall, the strengths and weaknesses of the models compare as follows:

Model	Strengths	Weaknesses	Best-suited scenarios
ChatGPT o3-mini	Best performance among lightweight models; fast response time; accurate on basic questions	Limited capacity for complex reasoning; deep-thinking capability is weaker than the other models	Simple everyday Q&A; basic content creation; lightweight application scenarios
Grok3 thinking	Transparent thinking process; outstanding logical reasoning; explains concepts in a lively, engaging way	Slightly weaker Chinese language ability; insufficient depth in certain specialized areas	Complex reasoning where the thought process needs to be visible; stimulating innovative thinking
Claude3.7 thinking	Most balanced overall capability; precise instruction following; combines creativity with logic; fewest hallucinations	Slightly weaker than specialized models in specific vertical domains	Content creation that needs both creativity and accuracy; complex instruction tasks
Deepseek-r1	Extremely strong coding and math skills; best Chinese comprehension; rigorous academic reasoning	Creative writing is relatively stereotypical; general-purpose expression is less vivid than the other models	Software development; mathematical and scientific research; Chinese academic content generation
Gemini-2.0-Pro	Broad knowledge; strong multimodal understanding; abundant practical examples	Lacks depth in some complex reasoning scenarios	Multimodal interactions that combine text with images; knowledge-intensive Q&A

III. Comparison of model basics

Model name	Developer	Release date	Model size	Pricing
ChatGPT o3-mini	OpenAI	January 2025	undisclosed	Free and Plus paid versions
Grok3 thinking	xAI	February 2025	undisclosed	Paid xAI subscription
Claude3.7 thinking	Anthropic	February 2025	undisclosed	Partially free, Claude Pro paid
Deepseek-r1	DeepSeek	January 2025	671 billion parameters (mixture-of-experts, about 37 billion active per token)	Free
Gemini-2.0-Pro	Google	February 2025 (experimental release)	undisclosed	Partially free, premium version paid

IV. Comparative table of core competencies

Capability dimension	ChatGPT o3-mini	Grok3 thinking	Claude3.7 thinking	Deepseek-r1	Gemini-2.0-Pro
General Q&A	4	5	5	4	4
Coding	3	4	5	5	4
Mathematical reasoning	3	4	4	5	4
Logical thinking	3	5	5	4	4
Creative writing	4	4	5	3	4
Instruction following	4	4	5	4	4
Chinese language proficiency	4	3	4	5	4
Depth of thought	3	5	5	4	4
Hallucination control	3	3	5	4	4
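One simple way to read the table is to average each model's scores. The sketch below does that with the scores copied from the table; the row labels are paraphrased, and the equal weighting of all nine dimensions is our assumption, not part of the original evaluation.

```python
MODELS = ["ChatGPT o3-mini", "Grok3 thinking", "Claude3.7 thinking", "Deepseek-r1", "Gemini-2.0-Pro"]

# Scores copied from the table above, one row per capability dimension.
SCORES = {
    "General Q&A": [4, 5, 5, 4, 4],
    "Coding": [3, 4, 5, 5, 4],
    "Mathematical reasoning": [3, 4, 4, 5, 4],
    "Logical thinking": [3, 5, 5, 4, 4],
    "Creative writing": [4, 4, 5, 3, 4],
    "Instruction following": [4, 4, 5, 4, 4],
    "Chinese language proficiency": [4, 3, 4, 5, 4],
    "Depth of thought": [3, 5, 5, 4, 4],
    "Hallucination control": [3, 3, 5, 4, 4],
}

# Unweighted mean per model (assumption: every dimension counts equally).
averages = {
    model: round(sum(row[i] for row in SCORES.values()) / len(SCORES), 2)
    for i, model in enumerate(MODELS)
}
ranking = sorted(averages, key=averages.get, reverse=True)

if __name__ == "__main__":
    for model in ranking:
        print(f"{model}: {averages[model]}")
```

Under equal weighting the order is Claude3.7 thinking (4.78), Deepseek-r1 (4.22), Grok3 thinking (4.11), Gemini-2.0-Pro (4.0), and ChatGPT o3-mini (3.44). A different weighting, for example one that emphasizes coding and math, could change the order.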

V. Summary of conclusions

After a full range of reviews, we came to the following conclusions:

Best overall: Claude3.7 thinking excelled in most tests, especially in creative writing, instruction following, and hallucination control.
Best specialist: Deepseek-r1 was the strongest in code, math, and Chinese professional content.
Best thinking process: Grok3 thinking and Claude3.7 thinking were the most transparent in showing their reasoning.
Best lightweight option: ChatGPT o3-mini offers the best price/performance ratio among lightweight models.
Best multimodal: Gemini-2.0-Pro leads in handling multimodal content.

Which model to choose ultimately depends on your specific usage scenario. If you want a well-balanced experience, Claude3.7 is a good choice; for programming and math, Deepseek-r1 is worth considering; and if you need a lightweight everyday assistant, ChatGPT o3-mini covers the basics.

