In-depth Review of Mainstream Large Language "Reasoning Models": ChatGPT vs Grok3 vs Claude3.7 vs Deepseek-R1 vs Gemini 2.0 Pro

I. Introduction

AI is developing rapidly, and large language models are iterated and updated constantly. In this review we evaluate five top models in depth: ChatGPT o3-mini, Grok3 thinking, Claude3.7 thinking, Deepseek-r1, and Gemini-2.0-Pro, and compare their performance across a range of scenarios.

II. In-depth evaluation and comparative analysis

We asked each model the same questions separately through ShirtAI (www.lsshirtai.com), which provides access to GPT Plus, Claude Pro, Grok Super, and the full version of Deepseek.

Question 1: Workers in a tea factory need to pack rectangular tea boxes, each 20 cm long, 20 cm wide, and 10 cm high, into a cubic carton with an edge length of 30 cm (measured from the inside). What is the maximum number of boxes that fit in one carton, and how should they be arranged?

Conclusion: The answer is 6 boxes. Claude3.7 thinking wins hands down, answering quickly and accurately. Deepseek-r1 is the slowest but reaches the correct answer, while Grok3 thinking and o3-mini give the wrong answer.
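
An arrangement that reaches 6 can be checked mechanically. The sketch below is our own illustration, not the output of any tested model. It assumes the carton is divided into a 3 x 3 x 3 grid of 10 cm cells, so that each tea box covers a 2 x 2 x 1 slab of cells; the layout in `PLACEMENTS` and the helper names are hypothetical choices made for this example. The upper bound comes from volume alone: 30^3 / (20 * 20 * 10) = 6.75, so at most 6 boxes can fit.

```python
from itertools import product

CARTON = 30          # inner edge length of the cubic carton, in cm
BOX = (20, 20, 10)   # tea box dimensions, in cm
CELL = 10            # grid step; every coordinate below is a multiple of it

# Each entry: (corner nearest the origin, extent along x, y, z), all in cm.
# This layout is an illustrative assumption, not taken from the review.
PLACEMENTS = [
    ((0, 10, 0), (20, 20, 10)),
    ((10, 0, 0), (20, 10, 20)),
    ((20, 10, 0), (10, 20, 20)),
    ((0, 0, 10), (10, 20, 20)),
    ((0, 20, 10), (20, 10, 20)),
    ((10, 0, 20), (20, 20, 10)),
]


def cells(origin, extent):
    """Return the set of 10 cm grid cells covered by one box."""
    ranges = [range(o // CELL, (o + e) // CELL) for o, e in zip(origin, extent)]
    return set(product(*ranges))


def check_packing(placements):
    """Return (fits, empty_cells) for a list of grid-aligned placements."""
    occupied = set()
    for origin, extent in placements:
        # Each box must be a rotation of BOX and lie inside the carton.
        if sorted(extent) != sorted(BOX):
            return False, None
        if any(o < 0 or o + e > CARTON for o, e in zip(origin, extent)):
            return False, None
        covered = cells(origin, extent)
        if occupied & covered:   # two boxes share a cell, so they overlap
            return False, None
        occupied |= covered
    all_cells = set(product(range(CARTON // CELL), repeat=3))
    return True, sorted(all_cells - occupied)


fits, empty = check_packing(PLACEMENTS)
max_by_volume = CARTON**3 // (BOX[0] * BOX[1] * BOX[2])
print(fits, len(PLACEMENTS), max_by_volume, empty)
```

In this layout the three unused cells lie on a diagonal of the carton, which is why no seventh box can be squeezed in even though 3,000 cm^3 of space remains.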

 

Question 2: Given the function $f(x) = e^x + ax^2 - x$: (1) discuss the monotonicity of $f(x)$ when $a = 1$; (2) if $f(x) \geq \frac{1}{2}x^3 + 1$ for all $x \geq 0$, find the range of values of $a$.

Conclusion: All the models give the correct answer, but o3-mini is the fastest.
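
For reference, the standard argument can be outlined as follows. This is our own worked sketch, not a transcript of any model's answer, and the auxiliary functions $g$ and $h$ are names introduced here.

```latex
\textbf{(1)} For $a = 1$: $f'(x) = e^x + 2x - 1$ and $f''(x) = e^x + 2 > 0$, so $f'$ is
increasing. Since $f'(0) = 0$, $f$ is decreasing on $(-\infty, 0)$ and increasing on
$(0, +\infty)$.

\textbf{(2)} At $x = 0$ the inequality reads $1 \geq 1$ for every $a$. For $x > 0$ it is
equivalent to
\[
  a \geq g(x) := \frac{\tfrac{1}{2}x^3 + x + 1 - e^x}{x^2},
  \qquad
  g'(x) = \frac{(2 - x)\,h(x)}{x^3},
  \qquad
  h(x) := e^x - \tfrac{1}{2}x^2 - x - 1 .
\]
Because $h(0) = 0$ and $h'(x) = e^x - x - 1 > 0$ for $x > 0$, we have $h(x) > 0$ on
$(0, +\infty)$. Hence $g$ increases on $(0, 2)$, decreases on $(2, +\infty)$, and
\[
  a \geq \max_{x > 0} g(x) = g(2) = \frac{7 - e^2}{4}.
\]
```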

 

In addition, we conducted other tests with the following results:

Test scenario: Complex mathematical problem (Bayes' theorem)
- ChatGPT o3-mini: Basic explanations are clear, but they lack depth and detail, and the examples are simple.
- Grok3 thinking: Explanations are vivid and use intuitive visual analogies, but rigorous derivation is slightly lacking.
- Claude3.7 thinking: The most systematic proof process, with in-depth explanations of concepts, a detailed medical screening example, and clear calculations.
- Deepseek-r1: The most rigorous mathematical derivations with well laid out formulas, but the example explanations are relatively academic.
- Gemini-2.0-Pro: Balances theory and practice, but falls short of Claude and Deepseek on specific details.

Test scenario: Coding skills (Quicksort)
- ChatGPT o3-mini: Basic functionality is implemented correctly, but code efficiency and boundary handling are poor.
- Grok3 thinking: Correct algorithm and practical optimization suggestions, with a slightly redundant code structure.
- Claude3.7 thinking: Clear, readable code with detailed comments, an explanation of the idea behind each step, and a comprehensive complexity analysis.
- Deepseek-r1: The most streamlined and efficient code, with the best boundary condition handling and an in-depth complexity analysis.
- Gemini-2.0-Pro: Provides multiple implementations, including in-place sorting and a functional style, but some boundary cases are under-considered.

Test scenario: Creative writing (the year 2050)
- ChatGPT o3-mini: The story flows well but is rather bland, and the futuristic technology relies on common imagery.
- Grok3 thinking: Good at building a grand worldview with bold technology, but the characters' emotions are slightly weak.
- Claude3.7 thinking: The plot is rich and vivid, the characters are three-dimensional, and the technological details are both forward-looking and plausible, with emotional elements woven in.
- Deepseek-r1: Technical details are accurate but slightly stereotypical, and the storytelling is not strong enough.
- Gemini-2.0-Pro: The narrative structure is complete and technology and social issues are well integrated, but innovation is slightly lacking.

Test scenario: Logical reasoning (Prisoner's Dilemma)
- ChatGPT o3-mini: Explains the underlying concepts accurately, but the analysis is not deep enough.
- Grok3 thinking: The most in-depth analysis, introducing an evolutionary game theory perspective to discuss equilibrium strategies for repeated games.
- Claude3.7 thinking: The clearest theoretical explanations, with rigorous logical derivations and real-life examples from multiple fields.
- Deepseek-r1: The most rigorously constructed mathematical models, but the examples are slightly academic.
- Gemini-2.0-Pro: Balances theory and practical application, with a wide variety of case studies.
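
The coding test turns on three points named in the comparison above: functional versus in-place implementations, boundary handling, and complexity. A minimal sketch of both styles is shown here for orientation. It is our own illustration, not the output of any tested model, and the function names are hypothetical.

```python
def quicksort_functional(items):
    """Return a new sorted list; average O(n log n) time, O(n) extra space."""
    if len(items) <= 1:          # boundary cases: empty or single-element input
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort_functional(smaller) + equal + quicksort_functional(larger)


def quicksort_in_place(items, lo=0, hi=None):
    """Sort `items` in place with Lomuto partitioning; returns None."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:                 # zero or one element in range: nothing to do
        return
    # A middle pivot avoids quadratic behavior on already-sorted input;
    # inputs with many duplicate keys can still degrade this simple version.
    mid = (lo + hi) // 2
    items[mid], items[hi] = items[hi], items[mid]
    pivot = items[hi]
    store = lo
    for i in range(lo, hi):
        if items[i] < pivot:
            items[i], items[store] = items[store], items[i]
            store += 1
    items[store], items[hi] = items[hi], items[store]   # pivot to final position
    quicksort_in_place(items, lo, store - 1)
    quicksort_in_place(items, store + 1, hi)


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
copy = list(data)
quicksort_in_place(copy)
print(quicksort_functional(data), copy)
```

The functional version is easier to read and handles duplicates cleanly through the `equal` list, while the in-place version saves memory at the cost of more delicate index handling, which is where boundary bugs tend to appear.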

 

Overall, the advantages and disadvantages of the models are compared as follows:

ChatGPT o3-mini
Advantages:
- Best performance among lightweight models
- Fast response time
- Accurate on basic questions
Disadvantages:
- Limited capacity for complex reasoning
- Deep thinking is less capable than in other models
Most applicable scenarios:
- Everyday simple questions and answers
- Basic content creation
- Lightweight application scenarios

Grok3 thinking
Advantages:
- Transparent thinking process
- Outstanding logical reasoning skills
- Explains concepts in a lively and interesting way
Disadvantages:
- Slightly weaker Chinese language skills
- Insufficient depth in certain specialized areas
Most applicable scenarios:
- Complex reasoning where seeing the thought process matters
- Stimulating innovative thinking

Claude3.7 thinking
Advantages:
- The most balanced combination of competencies
- Precise instruction following
- Creativity and logic go hand in hand
- Minimal hallucinations
Disadvantages:
- Slightly weaker than specialized models in specific vertical domains
Most applicable scenarios:
- Content creation that requires a balance of creativity and accuracy
- Complex instruction tasks

Deepseek-r1
Advantages:
- Extremely strong code and math skills
- Best Chinese comprehension
- Rigorous academic reasoning
Disadvantages:
- Creative writing is relatively stereotypical
- General expression is not as vivid as in other models
Most applicable scenarios:
- Programming and development
- Mathematical and scientific research
- Chinese academic content generation

Gemini-2.0-Pro
Advantages:
- Wide-ranging knowledge
- Strong multimodal understanding
- Abundance of practical cases
Disadvantages:
- Lacks depth in some complex reasoning scenarios
Most applicable scenarios:
- Multimodal interactions that involve images
- Knowledge-intensive questions and answers

III. Comparison of model basics

| Model name | Developer | Release time | Model size | Charges |
| --- | --- | --- | --- | --- |
| ChatGPT o3-mini | OpenAI | January 2025 | Undisclosed | Free and Plus paid versions |
| Grok3 thinking | xAI | February 2025 | Undisclosed | Paid xAI membership |
| Claude3.7 thinking | Anthropic | February 2025 | Undisclosed | Partially free, Claude Pro paid |
| Deepseek-r1 | DeepSeek | January 2025 | 671 billion parameters (mixture of experts) | Free |
| Gemini-2.0-Pro | Google | February 2025 (experimental release) | Undisclosed | Partially free, premium version paid |

IV. Comparative table of core competencies

| Capability dimension | ChatGPT o3-mini | Grok3 thinking | Claude3.7 thinking | Deepseek-r1 | Gemini-2.0-Pro |
| --- | --- | --- | --- | --- | --- |
| General questions and answers | 4 | 5 | 5 | 4 | 4 |
| Coding skills | 3 | 4 | 5 | 5 | 4 |
| Mathematical reasoning | 3 | 4 | 4 | 5 | 4 |
| Logical thinking | 3 | 5 | 5 | 4 | 4 |
| Creative writing | 4 | 4 | 5 | 3 | 4 |
| Instruction following | 4 | 4 | 5 | 4 | 4 |
| Chinese language proficiency | 4 | 3 | 4 | 5 | 4 |
| Depth of thought | 3 | 5 | 5 | 4 | 4 |
| Hallucination control | 3 | 3 | 5 | 4 | 4 |
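
One simple way to read the table is to average each model's scores. The sketch below does that with an unweighted mean across the nine dimensions. Equal weighting is our assumption, not a method stated in this review, and the name `rank_by_mean` is hypothetical.

```python
# Scores copied from the table above, in row order:
# general Q&A, coding, math, logic, creative writing,
# instruction following, Chinese, depth of thought, hallucination control.
SCORES = {
    "ChatGPT o3-mini":    [4, 3, 3, 3, 4, 4, 4, 3, 3],
    "Grok3 thinking":     [5, 4, 4, 5, 4, 4, 3, 5, 3],
    "Claude3.7 thinking": [5, 5, 4, 5, 5, 5, 4, 5, 5],
    "Deepseek-r1":        [4, 5, 5, 4, 3, 4, 5, 4, 4],
    "Gemini-2.0-Pro":     [4, 4, 4, 4, 4, 4, 4, 4, 4],
}


def rank_by_mean(scores):
    """Return (model, mean score) pairs sorted from highest to lowest mean."""
    means = {model: sum(vals) / len(vals) for model, vals in scores.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_by_mean(SCORES)
for model, mean in ranking:
    print(f"{model}: {mean:.2f}")
```

Under this equal weighting Claude3.7 thinking comes first, followed by Deepseek-r1, Grok3 thinking, Gemini-2.0-Pro, and ChatGPT o3-mini. A reader who cares mostly about code and math could weight those rows more heavily, which narrows the gap between the top two.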

V. Summary of conclusions

After a full range of reviews, we came to the following conclusions:

  1. Best overall: Claude3.7 thinking excelled in most tests, especially creative writing, instruction following, and hallucination control.
  2. Best professional competence: Deepseek-r1 performed best on code, math, and Chinese professional content.
  3. Best thinking process: Grok3 thinking and Claude3.7 thinking are the most transparent in showing their reasoning.
  4. Best for lightweight applications: ChatGPT o3-mini has the best price/performance ratio for lightweight applications.
  5. Best multimodal: Gemini-2.0-Pro leads in handling multimodal content.

Which model to choose should ultimately be based on your specific usage scenario. If you are looking for a fully balanced experience, Claude 3.7 is a good choice; for programming and math needs, Deepseek-r1 is worth considering; and if you need a lightweight daily assistant, ChatGPT o3-mini can also meet the basic needs.

