Why Anthropic calls the new Claude 3 its ‘most intelligent’ AI model yet

Subject: Science and tech

Section: Awareness in IT & Computer

Context:

Anthropic was founded by former members of OpenAI, the company behind ChatGPT. It says its new family of AI models is capable of advanced performance, beating the likes of GPT-4 on some parameters.

More on news:

The family includes three state-of-the-art AI models in the ascending order of capabilities – Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.
The company claims that each model offers an increasingly powerful performance, offering a balance between intelligence, speed, and cost based on their specific use case.

What is Claude 3?

Claude is a group of large language models (LLMs) developed by Anthropic.
The chatbot is capable of handling text, voice messages, and documents.
The chatbot is capable of generating faster, contextual responses compared to its peers.
Claude 3 Opus is the most powerful model, Claude 3 Sonnet is the middle model that is capable and price competitive, and Claude 3 Haiku is relevant for any use case that requires instant responses.
Claude Sonnet powers the Claude.ai chatbot for free at present and users only need an email sign-in.
Opus is only available through Anthropic’s web chat interface and if a user is subscribed to the Claude Pro service on the Anthropic website. It is available for $20 a month.
All new models come with a 2,00,000-token window, signifying possibly better performance, accuracy and the capacity to input more information in a user prompt.

How did Claude 3 perform?

Based on the comparison of Claude 3 with its peers, it seems the Anthropic may have caught up with OpenAI.
It had surpassed many AI models with the launch of its GPT-4 Turbo.
Claude 3 reportedly demonstrates advanced performance across cognitive tasks such as reasoning, expert knowledge, mathematics, and language fluency.
Despite the lack of consensus over whether LLMs can really “know” or “reason,” the AI research community commonly uses these terms.
The company says that the Opus model exhibits “near-human levels of comprehension and fluency on complex tasks”.
While this is a big claim, the scores show that Claude 3 Opus has shown some near-human performance on specific benchmarks. However, this doesn’t mean that Opus possesses general intelligence like humans.

Claude 3 vs GPT-4

Claude 3 Opus has surpassed GPT-4 on as many as 10 AI benchmarks, which include MMLU (undergraduate level knowledge), HumanEval (Coding), HellaSwag (common knowledge), and GSM8K (grade school math).
On the benchmark scores, Claude 3 beats its peers narrowly. For example, in the five-shot MMLU trial, Claude 3 secured 86.8 percent while GPT-4 obtained 86.4 percent.

Benchmark scores.

Claude 3 has also shown improvements in terms of analysis, forecasting, content creation, multilingual conversations, code generation, etc.
Anthropic claimed that the new model family also comes with enhanced vision capabilities, allowing Claude 3 to process photos, charts, and diagrams, much like GPT-4V.

Limitations of Claude 3

According to those who had early access to the model, Claude 3 performs well in tasks such as answering factual questions and optical character recognition (OCR), meaning the ability to extract text from images.
However, it struggles with complex reasoning and mathematical problems at times.
It also exhibited biases in its responses, such as favoring a certain racial group over others.
In the past too, other AI models have faced similar problems.
Google’s AI chatbot Gemini was criticized after it showed racial bias and historical inaccuracies. It refused to generate images of white individuals and depicted those individuals as people of color.
Anthropic has emphasized the safety features of Claude 3, especially its refusal to generate harmful or illegal content.
The company was also among the first to bring about Constitutional AI. Developers laid down a set of values that the system must follow so that it undertakes politically and socially responsible actions.
As of now the Claude 3 is the most expensive model on the market, but Anthropic has plans to release affordable versions soon.
Based on the early reports, benchmarks, and confidence from the AI community, Claude 3 seems to be a significant step forward in the development of LLMs.