Ars Technica
Anthropic introduces Claude 3.5 Sonnet, matching GPT-4o on benchmarks
20 June 2024 at 17:04

By: Benj Edwards

On Thursday, Anthropic announced Claude 3.5 Sonnet, its latest AI language model and the first in a new series of "3.5" models that build upon Claude 3, launched in March. Claude 3.5 Sonnet can compose text, analyze data, and write code. It features a 200,000 token context window and is available now on the Claude website and through an API. Anthropic also introduced Artifacts, a new feature in the Claude interface that shows related work documents in a dedicated window.

So far, people outside of Anthropic seem impressed. "This model is really, really good," wrote independent AI researcher Simon Willison on X. "I think this is the new best overall model (and both faster and half the price of Opus, similar to the GPT-4 Turbo to GPT-4o jump)."

As we've written before, benchmarks for large language models (LLMs) are troublesome because they can be cherry-picked and often do not capture the feel and nuance of using a machine to generate outputs on almost any conceivable topic. But according to Anthropic, Claude 3.5 Sonnet matches or outperforms competitor models like GPT-4o and Gemini 1.5 Pro on certain benchmarks like MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding).

Ars Technica
Researchers describe how to tell if ChatGPT is confabulating
20 June 2024 at 15:32

By: John Timmer

It's one of the world's worst-kept secrets that large language models give blatantly false answers to queries and do so with a confidence that's indistinguishable from when they get things right. There are a number of reasons for this. The AI could have been trained on misinformation; the answer could require some extrapolation from facts that the LLM isn't capable of; or some aspect of the LLM's training might have incentivized a falsehood.

But perhaps the simplest explanation is that an LLM doesn't recognize what constitutes a correct answer but is compelled to provide one. So it simply makes something up, a habit that has been termed confabulation.

Figuring out when an LLM is making something up would obviously have tremendous value, given how quickly people have started relying on them for everything from college essays to job applications. Now, researchers from the University of Oxford say they've found a relatively simple way to determine when LLMs appear to be confabulating that works with all popular models and across a broad range of subjects. And, in doing so, they develop evidence that most of the alternative facts LLMs provide are a product of confabulation.

Ars Technica
Report: Apple isn’t paying OpenAI for ChatGPT integration into OSes
13 June 2024 at 13:20

By: Benj Edwards

On Monday, Apple announced it would be integrating OpenAI's ChatGPT AI assistant into upcoming versions of its iPhone, iPad, and Mac operating systems. It paves the way for future third-party AI model integrations, but given Google's multi-billion-dollar deal with Apple for preferential web search, the OpenAI announcement inspired speculation about who is paying whom. According to a Bloomberg report published Wednesday, Apple considers ChatGPT's placement on its devices as compensation enough.

"Apple isn’t paying OpenAI as part of the partnership," writes Bloomberg reporter Mark Gurman, citing people familiar with the matter who wish to remain anonymous. "Instead, Apple believes pushing OpenAI’s brand and technology to hundreds of millions of its devices is of equal or greater value than monetary payments."

The Bloomberg report states that neither company expects the agreement to generate meaningful revenue in the short term, and in fact, the partnership could burn extra money for OpenAI, because it pays Microsoft to host ChatGPT's capabilities on its Azure cloud. However, OpenAI could benefit by converting free users to paid subscriptions, and Apple potentially benefits by providing easy, built-in access to ChatGPT during a time when its own in-house LLMs are still catching up.

Ars Technica
Apple and OpenAI currently have the most misunderstood partnership in tech
11 June 2024 at 13:29

By: Benj Edwards

On Monday, Apple premiered "Apple Intelligence" during a wide-ranging presentation at its annual Worldwide Developers Conference in Cupertino, California. However, the heart of its new tech, an array of Apple-developed AI models, was overshadowed by the announcement of ChatGPT integration into its device operating systems.

Since rumors of the partnership first emerged, we've seen confusion on social media about why Apple didn't develop a cutting-edge GPT-4-like chatbot internally. Despite Apple's year-long development of its own large language models (LLMs), many perceived the integration of ChatGPT (and opening the door for others, like Google Gemini) as a sign of Apple's lack of innovation.

"This is really strange. Surely Apple could train a very good competing LLM if they wanted? They've had a year," wrote AI developer Benjamin De Kraker on X. Elon Musk has also been grumbling about the OpenAI deal—and spreading misconceptions about it—saying things like, "It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!"

Ars Technica
Apple unveils “Apple Intelligence” AI features for iOS, iPadOS, and macOS
10 June 2024 at 15:15

By: Benj Edwards

On Monday, Apple debuted "Apple Intelligence," a new suite of free AI-powered features for iOS 18, iPadOS 18, and macOS Sequoia that includes creating email summaries, generating images and emoji, and allowing Siri to take actions on your behalf. These features are achieved through a combination of on-device and cloud processing, with a strong emphasis on privacy. Apple says that Apple Intelligence features will be available as a beta test for developers this summer and widely available later this year.

The announcements came during a livestream WWDC keynote and a simultaneous event attended by the press on Apple's campus in Cupertino, California. In an introduction, Apple CEO Tim Cook said the company has been using machine learning for years, but the introduction of large language models (LLMs) presents new opportunities to elevate the capabilities of Apple products. He emphasized the need for both personalization and privacy in Apple's approach.

At last year's WWDC, Apple avoided using the term "AI" completely, instead preferring terms like "machine learning" as Apple's way of avoiding buzzy hype while integrating applications of AI into apps in useful ways. This year, Apple figured out a new way to largely avoid the abbreviation "AI" by coining "Apple Intelligence," a catchall branding term that refers to a broad group of machine learning, LLM, and image generation technologies. By our count, the term "AI" was used sparingly in the keynote—most notably near the end of the presentation when Apple executive Craig Federighi said, "It's AI for the rest of us."

Ars Technica
DuckDuckGo offers “anonymous” access to AI chatbots through new service
6 June 2024 at 12:39

By: Benj Edwards

On Thursday, DuckDuckGo unveiled a new "AI Chat" service that allows users to converse with four mid-range large language models (LLMs) from OpenAI, Anthropic, Meta, and Mistral in an interface similar to ChatGPT while attempting to preserve privacy and anonymity. While the AI models involved can output inaccurate information readily, the site allows users to test different mid-range LLMs without having to install anything or sign up for an account.

DuckDuckGo's AI Chat currently features access to OpenAI's GPT-3.5 Turbo, Anthropic's Claude 3 Haiku, and two open source models, Meta's Llama 3 and Mistral's Mixtral 8x7B. The service is currently free to use within daily limits. Users can access AI Chat through the DuckDuckGo search engine, direct links to the site, or by using "!ai" or "!chat" shortcuts in the search field. AI Chat can also be disabled in the site's settings for users with accounts.

According to DuckDuckGo, chats on the service are anonymized, with metadata and IP address removed to prevent tracing back to individuals. The company states that chats are not used for AI model training, citing its privacy policy and terms of use.
