How the small Chinese AI start-up DeepSeek shocked Silicon Valley


A small Chinese artificial intelligence lab stunned the world this week by revealing the technical recipe for its cutting-edge model, transforming its lone leader into a national hero who defied U.S. attempts to halt China’s high-tech ambitions.

DeepSeek, founded by hedge fund manager Liang Wenfeng, released its R1 model on Monday, explaining in a detailed paper how to build, on a bootstrapped budget, a large language model that can automatically learn and improve without human supervision.

U.S. companies including OpenAI and Google DeepMind have pioneered the development of reasoning models, a relatively new area of AI research that attempts to make models match human cognitive abilities. In December, San Francisco-based OpenAI released the full version of its o1 model but kept its methods secret.

DeepSeek’s R1 release has sparked a frenzied debate in Silicon Valley over whether better-resourced U.S. AI companies, including Meta and Anthropic, can defend their technical lead.

Meanwhile, Liang has become a focus of national pride at home. This week he was the only AI leader selected to attend a publicized meeting of entrepreneurs with the country’s second most powerful leader, Li Qiang. The entrepreneurs were asked to “focus their efforts on breaking through key core technologies.”

In 2021, Liang began purchasing thousands of Nvidia graphics processing units for his AI side project while managing his High-Flyer quantitative trading fund. Industry insiders saw it as the eccentric actions of a billionaire looking for a new hobby.

“When we first met him, he was a very nerdy guy with a terrible hairstyle who was talking about building a cluster of 10,000 chips to train his own models. We didn’t take it seriously,” said one of Liang’s business partners.

“He couldn’t express his vision other than to say: I want to build this, and it will be a game-changer. We thought this was only possible for giants like ByteDance and Alibaba,” the person added.

Liang’s status as an outsider in the field of AI was an unexpected source of strength. At High-Flyer, he built a fortune using AI and algorithms to identify patterns that could affect stock prices. His team became adept at using Nvidia chips to make money trading stocks. In 2023, he launched DeepSeek, announcing his intention to develop human-level AI.

“Liang built an exceptional infrastructure team that really understood how the chips worked,” said one of the founders of a rival LLM company. “He brought his best people with him from the hedge fund to DeepSeek.”

Since Washington banned Nvidia from exporting its most powerful chips to China, local AI companies have been forced to find innovative ways to maximize the computing power of the limited number of chips available locally, a problem that Liang’s team already knew how to solve.

“DeepSeek engineers know how to unlock the potential of these GPUs, even if they are not cutting edge,” said an AI researcher close to the company.

Industry experts say DeepSeek’s focus on research makes it a dangerous competitor because it is willing to share its findings rather than protect them for commercial gain. DeepSeek has not raised money from outside funds or taken significant steps to monetize its models.

“DeepSeek works like early DeepMind,” said an AI investor in Beijing. “It’s purely research and engineering focused.”

Liang, who is personally involved in DeepSeek’s research, uses proceeds from his hedge fund trades to pay top salaries for top AI talent. Alongside TikTok owner ByteDance, DeepSeek is known for offering the highest compensation available for AI engineers in China, with staff based in offices in Hangzhou and Beijing.

“The DeepSeek offices resemble a university campus for serious researchers,” the business partner said. “The team believes in Liang’s vision: to show the world that Chinese people can be creative and build something from scratch.”

DeepSeek and High-Flyer did not respond to a request for comment.

Liang has presented DeepSeek as a purely “local” company, staffed with PhDs from top Chinese schools such as Peking, Tsinghua and Beihang universities rather than experts from American institutions.

In an interview with the national press last year, he said his core team “didn’t have people returning from abroad. They are all local. . . We need to develop the best talent ourselves.” DeepSeek’s identity as a purely Chinese LLM company has earned it plaudits at home.

DeepSeek claimed to have used just 2,048 Nvidia H800s and $5.6 million to train a model with 671 billion parameters, a fraction of what OpenAI and Google spent to train models of comparable size.

Ritwik Gupta, an AI policy researcher at the University of California, Berkeley, said recent models released by DeepSeek demonstrate that “there is no gap when it comes to AI capabilities.”

“The first mover to train models has to spend a lot of resources to get there,” he said. “But the second mover can make it happen cheaper and faster.”

Gupta added that China has a much larger pool than the United States of systems engineers who know how to make the most of computing resources to train and run models cost-effectively.

Industry insiders say that while DeepSeek has shown impressive results with limited resources, it remains an open question whether it can continue to compete as the industry evolves.

Returns at High-Flyer, its big backer, lagged in 2024, which a person close to Liang blamed on the founder’s attention being focused mainly on DeepSeek.

Its American rivals are not sitting idly by. They are building mega “clusters” of Nvidia’s next-generation Blackwell chips, amassing computing power that threatens to once again open a performance gap with Chinese rivals.

This week, OpenAI said it was creating a joint venture with Japanese company SoftBank, called Stargate, with plans to spend at least $100 billion on AI infrastructure in the United States. Elon Musk’s xAI is massively expanding its Colossus supercomputer to hold more than a million GPUs to help train its Grok AI models.

“DeepSeek has one of the largest advanced computing clusters in China,” said Liang’s business partner. “They have enough capacity for now, but not for long.”

Additional reporting by Wenjie Ding in Beijing