DeepSeek: how a small Chinese AI company is shaking up US tech heavyweights
Tongliang Liu does not work for, consult for, own shares in, or receive funding from any organisation that would benefit from this article, and has disclosed no affiliations other than his research organisation.
Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves through the tech community, with the release of extremely efficient AI models that can compete with cutting-edge
products from US companies such as OpenAI and Anthropic.
Founded in 2023, DeepSeek has achieved its results with a fraction of the cash and computing power of its competitors.
DeepSeek’s “reasoning” R1 model, released last week, provoked excitement among researchers, shock among investors, and responses from AI heavyweights. The company followed up on January 28
with a model that can work with images as well as text.
So what has DeepSeek done, and how did it do it?
In December, DeepSeek released its V3 model. This is a very powerful “standard” large language model that performs at a similar level to OpenAI’s GPT-4o and Anthropic’s Claude 3.5.
While these models are prone to errors and sometimes make up their own facts, they can carry out tasks such as answering questions, writing essays and generating computer code. On some tests
of problem-solving and mathematical reasoning, they score better than the average human.
V3 was trained at a reported cost of about US$5.58 million. This is dramatically cheaper than GPT-4, for example, which cost more than US$100 million to develop.
DeepSeek also claims to have trained V3 using around 2,000 specialised computer chips, specifically H800 GPUs made by NVIDIA. This is far fewer than other companies, which may have used up to 16,000 of the more powerful H100 chips.
On January 20, DeepSeek released another model, called R1. This is a so-called “reasoning” model, which tries to work through complex problems step by step. These models seem to be better at
many tasks that require context and have multiple interrelated parts, such as reading comprehension and strategic planning.
The R1 model is a tweaked version of V3, modified with a technique called reinforcement learning. R1 appears to work at a similar level to OpenAI’s o1, released last year.
DeepSeek also used the same technique to make “reasoning” versions of small open-source models that can run on home computers.
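The idea behind reinforcement learning can be sketched in a few lines. The toy below is purely illustrative (the strategies, reward rule and update factor are all made up, not DeepSeek's training code): a verifier rewards answers that work, and the model's preferences shift toward whatever earned the reward.

```python
# Toy sketch of reinforcement learning on answers (hypothetical example,
# not DeepSeek's training code). A verifier rewards one behaviour, and the
# preference weights drift toward whatever earns the reward.

strategies = {"guess": 1.0, "work_step_by_step": 1.0}  # preference weights

def reward(strategy):
    # stand-in verifier: step-by-step reasoning solves the problem, guessing doesn't
    return 1.0 if strategy == "work_step_by_step" else 0.0

for _ in range(50):  # many simulated practice problems
    for s in strategies:
        if reward(s) > 0:
            strategies[s] *= 1.05  # reinforce rewarded behaviour

best = max(strategies, key=strategies.get)
print(best)  # work_step_by_step
```

After enough rewarded attempts, step-by-step reasoning dominates, which is loosely why such training produces "reasoning" behaviour.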
This release has sparked a huge surge of interest in DeepSeek, driving up the popularity of its V3-powered chatbot app and triggering a massive price crash in tech stocks as investors
re-evaluate the AI industry. At the time of writing, chipmaker NVIDIA has lost around US$600 billion in value.
DeepSeek’s breakthroughs have been in achieving greater efficiency: getting good results with fewer resources. In particular, DeepSeek’s developers have pioneered two techniques that may be
adopted by AI researchers more broadly.
The first has to do with a mathematical idea called “sparsity”. AI models have a lot of parameters that determine their responses to inputs (V3 has around 671 billion), but only a small
fraction of these parameters is used for any given input.
However, predicting which parameters will be needed isn't easy. DeepSeek developed a new technique to do this, and then trained only those parameters. As a result, training its models required far less computation than a conventional approach would.
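Sparsity can be pictured with a toy example (the numbers and gating rule here are invented for illustration, not DeepSeek's actual method): split the model into "expert" sub-networks and activate only the few most relevant experts for each input, leaving the rest untouched.

```python
# Toy illustration of sparsity (assumed mechanism, not DeepSeek's code):
# the model is split into "expert" sub-networks, and a gating step picks
# only the top-k experts for each input, so most parameters are never
# touched on any single pass.

NUM_EXPERTS = 8  # a real model would have far more
TOP_K = 2        # experts actually used per input

def gate_scores(token_id):
    # hypothetical gating function: deterministic toy relevance scores
    return [(token_id * (i + 3)) % 11 for i in range(NUM_EXPERTS)]

def active_experts(token_id):
    scores = gate_scores(token_id)
    # keep the k highest-scoring experts; all others are skipped entirely
    return sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]

chosen = active_experts(42)
print(f"{len(chosen)} of {NUM_EXPERTS} experts used")  # 2 of 8 experts used
```

Because only the chosen experts' parameters are read and updated for each input, most of the model sits idle on any single pass, which is where the savings come from.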
The other trick has to do with how V3 stores information in computer memory. DeepSeek has found a clever way to compress the relevant data, so it is easier to store and access quickly.
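One way to picture this kind of compression, in a deliberately simplified sketch with made-up numbers (DeepSeek's actual method is more sophisticated): instead of caching a full vector for every token, store a much smaller "latent" summary and expand it back only when needed.

```python
# Simplified sketch of compressing cached data (assumed mechanism, not
# DeepSeek's code): keep a small "latent" summary instead of the full
# vector, and reconstruct an approximation on demand.

FULL_DIM = 8    # size of the full per-token vector
LATENT_DIM = 2  # size of what is actually kept in memory

def compress(vec):
    # hypothetical fixed projection: average each group of adjacent values
    group = FULL_DIM // LATENT_DIM
    return [sum(vec[i * group:(i + 1) * group]) / group for i in range(LATENT_DIM)]

def expand(latent):
    # approximate reconstruction: repeat each stored value across its group
    group = FULL_DIM // LATENT_DIM
    return [v for v in latent for _ in range(group)]

token_state = [0.9, 1.1, 1.0, 1.0, -0.4, -0.6, -0.5, -0.5]
cached = compress(token_state)  # only 2 numbers stored instead of 8
approx = expand(cached)         # rebuilt when the model needs it
print(cached)  # [1.0, -0.5]
```

Storing a quarter of the data per token means the same memory holds a much larger cache, at the cost of a small approximation error.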
DeepSeek’s models and techniques have been released under the free MIT License, which means anyone can download and modify them.
While this may be bad news for some AI companies – whose profits might be eroded by the existence of freely available, powerful models – it is great news for the broader AI research
community.
At present, a lot of AI research requires access to enormous amounts of computing resources. Researchers like myself who are based at universities (or anywhere except large tech companies)
have had limited ability to carry out tests and experiments.
More efficient models and techniques change the situation. Experimentation and development may now be significantly easier for us.
For consumers, access to AI may also become cheaper. More AI models may be run on users’ own devices, such as laptops or phones, rather than running “in the cloud” for a subscription fee.
For researchers who already have a lot of resources, more efficiency may have less of an effect. It is unclear whether DeepSeek’s approach will help to make models with better performance
overall, or simply models that are more efficient.