How DeepSeek-R1 was created?
Published in | Shenzhen da xue xue bao. Li gong ban, Vol. 42, No. 2, pp. 226-232 |
---|---|
Main Author | ZHANG Huimin |
Format | Journal Article |
Language | English |
Published | Science Press (China Science Publishing & Media Ltd.), 2025-03-01 |
Subjects | artificial intelligence; DeepSeek; group relative policy optimization; large language model; mixed-precision training; mixture of experts architecture; multi-head latent attention mechanism; multi-token prediction |
DOI | 10.3724/SP.J.1249.2025.02226 |
ISSN | 1000-2618 |
Online Access | https://doaj.org/article/04ec3b5b55254ca98d5fc27d953867af (DOAJ, open access) |
Abstract | This article summarizes the innovations and optimizations in the DeepSeek series models for large-scale training. The breakthroughs of DeepSeek are primarily reflected in model and algorithm innovations, software and hardware co-optimization, and the improvement of overall training efficiency. DeepSeek-V3 adopts a mixture of experts (MoE) architecture, achieving efficient utilization of computing resources through fine-grained expert design and a shared-expert strategy. The sparse activation mechanism and lossless load-balancing strategy in the MoE architecture significantly enhance the efficiency and performance of model training, especially when handling large-scale data and complex tasks. The innovative multi-head latent attention (MLA) mechanism reduces memory usage and accelerates the inference process, thus lowering training and inference costs. In DeepSeek-V3's training, the introduction of multi-token prediction (MTP) and 8-bit floating-point (FP8) mixed-precision training improves the model's contextual understanding and training efficiency, while optimized parallel thread execution (PTX) code significantly enhances the computational efficiency of graphics processing units (GPUs). In training the DeepSeek-R1-Zero model, group relative policy optimization (GRPO) is used for pure reinforcement learning, bypassing the traditional supervised fine-tuning and human-feedback stages and leading to a significant improvement in reasoning capability. Overall, the DeepSeek series models have achieved significant advantages in the field of artificial intelligence through multiple innovations, setting a new industry benchmark. |
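The sparse activation described in the abstract can be illustrated with a short sketch: a router scores every expert for each token, but only the top-k experts are actually evaluated. The following is a minimal, illustrative top-k gating routine in Python/NumPy; the layer sizes, the choice of `top_k = 2`, and the softmax renormalization over the selected experts are assumptions for demonstration, not details taken from DeepSeek-V3's actual router.

```python
import numpy as np

def topk_gate(hidden, gate_weights, top_k=2):
    """Minimal sparse-MoE gating sketch: route each token to its top-k experts.

    hidden:       (tokens, d_model) token representations
    gate_weights: (d_model, n_experts) router projection
    Returns, per token, the chosen expert indices and their normalized weights.
    """
    logits = hidden @ gate_weights                      # (tokens, n_experts) router scores
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the k highest-scoring experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Renormalize over the selected experts only: all other experts get zero weight,
    # so only k expert FFNs need to be evaluated per token (sparse activation).
    top_weights = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    top_weights /= top_weights.sum(axis=-1, keepdims=True)
    return top_idx, top_weights

# Toy usage (hypothetical sizes): 4 tokens, 8-dim hidden states, 16 experts, 2 active per token.
rng = np.random.default_rng(0)
idx, w = topk_gate(rng.normal(size=(4, 8)), rng.normal(size=(8, 16)))
print(idx.shape, w.sum(axis=-1))  # (4, 2) and per-token weights summing to 1
```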
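The GRPO step mentioned for DeepSeek-R1-Zero can likewise be sketched at the level of its group-relative advantage: rewards for a group of responses sampled from the same prompt are standardized against the group's own mean and standard deviation, so no separate critic (value) model is needed. The snippet below covers only that normalization, not the full clipped policy-gradient objective; the group size and the rule-based rewards in the usage example are hypothetical.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward against its own group.

    rewards: scalar rewards r_1..r_G for G responses sampled from one prompt.
    Returns A_i = (r_i - mean(r)) / (std(r) + eps), used to weight the policy
    gradient of each response in place of a learned critic's value estimate.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy example: a group of 4 sampled answers scored by a rule-based reward
# (e.g. 1.0 if the final answer is correct, 0.0 otherwise).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
```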