This article is from Science and Technology Review (科技导报), by Li Guojie.
Recently, the emergence of DeepSeek has sent shockwaves through the global technology community, sparking extensive discussion and deep reflection from academia to industry. Issue 3 (2025) of Science and Technology Review published the article "Reflections on the AI Development Path Prompted by DeepSeek" by academician Li Guojie, which offers a professional and thought-provoking analysis of the questions about AI's development path that DeepSeek has raised. The full text is presented to readers below.
1. Why has DeepSeek shocked the global technology community?
The emergence of DeepSeek is a new landmark event in the history of artificial intelligence (AI). DeepSeek gained more than 100 million users within 7 days, setting a world record for user growth. At the same time, the share price of chip giant NVIDIA plunged 17% in a single day, wiping out $589 billion of market value, the largest single-day loss of any listed company in U.S. history. The rise of DeepSeek has broken the superstitions that "high computing power and high investment are the only way to develop artificial intelligence" and that "process advantage in integrated circuits equals hegemony in artificial intelligence technology," leading the AI industry into a new era that focuses on optimizing algorithms and model architectures, attaches great importance to data quality and scale, and grows computing power rationally. The rise of DeepSeek also marks the transformation of Chinese technology companies from "pursuers" into "rule changers," challenging Western hegemony in AI, the field attracting the most attention worldwide, with disruptive innovation. Leading global AI companies have embraced DeepSeek one after another, underscoring its irresistible influence. Microsoft was the first to announce that it would add the DeepSeek-R1 model to its cloud platform Azure AI Foundry, allowing developers to build cloud-based applications and services with it. Companies such as Amazon Web Services (AWS), NVIDIA, and AMD have successively announced the deployment of the DeepSeek V3 and R1 models on their AI service platforms. No matter how some governments resist and a few media outlets maliciously slander it, hundreds of millions of users and many companies are making their choice based on DeepSeek's cost-effectiveness and their own experience, and are actively joining the DeepSeek ecosystem.
DeepSeek's efficient, low-cost inference models and open-source business model will set a new trend for the AI industry. DeepSeek's V3 and R1 models are popular first of all because they embody significant innovations at the level of model algorithms and system software. DeepSeek-V3 has as many as 671 billion parameters, but because it adopts a self-developed mixture-of-experts (MoE) architecture, with 256 routed experts for specialized subdomains and 1 shared expert per layer, only about 37 billion parameters are activated per call, significantly reducing training and computing costs. DeepSeek's improved multi-head latent attention (MLA) mechanism reduces key-value cache overhead, cutting memory usage to 5%–13% of that of other large models and greatly improving running efficiency. The DeepSeek-R1 model abandons traditional supervised fine-tuning (SFT) and creatively proposes group relative policy optimization (GRPO), eliciting reasoning ability directly from the base model through reinforcement learning, which greatly reduces data-labeling costs and simplifies training. DeepSeek reveals a truth: developing reasoning models is simpler than imagined and within reach of every industry. These inventions are not original innovations proposed for the first time; rather, DeepSeek pushed known techniques to the extreme through hard work, reaching a new technological peak on the foundation of its predecessors' public achievements. After the rise of the third wave of artificial intelligence, the U.S. government, leading AI companies, and the investment community formed a basic belief: developing artificial intelligence requires high computing power, and at present the highest-performance AI chips are NVIDIA's GPUs.
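The mixture-of-experts idea described above can be sketched in a few lines. This is a generic illustration, not DeepSeek's actual implementation: the expert counts, dimensions, and gating details below are toy values chosen for clarity, and each "expert" is just a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, routed_experts, shared_expert, gate_W, k=8):
    """Minimal MoE sketch: route a token vector x to the top-k routed
    experts (by gate score) plus one always-active shared expert."""
    scores = gate_W @ x                       # one score per routed expert
    topk = np.argsort(scores)[-k:]            # indices of the k best experts
    weights = np.exp(scores[topk])
    weights /= weights.sum()                  # softmax over selected experts only
    out = shared_expert(x)                    # shared expert always contributes
    for w, i in zip(weights, topk):
        out = out + w * routed_experts[i](x)  # only k experts ever run
    return out, topk

d, n_experts = 16, 64                         # toy sizes, not DeepSeek's 256
make_expert = lambda: (lambda W: (lambda v: W @ v))(
    rng.standard_normal((d, d)) / d**0.5)
experts = [make_expert() for _ in range(n_experts)]
shared = make_expert()
gate = rng.standard_normal((n_experts, d))

y, active = moe_layer(rng.standard_normal(d), experts, shared, gate, k=8)
print(len(active), "of", n_experts, "routed experts activated")  # 8 of 64
```

Only `k + 1` of the 65 experts execute per token, which is why the activated parameter count (37 billion in V3) can be a small fraction of the total (671 billion).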
Therefore, the United States believed that as long as it controlled the sale of GPUs, it could dominate the world in artificial intelligence. On the second day of his inauguration, Trump signed an order launching the Stargate program, investing $500 billion to build AI infrastructure. Clearly, the U.S. government regards consolidating the computing-power foundation of artificial intelligence as the key to maintaining its global leadership. The leading American AI enterprises represented by NVIDIA have one side that is a real tiger and another that is a paper tiger. Young Chinese scientists and engineers, like newborn calves unafraid of tigers, poked a hole in the paper tiger and let the world see that the tiger is not so terrible after all. DeepSeek's shock to the world is the power of revealing the truth.
2. Has the Scaling Law hit its ceiling?
In January 2020, OpenAI published the paper "Scaling Laws for Neural Language Models," proposing the scaling law: model performance can be significantly improved by increasing model size, data volume, and computing resources. In the AI field, some regard the scaling law as an "axiom," popularly summarized as "brute force works miracles." Leading enterprises such as OpenAI and the U.S. AI investment community treat it as the magic weapon for winning. However, the scaling law is not a scientific law verified countless times like Newton's laws; it is the experience of OpenAI and other companies in developing large models in recent years. From a scientific research perspective, it is a conjecture about a technological trend; from an investment perspective, it is a bet on a particular technical route. Artificial intelligence is an exploration of future technology. There are many possible technical routes, and artificial intelligence itself has diverse goals. There are many mountains to climb on the road of exploration, and more than one path up each mountain. Treating a belief or conjecture as a scientific axiom is not a scientific attitude. The actual results of large-model training in recent years show that achieving linear growth in model performance requires exponential growth in model size, data volume, and computing power, doubling every few months. From GPT-3 to GPT-4, the number of parameters increased about 10-fold, the number of GPUs used for training increased nearly 24-fold, and the total computation increased nearly 70-fold. No input growing at such a high exponential rate can last long.
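The "linear gains require exponential inputs" point follows directly from the power-law form of the scaling law, L(N) = (Nc/N)^α. The small calculation below illustrates this; the constants are of the order reported in the OpenAI paper but are used here purely for illustration, not as fitted facts.

```python
# Illustrative only: under a power law L(N) = (Nc / N)**alpha, each fixed
# reduction in loss costs a growing *multiplicative* increase in N, i.e.
# linear performance gains require super-linear (exponential-like) growth
# in model size / compute. Constants are illustrative, not authoritative.
alpha, Nc = 0.076, 8.8e13

def loss(N):
    """Loss predicted by the power law at parameter count N."""
    return (Nc / N) ** alpha

def params_for(target_loss):
    """Invert the power law: parameter count needed for a target loss."""
    return Nc / target_loss ** (1 / alpha)

N = 1e9
for step in range(4):
    L = loss(N)
    N_next = params_for(L - 0.1)   # shave the same 0.1 off the loss again
    print(f"N = {N:.2e}  loss = {L:.3f}  growth factor for next -0.1: "
          f"{N_next / N:.2f}x")
    N = N_next
```

Each successive step of the same absolute improvement demands a larger multiplicative jump in N, which is the sense in which diminishing returns set in.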
The speed of large civil aircraft and the clock frequency of integrated circuits both stopped increasing at the appropriate time; large models should be no exception. Advocates of the scaling law often cite "The Bitter Lesson," an article by Richard S. Sutton, the father of reinforcement learning, as justification for pursuing high computing power: researchers have tried again and again to improve performance through exquisite engineering design, but in the end they all lost to the simple, crude scheme of "adding more compute"; history proves that general methods always win in AI. However, in the past two years Sutton himself has reflected deeply on the scaling law. He has pointed out that although scaling is effective in improving model performance, it is not a master key to all problems: AI systems need not only strong computing capability but also continuous learning, adaptation to the environment, and understanding of complex situations, capabilities that are difficult to achieve simply by adding compute. On the other hand, claiming that the scaling law has already reached its end is also baseless. Compared with the complexity of neural connections in the human brain, today's artificial neural networks still fall short by a factor of at least several hundred. Whether continuing to scale up neural networks and training data will still yield returns commensurate with the investment can only be judged by future results. The long delay of GPT-5, however, may indicate that the effect of further scaling is no longer obvious. Turing Award winner Yann LeCun and Ilya Sutskever, former chief scientist of OpenAI, have said frankly that the scaling law has touched the ceiling.
The emergence of DeepSeek has forced the AI community to think seriously about its technical route: should we continue to invest huge sums in the pursuit of high computing power, or find another way and put more effort into algorithm optimization? The advent of DeepSeek marks a shift in AI training from the extensive development stage of "brute-force miracles" to an intensive development stage of meticulous system-level optimization. DeepSeek's success does not deny the important role of computing power in AI development. In fact, since inference devices far outnumber training devices, the computing power required for inference will become the main demand. But green development is a major principle that must be followed, and reducing the energy consumed by artificial intelligence must be an important goal for the field.
3. What path should be taken to develop artificial general intelligence (AGI)?
"Artificial general intelligence" is a vague term on which no broad consensus has formed. The artificial general intelligence (AGI) that OpenAI pursues is one version of it, referring to AI that can handle complex problems at human level across many fields. But the field of artificial intelligence has Moravec's paradox: hard problems are easy, and easy problems are hard. From this perspective, an AI that can solve complex problems is not necessarily general intelligence. Many people hold that only an AI that can cope with situations its designers did not anticipate deserves to be called "general." For this reason, AI academia pays more attention to an intelligent system's capacity for continuous learning and self-improvement. The generality of artificial intelligence is manifested not only in language processing but also in the human-like ability to interact with the external physical world on the basis of common sense and everyday experience.
Artificial intelligence is the reproduction, and in some respects the transcendence, of particular aspects of human intelligence. In science and technology, anything "general" must be relative, conditional, and bounded in scope. We should understand the limitations of artificial intelligence and not blindly pursue an AI that can solve all problems. The key is to bring relatively general AI technologies into various industries according to actual needs, so that AI achieves real results within a definite scope. Realizing general intelligence is a gradual process; it will not arrive suddenly because of some single invention. The generality of today's AI is significantly better than in the previous two waves, but passing the Turing test in some applications is only an intermediate milestone, and current technology is still far from true general intelligence.
There is no settled answer to how to realize general artificial intelligence. DeepSeek and OpenAI both aim at "general artificial intelligence" but take different paths. OpenAI, trusting in the scaling law, expands model scale as much as possible, hoping to build a general foundation model first and then "distill" from it the vertical models usable in various industries: the road "from general to specialized." Beyond reducing the training cost of general models, how to improve performance and efficiency in specific fields or tasks while preserving generalization ability remains an open problem on this road. DeepSeek, by contrast, takes the road of AI development "from specialized to general," attempting system-level innovation in model algorithms and engineering optimization to open a new way of exploring general artificial intelligence under limited resources. The "mixture-of-experts" model is precisely the aggregation of many small intelligences into a large one, of specialized intelligence into general intelligence. "Small and refined" models shift the key direction of AI development from business-facing (to-B) toward consumer-facing (to-C), and from broadly covering "horizontal expansion" to deeply cultivated "vertical refinement," letting more small and medium-sized enterprises participate and potentially creating more market space. However, integrating multiple specialized models into a general model also requires solving many technical and engineering problems, such as interfaces between models, unification of data formats, and load balancing during training.
Competition between the general-purpose and the special-purpose is a common phenomenon in technological development. In integrated circuits there is "Makimoto's Wave," in which general-purpose and special-purpose designs alternate in roughly 10-year cycles. Which of "from general to specialized" and "from specialized to general" will go further in artificial intelligence awaits history's verdict. Perhaps the final outcome will be a fusion of the two: the "vertical refinement" of specialized multi-models and the "horizontal expansion" of general large models complementing each other to build the new industrial ecology of the intelligent era.
4. Should AI development pursue high computing power or high computational efficiency (energy efficiency)?
Turing is recognized in scientific and technological circles as the founder of artificial intelligence because he proposed the scientific hypothesis that human intelligence can be simulated by computation. His paper implies that computation is equivalent to intelligence, and so far the achievements of artificial intelligence have been almost inseparable from computing. The emergence of large models has raised the importance of computing power to unprecedented heights. We need to ask seriously: is high computing power an essential requirement of artificial intelligence? The original motivation for developing artificial intelligence was to simulate the human brain. The human brain, which nature evolved over millions of years, is a computing device of extremely high computational and energy efficiency, consuming only about 20 W. The brain's extremely low power consumption comes from distributed analog computation, whereas the high energy consumption of today's computers comes from digital computation with hardware and software separated. Geoffrey Hinton, a founder of deep learning, recently proposed the new research direction of "mortal computation," which adopts analog computation like the human brain's and subverts the traditional model of separating hardware from software. Such research pursues high computational efficiency and high energy efficiency and, in the long run, is the right direction for artificial intelligence. After DeepSeek's release, a team advised by Stanford's Chinese-American scientist Fei-Fei Li started from Alibaba's Qwen model, "distilled" Google's AI reasoning model Gemini 2.0 Flash Thinking Experimental, and, combining this with SFT, trained the s1 model in 26 minutes on 16 NVIDIA H100 GPUs at a cloud-computing cost of under $50.
Its performance exceeded OpenAI's o1-preview model. A model trained so cheaply may not be as versatile as those of the large companies, but achieving performance comparable to high-end models in some applications at astonishingly low cost shows how much room remains for lowering the cost of artificial intelligence. Low cost is a basic requirement for the popularization of any technology: the steam engine, electric power, and the computer spread widely only after their cost fell within the public's reach. Artificial intelligence will surely follow the same path. At present, the blind pursuit of high computing power keeps the cost of artificial intelligence high, hindering its large-scale adoption. DeepSeek is not only a technological breakthrough but also a re-writer of the rules, opening a feasible road to developing artificial intelligence at low cost. Its rise shows that AI is no longer limited to simply stacking up computing power but has entered a new stage focused on pursuing high computational efficiency and energy efficiency.
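The "distillation" used in the s1 work can be sketched as follows. This is the generic Hinton-style soft-target distillation loss, not the s1 team's actual recipe; the temperature, shapes, and random data below are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions:
    minimizing this trains the student to match the teacher's soft outputs."""
    p = softmax(teacher_logits, T)           # teacher's soft targets
    q = softmax(student_logits, T)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
    return float(kl * T * T)                 # T^2 restores gradient scale

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 10))       # 4 tokens, 10-way vocab (toy)
student = rng.standard_normal((4, 10))

# Zero when the student already matches the teacher, positive otherwise;
# gradient descent on this loss pulls the student toward the teacher.
print(distill_loss(teacher, teacher))        # ~0.0
print(distill_loss(student, teacher) > 0)    # True
```

The point the article makes is economic rather than algorithmic: because the teacher's outputs replace expensive human labels, a small team can transfer much of a frontier model's behavior for a tiny fraction of its training cost.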
5. Why is "open source" so powerful?
For the past few years, the performance of open-source large models always lagged the closed-source models of the leading companies by more than a generation. This time, DeepSeek's performance has caught up with the closed-source models, greatly boosting the confidence of the open-source community. Turing Award winner Yann LeCun holds that the correct reading of DeepSeek's rise is that "open-source models are surpassing closed-source models." This assessment is apt, because changing the development model of AI matters more than any single technical breakthrough. Although the artificial intelligence represented by OpenAI is booming, most enterprises dare not hand their own data to a proprietary AI platform to build vertical models, fearing that the submitted data will leak their technical secrets. This may be a fundamental reason why artificial intelligence has been hard to deploy across industries. DeepSeek's fully open-source approach solves this problem: enterprises and users all over the world can now download DeepSeek's small, refined models locally and "distill" efficient vertical models even with the network disconnected, truly realizing the democratization of the technology. For a long time, leading American AI companies have exaggerated the security risks of open-source AI and tried to curb it through regulation. In fact, open-source models are crucial to the global AI supply chain, and developing countries especially need open-source AI technology. If the United States continues to erect obstacles in this field, China can be expected to occupy a central position in the global open-source AI supply chain, leading more enterprises to turn to the technical solutions of Chinese rather than American companies.
The real AI competition is not only a competition of technologies and models but also of ecosystems, business models, and values. The open-source model lets every developer easily call on powerful AI tools without being constrained by the big companies, and the pace of AI's evolution will rise markedly. DeepSeek's open-source strategy will prove to history that in this AI competition, whoever embraces open source will win the future.
6. Does China have the ability to lead the world in artificial intelligence?
Some say ChatGPT was a breakthrough from 0 to 1, while DeepSeek is merely an extension from 1 to N. This view does not fit the historical trajectory of artificial intelligence. AI is a research field without a strict definition; there is no 0-to-1 boundary between the intelligent and the unintelligent, only a development process of continuously rising levels of intelligence. For a long time, most Chinese high-tech enterprises in AI emphasized application innovation and business-model innovation, pursued quick profits, and rarely engaged in core technological innovation. With economic development and technological accumulation, Chinese enterprises have begun to acquire the capacity for original innovation. DeepSeek may be a watershed, marking the shift of China's AI industry from "following" to "running alongside, and in places leading." It must be admitted that China still lags the United States in basic research and core AI technologies. Although China's totals of published papers and granted patents in AI exceed those of the United States, most of the most-cited source papers come from the United States, which is also the main source of top AI models. Stanford University's Artificial Intelligence Index Report 2024 shows that in 2023 the United States produced 61 notable AI models and China only 15. In recent years, however, China has been catching up rapidly, and its progress is gratifying.
According to statistics compiled by Nikkei (the Nihon Keizai Shimbun) on three top machine-learning conferences, including the Conference on Neural Information Processing Systems (NeurIPS), from 2020 to 2024, among more than 30,000 published papers there were 8,491 Chinese authors (versus 14,766 American), and the number of Chinese authors had grown 8-fold over those 4 years. Artificial intelligence differs from the capital-intensive, experience-intensive integrated-circuit industry: it requires not only "burning money" but also "burning brains," and is in essence an emerging industry built on concentrated human intelligence. The AI industry therefore shows a marked asymmetry: a small enterprise with a hundred-odd brilliant minds can challenge leading enterprises worth trillions. DeepSeek is only one of China's promising AI companies. MIT Technology Review recently published a report titled "Four Chinese AI startups to watch beyond DeepSeek," pointing out that StepFun, ModelBest, Zhipu, and Infinigence AI have also shown technical strength and global competitiveness no less than DeepSeek's. After DeepSeek's rise to prominence, the story of the "six little dragons of Hangzhou" spread widely (six AI-related startups: DeepSeek, Unitree Robotics, Game Science, DEEP Robotics, Manycore Tech, and BrainCo). To date, China has 52 unicorn enterprises in artificial intelligence, about 18% of the world's AI unicorns. This shows that a cohort of innovative small Chinese enterprises has entered the world's front rank in AI and begun to show the ability to lead. DeepSeek's success shows that in developing artificial intelligence, algorithm optimization and system-level engineering optimization are both indispensable, and excellent engineers play a vital role.
A sound engineering-education system and a vast corps of engineers are among China's major advantages, and we should bring this advantage into full play. Having entered the stage of running alongside, we should not fret over being some months behind the United States. You do what you do, and I do what I do; what matters more is who finds the right research direction. The younger generation is becoming the main force of scientific research, and we should have the confidence to take the lead ahead of the United States in AI research and application.
7. How should China strive for self-reliance and self-strengthening in artificial intelligence?
To achieve self-reliance and self-strengthening in artificial intelligence, we must rely not only on national top-level planning and ample financial support but also on the use and cultivation of talent and the construction of an industrial ecology; and the precondition for overcoming the many difficulties is self-confidence. The premise of DeepSeek's success was the confidence of its founder Liang Wenfeng, who said in an interview: "China's AI cannot be a follower forever. Someone must stand at the frontier. OpenAI is not a god; it cannot lead forever." Chinese young people born in the 1980s and 1990s have begun to look Western countries in the eye. They have the courage and confidence to "dare to be first in the world," and they are the hope of China's scientific and technological self-reliance. DeepSeek's hiring model breaks with tradition: Liang Wenfeng chose a distinctive employment strategy, declining experienced veterans in favor of the young. Applicants with more than 8 years of work experience are rejected outright; those with more than 5 years must be exceptionally outstanding to be selected. DeepSeek's team members are almost all fresh graduates or doctoral interns from top domestic universities. Real innovation often comes from those without baggage. DeepSeek gives responsibility to the intensely enthusiastic and curious young rather than to those accustomed to finding answers in experience. This employment philosophy has brought the company astonishing innovative momentum, and it is also a warning to China's traditional modes of education and employment. In achieving self-reliance in artificial intelligence, the hardest task is building an independent, controllable industrial ecology. NVIDIA's moat is not the GPU chip itself but the software ecology of its Compute Unified Device Architecture (CUDA).
DeepSeek has shaken the CUDA ecosystem but has not completely bypassed CUDA; its ecological barrier still stands. In the long run, we must develop an independent, controllable AI software tool system better than CUDA and reconstruct the AI software ecology. Achieving this goal requires careful planning and long-term effort; the relevant departments should resolve to organize the nation's development forces, fully mobilize upstream and downstream enterprises, and see this great undertaking through. Capital investment is not the only factor deciding the success or failure of AI, but the sharp shrinkage of China's investment market in recent years is a warning sign. CB Insights data show that in 2023 U.S. AI investment reached $67.2 billion, 8.7 times China's; that year U.S. AI investment grew 22.1% while China's private AI investment fell 44.2%. In generative AI in particular, private investment in the United States totaled $22.46 billion in 2023, against only $650 million in China. Venture capital and private-equity funds are vital supports for science-based innovation, serving as both a pool of capital and a guarantee for innovators. China and the United States once kept pace in the technology-innovation investment market, but by 2023 China's investment was only 8% of America's. Although the U.S. investment community's pursuit of "big computing power" contains a certain bubble, normal financial support is a necessary condition for AI development. Government and the capital markets should work together to build a healthy financial ecology for innovation and supply the impetus it needs, so that more DeepSeeks can emerge.
The formation of an industrial ecology also depends on market pull. The state should use policy guidance to encourage AI applications on PCs, mobile phones, and physical devices so as to raise the market share of domestic GPUs, CPUs, and software. We should attach great importance to open-source strategies for chip design and large models and strive for China to play a leading role in the global open-source AI system. Under conditions of limited computing power, we must squeeze out the ultimate performance of the hardware and explore every possible optimization through the collaborative innovation of algorithms and software. Domestic AI models are already very close to the U.S. level; we should optimize and adapt computing resources and AI platforms and strive for China to lead the world in AI research and application.

Author profile: Li Guojie, CCF fellow and honorary chairman; researcher at the Institute of Computing Technology, Chinese Academy of Sciences; academician of the Chinese Academy of Engineering. His research interests include computer architecture, parallel algorithms, artificial intelligence, big data, computer networks, and information-technology development strategy.
The full text was published in Science and Technology Review, Issue 3, 2025, under the original title "Reflections on the AI Development Path Prompted by DeepSeek."