"Why is the most powerful open-source CPU Chinese?" A Silicon Valley influencer's pointed question draws 500,000 onlookers

#News ·2025-01-06

[Image]

Over the past two days, a question from an overseas tech influencer has sparked heated discussion in foreign tech circles.

The post drew more than 500,000 views and 5,000 likes within two days.

Hacker News is also buzzing about it.

[Image]

The GitHub star count of the XiangShan processor repository surged within two days.

[Image]

The CPU in question is none other than XiangShan, a domestically developed CPU and a flagship result of the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS).

People are discussing not only the CPU itself but also the program behind it: "One Student One Chip".

Overseas commenters remarked that China is building its core engineering capabilities brick by brick.

Behind the buzz lies recognition, and the project's lead, researcher Bao Yungang of ICT, CAS, also shared his thoughts:

This is an affirmation of the Xiangshan project.

[Image]

The most powerful open-source processor known to date

Put simply, XiangShan's breakout into the spotlight came somewhat unexpectedly.

George Hotz, who posted the tweet, is the founder of comma.ai, an open-source autonomous driving company, and follows chip hardware closely. Not long ago he publicly called out bugs in AMD's CUDA alternative.

[Image]

The "strongest open-source" claim in his tweet traces back to the 2024 RISC-V Summit Europe, where the latest XiangShan generation, "Kunming Lake", achieved a SPECint 2006 score of 45 at 3GHz.

That puts its performance on par with Arm's Neoverse N2, making it the most powerful open-source processor known to date.

[Image]

△XiangShan: Empowering Open-Source RISC-V Innovation with High Performance Processor and Agile Infrastructure

For a detailed comparison, see the figure below:

[Image]

Some commenters noted that SPECint2006 scores are hard to find for advanced x86 and Arm cores, since vendors now report SPECint2017, and that clock rate also has a significant effect on the final score; even so, for a core running at only 3GHz, Kunming Lake's performance is quite good.

After the discussion heated up, Bao Yungang, a researcher at ICT, CAS, also responded on Zhihu.
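Because clock rate moves the final score, one rough way to compare cores is to divide a SPEC score by frequency. The sketch below is a minimal illustration, assuming the score scales linearly with clock (real cores do not, because memory latency does not shrink as core frequency rises); `score_per_ghz` is a hypothetical helper, and the inputs are the Kunming Lake figures quoted above.

```python
def score_per_ghz(score: float, freq_ghz: float) -> float:
    """Hypothetical helper: benchmark score divided by clock frequency in GHz.

    Assumes the score scales linearly with frequency, which is only a rough
    approximation for real cores.
    """
    if freq_ghz <= 0:
        raise ValueError("frequency must be positive")
    return score / freq_ghz


# Kunming Lake figures quoted in this article: SPECint 2006 score 45 at 3 GHz.
print(f"{score_per_ghz(45.0, 3.0):.1f} per GHz")  # 15.0 per GHz
```

By this rough measure Kunming Lake delivers about 15 SPECint2006 points per GHz. The same division applied to SPECint2017 results would not be comparable, since the two suites use different workloads and reference machines.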

[Image]

Bao Yungang said the XiangShan open-source CPU project has been running for five years, and recent industry developments have strengthened his belief in open-source CPUs.

The XiangShan project has already made some progress.

Its performance is comparable to Arm Neoverse N2, and it has brought together eight major enterprise partners for industrial deployment.

His original answer in full:

[Image]

This latest response also lays out the project's development history more clearly.

The XiangShan project dates back to 2019.

Bao Yungang and Dr. Tang Dan of his team had long wanted to build an open-source RISC-V core mainline that, like Linux, industry could widely adopt and academia could use to experiment with new ideas.

With support from CAS, ICT therefore led the launch of the XiangShan high-performance open-source RISC-V processor project.

The team then spent more than a year on preparation. Official development of the XiangShan processor began in June 2020, when the GitHub repository was also created, and the main physical design work was later completed at Peng Cheng Laboratory in Shenzhen.

According to reports, key modules including the pipeline front end, the back end, the memory access pipeline, the L1 cache, and the L2/L3 caches were all implemented independently by the XiangShan team.

Just over a year later, in July 2021, the first-generation XiangShan processor (Yanqi Lake architecture) was released.

The Yanqi Lake architecture targets single-core scenarios, supports the RV64GC instruction set, and reaches 1.3GHz on a 28nm process.

In January 2022, Yanqi Lake chips came back from the fab and were successfully brought up, correctly running complex operating systems such as Linux/Debian.

In December 2021, the XiangShan R&D team expanded further, co-founding the Beijing Institute of Open Source Chip (BOSC) with 16 institutions to commercialize the XiangShan processor cores and carry out follow-up architecture R&D.

[Image]

Two months before the first generation was released, design work on the second-generation XiangShan chip, Nanhu (South Lake), had already begun.

Nanhu V1 targets dual-core scenarios, supports the RV64GCBK instruction set, runs at 2GHz on a 14nm process, and was launched in November 2023.

Nanhu V2, which added improvements such as MBIST (memory built-in self-test), was taped out in April 2023; the chips came back in October of the same year and successfully booted Linux.

Nanhu V3 will include further microarchitecture and PPA (power, performance, area) improvements, and that work is currently underway.

On August 24, 2022, ICT, CAS, the Beijing Institute of Open Source Chip, Tencent, Alibaba, ZTE, and others formed a joint R&D team to formally begin joint development of the third-generation XiangShan (Kunming Lake architecture).

Since then, design exploration of Kunming Lake and productization of Nanhu have been the XiangShan project's two main priorities.
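The ISA strings RV64GC (Yanqi Lake) and RV64GCBK (Nanhu) are RISC-V naming shorthand: RV64 is the 64-bit base, G stands for IMAFD plus the Zicsr and Zifencei extensions, C adds compressed instructions, B adds bit manipulation, and K is used here as shorthand for scalar cryptography. The hypothetical decoder below covers only these letters and is not a full parser for RISC-V ISA naming:

```python
# Hypothetical decoder for the short ISA strings quoted in this article.
# It only knows the single-letter extensions that appear here; it is not
# a complete parser for RISC-V ISA naming (no version numbers, no Z*/S*/X*).
EXTENSIONS = {
    "I": "base integer instructions",
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed (16-bit) instructions",
    "B": "bit manipulation",
    "K": "scalar cryptography",
}
# "G" is shorthand for IMAFD plus the Zicsr and Zifencei extensions.
G_EXPANSION = ["I", "M", "A", "F", "D", "Zicsr", "Zifencei"]


def expand_isa(isa: str) -> tuple[int, list[str]]:
    """Split a string such as 'RV64GCBK' into (XLEN, extension names)."""
    isa = isa.upper()
    if not isa.startswith("RV"):
        raise ValueError("ISA string must start with 'RV'")
    rest = isa[2:]
    digits = ""
    while rest and rest[0].isdigit():  # XLEN, e.g. 32 or 64
        digits += rest[0]
        rest = rest[1:]
    if not digits:
        raise ValueError("missing XLEN after 'RV'")
    extensions: list[str] = []
    for letter in rest:
        if letter == "G":
            extensions.extend(G_EXPANSION)
        elif letter in EXTENSIONS:
            extensions.append(letter)
        else:
            raise ValueError(f"extension letter not covered here: {letter}")
    return int(digits), extensions


xlen, exts = expand_isa("RV64GCBK")  # Nanhu V1's ISA string
print(xlen, exts)
# 64 ['I', 'M', 'A', 'F', 'D', 'Zicsr', 'Zifencei', 'C', 'B', 'K']
```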

At the 4th RISC-V China Summit in August last year, a Nanhu-based development board was officially unveiled and successfully ran Genshin Impact Cloud, the cloud-streamed version of the game.

[Image]

△ Image source: "XiangShan Open-Source Processor" WeChat official account

According to the team's biweekly reports, Kunming Lake's R&D wrapped up a phase of development in November 2023, and this wording last appeared in the biweekly reports in April last year.

Since then, the biweekly reports show each Kunming Lake group continuing to optimize area, timing, and power.

[Image]

However, the XiangShan processor has not yet reached mass production.

The team says XiangShan will keep a microarchitecture iteration cycle and a tape-out cycle of roughly six months each. Beyond the microarchitecture itself, it also hopes to explore and establish an agile development flow for high-performance processors.

Official documents detail XiangShan's microarchitecture

Technically, XiangShan is written in the Chisel hardware description language, and its microarchitecture uses a six-issue out-of-order design with a separately designed memory access subsystem.

The team split the memory access subsystem into its own unit, which includes two load pipelines, two store-address pipelines, two store-data pipelines, a separate load queue and store queue, a store buffer, and more.

The Nanhu microarchitecture documentation has now been fully published, and its overall structure is shown below:
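Splitting each store into separate address and data pipelines means a store-queue entry can learn its address before its data (or the reverse). The toy model below shows how such an entry fills in and how a younger load might search the queue for store-to-load forwarding. It assumes word-sized accesses with no partial overlaps, and the class and method names are hypothetical, not XiangShan's Chisel implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    """One store-queue entry; address and data arrive from different pipelines."""
    seq: int                      # program-order sequence number
    addr: Optional[int] = None    # written by a store-address (sta) pipeline
    data: Optional[int] = None    # written by a store-data (std) pipeline

    def ready(self) -> bool:
        # The store may drain toward the store buffer only when both halves exist.
        return self.addr is not None and self.data is not None


class StoreQueue:
    """Toy store queue; a sketch, not XiangShan's actual design."""

    def __init__(self) -> None:
        self.entries: list[StoreEntry] = []

    def allocate(self, seq: int) -> None:
        self.entries.append(StoreEntry(seq))

    def write_addr(self, seq: int, addr: int) -> None:  # sta pipeline result
        self._find(seq).addr = addr

    def write_data(self, seq: int, data: int) -> None:  # std pipeline result
        self._find(seq).data = data

    def _find(self, seq: int) -> StoreEntry:
        return next(e for e in self.entries if e.seq == seq)

    def forward(self, load_seq: int, load_addr: int) -> tuple[str, Optional[int]]:
        """Look for the youngest older store to the same address.

        Returns ("forward", data), ("wait", None) if that store's data has not
        arrived yet, or ("miss", None) meaning: read the data cache. Older stores
        with unknown addresses are skipped; a real core must catch that case
        later with a memory-ordering check.
        """
        older = [e for e in self.entries if e.seq < load_seq]
        for entry in sorted(older, key=lambda e: e.seq, reverse=True):
            if entry.addr == load_addr:
                if entry.data is None:
                    return ("wait", None)
                return ("forward", entry.data)
        return ("miss", None)


sq = StoreQueue()
sq.allocate(1)
sq.write_addr(1, 0x1000)        # address resolved first...
print(sq.forward(2, 0x1000))    # ('wait', None)
sq.write_data(1, 42)            # ...data arrives later
print(sq.forward(2, 0x1000))    # ('forward', 42)
print(sq.forward(2, 0x2000))    # ('miss', None)
```

Keeping address and data in separate pipelines lets a store's address be checked against loads as soon as it is computed, without waiting for the value being stored.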

[Image]

Specifically, XiangShan's front-end pipeline includes the branch prediction unit, the instruction fetch unit, the instruction buffer, and other units, and fetches instructions in program order.

Nanhu decouples branch prediction from the instruction cache: the branch prediction unit generates fetch requests and writes them into a queue, which passes them to the fetch unit, which in turn sends them to the instruction cache.

Fetched instruction bytes are predecoded to catch branch mispredictions early, and the prediction pipeline is flushed promptly when one is found. Checked instructions go into the instruction buffer and then to the decode module, which supplies instructions to the back end.
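The decoupled front end described above can be illustrated with a toy, cycle-level model: a predictor pushes fetch requests into a queue (often called a fetch target queue), the fetch unit pops them, and predecode flushes the queue when it discovers a jump the predictor missed. Block size, queue depth, the always-fall-through predictor, and all names here are assumptions for illustration; in real designs predecode catches only some mispredictions (for example, unconditional direct jumps), while others are resolved in the back end.

```python
from collections import deque
from dataclasses import dataclass

BLOCK_BYTES = 32  # hypothetical fetch-block size


@dataclass
class FetchRequest:
    start_pc: int
    predicted_next_pc: int


def predict(pc: int) -> FetchRequest:
    """Toy predictor: always guesses fall-through to the next fetch block."""
    return FetchRequest(pc, pc + BLOCK_BYTES)


def run_frontend(jumps: dict[int, int], start_pc: int, cycles: int,
                 queue_capacity: int = 4) -> tuple[list[int], int]:
    """Simulate a predictor that runs ahead of fetch through a request queue.

    `jumps` maps a block's start PC to the target of a jump that predecode
    discovers in that block. Returns (block PCs fetched, number of flushes).
    """
    queue: deque[FetchRequest] = deque()
    bpu_pc = start_pc
    fetched: list[int] = []
    flushes = 0
    for _ in range(cycles):
        # 1. Predictor: enqueue up to two requests per cycle while there is room.
        for _ in range(2):
            if len(queue) < queue_capacity:
                req = predict(bpu_pc)
                queue.append(req)
                bpu_pc = req.predicted_next_pc
        # 2. Fetch unit: take the oldest request and read that block.
        cur = queue.popleft()
        fetched.append(cur.start_pc)
        # 3. Predecode: if the block really jumps elsewhere, drop the
        #    wrong-path requests and restart prediction at the real target.
        actual_next = jumps.get(cur.start_pc, cur.predicted_next_pc)
        if actual_next != cur.predicted_next_pc:
            queue.clear()
            bpu_pc = actual_next
            flushes += 1
    return fetched, flushes


fetched, flushes = run_frontend({0x20: 0x100}, start_pc=0x0, cycles=4)
print([hex(pc) for pc in fetched], flushes)
# ['0x0', '0x20', '0x100', '0x120'] 1
```

Because the predictor runs ahead, the queue usually holds work for the fetch unit; the cost of the decoupling is that wrong-path requests already in the queue must be discarded on a flush.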

[Image]

The back end includes decode, rename, the reorder buffer (ROB), reservation stations, integer/floating-point register files, and integer/floating-point execution units.

The back-end pipeline handles instruction renaming and out-of-order execution.

As shown in the figure below, the back end of the XiangShan processor (Nanhu) can be divided into four parts: CtrlBlock, IntBlock, FloatBlock, and MemBlock.

CtrlBlock decodes, renames, and dispatches instructions; IntBlock, FloatBlock, and MemBlock handle out-of-order execution of integer, floating-point, and memory access instructions, respectively.

(The third-generation XiangShan back end also adds a vector block responsible for vector processing.)
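Renaming is what lets the back end execute out of order without false dependencies. A common textbook scheme, shown here as a hypothetical sketch rather than XiangShan's actual rename logic, maps each architectural register to a physical one through an alias table and hands out fresh physical registers from a free list:

```python
from collections import deque


class Renamer:
    """Toy rename stage: register alias table (RAT) plus free list.

    Hypothetical sketch; ignores x0, checkpoints and recovery for brevity.
    """

    def __init__(self, num_arch: int = 32, num_phys: int = 64) -> None:
        self.rat = list(range(num_arch))  # arch reg i -> phys reg i at reset
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, dst: int, srcs: list[int]) -> tuple[int, list[int], int]:
        """Rename 'dst <- op(srcs)'.

        Returns (new phys dst, phys srcs, previous phys dst). The previous
        mapping would be kept in the ROB entry and freed when this
        instruction retires.
        """
        phys_srcs = [self.rat[s] for s in srcs]  # read sources before remapping dst
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        new_dst = self.free_list.popleft()
        old_dst = self.rat[dst]
        self.rat[dst] = new_dst
        return new_dst, phys_srcs, old_dst


r = Renamer()
print(r.rename(1, [2, 3]))  # (32, [2, 3], 1)   x1 <- x2 op x3
print(r.rename(1, [1]))     # (33, [32], 32)    x1 <- op x1 reads the new name
```

The second call shows both effects: the write-after-write conflict on x1 disappears because each write gets its own physical register, while the true read-after-write dependency is preserved by reading physical register 32.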

[Image]

XiangShan's MemBlock contains the core's memory access pipelines and queues, plus the L1 data cache, which is tightly coupled to those pipelines.

There are two load pipelines, two separate store-address (sta) pipelines, and two store-data (std) pipelines; the load queue and store queue maintain ordering information for load and store instructions, respectively.
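Because loads can execute before older stores know their addresses, the load queue must catch loads that read stale data. A minimal sketch of that check, assuming exact-address matches and a hypothetical `LoadEntry` record (not XiangShan's implementation):

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    seq: int               # program-order sequence number
    addr: int
    executed: bool = False


def find_violations(load_queue: list[LoadEntry], store_seq: int,
                    store_addr: int) -> list[int]:
    """When an older store resolves its address, return the younger loads that
    already executed from that address: they read stale data and must be
    replayed, together with everything after them in program order."""
    return sorted(
        e.seq for e in load_queue
        if e.seq > store_seq and e.executed and e.addr == store_addr
    )


lq = [LoadEntry(5, 0x80, True), LoadEntry(7, 0x40, True), LoadEntry(9, 0x80, False)]
print(find_violations(lq, store_seq=3, store_addr=0x80))  # [5]
```

Load 9 is not flagged because it has not executed yet, so it can still pick up the store's value (for example through forwarding); a core would replay starting from the oldest flagged load.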

[Image]

The latest Kunming Lake architecture does not yet have detailed technical documentation, but the team has published its overall architecture diagram.

Overall, its structure resembles Nanhu, but each back-end block has been substantially reworked, a block for vector processing has been added, and some cache capacities have been increased.

[Image]

As for licensing, XiangShan uses the Mulan Permissive Software License, Version 2 (Mulan PSL v2), keeps its design source code and development process open, and welcomes contributions from the community.

China is building its core engineering capacity brick by brick

XiangShan's unexpected popularity left some overseas netizens feeling anxious.

In the comments, some were quite certain: this means China is solving fundamental hardware problems.

While Silicon Valley is still funding a wave of hardware startups, China is building core engineering capabilities, brick by brick.
…
Hard problems are what attract real talent.

[Image]

Some said the best chip architects in the US are at Nvidia and Apple, and none of them work on open source.

[Image]

One commenter even asked: if logic and math are China's strengths, how will that shape the future of computing?

[Image]

Others picked up on the "One Student One Chip" program:

[Image]

The program mentioned here is an initiative launched by the University of Chinese Academy of Sciences (UCAS) in 2019.

Put simply, its goal is to have undergraduates lead the design and implementation of a 64-bit RISC-V processor SoC chip that can boot both Linux and UCAS-Core, a teaching operating system written by students.

The first cohort had only five students; over several years the program has grown to more than 6,000 participants.

The program's core aim is to break through traditional course boundaries under an "open source and sharing" philosophy and to shorten the path from training to working in research and industry.

A similar model is the Mead-Conway style of chip-design teaching that was popular at MIT in the last century, which likewise had students carry chips through the entire design and fabrication flow. Many students later took their course projects to Silicon Valley to start companies.

In short, at the start of 2025, Chinese open-source projects are frequently flooding social media feeds.

When discussing XiangShan, some commented:

And don't forget DeepSeek.

China seems to be doing more and more open source work.

[Image]
