Loongson 3A6000 In-Depth Review: The Rising Star of Domestic CPUs?

Captions
Today we're reviewing a very special CPU: Loongson's 3A6000. It is a domestically produced CPU that is fully independently developed and built on in-house IP; even the instruction set is its own. I'm sure many of you have been waiting for this for a long time and are curious about where the 3A6000 actually stands. Can its key metrics, performance and power consumption, come close to those of major international vendors such as Intel and AMD? And what is its software ecosystem like? That's what today's video is about.

Before we get to the 3A6000, though, I think we need to talk about Loongson itself. If you don't follow the tech scene, you have probably heard so many "something-core" chip names that you've gone a bit numb to them. Is Loongson just another chip maker? If that's what you think, you're wrong. It is no exaggeration to call Loongson the forefather of domestic CPUs. The Loongson project was born at the Institute of Computing Technology of the Chinese Academy of Sciences, where development of the first product was already under way as early as 2000. In 2002, Loongson 1, based on the MIPS III instruction set, was successfully launched as the first general-purpose CPU independently developed in China. In the years that followed, Loongson released new products almost every year for use across many industries. In 2009 the Loongson 3 series arrived with the 3A1000, a four-issue out-of-order processor, which also took Loongson into the quad-core era. The 3A2000 followed in 2015, the 3A3000 in 2017, and the 3A4000 in 2019, with the microarchitecture improving significantly along the way. The biggest change, however, concerns the instruction set: until 2020, Loongson processors were all based on the MIPS instruction set under a MIPS license, with Loongson's own extensions on top.
Although MIPS is relatively open compared with ARM, it had two problems. On the one hand, MIPS itself has had a turbulent fate: neither official maintenance nor future improvements could be counted on, so it was a dead end and Loongson could only keep extending it on its own. On the other hand, Loongson wants to be independent and controllable. If you want to make drastic design changes and build your own software ecosystem, it is best not to depend on someone else's technology. So Loongson had long been considering an instruction set of its own. In 2020, Loongson's own instruction set, LoongArch, was finally unveiled, and in 2021 the first LoongArch processor, the 3A5000, made its debut, officially taking the Loongson ecosystem into the LoongArch era. Today's 3A6000 is the successor to the 3A5000 and the second-generation processor built on LoongArch.

To understand the 3A6000 better, I sent this sample to a friend to have it ground down so we could see its internal structure. Making a die shot means removing the CPU's lid, taking out the die, slowly polishing it mechanically, and then etching it chemically. Fortunately our friend was very patient, and in the end we got to see what the 3A6000 really looks like. The die measures about 116 mm², which is not large; it is nearly 20% smaller than the 3A5000. Such a small area means that, compared with the 3A5000, more chips can be cut from each wafer, which lowers Loongson's costs. The four blocks you can see on the die are its four CPU cores. Although the die is much smaller, the new-generation LA664 core is 34% larger than the previous generation. Compared with the LA464 in the 3A5000, the new core is much wider: the front-end decoder grows from 4-wide to 6-wide, and the ROB jumps from 128 to 256 entries.
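The point about a smaller die giving more chips per wafer can be made concrete with a rough estimate. The sketch below uses a common gross dies-per-wafer approximation and assumes a 300 mm wafer, which the video does not state; it ignores yield, scribe lines, and edge exclusion. The 3A5000 area is back-calculated from the "nearly 20% smaller" remark and is only an approximation.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Rough gross die count: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


AREA_3A6000 = 116.0              # mm^2, quoted in the video
AREA_3A5000 = AREA_3A6000 / 0.8  # assumption: "nearly 20% smaller" read as exactly 20%

new = dies_per_wafer(AREA_3A6000)
old = dies_per_wafer(AREA_3A5000)
print(f"3A6000: ~{new} dies, 3A5000: ~{old} dies, ratio {new / old:.2f}x")
```

Under these assumptions the smaller die yields noticeably more candidates per wafer, which is the cost argument made in the video; real numbers depend on the actual wafer size and defect rate.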
The architecture has been overhauled almost from the ground up. The new core also adds hyper-threading, so the 3A6000 has four cores and eight threads. The L2 cache per core has doubled as well. The L3 cache, however, is the same size as on the 3A5000, and it appears to be a private 4 MB per core rather than shared, which may have some impact. Overall, the 3A6000's core is close in scale to AMD's Zen 3, so this is a serious high-performance architecture. The memory controller paired with it supports dual-channel, 128-bit DDR4 and can currently run memory at 3200 MHz.

One of the 3A6000's biggest problems is that its clock speed cannot go high: the official maximum rating is only 2.5 GHz. For now there is really no way around this. For reasons of independence and controllability, the 3A6000 is built on a domestic 12nm process, which is essentially a 14nm-class node and lags behind today's most advanced processes. For a CPU that wants to clock high, the process is a hurdle it cannot get over, and we can only hope domestic process technology keeps improving. That said, the ASUS motherboard we used does offer some overclocking headroom, and later we'll try overclocking to see where this Loongson's limits are.

As a desktop platform, it differs from the PCs we're used to: the 3A6000 is a BGA package soldered to the motherboard rather than socketed. That makes sense, since Loongson's target customers, government agencies and enterprises, are unlikely to swap CPUs on a whim. The motherboard also carries Loongson's 7A2000 bridge chip as the companion to the 3A6000. The 7A2000 handles peripherals such as PCIe, USB, and SATA, provides 32 PCIe 3.0 lanes, and talks to the CPU over an HT 3.0 bus. Most importantly, it contains a self-developed GPU, the LG110.
Yes, the Loongson platform has integrated graphics too, and the motherboard provides an HDMI port that supports output up to 1080p at 60 Hz or 2K at 30 Hz. The performance of this integrated GPU is basic, though: it scores about 400 points in glmark2-es2, which puts it on the level of the integrated graphics in Intel's Z8300 or Qualcomm's Snapdragon 800. More importantly, it only supports the ancient OpenGL 2.1 and ES 2.0 standards, so for the best experience we plugged in a discrete RX 580.

Why the RX 580? Because Loongson is picky about graphics card compatibility. On the official system, open source drivers are only provided for AMD cards up to the RX 580; RDNA-based AMD cards and all NVIDIA cards cannot be used. Some third-party new-world systems have started adapting RDNA2, but that requires compatibility from the motherboard as well as the system. As for NVIDIA cards, don't even think about it; everyone knows NVIDIA and the Linux open source community don't get along. In any case, the most usable card we have right now is the 580.

Graphics cards are not the only compatibility concern: Loongson is also strict about memory. A lot of the memory we had on hand either failed to boot or ran unstably. The most compatible kit I tried was from Unisoc Micro, and that pair ran at 3200 basically without issue.

So, back to the 3A6000 CPU itself. I'm sure everyone wants to know how it performs and how efficient it is, so let's skip the small talk and run the industry standard, SPEC CPU 2017. There are two things to watch here. The first is single-core IPC, that is, performance at the same frequency, which tells us how good Loongson's architecture design is.
The second is absolute performance, which determines the end-user experience. Let's look at IPC first. We locked everything to 2.5 GHz and ran single-core SPEC compiled with GCC 13, and here are the results. Wow, the 3A6000's improvement over the previous generation is really large: in just two years, integer is up 46% and floating point is up 64%. That is very fast progress. How does it compare with the two international giants, Intel and AMD? Loongson surprised us here too. With everything running at 2.5 GHz, the 3A6000's IPC is close to AMD's Zen 3. In integer performance alone it even exceeds Zen 3, and its integer IPC approaches Zen 4 and Intel's 14th-gen Core. That is unquestionably at an internationally advanced level. Floating point is a different story: there is still a fairly large gap, and its floating-point IPC is nowhere near Zen 3's. Floating-point performance is heavily affected by the CPU's memory and cache subsystem. For a consumer-class CPU, though, integer performance matters more. Getting integer right first at least covers the daily needs of government and enterprise customers; floating point mostly matters for heavier compute workloads, and that can come in time. Judged on IPC, the 3A6000's new architecture is very impressive.

Of course, as mentioned earlier, the chip is limited by its manufacturing process. Loongson can't raise the frequency, so 2.5 GHz is its maximum performance. Once Intel and AMD run at their stock frequencies, which go above 5 GHz, Loongson naturally falls well behind. Comparing single-core performance at stock frequencies, the 3A6000 trails the i3-10100 by a clear margin and sits in a different tier.
It is on the same level as the i3-9100, or an i7-4790K with turbo boost disabled. Having said that, there's one question I've been dying to answer, and I bet you're curious too: which has the stronger big core, the Loongson 3A6000 or the Kirin 9000S? It's a duel between two domestically developed designs, so we ran both through SPEC 2017. Each has its strengths. Even with a clock about 0.1 GHz lower, Loongson is still stronger in integer, but in floating point Kirin's Taishan core is much stronger. Anyway, this chart is just here to satisfy everyone's curiosity.

Having seen the strong IPC results, let's look at absolute multi-core performance. I have to say the 2.5 GHz frequency really holds it back. In multi-core SPEC the 3A6000 still trails the i3-10100 by a wide margin, and against a 12th-gen i3 it falls even further behind. In multi-core, the 3A6000 is actually closer to Intel's current entry-level chip, the Intel Processor 300, which is the new Pentium. Compared with its predecessor, the 3A5000, the improvement is of course huge, but there is something odd. With hyper-threading added in this generation, you would expect the multi-core gain to be larger than the single-core gain. For integer that holds: the 3A6000 is 71% faster than the previous generation, which is very good. Multi-core floating point, however, improves by only 60%, less than the single-core gain. That clearly isn't right. With hyper-threading added, why would multi-core gain less? We must be hitting a bottleneck, and we suspect the L3 cache. After all, the core has grown a lot but the L3 has not, and the L3 is still a non-shared design.
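The oddity described here is easier to see by dividing the multi-core gain by the single-core gain. The sketch below uses only the percentages quoted in the video (single-core +46% integer and +64% floating point, multi-core +71% and +60%); the helper name is made up for illustration.

```python
def extra_multicore_scaling(single_gain_pct: float, multi_gain_pct: float) -> float:
    """Ratio of multi-core speedup to single-core speedup.

    A value above 1 means the new generation gained more in multi-core
    than in single-core; below 1 means it gained less.
    """
    return (1 + multi_gain_pct / 100) / (1 + single_gain_pct / 100)


int_ratio = extra_multicore_scaling(46, 71)  # integer: single +46%, multi +71%
fp_ratio = extra_multicore_scaling(64, 60)   # floating point: single +64%, multi +60%
print(f"integer: {int_ratio:.3f}, floating point: {fp_ratio:.3f}")
```

The integer ratio comes out above 1 and the floating-point ratio slightly below 1, which is the pattern that points to something other than the cores limiting floating-point throughput.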
Under heavy all-core loads, bottlenecks in the memory and cache hierarchy are more likely to hurt floating-point performance. That is our speculation, at least. Overall, though, the 3A6000's performance is enough for its target customers, which means office work at government agencies and enterprises.

Besides performance, we need to talk about power consumption. How is Loongson's energy efficiency? Honestly, I had no expectations. The 3A6000's core is much wider than the previous generation and the design is larger, while the process has not been upgraded, so power was never going to be low. Our measurements match that guess. Compared with the 3A5000, also at 2.5 GHz, the 3A6000 draws considerably more power. A CPU package power of 40-odd watts is not high in absolute terms, but the energy efficiency is still far behind Intel's 10th-gen i3. It is actually better than I expected, though, because relative to the 3A5000 the performance gain is far larger than the power increase. Loongson clearly put real work into optimizing the design.

As you all know, we at Geekerwan never stop tinkering. How could we let this Loongson go after only running it at stock frequency? So we overclocked the 3A6000 too. We used an ASUS motherboard this time, and in keeping with ASUS tradition the BIOS provides overclocking options. After raising the voltage to 1.2 V, we reached 2.8 GHz on air cooling. That is only 0.3 GHz more, but the base frequency is low, so proportionally it is still a big increase. After overclocking we ran single-core SPEC 2017 again, and the performance improvement is quite clear.
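The remark that 0.3 GHz is proportionally large on a low base clock is simple arithmetic. The sketch below works it out from the 2.5 GHz and 2.8 GHz figures in the video; the 5 GHz comparison chip is a hypothetical, chosen only because the video mentions Intel and AMD running above 5 GHz.

```python
def relative_gain(base_ghz: float, new_ghz: float) -> float:
    """Fractional frequency increase relative to the base clock."""
    return (new_ghz - base_ghz) / base_ghz


STOCK_GHZ = 2.5   # 3A6000 stock frequency, from the video
AIR_OC_GHZ = 2.8  # air-cooled overclock, from the video

loongson_gain = relative_gain(STOCK_GHZ, AIR_OC_GHZ)
# Hypothetical: the same +0.3 GHz applied to a 5 GHz part.
high_clock_gain = relative_gain(5.0, 5.3)

print(f"2.5 -> 2.8 GHz: {loongson_gain:.0%}")
print(f"5.0 -> 5.3 GHz: {high_clock_gain:.0%}")
```

The same absolute step is worth twice as much, in relative terms, on the 2.5 GHz chip as on a 5 GHz one.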
Air-cooled overclocking alone didn't feel like enough, though. How about liquid nitrogen, to squeeze out every last bit of Loongson's headroom? For this I invited a friend who loves playing with liquid nitrogen, Uncle Tony. Ta-da! We went all out on this 3A6000. The liquid nitrogen Tony brought can take the CPU down to nearly minus 200 degrees, enough to truly freeze it. Tony also came well prepared: as an ASUS insider, he brought a special BIOS that unlocks more overclocking headroom. We flashed it right away and started testing what the chip could do.

After a long time experimenting, we discovered that such low liquid-nitrogen temperatures may not help, because this CPU's IMC (integrated memory controller) is extremely sensitive. Once the CPU is overclocked, the temperature cannot go too low, or the IMC loses stability. Under liquid nitrogen the CPU can reach about 3.1 GHz, but at that point the system hangs as soon as the temperature drops below minus 40 degrees, while above minus 40 degrees the machine may not even pass POST. And liquid-nitrogen overclocking requires cooling throughout the whole run, so this looked like an unsolvable problem.

At first we had no idea where in the CPU the problem was. Then, by chance, we swapped the 3200 memory for lower-frequency memory, and it worked. With 2400 memory the temperature threshold drops: the CPU can tolerate minus 90 degrees and boot normally. But with memory that is even slightly faster, such as 2666, it struggles to stay stable. So now we know that once the 3A6000 is pushed to its limit, the bottleneck is the IMC inside the CPU.
The best we could run was a 3.1 GHz core clock with 2400 memory. In such an extreme setup, a full SPEC run takes hours, and there was no way we could sustain a three-hour test, so we ran only SPEC's 525 subtest, x264. At this overclocked frequency it scores 49.6, which is 23% better than at 2.5 GHz. That is a sizable improvement, and the result finally surpasses the 10th-gen i3. CPU power consumption at this point exceeds 70 W, nearly double. It is clear, though, that the core's overclocking potential is being held back by other factors. The bottleneck is no longer in the core; otherwise it should be able to go even higher.

So, are you satisfied with the 3A6000's performance? Personally, I don't think performance is the thing to worry about. After this generation's overhaul, performance is entirely usable. The software ecosystem is what everyone cares about most, right? What is it like to use software on the Loongson platform? I spent a lot of time trying it out, and I found that the Loongson ecosystem is still somewhat complicated.

First, the switch from MIPS to LoongArch three years ago was an ecosystem restart. That much is obvious: when the instruction set changes, the whole ecosystem has to be rebuilt. But LoongArch's first restart was not thorough enough, and it now has to go through another one. This is what "old world" and "new world" (ABI 1.0 and ABI 2.0) refer to. Loongson was quite decisive when it abandoned MIPS, so to the outside world much of the 3A5000 and LoongArch seemed to appear suddenly. For example, the software ecosystem that launched with LoongArch was developed internally by Loongson and released together with the products.
This software ecosystem, which we call the "old world" (ABI 1.0), is relatively closed. Later, the Linux kernel officially added LoongArch support in version 5.19. At that point it became clear that what Loongson had built behind closed doors had a number of problems, and that many conventions had to change in order to align with the Linux open source community. Hence the "new world" (ABI 2.0). As a result, the same 3A6000 has two software ecosystems that are not directly compatible with each other.

Which ecosystem you are in depends first on your Linux distribution. The officially supported Loongnix, Kylin, and UnionTech UOS are all still in the old world, while Arch Linux (kernel 5.19 and above), popular in the open source community, is already in the new world. Second, it depends on which world a given piece of software was built for. Sounds like a headache, doesn't it? In practice the headache falls mainly on developers; users may not notice the difference. Loongson itself knows this needs to be sorted out quickly, so it is pushing the migration to the new world. The official system will be upgraded soon, and just a couple of days ago a compatibility layer called liblol appeared, which lets new-world Linux run old-world software.

With the pitfalls of the old and new worlds covered, I have to tell you about something very important: Loongson's binary translation capability. Attracting developers to write native software is still relatively difficult, so Loongson took translation seriously from the start of hardware design. It added special instructions and dedicated hardware to the CPU to accelerate binary translation.
At the software level, Loongson also built its own translation layer, LATX, short for LoongArch to x86, and there is LATA for translating ARM software. So Loongson can actually run x86 and ARM programs, which gives the software ecosystem a basic safety net. Of course, translation always comes at a cost. We tested native and translated builds of 7-Zip separately. Decompression holds up well, with the translated build reaching about 76% of native efficiency, but compression loses more: after translation it is only about half as efficient as native. LATX's translation efficiency is not as good as Microsoft's or Apple's translation layers on ARM, but it is already decent, and much better than I expected.

Translation efficiency also varies with the software. With something more complex, such as Cinebench, the results are very different. In Cinebench R15 the 3A6000 scored only a little over 240 points in multi-core, not even matching an i3-2100. Going from roughly 9th-gen i3 level natively to below a 2nd-gen i3 under translation is quite a drop. Fortunately, LATX is updated constantly. When I first used it in December, R15 would not even open and only the ancient R11.5 would run. Now, with LATX updated to version 1.4.4, R15 at least runs normally. Newer versions still won't start, but the outlook is promising.

Back on the user side, what is the software experience on the Loongson platform like today? If you use Loongson's official Loongnix system, it provides the Loongson Application Cooperative, where many common applications can be found. Even the officially supported applications here are not all native, however. There are four main types.
The first is open source software such as OBS, VLC, and HandBrake, which is relatively easy to port to LoongArch and runs natively. The second is software from companies that have done genuine native development for Loongson, such as QQ, Tencent Meeting, DingTalk, Baidu Netdisk, Sunflower Remote, Huorong, and Kingsoft WPS, along with some domestic industrial software such as CAD; these apps all have native LoongArch versions. The third has no native version but can run as a translated x86 Linux build, such as NetEase Cloud Music, QQ Music, Feishu, Mind Map, and a lot of x86 software from foreign companies. As long as the software is not too heavy, it runs fairly smoothly under translation on the 3A6000.

The last type, and the most infuriating, is the "chosen one": WeChat. WeChat is a special case because it not only has no Loongson version, it has not even released a Linux version. That means stacking two translation layers: LATX to translate x86, and then Wine to run the Windows version of WeChat. The experience is simply awful. It freezes after running for a while, and the interface is buggy. For example, I opened an official-account article and resized the window, and look, it breaks. Can't the most widely used app in the country invest a bit more in R&D? At best, this experience can be called barely usable.

Even among the software that does run, real-world usability varies, especially for professional software such as Blender. You can find it in the Application Cooperative; Blender is open source, so it is no surprise that it has been compiled for the Loongson platform. But only the ancient 2.7 version is offered. For anything newer you have to compile it yourself or find a community build, such as Blender 4.0.
I tried it briefly, and at first glance it worked: basic operations are fine. But here is the catch. You can edit a project but not render it, because the Cycles renderer does not support Loongson and would need dedicated adaptation. The software launches, but specific features may not work. This is quite common with ported software. In the end, the software that is truly useful on Loongson is the software that has been natively adapted for it.

I also tried using Loongson for development. First, the IDEs we use most, the JetBrains suite, are Java-based, and Loongson provides a LoongArch build of OpenJDK, so the latest JetBrains IDEs run natively on LoongArch at full efficiency. If you want to write Python, the interpreter that ships with both the official system and UOS is version 3.7, and it is native too. It is a bit old, but for everyday tasks like writing a crawler or processing files there is no difference in experience. If you need newer features, you can download the Python source and compile it yourself.

There is one problem with programming, though: you can't reinvent every wheel yourself. You have to rely on third-party libraries, and this is where the pitfalls are. Almost no third-party library on PyPI provides binary packages for the LoongArch architecture; it is simply too niche. Fortunately, Loongson has compiled many commonly used packages itself, and third-party libraries work out of the box once you point pip at Loongson's package source. That source does not have very much in it, however.
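The reason PyPI's prebuilt packages do not install on LoongArch comes down to the platform tag in each wheel's file name. The sketch below is a simplified illustration, not pip's real resolution logic: it only compares the final tag of a wheel file name against a machine architecture string, and the file names are made-up examples.

```python
import platform


def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag, the last dash-separated field of a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    return filename[: -len(".whl")].split("-")[-1]


def usable_on(filename: str, machine: str) -> bool:
    """Simplified check: pure-Python wheels ('any') work everywhere,
    binary wheels only if the platform tag ends with the machine architecture."""
    tag = wheel_platform_tag(filename)
    return tag == "any" or tag.endswith(machine)


# Hypothetical file names for illustration only.
wheels = [
    "example_pkg-1.0-py3-none-any.whl",
    "example_pkg-1.0-cp37-cp37m-manylinux2014_x86_64.whl",
    "example_pkg-1.0-cp37-cp37m-linux_loongarch64.whl",
]

for name in wheels:
    print(name, "->", usable_on(name, "loongarch64"))

# Assumption: on a LoongArch Linux system this reports "loongarch64".
print("this machine:", platform.machine())
```

Pure-Python packages are unaffected, which is why much of PyPI still installs; it is packages with compiled extensions that need either a LoongArch build from Loongson's source or a local compile.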
Once you hit something that is not available, you have to download the source and compile it yourself, and then you run into all sorts of problems. Strange errors that have to be fixed one at a time are routine. On the Java side, besides the native OpenJDK mentioned earlier, Loongson also provides a native Maven repository, and there are native repositories for npm and gem if you develop in JavaScript or Ruby. In general, as long as you are willing to tinker, the Loongson platform can handle most development work. Of course, tinkering costs manpower and time, which may put some developers off.

Now that the serious part is over, shall we have some fun? Have you ever wondered what gaming on Loongson is like? Thanks to Wine and the LATX translation layer, playing PC games on Loongson is no longer a pipe dream. It does take patience, because you will often hit compatibility problems, such as dependencies that are not installed correctly or errors caused by the wrong version of Wine. We started trying games on Loongson at the end of last year. At that time LATX and Wine could run DX9 games, such as the classic The Elder Scrolls V: Skyrim. We tried it, and it does run, but the frame rate is not high. I can't honestly call it even barely playable; the frame rate looks to be in the single digits. In a workload as complex as a game, translation efficiency is hard to sustain, and neither the CPU nor the GPU can show its full strength. I also tried Tomb Raider 9, which is even a native Linux game. It seems to run well, with a frame rate clearly higher than Skyrim's.
Wow, Tomb Raider manages at least ten to twenty frames per second. That feels reasonably efficient, so let's call it playable. Given how mature the translation layer is right now, we should be grateful that it runs at all.

Coming into this year, Loongson got an unexpected surprise: it can finally use DXVK. Regular viewers will know it; it lets DirectX games run on Vulkan. That brings DX11 support, which opens up many more games at once. We tried Titanfall 2, a DX11 game, and it runs on Loongson. As before, the frame rate is not high, but it is playable. Many games still run into problems under DX11 translation, though. Genshin Impact throws a debugger error and will not run, even with the international version. Then there is GTA 5. At first I was able to get into the game and was delighted, but then it got stuck at Social Club. The same game files do not produce this error on x86, so some component must still fail to run on LoongArch's translation layer. Newer, larger games are even more likely to fail: The Witcher 3 and God of War basically would not run when we tried them.

Less demanding games are a different matter. Many 2D games, galgames, visual novels and the like run fine on Loongson. I tried the remastered Witch on the Holy Night, and it was stable and smooth. Some open source emulators, such as PPSSPP, have also been ported to LoongArch by the community and run smoothly on the 3A6000. So the Loongson platform is not short of games, but the ones that run well are basically old games and 2D games. We never expected AAA titles to run.
I was already thrilled to see a handful of games run well. And as I said, LATX and Wine are being updated all the time, so game compatibility will keep getting better.

So what is your impression of the Loongson 3A6000 after today's detailed review? To be honest, I never expected it to reach the internationally advanced level; that is obviously unrealistic. As long as Loongson sticks to its own path, improves steadily with each generation, and builds up the software ecosystem so that it is at least usable, I think that is good enough. Seen that way, the 3A6000 has done its job. The performance of this product is usable, and although the LoongArch ecosystem is still very weak, it at least meets the daily office needs of its target customers. This is, after all, the result of just two years, and it will be worth watching how far it goes.

That said, I think it will still be very hard for Loongson to reach ordinary households. Its main advantage is being independent and controllable, so it is basically sold to government and enterprise customers with Xinchuang (domestic IT innovation) requirements. You can think of it as a purely B2B product. Among ordinary consumers, only die-hard enthusiasts will consider buying one to play with. On competitiveness alone, taking consumer market share from Intel and AMD is almost impossible. So how to attract developers, and how to get people to write software for LoongArch, is a question Loongson needs to think about carefully. Either way, I am happy to see domestic CPUs make such progress. This review of the Loongson 3A6000 took us a very long time to make, and the script alone took ages to write. I hope it has answered your questions.
If you found it useful, be sure to long-press the like button for the triple combo to support us. Don't forget to visit our Taobao store, the Geekerwan store, to check out our peripherals, and don't forget to follow our channel, Geekerwan. See you next time. Bye!
Info
Channel: 极客湾Geekerwan
Views: 156,160
Id: GpUJHHwedCw
Length: 27min 43sec (1663 seconds)
Published: Sun Feb 11 2024