Why Apple's M1 Chip is So Fast

Video Statistics and Information

Reddit Comments

M1 has spoiled me for laptops now. The chassis always runs somewhere between cold and room temperature, and I’ve nearly forgotten what fans even sound like. And for a low end first gen effort, it handles my daily workload with what can only be described as ease. Impressive little thing. Thanks for the technical breakdown.

👍 144 · u/walktall · Feb 01 2021

This video is probably the best explanation I've seen yet of how & why the M1 chips are so fast.

It goes in-depth, but it is quite approachable. Its explanation on variable-length instructions & decoders is top-notch and I was surprised how clear it is.

Most of this information is locked away in lengthy, in-depth knowledge that goes far beyond M1. This video organizes it in a much more serious way than hand-wavy "explanations" that really do not go into any serious detail, and it focuses on the M1's technical successes.

👍 56 · u/-protonsandneutrons- · Feb 01 2021

Using the M1 reminds me of when I had an all-SSD setup for a few years and then used my friend's 10-year-old Dell with an HDD. Going back to a clunky-feeling Intel with the hot keyboard and loud fans feels like I'm stressing the hell out of my friend's old $50 brown Honda after cruising in a Tesla.

👍 26 · u/JohrDinh · Feb 01 2021

This channel is great, thanks for sharing.

👍 5 · u/renamdu · Feb 01 2021

I wish Apple would create a video game system.

👍 10 · u/AmericanMexican69 · Feb 01 2021

Now do a M2 variant with a tad load more GPU oomph.

👍 3 · u/mc4_life · Feb 02 2021

Great explanation.

The unified memory is a great competitive advantage for Apple. It makes the system faster by doing away with the redundant copying of data, and it reduces cost through fewer components, less energy consumption, less heat, and less space. Just by improving the memory, you'll potentially be improving the CPU, the GPU, and possibly other specialized units.

Is M1 affected by meltdown, spectre, and other variants? Never heard anybody talk about the security aspect yet.

👍 2 · u/winsome_losesome · Feb 02 2021

This was a great explanation for why the M1 chips are so impressive. Viewing from my newly acquired M1 Air just makes it all the sweeter :)

👍 2 · u/MrAshyP · Feb 02 2021

I can’t wait for the 16” Pro.

👍 2 · u/ReelPaws · Feb 02 2021
Captions
[Music] Hi guys, welcome back to the channel. So we've all heard about Apple's new M1 chip, and Apple promised a lot with this chip. They promised a 3.5 times faster CPU as well as 3.9 times faster video processing, all while using only 25% of the energy that a standard Mac without the M1 chip would. These were bold claims from Apple, and the world has been poised to see how effective these M1 chips really are. And now we're starting to get the first real-world tests of just how quick they are. So how fast are they? They're fast. They're really fast. Over at Max Tech, the team tested an $899 Mac mini versus a $2,500 iMac, and they got some shocking results: the M1 Mac chip absolutely blew the Intel-driven iMac out of the water. So we now know the M1 chip is living up to Apple's claims, but what is making the M1 chip so much faster than computers with more RAM and larger GPUs?

So in order to understand why the M1 chip is so fast, we first need to understand what a CPU, or central processing unit, is. A CPU, also called a microprocessor, pulls in instructions from memory and then performs those instructions in a sequence. Trying to boil the CPU down to its basic form, it's essentially a number of named memory cells called registers and a number of computational units called arithmetic logic units, or ALUs. The ALUs perform the arithmetic operations within the CPU, like addition, subtraction, et cetera. But the ALUs cannot do this without the data to manipulate, and this is where the registers come in. The ALUs are attached to the registers, and it is the registers that pull the data from the computer's memory. So as an example, say we are asking the CPU to add together the numbers 10 and 20.
First, the CPU needs to load the numbers from random access memory, or RAM, and store these numbers in a register. So here we are grabbing the number 10 from memory location 140 and storing it in the register r1. We then do the same for the number 20 that's stored at memory location 180, and store this in the register r2. Only now can the CPU use the add operation of the ALU to perform the operation on these numbers.

But how is the M1 chip faster? Well, firstly and importantly, the M1 chip is not a CPU. Usually a computer has multiple different chips, each with a particular job. These chips often need to work together to perform certain tasks. The M1 chip groups all of these chips into one silicon package. The M1 chip does have a CPU, but it also contains the computer's GPU, memory, and much more, meaning the chip covers the whole processing needs of the computer. In the industry, this is known as a system on a chip.

For years, CPUs have been made better by simply adding more general-purpose CPUs. These are otherwise known as cores. The more cores the computer has, the more powerful its processing ability. However, in recent years Apple has begun to maneuver away from this architecture. Rather than having multiple general-purpose CPUs, they're producing more specialized chips that are very good at doing particular tasks. These chips are able to perform their allotted tasks much faster and for much less energy than a general-purpose CPU, drastically increasing the chip's performance per watt. The M1 chip includes multiple of these specialized units.

Now, although I've said the M1 chip is not a central processing unit, it does have a central processing unit within it. This is in charge of running most of the code involving the operating system and its applications. However, it also has a graphics processing unit, or GPU, which handles graphics-related tasks such as visualizing an app's user interface and 2D/3D gaming. In addition to this, it's got several other components. It has an image processing unit, or ISP, which is one of these specialized units I've spoken about. The ISP drastically increases the image processing capabilities of the M1 chip, and that's why its processing of imagery is so much faster than the industry standard. It's also got a digital signal processor, which handles more mathematically intensive functions than a CPU, including decompressing music files and things like that. In addition to this, there's a neural processing unit, used in higher-end smartphones to accelerate machine learning and AI tasks, but it's also being used more commonly in computers as well. These include things like voice recognition and camera processing. It also has a video encoder and decoder, which handles power-efficient conversion of video files to different formats, as well as a Secure Enclave, which handles encryption, authentication, and security. And finally, a unified memory system, which allows the CPU, GPU, and other cores to quickly exchange information.

That last point, the unified memory, is one of the key ingredients to why the M1 chip performs so well, but it's a little bit of a difficult concept to get your head around. Cheaper computers tend to have integrated CPUs and GPUs, often referred to as integrated graphics. These are infamously slow, and this is largely due to the way each of them ingests data. In order for this system to work, the memory needs to be partitioned, meaning the CPU and GPU each use their own memory and cannot take data from each other's. Therefore, if the CPU wants to use the GPU to process something that's stored in the CPU's memory, then rather than just handing the GPU the data, the CPU needs to copy the data into the GPU's memory, a process which is exceptionally slow. This is because each unit likes to ingest data differently. A CPU wants to receive the data very quickly and is happy to ingest it in very small sizes, whereas GPUs are the opposite: they don't need the data to be delivered quickly, but when it is delivered, they want a lot of it. Because of this, putting a GPU on a CPU is usually not a good idea, because you dramatically decrease the potential of the GPU by starving it of the data it craves to work effectively.

Apple has tried to address the drawbacks of shared memory using its unified memory architecture. There is no special area reserved for just the CPU or just the GPU. Memory is allocated to both processors. They can both use the same memory; no copying is needed. Apple uses memory which serves both large chunks of data and serves it fast. In computer speak, this is called low latency and high throughput. Thus the need to be connected to two separate partitions of memory is removed.

The M1 chip also uses an ARM instruction set. These classically produce less heat, allowing the GPU to have a higher heat budget than a GPU that was placed on the same silicon die. This is unlike AMD's and Intel's architecture, which uses an x86 instruction set. ARM and x86 are a little bit outside the scope of this video, but if you want to know more, let me know in the comments and I'll do a separate video on it. I'll also include a link in the description if you want to know more.

Now, this architecture is part of the reason, but not the whole reason, why the M1 chips are so fast. In addition to these advantages, the general-purpose CPU that's already in the M1 Mac chip, called Firestorm, is in itself really fast. In order to make a general-purpose CPU faster, you need to do one of two things. You can either make it perform more operations in sequence faster, or you can make it perform tasks in parallel, meaning it is able to perform multiple tasks at once. The Firestorm CPU in the M1 chip makes use of the latter of these two by increasing the parallel processing capabilities of the chip. In developer speak, parallel processing is more commonly known as threading code. However, functional threaded code is some of the most difficult code to write, and so it requires a very talented programmer to be able to write it. Most developers are not able to write effective parallel code, and therefore if you want your product to be both very fast using parallel processing and accessible to most developers, you need to produce a system that takes care of the parallel processing for you, without having to have a programmer write advanced threaded code.

To achieve this, the M1 Firestorm CPU uses something called out-of-order execution. Out-of-order execution is a complicated topic, and I'll be honest, it took me a while to get my head around it. Using out-of-order execution, developers don't need to thread their code to get the same advantages. From the developer's point of view, each core is just running faster.

In order to understand this, we need to look back at how a CPU works. Asking for data at a particular location in the computer's memory is a slow process, but a CPU is capable of getting multiple bytes of data at the same time from a location, meaning that getting one byte of information at memory location 150 is no quicker than getting that byte and the subsequent nine bytes of information in memory. In other words, getting 10 bytes of adjacent memory is no slower than getting just one byte, as long as this information is stored adjacent to one another. This means your microprocessor can get a whole chunk of information at once, with instructions to carry out on that data. But when you write code, you write it in a sequence so that actions happen in a particular order. This is where out-of-order execution comes in. A modern CPU uses out-of-order execution to analyze what operations rely on what data. By doing this, the CPU knows which operations can be performed simultaneously as well as those that need to be performed sequentially.

I think this is best explained with an example. Here we can see a short buffer of instructions a CPU might receive. First, let me explain what this syntax means. The first part is the action being performed, so in operation one we can see that the action is multiplication. The second is the register location to store the result of the operation, so in operation one that is r0. The following parts are the register locations to perform the operation on. So in operation one we have a multiplication being performed on r1 and r2, and the result of this operation should be stored in the register r0. Now, multiplication is a slow computing process, but because the addition in operation two requires the result of the multiplication in operation one, this operation cannot be performed until operation one has taken place. It will just have to wait. However, operation three doesn't rely on any of the outputs of operation one or two, meaning it is independent of these operations and can therefore be performed in parallel to operation one without waiting for that operation to complete.

This is how out-of-order execution works, but in real life the CPU will be performing this on hundreds of instructions. This produces a parallel execution of the code without the developer having to write complicated threaded code themselves. With that being said, a developer should always write threaded code if it's within their competency and it makes sense to do so, but when combined with out-of-order execution we can get some incredible results, as we're already seeing with the M1 chip.

But you might be asking, why is the M1 chip necessarily better at out-of-order execution than the AMD and Intel chips? Well, the speed and effectiveness of out-of-order execution all comes down to how quickly you can fill up a buffer of micro-operations. The more micro-operations there are in your buffer, the higher the probability that the CPU will be able to find multiple operations that it can do in parallel. The speed with which a computer can refill this buffer with micro-operations depends on how quickly the computer can chop up a program's machine code into micro-operations. This is done using something called a decoder. This is where the M1 chip has a massive advantage. Even the meatiest of Intel's and AMD's microprocessors only have four decoders. The M1 chip has eight. This means it can fill up the instruction buffer faster than any of its rivals. In addition to this, the M1 chip also has an instruction buffer that's three times larger than the industry standard. So not only can it fill the instruction buffer quicker than its AMD and Intel rivals, but it also has more space there to fill.

So why can't AMD and Intel just start building chips to compete with the M1? Surely they can just add some more decoders or create some more system-on-a-chip solutions. Well, it's not actually as easy as that, for two reasons. The business models of companies like AMD and Intel rely on them producing general-purpose CPUs that people can just slot into a PC motherboard. However, things are starting to move away from this design to a more system-on-a-chip architecture. This means that rather than obtaining physical components from vendors, you now have to obtain the IP rights to use a particular design from vendors, meaning you buy the rights to the design of a CPU, GPU, I/O controller, etc., and then you use that to make your chip using something like a foundry. Now here's the problem: Intel, AMD, Nvidia, they're not going to start licensing their IP to computer companies like HP and Dell. You could argue that companies like Intel might just start making their own system-on-a-chip solutions, but to what specs? Different computer companies often provide different advantages to their clients, and that's possible because they assemble the different hardware on their computer's motherboard. It's not going to be economical for Intel or AMD to supply individual SoCs to each of their clients, so they're in a bit of a catch-22 scenario. For Apple, of course, this isn't a problem. Now they've pivoted away from Intel, they control every piece of the pie and can build their own chips to match whatever specifications they choose.

The second reason that AMD and Intel are at a disadvantage is the instruction set they're using in their chips. As already mentioned, the M1 chip uses the ARM instruction set, whereas Intel and AMD tend to use the x86 instruction set. Now, as I've already said, the inner workings of these instruction sets are beyond the scope of this video, but there is an important difference that we need to consider. In the x86 instruction set, an instruction can be anywhere between 1 and 15 bytes long. In the ARM instruction set, the instructions have a fixed length of 4 bytes. Why is this relevant? Well, if we go back to our concept of decoding, you're probably wondering why Intel and AMD can't just put more decoders in their CPUs. They can't do this because of the amount of work required to split up each stream of bytes into instructions within a decoder. If these instructions are all the same length, as they are in the M1 chips, then this is an incredibly simple computing task to perform. However, as the x86 instructions can be different lengths, the decoder cannot tell where an instruction starts. It has to analyze each individual instruction to calculate how long the instruction is. Currently, AMD and Intel chips just attempt to brute-force a solution by decoding the instruction from every possible start point, meaning there are a lot of possible wrong guesses. This adds such complexity to the decoding stage of this process that adding extra decoders is almost impossible. In fact, it is so difficult that AMD has admitted that four decoders are likely the upper limit possible for their chips.

All of this puts AMD and Intel at quite a disadvantage, and if they're gonna keep up with the latest M1 chip, they're gonna have to be quite inventive with the way they design their chips. This also affects the PC market. It's kind of the first time that Apple products have become cost-effective compared to a PC, considering how much better the processing power is.

Now thanks for watching, guys. I'll be really interested to know what your thoughts are, so please leave them in the comments below and I'll try and reply to everyone. If you enjoyed this video, consider subscribing to get all my content, and as always, happy dev-ing, guys, and I'll see you in the next video.
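
The out-of-order execution example in the captions refers to an on-screen instruction buffer that the captions do not reproduce. As a rough illustration only, the Python sketch below models that idea: it scans a small buffer and works out the earliest step each operation could start in, based on which registers it reads. Only operation one (a multiply of r1 and r2 into r0) comes from the captions; the registers used by operations two and three, the `Instr` type, the `schedule` function, and the "every operation takes one step" timing are all assumptions made for this sketch, and real hardware is far more involved.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str    # action to perform, e.g. "mul" or "add"
    dest: str  # register that receives the result
    src1: str  # first input register
    src2: str  # second input register


def schedule(buffer: list[Instr]) -> list[int]:
    """Return the earliest step each instruction could start in.

    Simplifying assumptions: every instruction takes exactly one step,
    and only read-after-write dependencies are modeled.
    """
    ready_at: dict[str, int] = {}  # register -> step at which its value is ready
    steps = []
    for instr in buffer:
        # An instruction has to wait until both of its inputs are ready.
        step = max(ready_at.get(instr.src1, 0), ready_at.get(instr.src2, 0))
        steps.append(step)
        ready_at[instr.dest] = step + 1
    return steps


# Operation one follows the captions; the registers in operations two
# and three are hypothetical stand-ins for the on-screen example.
buffer = [
    Instr("mul", "r0", "r1", "r2"),  # operation 1
    Instr("add", "r4", "r0", "r3"),  # operation 2: needs r0 from operation 1
    Instr("add", "r5", "r6", "r7"),  # operation 3: independent of 1 and 2
]

steps = schedule(buffer)
for number, (instr, step) in enumerate(zip(buffer, steps), start=1):
    print(f"operation {number}: {instr.op} {instr.dest}, "
          f"{instr.src1}, {instr.src2} -> step {step}")
```

In this sketch, operations one and three land in the same step because neither reads a register the other writes, while operation two is pushed one step later because it reads r0.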
Info
Channel: The Dev Doctor
Views: 186,099
Keywords: M1, m1 chip, m1 chip vs intel, m1 chip benchmark, apple chips, apple silicone, fastest chip, fastest mac chip, thedevdoctor, apple, m1 macbook pro, apple m1 chip, apple silicon
Id: cAjarAgf0nI
Length: 13min 35sec (815 seconds)
Published: Mon Dec 28 2020