The AI Hardware Problem

Video Statistics and Information

Reddit Comments

Good general overview. Worth the watch.

👍 4 · u/0xD153A53 · Feb 13 2021 · replies

Why not concentrate research efforts on spiking ANNs instead? Changing computer architectures to fit current models instead of developing new models does not seem like a good long term plan. Nice video though.

👍 1 · u/rand3289 · Feb 14 2021 · replies
Captions
The millennia-old idea of expressing signals and data as a series of discrete states ignited a revolution in the semiconductor industry during the second half of the 20th century. This new information age thrived on the robust and rapidly evolving field of digital electronics. Fundamentally, it was predictable and high-yield in nature, and the abundance of automation and tooling made it relatively manageable to scale designs in complexity and performance. As demand grew, processors could be made faster and more robust, and supporting mechanisms such as memory would grow in complexity to match.

But now a massive paradigm shift in how information is processed is occurring. We've arrived at the dawn of the artificial intelligence revolution, and with it comes a demand for far more computational capacity. However, the scaling of existing processing technologies is starting to hit a wall: the power consumed by machine learning applications cannot feasibly keep growing on existing processing architectures. The unique requirements of machine learning are now forcing the semiconductor industry to explore the largely abandoned use of analog circuitry for computation. Analog circuitry in information processing systems has, for the most part, been relegated to interfacing with the real world and to communications. It is difficult and time-consuming to design and verify, and it is prone to external interference. Furthermore, it does not scale easily without digital support and is not easily adapted to newer technologies. Despite these shortcomings, analog may in fact hold the key to the advancement of AI.

This video is made possible by Brilliant. AI seems to be the big thing nowadays; everything is AI this or AI that. But once you peer through the hype, the world of machine learning is a fascinating perspective on how nature, and now computer scientists, harness a statistical process to make sense of information in a noisy and uncertain world. With Brilliant you can quickly dissect and grasp the concepts behind machine learning in an incredibly intuitive manner. Brilliant allows you to build your problem-solving and critical-thinking abilities by making it easy to strengthen your base of knowledge and understanding. Brilliant offers a huge array of interactive lessons on a broad range of math, science, and computer science topics, and unlike traditional courses, you learn efficiently by working on problems and solving puzzles. You don't even need to be an expert to understand the lessons; they're made for all levels of knowledge. I've personally enjoyed their course on artificial neural networks. It's a great way to learn how neural networks learn by detecting patterns in huge amounts of information, and it really makes you appreciate the mechanics of their structure and how flexible they are for data processing, making predictions, and making decisions. With Brilliant you learn in depth and at your own pace. Brilliant is offering a 20% discount to the first 200 people who sign up using this link: just go to brilliant.org/NewMind. The link is in the description below.

Machine learning occurs in a way that is fundamentally different from the traditional von Neumann architecture. At its core is the multiply-accumulate function, or MAC. This simple function takes two numbers, multiplies them together, and adds the result to an accumulator. The vast majority of AI applications are inference engines that employ a neural network model composed of layers of neurons interconnected by weight parameters, with each neuron employing the multiply-accumulate function.
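As a minimal sketch (not from the video), the MAC and its role in a neuron can be expressed in a few lines of Python; the weights and inputs here are arbitrary toy values:

    def mac(accumulator, weight, activation):
        """One multiply-accumulate (MAC): multiply two numbers and
        add the result to a running accumulator."""
        return accumulator + weight * activation

    def neuron_output(weights, activations):
        """A single neuron's pre-activation output is just a chain
        of MACs over its inputs."""
        acc = 0.0
        for w, x in zip(weights, activations):
            acc = mac(acc, w, x)
        return acc

    # A toy 3-input neuron: three MAC operations produce one output.
    print(neuron_output([0.2, -0.5, 0.9], [1.0, 0.3, 0.7]))  # ~0.68

A full network layer is nothing more than many of these chains evaluated side by side, which is why MAC throughput dominates machine learning performance.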
Even a relatively small network, such as the MobileNet 224 model, has over 4.2 million weights and requires 569 million multiply-accumulate operations to perform a single inference. In a digital neural network implementation, the weights and input data are stored in system memory and must be fetched and stored continuously throughout the sea of multiply-accumulate operations within the network. This approach results in most of the power being dissipated in fetching and storing model parameters and input data to the arithmetic logic unit of the CPU, where the actual multiply-accumulate operation takes place. A typical multiply-accumulate operation within a general-purpose CPU consumes about 250 femtojoules of energy, while the energy consumed during data transfer to and from the operation is more than two orders of magnitude greater than the computation itself, in the range of 50 to 100 picojoules. In effect, this memory transfer accounts for the vast majority of the time and power consumed by both learning and inferencing.

These inefficiencies are so significant that it took the widespread availability of GPUs, with their massively parallel architecture, to initiate the recent rapid growth in machine learning. Their ability to process 3D graphics requires a large number of arithmetic logic units coupled to high-speed memory interfaces. This characteristic inherently made them far more efficient and faster for machine learning, allowing hundreds of multiply-accumulate operations to be processed simultaneously. However, despite these advantages, the graphics-centric architecture of GPUs imposes a new inefficiency when used for machine learning. GPUs tend to use floating-point arithmetic, representing a number by its mantissa, exponent, and sign in 32 bits; in some cases this can even be configured as high as 64 bits or more. Because of this, GPU-targeted machine learning applications have been forced to use floating-point numbers, operating at a level of mathematical precision far beyond what is needed to be effective. Mitigating this inefficiency with fixed-point arithmetic or modified forms of floating-point numbers has been attempted, but the problem of excessive processing ultimately remains.

The next evolution in machine learning came in the form of dedicated AI accelerator application-specific integrated circuits (ASICs). These dedicated AI chips are designed for a high volume of low-precision calculations, offering dramatically more data movement per joule than GPUs and general-purpose CPUs. A large part of this boost in efficiency is attributed to the move toward an 8-bit integer computational architecture, which came from the discovery that, for certain types of neural networks, a dramatic reduction in computational precision reduces network accuracy by only a small amount. These types of AI acceleration hardware are especially suited to a class of deep learning neural networks known as convolutional neural networks, most commonly applied to analyzing visual imagery. One noteworthy example of the performance of these chips is Google's current machine learning processor, the Google Tensor Processing Unit, used in their data centers. Containing 65,536 8-bit multiply-accumulate blocks, one of these processors can perform around 90 trillion operations a second while consuming just 250 watts. As newer GPUs begin to catch up with the performance of these chips, the next generation of application-specific integrated circuit designs hopes to squeeze out even more performance per watt.
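As a toy illustration of that discovery (a sketch, not the method used by any particular chip): symmetric 8-bit quantization snaps float32 weights onto 256 integer levels, yet the accumulated result of a long chain of MACs typically shifts only slightly:

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy float32 weights and inputs, roughly zero-centered like
    # trained network weights.
    w = rng.normal(0.0, 0.2, size=1000).astype(np.float32)
    x = rng.normal(0.0, 1.0, size=1000).astype(np.float32)

    # Symmetric 8-bit quantization: map [-max|w|, +max|w|] to [-127, 127].
    scale = np.abs(w).max() / 127.0
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

    # The same multiply-accumulate chain with full-precision weights
    # and with the dequantized 8-bit weights.
    y_fp32 = np.dot(w, x)
    y_int8 = np.dot(w_int8.astype(np.float32) * scale, x)

    print(f"float32: {y_fp32:.4f}  int8: {y_int8:.4f}")
    print(f"relative error: {abs(y_fp32 - y_int8) / abs(y_fp32):.2%}")

Because quantization errors are small and roughly symmetric, they tend to partially cancel across the accumulation, which is why many networks tolerate 8-bit arithmetic so well.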
However, the ability to scale this technology is slowing. It will soon become infeasible to increase the number of multiply-accumulate units integrated onto a chip or to reduce bit precision further. There is an inherent limit to implementing machine learning on digital electronics.

Outside the realm of the digital world, it is known definitively that extraordinarily dense neural networks can operate efficiently on small amounts of power. The mammalian brain is a clear example of this. Even its most power-hungry variant, the human brain, takes only 25 watts to drive the 86 billion processing units within it. Compared to the 250 watts needed to drive the 65,536 processing units of a Google Tensor Processing Unit chip, it becomes obvious that we're still orders of magnitude away from what is possible. While attempting to replicate the brain is probably not the ideal path forward, much of the industry believes that the digital aspect of current systems will need to be augmented with a more analog approach in order to take machine learning efficiency further. With analog, computation does not occur in clocked stages of moving data; rather, it exploits the inherent properties of a signal and how it interacts with a circuit, combining memory, logic, and computation into a single entity that can operate efficiently in a massively parallel manner.

Some companies are beginning to examine a return to the long-outdated technology of analog computing to tackle the challenge. Analog computing attempts to manipulate small electrical currents via common analog circuit building blocks to do math. These signals can be mixed and compared, replicating the behavior of their digital counterparts. However, while large-scale analog computing has been explored for decades for various potential applications, it has never been successfully executed as a commercial solution.

Currently, the most promising approach to the problem is to integrate programmable analog computing elements into large arrays that are similar in principle to digital memory. In fact, the technologies at the core of several resistive non-volatile memories, such as RRAM (resistive RAM), MRAM (magnetoresistive RAM), and PCM (phase-change memory), are being developed for machine learning use. With resistive memory technology, each cell effectively becomes a programmable resistor that can retain its configuration unpowered and be easily interfaced to the digital world. With the cells configured in an array, an analog signal synthesized by a digital-to-analog converter (DAC) is fed through the network. As this signal flows through the network of pre-programmed resistors, the currents are added to produce a resultant analog signal, which can be converted back to digital via an analog-to-digital converter (ADC). Because the signal effectively moves passively through the array, very little power is consumed. These arrays are also flexible and modular by design, allowing for the construction of massively parallel networks that consume very little power and interface well with digital electronics. It's estimated that such systems could perform machine learning operations up to 500 times more efficiently than GPUs and current application-specific integrated circuits.

Using an analog system for machine learning does, however, introduce several issues. The most glaring are precision and variability. Analog systems are inherently limited in precision by the noise floor, though, much like low bit-width digital systems, this becomes less of an issue for certain types of neural networks.
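A hypothetical sketch of that principle, assuming an ideal crossbar: with input voltages on the row lines and a programmed conductance at each cell, Ohm's law gives each cell's current and Kirchhoff's current law sums the currents down each column, so the array performs an entire vector-matrix multiply in a single pass. A small random term stands in for the noise floor discussed above; all values here are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    # Programmed conductances (siemens): rows = inputs, columns = outputs.
    # Each cell is one resistive memory element storing one weight.
    G = rng.uniform(1e-6, 1e-4, size=(4, 3))

    # Input voltages driven onto the row lines by the DAC.
    V = np.array([0.1, 0.3, 0.0, 0.2])

    # Ohm's law per cell (I = V * G), Kirchhoff's current law per column:
    # the column currents are the multiply-accumulate results.
    I_ideal = V @ G

    # A crude stand-in for the noise floor that limits analog precision.
    I_measured = I_ideal + rng.normal(0.0, 1e-7, size=I_ideal.shape)

    print("ideal column currents (A):   ", I_ideal)
    print("measured column currents (A):", I_measured)

In a real array the ADC would digitize the column currents, and the noise term would set a hard ceiling on how many distinct levels the conversion could meaningfully resolve.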
Variability, however, is a much greater hurdle to overcome. If analog circuitry is used for inferencing, the result may not be deterministic and is more likely to be affected by heat, noise, or other external factors than it would be in a digital system. Variability can also be introduced in the manufacturing process, especially in very large arrays. This fragility is highly problematic, as noise injected into such a system could easily render results untrustworthy.

Other potential issues lie at the interface between digital and analog. For applications where very high processing speeds are needed, such as autonomous vehicles, the latency of the conversion process from digital to analog and back must be balanced against its accuracy. Another problem with analog machine learning is explainability. Unlike digital systems, analog systems offer no easy way to probe or debug the flow of information within them. They form a sort of black box, with no means of verifying the integrity of a result. This creates the dilemma of potentially unexplainable AI systems, raising issues of trust, especially in mission-critical applications.

Still, their high efficiency makes them an attractive solution. Some in the industry propose that a solution may lie in using low-precision, high-speed analog processors for most situations, while funneling results that require higher confidence to slower, high-precision, easily interrogated digital systems. This configuration may ultimately be similar to the way our brains function: performing low-precision processing on the bulk of incoming sensory data and applying more energy-consuming focus to only a very small portion of that data stream.

This video was made possible by Brilliant. Be sure to check them out at brilliant.org/NewMind or at the link in the description below.

[Music]
Info
Channel: New Mind
Views: 281,343
Rating: 4.9252362 out of 5
Keywords: ai, neural networks, ann, convolutional neural networks, machine learning, machine vision, AI on GPU, GPU, tensor flow, tensor, artificial intelligence chips, artificial intelligence, ai asics, ai chips, google TPU, tensor processing unit, neuron, weights, MRAM, magnetic ram, PCM, phase change memory, MRRAM, magneto resistive memory, artificial neural networks, analog ai, analog neural networks, GPU AI, GPU neural networks
Id: owe9cPEdm7k
Length: 13min 25sec (805 seconds)
Published: Sat Feb 13 2021