- For hundreds of years, analog computers were the most
powerful computers on Earth, predicting eclipses, tides,
and guiding anti-aircraft guns. Then, with the advent of
solid-state transistors, digital computers took off. Now, virtually every
computer we use is digital. But today, a perfect storm of
factors is setting the scene for a resurgence of analog technology. This is an analog computer, and by connecting these
wires in particular ways, I can program it to solve a whole range of differential equations. For example, this setup
allows me to simulate a damped mass oscillating on a spring. So on the oscilloscope, you
can actually see the position of the mass over time. And I can vary the damping, or the spring constant, or the mass, and we can
see how the amplitude and duration of the oscillations change. Now what makes this an analog computer is that there are no
zeros and ones in here. Instead, there's actually
a voltage that oscillates up and down exactly
like a mass on a spring. The electrical circuitry is an analog for the physical problem; it just takes place much faster.
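For comparison, here is a minimal digital sketch of the same equation, m*x'' + c*x' + k*x = 0, stepped numerically in Python. The mass, damping, and spring constant are arbitrary stand-ins, not the settings on this machine:

```python
# Minimal digital simulation of a damped mass on a spring:
# m*x'' + c*x' + k*x = 0, stepped with semi-implicit Euler integration.
# Parameter values are made up, chosen only for illustration.
m, c, k = 1.0, 0.5, 4.0   # mass, damping, spring constant
x, v = 1.0, 0.0           # initial position and velocity
dt = 0.01                 # time step

for step in range(1000):
    a = (-c * v - k * x) / m   # acceleration from F = ma
    v += a * dt
    x += v * dt
    if step % 100 == 0:
        print(f"t = {step * dt:.1f}  x = {x:+.3f}")  # decaying oscillation
```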
Now, if I change the electrical connections, I can program this computer to solve other differential equations, like the Lorenz system, which is a basic model of
convection in the atmosphere. Now the Lorenz system is
famous because it was one of the first discovered examples of chaos. And here, you can see the Lorenz attractor with its beautiful butterfly shape. And on this analog computer, I can change the parameters and see their effects in real time.
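On a digital computer, you would integrate the three Lorenz equations step by step, as in this short sketch using the classic parameter values (sigma = 10, rho = 28, beta = 8/3):

```python
# The Lorenz system, integrated digitally with simple Euler steps.
# These are the classic parameter values that produce the
# butterfly-shaped chaotic attractor.
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
x, y, z = 1.0, 1.0, 1.0
dt = 0.005

for step in range(5000):
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    x, y, z = x + dx * dt, y + dy * dt, z + dz * dt

print(f"state after {5000 * dt:.0f} time units: ({x:.2f}, {y:.2f}, {z:.2f})")
```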
So these examples illustrate some of the advantages of analog computers. They are incredibly powerful computing devices, and they can complete a
lot of computations fast. Plus, they don't take much power to do it. With a digital computer, if you wanna add two eight-bit numbers, you need around 50 transistors, whereas with an analog computer, you can add two currents, just by connecting two wires. With a digital computer
to multiply two numbers, you need on the order of 1,000 transistors all switching zeros and ones, whereas with an analog computer, you can pass a current through a resistor, and then the voltage across this resistor will be I times R. So effectively, you have multiplied two numbers together.
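In code, those two analog tricks amount to nothing more than this; the currents and resistance below are made-up illustrative values:

```python
# Two analog operations, modeled numerically. All values are made up.
# Addition: currents meeting at a junction simply sum (Kirchhoff's current law).
i1, i2 = 0.002, 0.003        # two input currents, in amps
i_sum = i1 + i2              # 0.005 A: addition happens at the wire junction

# Multiplication: Ohm's law, V = I * R.
r = 3000.0                   # resistance, in ohms
v = i1 * r                   # 6.0 V: the voltage encodes 0.002 * 3000
print(i_sum, v)
```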
But analog computers also have their drawbacks. For one thing, they are not general-purpose
computing devices. I mean, you're not gonna run
Microsoft Word on this thing. And also, since the inputs
and outputs are continuous, I can't input exact values. So if I try to repeat
the same calculation, I'm never going to get
the exact same answer. Plus, think about
manufacturing analog computers. There's always gonna be some variation in the exact value of components, like resistors or capacitors. So as a general rule of thumb, you can expect about a 1% error. So when you think of analog computers, you can think powerful,
fast, and energy-efficient, but also single-purpose,
non-repeatable, and inexact. And if those sound like deal-breakers, it's because they probably are. I think these are the major reasons why analog computers fell out of favor as soon as digital
computers became viable. Now, here's why analog computers
may be making a comeback. (computers beeping) It all starts with
artificial intelligence. - [Narrator] A machine
has been programmed to see and to move objects. - AI isn't new. The term was coined back in 1956. In 1958, Cornell University psychologist Frank Rosenblatt built the perceptron, designed to mimic how
neurons fire in our brains. So here's a basic model of how
neurons in our brains work. An individual neuron
can either fire or not, so its level of activation
can be represented as a one or a zero. The input to one neuron is the output from a bunch of other neurons, but the strength of these connections between neurons varies, so each one can be given
a different weight. Some connections are excitatory, so they have positive weights, while others are inhibitory, so they have negative weights. And the way to figure out whether a particular neuron fires is to take the activation
of each input neuron and multiply by its weight, and then add these all together. If their sum is greater than
some number called the bias, then the neuron fires, but if it's less than that,
the neuron doesn't fire.
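That firing rule is just a weighted sum compared against a threshold. A minimal sketch, with made-up activations and weights:

```python
# The neuron model just described: a weighted sum of the inputs,
# compared against the bias. Example numbers are invented.

def fires(activations, weights, bias):
    """Return 1 if the weighted sum of the inputs exceeds the bias."""
    total = sum(a * w for a, w in zip(activations, weights))
    return 1 if total > bias else 0

inputs  = [1, 0, 1]          # three input neurons: fired, silent, fired
weights = [0.5, -0.2, 0.8]   # excitatory (+) and inhibitory (-) connections
print(fires(inputs, weights, bias=1.0))  # 1.3 > 1.0, so the neuron fires
```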
As input, Rosenblatt's perceptron had 400 photocells arranged in a square grid, to capture a 20 by 20-pixel image. You can think of each
pixel as an input neuron, with its activation being
the brightness of the pixel. Although strictly speaking, the activation should
be either zero or one, we can let it take any
value between zero and one. All of these neurons are connected to a single output neuron, each via its own adjustable weight. So to see if the output neuron will fire, you multiply the activation
of each neuron by its weight, and add them together. This is essentially a vector dot product. If the answer is larger than
the bias, the neuron fires, and if not, it doesn't. Now the goal of the perceptron was to reliably distinguish
between two images, like a rectangle and a circle. For example, the output neuron could always fire when presented with a circle, but never when presented with a rectangle. To achieve this, the
perceptron had to be trained, that is, shown a series
of different circles and rectangles, and have its
weights adjusted accordingly. We can visualize the weights as an image, since there's a unique weight
for each pixel of the image. Initially, Rosenblatt set
all the weights to zero. If the perceptron's output is correct, for example, here it's shown a rectangle and the output neuron doesn't fire, no change is made to the weights. But if it's wrong, then
the weights are adjusted. The algorithm for updating the weights is remarkably simple. Here, the output neuron didn't
fire when it was supposed to because it was shown a circle. So to modify the weights, you simply add the input
activations to the weights. If the output neuron
fires when it shouldn't, like here, when shown a rectangle, well, then you subtract
the input activations from the weights, and you keep doing this until the perceptron correctly identifies all the training images.
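Here is a minimal sketch of that update rule, shrunk down to made-up four-pixel "images" instead of the perceptron's 400 inputs:

```python
# Rosenblatt's update rule as just described: start from zero weights,
# add the input activations after a miss, subtract them after a false alarm.
# The 4-pixel "images" and labels below are invented for illustration.
examples = [
    ([1, 0, 0, 1], 1),   # a made-up "circle"    (should fire)
    ([1, 1, 1, 1], 0),   # a made-up "rectangle" (should not fire)
    ([0, 1, 1, 0], 0),   # another "rectangle"
]
weights, bias = [0.0, 0.0, 0.0, 0.0], 0.5

for _ in range(10):  # a few passes over the training set
    for pixels, label in examples:
        total = sum(p * w for p, w in zip(pixels, weights))
        output = 1 if total > bias else 0
        if output < label:    # failed to fire on a circle: add the inputs
            weights = [w + p for p, w in zip(pixels, weights)]
        elif output > label:  # fired on a rectangle: subtract the inputs
            weights = [w - p for p, w in zip(pixels, weights)]

print(weights)  # weights that now separate the two "shapes"
```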
It was shown that this algorithm will always converge, so long as it's possible
to map the two categories into distinct groups. (footsteps thumping) The perceptron was
capable of distinguishing between different shapes,
like rectangles and triangles, or between different letters. And according to Rosenblatt, it could even tell the
difference between cats and dogs. He said the machine was capable of what amounts to original thought, and the media lapped it up. The "New York Times" called the perceptron "the embryo of an electronic computer that the Navy expects will
be able to walk, talk, see, write, reproduce itself, and be conscious of its existence." - [Narrator] After training
on lots of examples, it's given new faces it has never seen, and is able to successfully
distinguish male from female. It has learned. - In reality, the perceptron
was pretty limited in what it could do. It could not, in fact,
tell apart dogs from cats. This and other critiques were raised in a book by MIT giants,
Minsky and Papert, in 1969. And that led to a bust period for artificial neural
networks and AI in general. It's known as the first AI winter. Rosenblatt did not survive this winter. He drowned while sailing in Chesapeake Bay on his 43rd birthday. (mellow upbeat music) - [Narrator] The Navlab
is a road-worthy truck, modified so that researchers or computers can control the vehicle
as occasion demands. - [Derek] In the 1980s,
there was an AI resurgence when researchers at
Carnegie Mellon created one of the first self-driving cars. The vehicle was steered by an artificial neural
network called ALVINN. It was similar to the perceptron, except it had a hidden
layer of artificial neurons between the input and output. As input, ALVINN received
30 by 32-pixel images of the road ahead. Here, I'm showing them as 60 by 64 pixels. But each of these input
neurons was connected via an adjustable weight to a
hidden layer of four neurons. These were each connected
to 32 output neurons. So to go from one layer of
the network to the next, you perform a matrix multiplication: the input activation times the weights. The output neuron with
the greatest activation determines the steering angle.
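Numerically, those steps look like this sketch at ALVINN's layer sizes; the weights here are random stand-ins, and the tanh squashing is an assumption rather than ALVINN's exact activation function:

```python
import numpy as np

# One layer-to-layer step as described: activations times weights, and the
# output neuron with the greatest activation wins. Sizes match ALVINN
# (30x32 inputs, 4 hidden, 32 outputs); the weights are random stand-ins.
rng = np.random.default_rng(0)

x = rng.random(30 * 32)               # one flattened input image: 960 activations
w_hidden = rng.normal(size=(960, 4))  # input-to-hidden weights
w_output = rng.normal(size=(4, 32))   # hidden-to-output weights

hidden = np.tanh(x @ w_hidden)        # matrix multiply into the hidden layer
output = hidden @ w_output            # and again into the 32 output neurons
print("steering neuron:", int(np.argmax(output)))  # strongest output wins
```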
To train the neural net, a human drove the vehicle, providing the correct steering angle for a given input image. All the weights in the neural network were adjusted through the training so that ALVINN's output
better matched that of the human driver. The method for adjusting the weights is called backpropagation, which I won't go into here, but Welch Labs has a great series on this, which I'll link to in the description. Again, you can visualize the weights for the four hidden neurons as images. The weights are initially
set to be random, but as training progresses, the computer learns to pick
up on certain patterns. You can see the road markings
emerge in the weights. Simultaneously, the output
steering angle coalesces onto the human steering angle. The computer drove the
vehicle at a top speed of around one or two kilometers per hour. It was limited by the speed at which the computer could
perform matrix multiplication. Despite these advances, artificial neural networks still struggled with seemingly simple tasks, like telling apart cats and dogs. And no one knew whether hardware or software was the weak link. I mean, did we have a good
model of intelligence that just needed more computing power? Or did we have the wrong idea about how to make intelligent
systems altogether? So artificial intelligence
experienced another lull in the 1990s. By the mid 2000s, most AI researchers were
focused on improving algorithms. But one researcher, Fei-Fei Li, thought maybe there was
a different problem. Maybe these artificial neural networks just needed more data to train on. So she planned to map out
the entire world of objects. From 2006 to 2009, she created ImageNet, a database of 1.2 million
human-labeled images, which, at the time, was the largest labeled image
dataset ever constructed. And from 2010 to 2017, ImageNet ran an annual contest: the ImageNet Large Scale
Visual Recognition Challenge, where software programs
competed to correctly detect and classify images. Images were classified into
1,000 different categories, including 90 different dog breeds. A neural network competing
in this competition would have an output
layer of 1,000 neurons, each corresponding to a category of object that could appear in the image. If the image contains,
say, a German shepherd, then the output neuron
corresponding to German shepherd should have the highest activation. Unsurprisingly, it turned
out to be a tough challenge. One way to judge the performance of an AI is to see how often the five
highest neuron activations do not include the correct category. This is the so-called top-5 error rate.
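For a single image, that check looks like this sketch, with invented activation values and label:

```python
import numpy as np

# A top-5 check on one image: it counts as an error when the correct
# category is not among the five highest output activations.
# The activations and label below are made up for illustration.
activations = np.array([0.05, 0.9, 0.3, 0.8, 0.02, 0.6, 0.7, 0.1])
correct_class = 4

top5 = np.argsort(activations)[-5:]   # indices of the five largest outputs
print("top-5 error:", correct_class not in top5)  # True: class 4 ranked too low
```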
In 2010, the best performer had a top-5 error rate of 28.2%, meaning that
nearly 1/3 of the time, the correct answer was not
among its top five guesses. In 2011, the error rate of
the best performer was 25.8%, a substantial improvement. But the next year, an artificial neural network from the University of
Toronto, called AlexNet, blew away the competition with a top-5 error rate of just 16.4%. What set AlexNet apart
was its size and depth. The network consisted of eight layers, and in total, 650,000 neurons. To train AlexNet, 60 million weights and biases
had to be carefully adjusted using the training database. Because of all the big
matrix multiplications, processing a single image
required 700 million individual math operations. So training was computationally intensive. The team managed it by
pioneering the use of GPUs, graphics processing units, which are traditionally used
for driving displays. So they're specialized for
fast parallel computations. The AlexNet paper
describing their research is a blockbuster. It's now been cited over 100,000 times, and it identifies the
scale of the neural network as key to its success. It takes a lot of computation
to train and run the network, but the improvement in
performance is worth it. With others following their lead, the top-5 error rate on the ImageNet competition plummeted in the years that followed,
down to 3.6% in 2015. That is better than human performance. The neural network that achieved this had 100 layers of neurons. So the future is clear: We will see ever-increasing demand for ever-larger neural networks. And this is a problem for several reasons: One is energy consumption. Training a neural network
requires an amount of electricity similar
to the yearly consumption of three households. Another issue is the so-called
von Neumann bottleneck. Virtually every modern digital computer stores data in memory, and then accesses it as needed over a bus. When performing the huge
matrix multiplications required by deep neural networks, most of the time and energy goes into fetching those weight values rather than actually doing the computation. And finally, there are the
limitations of Moore's Law. For decades, the number of transistors on a chip has been doubling
approximately every two years, but now the size of a transistor is approaching the size of an atom. So there are some fundamental
physical challenges to further miniaturization. So this is the perfect
storm for analog computers. Digital computers are
reaching their limits. Meanwhile, neural networks
are exploding in popularity, and a lot of what they do boils down to a single task: matrix multiplication. Best of all, neural networks
don't need the precision of digital computers. Whether the neural net
is 96% or 98% confident the image contains a chicken, it doesn't really matter,
it's still a chicken. So slight variability in components or conditions can be tolerated. (upbeat rock music) I went to an analog
computing startup in Texas, called Mythic AI. Here, they're creating analog
chips to run neural networks. And they demonstrated
several AI algorithms for me. - Oh, there you go. See, it's getting you.
(Derek laughs) Yeah.
- That's fascinating. - The biggest use case is
augmented and virtual reality. If your friend is in a different place, they're at their house
and you're at your house, you can actually render each
other in the virtual world. So it needs to really
quickly capture your pose, and then render it in the VR world. - So, hang on, is this
for the metaverse thing? - Yeah, this is a very
metaverse application. This is depth estimation
from just a single webcam. It's just taking this scene, and then it's doing a heat map. So if it's bright, it means it's close. And if it's far away, it makes it black. - [Derek] Now all these
algorithms can be run on digital computers, but here, the matrix multiplication
is actually taking place in the analog domain.
(light music) To make this possible, Mythic has repurposed
digital flash storage cells. Normally these are used as memory to store either a one or a zero. If you apply a large positive
voltage to the control gate, electrons tunnel up through
an insulating barrier and become trapped on the floating gate. Remove the voltage, and the electrons can
remain on the floating gate for decades, preventing the
cell from conducting current. And that's how you can store
either a one or a zero. You can read out the stored value by applying a small voltage. If there are electrons
on the floating gate, no current flows, so that's a zero. If there aren't electrons, then current does flow, and that's a one. Now Mythic's idea is to use these cells not as on/off switches,
but as variable resistors. They do this by putting a
specific number of electrons on each floating gate,
instead of all or nothing. The greater the number of electrons, the higher the resistance of the channel. When you later apply a small voltage, the current that flows
is equal to V over R. But you can also think of this
as voltage times conductance, where conductance is just
the reciprocal of resistance. So a single flash cell can be used to multiply two values together,
voltage times conductance. So to use this to run an
artificial neural network, well they first write all the
weights to the flash cells as each cell's conductance. Then, they input the activation values as the voltage on the cells. And the resulting current is the product of voltage times conductance, which is activation times weight. The cells are wired together in such a way that the current from each
multiplication adds together, completing the matrix multiplication.
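Here is a numeric sketch of that scheme; the voltages and conductances are illustrative values, not Mythic's actual components:

```python
import numpy as np

# Sketch of the analog scheme described above: each flash cell multiplies
# by Ohm's law (current = voltage * conductance), and the currents meeting
# on a shared output wire add up, so a grid of cells computes a whole
# matrix-vector product in one step. All component values are made up.
voltages = np.array([0.3, 0.7, 0.1])   # input activations, encoded as volts
conductances = np.array([              # weights, stored as conductances (1/R)
    [0.2, 0.9],
    [0.5, 0.4],
    [0.8, 0.1],
])

currents = voltages @ conductances     # Kirchhoff's law sums each column
print(currents)                        # one summed current per output wire
```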
(light music) - So this is our first product. This can do 25 trillion math operations per second. - [Derek] 25 trillion. - Yep, 25 trillion math
operations per second, in this little chip here, burning about three watts of power. - [Derek] How does it
compare to a digital chip? - The newer digital
systems can do anywhere from 25 to 100 trillion
operations per second, but they are big, thousand-dollar systems that are spitting out 50
to 100 watts of power. - [Derek] Obviously this isn't like an apples-to-apples comparison, right? - No, it's not apples to apples. I mean, training those algorithms, you need big hardware like this. You can just do all sorts
of stuff on the GPU, but if you specifically
are doing AI workloads and you wanna deploy 'em,
you could use this instead. You can imagine them in security cameras, autonomous systems, inspection equipment for manufacturing. Every time they make a Frito-Lay chip, they inspect it with a camera, and the bad Fritos get blown
off of the conveyor belt. But they're using artificial intelligence to spot which Fritos are good and bad. - Some have proposed
using analog circuitry in smart home speakers, solely to listen for the wake
word, like Alexa or Siri. They would use a lot less
power and be able to quickly and reliably turn on the
digital circuitry of the device. But you still have to deal
with the challenges of analog. - So for one of the popular networks, there would be 50 sequences of matrix multiplies that you're doing. Now, if you did that entirely
in the analog domain, by the time it gets to the output, it's just so distorted that you don't have any result at all. So you convert it from the analog domain, back to the digital domain, send it to the next processing block, and then you convert it into
the analog domain again. And that allows you to
preserve the signal.
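A toy model of that design choice, with a made-up per-block drift and an idealized converter:

```python
# Toy model: a small analog error accumulates if the signal stays analog
# across 50 blocks, but converting back to digital after each block snaps
# the value to the nearest clean level. The drift size and the number of
# quantization levels are invented for illustration.

def quantize(value, levels=64):
    """An idealized analog-to-digital conversion step."""
    return round(value * levels) / levels

stay_analog = requantized = 0.5   # the "true" signal being passed along
drift = 0.002                     # distortion added by each analog block

for _ in range(50):               # 50 matrix-multiply blocks in a row
    stay_analog += drift                         # error piles up unchecked
    requantized = quantize(requantized + drift)  # error wiped out each block

print(f"stays analog: {stay_analog:.3f}")   # 0.600: badly distorted
print(f"re-digitized: {requantized:.3f}")   # 0.500: exact, drift < half a level
```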
- You know, when Rosenblatt was first setting up his perceptron, he used a digital IBM computer. Finding it too slow, he built a custom analog computer, complete with variable resistors and little motors to drive them. Ultimately, his idea of neural networks turned out to be right. Maybe he was right about analog, too. Now, I can't say whether
analog computers will take off the way digital did last century, but they do seem to be better suited to a lot of the tasks
that we want computers to perform today, which is a little bit funny because I always thought of digital as the optimal way of
processing information. Everything from music to pictures, to video has all gone
digital in the last 50 years. But maybe in 100 years, we will look back on digital not as the end point
of information technology, but as a starting point. Our brains are digital in that a neuron either
fires or it doesn't, but they're also analog in that thinking takes place
everywhere, all at once. So maybe what we need to achieve true artificial intelligence, machines that think like
us, is the power of analog. (gentle music) Hey, I learned a lot
while making this video, much of it by playing with
an actual analog computer. You know, trying things out for yourself is really the best way to learn, and you can do that with this
video sponsor, Brilliant. Brilliant is a website and app that gets you thinking deeply by engaging you in problem-solving. They have a great course
on neural networks, where you can test how
it works for yourself. It gives you an excellent intuition about how neural networks can
recognize numbers and shapes, and it also allows you to
experience the importance of good training data and hidden layers to understand why more sophisticated neural networks work better. What I love about Brilliant is it tests your knowledge as you go. The lessons are highly interactive, and they get progressively
harder as you go on. And if you get stuck, there
are always helpful hints. For viewers of this video, Brilliant is offering the first 200 people 20% off an annual premium subscription. Just go to brilliant.org/veritasium. I will put that link
down in the description. So I wanna thank Brilliant
for supporting Veritasium, and I wanna thank you for watching.