Have you ever thought about how video streaming is possible? Let's think about how big a typical 1080p video is: 1920x1080 pixels, 24 bits per pixel, 30 frames per second... That's almost 1.5 gigabits per second.
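If you want to verify that figure, the arithmetic is simple enough to spell out:

```python
# Raw bitrate of uncompressed 1080p video at 30 frames per second.
width, height = 1920, 1080
bits_per_pixel = 24              # 8 bits each for red, green, and blue
fps = 30

bits_per_second = width * height * bits_per_pixel * fps
print(f"{bits_per_second / 1e9:.2f} Gbit/s")   # -> 1.49 Gbit/s
```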
How can you transmit that much data over the air, in real time? The answer is video compression. You might have heard of codecs. A codec is a piece of software that encodes and decodes data. The encoding part compresses the data, making it easier to store and transmit, and the decoding part reverses this process, recreating the original data as closely as possible. Codecs are not limited to video; they can be used to encode and decode many types of signals. But for now, let's focus on how video codecs work. In the previous video, we talked about how still images are compressed. In short, images are compressed by throwing out the information that is less visible to the human eye and storing redundant data more efficiently. We can easily extend image compression to video compression by compressing a video frame by frame. This approach is called spatial, or intra-frame, coding.
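To make that concrete, here's a minimal sketch of frame-by-frame coding, using Pillow's JPEG encoder as a stand-in for a real intra-frame coder (the function name and quality setting are just illustrative; this is essentially what the Motion JPEG format does):

```python
import io
from PIL import Image

def intra_frame_encode(frames, quality=75):
    """Compress each frame independently, like a sequence of JPEG images."""
    encoded = []
    for frame in frames:                      # frame: (height, width, 3) uint8 array
        buf = io.BytesIO()
        Image.fromarray(frame).save(buf, format="JPEG", quality=quality)
        encoded.append(buf.getvalue())
    return encoded
```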
Even doing that alone would significantly reduce the file size, but we can actually do much more than that. In a typical video, many consecutive frames tend to be nearly identical. We can make use of this temporal, inter-frame redundancy to compress a video further. First, let's think of an extreme case where nothing moves in a video. Instead of storing each one of these identical frames, we can simply tell our encoder to keep the first frame and repeat it N times. That would save us a lot of space.
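A tiny sketch of that idea, with a hypothetical encode_static_video helper that run-length encodes identical frames:

```python
import numpy as np

def encode_static_video(frames):
    """Store each distinct frame once, along with how many times it repeats."""
    runs = []                                 # list of [frame, repeat_count] pairs
    for frame in frames:
        if runs and np.array_equal(runs[-1][0], frame):
            runs[-1][1] += 1                  # identical to the last frame: just count it
        else:
            runs.append([frame, 1])
    return runs
```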
Now let's think of a more realistic case where only some parts of the video change. This time we can do the same thing, but more locally, by dividing the frames into blocks and repeating only the blocks that don't change.
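In code, the block-level version of that idea might look like this sketch (the 16-pixel block size is a common choice in real codecs, but the threshold knob here is purely illustrative):

```python
import numpy as np

BLOCK = 16                                    # block size in pixels, a common choice

def changed_blocks(prev, curr, threshold=0):
    """Collect only the blocks that differ from the previous frame;
    the decoder reuses the previous frame's pixels everywhere else."""
    updates = []
    height, width = curr.shape[:2]
    for y in range(0, height, BLOCK):
        for x in range(0, width, BLOCK):
            diff = np.abs(curr[y:y+BLOCK, x:x+BLOCK].astype(int)
                          - prev[y:y+BLOCK, x:x+BLOCK].astype(int))
            if diff.mean() > threshold:
                updates.append((y, x, curr[y:y+BLOCK, x:x+BLOCK]))
    return updates
```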
What if all blocks change between consecutive frames, but some change a lot and some change a little? Instead of checking whether a block has changed or not, we can search for a given block within a neighborhood in the next frame. This process is called block motion estimation.
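Here is a sketch of the simplest form of block motion estimation: an exhaustive search that scores each candidate position with the sum of absolute differences (SAD). Following the usual convention, it searches the reference frame for the best match of each block in the frame being coded; real encoders use much faster search strategies than this brute-force loop.

```python
import numpy as np

def motion_estimate(ref, target, y, x, block=16, search=8):
    """Find where the target frame's block at (y, x) best matches within a
    +/- `search` pixel neighborhood of the reference frame, scoring
    candidates by the sum of absolute differences (SAD)."""
    tgt = target[y:y+block, x:x+block].astype(int)
    height, width = ref.shape[:2]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > height or rx + block > width:
                continue                      # candidate falls outside the frame
            cost = np.abs(tgt - ref[ry:ry+block, rx:rx+block].astype(int)).sum()
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv                            # motion vector for this block
```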
How does this help with compression? Well, instead of saving every frame, we can save a reference frame and the motion vectors for the blocks. The motion vectors tell us how we should move the blocks to closely match the next frames. This is called motion compensation. Although motion compensation can greatly reduce the difference between two consecutive frames, it is usually not enough by itself to fully recreate the next frame. So, in addition to the motion vectors, we should also save the differences between the actual and motion-compensated frames. These differences are known as residual frames.
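Putting the encoder side together, here's a sketch that builds on the motion_estimate function above: predict the next frame by motion-compensating the reference, and keep whatever is left over as the residual.

```python
import numpy as np

def motion_compensate(ref, mvs, block=16):
    """Predict a frame by copying each block from the reference frame,
    shifted by that block's motion vector."""
    pred = np.empty_like(ref)
    height, width = ref.shape[:2]
    for y in range(0, height, block):
        for x in range(0, width, block):
            dy, dx = mvs[(y, x)]
            pred[y:y+block, x:x+block] = ref[y+dy:y+dy+block, x+dx:x+dx+block]
    return pred

def encode_frame(ref, target, block=16):
    """Encoder side: one motion vector per block, plus the residual."""
    height, width = target.shape[:2]
    mvs = {(y, x): motion_estimate(ref, target, y, x, block)
           for y in range(0, height, block)
           for x in range(0, width, block)}
    residual = target.astype(int) - motion_compensate(ref, mvs, block).astype(int)
    return mvs, residual
```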
When it's time to play this video, the decoder predicts the current frame by taking the previous reference frame, compensating for the motion using the motion vectors, and adding the residual frame.
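The decoder side of this sketch is then just the reverse, reusing the motion_compensate helper from above:

```python
import numpy as np

def decode_frame(ref, mvs, residual, block=16):
    """Decoder side: motion-compensate the reference, then add the residual."""
    pred = motion_compensate(ref, mvs, block).astype(int)
    return np.clip(pred + residual, 0, 255).astype(np.uint8)
```

In this toy sketch the reconstruction is exact; a real codec also compresses the residual lossily, so the decoded frame only approximates the original.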
You might ask: couldn't we just save the original frames instead of the residual frames? We could, but residual frames have much less information than the full reference frames. Therefore, they are highly compressible.
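You can get a feel for this with purely synthetic stand-ins, using zlib as a crude substitute for a codec's entropy coder: a noise-like full frame doesn't compress at all, while a residual whose values cluster around zero shrinks a lot.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins, not real footage: a noise-like full frame, and a
# residual whose values cluster tightly around zero.
frame = rng.integers(0, 256, size=(1080, 1920), dtype=np.uint8)
residual = rng.normal(0, 2, size=(1080, 1920)).round().astype(np.int8)

print(len(zlib.compress(frame.tobytes())))      # about the raw size: noise doesn't compress
print(len(zlib.compress(residual.tobytes())))   # roughly 2-3x smaller
```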
Let's overview the entire process. Traditional video compression algorithms represent a video as a sequence of reference frames followed by residual frames. There are two types of compression at work here: intra-frame coding and inter-frame coding.
In this video, I focused mostly on the inter-frame coding part, which achieves high compression efficiency by exploiting the similarities between consecutive frames. Intra-frame coding, on the other hand, compresses a frame by throwing out visually redundant information within the frame and storing the rest more efficiently. Check out my previous video to learn more about how this is done.
The methods I covered here are the very basics used by many codecs, including the mainstream H.264 codec, which is also known as MPEG-4 AVC. Modern video codecs, including H.264, H.265, and VP9, use sophisticated methods to balance the level of compression and perceptual image quality without introducing too much computational complexity. Although the video compression algorithms we use today are pretty mature, video compression is still an active area of research.
Researchers have already been experimenting with machine learning models that have the potential to perform better than today's block-based hybrid coding standards. It's not easy to beat today's standards, since they have had decades to mature and be tuned in many possible ways. Still, I think an end-to-end trainable codec will eventually outperform traditional compression methods by optimizing perceptual image quality while minimizing file size. That's all for today! I hope you liked it. If you have any comments or questions, let me know in the comments section below.
Subscribe for more videos. As always, thanks for watching, stay tuned, and see you next time.