What's up, everybody! Today we're talking about how digital images
are represented, compressed, and stored on your devices. Let's get started! A typical image is represented as a matrix whose values correspond to pixel
intensities. A larger number means a brighter pixel, a
smaller number means a darker pixel. Color images have a separate channel for each
color component, such as red, green, and blue. Although this is probably the most common
way to represent an image, it's not how images are typically stored on disk. Why not? Let's take a look at what happens when we
do. Let's say we have a 12-megapixel color picture,
which means we have 12 million values to store for each color channel leading to a total
of 36 million values. If we assume that these values are stored
as 8-bit or single-byte integers we should end up with a 36-megabyte file. I have a 12-megapixel image here. Let's see how big it is. Wait, what? That's not even 2 megabytes. How's this possible? The answer is image compression! In this case, it's JPEG image compression. You've probably seen the extension ".jpg"
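Quick aside: the arithmetic above is easy to double-check in a few lines of Python.

```python
# Back-of-the-envelope size of an uncompressed 12-megapixel RGB image.
pixels = 12_000_000      # 12 megapixels
channels = 3             # one value each for red, green, and blue
bytes_per_value = 1      # 8-bit (single-byte) integers

raw_bytes = pixels * channels * bytes_per_value
print(raw_bytes // 1_000_000, "MB")  # -> 36 MB
```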
at the end of your image file names. JPEG is not the only compressed image format
but it's probably the most common one. JPEG is a lossy compression format, meaning
that some of the information in the original image is actually thrown out. The more information you discard, the worse
the image quality gets. So, there is a tradeoff between the image
quality and file size. But JPEG makes a profitable trade, reducing
file size while preserving the perceived image quality. Because the thrown out parts are designed
to be the parts that we wouldn't notice easily. Let's see how this is possible. The first step is color space conversion. Instead of representing an image with it's
red, green, and blue color component intensities, they are converted into a color space where
one channel represents the light intensity while the other two represent the colors. This is a linear transform that can be expressed
as a matrix multiplication. This conversion provides a separation of the
luminance from the chrominance components. Since our visual system is much more sensitive
to changes in brightness than to changes in color, we can safely downsample the chroma components
to save some space. This strategy is called chroma subsampling
and is used in many image and video processing pipelines. Another characteristic of the human visual
system that we can take advantage of is the frequency-dependent contrast sensitivity. What this means is that it's easier to miss
small objects or fine details in a picture as compared to the large ones, which is kind
of obvious. In this figure, the spatial frequency of the
bars increases from left to right and the contrast decreases from the bottom to top. This may vary from person to person but as
you can see the bars under the curve are more visible than the rest. This is because our visual system is more
sensitive to brightness variations in this range of spatial frequencies. Look at the bars at the low-contrast high-frequency
part of the figure: they are barely visible. This phenomenon gives us some room for compression
in those less visible frequencies. JPEG compression does that by dividing the
image into 8x8 blocks and quantizing them in a frequency-domain representation. This is done by comparing each one of these
8x8 blocks with 64 frequency patterns where the spatial frequency increases from left
to right and top to bottom. This process decomposes the image into its
frequency components, converting an 8x8 block where each element represents a brightness
level into another 8x8 block where each element represents the presence of a particular frequency
component. This method is called the discrete cosine
transform. In this representation, we can easily compress
the frequencies that are less visible to us by dividing these frequency components by
some constants and then quantizing them. The frequency components that we are less
sensitive to get divided by larger constants as compared to the ones that we are more sensitive
to. Quantization in this context simply means
rounding the result to the nearest integer. Using larger divisors leads to more coefficients
rounded to zero. This results in higher compression rates but
it also lowers the image quality. After quantization, we end up with a lot of
zeros in the high frequencies. We can store this information more efficiently
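To make the transform-and-quantize step concrete, here's a small sketch: a straightforward (slow) 8x8 DCT plus a hypothetical quantization table whose divisors grow with frequency. The real JPEG tables follow the same idea but use different values.

```python
import math

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block -- the transform JPEG uses."""
    n = 8
    def c(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

# Hypothetical quantization table: divisors grow with spatial frequency,
# mirroring the spirit of the real JPEG tables.
quant = [[1 + 4 * (u + v) for v in range(8)] for u in range(8)]

# A smooth horizontal gradient, centered around zero the way JPEG centers
# pixel values by subtracting 128 before the transform.
block = [[4 * x - 14 for x in range(8)] for _ in range(8)]

quantized = [[round(coeff / q) for coeff, q in zip(row, qrow)]
             for row, qrow in zip(dct2_8x8(block), quant)]

zeros = sum(row.count(0) for row in quantized)
print(f"{zeros} of 64 coefficients quantized to zero")
```

Because the block is smooth, almost all of its energy sits in a few low-frequency coefficients, and the rest quantize to zero.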
by rearranging the elements. If we rearrange these coefficients in a zig-zag
order from top left to bottom right, we can group these zeros together. Once we have the zeros together, instead of
storing each one of them separately, we can store their value and the number of times
they occur consecutively as tuples. This technique is called run-length encoding
and is used in many other algorithms as well. Finally, we can further compress what's left
by encoding the more frequent values with fewer bits and less frequent values with more
bits. Doing so reduces the average number of bits
per symbol. This process is called entropy coding, and baseline JPEG does it with Huffman coding. Both run-length encoding and Huffman coding
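Here's a minimal sketch of Huffman's algorithm on made-up symbol counts, just to show frequent values getting shorter codes. Actual JPEG Huffman tables are built over combined run-length/size symbols rather than raw coefficient values.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Build a Huffman tree and return each symbol's code length in bits."""
    freqs = Counter(symbols)
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (frequency, unique tiebreaker, {symbol: depth_so_far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)  # merge the two rarest subtrees;
        fb, _, b = heapq.heappop(heap)  # every symbol inside them gets one bit deeper
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, counter, merged))
        counter += 1
    return heap[0][2]

# Zeros dominate quantized JPEG data, so they should get the shortest code.
data = [0] * 20 + [1] * 6 + [-1] * 5 + [2] * 2 + [5]
lengths = huffman_code_lengths(data)
print(lengths)
```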
are lossless compression methods. No information is thrown out in these steps. The compression is achieved solely by storing
redundant data more efficiently. This type of compression is used to compress,
transmit, and store many types of data, including images, audio, and documents. When it's time to decode an image, all these
steps are reversed. Since some information is lost during the subsampling
and quantization steps, the decoded image won't be identical to the original one. However, the compressed images should look
almost as good as the original ones when a reasonable compression rate is used. Compression artifacts become more visible
as the compression rate increases. It's hard to show an uncompressed image here
and compare it to a compressed one because the video you are watching now is compressed
as well. One type of image that JPEG particularly falls
short is synthetic images such as web graphics. Sharp edges are not common in natural images
but they are in synthetic images. High-frequency components that make up a strong
edge in a synthetic image get compressed harshly, leading to visible compression artifacts near
the edges. So what should we do with synthetic images then? Use another image format such as PNG or WebP,
or better yet use vector graphics when possible. Vector graphics are stored as mathematical
equations rather than pixel values. They are lossless and can be scaled to any
size without losing quality. Vector graphics are not feasible for pictures
but they are perfect for graphics like logos, illustrations, and diagrams. That's all for today! I hope you liked it. If you have any comments or questions, let
me know in the comments section below. Subscribe for more videos. As always, thanks for watching, stay tuned,
and see you next time.