[MUSIC PLAYING] STEPHANIE WONG: You can't really
talk about storage at Google without talking about how
our production systems work. How are all the
hard disks and flash drives on individual
machines connected? Let's boil it down. Data is stored in hard disk
drives or solid state drives. SSDs and HDDs are
deployed separately from the applications. For example, they can be housed in a dedicated storage tray like this one. In other words, we separate storage from compute. Data needed by a
machine is typically not even in the same
rack because machines can access data in a
different physical location through our robust
global fiber network. Splitting storage and compute lets us scale them independently as demand grows, and machines can process requests much more efficiently. We organize storage and compute racks into rows of physical enclosures, group those enclosures into clusters, and put multiple clusters in a single data center. Each cluster depends on its own power, cooling, and network infrastructure,
a deliberate part of how we design for data
protection and reliability at scale. We've built our own
warehouse-scale machine out of hundreds of thousands
of relatively inexpensive machines. At Google, it's rare to
dedicate an entire storage appliance in our data centers to a single product or service's data. Instead, we spread a workload's
data across multiple machines, and workloads share network
access to that storage. Encryption is inherent
in our storage systems, and all data is encrypted
prior to being written to disk. But how does Google make
storage accessible and scalable across a global
fleet of machines? Remember, at
planetary scale, it's not unusual for individual
machines, racks, or even entire buildings
to periodically fail, so we need to build software
to make the data stored on them durable. That way, when they
fail, no data is lost. Every layer of
our storage stack, down to the file system and the
layer that writes to storage devices, is a shared service. Workloads for services like Search, Photos, and Gmail share machines in our data centers. Resources within each machine,
like compute and memory, are allocated to each service. At the machine level, Disk, or D for short, is a system that exposes the hard disk drives and SSDs attached to individual machines to other services in the cluster. It manages access to each machine's disk capacity to maximize utilization.
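To make that idea concrete, here is a minimal sketch in Python of what a machine-level disk service along these lines could look like. The names and interface (DiskService, write_block, read_block) are illustrative assumptions only; D's actual interface isn't public.

```python
class DiskService:
    """Exposes a machine's local disk capacity to other services in the cluster."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self._blocks: dict[str, bytes] = {}  # block_id -> data; stand-in for the on-disk layout

    def free_bytes(self) -> int:
        used = sum(len(block) for block in self._blocks.values())
        return self.capacity_bytes - used

    def write_block(self, block_id: str, data: bytes) -> None:
        # Refuse writes that would exceed this machine's capacity.
        if len(data) > self.free_bytes():
            raise OSError("machine is out of disk capacity")
        self._blocks[block_id] = data

    def read_block(self, block_id: str) -> bytes:
        return self._blocks[block_id]


# Any workload in the cluster can share the same machine's capacity:
d = DiskService(capacity_bytes=1 << 30)
d.write_block("photos/block-0001", b"...jpeg bytes...")
print(d.free_bytes())
```

The point of this design is that every service draws from a shared pool of disk capacity rather than owning its own appliance.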
At the cluster level, most data we store has strict durability and latency requirements, so we built a file system called
Colossus on top of D, which is the foundation for many
services like Cloud Storage and Bigtable. At the Colossus level,
files are broken down into a set of chunks that can
be stored on different machines in the cluster. This data replication
across machines is the key to fast
recovery and fault tolerance against things
like network failures. For a given chunk, Colossus
identifies a machine to write the chunk to, and
the client sends the chunk to the D service that
runs on the target machine to perform the write. Security is ensured through the encryption of each chunk, and each chunk has a unique encryption key.
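Here is a minimal sketch of that write path. The chunk size, the placement policy, and the cipher (Fernet from the cryptography package, standing in for whatever Google actually uses) are all illustrative assumptions, not Colossus internals.

```python
from cryptography.fernet import Fernet

CHUNK_SIZE = 1 << 20  # assume 1 MiB chunks purely for illustration

# Stand-in for the D services on the target machines: machine -> {chunk_id: encrypted bytes}
machines = {f"machine-{i}": {} for i in range(4)}


def place_chunk(chunk_index: int) -> str:
    # Hypothetical placement policy; the real system uses far richer signals.
    return f"machine-{chunk_index % len(machines)}"


def write_file(path: str, data: bytes) -> dict:
    """Split a file into chunks, encrypt each chunk with its own key,
    and send each chunk to the D service on its target machine."""
    metadata = {}
    for index, offset in enumerate(range(0, len(data), CHUNK_SIZE)):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = Fernet.generate_key()              # a unique encryption key per chunk
        encrypted = Fernet(key).encrypt(chunk)   # encrypted before it ever reaches a disk
        target = place_chunk(index)
        chunk_id = f"{path}/chunk-{index:04d}"
        machines[target][chunk_id] = encrypted   # the "write" to that machine's D service
        metadata[chunk_id] = {"machine": target, "key": key}
    return metadata  # records where each chunk lives and how to decrypt it


meta = write_file("example/mailbox-42", b"x" * (3 * CHUNK_SIZE + 123))
print(len(meta), "chunks written")
```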
Because a single machine can be running multiple services, and conversely, a service can be running on many machines, services constantly adjust the amount of resources they use. Now you can see how distributing
data better utilizes our machine capacity and gives
Google Cloud services higher reliability and performance. As a Google Cloud user, you have the option to store data on local solid state drives, on zonal persistent disks that map to clusters in a single zone, on regional persistent disks that are replicated across zones in the same region, or in storage buckets. With storage buckets, you can store data in a single region for high performance, in a dual-region for high performance and high availability, or in a multi-region for the highest availability.
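As an illustration of those choices, here's how you might create buckets at each of the three availability levels with the google-cloud-storage Python client. The bucket names are placeholders, and US-CENTRAL1, NAM4, and US are just example location codes for a single region, a dual-region, and a multi-region.

```python
from google.cloud import storage

client = storage.Client()  # uses your default project and credentials

# Single region: lowest latency when accessed from that region.
client.create_bucket("example-single-region-bucket", location="US-CENTRAL1")

# Dual-region (the predefined NAM4 pairing): high performance plus
# high availability across two regions.
client.create_bucket("example-dual-region-bucket", location="NAM4")

# Multi-region: data is spread across regions for the highest availability.
client.create_bucket("example-multi-region-bucket", location="US")
```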
Your data maps to different clusters, which protects it from zonal and regional failures. D and Colossus manage access to
the storage, store data safely and securely, and
efficiently use our hardware so you get the best
performance possible. All of this is enabled by the speed of our network fabric. More on that next time on
Discovering Data Centers. [MUSIC PLAYING]