NoSQL databases have become very popular. Big companies rely on them to store hundreds
of petabytes of data and run millions of queries per second. But what is a NoSQL database? How does it work, and why does it scale so
much better than traditional, relational databases? Let's start by quickly explaining the problem
with relational databases like MySQL, MariaDB, SQL Server, and alike. These are built to store relational data as
efficiently as possible. You can have a table for customers, orders,
and products, linking together logically: customers place orders and orders contain
products. This tight organization is great for managing
your data, but it comes at a cost: relational databases have a hard time scaling. They have to maintain these relationships,
and that's an intensive process, requiring a lot of memory and compute power. So for a while, you can keep upgrading your
database server, but at some point, it won't be able to handle the load. In technical terms, we say that relational
databases can scale vertically, but not horizontally, whereas NoSQL databases can scale both vertically
and horizontally. You can compare this to a building: vertically
scaling means adding more floors to an existing building, while horizontal scaling means adding
more buildings. You intuitively understand that vertical scaling
is only possible to a certain extend, while horizontal scaling is much more powerful. Why do NoSQL databases scale so well? Well, first of all, they do away with these
costly relationships. In NoSQL, every item in the database stands
on its own. This simple modification means that they're
essentially key-value stores. Each item in the database only has two fields:
a unique key and a value. For instance: when you want to store product
information, you can use the product's bar code as the key and the product name as the
value. This seems restrictive, but the value can
be something like a JSON document containing more data, like price and description. This simpler design is why NoSQL databases
scale better. If a single database server is not enough
to store all your data or handle all the queries, you can split the workload across two or more
servers. Each server will then be responsible for only
a part of your database. To give an example: Apple runs a NoSQL database
that consists of 75,000 servers. In NoSQL terms, these parts of your database
are called partitions, and it brings up a question. If your database is split across potentially
thousands of partitions, how do you know where an item is stored? That's where the primary key comes in. Remember, NoSQL databases are key-value stores,
and the key determines on what partition an item will be stored. Behind-the-scenes, NoSQL databases use a hash
function to convert each item's primary key into a number that falls into a fixed range. Say between 0 and 100. This hash value and the range is then used
to determine where to store an item. If your database is small enough or doesn't
get many requests, you can put everything on a single server. This one will then be responsible for the
entire range. If that server is becoming overloaded, you
can add a secondary server, which means that the range will be split in half. Server 1 will be responsible for all items
with a hash between 0 and 50, while server 2 will store everything between 50 and 100. Theoretically, you've now doubled your database
capacity: both in terms of storage and in the number of queries you can execute. This range is also called a keyspace. It's a simple system that solves two problems:
where to store new items and where to find existing ones. All you have to do is calculate the hash of
an item's key and keep track of which server is responsible for which part of the keyspace. Now, in this example, the range of 0 to 100
is a bit small. It would only allow you to split up your database
into 100 pieces at most. So, real NoSQL databases have much bigger
key spaces, allowing them to scale almost without restrictions. Besides great scalability, NoSQL is schemaless,
which means that items in the database don't need to have the same structure. Each one can be completely different. In a relational database, you have to define
your table's structure, and then each item must conform to it. Changing this structure isn't straightforward
and could even lead to loss of data. Not having a schema can be a big advantage
if your application and data structure is constantly evolving. At this point, it's clear that NoSQL databases
have certain advantages over relational ones. But that's not to say that relational databases
are obsolete, far from it. NoSQL is more limited in the way you can retrieve
your data, only allowing you to retrieve items by their primary key. Finding orders by ID is no problem, but finding
all orders above a certain amount would be very inefficient. Relational databases, on the other hand, have
no trouble with this. There are workarounds for this issue, but
only if you know how you're going to access your data. And that might not always be the case. Another downside is that NoSQL databases are
eventually consistent. When you write a new item to the database
and try to read it back straight away, it might not be returned. As I've explained, NoSQL splits your database
into partitions. But each partition is mirrored across multiple
servers. That way, a server can go down without much
impact. When you write a new item to the database,
one of these mirrors will store the new item and then copy it to the others in the background. This process might take a little bit of time. So when you read that item, the NoSQL database
might try to read it from a mirror that doesn't have it yet. This is not a big issue in practice because
data is replicated in just a few milliseconds. And if you want consistency, most NoSQL databases
do have that option. So, in summary: both NoSQL and relational
databases will be around for the foreseeable future. Each with their own strengths and weaknesses. So now you know how NoSQL works, let's look
at a few examples. Cloud providers heavily promote NoSQL because
they can scale it more easily. AWS has DynamoDB, Google Cloud has BigTable,
and Azure has CosmosDB. To give you another example of their scalability:
during Amazon Prime Day in 2019, Amazon's NoSQL database peaked at 45 million requests
per second. That's mind-boggling! But you can also run NoSQL databases yourself
with software like Cassandra (which was developed by Facebook), Scylla, CouchDB, MongoDB, and
more. Before ending this video, let's quickly talk
about the name "NoSQL." It's a bit confusing as it can be interpreted
in two ways. First up: "NoSQL" can mean "not only SQL,"
pointing to the fact that some NoSQL databases partially understand the SQL query language,
on top of their own query capabilities. And secondly, it's often called "NoSQL" in
the sense of "non-relational" because it can't easily store relational data. So that was it for this video. Please subscribe if you learned something
from it, and I hope to see you in the next video!