Ok. Great! Now, we understand why people use databases. However, I would imagine some of you are Excel
users and still imagine an Excel spreadsheet when we talk about tabular data. Please, don’t do that! Data tables, databases, and Excel spreadsheets
are different things! In this lecture, we will focus on the differences
between spreadsheets and databases. This exercise will be relevant, not only for
current Excel users. Those of you who do not use Excel regularly
will still have the chance to understand the advantages and the disadvantages of using
databases or spreadsheets. Ok. Let’s start with a definition
What is a spreadsheet? It is an electronic ledger, an electronic
version of paper accounting worksheets. It was created to facilitate people who needed
to store their accounting information in tabular form digitally. So, it is possible to create tables in a spreadsheet. This is one reason some people believe spreadsheets
and databases are interchangeable, while, in reality, they aren’t. There are similarities between the two. Both can contain a large amount of tabular
data and can use existing data to make calculations. Third, neither spreadsheets nor databases
are typically used by a single person, so many users will work with the data. The differences between the two forms of data
storage lie in the way these three characteristics are implemented. Ok. Imagine a spreadsheet. Every cell is treated as a unique entity. It can store any type of information – a
date, an integer value, a string name. And then, not only can we have different types
of values in various cells, but we can also apply a specific format to these cells. This is not inherent to databases. They contain only raw data. Each cell is a container of a single data
value. It is the smallest piece of information there
is. You must pre-set the type of data contained
in a certain field. This feature prevents inadvertent mistakes
– for example, in a field containing date values, should the user try to insert a string,
the software will show an error and she will have the chance to correct herself. This won’t happen in Excel – if you insert
a string in the column with date values, you wouldn’t obtain an error message, and Excel
will store the string value. In a spreadsheet, data can be stored in a
cell, while in a database, data is stored in a record of a table, meaning you must count
the records in a table to express how long the data table is, not the number of the cells. And that is it – you cannot pick a font
colour or size. All you care about is the information being
stored; you don’t care about formatting. Our main goal is to save the numbers. Another substantial difference is that, in
a spreadsheet, different cells can contain calculations, such as functions and formulas. This means, if you want to combine two integers,
the result will be stored in another cell. In a database, all calculations and operations
are based on the existing data and are done after its retrieval. There is a specific feature, called “views”,
similar to the tables, in which you can do a calculation. These objects also contain columns that can
be normal columns like the ones in the tables or could contain a certain type of calculation. There is no way you can mistake a record of
data with a calculation. The database features mentioned so far improve
data integrity – you can’t store different types of data in the same field, and it is
unlikely someone will mistake a data value for an outcome of a calculation, especially
in large data sets. Data integrity is a strong advantage when
working with databases. Naturally, you might think a spreadsheet can
contain multiple worksheets, so one can create tables in the worksheets, and then use the
worksheets to create relations between the tables. Why bother using relational databases? Well, in a spreadsheet, such relations will
be logically limited. Instead of setting up spreadsheets or worksheets,
one can set up relations between the tables, and this will boost the performance of operations,
increasing the speed with which you could manipulate your dataset. Albeit powerful for many circumstances, spreadsheets
have their limitations. Excel is incapable of handling over 1 million
rows of data. This immediately induces us to look for a
solution. Usually, the fix is to use databases, where
having 2, 5, or 10 million records is not a problem. Referring to the multi-user property, spreadsheets
are lagging. Essentially, every person must update their
own spreadsheet with new data. For instance, if there is a new purchase to
register or a last name in the “Customers” table to correct, every user must make these
changes manually. You would justifiably think Google Docs and
the latest versions of Office solve this issue, but they do so only partially. In Google Docs, you might have trouble finding
out who changed or deleted information incorrectly, which often leads to a cumbersome situation
where people have a hard time organizing their tasks. As opposed to that, you saw in the Data Control
Language lecture that databases provide a stable structure, controlling access permissions
and user restrictions. One person can make a change that is visible
to everybody instantly. This feature increases efficiency and data
consistency when using databases. Considering data integrity and data consistency,
using databases eliminates duplicate information, which is another way to save space and increase
efficiency. Look at the “Customers” table. You know a certain first and last name corresponds
to a unique email address. So, if you know John McKinley has changed
his email and you are using a spreadsheet flooded with data, you may change the email
address once and accidentally miss updating the same address in another record. This may lead to inadvertent mistakes. They can be avoided when using a relational
database - an accredited user only needs to access the “Customers” table and change
John McKinley’s email address there. Just once. Not only will this operation save time, but
it will also anticipate inconsistencies. So, what we discussed in this lesson highlights
why databases are a better environment for storing and keeping track of data when working
with multiple dimensions and large amounts of data. Spreadsheets have their advantages as well
– they are an excellent tool that allows us to carry out extensive analysis. But for the easy retrieval and updating of
data, efficiency, data consistency, data integrity, speed, and security, relational databases
are definitely the structure to opt for. They can store lots of raw data and are excellent
when separating the data from the way it is displayed for analysis. As you saw, it would be a good idea to stop
trying to visualize data tables in the form of spreadsheets. They are different. Stay tuned for the next lecture, where we
will offer more database terminology. Thank you for watching!