(upbeat music) - Alright hello, thank you all for coming. I wanna start off this talk
doing a Thought Experiment. So it's a Thought Experiment
where you think through in your head to try to learn
something about something and you usually play a
hypothetical situation and it reveals information. So I want you to think
about an application you work on that has a database and think of one that's very important to whoever it is that's
paying you to work on it. So for example, for me, I've worked on a Warehouse
Management System at Stitch Fix that managed our inventory. How much did we pay for it, who's got it, where is it, who do we sell to? Super, super important. So now think about what would
happen if the source code of that application disappeared
from the face of existence. Gone forever, never to be returned. No backups, no get repos hanging
around anywhere, it's gone. How could you recover from that? So I think you could. I mean it would be painful,
it would take a long time, it would be expensive but you
kinda know how the app works, you work on it and you've got users, they kinda know how the app works. You could figure it out. The business people paying you, they know how they want it to work and you've got the database still there, you could piece it together. It would suck but you could do it. So now take that other way. Application is fine, the
data and the database goes away for ever. No backups, no dumps sitting on S3 is just gone forever and
ever and ever to be returned. How could you recover from that? So I think that is an
Extinction Level Event, right, you would never be
able to recover from that. Because the whole reason
you have a database is to remember stuff
that you can't remember, or to share stuff with other
people who need to know it. And that's why it's there. So there is no other way to get that, you can't remember what the
data looks like in repopulated, you would be dead. So what this Thought Experiment tells me is that the data our applications manage is actually more important than the source code that manages them. Which is interesting, because
I don't know about you all, but I if I had to guess, we've all spent a lot more time learning about how to write code, then we have learning about modeling databases or managing data. I mean, I have two degrees
in Computer Science and I didn't take a single
Database Course there, they did not make me do any of that. So that's what I'm gonna try to start closing that gap on today. So my name is Dave Copeland,
or @DaveTron5000 on Twitter. How I know anything about this, despite my lack of courses
in college about it, was I was lucky enough to work with someone who
understood Database Design, who understood the
Theoretical Computer Science that like makes it a real thing. And he was very persnickety about having our databases model properly. And so he explained what that was, why that was important. 'Cause we worked on a database, it was very terribly modeled. And he gave me the language to
say like, why it was so bad. And then I've since read
books and papers about this to understand the
Theoretical Computer Science that underpins what
we're gonna talk about. And if I do a good job here, you won't have to worry about any of that, 'cause it is very hard to understand. So hopefully, we can sort of start to feel more confident
about how to do that. Also, as a side note, while
I have your attention, I'm the co-author of "Agile
Web Development with Rails 6" which just came out in beta today, if you are looking to, if you're new to Rails, and
you want a book to learn Rails, this is a very good one. This is a discount code
for getting some money off. I will Tweet it later, so
don't worry about that. Cool okay, so first, what is a Database? So a lot of things in
life are databases, right? Redis is a database, your
File Systems are database, Excel is a database. But these are not the kinds of databases we're talking about. We're talking about the kind of database that your application probably uses. And that's called an OLTP. That stands for Online
Transaction Processing. And I had a whole bunch of slides to explain what that meant. But I'm just gonna cut to the end, because it's not super important. What it really means is the
OLTP is the source of truth. It is the one place where
the facts about the world that you care about go. So that database, I talked about, how much did we sell
this item to someone for? That source of truth
that is in this database of this Warehouse Management System. It gets copied everywhere, it gets cached for performance, it gets aggregated for
Business Intelligence all over the place. But the source of truth is that database. So therefore, it stands to
reason the Database Design is how we make sure that
our database can be trusted to be the source of truth. So think about Excel, if we
just put everything in Excel, well, you can put whatever
you want in there. So how could you possibly trust that to be the source of truth? Because literally anything can go into it. So the design of the database in Excel does not really make it easy for us to trust it as a source of truth. So this then leads to the
first thing that we wanna start thinking about is, what are the facts about the
world that we want to record? Often were given a generic
business problem to solve, or a wireframe to implement. And that's totally fine. But somewhere in there is an implication that we need to capture information and make it available to someone else to answer some questions. And so that's what where we
wanna put in our database. So that's the facts that we wanna record, and we need to understand what those are. 'Cause as we'll see, the
active database design is really about making sure you understand the requirements very clearly. And so the first step is to
write down what those facts are. So we're gonna go through
an example database that is gonna start out terrible and become better and better and more, we'll understand why it's getting better, 'cause it's very hard
to understand the stuff without a running example. So I've chosen a domain
that I think everyone is very intimately familiar with, which is, of course,
professional wrestling. So we're gonna make an application about professional wrestling and store data about
professional wrestling. So here are the facts that we
wanna know about the world. So a wrestler might have a finishing move, right, that they used to complete a match, that they wanna win a match
or they're finishing move, we want a database of
that who's using what? A wrestler might wrestle
on a particular show, Right, so The Rock might
wrestle on Smackdown, Stone Cold Steve Austin wrestles on Raw, we need to know who's
wrestling on what show? 'Cause not every wrestler
wrestles on the same show. Now wrestler can also
be cleared to wrestle on a particular match of a show. So we wanna have The Rock
wrestling the main events, because that comes on last, we're gonna get a big ratings
boost from having him there. So we wanna keep track
of that in our database. But we might have Zack
Ryder who's less popular wrestling the opening match, right? So that needs to be in
our database as well. And lastly, on every show,
in professional wrestling, they're fighting over a title usually. So on Raw, they fight over
the Universal Championship. So we wanna keep that
in our database as well. Now, you don't have to know anything about professional wrestling, to start thinking about how
you might solve this problem, how you might model this data, build this application, right? You're good Rails and Ruby developers, you're gonna start looking
at all these little nouns that show up, right? Nouns are the key to know what to do. We're going to have our wrestler table finishing move attribute a show table. Well, you can start to
see it form in your mind, see an object model kinda form. So you know, you might
do something like this. Okay, we've got our wrestlers table, our shows table will know
what shows have what matches by joining them together. And then if we want to know what wrestler wrestles
and what show what match, we'll have another table,
this brings it all together. Now, if you haven't done a
lot of data modeling at all, that probably seemed like
incredibly fast and confusing. So from here on out, we're
gonna go very very slow. For those of you that
have done this before, you might feel like
yeah, by gut, you know, we can just kind of do it. And I've done that many times, and you have a gut feel about
how things should relate based on how you understand things. And this data model is not bad. It seems to meet our needs, but it's got some issues. So suppose that Zack Ryder wrestles on the opening
match of the show Superstars. so we could store that in this model. Now, suppose that Superstars
doesn't have an opening match? Well, that can also be
stored in this model. So which is right, we have a data model
that allows us to store two pieces of information
that don't agree. So that's not good, right? We don't want to have that situation and that's the case, we can't rely on this to be the source of truth. So what that type of thing
is, is called an Anomaly. So Anomalies is when the data model prevents the storage of certain facts, or allows ambiguity to
exist, like we just saw, or requires deleting one fact in order to delete another fact. We'll see some more specific
examples of these later. But the takeaway here is that a data model that has Anomalies can't
necessarily be trusted to be the source of truth, because it allows you to
store wrong information or not store right information. And so Normalization
is the process by which we remove Anomalies from our designs. It sounds very formal and formality is kind of two sometimes. But it actually is gonna be very helpful, because what we're gonna see
is a little bit of formality allows us to make very strong statements about the changes we're
gonna make to our data model. And that therefore gives us confidence that we're improving the
quality of our data model. So like I said, we're not
gonna get into a ton of Math, but I am gonna use a few of the terms from this area of research mostly. So if you actually wanna
know more about it, you know what the words
are to start kinda googling and start your search towards the great depth of knowledge here. So Normalization is a process
we're gonna learn about here. Normal Forms or Mathematical
truths about the data model, that tell us what sort of
Anomalies have been removed. So Normal Forms have an
order lower to higher. So a lower Normal Form is
open to more Anomalies, i.e it is a worst data model, i.e it cannot be trusted
to be the source of truth. A higher more Normalized Data
Model has fewer Anomalies, and therefore can more be trusted
to be the source of truth. So Normalization is doing that. Promised is not gonna be a ton of Math. So let's get back to our domain here. So instead of trying to
make some object graph and attributes and stuff like that, let's just throw everything into Excel and see where that gets us. So here's our data model, we got a column for everything
that we wanna store. And we just put some rows in there to kind of keep track of this, right? The Rocks finishing
move is The Rock Bottom, he wrestles on Raw and Smackdown. Raw is where the Universal
Championship is defended and The Rock wrestles in the main event. Alright, so we can store some
facts about the world in here. Now, probably nobody thinks
this is a good database design. Like, but how do we know it's
not a good database design? Yeah, we can say it looks gross, so it's ugly, or there's duplication, but we can make much more strong statements about it. And so we're gonna go
through this process, right? So right now we have kind of whatever, we could just put
anything we want in there, no rules to hold anything. That's why it's not a good design. So the first step is going
to be to modify our table into what is called a Relation. I'll talk about that in a second. But a Relation is like the bare minimum of any kind of design of a database. And from there, you can start
to make real improvements. And the way you make those improvements is actually pretty straightforward. And those improvements will
lead us to what is called a Normalized Schema. And again, it will be in what's called Boyce-Codd Normal Form. Again, that's for googling, do not worry too much about that. The point is, it will have
provably fewer Anomalies, and therefore provably better, and therefore more
likely to be relied upon as a source of truth. Okay, so I talked about
a Relation, what is that? It's gonna seem kinda, like not much. So Relation is a table that
doesn't have any duplicate rows, or every field has a single value in it. And the fields of each column
are all of the same type. This sounds kind of reductive,
and not that interesting. But this is just the baseline,
this is the bare minimum. And if you remember what our
Excel spreadsheet looked like, you can already see it doesn't exactly meet these needs here. So before we see what
our database looks like, as a Relation, I want to talk about Types 'cause I did mention that, right? They all have to be in the same Type. Now Type isn't like category theory, high school level specificity, right? This is a design. So you just need to kinda
think about it a little bit. Think about all the names
of wrestlers in the world. Well, think about all
the strings in the world, there's more strings than
there are wrestler names. So you can see the kind of just there. So something just really basic. I just want to point out
the NULL is not a value. So therefore NULL cannot
be in allowed value. So therefore no field in our
database will allow NULL. Now we'll talk about
what that means later, when we actually write code
to deal with this data model. But until then, NULL
is not allowed at all. And we won't need it,
it's not gonna be a thing that we have to worry
about, we won't need it. So we know what facts we want to record, we wanna get them in
some form of some tables that we can call a Relation. So here's our Excel spreadsheet, we can see some problems, right? Universal Title, that's
not the right type, it should be Universal
Championship, so that's a problem. Two values there, right? We want single values
not two, same deal there. So how can we resolve this? Well, we can fix the Universal Title, change that to Universal Championship. And then we can just duplicate
every single row in here so that we don't have multiple rows, or multiple values, right? So same data, right? So we see The Rock is in there twice, because he's on two shows. So that's why there's two rows there. But this is a Relation. So the is no duplicate rows,
every field has a single value. And everything has the same Type. You don't see The Rock in
the finishing moves column, you don't see Smackdown
in the Title column, it all kind of makes sense. And it's not good and It's not great. It's still open to Anomalies
but it's at least a base that we can start to say
definitive things about. And so this is what's
called First Normal Form. Now, it is open to Anomalies, as we said. Kenny Omega is a wrestler,
he doesn't work for WWE. But he does have a finishing move, we'd like to store that in our databases, our database should do that. But it can't because he's not on a show that defends a title or a match. If we were to remove The
Rock and Kevin Owens, we would also lose the knowledge that Raw is where the Universal
Championship is defended. So that's not good. Stone Cold Steve Austin
changes its finisher to the million dollar dream, we have to remember to
do it in two places, otherwise, it's confusing
as to what his finisher is. So these are all the kind of Anomalies that our data model allows that we're going to now eliminate. But because we've actually
gone through the process of taking whatever to a Relation, we can now actually use a
well defined process to do it. That process requires that we capture the business
rules in a certain way and it's very very lightly formalized, like very lightly formalized. We need to capture it
in terms of two things that I'll talk about, Functional Dependencies and Keys. And so we can write
the business rules down in terms of these two things, we can literally execute an algorithm to improve the design of our database. So let's talk about
these two things separate we'll put them together, hopefully, it'll make sense. So Functional Dependency is
when the value of one column unambiguously implies the
value of another column based on the business rules. So what does implies mean? So looking at example, right? A wrestler has a finishing move, if I wanna know the
finishing move of a wrestler, all I need to know is the wrestler. Therefore, wrestler
implies finishing move. Same with Show and Title. So things that are not
Functional Dependencies. A show can have more than one match, So therefore there's not a
Functional Dependency there. Because if I have the name of a show, I can't necessarily get a single match, I might get zero, I might
get one, I might get two, I might get five. So there's not a Functional
Dependency there. Also know that Functional Dependencies don't necessarily go the other way. We saw in our example,
data that a wrestler, multiple wrestlers can use
the same finishing moves. So therefore, finishing
move doesn't imply wrestler. If I have stunner, well, I don't know what wrestler uses that 'cause there's multiple
wrestlers that could use that. And again, this depends
on the business rules. What if we decided in
our world of wrestling, no two wrestlers could use
the same finishing move? Well, if that were true, then there would be a
Functional Dependency. So you can see how this is about capturing the business rules. And so the way that you do this is, you write them down in this format and then you talk to
your business partner, like is this is this correct? Is it true that if I know the wrestler, I can know the finishing move? And you can talk through
and it's a way to capture what the actual requirements are. So I also wanna point out, there's a difference
between the data itself satisfying the Functional Dependencies and the data model doing it. So this database, this data model is an Excel spreadsheet, more or less. And the data that we have right now does satisfy the Functional Dependencies. But that's not that important. The data model is what needs
to sort of adhere to this, because this data model
allows us to do this. Now the data does not satisfy
the Functional Dependencies. So as a statement of our design, right, a design in which
we can execute a query based on a wrestler and get
an ambiguous result back about what they're finishing move is, is open to anomalies, because by the business rules, we should not get an ambiguous result if we query for a finishing move based on just the wrestlers name. So hold on to that for a second. Now, let's talk about Keys. And then we'll bring it all together and I hope it will all make sense. So you've probably heard of Keys before, Primary Keys, IDs, things like that, Foreign keys, right? So what we're talking about now, a Key is simply a set of columns that uniquely identifies a row. Said in another way, no more than one row and our database can have the same set of
values for those Keys. There can be many Keys in a Relation based on whatever the
business rules happens to be, the business rules to say,
what forms of duplication are and are not allowed. Pointing out that all columns
of a table do form a Key because remember, Relation
is single, no duplicate rows. So therefore, all of the
columns of that table are Key. But the business rules
revealed the other Keys and that's what you have to do. So according to our business rules, a wrestler cannot wrestle the same match on the same show more than once. Even in wrestling, that
would not make sense. So therefore, wrestler
show and match is a Key. There should be no more
than one row on our table where wrestlers show
and match is duplicated. If I have a wrestler
and a show and a match, I can get an ambiguous
answer from our database. That's what a Key means. And so you have to find
out what those Keys are. And you have to do that
by asking questions of the people that want you to build this application to verify. Yes, that is true. No wrestler can wrestle the same match on the same show more than once. They will tell you that, and then you will then
determine that that is a Key. Okay, so the data satisfies this Key. In terms of the data
model, satisfying the Key, we simply just need to
indicate what the Key is. So you can see the little Key icon there, we'll see how to make
sure that gets enforced in the database a little bit later. But for the purposes of modeling, you just sort of need to
state this is the Key. So now you got Functional
Dependencies and you got Keys, now we can bring them together. And remember, we did this by understanding the business rules, looking at the data can help
you see what these might be, but this is a way for you to
verify that you understand the domain that you're actually building. You understand what you're
trying to accomplish, what problems are you trying to solve? So what are the implications
of these two things? So to get unambiguous information, we need a query that
gives us zero or one rows. Therefore, we need a Key to do that. That's the definition of a Key. Now our Functional Dependency between wrestler finishing move tells us that we should be able to get unambiguous information about
our wrestler's finishing move simply from just having the wrestler. But the wrestler is not a Key, the wrestler is part of the
Key but it is not itself a Key. So that means that if any query that is just based on
wrestler cannot be guaranteed to give us unambiguous results. It could give us ambiguous results and we saw that right? I have the wrestler and what
is The Rock's finishing move will depends on which
row we're looking at. So this is how we know our database is allowing us to store data that violates the business rules. The design of our database, because it is allowing us to write queries that return ambiguous
results, when it shouldn't, does not satisfy the business rules. So how do we solve that? So this was all the hard
part for you doing design, understanding and capturing and
looking at the requirements. To fix this right, what do we need? Well, if we had a table and wrestler was the Key of that table, and that table also had the finishing move of that wrestler in it, then therefore, we could query that table to get an unambiguous result. Therefore, the Functional
Dependency would be satisfied if that were the case. So let's make that table. Let's just create a table that has that and see what happens. So make a table that has the left side of
our Functional Dependency, which is wrestler, that's going
to be the Key of this table. We're gonna populate the
other fields of this table with the right side of
the Functional Dependency which is finishing move. Now because we have this, we don't actually need
the finishing move column in the original table, so we'll just remove it. If we want to connect
everything up together we can. So if we put our data back
into these two tables now, it's starting to look a little bit better, and if we wanna connect everything right, the wrestlers names match on both tables. That's how we can join them together. And we could even join them together to create the original table
as a view if we wanted to where we can recreate
exactly what we had now. And if we repeat this
process with show and title, which works exactly the
same way in this case, that starts to look more like a data model that we might have come up
with by our gut instinct. We've got, there's no duplication, it's looking pretty good. So let's kind of talk a
little formally though, we can see where the keys are. Now if we put the Functional
Dependencies on top, we can see that all of the
Functional Dependencies are dependent on a Key. Meaning we can get unambiguous information
out of our database, which we couldn't before, by definition, we can't have The Rock in the finishing moves
table more than once and so therefore, if we wanna
know that his finishing move, we query that table and
will definitely tell us. That's cool. What we just did is underpinned by a ton of Theoretical Computer Science and Math. So this isn't just me showing
you a thing you could do this is like a actual algorithm proven by Math to do what we just did. So we just turned our original Relation into what is called
Boyce-Codd Normal Form. And what that means, is that our data model has no Anomalies based on our Functional Dependencies. Meaning based on our understanding
of the business rules, our model does not
allow us to put data in, that violates those business rules. And that is, not like a gut feeling, that is provable, like that is, you can't like say that's
not true, that is 100% true. The Math proves that we did that. And now that we have the structure, all of that data that we wanted to insert or delete or change that we
couldn't before, now we can. We can see Kenny Omega is in
our finishing moves table, even though he doesn't work for WWE. We can store that fact
now, we couldn't before. We can change The Rocks
finisher to the People's Elbow and we won't worry about
changing multiple places, like we can do that in a way
that is safe and reliable. We can add Ricochet who
has no finishing move but he wrestles on Raw. We couldn't store that before, now we can. So now our data model
based on our understanding of the business rules can store everything we needed to store. So what we just did was we executed Heath's Theorem, again, is the thing you can
Google if you wanna know more but the takeaway is that
Heath's Theorem proves that what we just did,
actually does remove Anomalies. It actually does make
the data model better provably objectively, and
that we didn't lose any data. So that was a very safe thing that we did, and there's no debating about it, we don't have to debate
the elegance of our model or have gut feelings about it, it's proved that we made
it better by doing that. So we wrote down the facts
we wanted to record partly as understanding the
business requirements, we made sure that they were
in a Relation of some kind. In our case, we just put
everything in one table but however many tables you have, they should be a least a Relation. And then we wrote down our understanding of the business requirements in terms of Functional Dependencies, and Keys. That's the hard part,
that's all the thinking. Once you've done that, executing Heath's Theorem, you could write a Ruby program to do this, I can show you one that I wrote. It just looks at the data model, looks for places where the
Functional Dependencies or not dependent on the
key of the table there in, and redesigns a database to
do that, it's mechanical, which is not my intuition when I was really first
learning about this, it seemed like this fourth
part was the thinking, what are my tables going to look like? But that's actually not it. It's more, what are the business rules? Okay, so this is kinda
all theoretical, right? We didn't see a field called ID. I said the words Primary Key, but we didn't really talk
about what that meant, and if you've done anything with Rails, you'd probably but, this is probably the only thing you've heard of, right? There's definitely ideas and that thing is called the Primary Key. Where is all that? And what is that doing? So let's talk about that. So a Primary Key is usually just the Key that we all agree to use most of the time. There's nothing particularly
special about it. And in our case, wrestler show and match, that is the Primary Key, that's the main Key of that First Relation that we were looking at. That's how we get information
out of that table. And you'll notice that that's
based on the data itself. So that's called a Business
Key or a Natural Key, 'cause it naturally arises from the data and that's why it's so
important to identify because it captures the
fact that you understand the business domain by understanding what data can and cannot be duplicated. People also understand that right? The users understand this stuff, 'cause it's capturing business rules. But you can imagine a problem, what if we wanted to
change The Rocks name? Well, we're using his name as a way to cross reference all
the tables together, so we have to change it
in many different places, and even if they tell us, they're not gonna let wrestlers
ever change their names, well, that's not gonna stick, they're eventually going
to change their minds and their database is going
to be really screwed up, it could be impossible to fix things if we're using the actual data itself, to cross reference in other tables. So to get around this, and
this is what Rails does, we create what's called a
Synthetic Key or a Surrogate Key. Surrogate it stands in
for the Business Key, it stands in for the Natural Key. And the idea here is that the
Synthetic or Surrogate Key is just a made up value
and Rails is 123456, has no meaning at all. So we can use that to
connect all of our tables and cross reference them, and then no matter what
happens to the actual data, while our linkages
between tables are safe, because those values
have no meaning at all. So that's why Rails does this and it is a very good practice,
you should always do this, there's no real reason not to, because these things happen, the data, the rules will change. We want our data model and how its implemented in the
database to be as resilient to that change as possible. So let's add some Synthetic
Keys to solve this problem. So we're gonna take wrestler
out of finishing moves and replace that with the wrestler ID, will take show out of titles and replace that with show ID. We'll put the show ID and wrestler ID and our wrestler shows table, every wrestler has an
ID, every show has an ID, so that should sort things out. And here's a good example of a
table that has multiple Keys. So let's insert some
data into our database. So this data satisfies all the Keys, satisfies all the
Non-Functional Dependencies, but clearly something is wrong. I mean, The Rock shouldn't
have multiple IDs. An ID should uniquely identify a wrestler. You could say an ID implies wrestler. So that's a Functional Dependency that we failed to identify. So wrestler ID implies
wrestler, show ID implies show. So now that we understand
these Functional Dependencies, we can see we have
Functional Dependencies, they're dependent on part of
a key, not the entire Key. So therefore, our database is flawed. This is why our database
allowed that weird data to be inserted into it
after we added these fields. Okay, the hard part is done. Now we know how to fix this, fixing this doesn't require
nearly as much thought as it did to get to this
part where we can fix it. We make two new tables
that have as their Key, wrestler ID and show ID. We move wrestler and show
into their respective tables, we remove them from the
wrestler shows table and now we're back where we started. Free of all the Anomalies based on the Functional
Dependencies that we understand. Provably, there's no way
you can say otherwise, unless we missed a Functional Dependency. Now if we throw our
other tables back in here to get a whole view of the
data model as it stands now, this looks pretty reasonable. This looks like what we
might have come up with it looks kind of squint
similar to this thing. This is what we had first. But this we can make
definitive statements about, no one can tell us that
it's open to Anomalies, based on our understanding
of the business rules. No one can tell us this is bad. We can't say exactly what
this doesn't doesn't do. We can't say anything about this. And so that's kinda the power and taking a little time to write out what the business rules are, and use those to drive
your database design. Okay, so we're still kinda
in theoretical land, though. So how do you take this and actually use this in
your Rails App, right? If you're used to doing
bin/rails g migration, or Rails DB Migrate, what you're doing is manipulating what's called the Physical Model, the actual tables in the actual database. What we've been talking
about is the Logical Model. And so everything we've talked about, your business partners, whoever you're working with, they should better to
understand that model. They don't necessarily need to understand the Physical Model, but they should be able to
understand the Logical one. So that's what we've omitted a bunch of specifics about the database. But now, let's talk about
those 'cause it is important. So the same way, as
you look at a wireframe as the design and the actual Rails views as the implementation, well, what we just did is
the design of our database and so now, when you talk
about the implementation, that's what's the with
the Physical Database is. And so there's three concerns, you wanna make sure that
your Physical Database enforces all of the Keys, not just the Primary Key
that Rails creates for you, you wanted to enforce all the associations between the tables, and you wanna do some association or are some enforcement
of the types of data that go into your database. So the first two are
relatively straightforward. To enforce Keys, you make
a unique index like this, so this basically says,
there can be no two rows in the wrestlers table
where the name is the same, this tells the database to refuse to allow that data in there. So this enforces that Key, you can enforce a Key across
multiple fields like this , so this says that no row
should have the same value for wrestler at show ID and match, and the database, again,
will enforce this for you. Associations can be enforced
using foreign key constraints. So for whatever reason, Rails
doesn't do this by default and there's really no
reason not to do this. So in context here, what
this Foreign Key concept says is that if I insert into
the show titles table of value or the show
doesn't actually exist in the show's table, then the database will
refuse to insert it, because that would be
a dangling reference, we don't know what it means. And it also means that if I deleted a show that show titles I was referring to, it won't let me delete that show, because that will make
our database inconsistent. So you always want this, you wanna rely on the database
to get out things like this. And these are, these things are easy to do 'cause you've understood
them from your model, you just need to kinda type
them in in your Migration Files. Types are a little harder,
and maybe impossible. Think about the allowed
values for wrestler name. Well, there's kind of a lot, there could be an infinite number. But there's certainly fewer wrestler names than there are strings. So you know, something has to happen here. Now, we talked about NULL,
and we'll talk about it again. But we definitely know that NULL is not the name of a wrestler. Maybe a wrestler could be
named to the string NULL, but not NULL. So we can tell the database,
don't let NULL be inserted. That's pretty easy. You should probably do that
on most of your fields. But it gets a little
bit more subtle, right? So these three strings,
these are different strings, but they're all the same name, they all say The Rock essentially. So how do we wanna deal with that? What about this? Certainly no wrestlers name
is like a bunch of spaces. Maybe it'll be a tag team
called a bunch of spaces, but it wouldn't be a bunch
of ASCII space characters. So this is certainly not
the name of a wrestler. So you have to think
through the trade offs here. How likely is it that the bad data is gonna get into the
system in the first place? And if it does, how bad is it? If it's very likely, in very bad well, you need to do a lot to make
sure that doesn't happen. But if it's not likely,
or maybe not a big deal, maybe you don't need to
over complicate your life. So as an example of things you can do, so this tells Postgres to make a case insensitive, unique index, so we inserted The Rock, and then we later inserted
The Rock all capitals, this will prevent that from happening. We could also use a thing
called Check Constraints that's very, very powerful. So this tells Postgres, before
you insert a wrestlers name, trim the white space from either end and if the result is the empty string, then do not allow that to be inserted. So this would handle are a wrestler whose name is like eight spaces in a row. This you can go deep
on this if you need to. Rails validation is, of course,
you can do a similar thing, this is a little easier to understand, it's way more powerful, but of course, Rails has to enforce this. You have to tell Rails to enforce this, you can circumvent this easily so this won't actually prevent it and you know, uniqueness
doesn't exactly work, but again, it's a trade off, how likely is it gonna happen? Now, what about NULL? So NULL it's still not a value, right? That's the way we did by definition. But when we write Ruby or Rails code, Nil is value, Nil a thing that shows up like we kind of have to deal with it. There's no easy way to to avoid it, so let's see kind of how that plays. So if we were to continue
making our Physical Database, match exactly our Logical Model, we make this finishing moves table that has the name of the move, NULL false, and it references a wrestler. Let's find that work. Now remember Ricochet,
who has no finishing move? So he just wouldn't have an
entry in this table at all but what would it look like in our code? We can't do this because
if wrestler is Ricochet then finishing move is NULL, we gonna know pointer exception. So we can use this lonely operator thing, you're gonna litter your code
with all, that kinda sucks. So maybe you wrap it up and some methods to kinda save yourself from that. I mean, there's really no
way to avoid the Nil here, it just there's no way
to real way around it. And I think that matter. That matters to how you're actually creating the Physical Database. So if we don't need to know anything else about a finishing move
other than his name, maybe we'll just put it
on the wrestlers table and not make a separate table, and will allow it to be NULL. And will be nice and
explain what NULL means even though it might seem obvious. That's a trade off that
you might wanna make. This will make the code a lot easier, it's really not making a huge difference to have a database as implemented, and this is the difference
between Logical and Physical. So, general guidance, this kind of summarizes
what we just talked about. Create a unique index for
all the Business keys, you've gone through the
trouble of identifying them to figure out your data model, you should have the
database, enforce them. Always use Foreign Key Constraints. I do not know why Rails
doesn't do this by default, you should totally do this by default, there's really no reason not to and every reason to. Default to NOT-NULL, NULL is not a value, so
you should default to it. Understanding that you might
need to make a trade off or you might allow it, like we just saw, and if you do that, do it knowingly, do it with a good reason and write that reason out in the comments. And lastly, when you're
thinking about Types, right, you have to do that trade off. And if you're using a powerful
database like Postgres, you have a ton of tools available to really enforce at the data level, what's the allowed values can be. So if that's super
important, then do that. If it's not important
that's fine, don't do that. Don't overcomplicate your life. So in conclusion, I hope all this made sense, I really did. This took a lot of reading
to figure out how to explain, so I really hope it's
sunk in even a little bit, and not to make it even worse, but of course, Boyce-Codd Normal
Form that we talked about, it's not perfect, your data model could still
be open to other Anomalies that are very complicated to explain, but nevertheless exists. But Boyce-Codd Normal
Form, what we just did, if you do what we just did always and your data models are always like that, you're in really, really good shape. Like that's excellent. So that's a good thing to aim for. When you're editing the model, when you have to make
changes to the database, change the model and see
what that change implies to the implementation. You saw what happened when
we added those ID fields, it wasn't obvious at first, that we like actually created
a real problem in our database that we were able to get rid of, by sticking around in the
model before we got into it. And again, I've said
the word business rules, the phrase business
rules about 1000 times, but that's what this is about. This is about you understanding
the business rules, you understanding what
problem you're trying to solve and how you're solving it. And only when you have that understanding, start modeling your data. Because remember the
Thought experiment, right? The data is more important
your app in almost every case, so treat it that way. So that is what I got for you, thank you. (members clapping) Should do Q&A? We got a couple minutes
for Q&A if anyone has any? - [Man] (mumbles). - Are those situations where you don't wanna do
Boyce-Codd Normal Form. So making a data model to
be the source of truth, I would say probably not a way you might not wanna
do that as if you have a separate database that you've sort of, what they called the
Denormalized for quick access. So you might copy the data into a form that doesn't require so much joining, but you always have that data somewhere that is kind of fully normalized. So I have a Blog post and
many opinions on this, and I'll try to sum it up. My opinion is that Rails validation are for helping users not screw up, and Database Constraints are
for keeping the data clean. So sometimes you use
one and not the other, sometimes you use both. But I definitely don't
make them mirror each other 'cause that doesn't seem like
it makes sense again to me. But I also know there are some gems, you can get that kind of
try to keep parody there. Some people do actually like Rails to mirror the Database Constraints. So it's a little bit of personal style. But for me, I don't know why
I am making a validation? If I have the Constraints, and I'm not gonna Rail fact I'll Rail but, anyway, that's a very quick summary of my strong opinions on that. What resources to dig a little deeper? So the Wikipedia articles
on this stuff are not bad. I read a book, I'm gonna have
to dig up the name of it now. But it basically explains
this in one level more but I think the Wikipedia, the Wikipedia articles
are actually pretty good. And they walk you through this stuff with a maybe a better example than professional
wrestling or worst example, depending on your thoughts. But yeah, that's a good place to go. And those will have links
to some of the literature and other books on it from there. Okay, I don't see any hands. So, all right. Thank you all. (members clapping) (upbeat music)