Complete guide to Database Normalization in SQL

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey everyone welcome back in this video let us understand normalization in SQL also referred to as database normalization so I'm going to split this video into three different sections in the first section I'm basically going to give you an introduction of what normalization is different levels of normalization and some key points in the second section I'm basically going to explain everything with proper example I have a data set that I have taken from kaggle it's in denormalized format we are going to normalize it step by step live in this video by using the different levels of normalization and finally in the third section I'm basically going to talk about the different anomalies that exist in a denormalized data set such as insertion anomaly deletion anomaly and updation anomaly we'll also see how we can avoid these anomalies by having a normalized data set what is normalization now in the most simplest of terms normalization is the process of Designing a database effectively such that we can avoid data redundancy now in turn this will help us to avoid anomalies such as insertion deletion and updation anomalies now I'm going to explain these anomalies in the later section but before that it's important that we understand what do I mean by data redundancy and designing a database right so let us understand data redundancy basically it means data duplication in other words we have seen different tables and Excel files or spreadsheets where we have duplicate records right we all know how that looks like and that is a form of data redundancy but that's not the only form so imagine that you have a table it has a primary key which means that every record has some unique identifier so there are no duplicate records as such but still you can have certain list of columns where the same values are repeated for multiple records even that can be considered as example of data redundancy for example a table with 10 columns 1000 records and out of those 10 columns five columns would have repeated basically the same values getting repeated for for a lot of Records maybe 100 records or 200 records and so on right even this can be considered as data redundancy if we had properly normalized that database then we could have avoided this kind of data redundancy okay so I hope this is clear now let us understand designing a database what does that exactly mean right so database mainly the main purpose of having a database is to store data right now database will store data in table so there are databases databases will have tables and table is the object which is going to store your data right now it does not mean that you just create one single table to store all your data right generally what we do is based on our requirement or our project need we would be creating a list of tables and we would create each table to store one type of data and then we would try to build release between these tables right now this whole process of coming up with multiple tables to store a specific type of data and then building relationship between these tables can be called as the process of Designing a database right now if I have to summarize I can say that normalization is the process of effectively designing a database such that we can avoid data redundancy okay so now you understand what normalization is let's understand the different levels of normalization now so let's say you have a normalized data set and you you have been asked to normalize it now there is no magic button that you just click and your database automatically gets normalized generally what you need to do is you need to follow certain rules there is a step-by-step rule that you need to follow in order to normalize your database now generally what happens is there are different levels of normalization that you can do on a database okay now each of this level will have certain rules as soon as the database will satisfy this rule you can say that the database is satisfying that particular level of normalization now generally we have the first level of normalization is called as first nominal form then the second level is called as a second normal form then we have third normal form fourth normal form BC and fifth normal form six normal form Etc right now each of this normal form basically is a level of normalization under each of this level there are certain rules that we need to follow as soon as so let's say say all the rules of the first normal for me for database satisfies then we can say that that database is in first normal form as soon as that database will satisfy all the rules of level 2 that is second normal form then we can call that database to be in second normal form okay now one thing that you need to remember here is even though we have so many different levels of normalization the golden standard of normalization is generally third normal form most of the companies today try to normalize their database only until third normal form there can be some specific cases or specific requirement where a company or an organization or a project team might want to normalize your database even further than third normal form but in general in most cases generally the normalization happens until the third normal form and that is why in this video I'll be talking only until third normal form okay so now having said that I hope you understand all the basics now let's look at a proper example of a denormalized data set that I have taken from candle and will try to normalize it using different levels of normalization okay so as you can see this is the denormalized data set that I have taken from kaggle now of course when I took from kaggle it had a lot more records I think thousands of Records but I have just taken a handful of them so that it becomes easier for me to explain and easier for you to understand right now this is a sales order data set and I'm calling it denormalized because basically what a denormalized data set basically means is that all the data all all the different type of data is bundled together into a single file or a single table and that is exactly what we have okay once you look at this data set once you understand this then you will realize why do I call it denormalized okay now let's first try to understand this data set what does it actually have and then we will try to normalize it step by step okay so this is a sales order data set basically has information about all the different sales order that has happened so the First Column is an order number basically the ID for or a number for your order then there is a quantity number of items that was sold as part of that particular order the price of each item the total cost of the order the order date and the status of the order okay then we have information about the product the product which was actually sold as part of this order so the product code is mentioned here product name and the price is mentioned in these two columns then we have the customer information the customer who actually purchased this order now you can see that these are not individual customer these are corporate customers right so this is some companies and there is also information about these companies their phone number their address their City postal code Etc and finally we also have couple of columns about the sales agent who are responsible for this order and at the end we have the deal size basically indicates how big this particular order was right so this is basically our data set it's not a very complex data set I have actually reduced some of the columns just to keep it short and meaningful for this video okay now let's say you went into a company your manager gives you this file tells you that hey uh whatever your name is maybe let's say your name is David okay hey David I want you to normalize this data set and you can create a new database and make sure that when you create you normalize that database as per this particular data set okay so what will you do so as I told you there is no magic button that you just click and your database will completely get normalized you need to follow step by step process you need to follow each level of normalization to normalize your database right and that is exactly what we will do now the very first step in normalization is to make our database satisfy the first normal form now as part of first normal form we can see that there are two rules that we will need to follow the rule number one is that every column or attribute basically they are the same attribute or column basically means the same every column needs to have a single value okay that's the rule number one okay now if I go back to my data set here and let me just copy this data set and move it to this next sheet so that we can make it a normalized step by step so I'll just put it here okay and now if I look at this data set is there any column in this data set which does not satisfy rule number one let's look at that the rule says every column should have one single value now I can see that almost every in fact every column here has one single value except for some of the columns in the last record here I can see that there is a comma by comma separated values so there are two values here there is two values here and then you can see here also there are two different product codes mentioned uh here are also two different product names mentioned right so what this means is basically this was like there was a single customer who purchased two different items but the company basically provided just a single invoice order okay so in the single invoice or in the single order statement they have basically merged the details of two different items ideally this should not happen right so don't read this to be something like when you're going to a supermarket when you purchase 10 different items and you will get just one bill or invoice when you go let's say you go with your brother and you're buying a car you would definitely be given two different invoices right there would be two different cards would be built separately and here we are talking about big products right as you can see here right so ideally this is not correct right we cannot be having two different items uh Build Together into a single order right ideally this should have been split into two different columns and this also does not satisfy the first normal form each column should have one single value and that is exactly what we will need to do first so first of all even before we look at the rule number two let's just try to change this data set such that it will satisfy the rule number one so we will need to split these multiple values in the column into so each value should have its own record okay so what I do is I'll copy this whole thing and I'll paste it here and I'll remove one of the values here so I'll put 1 4 here and and I'll put another 4 here okay and then again the same thing 780 and 850 so 780 and 850 right and then okay this one I need to multiply 4 into 780 because it's a total cost or quantity into the item of each price of each item so I think this one will be three one two zero and this one should probably be three four double zero okay and then the products so let me just keep the first product in the first line and the second product I'll move it to the second line here okay and the same thing with the product name buses will be at the top and the trucks will be my last uh order okay and the same thing with the price 800 and 900 right hopefully I think that's all I have basically split all the double valued attributes or columns in now having single columns okay only single values in each column just that one of the record has increased right so imagine if you had three values here then you had to basically split it into three records okay I hope that makes sense so now I can say that and if I go to the right I will see that there are no columns where there are multiple values in a single column okay so I can say that this whole data set now satisfies the first rule of my first normal form right now let's look at the second rule the second rule states that each row should be unique okay either through a single or multiple columns and it is not mandated to have primary key okay so they are saying it's not mandated to have primary key if there was a primary key then well and good but if it is not it's still okay just make sure that you have unique records how do I see if these are unique records so first of all my very first five records I can tell them that they are unique why because the order number itself is unique so every record will be unique definitely right but when it comes to the last record or the last two records I can see that both these records are sharing the same order number right but then each of this order is for a different product so if I could combine this column so let's say if I could combine this column or let's say Let Me Maybe color it like this and let's say if I combine this column so the combination of these two columns order number and product code is basically unique right because the last two records here if you just carefully see the order number is one zero one zero six but it is two different products so the records are unique even though it's for the same customer these records are unique when I take the combination of order number and product code okay so basically for rule number two this is already satisfying rule number two I basically have nothing to do here so I can say that this data set now is in first normal form okay so my first level of normalization is done now let's move on to the second level hey there are you still watching the video if yes then make sure that you have click on the like button and you have subscribed to the channel to watch more such videos definitely leave your feedback in the comments below I would love to read them thank you so much and let's continue with the video now as part of the second normal form there are three different rules that we will need to follow okay now for now just try to ignore these definitions I will explain this when I come to that but if you see the first rule of a second normal form basically tells that it has to be in first normal form so even before you come to level 2 of normalization the level one has to be satisfied that is first normal form should be satisfied before you can basically try to satisfy the second normal form that's rule number one the second rule is that all non-key attributes must be fully dependent on candidate key right now even before we proceed further or read further let us first understand what do we mean by candidate key okay now I have these definitions here just so that I could save time in if I had to type them here it would take more time so I have all all of them ready here okay so basically candidate key means that these are the set of columns which will uniquely identify record for example let's say I am going to copy this data set and put it into our next tab okay just so that it becomes easier for us to understand okay so this is the data set which is now satisfying the first normal form right now what exactly do we mean by candidate key right so candidate key is basically the set of columns that will uniquely identify a record in our case in this particular data set I can easily tell as we discussed before the combination of order number and product code is always unique as per this data set right so in our case the candidate key becomes the combination of order number and product code so basically in our case the candidate candidate key is uh order number Plus product code okay so I'm play I'm saying plus probably in SQL we can say concatenate okay now this is basically what my candidate key for this particular data set okay so okay I add one more thing to remember is a table can have multiple candidate Keys okay now this candidate key is not a primary key I have not created a primary key for this data set so primary key is different from a candidate key okay can and you can have multiple candidate key because there can be multiple columns or multiple set of columns in your data set which might form a unique record right so any combination of column which forms unique record we can call them as candidate key okay so remember that and the other thing that you need to remember is what do you mean by non-key attributes so basically any other column which is not part of the candidate key or the primary key we treat them as non-key column okay I hope this two uh concepts are clear now let us try to understand the rule number two so basically the rule number two of the second normal form states that all non-key attributes must be fully dependent on the candidate okay what exactly does that mean that is if a non-key column is partially dependent on a candidate key that is the subset of columns forming the candidate key then split them into Separate Tables let me try to explain that in simple words okay so all non-key columns should be fully dependent on candidate key so for example these columns here are I can say that these are my non-key columns right now the quantity item total cost order number Status these are all the columns which are directly dependent on the order number right because each of these values are basically with respect to an order number with respect to this order this is the quantity this is the item price this is the cost etcetera right so all of these columns are directly dependent or or dependent on only on the order number they have nothing to do with the product code right my candidate key is a combination of these two columns but I have some of these non-key columns which are dependent only on one of the columns which is forming the candidate key so that that means these columns are having partial dependency on one of the candidate key columns right so this means that these columns here do not satisfy the rule of second normal form so basically we will need to split the columns okay now these are not the only non-key columns right so these columns here definitely does not satisfy the rule 2 of second normal form so we will need to split them similarly if I look at the product name and product price these are the columns which are directly linked to the product code so the name motorcycle is can be represented based on this product code it has nothing to do with the order right so this product name and the price has nothing to do with the order they are only dependent on product code that is just one of The Columns of the candidate key so again these two non-key columns are partially dependent on one of the candidate key columns so which will basically not satisfy the second normal form rule so again we will need to split these columns right similarly when I look at the customer so I have the customer name here here and then I have some other information about the customer their address their phone number Etc now the phone number the address and other details about the customer are specific to each customer they are not specific to an order or specific to a product code right so these columns here have basically not dependent on any of the candidate keys so in such cases we again will need to split these columns into a separate table okay so this is the first or basically the second rule that we will need to satisfy so basically we will need to have Separate Tables for all of these orders we need to have Separate Tables for these products and we need to have Separate Tables for this customer information right so let us try to do that so what I'm going to do is I'm going to create an order table so maybe I'll just keep this as it is and I'll just move this okay so I'll move all of my products into a next table here okay and then I'll move all of my customers into the next table so I'll just cut them from here and let's say I'll probably move it here okay and then I'll just bring the last three columns closer to my orders table here so it becomes easier for you to see okay now okay so I think probably this makes sense and let me just merge them so it does not ruin my formatting okay now okay so I have basically split my data set into three different tables as you can see okay and I can call this my order stable this is my products table and this is my customers table okay now this is the first rule okay but this is not the okay this is the second rule in second normal form but this is not the only thing so once we have basically split them into multiple tables then we should make sure that every table should have a primary key and relationship between the tables should be formed using foreign key okay so the first thing is we should make sure that every table that we have now split or created needs to have a primary key so does my orders table okay so this is my orders table maybe let me just mention that in case if it confuses anyone so this is my orders uh table okay and this is my products table so let me just move this here products table I'm just giving like a table name okay and this is my customers table okay so I think that's fine for now okay so maybe just to highlight them I'll just highlight them in maybe green okay so it just becomes easier to remember okay so I have my three tables which I have split now the next thing is I need to create primary key here right so if I see I do not have any primary key so I have this order number but the last two records are basically having the same order number so I cannot consider this to be a primary key so what I'll do I'll create a new primary key for this table so I'll just move this to the right I'll create a new column and I'm going to call it like order ID okay and here I'll just tell one two three four five six seven okay so just some standard uh values okay just to just for the sake of example and I'll just remove this highlighting of the column because I don't need this as of now okay so I have created one of the primary keys for this table that is order number that is the orders table and I think this looks fine okay so if I just format it again I think this so my order stable looks fine there is a primary key that is fine the next is product stable now for products table I don't really need to create a primary key key but you can see that I have a product code probably I can use this to be my primary key but there are certain duplicate records so these buses and trucks are repeated twice right so first and foremost since it's a separate table and I don't need to store these products multiple times in this separate a table I will just remove the duplicate records okay so I'll just go to the Excel functionality of remove duplicates and I'll just click on that and you can see that two duplicate values found and removed so now I have unique products and I can treat my product code itself to be my primary key so here I'm just going to call this is my primary key column okay and let this let me just highlight this separately so let's say uh or maybe something lighter okay so this is my primary key column okay and this will be my primary key column in the products table that is product code and then when it comes to my customers table you can see that again I do not have any primary key here right the customer name cannot be treated like primary key and one more thing is I can see that this customer here is repeated twice I don't need to store the same customer information multiple times in the customers table I can remove one of them so again I'll just select entire thing and I will try to basically remove the duplicate records okay so I'll just say remove duplicates and you can see one of the customers is removed okay now the next thing is to create a primary key I do not want to consider the customer name as a primary key so what I'll do is I'll create a new column here which I'm going to name like customer ID and this is going to be my primary key so I'm just going to name give values like C1 until I think C6 okay and I'll use the same formatting for this and I'll put it here okay so now I think I can say that all my three tables are having primary keys so that is fine so one part of my rule number three is satisfied the last part is that I need to build relationship between these two these three tables okay now here is a very important point that I I want you guys to remember and that is you can normalize your database in multiple different ways there is no hard set rule that this is exactly how you should normalize your database there is nothing like that okay it depends on you your project your requirement your company your teammates and so many different factors how you would like to normalize it for example in this particular case I could satisfy the rule 3 of this second normal form in two different ways one what I could do is in my orders table itself I could add a last column here telling that this is a product code here and a customer ID here and that would still satisfy my second normal form or what I could do is I could create a new table something like a relationships table so what I'll do is I'll create a relationships table and I'm just going to call it like relationships where I'm going to have my order number my product code and my customer ID okay so I'll have my order number then I'll copy my product code here okay and then I'll copy my customer ID okay now I know the last two orders belong to the same customer so I'll paste it here and I know the last two orders also belonged to this products that is I think it was buses and trucks so what I'll do I'll just copy this okay I I am aware of the data so I am just doing it quickly okay so basically what I have done is instead of adding these two columns at the end of the orders table I kept the order stable as it is I kept the product table as it is I kept the customer stable as it is and I just added a new table where I just have a relationship between my products customers and orders table okay so this is completely up to you your project how you would like to normalize it both are right whether you add these columns to the end of this table or you create a new table both are right okay so I'm just following this I hope this is clear now I can see that my second normal form this whole data set that I have created is now satisfying my second normal form the only thing that you need to remember is when I had first normal form I just had one table but when I came to Second normal form I basically have four tables there is a separate table for orders separate table for products separate table for customers and then there is a relationship table which basically has the linkages or relationship between the other three tables so I hope this is clear now let's try to normalize our database further such that it will satisfy the third normal form okay now if I go into my third normal form but before that let me copy all of this okay all of these four tables that I've already created and I'll just move it into my third normal form here okay and I'll remove this this thing I don't need it here okay so basically I just copied the normalized data set as per second novel form into this particular tab so that we can make our changes on this data set okay so as per the third normal form if your database needs to satisfy third normal form first of all it has to be in the second normal form that's rule number one and the rule number two is it should avoid transitive dependency so if your database or data set is having any transitive dependencies then you should split such columns which are causing the transitive dependency now first of all what do you mean by transitive dependency right so imagine that you have a table by the name t and it has three columns a b and c and let's say your column A is functionally dependent on B and B is functionally dependent on C then we can say that a is functionally dependent on C okay something like that now let me explain that in simple words so for example let me go back to my denormalized data set that I had initial okay and here let's say that I am going to consider my customer name as a okay so this is a okay and I'm just going to mark it like a and let's say my order number is B okay and my sales contact name this particular column is C okay just for example okay now if I look at this data set my customer name that is my column A for this particular example if I am aware of the customer name then I can know what is the order that customer purchased right so I can say that a is functionally dependent on B right so basically if I know a I can know what order did that customer purchase right similarly if I know B so if I know the order number I can identify who was the sales person who was responsible for that order right so since I know that so in other words I can say that if I know my customer if I know a customer name I can identify who was a salesperson who was responsible for that order okay so this is kind of like a transitive dependency that is present in this database okay so as as part of the second normal form we have already split the customer information into a separate table right but if you look at the data set in the orders table we can see that the sales person's information is still stored in the orders table which we basically need to remove and put it into a separate table because this is causing the transitive dependency right so what I am going to do is I'm going to remove these two columns that is the first name and last name of the sales person I am going to create a separate table here okay and this sales persons are basically employees so I'm just going to call I am going to create a primary key for this for this particular data which is going to be my employee ID now before I create my primary key I can see that there are some duplicate data right because two a sales person might have might have been responsible for multiple orders right so I'm just going to remove the duplicate data because since it's an employee table I don't need to store the same employee information multiple times so I'll remove the duplicate data and now I'll add the primary key and I'm just going to say A1 until the last and I'll just copy this formatting and I'll just put it here okay so this becomes my employee table that is fine and I'll just move my deal size column to the left so that it is clearly visible in my orders table okay so now I think my orders table is perfectly fine it has only orders information my products table is fine customer table is fine the relation table is fine and I have created employee table but there is no relationship between the employee table and the relation and the relationship table now how do I create this link again what I'm going to do is I'm just going to copy this employee ID and put it into this relationship table and I know the last two orders were done by the same employee so I'll just make that update here okay so now I can say that okay so just to make it clear I'll just move this to the down okay and I'll move this to the top here and this is my employee table right so I'm just going to put it here and this is my employees table okay so now you can see that I have my order stable separated I have my products table I have my customers table I have my employees table and finally I have a relationship table basically which provides a link between all of these tables so this we can say that now my entire data set is in third normal form which as I said is the golden standard of normalization this much or level of normalization is actually pretty good enough for almost any of the database that exists today so I hope this is clear now I think one thing that I forgot to mention about the foreign key is that uh maybe I did not explicitly mention that how did you create the foreign Keys uh in the second normal form because we created uh the primary Keys based on the columns that I mentioned here right this was my PK column uh for my relationship table orders ID is the PK column for my customers uh table uh the customer ID is my PK uh column but where exactly is the foreign key right and that is what I did not I think explain so let me talk about that now so basically when I say foreign key basically creates a master and child relationship in in this particular case my order stable is my master table my products table is also master table customer stable is also the master table but this relationships table becomes my child table right so what I do I create a foreign key here so this will be my APK and my FK so I create a foreign key for this particular column which is going to reference against the order ID from the orders table same way I'll create another foreign key here okay which for product code column which is going to refer to the product code from the products table and I'll create another foreign key in my customer ID column which is going to refer to the customer ID from the customers table so my foreign keys are created in my relationships table okay I hope this is clear now okay now this is the end of the section two I hope you have completely understood the three levels of normalization and how we have normalized our denormalized data set into this level of normalization okay now let's move on to the last section where we'll be talking about the different anomalies first of all let's start with insertion anomaly now what exactly we mean by insertion anomaly is basically anomaly means problem we can say insertion problems that can exist in a denormalized data set okay so our initial denormalized data set looked something like this right now let's say one of the customers here is this one that is kind of toys okay now over a span of couple of months or some time he or the the company would be purchasing multiple different orders okay and every time they purchase an order uh the order number would be different the product might be different the price and everything else date everything would be different but it's the same customer right so for every order even if you purchases 10 orders or 100 orders or whatever the number of orders their information the customer information is basically the same so the same information would have to be inserted into this table why because there is just one table right this is a table which has information about the customer about the order about the product everything right and the data would look something like this so the same customer his information is basically getting repeated multiple times why because there is no separate table or a place where the customer information would have to be stored right so everything has to be in this denormalized table so some of this information would basically get reported deleted which basically results in data redundancy and this is one of the insertion problems or insertion anomalies that can exist in a denormalized data set okay now let's say if we have to do this same activity in a normalized data set for example the third normal form that we discussed in the earlier section we came up with these set of tables right now let's say this is the database that we have the normalized database as per the third normal form and this is where that particular customer is going to purchase multiple different orders so whenever that customer that is let's say the customer is land of choice he or they want to purchase multiple orders all their order information just gets inserted into this orders table right whether it's a 10 orders 100 orders or whatever number of orders everything only has to get inserted into this table there will be no changes into a product because these are all the existing products they are purchasing and there is no info no changes to the customer information because the customer table already has their information right so the data redundancy that was caused in this denormalized data set is basically completely eliminated because of the normalized level of database that we have here okay I hope you understand this particular concept of insertion anomaly okay so this is just one scenario now a second scenario could be that let's say there is a change in the product so let's say a new product has been introduced by the company okay and maybe it's a cruise ship okay now they want to insert this cruise ship information into the database and let's say you only have denormalized data set okay now in a Dean organized data set when you want to insert the information about a specific list of values that is let's say in this case product you have the product information but there is no orders for that no one has yet purchased this product so no one no customer has purchased this product it's just a new product introduced by your company right now in order to add this into your database into your denormalized database you have information about the product but there is no information about the order there is no information about the customer so all of this will would have to be null right which is basically not right right so in order to just insert a product there is no other place you could just enter information about the product you have to use the same denormalized table where a lot of the other columns has no value so it would basically be blank which is again is causing missing data here right so this is also a form of insertion anomaly that can exist now let's say if we have to do the same activity in a normalized database okay that is this third normal form data set that we have created here if I had to introduce a new product I could simply add a new record into this products table right just add a new record into this products table and everything is done no changes to the orders no changes to the customers no changes to the employees Etc right so this is how uh the second scenario in the insertion anomaly that can exit can be easily eliminated if you have a properly normalized database okay so I hope the concept of insertion anomaly is clear now let's move on to deletion anomaly okay now let's say you have this denormalized database and let's say you have one of the requirement where due to whatever reason you have to remove one of the orders from your system maybe it was wrongly added or whatever okay so you have to remove this particular order that is one zero one zero three ordered number you have to remove it okay your intention is only to remove the order from your database but if you have to remove this order in the denormalized data set you have to remove this entire record right when you remove this entire record you are not only losing the information about the order but you will also lose information about this product you can see that this product classic cars is only present in one of the records one of the record in this table so if you delete this order record you will lose this product information completely or permanently from your system and the same thing with the customer this customer is only present in this record if you delete this record you are not only losing the order information you are also losing the product information you are also losing the customer information which is actually not right you only if you want to remove something you should know what you want to remove ideally if you want to remove an order that is fine but you would definitely not want to lose information about your customer or about some product that you have right so this is a problem of deletion or deletion anomaly that can exist in a denormalized data set now let's see how we could easily do this in a normalized database okay now again I'm taking this third normal form data set that we have and let's say we have to remove one of the orders okay and the order that we have to remove was this one so here I could easily come to the orders table and I could delete this order and once I delete this I will have no changes so basically it does not impact my products table it does not impact my customers table so basically I'm not losing any additional information I'm only losing the information that I actually want to lose right so this is basically how your normalized data set is going to avoid deletion anomalies okay I hope this is clear now let's talk about updation anomaly okay now again it's a simple example let's say you have this whole data set okay so there are this many different orders and many of these customers have purchased motorcycles okay now let's say there was a change in the price of your motorcycle so someone said that the current price is 200 you need to change it to 250 okay now if you have to make this change ideally you would want this update to happen only in one place okay now here you could see probably I have some 20 records in real database this might be 20 million records okay and this motorcycle might have been purchased with let's say 100 000 people okay if you have to update the price you will have to update all the hundred thousand records where this motorcycle product was purchased right and that is not efficient unnecessarily you have to update so many different records whereas ideally there should have only been one particular place where you should have updated the price of your vehicle or your product right so this is a problem of updation so updation anomaly that exists in a denormalized data set but let's say if we have to do similar update in a normalized data set here you can see that I just have one table for products if I have to change the price I just go into this one record and I update the price here just one record will get impacted right so this is so much better than having to update the multiple records every order where the product was present right so this is basically how update anomaly can be avoided by using a normalized database okay so I hope you learned something useful from this video if you did definitely leave your feedback in the comments below thank you so much for watching and see you soon in the next one bye
Info
Channel: techTFQ
Views: 38,623
Rating: undefined out of 5
Keywords: database normalization, database normalization 1nf 2nf 3nf, sql normalization, database normalization in sql, normalize database, normal forms, 1nf, 2nf, 3nf, sql, learn sql, sql turorial, normalization in sql, normalization tutorial, normalization in database, sql techtfq, techtfq, techtfq sql, insertion anomalies, deletion anomalies, updation anomalies, anomalies in dbms, dbms normalization, data redundancy, candidate key, partial dependancy, transitive dependancy
Id: rBPQ5fg_kiY
Channel Id: undefined
Length: 40min 50sec (2450 seconds)
Published: Tue Mar 07 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.