How to clear SQL Server DBA interview?

Video Statistics and Information

Captions
Hello everyone, my name is Yogesh. In today's video I will explain how you can clear a SQL Server DBA interview. This matters most for experienced DBAs, because companies have a lot of expectations from them, so this video will show you how to present the most important part of your experience in a better way. I won't be covering the examples in depth; I will show a few examples so that you become familiar with the topics. I cover them in detail in the training I give, but here I am expecting you to know these things already; if you don't, this will still be a good introduction that you can explore further.

To start: companies don't expect much from a fresher, but they expect a lot from an experienced DBA. The common expectations are experience in all the routine DBA operations — backups, setting up users, installation, upgrades and so on — then experience in Always On, disaster recovery, setting up readable secondaries, mirroring and the like, and then good knowledge of SSIS. SSIS has become almost mandatory; it's not strictly part of the DBA role, but nowadays all DBAs are expected to know it, though you can still manage without it in most companies. And one specific requirement is performance optimization: search today and 90% of job requirements say you should be good at performance optimization. Here is the crux: every company's data is increasing and performance is degrading, so if you can show them you are the best person for performance optimization, it will add a lot of weight to your resume and your experience, and you will get selected. This is the crux of your whole interview, and it's all in how you present it. So in today's video I will give you the script of what you need to say, the topics to cover, and a few examples so you understand what they actually mean. Once you know the topics, you still need to prepare and learn them properly — it's not enough to just read the script and land the job, because even if you get the job, the work will be very difficult if you don't actually know these things.

So, starting with performance optimization. I will begin with a basic example so you can understand it; I have a few pre-prepared examples, so let me show you a simple one. I am executing a simple script — what the script does doesn't matter, only the performance optimization does. You need to know what an execution plan is, so let me turn on the actual execution plan to give you a small insight, and let me run the same query in parallel in a different tab with SET STATISTICS IO, TIME ON. I hope you know STATISTICS IO and TIME: they show you the input/output and the timing of a query. Believe me, this is not a training session; I am just getting you introduced to the concepts. So: this query executed in less than a second and returned 384 records — which 384 records doesn't matter — and the execution plan looks very simple. Now let's wait for the slower copy of the query in the other tab to complete.
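As a minimal sketch of that measurement setup (the demo script itself isn't shown in the captions, so dbo.Student and its columns are hypothetical stand-ins):

    -- Show I/O and timing details on the Messages tab for every statement that follows.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Any query under test; dbo.Student is a hypothetical demo table.
    SELECT FirstName, LastName, DOB
    FROM dbo.Student;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;

In SSMS you would also toggle "Include Actual Execution Plan" (Ctrl+M) before running, so the plan appears in its own tab next to Results and Messages.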
The first topic I am trying to show you is called spooling. Whenever you are doing performance optimization, first look for the expensive queries that have spooling in them, because with spooling you usually have great scope for improvement. Now my first query has finished executing, so let's go to the Messages tab and look at the logical reads: for the Student table the scan count is 3,456 and the logical reads run to seven digits — a massive number, when the result is only 384 records. And if you look at the execution plan — let me zoom in — there is an operator called Lazy Spool. The Lazy Spool is one of the main expensive operators: if an execution plan has one, you can optimize it. So for the same 384 records, the query that ran here in less than a second took around 54 seconds there, and after the fix it runs in under a second again — it all depends on how you fix it. I won't explain here how the Lazy Spool works internally; you are a DBA and I expect you to know that, because this topic is mainly not about teaching you performance optimization — it's about what you will say about optimization.

Now suppose you are giving your introduction: "My name is Yogesh, I am from this place, I graduated in 2012, I started working with companies X and Y" and so on, and then you say: "Across most of my work experience, my main focus has been on performance optimization — I am very good at it." The moment you say that, if the person taking your interview is a project manager and he knows there is a performance problem in his project, for sure he will ask how you solve it. Now the real part comes in. You can start by saying: "First we try to find the expensive queries, as 99% of performance issues are because of badly written queries." This is a fact — believe me. In my whole experience, a lot of people from YouTube have contacted me saying "my server is not working well"; when I ask the size of the database, they say 2 GB; the RAM is 16 GB, with 8 or 16 cores — and they still have performance problems on a 2 GB database, from badly written queries. So 99% of the time the performance issue is badly written queries; maybe 1% of the time it's a wrong configuration or a hardware issue. I have had one case with a hardware issue, where the hard disk was not working properly, and I have never seen a configuration so wrong that it by itself caused a performance issue. It can also be patches — SQL Server 2016, for example, had some memory issues until a later update fixed them.

So you start with: "First we try to find the expensive queries, as 99% of performance issues are caused by the queries." Once we know the bad queries, you explain how you fix them — and remember, this is all about how you will explain it in your interview. You can also point out that no single query ruins your server: suppose a query takes three hours to run but you execute it once a month — it doesn't really matter; you can run it at night when the server is not heavily loaded.
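The video doesn't show how the expensive queries are located in the first place; one common approach — an assumption on my part, not the video's script — is to rank the plan cache by reads or CPU using the query-stats DMVs:

    -- Top cached statements by logical reads; swap the ORDER BY to
    -- total_worker_time to rank by CPU instead.
    SELECT TOP (10)
           qs.execution_count,
           qs.total_logical_reads,
           qs.total_worker_time,
           SUBSTRING(st.text,
                     qs.statement_start_offset / 2 + 1,
                     (CASE qs.statement_end_offset
                           WHEN -1 THEN DATALENGTH(st.text)
                           ELSE qs.statement_end_offset
                      END - qs.statement_start_offset) / 2 + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;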
The real trouble is when your server is constantly at 99 or 100% usage — then there is a problem, and usually not one query but several queries together are responsible. So: 99% of performance problems are bad queries — now, how do you fix them? First I look for a Lazy Spool; if there is one, I try to fix it. What is a Lazy Spool? It is among the most expensive operators of all, and it happens because of duplicate aggregation. Let me explain with the simple example I ran. I have a Student table; I take the year of each student's date of birth, and I get the maximum date of birth in that year from the Student table. Suppose ten students were born in 1991: for each of those ten students the query asks for the maximum DOB of 1991, and since all ten were born in the same year, the output is the same every time — there is only one maximum date of birth per year. So even with millions of students, there is still only one maximum DOB per year, yet the aggregation executes once per student row for the same year. That is duplicate aggregation, and duplicate aggregation is what forces SQL Server into spooling — most of the time a Lazy Spool (there are other spools too, such as the Eager Spool). The spool itself shows as one percent of the cost, but look behind it: 56% plus 42%, which together with the spool's 1% makes about 99% — so this part of the plan is 99% of your performance problem, and if you fix this part you will be good. Why does the spool come? Duplicate aggregation. And how did I remove it in this example? I used a CTE: instead of the correlated aggregate, I first got each year with the maximum date of birth in that year, and then joined that back to the Student table to get the students born on that maximum DOB — the youngest students of each year, you could say. So, coming back to the script: the Lazy Spool is the most expensive operator of all; it happens because of duplicate aggregation; and you can fix it with temporary tables, CTEs, views — whatever works for you.

So in your answer, when asked how you fix the queries: the first operator to hunt for is the Lazy Spool; if spooling is happening in your query, you first try to remove it using a temporary table or a CTE. A Lazy Spool is very bad — notice the query took 54 seconds, and the rewrite executes in less than a second, roughly 54 times better than the original. Also compare the logical reads: for the rewritten query the scan count is 2 and the logical reads are 172, whereas before the logical reads were a seven-digit number and the scan count was four digits. Now suppose I execute the expensive version a hundred times on the server, with that many logical reads each time — my server will be burning a lot of CPU. If I instead run the cheap version a hundred times, it reads far less memory, so the performance will be far better. It works the other way around too: if you reduce the logical reads in your query, there will almost surely be a performance improvement.
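The demo script isn't reproduced in the captions; a minimal sketch of the pattern and the rewrite, assuming a hypothetical dbo.Student(StudentId, DOB) table, would look like this:

    -- Spool-prone shape: the aggregate is recomputed for every student row,
    -- even though it can only take one value per birth year.
    SELECT s.StudentId, s.DOB
    FROM dbo.Student AS s
    WHERE s.DOB = (SELECT MAX(s2.DOB)
                   FROM dbo.Student AS s2
                   WHERE YEAR(s2.DOB) = YEAR(s.DOB));

    -- Rewrite: aggregate once per year in a CTE, then join back, so the
    -- duplicate aggregation (and with it the spool) disappears.
    WITH MaxDobPerYear AS
    (
        SELECT YEAR(DOB) AS BirthYear, MAX(DOB) AS MaxDob
        FROM dbo.Student
        GROUP BY YEAR(DOB)
    )
    SELECT s.StudentId, s.DOB
    FROM dbo.Student AS s
    JOIN MaxDobPerYear AS m
      ON m.BirthYear = YEAR(s.DOB)
     AND m.MaxDob    = s.DOB;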
What I am trying to say is that the Lazy Spool has that big an impact: it brings a seven-digit read count down to a three-digit one. So it is the first expensive operator you try to find across all the queries: suppose you have a hundred expensive queries and ten of them have a Lazy Spool — fix those first, because it's comparatively easy. You won't be able to remove it in every case; with a recursive CTE in particular, the spool appears by default and you can't get rid of it.

Now the second operator: after the Lazy Spool, you look for the Hash Match. The Hash Match is a cunning thing — it's not that simple, and there is no single formula for fixing it. A Hash Match usually appears because of unsorted data. Suppose you are joining table A with table B on a foreign key, and there is no index on that foreign key: there is no sorted data on that side, so to perform the join SQL Server first calculates a hash of the data and then matches on the hash, and that becomes an extra, expensive operation. How do you fix it? It means either a missing index, or indexes that exist but are not properly utilized — joining through a function can also be the reason, and I will show you what that means. Look at this example: there is a missing-index suggestion, and notice the scan count is four digits, around 1,272. Let me create an index — say IDX_Student — and run the same query: the scan count drops to three digits, around 670, almost half. (The data size is small here, so the absolute amount of scanning doesn't matter much, but the execution plan changed.) Now let me drop the index again and show you the other way this breaks. The Hash Match appears here in the first place because I am joining to a CTE — a CTE has no sorted data by default, and the Student table has no index either, so I am joining two unsorted inputs, and SQL Server has to hash them before it can join. Once I created the index, one side was sorted, so the optimizer could choose a Nested Loops or Merge Join instead, which is far better than a Hash Match. So cause number one is a missing index. Cause number two is not properly utilizing an existing index: I wrap the join column in DATEADD(D, 0, ...) — adding zero days to the date. Nobody would write exactly this in practice; it just illustrates using an expression on the join column. Even though the index is there, the query won't use it, because the function on the column makes the outcome non-deterministic as far as SQL Server is concerned, and it goes back to a Hash Match. So always confirm whether the index exists — and if it already exists, make sure you optimize your query so it can actually use it.
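A sketch of the two causes just described; the index and table names are hypothetical:

    -- Cause 1: missing index. With no index on the join/filter column, the
    -- join over unsorted data tends to come out as a Hash Match; with sorted
    -- input available, the optimizer can use Nested Loops or a Merge Join.
    CREATE NONCLUSTERED INDEX IDX_Student_DOB ON dbo.Student (DOB);

    -- Cause 2: index present but not utilized. An expression on the column
    -- (even a no-op like adding zero days) hides the index:
    SELECT StudentId FROM dbo.Student
    WHERE DATEADD(DAY, 0, DOB) = '1991-12-31';   -- index ignored

    SELECT StudentId FROM dbo.Student
    WHERE DOB = '1991-12-31';                    -- index seek possible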
So now your answer to "how do you do performance optimization?" goes like this — and I will keep repeating it so you learn it: first I confirm which are the expensive queries; then I check which queries have spooling happening; if there is spooling, I try to remove it using temporary tables, views or CTEs; once that's done, I look for Hash Matches; if there are Hash Matches, I find out why — it can be a missing index, or an expression used on a column that stops the index from being utilized properly — and I fix that.

Now coming to the Key Lookup. The Key Lookup is very simple — I think you almost certainly know it, so I don't need to show an example. Suppose that from the index I just created I remove FirstName and LastName and keep only DOB: then for FirstName and LastName the engine has to go back to the clustered index to get the data. A Key Lookup happens when columns are missing from the index: three columns are needed here, and if the index contains only two of them, a Key Lookup fetches the third. You can solve it by adding more key columns or more included columns to the index, but often that becomes impractical — SQL Server may suggest adding all the columns to the index, which is practically infeasible and not good anyway, because it amounts to duplicating the table. The Key Lookup fix is for cases where you have a big table with an index and only one column is missing: then there is no harm in adding it if it improves your performance. So the first preference is the Lazy Spool, the second preference the Hash Match, and the third the Key Lookup.

Along with this you need to know what we mean by an index key. When we create an index there are two parts — the key columns and the included columns — and I have seen a lot of DBAs who don't know what the two mean. The difference is: data in the key part has to be kept sorted; data in the INCLUDE part does not — it is stored as-is. If you break a query down, it has three parts: the SELECT list, then after FROM the table, and then the WHERE clause. The WHERE-clause columns should always be key columns, and the included columns should be the ones you are merely selecting. For example, if no query filters on FirstName or LastName and they all filter on DOB, then DOB should be the key and FirstName and LastName should be the included columns. I just wanted to explain that, which is why I digressed.
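A sketch of that key-versus-INCLUDE rule on the same hypothetical table:

    -- Key columns are kept sorted and drive seeks; INCLUDE columns are
    -- stored unsorted at the leaf level purely to cover the query.
    CREATE NONCLUSTERED INDEX IDX_Student_DOB_Covering
        ON dbo.Student (DOB)               -- WHERE-clause column -> key
        INCLUDE (FirstName, LastName);     -- SELECT-only columns -> include

    -- Fully covered: no Key Lookup back to the clustered index.
    SELECT FirstName, LastName
    FROM dbo.Student
    WHERE DOB = '1991-12-31';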
Now coming to the next topic, called partition elimination. After the Lazy Spool, if your database tables are partitioned, believe me this will do magic on your tables — and a lot of times I have seen companies that have partitioned their data and still partition elimination does not happen. I have here just the few steps for creating the partitions, but I won't go through them, because you can learn partitioning on YouTube — search for SQL Server partitioning and you will get a million videos — I will only explain what elimination means. I have a table called OrderDetails, partitioned on the order date, and I am fetching the data between two dates. If I run the query and check the statistics, I get a scan count of 25 and 46 logical reads — the table is small; it's only for demonstration. Now in the execution plan, if I click the OrderDetails operator and open its properties, there is a property called Actual Partition Count: this small table has 25 partitions, and the query had to scan through all of them — even though the table is partitioned on the order date and that is exactly what I am filtering on. Now I run the same query on the same OrderDetails with one change: I cast the parameters to the column's data type. The execution plan changes, and the actual number of partitions used is 2 — partitions 3 to 4. So instead of scanning 25 partitions it touched only 2 to get the data, and with a seek, not even a scan. And the logical reads come down to 4 — reduced drastically.

Let me give a case from my own experience: we reduced a query from ninety minutes to ten seconds, and it was all down to two things — removing Lazy Spools, and partition elimination. Partition elimination was the biggest part: with the Lazy Spool fixes I got it down to ten or twenty minutes, but the jump from minutes to seconds came from partition elimination, which had not been happening. And why not? The column's data type in the table is datetime, but the parameter I was passing is datetime2. It seems very silly, but it matters a lot. So for partition elimination, the first check is whether the data type of the parameter matches the data type of the partition column.

The second check: here I do the same DATEADD(D, 0, ...) trick and run the query again to see what happens to the partitions. (I fumbled the syntax at first — I had put the function on the parameter instead of the column, and parameters are computed values, so that makes no difference.) With the expression on the partition column itself, look: instead of 25 partitions it shows 23, because the first condition still eliminated the first two partitions, but the second condition — the one with the expression on the column — became non-deterministic to SQL Server, so it went through all the remaining partitions. So two things matter for partition elimination: first, whether the data types match; and second, whether you are doing any arithmetic or logical operation on the partition-key column itself — if you are, you have to avoid it to make partition elimination happen.
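A sketch of both checks, assuming a hypothetical dbo.OrderDetails table partitioned on an OrderDate column of type datetime:

    DECLARE @From datetime2 = '2019-01-01',
            @To   datetime2 = '2019-02-28';

    -- Check 1: a datetime2 parameter against a datetime partition column can
    -- defeat elimination; watch Actual Partition Count in the plan operator.
    SELECT * FROM dbo.OrderDetails
    WHERE OrderDate BETWEEN @From AND @To;

    -- Matching the parameter type to the column type restores elimination:
    SELECT * FROM dbo.OrderDetails
    WHERE OrderDate BETWEEN CAST(@From AS datetime) AND CAST(@To AS datetime);

    -- Check 2: keep expressions off the partition column itself, e.g. avoid
    --   WHERE DATEADD(DAY, 0, OrderDate) <= @To
    -- which forces SQL Server to touch partitions it could have skipped.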
So let us review what we have learned. What you have to say is: in performance optimization, first we figure out the expensive queries. Once we have them, we check whether there is a Lazy Spool; if there is, it is because of duplicate aggregation, and we fix it with temporary tables, views or CTEs. Then we check for Hash Matches: they happen because of either a missing index, or an index that is not getting used properly because of some expression on the column, and we fix that at the query level or the index level. Then the Key Lookup: a Key Lookup means some column is missing from the index; it is not always possible to add all the columns to an index, but if it is just one column, we can add it and avoid the Key Lookup operator that fetches the extra column which is not part of the index. And last but not least, partition elimination: if there is partitioning on the table and the queries are unable to eliminate partitions, we find out why — it can be a data-type mismatch between the partition column and the parameter passed, or an expression used on the partition column, the key column. I hope I explained that well.

Now a few smaller topics. The first is bad views, and bad views are a very simple thing — let me explain with an example. I created a view called OrderProductView, and here is my query against it: a very simple query returning a few columns, with a very simple execution plan — effectively two levels, since two of the operators sit at the same level. Now I have another product view with the exact same definition but one column fewer: the Quantity column is missing. If I look at its execution plan, it is exactly the same — even though that one column is missing from the output, the plans are identical. Here is what people do a lot of the time: once such a view exists and they later need the Quantity column, instead of altering the view or adding a new one, they join the view back to the OrderDetails table just to get the quantity. Watch what that does to the execution plan: it adds one more whole layer, because the view doesn't contain the column, so to fetch it the base table has to be revisited, which adds more logical reads — the logical reads of this query will be higher than the original. So whenever an extra column is required from a view, instead of joining the same base table back onto the view, either alter the view or add a new view and use that. Bad views sometimes play a big role in performance problems — I have seen them be a serious issue — and you always resolve them by altering the view or adding a new one.
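A sketch of the anti-pattern and the fix; the view, table and column names are hypothetical:

    -- Anti-pattern: the view lacks Quantity, so the base table is joined
    -- back in just for that one column, adding a whole extra plan branch.
    SELECT v.OrderId, v.ProductName, od.Quantity
    FROM dbo.OrderProductView AS v
    JOIN dbo.OrderDetails     AS od
      ON od.OrderId = v.OrderId AND od.ProductId = v.ProductId;
    GO

    -- Fix: expose the column from the view itself instead.
    ALTER VIEW dbo.OrderProductView
    AS
    SELECT od.OrderId, p.ProductId, p.ProductName, od.Quantity
    FROM dbo.OrderDetails AS od
    JOIN dbo.Products     AS p ON p.ProductId = od.ProductId;
    GO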
Then comes minimizing subqueries, which is very simple, and this is the best example of it. Suppose you have to get all the customers with their most recent order — not all the orders, just the most recent one — along with its amount and who shipped it. You cannot get that very easily, because it's not a straightforward join: a join would give you all the orders, and you only need the most recent. A lot of people say "I'll take MAX(OrderId)" — but what if you need other columns as well? So what people do is write a correlated subquery per column: get the order ID for this customer, ORDER BY order ID descending, TOP 1 — that is the most recent; a similar subquery gets the freight; a similar one gets the shipper. Now suppose I need twenty more columns — I keep on adding twenty more subqueries. (One aside: my pace may seem fast, and the reason is that I am not teaching you these topics here, just explaining how you will present them in your interview. If you want to learn them properly you can ping me — I provide training especially for DBAs — or simply google the topics I am covering.) Coming back to the subqueries: how do you reduce them? Simply by using CROSS APPLY — it doesn't always have to be CROSS APPLY, the point is to remove the repeated subqueries. With CROSS APPLY the inner query executes once and returns all the columns; if I need twenty more columns, I just add them to the same inner query instead of repeating it. Minimizing subqueries makes a lot of difference: if I execute both versions together, the first shows 71% of the batch cost and the CROSS APPLY one 29%. And to show it with one more column, suppose I also need the order date and add it to both queries: the subquery version needs one more whole subquery, so its share of the cost goes up, while the CROSS APPLY version's share goes down — because in that case it can get all the columns from the same table without adding any join or extra query, which is far cheaper.
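A sketch of that rewrite; the Northwind-style names — dbo.Customers, dbo.Orders, Freight, ShipVia — are assumptions, not the video's actual script:

    -- Anti-pattern: one correlated TOP (1) subquery per column needed.
    SELECT c.CustomerId,
           (SELECT TOP (1) o.OrderId FROM dbo.Orders AS o
            WHERE o.CustomerId = c.CustomerId
            ORDER BY o.OrderId DESC) AS LastOrderId,
           (SELECT TOP (1) o.Freight FROM dbo.Orders AS o
            WHERE o.CustomerId = c.CustomerId
            ORDER BY o.OrderId DESC) AS LastFreight
    FROM dbo.Customers AS c;

    -- Rewrite: one CROSS APPLY fetches the whole most-recent row; one more
    -- column is just one more item in the inner select list.
    SELECT c.CustomerId, recent.OrderId, recent.Freight, recent.ShipVia
    FROM dbo.Customers AS c
    CROSS APPLY (SELECT TOP (1) o.OrderId, o.Freight, o.ShipVia
                 FROM dbo.Orders AS o
                 WHERE o.CustomerId = c.CustomerId
                 ORDER BY o.OrderId DESC) AS recent;

One detail to keep in mind: CROSS APPLY drops customers with no orders, whereas the subquery version returns them with NULLs; OUTER APPLY preserves that behavior.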
Then the last one: non-sargable queries. Non-sargable means your query's predicate is not searchable. The best example is the one I already showed you, where we put an expression on the column: if I add a DATEADD expression on the column here, it will produce a Hash Match — the index was there, but it was not properly utilized, because this is the search argument, the WHERE clause, and an expression there makes the outcome non-deterministic to SQL Server. The optimizer gets confused about the outcome and falls back to a scan-type operation, and the query becomes non-sargable. Wildcards can do the same: LIKE 'XYZ%' is fine, but LIKE '%XYZ' — with the percent sign in front — is problematic; that too makes the query non-sargable.

So these are pretty much the topics that give you a very good insight into performance optimization. What we have learned, and what you will say, is: "The way I tune queries: first I find any spooling; if it's there, I fix it with temporary tables or CTEs or anything else, because a Lazy Spool is nothing but duplicate aggregation — aggregation is an expensive operator, and when it's duplicated it becomes more expensive still. Then the Hash Match: I fix a Hash Match by creating an index, by altering an existing index, or by changing the query after checking whether any expression is being applied to the key column. Then I try to remove Key Lookups. And last but most important, if the table is partitioned, I check whether partition elimination is happening; if it isn't, I fix it — usually the data type of the parameter doesn't match the column, or again a non-sargable expression (DATEADD or some other function) is used on the partition-key column. Also, no bad views — a bad view is one that lacks a few columns, so instead of fixing the view people join the base table back to it. Then minimizing subqueries, using CROSS APPLY or another operator, so that instead of computing multiple subqueries I compute once and use the result multiple times. And finally, removing non-sargable predicates."

That was everything about performance optimization. Let me just walk you through the slides: this is my session content, what I usually explain — how to analyze an execution plan, how to read statistics, spooling, Hash Match, Key Lookup, bad views, subquery minimization, partition elimination, index scan and seek. These are the topics I cover in my training, and that's what I showed you. I will post this script at the link in the video description, so you can get it from there, or you can just mail me on my email id — that's yogi dot mail at gmail dot com — and you can reach me on WhatsApp and Skype as well; my Skype id is yogesh dot bella at theitlive dot com, so you can contact me there too if you have any other questions. I hope this session helps you, because what I notice is that DBAs don't offer performance optimization to the company — so either the company doesn't find such DBAs and will choose you, or it will choose the person who gave a good answer on performance optimization. Believe me: if you can include performance optimization in your introduction, and the person taking your interview gets interested in it and you answer this way, you will be selected. I do a lot of consultations and I have helped a lot of people this way. I hope this video helped you; if you have any questions, please put them in the comments, or just mail me, or reach me on Skype.
Info
Channel: techsapphire
Views: 17,527
Rating: 4.9209876 out of 5
Keywords: sql, server, dba, interview, questions, performance, optimization, experience, fresher, ssis, always, on, query, queries, optimize, sql server 2008, sql server 2012, sql server 2016, sql server 2019, interview questions, database, administrator, sql server dba interview questions
Id: xFFyWXE06lk
Length: 35min 4sec (2104 seconds)
Published: Fri Nov 15 2019