Interpreting Oracle Explain Plan Output - John Mullins

Video Statistics and Information

Captions
Good morning, everybody. My name is John Mullins and I'm from Themis; we'll talk a little bit more about that in just one moment. I know we've got some more people joining us in just a second, but we'll get started — I know everybody is very busy and looking forward to this topic, plus you need to get back to work and get on with your day as well. Today's webinar title is Interpreting Oracle Explain Plan Output. You should be able to see my email address on the screen, jmullins@themisinc.com, and the main website URL for Themis, themisinc.com. Today's presentation slides are available at themisinc.com/webinars, and later on you'll be able to get the recording of this webinar at that same site — just give us a little while to get the recording posted, but the slides are available right away.

I want to thank everybody for joining us today. A little bit about my background: I noticed a lot of the names joining us today are people I've seen in class before, so good to see you again. I've been using Oracle since the early-to-mid 80s — back in 1984 or so I started off with Oracle version 5, which wasn't really much more than a repository for data; it didn't have nearly the features it has today. I started off as a developer/programmer, I've worked as a DBA as well, and since then I've been doing consulting and teaching a lot of classes; you'll see some of those classes listed a little later on. I'm an Oracle Certified Professional DBA, and I still work as a developer/programmer and as a DBA, so I've seen tuning your SQL from both sides. I'm also a Certified Technical Trainer, so many of you that have had some of my classes before know what that means: our classes are fairly structured yet very laid-back. I invite you to enjoy today's presentation and ask questions as you have them. I'll try to get to all the questions I can, but with the time frame we have I may not be able to get to all of them; if you have questions for later on, feel free to send me an email at jmullins@themisinc.com and I'll get to them as soon as I can.

Like I said, I'm representing Themis today. Themis has been around for quite some time, offering a lot of different training topics, both on-site at your place and online over the Internet, in public classes as well. We have lots of topics available, from mainframe DB2 and SQL Server, down to the middle range with Unix and Linux, Java and web development, and of course Oracle: everything from beginner to advanced in SQL, PL/SQL, and database administration. Feel free to visit our website; we'll show some other information later on. Today's topic is related to one of the other classes we have, called Oracle SQL Optimization, which is for both developers and DBAs; we'll mention that a little later as well.

I know you're anxious to get into the topic. The webinar will last about 45 minutes or so, and we're going to talk about the explain plan utility and what it's used for — but more importantly, what are some of the things we're going to look for in that output.
We want to make sure we know what types of things we should look for, what types of things we should give a higher priority to, and how to look for them: what are we looking for, and what does it look like? We're not going to go into how to run explain plan; as many of you who have used it before know, it's very simple to run, but much more difficult to interpret, depending on which tool or utility you're running it from. You could run it from anything from SQL*Plus — which isn't going to give you any help at all; it just displays the plan, and then it's up to you to interpret it and decide what's good and what's bad — to SQL Developer or Toad or something else that might at least highlight things in the plan with colors (red, yellow, green), indicating things you might look at: things that could be problems, or things that might be okay. So what we're going to talk about is: we've got the plan already — how do we look at it?

As many of you know, explain plan is just going to show us the execution path, the access path, that something called the optimizer has decided on for our SQL — how it's going to physically run our code. Once the optimizer decides on it, that information is stored in memory on the database server in something called the library cache, which sits inside a larger memory structure called the shared pool. As you issue, let's say, a SELECT statement, Oracle first makes sure the syntax is okay. From there he does what's called a semantic check, to make sure you're accessing tables you have privileges to and that the columns you're referring to are valid columns within those tables; he gets table definitions, column definitions, security privileges, those types of things, and makes sure they're all in place. If your syntax passes and your semantics pass, he then looks in that shared pool in memory to see if this statement has ever been run before, and if it has and he matches on it, he can just reuse the execution plan that's there. If not, that's what brings us to today: if the plan is not in the shared pool, the optimizer has to come up with this access path, or execution plan, for your statement, and we're going to talk a little bit about what kinds of things he looks for there. Once he comes up with your execution plan, out of all the alternative plans he considers, he stores it in that library cache, and it can be reused later if the statement is executed again.

All right, let's work our way from the top down a little bit. The first part is very simple: you have some explain plan output, and depending on which tool you run it from, it can look slightly different — more graphical, less graphical, or with additional columns you don't see on this particular page. This is just one example, run from SQL*Plus. Many of you know there's a package in Oracle called DBMS_XPLAN, and in that package there's a table function called DISPLAY that can read our plan table and hopefully format it so it's a little easier to read. So the question is how we interpret this little plan we have here — and this is an actual plan.
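For reference, the SQL*Plus workflow being described looks something like this — a minimal sketch, assuming the classic EMP/DEPT demo tables stand in for the tables on the slide:

  -- populate PLAN_TABLE with the optimizer's chosen plan (no execution happens)
  EXPLAIN PLAN FOR
    SELECT e.ename, d.dname
      FROM emp e
      JOIN dept d ON e.deptno = d.deptno
     WHERE d.deptno = 20;

  -- read PLAN_TABLE back, formatted
  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);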
You can see the different steps, and it looks like we've got some accesses to an employee table and a department table, plus some index accesses in there; we'll talk about all of those in just a little bit. Then the question becomes: okay, where do I start? We start with whatever is indented the furthest to the right — we read the plan from the inside out. If we take a look at this particular example, we're going to start down here. The ID on the left doesn't indicate the order; it's just an ID for that particular step. It looks like step number four, the TABLE ACCESS FULL, means we did a full table scan on the department table first, and then we have ID number five. I listed the order of the operations down below for you there, too: he's going to do a full table scan on department, followed by an index range scan, and he's going to do what's called a nested loop join between the department table and, in this case, the employee table. What he'll do is read one record from the department table and then scan the employee table for matches, and hopefully, when he scans the employee table in that nested loop, he's doing it via an index. Typically — not always, but typically — there's going to be an index on the foreign key column in the other table; that's one of the things the optimizer looks at when deciding which method to use to join the tables. We'll talk more about nested loops and hash joins in just a moment, so I won't go into detail quite yet; we just want to understand the order first. Now, if you have two items indented to the same place, you typically read them from the top down — it depends on the tool and the script, but typically top down. So: go inside, find what's indented the furthest, and if a couple of items are tied, read them from the top down. You can get a copy of this slide and go back through the order of operations below to see if it makes sense to you.
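To make the reading order concrete, here's a hypothetical plan of the same general shape as the one being described (this is an illustration, not the exact plan from the slide; the index name EMP_FK_I is made up):

  ---------------------------------------------------------
  | Id | Operation                     | Name     |
  ---------------------------------------------------------
  |  0 | SELECT STATEMENT              |          |
  |  1 |  NESTED LOOPS                 |          |
  |  2 |   TABLE ACCESS FULL           | DEPT     |
  |  3 |   TABLE ACCESS BY INDEX ROWID | EMP      |
  |  4 |    INDEX RANGE SCAN           | EMP_FK_I |
  ---------------------------------------------------------
  -- Execution order here: Id 2 (the driving table), then Id 4 feeding
  -- Id 3 for each DEPT row, then the NESTED LOOPS at Id 1, then Id 0.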
Once we understand the order, we look at each of the individual operations along the way to see if they make sense. Step one is just to find what was done first and see what the operation was. Know your tables, know your data, know your indexes, and see if that was a good choice. Why did he do a full table scan on department? Were there indexes available for him to use or not? If there were indexes available, why didn't the optimizer use them? We need to know what kinds of things cause the optimizer not to use an index: things like the not-equal operator, or a row function on the particular field that has the index, or maybe we're just returning a higher volume of records — a higher percentage of the table — and he thought a full table scan would be more efficient. There are parameters in the database that help him decide between a full table scan and an index scan, and the DBAs are well aware of those; we'll go into those details in a few moments.

All right, here's what we're looking for in the output from explain plan, and I've put these in the order that a lot of people use — some people have different orders, but typically a lot of people look for the full table scans first. The reasons are, one, they're easy to find, and two, we all understand what a full table scan is, so we'll typically start there. We'll go through each of these on separate slides in just a moment. If you do have questions, there's a question tab in your dashboard for the webinar; post them over there as we go — don't use the chat box, use the question box. Like I said, I'll try to get to as many as possible.

So: we're going to look for full table scans. We're going to look at join methods — if he's doing a join, did he do a nested loop, a hash join, a sort merge join? What did he do? We're going to look at the different index access methods: did he do an index range scan, a unique scan, a full scan, a fast full scan, a skip scan? We've got lots of choices there, and all of these are appropriate under certain circumstances. A nested loop join isn't good in every possible scenario — it's good for some things, bad for others — and that's what we have to know: when is it good, when is it bad? Later on we can decide how to get him to change the plan, whether that's adjusting an index, adjusting some database parameter, resizing some of my memory structures, rewriting my query, or fixing indexes that aren't that great or statistics that are off. There are so many things to consider. Then we'll look at filters — a filter is what's in your WHERE clause. When do those filters get applied: before any joins, or after? That's important. Was any part of my statement run in parallel? Certain parallel operations might just happen on their own based on parameter and database settings, and for others we might have to get the optimizer to consider them, maybe through a hint; there are parallel hints we can use in our SQL code as well.

All right, how about table partitioning? It's a very common thing — I said we'd go through these in detail, but a partitioned table is just a single heap table that's been divided into partitions, usually because we have a high volume of data in the table and we can physically separate it into smaller subsets based on some criteria. It could be a list: I'm going to put all my region codes one through ten in this partition, eleven through twenty in that partition. I could do it by a range: everything from January 1st through January 31st in this partition, February 1st through February 28th in that one — a date range. We can have hash partitions, and we can have subpartitions. This is a very common practice, and what it does for us is this: if you issue a query saying the order date is January 30th, he knows, based on the definition of each partition, that there are no January 30th records in partitions 2 through 100 — he knows they're only in partition 1, maybe. So even if he does a full table scan, he only has to scan one partition, not the entire table for all the dates, and that can be a very powerful thing. A sketch of a range-partitioned table follows.
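Here's a minimal sketch of what such a range-partitioning definition might look like — the table and partition names are hypothetical:

  CREATE TABLE orders (
    order_id   NUMBER,
    order_date DATE
  )
  PARTITION BY RANGE (order_date) (
    PARTITION p_2015_01 VALUES LESS THAN (DATE '2015-02-01'),
    PARTITION p_2015_02 VALUES LESS THAN (DATE '2015-03-01')
  );

  -- A predicate like this lets the optimizer prune to one partition;
  -- the plan should then show PARTITION RANGE SINGLE rather than ALL:
  SELECT * FROM orders WHERE order_date = DATE '2015-01-30';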
All right, and then we're looking at object statistics. Each of our objects has statistics out there, and the explain plan will let us know whether statistics were available for that particular query, or whether the optimizer had to go out and find statistics dynamically, on the fly; we'll see that in there too. And then we have an overall cost for the plan: every plan the optimizer considers gets a cost, and what he presents back to you — the plan he decides to use — is the plan with the lowest cost. We'll talk more about cost in a little bit.

All right, let's take a look at each of these. Full table scans: easy to find; we just have to decide whether the full table scan is acceptable or not. What's bad about full table scans, if the table has any volume of data at all — and even if the data in the table is slight — is that whenever the optimizer does a full table scan, he has to read all the blocks from the very bottom of the table up to something called the high-water mark. The high-water mark is the block that represents the most data you've ever had in the table at any point in time. If you have a table with insert activity and delete activity, the deletes remove records — and they may leave some blocks empty and others not — but a delete does not lower the high-water mark. To take an extreme example: we know we should use TRUNCATE to delete every record out of a table, but if I were to use the DELETE command to delete every record, the high-water mark remains wherever it was, and if I come back later and do a SELECT COUNT(*) on that table, expecting a zero, it could very well take two minutes to display that zero, because he had to read through every empty block up to the high-water mark to get the answer. So if you delete some records — whether it's all of them or some of them doesn't matter — and you know you're not going to replace them in the near future, you may consider lowering the high-water mark.

Now, there was a question about how we lower the high-water mark. There are multiple ways to do it, but the simplest is a two-step process with two commands: an ALTER TABLE ... ENABLE ROW MOVEMENT, and then an ALTER TABLE ... SHRINK SPACE. We have to do the ENABLE ROW MOVEMENT first because he might move some of the records around to make the blocks more efficient, and every record has what's called a ROWID; if I tell him he's allowed to move the records, that means they're going to go to different blocks, which means they'll get different ROWIDs as well.
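A minimal sketch of those two commands, assuming a hypothetical EMP table in a tablespace using automatic segment space management (which SHRINK SPACE requires):

  -- allow Oracle to relocate rows (their ROWIDs will change)
  ALTER TABLE emp ENABLE ROW MOVEMENT;

  -- compact the segment and lower the high-water mark
  ALTER TABLE emp SHRINK SPACE;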
There was another question about whether regenerating stats would lower the high-water mark, and the answer is no. It will show the new number of records in the table, but the number of blocks won't change: even though I've deleted a bunch of records, it'll show the number of records is a thousand, but the number of blocks from before the delete will remain the same, because many of them happen to be empty. Okay, so the first thing we look for is full table scans — are they good or bad? If you're doing them on large tables, that could be a problem. Why did he do the full table scan — are indexes available or not? Check the indexes. If they aren't there, consider creating one; if they are there, why weren't they used? You've got to ask those questions.

All right, how about join methods? I was looking at a few of the questions there: if we move data around from one table to another, or one partition to another, that will adjust the high-water mark for us — that is a good question. So, the next thing we're looking for: if we are doing joins, what type of join did the optimizer choose? There are three major ones the optimizer will choose from, plus one we typically hope he does not choose. We'll look at the nested loop, the hash join, the sort merge join, and then, if we want to say it this way, the dreaded Cartesian join, which we usually produce by accident — occasionally we might want one, but typically we want to avoid it. I'll get back to some of those questions in just a moment.

Let's go through each of these join methods, because they each have their own place: some are good for some things, some for others. Let's look at the nested loop first. A nested loop is best for the types of queries and applications where you want the first rows back quickest: it may not have retrieved all the rows yet, but as soon as it finds some that match the join, it starts returning them right away. When you think about these joins, it's best to think about them with just two objects, because that's all the optimizer is doing; even if you're writing a ten-table join, the optimizer only looks at two things at a time. He's going to figure out where to start — which table to start with — and there are lots of factors that go into that, but at least initially he comes up with two sources. If he's considering a nested loop, it looks like this: there's what's typically called a driving table, or outer table, usually the smaller of the two sources, and there's what's called an inner table, usually the larger of the two. The way he usually processes this — the classic nested loop — is a full table scan on the smaller of the two and an index range scan on the larger, where the index on the larger one is on the foreign key column you're joining on. He reads a record from the driving table, scans the inner table, and as he finds matches he starts returning them; reads another record from the driving table, scans the inner table again, and keeps doing that — that's the loop. This is better for smaller row sources, and we can't just consider the table size here: what if filters were applied first? Maybe I have a table of a million records, but the optimizer applied a filter from my WHERE clause, and now
the result set from that filter has maybe only a thousand records left. So we have to consider the size of the source after the filter is applied, if there is one. If you do have indexes on your foreign key columns, the nested loop is something the optimizer considers early, because he says: oh, there's an index on the join column — that's good for a nested loop. But he's also going to consider the volume of data involved, because if the driving table happens to be quite large, a nested loop can be very inefficient. We also want to make sure that if he picks two tables that differ in size (or in result size after filters), he chooses the correct one as the driving table. What if he puts the larger table as the driving table and the smaller one as the inner table? He's got it mixed up, he's got it flipped around. Why in the world would he do such a thing? Well, how does he know the volume of the data to start with? He's looking at something called object statistics. They're stored in the data dictionary — the relational catalog — and they're generated by batch jobs or on demand, usually by your DBAs, and they tell the optimizer how much data is involved: various statistics, from the number of records, to the number of distinct values in certain columns, to how many blocks the records are stored in. If they're up to date and accurate, that helps the optimizer immensely; if statistics are missing or inaccurate, that's how the optimizer can come up with a bad plan. So we need to make sure that's in place first.

So, the nested loop: if you're looking at the explain plan and saying, okay, for the two tables he chose, given the filters, it makes sense that he chose a nested loop, because the volume of data is small and there's an index on the foreign key — I'm okay with that. If you see him trying to join a ten-million-row result set to another ten-million-row result set, a nested loop is probably not the smarter choice. When we talk about smaller row sources, we're talking tens of thousands, maybe up to a hundred thousand — that sort of size. If we have more data, then maybe a hash join is the better choice, and typically it will be. (If you want to test which method wins for your query, you can force each one with a hint and compare — a sketch follows.)
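A minimal sketch of forcing each join method with hints so you can compare the resulting plans and costs; the LEADING hint also controls the choice of driving table (EMP/DEPT are stand-ins):

  -- force a nested loop, driving from DEPT
  SELECT /*+ LEADING(d) USE_NL(e) */ e.ename, d.dname
    FROM dept d JOIN emp e ON e.deptno = d.deptno;

  -- force a hash join instead
  SELECT /*+ LEADING(d) USE_HASH(e) */ e.ename, d.dname
    FROM dept d JOIN emp e ON e.deptno = d.deptno;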
A hash join ends up looking kind of like a nested loop join in the end, but it's going to be a little more efficient: with the nested loop he's scanning an index, while with the hash join he's probing a hash table to see what matches. Again, he tries to choose the smaller row source of the two things he's joining, and he builds a hash table — a map of where to find the data — based on the column or columns you're joining on. Then he takes the second source, usually the larger of the two, and probes the hash table with it. It's the same general motion as a nested loop, but he can do it much more efficiently. This hash table is stored in memory in something called the PGA — the process global area, or program global area; it's called both. So what largely dictates whether he chooses a hash join is how well the DBAs have sized the PGA. Every process that connects to the database has access to the PGA, and if the PGA has been sized too small, and the optimizer says, okay, I want to do a hash join because the volume of data is pretty large here, and he looks at the PGA and says, oh my gosh, there's not enough room in the PGA for this hash table — then he's not going to choose a hash join. He might choose a nested loop, and he might consider a sort merge join, but you'll see in a minute that needs the PGA as well. That's how we end up with large, inefficient nested loops on big volumes of data: it might merely be because the memory for storing the hash table isn't big enough.

So we're typically looking for two things — I know we still have the sort merge join to cover in a second. If you've got smaller row sources, look for nested loops; if you're joining larger row sources, look for the hash join, because the sort merge, as we'll get to, is typically not one we want to see. It takes both row sources you're trying to join and, unless they're already sorted on the key — the columns you're joining on — he has to sort both of them. Say we're joining two tables by order ID: he'd have to do a sort on the first one by order ID, a sort on the second one by order ID, and then merge the results together. That's two sorts, and with any volume of data those are going to be quite large, and he'd have to do them in the PGA as well. If there's not enough room in the PGA, he may decide: I'd consider a hash join, but the PGA is already too small; or he may do a nested loop, whether or not there's an index on the foreign key column; or he may decide he has no choice and do the sort merge anyway, even though there's no room in the PGA — and then he has to sort on disk, and sorts on disk are obviously slower than sorts in memory. So you should be seeing fewer sort merges, because he won't want to do two sorts unless it's a small amount of data. What you should be seeing is either a lot of hash joins, or a combination of hash joins and nested loops, depending on the volume of the data — or of the results after filters — that you're joining. You have to know your data and your row sources to know whether a nested loop makes sense: is this small stuff, or is this bigger stuff where a hash join is the better choice?
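Since PGA headroom drives so many of these choices, here's a minimal sketch of checking the PGA figures (the V$ views typically require privileges your DBA would grant):

  -- how big is the PGA target, and how much is actually allocated?
  SELECT name, value, unit
    FROM v$pgastat
   WHERE name IN ('aggregate PGA target parameter',
                  'total PGA allocated');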
And then, of course, we want to avoid the Cartesian join. This is usually the result of writing a bad join: depending on how you write your joins, maybe we've got three or four tables in our FROM clause, but in our WHERE clause we only have two join conditions, or one, or we joined the wrong columns, something like that. What a Cartesian join says is: I'm not really matching anything to anything; I'm going to match every row from one source to every row in the other source. That does a couple of things: it produces a lot of results — more than we want — and it takes up a lot of resources, so we like to avoid it. It's usually just the result of a bad query. So if you look in your explain plan output and see that it did a Cartesian join, go back to your code and ask: did I properly join the tables or views involved?

Other things we look for, once we've decided whether our join method is correct: did he use an index? Did he do a full table scan? In a nested loop, did he do two full table scans instead of a full table scan plus an index scan? So we're also looking for the different types of index access methods, and we'll look at each one so you can see what they look like. There was a question about what's a good size for the PGA. That's a tough one to answer, because the PGA is an area of memory shared by all the processes on the system. What the DBAs think about is: at any one point in time, across all my concurrent processes, how many will be doing sorts, and what are the sizes of those sorts? The PGA needs to be sized to handle all the concurrent sorts that might be going on. Think about the things that cause a sort: obviously an ORDER BY, a GROUP BY, or a DISTINCT; creating indexes, whether unique or non-unique; certain types of joins, like the sort merge; and set operations like UNION, INTERSECT, or MINUS need sorts as well. Lots of things can cause a sort.

So let's look at these different index access methods. The index range scan is probably the one we see the most, because we might be doing something like WHERE state = 'Ohio', and I might have more than one record that represents Ohio. So he's going to scan the index. An index looks like a tree; it has, by default, three levels — a root, a branch level, and a whole bunch of leaves — and the leaves are where the values you're indexing live. He goes down to a leaf, hopefully close to where the Ohio records are, starts scanning blocks at the leaf level looking for all the records that match Ohio, and as he finds them, he grabs what's called a ROWID, which takes him directly to the table data. In Oracle, the block is the lowest level that's read, so at the very least, when he scans an index, he reads a block at the root, a block at the branch, at least one block at the leaf — maybe more — then grabs a ROWID and reads a block at the table. So at the very least he reads four blocks, and with a range scan he can only read one block at a time. He compares that — four-plus blocks to get the Ohio data through the index — against how many blocks, how much I/O, a full table scan would take. That's one of the big comparisons he makes, and it's why the statistics are so important at that point. You might see, in your explain plan output, something like what's at the bottom of the screen on the slide: an index range scan on an index called MIX1. He scans it, goes down to the leaf node, grabs a ROWID for the records that match, and then accesses the table called EMP with that ROWID. When we read this, remember, it's from the inside out, so the index range scan feeds the table access right above it. That's what the index range scan looks like.

A unique scan is going to be faster, because at most there will be only one match.
So he knows he doesn't have to scan multiple blocks, especially at the leaf level: when he finds his match at the leaf level, he knows he's done; he doesn't have to scan other blocks to see if there's another match. That's why it's important that indexes that should be unique are made unique, as opposed to non-unique. A lot of times we'll put non-unique indexes on, either "just in case" or because we're not sure whether we'll have duplicates; if the values are going to be unique, make the index unique. All primary key fields have a unique index by default, so searches on primary keys — an employee ID, an order ID — should do an index unique scan. But again, just because the index is there, the optimizer doesn't have to use it.

Among the other types of scans, we could have an index full scan. This isn't necessarily the best one out there, but it might be better than a full table scan. You'll get an index full scan when all the columns you reference in your query happen to also be in the index, so there's no need for him to go to the table: instead of doing an index range scan, grabbing the ROWIDs, and going to the table for those blocks, he can do an index full scan and get all the data he needs without visiting the table. The problem with the full scan is that, like the range scan, he can only read one block at a time, and an index full scan means he reads every block at the leaf level. That isn't necessarily a good thing; he'll compare it to the full table scan and see which is better.

A little better than the index full scan — and it sounds better already just by the name, right? — is the index fast full scan. The best thing about it is the second bullet there: it uses what's called multiblock I/O, and each of those multiblock reads can be done in parallel as well. Full table scans are done the same way. There's a parameter in the database called DB_FILE_MULTIBLOCK_READ_COUNT, and it states how many blocks the database can read in a single I/O when doing a full table scan or an index fast full scan. By default it's usually set to something like 128: in one I/O of a full table scan, he can read 128 blocks — and of course he's comparing that to an index range scan, which reads one block at a time. That's why, if DB_FILE_MULTIBLOCK_READ_COUNT has been increased, the optimizer may say, oh wow, I can read 512 blocks in one I/O, and choose a full table scan — and you're sitting there wondering why he isn't using your index, when the parameter setting might be influencing him to do otherwise. The fast full scan will also be used when your query accesses only columns that happen to be in the index. But notice that the rows don't come back in any particular order, because the reads are done in parallel; if you need a particular order, you have to slap an ORDER BY on it, which might defeat the purpose and not run as well anyway. So we'll look for that kind of thing. A quick way to check that parameter is sketched below.
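A minimal sketch of checking the parameter (reading V$PARAMETER usually needs extra privileges; in SQL*Plus, SHOW PARAMETER does the same job):

  -- how many blocks can one multiblock I/O pick up?
  SELECT name, value
    FROM v$parameter
   WHERE name = 'db_file_multiblock_read_count';

  -- or, in SQL*Plus:
  -- SHOW PARAMETER db_file_multiblock_read_count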
The last type of index access here is the index skip scan. You won't see this one as often; it sounded like a really good feature when it first came out. Where it comes into play is when you have indexes made up of multiple columns — multiple fields from a table. In older versions of Oracle, if you didn't reference the leading column of the index, you had no chance of Oracle using that index; the optimizer would say, you're not referencing the leading column, I won't even consider it. The index skip scan feature — added, I think, back in Oracle 9i — allows the optimizer to still consider the index even though you don't reference the leading column. There are a lot of what-ifs and it-depends on this one as to whether he'll actually use the index, but the key point is he won't automatically disqualify it from consideration just because the leading column isn't referenced. You won't see it very often.

The other things we should look for in our explain plan output, in detail here, are the filters, and I've put a little explain plan output on the screen for you. All plans will look a little different, but notice the predicate information section down below: it shows you where the filters are applied. There was a question a moment ago about the ID column on the left; that's just an ID numbering each operation within the plan — it has no bearing on the order of the steps. But look at the filter section below. This particular query said something like WHERE a.deptno = 20. The numbers down below — one, two, three in this case — correspond to the ID column in the explain plan output above. It says the filter for a.deptno = 20 was applied at ID number three, which is the full table scan on the employee table. Now, I'm doing a join to the department table, so he also applied the deptno = 20 filter at step two. If you look at steps two and three in relation to when the join was done — since they're indented further, we know they were done prior to the join. So he first got all the 20s out of the employee table — he's estimating in the Rows column that there are five of those — and then got the 20s out of the department table, where department number is the primary key, so there's only one of those. Based on that, he decided in this case to do a hash join. Even though the data volume is really low, a hash join might be acceptable for your performance, but a nested loop would probably be better, because a nested loop performs better with smaller volumes; even at volumes this low it can still beat a hash join. How can I get him to do the nested loop? Well, why didn't he use the index on the department number in the employee table? If I can get him to do that, maybe I can get him to use a nested loop. Maybe there is no index on the foreign key column, so just building the index might take care of it — a sketch of that follows below.

So, we want the filters to happen before the joins. Think about it this way: if I join two tables, ten million rows in this one and fifty million in that one, does it sound more efficient to join ten million to fifty million and then throw stuff out based on the filter, or to filter first and join ten thousand rows to fifty thousand rows instead of the millions? By looking at the explain plan we can see when the filters happen, and that is very important.
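A minimal sketch of that fix, with hypothetical table, column, and index names — create the foreign key index and see whether the plan flips to a nested loop:

  -- index the foreign key column used in the join
  CREATE INDEX emp_deptno_ix ON emp (deptno);

  -- then re-check the plan
  EXPLAIN PLAN FOR
    SELECT e.ename, d.dname
      FROM emp e JOIN dept d ON e.deptno = d.deptno
     WHERE d.deptno = 20;
  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);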
Okay, other things to look for here are parallel operations. Like I said, you can put hints in for parallel operations, and for some operations, if there are multiple CPUs on your server — which there typically are — if he can do them in parallel, he will. Now, how do I know that some part of my query was executed in parallel? You'll see operations like the ones at the bottom of the slide. PX stands for parallel execution. You'll see a PX COORDINATOR, who's responsible for the overall parallel operation; you'll see a PX BLOCK ITERATOR, who's responsible for breaking the total volume of data into smaller sets so they can be processed in parallel; and when the pieces are merged back together, you'll see PX SEND and PX RECEIVE — the communication between all the different splits that were processed in parallel. So if you put a parallel hint on a query, expecting it to run faster and be done in parallel, and you run it through explain plan and don't see these PX operations, then for some reason he chose not to do it in parallel, and we need to figure out why. Is it because he's not recognizing how many CPUs there are? Does he not know how much data there is, because my statistics are bad? What's going on there? A sketch of such a hint follows.
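A minimal sketch of a parallel hint — the table name and degree are made up, and whether the optimizer honors it still depends on the settings just discussed:

  -- ask for a degree-4 parallel scan of a hypothetical big table
  SELECT /*+ PARALLEL(t, 4) */ COUNT(*)
    FROM big_orders t;

  -- in the plan, look for PX COORDINATOR, PX BLOCK ITERATOR,
  -- and PX SEND / PX RECEIVE operations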
Some of the questions coming in ask about the design of your tables and whether it can influence the plan. It's not so much the design of the tables as it is the volume of data in them. The design could have some influence, but usually the design doesn't influence the plan so much as it influences the performance — which sounds like they should go hand in hand, but not necessarily. Not so much the design: the volume of data, and the indexes on the columns in those tables, however they're normalized or organized, influence the plan more.

All right, we're almost through, so hang in there just a couple more minutes. How about partitioning? This is a very common thing. Like I mentioned at the very start of the webinar, tables can end up being one great big heap, kind of like a big pile of dirty clothes, and we can break them into partitions by some criteria. There are different types of partitions available: list partitions, where we explicitly say which data goes into which partition; range partitions; hash partitions; and other types too, and we can have subpartitions as well. Like I said earlier, this breaks the big table into what are basically smaller sub-tables. You still access the data the same way — you still SELECT whatever FROM the table name — but the optimizer is what's called partition-aware, or partition-smart: he knows the definition of each partition and compares it to your WHERE clause. If your WHERE clause indicates you want a certain region, and he knows that region is in partition 3 and there's no way for it to be in partitions 1, 2, 4, 8, whatever, he'll only scan that one partition. It's like you're now reading from a smaller table, and that can be a really big deal.

So what we're looking for, when we expect our queries to read from only one partition, or a smaller set of partitions rather than the entire table (which means all the partitions): our explain plan output will tell us. As you see at the bottom of the screen, it'll say PARTITION and then the type, whether LIST or RANGE; if he's only reading one partition, it'll say SINGLE. If, for some reason, he felt that based on the criteria in your WHERE clause he might find the data in more than one partition, it would say ALL. Maybe your overall table has a hundred million records, but the partition you want to scan has only a hundred thousand: if he does a SINGLE, great, your performance will be better; if he does the ALL, that's just like doing a full table scan — or even an index range scan — on a hundred million records, and your performance won't be nearly as good. If you have partitions, you don't have to do anything special in your code to access single partitions versus all of them; but if your performance isn't good, and you look at the explain plan and you're not seeing SINGLE partition access, then something is going on: either our code isn't quite right, so it's requiring the optimizer to scan multiple partitions, or our partitions aren't defined correctly, and we'll need to look into that.

There was a question about performance if the partitions might be too small. Think of a partition just like another table: even though it's part of one table, it's almost like a small set of tables. So, what if a table were too small and we keep adding data? We insert and insert, and then we query it. What happens when a table fills up and we want to insert more? The table has to extend, and yes, that can have a negative impact at the time of the extending, but not necessarily down the road when we run our queries. During the inserts, insert performance might be worse, but query performance won't necessarily be worse.

All right, I've got a couple more and then we'll wrap up. Statistics: very important. Every table, every index, will have what are called object statistics — things like the number of rows in the table, the number of blocks those rows occupy, the number of distinct values in the columns of the index in question. Those all greatly influence the optimizer: whether to do a nested loop or a hash join, whether to use an index, whether to do a full table scan. What we look for in our execution plan is whether statistics were there, because if statistics are missing, the optimizer has to do what's called dynamic sampling of the object to get some idea of how much data is there. You have to wait for the dynamic sampling to take place, and the sample is like any other sample: if it's a good sample, representative of your data as a whole, great — he might come up with a good number; if it doesn't really represent the rest of the data in the table — it might be skewed — he might come up with a bad sample and a bad plan. We want to avoid that. The bottom line is: make sure statistics are available on all your tables and all your indexes (a sketch of gathering them follows).
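A minimal sketch of gathering statistics by hand with the DBMS_STATS package, using a hypothetical EMP table (on a real system your DBAs likely schedule this):

  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname => USER,     -- current schema
      tabname => 'EMP',
      cascade => TRUE);    -- gather the index statistics too
  END;
  /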
There's a data dictionary view called ALL_TABLES that everybody has access to — you don't need special privileges — so go out there and query it once in a while. There's a column called NUM_ROWS; see whether it's really representative of the real number of records in the table right now. The statistics the optimizer looks at when running your query — right now, at noon today — aren't up to the second; they're from whenever the statistics were generated, which may have been last night, an hour ago, or whenever. And if you see, down in the Note section or something similar, that dynamic sampling was used, or dynamic statistics were used, we want to avoid that — it's going to slow us down.

Last thing here is the cost — we're almost there. Some people look only at the cost and ignore everything else we've talked about; they want to see the cost going down from one run to the next. In our execution plan, usually up in the right-hand corner, there's a cost, and the overall cost for the entire query is on the line with ID zero. When you submit your query and the optimizer comes up with a plan, he considers many other plans before settling on the one he presents to you, and out of all of them, the plan he gives you is the one with the lowest cost. Cost is a somewhat complicated formula, internal to Oracle and the optimizer, based on the resources it will take to process your query: how much I/O (which means reading blocks), how much CPU he estimates, the network resources available, memory resources — he puts all of that through an algorithm, comes up with a cost, and whichever plan's is lowest is what you get.

Now, what should we use the cost for? We're trying to see whether the query will run faster this time than it did last time. Notice the last bullet there: the cost is used to compare different plans, but for the same query. If somebody out there is doing a three-table join between tables X, Y, and Z, and their cost is ten, and I'm doing a join between tables A, B, and C and my cost is two, I can't compare a two to a ten, because the queries are totally different: different processing, different indexes available, everything else. But if I'm running a query that's slow, and I make some adjustments — creating new indexes, dropping indexes, adding columns to indexes, changing parameter settings, updating my statistics, changing memory resources; something that doesn't require rewriting the query — then I can compare the cost. It's best to go through all the other things we looked at, and then still consider cost: don't consider only the cost, and don't consider only the other stuff — consider them together. If I make some adjustments, short of drastically rewriting the code, and the cost goes down in my explain plan output, I can feel a little better that the query will now run a little faster. Sometimes just adding an index, without changing the query at all, drops the cost a lot, or just updating the statistics drops it a lot. So that's what cost represents: like I said, it's based on a formula for estimating the resources needed for your query.
So we saw there are a lot of different things to consider when we look at the execution plan. To summarize: we're looking for full table scans; we're looking at join methods, and we know which ones are better in which circumstances — nested loops versus hash joins; we look at the index scans — did he use an index, did he not, and why not? Was it missing, or did something cause him not to use it — maybe the way I wrote my code, maybe other factors? The sizes of my memory structures, like the PGA, are really important, we learned today, for influencing whether I get hash joins or sort merge joins instead of nested loops. Do I have indexes on my foreign key columns? Lots of factors there too. Was any part of this done in parallel? I'm looking for that in my execution plan, and if it wasn't, how can I get him to do it — maybe by adding a hint, maybe not. Is he taking advantage of the partitions? Somebody asked what he'll do if the data spans multiple partitions: well, then he has no choice but to span them, so he may read multiple single partitions, or he may have to read all the partitions. One more thing on partitions: the DBAs, or whoever, can create indexes that are local to each partition, and they can also create indexes that span all the partitions — so you have what are called global partitioned indexes and local partitioned indexes, and those are there for the cases where you do have to span multiple partitions. And then we're looking for statistics — those are important. Your execution plan will tell you whether statistics were used, and if the statistics are there but bad, the estimated number of rows in the execution plan might be way off. If it says, hey, I did a full table scan on this and there were 10,000 rows, and you know there are really 10 million, then maybe the statistics were off — or if it says he did dynamic sampling, maybe the sample was bad.

So there's certainly a lot to look at in there — a lot to talk about in 45 minutes or an hour. If you want more information, feel free to send me an email at jmullins@themisinc.com. Everything we talked about today is also covered in our Oracle SQL Optimization class, a three-day class, so we have plenty of time there to go into this in even more detail and to try it hands-on, so you can actually see what it looks like and try influencing the optimizer toward a different plan. There are lots of other classes too. You can go to our main website, themisinc.com; you can email John (his email address and phone number are there on the screen) to get in contact with him. And remember, you can always go to themisinc.com/webinars and get a copy of today's slides — I encourage you to do that — and shortly after today's presentation you can also get a recording there, where you can hear my voice and all that good stuff.

Okay, I thank everybody for attending. You had lots of good questions, and I know I didn't get to all of them in the question box, so if you want to re-send me a question that's in there, feel free — just send it to jmullins@themisinc.com and I'll answer all the questions that I get; I enjoy doing that immensely. For those of you that have attended
before, or have attended our classes before, it's good to see you again, and to everybody else out there, thank you for attending. Everybody have a good Wednesday — we're not quite to the weekend yet, but almost. Okay, thank you for attending, everybody. Make sure you go out there and get a copy of the slides and the recording of the presentation, and hopefully today helped you a little bit in understanding what to look for in your execution plan or explain plan output. Have a good day.
Info
Channel: Themis Education
Views: 67,045
Keywords: Oracle Database (Software), SQL (Programming Language), SQL Tuning, Oracle Tuning, Webinar
Id: AhKgGRM_piY
Length: 60min 15sec (3615 seconds)
Published: Wed Sep 02 2015