20 Essential Oracle SQL and PL/SQL Tuning Tips

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
all right we're going to go ahead and get started I know everybody is short on time and certainly appreciate you joining us today so let's go ahead and get started get you on you on your way all right thank you to everybody for joining us today my name is John Mullins and I'm coming to you from from with Themis today you should be able to see a slide up there on your screen information on there is very important there's my email address if you have any detailed questions regarding any of the topics we talked about feel free to send me an email at Jay Mullins at Themis ink.com you can also get further information on what Themis has to offer as well at FEMA Singh Kham and then I see some questions coming in in the question box over there a couple things this the slides for today or should already be available out on the website at FEMA Singh Kham slash webinars and either late today or tomorrow there'll be a recording out there as well so the Rick it's session will be posted out there takes a little bit of time for it to get formatted and to get put out there so that'll be out there either late tonight or tomorrow should have that out there for you okay so yeah go out to FEMA Singh Kham slash webinars you need a copy of the if you get a copy of the slides out there they're already out there all right a couple of other things too you can also follow us at Twitter at FEMA straining so there's always good announcements out there of upcoming seminars training sessions and things like that as well all right well I'll go ahead and jump right in here I know we're kind of short on on time as far as having about 45 minutes for today's seminar here so we're going to try to pack in here as much as we can notice the topic is 20 essential SQL and PLC for tuning tips probably the most difficult thing about putting together today's webinar it was just trying to narrow down the tips and things we wanted to talk about to just 20 I mean there are so many things to talk about as many of you know when it comes to tuning we used a very strong word they are essential these are some of the ones that that I feel are very important some of these will be very simple and you'll already be aware of them some of them will be a little bit more complex that maybe you're not currently using in your environment but maybe they could provide some benefit to you when it comes to performance of your SQL or PL sequel codes all right so let's jump right in here again if you have any questions along the way you can put those up in the question box I'll try to to get to those but the best way will be go ahead and take down my my email address here with the time that we have the best ways for me just to address those questions after the session and send you back an email so at Jay Mullins at thema Singh Kham all right just a little bit of information about myself I'm um I'm well I've seen some of the people that have registered for the for the webinar today so I recognize many of the names out there from some of the classes that we've had in the past so it's good to see you all again again my name is John Mullins I've been many you know I've been working with Oracle's since the 80s kind of the early to mid 80s I started with Oracle version 5 122 version 5 there wasn't much we could do in version 5 as far as performance tuning goes and then a lot of course a lot of the other features weren't in there either so start with version 5 122 on a-deck ultrix operating system when I worked at Boeing's I worked at Boeing for 10 years as a developer DBA type person and since then I've worked for various consulting companies and training companies I am an OCP DBA as many even you know from the classes there and I'm also a certified technical trainer so I do try to keep things fairly structured but we also want to kind of kick back and kind of enjoy the topic that we're talking about over the years since those 80s there's been over 300 classes that I've taught along the way and I also currently do also do a short term consulting as well so for tuning troubleshooting installations upgrades those types of things if you're not familiar with Themis at all you can always go out to Thema Singh Kham see a little bit of more information out there as well I've been around a long time provide training and a whole bunch of different areas out there so in addition to Oracle db2 sequel server operating system type stuff with UNIX and Linux web development stuff java.net and all kinds of other topics out there too so feel free to browse that website and get some further information about famous there are some classes that are related to the webinar we'll be talking about I won't go into those in here because I know we're anxious to get started here but here they are you can also see them out there on the website but you know troubleshooting debugging tuning PL sequel we also have an Oracle just SQL tuning class and then a tuning class for just the DBA type people as well all right here's we're going to be doing today I got 20 basically tips or topics that are related to tuning that I think are pretty good and pretty important when it comes to the tuning environment so that's what we're going to talk about today of course with the time that we have we're not going to be able to go into great detail on these but I'm going to try to go and know as much detail as possible on these as well alright so let's jump right in here and let's take a look and see what we've got let's go to our first topic here all right tuning tip 1 what I try to do is kind of mix the SQL and Peele sequel so that for those of you that are that are more SQL and not so much PL sequel or vice versa I kind of kind of tried to put them in here every other one is SQL or PL sequel so you don't have to wait till the end like oh I really wanting to see some PL sequel stuff you don't have to wait till the end for that so I'm going to try to mix them up here along the way all right tuning tip number one and these aren't necessarily in any particular order as far as though number ones by far the most important thing that you should ever do when it comes to tuning or try to take advantage of to speed up your stuff so they're not in any particular order number one is an SQL type tip take advantage of the result cache feature ok as I teach a lot of the classes out there or do some consulting a lot of people aren't aware of the result cache it's been around since Oracle 11g so it's been around for you know a fair number of years not very many versions but 11g has been out there for quite some time result cache is kind of like a materialized view and a lot of people are familiar with materialized views so when I mention that they're like oh I know what that is we use those all the time but here's where they kind of come into play result cache is a memory structure it's it out there it stores results of queries in memory you know typically in Oracle my default the raw data that your queries are accessing the raw data stored in memory and something called the buffer cache and so orca likes to try to put by default everything up in memory for you that you're trying to access but he doesn't store by default the results of that so if you're doing things like aggregate functions or executing your own functions and your code is reading a lot of data but producing a fairly small result set like I want the average price of all my orders or something like that or I want the average attendance of all my polling stations that I have out there a lot of times the we read a lot of data but the results that might be small in fact it might only be one record it's out there okay so that's where this could come into play so what we want to kind of see here is that if you are doing most types of things we can actually then store the result in memory in the result cache and there's a couple of different ways to do that I'll show you a couple of different ways here in just a second a couple of the rules that are about it we know that with materialized views we can define how often the materialized view is refreshed with the result cache we don't have that benefit all right the the results that are in the result cache those are going to get refreshed whenever the data that went into that result actually changes okay you'll see a lot of a lot of features in Oracle over the years where Oracle say well this feature is kind of targeted towards a certain type of environment is a online transaction environment is a data warehouse environment data Mart decision support what kind of environment is it and those environments typically typically have some characteristics you know you know transaction environment has lots of inserts updates and deletes some reading going on as well reporting whatever our data warehouse lots of reading lots of reporting lots of aggregation of data summarization calculations things like that that we're using to help us make certain decisions based on our business but we don't change the data a lot you know data warehouse data gets refreshed you know it varies from one place to another but gets refreshed it might be once a night it might be an incremental refresh it might be a full refresh it might get refreshed during the day once an hour twice a day once a week but it's not just hitting the database just constantly boom boom boom boom with changes that are there so this result cache is going to be good for that kind of environment where the changes aren't that frequent and you can kind of define what that frequent is as to to some people if even if we're only changing the data if maybe once every 5 or 10 minutes what we have to ask is the query and its results that we want to put in the result cache how often is it run from the various parts of the application from the various ad hoc people that might be on the system that are out there you know if even if I'm refreshing even if I'm changing the data once every five minutes if I have 10,000 concurrent users and this particular result that I want to store is at is accessed you know a thousand times in five minutes or a hundred times or even ten times in five minutes then I can still see a benefit from this okay all right now the last bolt there also says that for you PL sequel people out there and that's going to see our tip number two you can also write your own PL sequel functions that we label as result cache functions and we'll talk about that in tip 2 here just a moment all right so how come how does this thing get turned on how's it enabled how can I take advantage of this type of thing remember what we're trying to do here is you read a lot of data but you produce a small results that it might be an aggregate function like an average account to Maksim in whatever or your own function doesn't have to be an aggregate it could just be a regular query that returns ten rows typically the main benefit that we're getting out of this is reducing memory reads okay we're already assuming that we're not reading physically off of disk so we've got the raw data is stored up in memory on the database server and a lot of people are like well that that's as good as it gets right all the data is up in memory but what if I could instead of having access ten million records to show me the average price of all my orders what if I only had to read one record that would show me the average price because the result is stored so reading one record obviously is going to be faster than reading a million records since if the average price the orders doesn't change very often then I can take a huge advantage of this particular feature now how can it be implemented the result cache memory structure is already there so whether you're using it or not right now it's already there it's it's part of what's called the shared pool and the shared pool memory structure that's where your information about your code is stored for it so as you're running your SQL your PL sequel things like that those are already stored in memory in the shared pool their execution plans are stored in the shared pool they're the code the definitions of tables and columns and constraints and privileges those are all in the share pool then you also have the result cache there it's already there now to take advantage of it yet there's two main ways one is the DBA is could turn it on system-wide for everybody so you notice on this particular slide there's a parameter called result cache mode okay result cache mode they can turn that on or off now there I'm going to tell you right now they're probably not likely to turn it on for everything the result cache is a pretty small memory structure and we can make it bigger or smaller we can dictate how big the result sets are that gets stored there or not if your results that doesn't fit in there then your query runs just as it normally would it would read the raw data do the calculations and produce the result but you would not get any benefits unless the result is actually stored in the result cache so they're probably not going to turn it on for everything because what if I go out there right now and it's turned on for everything and I'm running these queries that really aren't optimized very well they're not very efficient and I'm returning more records than I should yet they still fit in the result cache I might be filling up the result cache causing other things to get thrown out or to get clobbered at that point so they're probably not going to do that but all is not lost you can also use the result cache through hints you can see that on this slide here and I'm going to turn on my little arrow here so you can see here you have a couple of hints that you can take advantage of ones called result cache result underscore cache ones called if it was turned on you can say I don't want to use the result cache and so you know many of you are familiar with using hints for most hints we kind of treat them and you'll see this is one of the tips in the presentation a is kind of a last resort type of thing or if all else fails let's try this hint at least temporarily to see if I can get the optimizer to choose a different different plan so we can just use a hint so we do something like a slash Astrix plus we say result cache Astrix slash and then here comes our query now the first time we run our query with the result cache hint there's nothing in the result cache right because we've never run it before so what it will do the very first time with the hint is it'll go out to the raw data read all the data it needs to do the calculations or get the results that is supposed to and then store the results in the result cache for future references to that query now that query the next time I run it the first thing Oracle is going to do is look in the result cache to see if the result for that particular query has already been stored and it's still valid if it is in other words no changes have happened since the last time the query has been run then I can get read the result cache boom here's the result result I might only have to read one record and I bypassed the buffer cache and reading those 10 million records that went into the result in the first place now if the result is not in the result cache even if I run it with the hint or the result that's in there has been marked invalid because the raw data that went into it is changed then it just runs like a normal query it will read the raw data get the result it'll store it in the result cache for the next time again but I don't see any benefit for that okay so that's how we can take advantage that so just use that hint there you can do that the way you can tell if you're using the result cache or not for those of you that are familiar with using the explained playing utility you run your query through the explained plain utility it'll tell you in the output of the explained claim whether or not the result cache was used or if you're used to using like auto trace in sequel developer or in sequel plus it'll also you'll be able to tell from there whether or not it used the result cache or not okay I'm sort of a couple ways we can tell that just like with materialized views you can see huge benefits with this so remember what we're looking for here is queries that access a lot of data that produce small amount of results data that is not changing boom-boom-boom all the time and we can use a hint to take advantage of this result cache it's already there right we don't need to have go back and ask somebody hey can you install the result cache can you recreate the result cache memory structure it's sitting there right now it's probably empty on a lot of your systems so right now it's memory tape and it's not being used so we can also say that right now we're kind of wasting some memory if we don't take advantage of it if we have an environment and if we have some queries that could take advantage of that all right remember these slides are available out on our website - all right tuning tip - I'm just gonna go through these like so if you remember my email address at the start take down your questions and I'll give you my email address again at J Mullins at thema Singh Kham you can send me your questions as we go through here tip 2 is with PL sequels related to tip 1 you can write your own functions that are result cache enabled ok so I put an example here on the screen you already know what the benefits of result cache are so here when we do a create a replace function I can add this result cache option to my function code itself ok so I can add that right here I'm gonna say result underscore cache okay and then compile the function as you normally would and then when you execute the function it'll behave just like the SQL feature did if the result of this function based on the input into this function is already stored in the result cache then I'll get it directly from there and boom here you got it I don't have to go the raw data to recalculate anything ok so that's a good thing there so it's very simple to implement right there's nothing else you need to do now what this with when you do with your own functions notice this function has an input parameter and it's the department number okay the results that are stored in the result cache there are going to be based on the different inputs so if this one's doing what the average salary for whatever department is input so if I run it first for Department 10 it'll store Department tens average salary and the result hash if I run it later for Department 20 in Department 20 is not already there it'll calculate the results and then I'll have a result in the result castrate Department 20 so in this case I could have multiple results out there one for each of the different or unique inputs that I have to the function and very simple to implement there you can see how to do that I like that feature a lot a lot sometimes we don't have the authority or privileges to create a materialized view and you're not going to need any special privileges for this there's nothing to create you're just taking advantage of a memory structure that's already there you're adding a hint to your code or if you already have the privileges to create a function you're adding just an option to it okay so I like that one if you haven't you know if you haven't tried that one and you have queries that behave like that try it see what you hear I tip number three let's go through these like so some of these are going to be real short some of these will be a little bit longer I might kind of put some of the longer ones at the front here tip 3 gather extended statistics if necessary okay many of you aware and if you've taken some of my classes the tuning classes before we talked about object statistics all the time and they're very important when you issue a query the optimizer goes out he checks the statistics that are available like how many records are in a table how many distinct values are there for a particular column if I'm referencing a certain column and things like that and based on those statistics will help the optimizer come up with a good execution plan all right and if those statistics are missing or inaccurate that can be a problem okay what we need to know where that's where this feature comes in this is also an 11g new feature these extended statistics is that the statistics by default when they're generated they're generated on individual objects individual tables individual columns individual indexes things like that so if one thing the optimizer does not know about is a relationship or correlation between multiple columns so in your where clause if you're checking on things where where the job ID equals this and the department ID equals that well the correlation of the relationship there is that we have certain jobs within departments for example salespeople are in a certain department I may not have salespeople sprinkled across all the departments in another department I have programmers and another one I have DBA s or managers you know managers might go across all the departments for example but the data could be skewed in other words you know if I hit the right Department I'll get a lot of salespeople if I hit the wrong Department I'll get zero sales people but the optimizer doesn't know that he only knows their statistics on the job ID and their statistics on the department ID he will understand the skew or how the data is distributed within each one of those but he doesn't know by default combinations of columns and I'll show you how to create one of these extended statistics in a second this is a really good feature so if he doesn't know the relationship or the correlation between two columns like in this case job within Department he's just going to read the individual statistics and come up with an estimate on what he thinks the cardinality will be cardinality will be the estimated number of rows and the result all right and he has to do that based on the individual statistics not the combined statistics which may cause him to come up with a bad cardinality and if the cardinality is not a good estimate then what does that do next I can hear you all saying it then he comes up with potentially a bad execution plan right now how do i generate some extended statistics here I go over here to the next page there we go all right there's a couple things I need to do one is I need to create the definition of the extended statistic so for you DBA is out there that are responsible for generating statistics or for the non DBAs maybe in a non production environment that maybe have that privilege you can see it's there's a function within the DBMS stats package called create extended stats we're just going to give it the table name that we want it on and then put in the combination of the columns that we want the extended statistics to be on so in addition to getting like an example here in addition to getting statistics on job ID alone and department ID alone even if we create histograms they're going to be on they'll show us the distribution of within each individual column this way he's all is going to basically create a histogram but for the combination of the columns so that he can tell that there are 20 salespeople in Department 80 and he knows that there's zero salespeople in department 81 otherwise he all he sees is that there's one job code called sales or how many different job codes are there there might be ten different job codes and ten different departments and he's using the individual statistics for that so step one is create the extended statistics you only have to do that one time what that says is from that point forward any time statistics are generated for the in this case the table that involves these columns Oracle will go out and generate or gather additional statistics on the combination of those columns now there's a couple ways that that can happen I put one of those up here on the screen for you we can run the gather table stats stored procedure and we can do it as if we're trying to create a histogram here that's what it looks like on these two columns column 1 and column 2 if we do that the extended statistics are stored in the database immediately now you'll see if you go out there to Google and you google this particular feature you'll see some a lot of other examples out there where when they do the gather table stats they only do it for a particular table and they don't do the histogram part that'll work but here it's kind of quirky and it won't work the first time it only worked the second time that the query is actually executed so it'll go out there generate statistics for it but the the first time you run the query those extended statistics won't be used but the optimizer will recognize that they're out there and so that when you run it the next time he says hey here's a better way to do it so you might as well just go out when you generate the table stats if you're going to do these extended statistics go ahead and create a histogram on those two columns just like I have here all right and again that can have a huge difference in performance as well I've run some queries where if I look at the estimated cardinality and the explained plan output the numbers way off because I've got multiple columns even if you have multiple column indexes that doesn't matter unless you have these extended statistics so if you're having if you run your queries and you see a cardinality that's way off it might be because of this particular reason here and you just need to generate the extended statistics and then you'll be fine your cardinality estimates will be now be accurate which gives the optimizer what it wants it needs to generate good execution plans I see I see how - one question over there in the question box a couple questions there back to the tip number one and two yeah the default value for that parameter on the result cache is manual that's what that means is you just use the hint to override that and then the second question was the result cache option the answer it is is not not available for store procedures only for functions all right so our next tip there was extended statistics that's a good one I think all right tip number four PL sequel tip take advantage of both processing features where appropriate all right and many of you are familiar with this this this feature has been gaining a lot of traction the last year or year and a half now two years ago I'd go out to do some consulting or do some training and people people never heard of this type of feature it's been around for a long time but lately people people are starting to become aware of it they're starting to use it and they're starting to see the benefits of it here's the problem we run into an appeal sequel program it doesn't matter what type it is when the POC Co program is running when he runs across an SQL statement the PL sequel engine does not understand the SQL engine the SQL statement itself so he has to pass that statement over to the SQL engine and say hey you execute this and when you're done just pass me back the results and that's called a context switch the more context switches that you have in your program the more context switches you have the slower your code is going to be it's just like I'm talking to somebody we speak two different languages we have to have an interpreter between us now if we could just talk directly to each other that goes pretty quickly but I have to stop the interpreter has to understand what I'm saying the trigger has to think about how do I say this in some other language then has to say that to the other person and the other person has to respond in their language back to the interpreter and back to me and that's kind of what a context switch is there all right did we miss tip 3 sometimes when we go back here make sure I'll finish tip for you and I'll take a look there tip 3 there all right so the way that we get around those context switches is we try to do things in bulk in other words instead of sending executing a sequel statement one one at a time especially if it's inside a loop we can gather it all up and we're going to store these in arrays and then send it to the database engine one it in just one big bulk operation so you can notice here on the screen I have two really good examples on how to do that all right first one here is the select and so we're just basically same old select statement here we're just selecting the order ID the order status code from the product order table okay but instead of reading those into some variables or instead of making this an explicit cursor where we're fetching them into variables either through a for loop or a basic loop we're going to let them into these variables here in this case via order IDs and B order status codes these are arrays the V order IDs and B order status because they could be associative arrays like an index by type of array or collection sometimes people call them it could be a nested table array it could be a V array create any of those three types that you like okay so we have to declare the arrays ahead of time we do the bulk Select when the Select gets done if it selects a million records then those verbs will raise the order IDs and B order status codes then they contain a million items in them now the larger your arrays become then you start running into you know some lines that maybe we shouldn't cross like these arrays are stored in memory and something called the pga on the on the database server the bigger your arrays are the more pga memory you're going to need so we don't want get too crazy with this as far as oh i have an array with a billion records in it or a billion items and i may not have enough room in the in the pga for that but compared to okay i'm inside a loop and inside the loop i hit it SQL hitted SQL hidden SQL hidden SQL instead of doing a million contact switches in this case here I'm only doing one okay not a trick question which sounds faster one context switch or a million context switch to accomplish the same thing obviously one does so for the select it's very simple and for updates inserts deletes and merge type statements it's also very simple too it's going to be a for all statement this looks like like a loop right looks like a for loop but nowhere do you see the word loop in the syntax okay so we say for all any variable that we want to call it here we don't have to declare it in some range this is like a for loop some range you can take advantage of the arrays like v order IDs it has some what are called methods or functions like first and last or count so I want to go from the first item in that array to the last item in that array and what am I going to do I'm going to do an update statement so this update will take everything in that array whatever arrays I reference in that statement in this case is the V order IDs and it will go over to the database server in one context switch and this array is loaded with maybe a million in this case order IDs and he'll do the update over there now there's a lot more that goes into it from there there's a lot of options we could have in it because out of a million records I'm updating maybe some of them will fail and so I can I can add some other options to this for all its kind of beyond what we're going to talk about today to say things like okay save save all the ones that fail in another array or i can also say save the ones that succeed in a in a third or fourth array so I can also save the ones that are good the ones that are bad and I can do with those whatever I want maybe I store them in a log table maybe I write them out to a file things like that the for all in the update that's all one statement so when you do a for all the only things you can do inside the for all or update insert delete and merge you cannot do it you can't put an if statement this is all one statement you can't put an if in there or case in there this is all one statement so you can see big benefits by doing this reduce the context switches what does that mean your code runs faster and people have seen some pretty amazing results some of you are out there I'm sure already doing that and you've seen some great results from it I'm sure I tip five and here's a real simple one so I said some of these will go real fast certain syntax can cause the optimizer not to choose an index okay tour or tip for that matter to even consider an index so things like not so unless this tip here is to avoid not logic so not equal however you express that functions that don't have row functions already built for them just regular b-tree functions implicit conversions which is essentially going to be a function if you do a wild-card search in your wildcard operator like the percent or the underscores at the front of the string not after the first character that could cause that to happen and any arithmetic operations or things like concatenation can cause him not to consider using an index because think about it after all if and those are you've been in class before I say this all the time you think about the index in the back of a reference book does it tell you what's not in the book no same thing about the index in the database so if you if you have a knot in the air and you have indexes that are available you look at your explain plan output and they're not in they're not being used that's going to be why now what you have to do is think of this is there another way I could write this query without the knot to get the same results okay tip 6 balhae PL sequel pass parameters by reference not by value necessarily okay by default in a PL sequel program output parameters and input output parameters are passed by value okay just a regular input parameter just in all by itself is going to be passed by reference okay now what does that mean what if I have some parameters output parameters that are larger raise okay that could slow us down quite a bit because that's going to take up a lot of memory got to make sure that that array is filled and we pass that array from one program to another if I make the parameter I pass by reference instead then I'm just passing a pointer where that data is stored in memory okay we do that with the no copy option so if we look it I'll show you example here on the next page alright so in our parameter list we have the parameter name we have the parameter type and then we can say is that no copy or not and then of course it's data type okay that'll make it a pass by reference instead so if you're just passing just scalar values like a num a small number or whatever or a name or something like that not such a big deal but if you're passing arrays or records or objects then you may want to consider using the no copy option on that and pass by reference instead it can make a big difference all right chip number seven perform joins with the correct join method and what does that mean that means we wear people that we have access to the explain plan utility or somebody else runs it for s and they're explaining it back to us or they provide the results to us we need to know in their would typically see things like Oh Mike we redid a nested loop join it did a hash join it did a sort merge join okay is that good or bad when I see a nested loop well we just need to know when is a nested loop I'm appropriate when is a hash join appropriate when is a sort merge join appropriate if we look over here on the next page kind of just some general rules for those so that when you're looking at your explain playing output you know what you're kind of looking for here nested loops are generally better for small result sets and we're looking at the optimizer will typically use that based on values like how big are these result sets maybe after I do a filter maybe before I do a filter like my where clause and he's also looking for indexes on the columns that are being joined here namely the foreign key column in the primary key column which we know will have a probably have an index on it but the foreign key column won't necessarily have an index on it every time okay so that's where the nests if you're doing nested if you're seeing a lot of nested loops and you're looking at the explained playing output and the cardinality that's estimated for those nested loops in the millions of Records or tens of millions hundreds of millions your performance can't be very good even if indexes are being used the better join method for larger result sets is the hash joints he typically will choose the hash join when he does not see an index on the foreign key columns that are being joined the hash join he's going to build a hash table based on the fields that were doing the joint on he's going to store that in the PGA if possible now the PGA is not large enough and we're going to talk about more about the PGA a little bit later on then he says well I could do the hash join and created the hash table on disk but that not very fast so if your PGA's too small that might cause the optimizer to choose a nested loop join even if he has a lot of data because he says I don't need to PGA for the nested loop join so the PGA that memory structure or the temporary work is done is a big factor here now sort merge-join for large result sets is still typically better than a nested loop but typically not as good as a hash joint sort merge says I take two objects maybe two tables whatever or two result sets and I'm joining them on certain columns I have to do a sort on those columns so I might have to do two sorts and that's the downfall the sort merge I have to do two sorts a lot of the time or at least one sort rarely do I not have to do a sort on that one and so that causes a problem so we typically avoid those but and again the sorts are done in the pga just like the hash tables where they're created so we just need to know the rules of those different join methods if we're seeing them show up in our explain plan which ones are appropriate for what types of joins that are going on as far as the volumes of data go okay all right compare performance between alternatives syntax of your code you guys know this right but I have to say it and and when we teach it in the class I'll have people write as part of their exercise okay here's here's a problem for you now right come up with four different ways to get the answer I want people to start thinking about that because I don't what I don't want you to do is you come up with one way to get the answer and it's slow and now you just start banging your head banging your head bang your head trying to do anything possible to get that piece of code to run fast you've tried everything the DBAs have tried everything it's still slow did you try writing in a different way and many of you are very experienced you'll say of course we did and but some people they have you know our brains are fixed a certain way to say some people are joined people and some people are sub-query people some people never think about set operators so we get into that kind of habit of coding a certain way which kind of gets in our way of coming up with alternative ways with that in mind we can't always say that one way is always better than another way we can't say that Oh an on correlated sub-query is always better than an outer join or vice versa you know sometimes one way is better than another way based on all kinds of factors that are out there right here's an example that on the next page here's the problem it will pop up there on your screen here in a second there you go find all the customers that have not placed an order okay we have a customer table that contains all the customers we have a product order table that contains all orders period that have been placed by it some some of those customers so in other words the customer table has everybody the product order table only has those customers that have placed an order I need to know the difference between the two and I could solve this with what's called an anti join which is an outer join where we're doing a search on where something is null I could do it with a sub a non correlated sub-query with a not in I could do it with a correlated sub-query with the exist operator or I could use the minus set operator and get the same result right now you'll read a lot of places that'll say things like the exist is is typically faster than a not in or an in something like that and a lot of times that's true but it also depends on a number of things like with the exist it's a it's a correlated sub-query so we're doing a join between query results well it's going to depend a lot on what's indexed in those two queries it's just like doing a join at that point ok so just be aware if you can't get one way to work right it another way alright here's a real simple one and a lot of you a lot of you do do your coding this way anyway but you may not may not realize that you're getting evident and ever-so-slight performance gained by doing it use correlation IDs when performing joins on columns on the columns that are involved in the query whether it's required or not and likely say well I always you know like in this case here I'm doing what a one two three four table joins and I've got table aliases or correlation IDs associated with each of those tables I only need to qualify the columns in the query with those IDs like II and D and EP and J if they appear in more than one table right so like last name is only in the employee table I don't have to qualify that but at the time that this query is parsed I get an ever so slight performance gain and I don't know about you but I'll take anything I can get if I don't qualify that with an e here's what the optimizer has to do Oh last name last name John didn't put an e dot or d dot no correlation ID on the front of it so what tables last name in he has to go through each of the tables in the from clause his last name in the employee table and he has to do a select to do that a recursive query to do that hopefully it's in memory Oh last name is in the employee table great okay but since hey I didn't specify I want the one from the employee table now he has to check to the department table to see if it also appears there to see if I have a problem with ambiguous columns here is it in the department table yes or no no it's not is it the employee private table yes or no no it's not is it in the job table and the more tables I have the slower the parsing is going to be so very simple tip here whether it's required or not whenever you're doing joins qualify every column with either the full table name or the correlation ID or I call them table aliases whatever you want to do there all right again related to joins here's the next one and those last ones go pretty quick for us to hang in there I analyzed joins one by one and and check each one to make sure they make sense before you go to the next joint so if in other words if you're doing a six table join I have people they have problems with six table joins and performance and they're looking at all six tables at once and they're confused they're like I don't know what to look at well behind the scenes no matter how many tables you have the optimizer is only looking at two things at a time anyway so we might as well look at two things at a time as well okay go to your explain plain output which two things were joined first what kind of join method was used was it nested loop was a hash join was a sort merge joint that it does that make sense for the cardinality the volume of data we're talking about if it does you continue on if it doesn't then you stop there's no need to look at the rest right there at that point why did he choose a hash join instead of a nested loop join may be the cardinality is off maybe the statistics are off maybe it's missing an index maybe it has an index it all kinds of factors there but you can't get your arms around eight or nine or ten tables at once just focus on two at a time all right very simple tip there and a lot of you already do that but from some of the user that are maybe newer to joins and newer to performance chaining it makes a big deal all right let's go to tip 11 this also has to do with joins you can kind of see a theme here eliminate rows as early as possible in the join process or the join order think about this let's say you're doing a four table join and also in your where clause you're also saying things like where department ID equals ten and salary is greater than fifty thousand and this is true and this is true what the optimizer has to decide is do I apply the filters first and get a smaller result set and then do the joins or do I do the joins first which will be much larger and after I do the joins throw stuff out and you might say well that sounds ridiculous you should do the filters first and that is indeed one of the goals of the optimizer is to do the filters first but we know while the optimizer is very good that sometimes the optimizer comes up with a bad plan for various reasons so we want to make sure when we look at the explained plan out but where did he filter the data am i doing a join to a 10 million row table to a hundred million row table and then I'm throwing stuff out or my throwing stuff out first which then results in me doing a join between ten thousand rows and three thousand rows is if if you can make the join smaller then your performance is going to be better all right a lot of people will look at this slide and say well this is a DBA only slide I don't like to look at it that way and this has to do with both SQL and PLC equal both understand potential bottlenecks in the architecture somewhere namely in either the database files themselves or in the memory structures that support your code I won't go into great detail on all these here but just be aware the buffer cache that's where all your data goes your data from your tables your data from your indexes your do your change data your undo data it's all up in the buffer cache if it's not sized properly or have too much contention going on in there there's a lot of other factors is it being managed automatically or manually by the DBA s I could have a bottleneck there the shared pool that's where your code goes what about repeated code can it take advantage of execution plans they're already in this Jerr pool or not but if the share pool isn't sized properly the DBAs typically will do a great job sizing these memory structuring but sometimes it's hard to predict certain transaction things that will happen in the future whether they be batch jobs or small transactions we talked about the PGA that's your temporary work area that's where sorts are done in memory hopefully we'll talk more about the pga in just a second and then your redo log buffer that's where your changes only go so if you have a system where you're doing a lot of inserts updates and deletes we could have a bottleneck there and then the redo log files which are part of the database files we could have some issues there as well so just be aware of what these terms are and that they potentially could be bottlenecks causing your code to run slow if you don't have a good foundation like a good memory structure foundation where your data can go your code can go your temporary work that could be done then your code is in real trouble we shouldn't really spend a whole lot of time looking at indexes and this and that about your code if the foundation is cracking so just be aware of those so that you can sit down and talk with the DBAs and everybody is on kind of the same page there all right one thing that we need to know is that at the lowest level the architecture is the block all right your data in a table is stored in blocks that's the lowest part of the architecture the records in your table are stored in blocks depending on your record size you might get one record per block or a thousand records per block or 10,000 records per block this is just another tip to be aware of the main goal the optimizer is to read the fewest number of blocks possible to get the same result set so so what the optimizer looks at it says hmmm if I read and if I use an index it's going to take approximately you know 50 blocks to get the data I need if I do a full table scan it's going to take approximately 10 blocks to read and he says oh reading 10 blocks is better than 50 blocks so I'm going to choose a full table scan at least for now so the you were number blocks that we can read the faster our code is going to be now how can I read fewer blocks well sometimes reading an index is fewer blocks and doing the full table scan believe it or not sometimes doing a full table scan is actually fewer blocks and reading an index what if the blocks were bigger and I can get more records per block that means I'm reading fewer blocks maybe the key here is the number of blocks were reading not necessarily the number of Records were reading so if I have bigger blocks I can get more records for a block and I might build then in that case I could see a performance benefit from that okay each of these blocks has some options and parameters to the DBAs are well aware of these percent free and percent use percent free in particular is a level within the block that saves space in the block for updates to the records that are already in the block well what if this is data that doesn't get updated it's read-only data it gets refreshed nightly total refresh maybe I can make my percent free smaller which means what get more records per block that's my goal so just just have it on your list the whole point of this is this when you have performance problems we want a list we want a list of options what are the alternatives last thing I want is there's only one thing we can do and only one thing we can do and it's not a great option but let's do it what I'd rather have is okay here's a big list of how I could improve this query in this case here we might have 20 options in this webinar today that's what we want you go through each one well maybe this one's appropriate maybe it's not hey this one is appropriate let's check it out then go to the next one hey that one worked a little bit better than the previous one it's much better than only being kind of focused on one thing and only one thing right tip 14 alright there's some parameters that we need to be aware of and let me stress this one more time to me it does not matter if you're a DBA or developer or programmer you need to be aware of these things like the architectural things as well I want everybody understanding big picture there are some parameters these are parameters though that are set at the database level that can also be changed at the session level but these parameters all affect and can influence the optimizer one way or another okay optimizer moe just be aware of what they are for now optimizer mode tells the optimizer to come up with a plan based on a certain type of thinking the DB file multi-block read count that's that full table scan parameter that tells the optimizer that if you if you choose a full table scan I'm going to let you read multiple blocks in a single IO verses with an index typically can index range scan you can only read one block at a time with this I can read multiple blocks at a time it's kind of like do them almost like doing bulk processing and then the index optimizer index caching and optimizer cost adjustment those have to do with what can I expect from my indexes the index caching means if you choose an index optimizer the higher this number is the more likely that you'll find it in memory because what if that number is real low and the optimizer saying hmm-hmm bill should I use an index or should I do a full table scan well this parameter here says that my chances of finding the index in the buffer cache in memory is very slim so maybe I should just read the entire table these all influence the optimizer when he chooses a certain execution plan and then same thing on the optimizer cost adjustment by default it says that index access and full table scan access costs the same that's like going to the store and you're trying to choose between a red apple and a green apple but they cook both costs 50 Cent's which ones do you choose well cost does not become a factor in your decision but what if the red apples 20 cents and the green apple is 80 cents then cost might be a factor in which one you choose so with this one if we lower the cost adjustment from its default value which is a 100 if we lower that that says indexes cost less than full tape scans in other words indexes are on sale and who doesn't like a sale I do so just be aware these parameters can influence the optimizer to come up with different execution plans and in some cases a bad execution plan all right 15 we're running downhill here we're almost done so hang with me I'm sorry about going over a little bit here when creating a multi-column index make sure you put them in the right order many you've already know that bottom line here is what for just regular b-tree index access that we're trying to tune we want the most selective column first and what columns do we see in our where clause a lot but is that it also the most selective out of all the columns we're putting in our index you know primary key columns are very selective right other columns might not be many of you may have heard of the Skip scan feature with indexes it wants the opposite of that it wants the least selective or the less selective columns first so if you're building all your columns based on the rule and then this is a very common rule to put the most selective columns first you're probably not seeing a whole lot of skip scan access methods going on skip scan says I have an index built on let's say these three columns 1 2 3 but your query only accesses columns 2 & 3 not the leading column of the index alright if that leading column happens to be non selective or leave very less selective compared to the others then it is possible the Oracle could still use the index even though you don't reference the leading column but only if it's a really selective column he's not ever going to do that and so 15 is build your indexes with the most selective column first 16 avoid unnecessary sorts okay we know that we need to know what causes sorts order buys the sinks group buys unions intersects - hash join sort merge joins if you create an index whether it's non unique or neat you get it it has to do temporary work it has to do a sort generating statistics on objects and all that work is done in the PGA okay a little bit about the PGA here before we wrap up that's where all your temporary work is done we want all that temporary work to be done in memory if possible if not it goes to the temporary tablespace or a temporary tablespace hey which means disk physical reads that slows us down also be aware of this by default an individual process cannot consume the entire pga there's an undocumented parameter called underscore pga max size it says that each individual process can only take up a certain amount of the pga at one time that way nobody can hog the whole thing okay so if you add an order by it might get slower you add it you change something and now he's doing a hash join instead of a nested loop it might might actually get slower because he's trying to do it in the pga and there's not enough room in the pga or too much contention in the pga maybe he's having to go to disk last three here do not overuse selects this says select from dual but really any SQL and your PL sequel programs when you don't have to remember that's a context switch I see people do this all the time and they may even have it inside a loop where they're doing a something like a selects estate into a variable from dual rather than doing that how about we do this VV date colon equals sis date there's no SQL and that that assignment operation there we say well sis dates an SQL function but SQL functions are optimized differently than an SQL statement so this do will be faster than the Select because it doesn't have to do a context switch and the more often that you do you can avoid that the bigger benefit you'll get okay another thing is if you're using sequences in there we say here instead of doing select sequence name dot next Val to get the next value of a sequence from duel you can just assign the next value of the sequence to a variable again avoiding the context switch we need to as we're writing our code we need to ask ourself anytime you put an SQL statement in your PL sequel program you should ask yourself is there another way I could write this and avoid doing SQL all together inside my PL sequel program that's your tip all right and then some very kind of general ones here at the end is ql PL sequel i at every class i ask people this question do you have in your environment in your organization in your company do you have SQL and plc call standard documents that everybody supposed to follow that are enforced by somebody that are checked by somebody do you have code reviews to make sure everybody's following these because as many of you know like fern peels sequel there's three different ways to do loops for loop while loop basic loop which one should we do do I do if then else if then else if do I do a simple case to assert do a search case what do I make an upper case what do I put in lower case whether it's SQL or not I put code on one line or multiple lines does it matter for performance sometimes it does a lot of times it'll make it more readable more understandable but sometimes indirectly you could get a performance gain out of that so if you're not aware of in your environment do you have standards that you should be following ask somebody when you go back to your desk today all right if you haven't seen them before but you know they're there today's a perfect day to go out and find them and read them and see what's in there a lot of times people say well we have standards but nobody enforces them sometimes people will say well we all have our own individual standards that means you don't have any right because everybody's standard might be different it can matter all right last - thank you for hanging in there take advantage of available tuning tools does anybody use the tuning advisor are you licensed for it or not it's part of it either the tuning pack or the diagnostic pack most large corporations are licensed for it SQL tuning advisor you give them your code the tuning advisor looks at it and comes up with here's what I found here's what I recommend here's the rationale behind my recommendation and oh by the way there might be some fear some code if you decide to implement my recommendation it's just like another person you might go to and say hey a Dave what do you think of this code so now we're going to the tuning advisor for that okay so if you haven't seen some of these before you might want to take advantage of them they're going to help in tuning a lot every one of these I find beneficial and useful whether you're a DBA or a developer if you don't have the privilege to run it you ask the DBA to run it I work as a DBA most of the time I have no problem with that you come to me and say John this queries giving these fits I've tried this I've tried that I've ran through explain playing I've checked the indexes I check the statistics everything looks good I'm not sure what's going on can we run it through the tuning advisor and see what it thinks or what do you think okay PL sequel profiler there's a package for that do you have the privileges to execute the programs in that package maybe you do maybe you don't it'll tell you which which statements within your PL sequel program executed the most which ones consume the most time so if your PL sequel program took four hours to run what part of that program consumed most of that four hours was it one statement was it several statements was it just accumulated over the entire pro rave so it's kind of hard to tell but if it was only one statement oh this one statement took three and a half out of the four hours the rest of the program ran pretty quickly now I can focus on that one statement and tune it and then most people are familiar with explain plan auto trace and then you get into the tracing events that are available in Oracle like the one zero zero four six which then we can see it's kind of information it's very similar to explain playing an auto trace kind of combined the one zero zero five three is kind of interesting it'll tell you okay I pass a statement to the optimizer and if you run explain plan and says oh here's the plan I chose or here's the plan I think I would choose if I were to run your statement but the one zero zero five three tells you is not only here's the plan I chose but here's the key part here's all the other Pro all the other plans that I considered so if you're looking at the plan he chose and you're like that's a bad plan why did he choose that I can't believe it you can look in that one zero zero five three trace and see oh I see he considered this plan here which I think would be better and here's why he rejected it for whatever reason and you can see all the other plans there might be a hundred plans in there that he considered so take advantage of those other tuning tools whether you have the privileges or not be aware they're there so you can ask other people that do have the privileges to run them for you or if they have already run them and lastly I made this last on purpose hints a lot of discussion a lot of debate about how hints should be used okay III view hints as a temporary solution as a last result solution I know there's some of you out there that use hints boom right away my query doesn't run he did not use an index I'm gonna slap an index hint on there okay there are better permanent solutions than a hint available why did he not use your index is it because of your code is it because of the statistics is it because of a parameter setting is it because of a memory structure size that's bad there's all kinds of things that could cause them not to choose an index now you could use that hint index hint right now so that hopefully he would use it although there's no guarantee hopefully he will use it and that buys you some time while you try to wet diagnose the true problem and a more permanent solution so it's okay for that all right we talked about a lot of stuff there didn't weigh anything from simple things like putting aliases on your column names when you do joins to more complex things like taking advantage of extended statistics or the result cache we saw kinds and this is just 20 we should come up with 50 or 100 if we sat down and did this some would be very simple some would be more complex there are a lot of them out there for sure alright we talked about many of the other things in many other full classes that we have you know the SQL optimization class for example is three days long today we we had an hour so imagine what we could talk about in three days not only about these twenty but actually trying these twenty and trying you know forty other ones as well so lots of classes out there available for you to to take advantage of as well if you just go to FEMA Singh Kham you can see a list of those classes out there John Pak eval is a great person you can get ahold of there's his email address remember my email address too was Jay Mullins at FEMA Singh Kham if you want to copy the slides they're already out there at Thema Singh Kham slash webinars and like I say it in a day or so that the full presentation recording of this presentation will be out there as well now if you go out to the slash webinars you're going to find all kinds of other webinars out there if you've never been out there before there's other oracle ones out there there's db2 ones out there there's Java ones out there there's all kinds of webinars out there so take advantage of those those webinars are free in it if you decide you want more information more detailed stuff then you could certainly sign up for a class or request the class as well okay all right I appreciate everybody coming today I appreciate you sticking around to the very end - I'm very sorry that I went over as you guys know from class I tend to do that a little bit but hopefully some of those topics there will give you some kind of fuel for thought hopefully took some good notes if not go out and get a copy of the slides if you have any questions down the road send me an email at Jay Mullins at demas Singh Kham all right thank you everybody for attending hopefully you'll have a great end to your week and a good weekend that's coming up and hopefully I'll see you again in a future webinar we do have another webinar coming up on April 27th for the db2 folks out there that David Simpson has been doing it'll be part three of running SQL in the 21st century there so you can certainly come out and and take a look at that part three of a series he's been doing already all right thanks everybody hopefully I'll talk to you again soon have a great day
Info
Channel: Themis Education
Views: 59,671
Rating: undefined out of 5
Keywords: Oracle Database, SQL, PL/SQL, Database Tuning
Id: SxEQbz8tOwU
Channel Id: undefined
Length: 71min 52sec (4312 seconds)
Published: Thu Mar 24 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.