PostgresOpen 2019: Mistaken and Ignored Parameters While Optimizing a PostgreSQL Database

Video Statistics and Information

Captions
...and at the same time, can all of my active data set be cached in memory? Would it be a good approach to increase the memory so that the entire active data set is cached? Do I have a lot of sequential scans versus index scans? All of this data can be analyzed, and it can be helpful in understanding the behavior of your database. At the same time you may have several unused indexes, which could be causing a lot of write IOPS on the database. You may identify bloated tables and vacuum them to improve performance. You may partition bigger tables so that the amount of data fetched from disk is smaller, or move tables between tablespaces so that you distribute the IOPS across multiple disks; moving the hot data to tablespaces that are on flash disks could be a win. So these are some of the general tuning techniques people follow, but I definitely want to go ahead and start talking about some of the mistaken parameters in Postgres, and once we are done with the mistaken parameters I want to discuss the ignored parameters. For the mistaken parameters, I'm going to pick up twelve Postgres parameters, explain what they mean, explain how we sometimes mistakenly set the wrong values for them, and how they are supposed to be set.

The first parameter that comes to mind when we talk about tuning Postgres is shared_buffers, so I want to start there. What is this parameter? It's basically the memory for Postgres, the database memory area. All the pages that are being dirtied or requested by a client are fetched into shared_buffers, and Postgres then uses an LRU-style (least recently used) algorithm to flush pages out of shared_buffers. What we usually do is follow a magic number like 25% of total RAM, or, a few years ago at conferences, I used to see people talking about allocating 8 gigs to shared_buffers; that was a magic number. I don't know how many of you have heard that, but 8 gigs used to be the magic number, and 25 percent of total RAM used to be a magic number too. But it's not the same all the time; even 90 percent of total RAM could get you better performance.

What you need to understand is that PostgreSQL does not do direct I/O. What is direct I/O? Consider databases like Oracle and MySQL. Let's say you run a select query on a table, and that select query needs to fetch records from pages that are on disk. They are fetched from the disk into the memory area of the database, which could be the database buffer cache in Oracle or the InnoDB buffer pool in MySQL; likewise, in Postgres it is shared_buffers. So the pages on disk need to be fetched into memory. But in Postgres a page is not fetched directly from the OS into shared_buffers: when the page it needs is not found in shared_buffers, Postgres requests it from the operating system, the OS fetches that page into the OS page cache, and from there it is copied into shared_buffers. This way you may actually be double caching, and that's why we keep hearing about giving priority to the OS cache sometimes.
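To get a feel for whether the active data set is already being served from memory, one thing you can look at is the cache hit ratio Postgres itself reports. A minimal sketch (the counters in pg_stat_database only cover shared_buffers hits, not the OS page cache):

```sql
-- Fraction of block requests served from shared_buffers for the current database.
-- blks_hit = found in shared_buffers; blks_read = had to ask the OS, which may
-- still serve the page from the OS cache rather than the disk.
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS shared_buffers_hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```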
But it's not like that all the time. When you can put all of your active data set in memory, and at the same time you don't have a huge amount of sorts happening and work_mem can be kept low, you can try to understand your system better and figure out whether you should give more RAM to shared_buffers or less. pg_buffercache is one of the beautiful extensions available in Postgres for understanding what's in your shared_buffers: you can see how much of a table is currently in shared_buffers, and you can snapshot that over time to understand your hot, or active, data set. So shared_buffers does not need to be the 25 percent or the 8 GB magic number all the time. Try the pg_buffercache extension to see what's in shared_buffers, and at the same time use vmtouch or other OS tools to understand which files are frequently cached in memory. Then you can work out how much shared_buffers can actually be allocated so that every request is served straight from shared_buffers, or whether you should reduce shared_buffers and give more preference to the OS cache.

work_mem. If I remember correctly, 4 megabytes is the default work_mem value; when you just install Postgres, work_mem is set to 4 megabytes. What does it do? That amount of memory can be used by each sort operation, and if your query performs multiple sort operations, n times that amount of memory can be used. So you may want to increase it to avoid writing to temp files on disk, because anything happening in memory is faster than going to disk. For some analytics and reporting queries, or other types of workloads, you may need more work_mem; I have seen environments where it's set to 256 megabytes, or 128 MB, or even one gig. But when you do that, remember: if you have, say, 100 or 200 max_connections, that many concurrent connections can be running at a point in time, and if you also have work_mem set to 128 megabytes, that's 128 MB times 100, and if each query is doing more than one sort operation it could be times n. That much RAM is dedicated just to sorting, and you can really get into out-of-memory situations.

So it's not always right to simply increase work_mem to avoid writing to temp. You can query views like pg_stat_database to see how much temp is being written, how much temp generation is happening, just by snapshotting that view. But based on that alone we should not go ahead and globally increase work_mem. If you have specific SQL statements, analytic or reporting queries, that need more work_mem, use session-level work_mem instead: from your application, issue something like SET work_mem to 1 GB or 512 MB, and that is only valid for that session. Do not increase work_mem globally when you need bigger numbers, because when you do, you can get into those out-of-memory errors.
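A few ways to act on the above, sketched with illustrative values (the 512MB setting is just an example, not a recommendation):

```sql
-- 1. What is actually sitting in shared_buffers right now? (pg_buffercache extension)
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT c.relname,
       count(*)                        AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;

-- 2. How much temp is being spilled to disk because work_mem is too small?
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
WHERE datname = current_database();

-- 3. Raise work_mem only for the reporting session that needs it.
SET work_mem = '512MB';
-- ... run the heavy reporting query here ...
RESET work_mem;
```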
effective_cache_size. This is only used for estimation purposes. Most of the time people mistakenly assume that it reserves memory for a cache, but that's not what it is. What it is used for is the planner: the Postgres optimizer looks at how much effective_cache_size is available and assumes that much RAM is available for caching indexes, and based on that it gives priority to index scans over sequential scans. If you set it too low, it will prefer a sequential scan. In most environments you see people setting it to 70 percent or more, because with faster disks indexes can usually be cached. Setting it to 70 or 90 percent doesn't mean any memory is reserved; it is only used for estimation, nothing else.

autovacuum. Many users, maybe not in this room, sometimes just go ahead and turn autovacuum off. When you do that, you need to understand that there are three activities performed by autovacuum: freezing the XIDs of a table, cleaning up dead tuples, and updating the statistics of a table. If you turn autovacuum off, you need to write your own manual vacuum logic for these three activities, which is very, very difficult to get right. So always make sure you never turn it off, because if you do, you have to handle one of the most important aspects yourself, the freezing of XIDs, and otherwise you can get into a transaction ID wraparound problem. How many of you really understand the transaction ID wraparound problem? We can definitely talk about it after this session; you can meet me if you'd like to know more. But to avoid it you need autovacuum running, so never set this to off, because those three important activities would be affected.

track_counts. We discussed the three important activities done by autovacuum, and those activities are only performed based on the statistics collected when track_counts is set to on. If track_counts is off, autovacuum does not know what happened earlier, how many inserts, updates, and deletes have taken place; those statistics are not captured, and autovacuum then does not know what to do except for freezing. So never set this to off either, because the tracking of statistics will not work.

Now the most important parameters related to autovacuum: autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold. I'm going to talk about some of the algorithms here, and it gets very interesting starting from this point. We'll also talk about autovacuum_analyze_scale_factor and autovacuum_analyze_threshold. So what is autovacuum vacuum? We discussed that autovacuum should always be on and that it has several activities to perform; one of the important ones is cleaning up dead tuples. What are dead tuples? In Oracle and MySQL, when a delete happens, or an update (an update is really an insert plus a delete), the deleted tuples are stored in a separate storage area for undo, for rollback, for consistency. But in Postgres they are stored in the table itself, as multiple versions, along with the transaction ID that deleted them. So what we are saying is that the undo is also stored in the table itself, not in a separate storage area.
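Before tuning any of the thresholds that follow, it helps to see how much dead-tuple cleanup is actually pending and when autovacuum last touched each table. A quick look, assuming track_counts is on:

```sql
-- Tables with the most dead tuples, plus the last time vacuum/analyze ran on them.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```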
Now, in order to clean up those older versions, we need the autovacuum process. When autovacuum is on, then based on the statistics collected (because track_counts is on), an autovacuum vacuum, which cleans up those dead tuples, the older deleted versions of records, starts on a table. It starts when:

    number of updates + deletes >= autovacuum_vacuum_scale_factor * total number of tuples + autovacuum_vacuum_threshold

When this formula is satisfied, based on the statistics collected thanks to track_counts, an autovacuum vacuum runs on the table, cleans up the dead tuples, and, if possible, also freezes the tuples' XIDs, which is a separate topic we can talk about later. The scale factor is the fraction of table records added into the formula, so 0.2 means 20 percent of the table records, and the vacuum threshold is the minimum number of obsolete records. The default vacuum threshold is 50 and the default scale factor is 0.2.

To understand all of this, just look at the formula. If autovacuum_vacuum_scale_factor is 0.2, the total number of tuples (records, rows) in the table is 1,000, and the vacuum threshold is 50, that is 1,000 times 0.2 plus 50, which is 250. So for every 250 updates plus deletes, the table becomes a candidate for autovacuum vacuum. And how many such autovacuums can run at a point in time? That depends on autovacuum_max_workers; we'll talk about that. A worker process is started within each database every autovacuum_naptime seconds, which is usually set to one minute, so if there are N databases, a worker is launched every autovacuum_naptime divided by N seconds. There is a cycle of autovacuums running all the time in your database. These are global settings.

Similarly, autovacuum analyze. What does analyze do? Analyze updates the statistics so that the optimizer can use them while preparing execution plans. So here we are talking about statistics gathered through analyze, while earlier we talked about statistics gathered through track_counts. What track_counts drives is the stats collector background process, which captures how many inserts, updates, and deletes are happening in the database, and that data is used by autovacuum. The statistics collected through analyze, on the other hand, are used by the optimizer to prepare execution plans. Similar to autovacuum_vacuum_scale_factor, you have autovacuum_analyze_scale_factor, which is a fraction of the table records, 0.1 by default, meaning 10 percent of the table records, plus autovacuum_analyze_threshold, which is a plain number. Look at the formula again: if the analyze scale factor is 0.1, the total number of tuples is 1,000, and the analyze threshold is 50, then for every 150 inserts plus updates plus deletes you see an autovacuum analyze. The autovacuum launcher process spawns the worker processes that run either the vacuum or the analyze when these formulas are reached.

So far you just need to understand the basics: autovacuum vacuum cleans up the dead tuples, autovacuum analyze updates the statistics, and autovacuum is the background process that takes care of both based on these formulas.
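You can compute where each table stands against the vacuum formula directly from the catalogs. This sketch uses the global settings only and ignores any per-table overrides, so treat it as an approximation:

```sql
-- For each table: dead tuples vs. the point at which autovacuum vacuum would kick in,
-- i.e. autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.
SELECT s.relname,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::numeric
             + current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples::numeric)
         AS vacuum_triggers_at
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 10;
```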
Now, how do you set it correctly? Global parameters may not be appropriate all the time. Why? Because they apply regardless of the size of the table. Consider two tables, one with 10 records and one with a million records: with those formulas, the frequency at which a vacuum or an analyze runs automatically could be greater for the table with just 10 records. So instead, use table-level autovacuum settings. See which tables are being vacuumed frequently for this reason and see whether you can set table-level autovacuum parameters for them. Most of the time I also see people still taking care of vacuum logic manually, and yes, while keeping autovacuum on, it's okay to look for those bigger tables that keep missing vacuum because of that condition and run a nightly or weekly vacuum on them; vacuum is an online operation, it does not block normal activity.
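As an example of those table-level settings, autovacuum parameters can be attached to an individual table as storage parameters; the table name and numbers here are purely illustrative:

```sql
-- Make autovacuum much more aggressive on one large, frequently-updated table
-- instead of lowering the global scale factor for everything.
ALTER TABLE orders_history   -- hypothetical table name
  SET (autovacuum_vacuum_scale_factor  = 0.01,
       autovacuum_vacuum_threshold     = 10000,
       autovacuum_analyze_scale_factor = 0.005,
       autovacuum_analyze_threshold    = 10000);
```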
As I was discussing earlier, there cannot be more than autovacuum_max_workers autovacuum processes, and that includes vacuum and analyze together. There cannot be more than three, which is the default; we can always increase that number, but before we talk about that we need to talk about autovacuum I/O. What does autovacuum actually do? It reads the 8 KB pages of a table from disk, looks for the pages, the blocks, that contain dead tuples, and cleans them up so that future inserts can reuse that space. So autovacuum fetches a page of a table from disk into memory, deletes the dead rows, and updates the free space map so the space can be reused by future inserts. Considering the default block size of 8 kilobytes, this involves both read and write I/O: when it is cleaning up the dead tuples, that's write I/O; when it is fetching the block or page from disk, that's read I/O.

There are four or five parameters you need to consider here. autovacuum_vacuum_cost_limit is the total cost limit autovacuum can reach; it is shared across all the autovacuum jobs, so it's the total amount of cost that can be accumulated by all the autovacuum processes together, and when they go beyond that cost they need to pause and proceed further only after the pause. That pause is autovacuum_vacuum_cost_delay; we'll see how the cost is calculated a little later, but autovacuum waits for that amount of time whenever the cost limit is reached. vacuum_cost_page_hit, whose default value is 1, is the cost of reading a page or block that is already in shared_buffers. vacuum_cost_page_miss is the cost of fetching from disk a page that is not in shared_buffers, and vacuum_cost_page_dirty is the cost of writing to a page when there are dead tuples in it.

Let's look at the defaults. autovacuum_vacuum_cost_limit is set to -1 by default, which means it falls back to vacuum_cost_limit, and vacuum_cost_limit is set to 200. autovacuum_vacuum_cost_delay is set to 20 milliseconds, so when that cost of 200 is reached, the autovacuum should pause for 20 milliseconds. The cost of fetching a page from shared_buffers, vacuum_cost_page_hit, is 1; vacuum_cost_page_miss, fetching from disk a page that is not in shared_buffers, is 10; and vacuum_cost_page_dirty, cleaning the dead tuples, meaning a write to a page, is 20.

Now let's imagine what can happen in one second. One second is 1,000 milliseconds, so in a best-case scenario, considering the read latency to be zero, autovacuum can wake up and go to sleep 50 times: 1,000 milliseconds divided by 20 milliseconds is 50. Autovacuum starts, reaches the cost limit, sleeps for 20 milliseconds, starts again, reaches the cost again, sleeps another 20 milliseconds, and so on; it can sleep at most 50 times in one second, so one second equals 50 times autovacuum_vacuum_cost_delay.

Now the read I/O limitation: if all the pages with dead tuples are found in shared_buffers, then in every wake-up 200 pages can be read, because the cost of reading a page from shared_buffers is one, so it's 200 divided by 1. In one second that is 50 times (200 divided by vacuum_cost_page_hit) times 8 kilobytes, so with the default parameters 78.13 megabytes per second can be read by all the autovacuum processes together. If the pages are not in shared_buffers, then with the same formula autovacuum can read up to 7.81 megabytes per second from disk, and it can write, that is dirty pages, at up to 3.9 megabytes per second. Those are the read and write I/O limitations with the default parameters.

Now consider increasing autovacuum_max_workers. What happens if you increase it from three to six? These limits are shared by all the worker processes, so is there any benefit? No. Just increasing the number of worker processes is not a benefit on its own; you also need to consider the parameters that control the cost limits, and that's what the next slide is about. autovacuum_max_workers is the number of autovacuum workers that can run concurrently, defaulting to three; check it against the number of databases you have, but just increasing it will not work because the cost limit is shared across all the vacuums. You must also increase the cost limit: for example, increasing vacuum_cost_limit to 2,000 means roughly 781 megabytes per second of reads from memory, 78 megabytes per second of reads from disk, and 39 megabytes per second of writes. So look at your disks and check how much I/O they can handle, together with everything else you've done, like adding faster flash disks or spreading your tables across multiple disks, and based on that tune the cost limit and increase max_workers proportionally. That is what you need to do to play with the autovacuum settings and make autovacuum finish faster.

Yes, these defaults are basically the numbers where we tell Postgres "this is the cost", so we can play with them and change them once we know our disks. These defaults were decided a long time ago with the older hard disks in mind; they work for those, but nowadays SSDs and faster disks are far more common and you can take advantage of them.
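A quick way to check where these knobs currently stand, with the arithmetic from above repeated as comments; the 2000 in the ALTER SYSTEM line is the example figure from the talk, not a universal recommendation:

```sql
-- Current vacuum cost settings.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('autovacuum_max_workers',
               'autovacuum_vacuum_cost_limit', 'autovacuum_vacuum_cost_delay',
               'vacuum_cost_limit', 'vacuum_cost_page_hit',
               'vacuum_cost_page_miss', 'vacuum_cost_page_dirty');

-- With the defaults (cost limit 200, cost delay 20 ms, 8 kB pages):
--   wake-ups per second             : 1000 / 20            = 50
--   reads when pages are in memory  : 50 * (200/1)  * 8 kB ~= 78.1 MB/s
--   reads when pages come from disk : 50 * (200/10) * 8 kB ~=  7.8 MB/s
--   writes (dirtied pages)          : 50 * (200/20) * 8 kB ~=  3.9 MB/s

-- Raising the shared cost budget (check what your disks can take first; a reload applies it).
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();
```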
maintenance_work_mem, the last mistaken parameter before we go into the ignored parameters. This is the amount of memory, from RAM, used by maintenance operations such as CREATE INDEX and ALTER TABLE ... ADD FOREIGN KEY, those kinds of maintenance tasks. In order to perform a faster database restore you may want to increase maintenance_work_mem so that at least the CREATE INDEX steps run faster; you get better performance when you set it to a higher value during a database restore. How do you set it correctly? If autovacuum_work_mem is left at -1, which is the default, then your autovacuum workers can also each use maintenance_work_mem amounts of RAM, and if you have fast enough disks and autovacuum can do its job nicely, it could take a lot of RAM, so be careful when setting it. At the same time, check this formula as well: (number of concurrent maintenance operations during peak times) * maintenance_work_mem should be much less than total RAM - (shared_buffers + work_mem * max concurrent connections * N). It looks like a difficult formula to understand, but when you look at the slides and relate the pieces it makes sense: that N is nothing but the number of sort operations happening for each query, because sometimes a single SQL statement may have more than one sort operation. It's just a formula anyone could come up with, and you can adjust many things in it.

Now the ignored parameters; let's look at a few. random_page_cost. This is a frequently ignored setting: whenever we perform architectural health checks or look at customer environments, we see it ignored. It is the cost of a non-sequentially fetched disk page, because index scans are considered random access. The default is 4, and since this cost is 4 while seq_page_cost is 1, the optimizer can end up preferring a sequential scan more often. If you query pg_stat_user_tables and compare the number of sequential scans of a table with its index scans, sometimes a table that has an index would perform better when that index is scanned, but the optimizer still prefers the sequential scan, and that can be due to these settings: you are telling it that a random page is expensive, so when the planner, which is a cost-based optimizer, prepares the execution plan, the cost comes out higher with random_page_cost equal to 4. But when you have faster disks, and nowadays SSDs are the default, you may be able to set random_page_cost equal or close to seq_page_cost, and in that case the optimizer will know it can prefer an index scan.
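To spot tables where this might be happening, and to try a cheaper random-page setting, something along these lines; the 1.1 value is a commonly used starting point for SSDs, not a figure prescribed in the talk:

```sql
-- Tables that are still being scanned sequentially a lot, compared to index scans.
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;

-- On SSD-backed storage, bring random_page_cost closer to seq_page_cost (default 1).
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```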
wal_compression. This is one more important parameter, available starting from Postgres 9.5 if I remember correctly, and again one that is frequently ignored. The default is off. What does it do? Every time a checkpoint happens, the first change to a page, a database block, after that checkpoint always writes a full page image to the WAL, and the subsequent changes only capture the WAL records anyway. With wal_compression turned on, those full page images written to the WAL are compressed. That of course comes at the expense of some CPU, but it is helpful when a lot of transaction WAL is being generated. For example, in one of our workloads that generated around 3.3 terabytes of WAL, turning it on brought that down to about 1.9 terabytes. So it can improve performance when you can take the hit on CPU, which is not huge; even replication has to decompress and apply those records, but considering all of that, turning wal_compression on is a win most of the time.

checkpoint_timeout. A very, very important parameter that I'd like to explain. A moment ago we talked about the full page being written to the WAL after a checkpoint, on the first change to a page. Considering the default checkpoint_timeout of five minutes, every five minutes the checkpointer process performs a lot of writes, and that frequent checkpointing can degrade performance because of the extra writes, compared to a checkpoint_timeout of 15 minutes or more. These 15 or 30 minutes are just example numbers, but you may be able to increase checkpoint_timeout to a bigger value, because that is how you reduce the number of full page writes into the WAL, and the amount of work the checkpointer itself has to do later is also smaller. When you leave it at the default, every five minutes the checkpointer starts writing, and that can cause a performance impact. But when you increase checkpoint_timeout to a bigger number like 15 minutes, 30 minutes, or one hour, the crash recovery time after a crash can be longer, because now it has a lot more work to do: instead of replaying the transactions from the past five minutes, it has to go through everything from the past 15 minutes or one hour. You can test that in your environment. And even before a shutdown, since a shutdown checkpoint is issued, you could perform a manual checkpoint and then issue the shutdown. You can also log all checkpoint activity and see how frequently checkpoints are being performed.

The next ignored parameter is checkpoint_completion_target. The default is 0.5. It is basically helpful for spreading out your checkpoints. Consider checkpoint_completion_target of 0.5 and checkpoint_timeout set to 30 minutes: 0.5 times 30 minutes is 15 minutes, so your checkpoint activity is spread across 15 minutes. Instead, you could set it to 0.8 or 0.9 and spread it across, say, 27 minutes. That way the writes performed by the checkpointer are more evenly distributed and it has a much more relaxed window to do its work. So this can be a helpful parameter too.
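Put together, the checkpoint-related suggestions above might look like this; the specific values are the illustrative ones from the talk rather than universal recommendations, and log_checkpoints is included just so you can watch the effect in the logs:

```sql
ALTER SYSTEM SET wal_compression = on;
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET log_checkpoints = on;   -- log every checkpoint so you can see how often they run
SELECT pg_reload_conf();
```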
And the last ignored parameter: max_wal_size. It's not just checkpoint_timeout that triggers a checkpoint; if max_wal_size worth of WAL is generated, which defaults to 1 GB, a checkpoint is still triggered. So if you increase your checkpoint_timeout to 30 minutes or one hour and still see checkpoints happening every five or ten minutes, look at max_wal_size; you might have to increase it to 5 gigs or 10 gigs, because that much WAL could be generated in your database. Checkpointing is one of the important activities you need to tune in a production database; it looks simple, but there is a lot of benefit when you really tune it right. So make sure you have enough disk space, check checkpoint_timeout, log the checkpoint activity, and accordingly also look at the amount of WAL being generated. There are functions available in Postgres, such as pg_current_wal_lsn and pg_wal_lsn_diff, that you can use to see the starting and ending WAL positions across, say, a one-hour window and understand how much WAL is being generated per hour. It would definitely be helpful to consider this parameter as well.
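A minimal psql-flavoured sketch of that measurement (the one-hour sleep is purely for illustration; in practice you would sample the LSN from a monitoring job):

```sql
-- Sample the current WAL position, wait, sample again, and report the difference.
SELECT pg_current_wal_lsn() AS start_lsn \gset
SELECT pg_sleep(3600);   -- one hour, illustrative only
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn'))
       AS wal_generated_in_window;
```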
One use case where we really measured this was a benchmark using sysbench, a multi-threaded benchmark tool: a 335 GB database on a dual Xeon CPU with 56 threads, 256 gigs of RAM, so almost everything, the active data set, could quickly sit in memory, nice SSD storage, and sysbench OLTP-type workloads. Just to speed up the data load, and I'm perfectly aware of what I told you earlier, that when you turn off autovacuum you need to rewrite or hack its logic or at least know that it can really cause a problem, I disabled it; but right after the load I performed a vacuumdb with parallel jobs, and I was also keeping track of the transaction IDs being generated, so I was able to do that safely. So just to speed up the load I set those parameters, and then I ran the sysbench TPCC test. With just the default, untuned parameters the result was around 1,978 TPS, and with the tuned parameters, the settings we discussed just now, it came out to around 5,730 TPS. This was on Postgres 11, not on 12, which was still in beta at the time. And crash-unsafe settings, like disabling synchronous_commit and fsync, which people should not do, it's not recommended, got me even more TPS. That can be helpful while you are doing a data load and you understand what you are doing and monitor your transaction IDs, but otherwise, no. Here is just a better representation of the numbers through a bar diagram, and these are the tuned parameters, basically most of the things we discussed today, that helped reach those numbers. I wouldn't say these are the only parameters, but they are some of the parameters where you can get a good benefit immediately if they are not yet tuned.

Exactly, so in the beginning I was talking about shared_buffers, where we consider magic numbers like 25% of RAM or 8 gigs, but that may not hold in all cases; when your entire database, or the active data set, can sit in memory, you can definitely increase it. There was also a beautiful blog post, and a presentation, by Robert Haas explaining how he got much better performance when he went up to something like 90 percent of RAM for shared_buffers. So it all depends on your workload, the server resources you have, the database size, and the active data set. For my environment I was able to determine that the hot data, the active data set, was around 190 gigs and could all fit in memory, so I sized shared_buffers for that instead of copying the data from the OS cache into memory again; that's what even Amazon Aurora does, they allocate 75% of RAM to shared buffers.

Here the autovacuum_vacuum_scale_factor is set to 0.4, but I had my own manual vacuums running in the background, through table-level vacuum settings. There were 15 or 18 tables for which I created a separate cron script performing a different vacuum logic than what Postgres would do, because manual vacuuming can be more aggressive, and because of the way tables get into the autovacuum list and Postgres vacuums them; for my example I was talking about how a table with fewer records can get more priority for vacuuming, so that's why I had to write manual vacuum logic for those tables and loosen the global settings. Sometimes you can even set the thresholds to one billion, two billion, or ten billion records so that autovacuum vacuum never starts on a table and you handle it manually. These are settings you can play with, but carefully; it's not generally recommended. This was just for this test, because I had done some other magic in the background.

Any other questions? Before that, I would like to thank Percona, who sponsored me for this conference, and Postgres Open for giving me the opportunity to present here. We also have Percona Live 2019 taking place in Amsterdam, Netherlands, definitely an interesting conference, covering not just Postgres but MySQL, MongoDB, almost all the open-source databases, and you can use my discount code to get a 20% discount and have fun in Amsterdam. Questions? Are we on time? I think we're two minutes ahead. In case somebody would like to know about transaction ID wraparound and how it works, feel free to reach out to me, or just search for "MVCC internals Avinash"; that's one of the talks I gave at PGConf India, and it clearly covers transaction ID wraparound and explains most of the internals. It's a fifteen-minute talk; you can find the slides and everything there. Thank you. [Applause]
Info
Channel: Postgres Open
Views: 1,687
Rating: 5 out of 5
Id: lJ18c1hGRBM
Length: 44min 24sec (2664 seconds)
Published: Fri Sep 20 2019