They Enabled Postgres Partitioning and their Backend fell apart

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
if you're using postgres partitioning you might want to watch out for this one you see guys postgres partitioning is this capability that allows you to horizontally slice your large table and effectively put them into multiple partitions or just literally multiple tables the the difference between creating your own tables and partitioning is just it is still presented to you as one logical table and it's hiding the the fact that it's actually multiple tables and the beauty of this is when you do a partition key you can specify based on your where Clause that key and postgres will immediately know exactly which partition to hit so it doesn't really scan all the partitions just scan one or two partition depends on what you're trying to do so instead of working with massive table with billions of rows you work with a smaller tables with maybe millions of rows like a smaller tens of thousands and the smaller the table the smaller the index is the faster the queries that's that's the rule we talked about always I talk about it in my course right my database course the best way to work with a billion raw table is to avoid working with a billion rows you take as much as possible you do absolutely everything you can to avoid touching billion rows you work with smaller thing and that's basically that's the idea of indexing really right we take shortcuts like okay the skip this branches skip this Branch so we can directly hit what we want but partitioning has a price and this is something I never knew before again this is only postgres and Kyle Haley is a really experienced dbas been working in this industry for over maybe 20 years since 1984. so he has like 13 years of experience in Oracle and other new work with the other companies you know this is a new startup they're using postgres and he's been blogging about the some of the Pains of postgres I know I took a lot of good thing about postgres in this channel but nothing is nothing is without you know limitations and it's good to actually understand these limitations and see if there are solutions for them and the good news is there is most of these I would say bugs if you will issues not 30 bucks has been fixed and addressed in your version of postgres how about we actually go into this article and discuss it in detail this is fascinating stuff I love this I absolutely love this stuff let's jump into it alright this comes from Kyle Haley's blog uh titled postgres partition pains lock manager weights I'm not gonna read the whole article I'm gonna read some blurbs and and kind of add my commentary to the sort of I'm gonna reference this article in the in the show notes below of course this episode is available on both in podcast form and on YouTube so you can listen to it if you prefer different if you prefer the podcast form so let's get started so the blog starts with showing some charts about a degradation of query performance application retries and failures and essentially weights and errors and usually when you when you see that you freak out right there's like a failures and and weights and if queries are are waiting and and essentially causing other queries to avoids and failures at the application Level debugging and logging that's when when basically things are going wrong at the database so it's certain time chart that at the start at this particular point and is adding a very nice summary of what happened so let's read that summary in the above charts our production system hit the wall by a blind siding pile up on a lock manager weight he doesn't explain what a lock manager is neither did he know what a lock manager was when he when he encountered that neither did I to be honest so I don't know what a lock manager is right and causing the application I'm gonna we're gonna explain all of that causing the application to start hitting thousand errors a second that's just nuts right I managed to mitigate the pile up by detaching seven partitions for the core table so now he's explaining that I was using partitions right but there's like a story to all of that so the partitioning is already there so he solved that by detaching partitioning but what is the problem right it goes into you know painstakingly good details I absolutely love I just recently discovered them to be honest and I absolutely love the way he goes into the details so now he's saying he's like look at this like picture this you're an experienced Gray beard database administrator walking into one of the most buzzed about an Innovative startup your entry ticket your depth experience so he is really experienced he has like what over 20 years maybe more you know in database you know administrations so people rely on that for the database and stuff right and I like how humble he is here you know and I absolutely like this it's not she's not afraid to call himself out for not getting this right so now they discovered the database like this startup is using postgres and we're talking eight terabyte with this database is eight terabyte in size with 10 000 queries per second is described in the workload and 10 million new records added daily each receiving an average of three updates you know so 10 million rows and each one receiving an update so you're talking 30 million tuples effectively right because that updates in postgres is actually an insert right many things managing this is not small feed and this is where the experience comes in I want to jump into where he actually talked about this is the orders table this is the massive table where things go in and there was a problem with this large order table this is pre-partitioning right so this table became so large that vacuuming started to stall you know query is getting started to slow down and as a result as a you know naturally you partition the stable into smaller and smaller and smaller small table now you can make strategy what do you partition with right and one the strategy that he initially picked was to Partition by day so by date the order date but to Partition by day so every table every day you get a new tables that's what it is because hey it's a 10 million rows it's like 10 million a table that's that's a nice right I would have done the same thing to be honest like okay 10 minute table let's do that all right so he started here's another complete blog I'm gonna reference it as well he's referencing it to say hey this is where we did the work we moved from a single table to a partitioned by day right so every day there is a new table added so think about how many tables this instance has now and instead of one big table now we have smaller and smaller table but a lot of them right and there is a partitioning so still looking at the orders table but behind the scenes it's it's essentially um it's essentially partitioned right cool right so they did this work everything was happy everything was good they so great performance you know so they did the conversion April 1st 2023. this is where the April 1st they did it everything was good he didn't pay my attention they he sold like a little Spike up and it was so tiny it didn't really zoom in just to take a look at May 6 to be exact 2023 right the database begin to experience severe lock manager weights that's when the client started feeling that's why the user started to feel it you only you I mean yeah you can have all the chart and monitoring you have right but oh the real real production is like you only really take a look really deep look when something is affecting the users so when I click a button or I submit something and the backend application struggles to process this because of another backing request to the data store in this case postgres then you start your notice okay let's take a look at the neighbors and there it is May 6 lock manager wait like what is lock manager it's marked a sudden change from a smooth operation we've seen since the part partition conversion on April 1st the primary function of the database the work EQ operation started to stall so he's showing a graph here for the log manager where it's buy sessions and you can see that there are weights started to pile up and especially in May 6 that weight started to really show there was something but it's barely noticeable was very noticeable right so and then he's like comparing it with the application errors so it's like matching up that's how we that's how you do basically forensics now he's doing DBA forensics you don't see really much details until you do log scale right so logos skills basically scales your chart so you can see everything and there it is when did this happen the real thing that started to happen is actually on April 11 they started to see these weights right it's not as much as May 6 but April 11 is where it started so a mere 10 days remember what happened April first they started partitioning every day they add a new table remember right every table is a 10 million row table that's the average here right so so let's read this part that's what I'm interested in when we examine the log manager wait on a logarithmic scale it becomes apparent that these weights actually commence on April 11th a mere 10 days after the partition conversion this suggests that once we had 12 partitions including all data in in one of the partition so total of 12 and then a partition for each day the database began to struggle with managing the volume of lock requests now we're going to need to talk about lock request here with each partition having 22 indexes so this this is the first time I think he mentions how many indexes right or I mentioned maybe it did so the table has 22 indexes that's a lot of indexes but they are necessary of course for for the business but when you partition you don't have just normal 22 indexes anymore you have 22 index per partition because it's just another table right so you still need these indexes on every single partition so now remember let's talk about locks a little bit when you do end a select on a table on postgres at least or insert or update you acquire essentially two types of logs something called the table lock and something called the roll locks and the table lock or the relation lock when you do a select you do something called we acquire something called access share access share lock it's a very weak lock it doesn't do anything it just acquires it such that to till the lock manager such that to tell the who the database say hey someone is selecting something from this table it doesn't care what right just say this table is being selected by this process and if you know postgres each connection made to postgres creates a new back end process so it's like a one-to-one thing and that's a whole discussion uh in in uh in a neat recent thread to for postgres to move from processes to threads that's a whole different thing you know discussion so it's a heavyweight operation to process versus thread so we have processes so every connection Creator process so and every process if it queries anything any table touches any table in a transaction okay it it create it gets this table lock right so if you select you get this access share lock it's a very simple log why because we want to know that someone is selecting to block other things such as ddls you can do a ddl if you're selecting we don't let you because those are conflicting logs I talked about that in another video I'm going to reference it there here to learn if you want to learn more about it but there's you need to have these conflicting locks right so there are certain table locks that conflict with each other some are they do not conflict right so while you select you can still insert you can still update that's fine and we also acquire raw locks that I'm not going to talk about for update logs when you do an insert or you do an update we do an F4 update log at the row level but here we're talking only about the table level locks and also index blocks so when we do that we acquire let's do just the select right select across that table log if you do an insert update or delete we acquire I think called raw exclusive lock that's while it's confusing that's actually a table log right raw exclusive log is actually a table Block it's just called Draw exclusive right but that is another type of Lock and it's a table log so that's that has to be an entry that says hey I am acquiring this lock I am acquiring this lock can you imagine the cost of this locking so you need a lock manager to manage all of this thing all of the stories are stored in a table called pulse the postgres lock stable and there's a post the lock manager manage all this stuff you can see that it's not cheap to manage many many logs so let's go back to the article C here each partition having 22 indexes querying the order still just just squaring forget about editing just query because querying is a select select is an access share lock right so just querying you get 20 220 locks per query why how do they get that because you have 22 indexes right and for each partition uh each partition has 22 index not only you get a lock on the table itself you also get a lock on the indexes as well access share lock on the indexes yes because an index technically is also a relation right in postgres everything is a relation so you have 10 times 22 indexes I think he missed to include also the the partition itself so technically it should be 22 plus 1 so 23 times 10 right really that's that's how it should be because you want to also include the table itself the partition itself so it's actually 230 and that's why you say 2 220 plus right and it's thin partition because every day they are adding a new Partition that's there's a can you do it like automatically it has to be a script that doesn't right to automatically add a partition so fast forward again they didn't notice that but again when you get a lot of queries you get all of this stuff you might say I was saying we're querying okay sure we're clearing the order table if you query the entire table you get 220 230 logs right because you're scanning all the partition but what if I select one row here is where a performance degradation and postgres version 13 which is what they are on exist if you are querying even a single order on a specific date let's say I want to pick give me the order on this day that that's it just give me that order on the Day postcard is still unfortunately creates a lock which is the table lock access share lock on all the partitions yes this is really bad that just exacerbated the problem that's why he mentions that at the end of the block fortunately it's it's something has been fixed in postgres 14. it's not mentioned anywhere that I can find I actually had to uh I have to verify it myself in the stop boss is 13 and compared to Tobuscus 14 and there it is I query one partition all the partitions are are gets this lock so what's the problem with this so many locks creates contention so fast forward 40 days later and the lock manager Waits had become a significant part of the total weight at this part at this point we had 40 partition now times 22 times 22 10 23 if you add then you have what 880 logs per query despite my effort to explicitly limit queries to specific partition ranges the database continued to enforce locks on every partition and that is I wouldn't I don't want to call it a bug I think it was by Design because when you query they still need to look at the other tape partitions postgres 14 they optimize that not to do that that's my thing that's how I think about it I don't know if I call it a bug to be honest maybe you can disagree with me he's sharing the same thought I had here upon initially partitioning the orders table my apprehension centered on the potential issue of increa of the an increasing number of partitions with parse and execution time progressively lengthening I had anticipated a slower gradual increase in latencies I certainly did not foresee the Deb is hitting an Abrupt wall that's a wall actually they have a lot of partitions okay as we scan the Partridge as we figure out which partition to scan we're gonna hit a little bit of a latency but I didn't really know that we're gonna hit it so quick and 40 day 40 days after that and now he's sharing queries like okay how to find out all the logs how to find out all the other stuff like I'm gonna describe part of this knowing what a lock manager is has a Google re a Google search didn't really turn up much stuff but there was something called fast path and I want to talk about that a little bit you know you see I'm gonna summarize this little bit and then we'll skip skipping a little bit fast path locking is locking that doesn't use the lock manager essentially and as a backend process which is you think of it as a connection you get 16 of those and essentially what we use we use like an in-memory data structure like a tiny in-memory data structure a part of the lock manager and there is a hard coded literally a hard-coded I wish I can have it I think I have it here maybe actually do here's the source code for those listening I'm looking at file and postgres proc.h there is a constant called FP underscore log underscore slots underscore pair underscore back end and that constant is 16. so it's it's not configurable at all it's a you have to recompile postgres to change it and so you can get 16 fast locks what does that mean it means once you hit that 16 table relation which is whether indexes or tables you're stuck that's it you're now out of your Slots of the fast path now you have to switch to this slow path I suppose which is using the lock manager right if you use the lock manager then air your best just basically fighting with mutexes you know because I was like how do you guarantee shared memory access right with with this it's like you're contending a lot of processes are contending on these uh shared resource which is the lock manager and that's where problems start to happen if we acquired a lot of logs problems happen and here's your he's showing a query how to actually see if you are in the fast path or or or the not of the fast path essentially and it's it's in the PG lock it's literally at Boolean Fastpass true or false and it's pair process it will tell you that okay this lock has a kawaii has been acquired with with fast pass fast pass what's that isn't that the road thingy foreign Genie the Disney magic Genie right the fast lightning Lane so fast path right or the lock manager essentially so every lock entry you see you can know if it's as required by the fast path or the slow path essentially pair process which is a back-end process and guess what not only that he you're essentially other processes can acquire locks as well right like the vacuum when you do vacuum and you also acquire lock but that's another another account essentially right it is it has its own count per process I don't think uh that counts against the fast path the vacuum Auto vacuum may have a different counter for a fast bet I might be wrong there and then it goes into like okay how how did I find about this it goes through the source code the only solution is to actually read The Source can still learn about this and um so yeah if you have a lot of partitions a lot of tables essentially and and your process is reading many many many tables then be wary of this especially if you have long transactions if you're like if you do a transaction and you commit quick query and commit you release those lock so it's like what do I do he's like what do I do right solution is like okay time to move from daily partitions to weekly because he just learned that hey the daily part having a lot of partition is a bad idea because of this so he moved from daily to weekly so now he has a hundred million rows per partition instead of 10. well let's say 700 770 million seven a week has seven days right and we so he started what what do you want to do look at this let's just start moving stuff so he had problems moving data because he wanted to move seven tables into one and even that created more contention because now he's inserting and creating more logs he's just been locking himself and he'd been on call left and right and so so nothing went right essentially for him at all and also very good important Point here is when he was moving the seven partitions into one weekly uh were inserting right remember we have what we have essentially 22 indexes so what he said okay oh let me drop these indexes so inserts are faster because we have to maintain index management like I don't I don't need indexes I mean it's a new table anyway so let me just drop it by dropping the index he created an axis exclusive lock which essentially created a lock on them on that table I don't think that query be that table was not queried actively there but the fact that it is dropping the index essentially created more problems for him because now locking contention started to happen more locking and more locking when you drop an index essentially what happened is the the full table is is locked as as exclusive right unless you use concurrently which I don't believe he did there for other reasons as well right so it is really interesting when you knew actually into the in the mix of actually doing something right everything goes wrong and this is exactly what happened back to the video oh look at that they are using nginx as a Gateway so it's this is where it says that Gateway errors which is nginx uses the back end the back end returns a Gateway error and the backend talks to the database and that's where the weights happening right so what he started to do is essentially all right let's now okay we moved thanks to the to the Wii to the weekly now let's let's start to detach the partitions another problem happened detach essentially to detach this daily partition because you no longer leave them right you need to remove them and to detach these daily partitions posca is actually a consider this a ddl because you're doing an alter table detach all right and when you do that it's a ddl what does a ddl do acquires an access exclusive this is the most aggressive lock it's not a weak lock like the other types right so we we talked about like weak locks or can be candidates for fast path right only the weak locks by the way I I don't think I mentioned that all right which is a weak locks or things that you acquire an insert update delete and select that's basically it I think also other operations maybe the auto vacuum also there's a way clock I might be wrong there but but access exclusive is the most aggressive one it's like doing a vacuum fall not the vacuum a vacuum full or an A Drop table detached partition is also an axis exclusive so so just by doing that he actually blocks by doing that you blocked essentially selects too you can even select from the from other tables the pattern table gets I think when you detach it you get an access exclusive on the pattern table so the order table became completely unaccessible it's like oh my God that's even just skyrocketed the the weights even more and more so and guess what in postgres 14 there is a fix for this called detach concurrently which acquires as a softer look which allows you to continue to edit essentially and that's and that's basically the problem yeah so this is my tool that I wrote to actually show all the postgres blocks and what conflicts with what right so uh there if you scroll down you will see that uh detach partition there is detached partition and there's this Earth partition concurrently if you click on detach partition it requires an access exclusive log and look what it blocks it blocks command conflicting with alter table detached partition pretty much everything select selects for update inserts you cannot do anything when you do a detached partition but if you do it concurrently you can uh the only you are allowed to do a select you're allowed to do an answer you're allow to other update delete but you can't do like normal vacuums are blocked right re-index is blocked when you do that so so it's likely as softer kind of lock with concanity but he can't do that anyway because he's on posca 13 and postgres 14 added that concurrently feature and again I didn't see that option in the feature Matrix for some reason but I noticed it in the dock so I was like okay concurrently is available here but it's not available in 13. so that's uh pretty neat he included like he's like okay I had to kill some processes to move forward and now now that he moved to weekly he knows he's gonna hit it again in four to six months as the partition more partitions uh as more partitions start to to accumulate one options upgrading to PG-14 where apparent apparently is misspelled apparently if I have a predicate filter limiting to query to one partition then it will only take locks out of that partition uh it will only lock that table when I say lock a table will not like prevent editing or anything it's just because it's that soft weak locks as we talked about it right only on that table like right that's what you what that's what I thought too when I query that partition I only you should only lock only that partition but no voskus 13. and bill before it locks all the partition and that creates more and more locks so there's more management for locking so locking is nasty when it comes to this stuff one thing I think I mentioned him on Twitter uh for this concurrently thing it's not in the blog but essentially you you could have also detached partition concurrently in postgres 14 as well and and that also will allow you to kind of well not kind of stop everything right it will just uh it will allow you to continue editing at least and querying right you won't be dead on the wall so yeah I absolutely loved this a Blog I'm going to share it below for you guys but yeah I learned a lot about this to be honest right postgres partitioning just watch out for it if you're on positive 14 or later you should be fine right it's not as problematic as possible 13 but man this is really interesting and I don't know what other problems might be there but yeah just uh I thought I'll I'll talk about that a little bit see you
Info
Channel: Hussein Nasser
Views: 12,091
Rating: undefined out of 5
Keywords: hussein nasser, backend engineering, postgres
Id: YPorP8BsF_c
Channel Id: undefined
Length: 31min 52sec (1912 seconds)
Published: Fri Jun 23 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.