Divide and Rule Partitioning in PostgreSQL 11 by Rajini

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

okay good afternoon everyone is everyone awake sleepy okay so please don't sleep on it can be boring but it can be interesting if you'll focus so hi my name is reg rebellion I'm working with Fujitsu today I'm going to talk about partitioning in post verse 11 it's kind of mixed not only dedicated to post castle Evan but I have some mix of PostGIS 10 and 11 and some comparison so that you can understand what is what is different in 11 from 10 so let's begin this is me I think my face is doesn't look like that it's being taken like six months back so some differences there but yeah at the moment I am working with Fujitsu as a database Support Manager I started my career around nine years ago as a database administrator specifically in Oracle when I got my opportunity to work first time with Postgres to be very frank I was not having any idea about his Postgres and by then I was working for like five years in Oracle but I was not having any idea of Postgres so I started learning post base in 2013 in my first job and I really find it similar to Oracle was pretty easy for me to get this new database because it was really a good database by then so it's you can see it's actually improving day by day ear by ear released by release and it's I'm just loving it so I'm also there in the community I was a speaker in 20s 2000 17 and 18 in Melbourne so which is under post quest sequel down under conference it's same like epic I think some he is also a member of it and he do have many active members Andreea forms from second quadrant is also part of the same community in Australia I was a speaker in a pact last year Singapore and this year I am the committee president so I am the president of PG sequel down under committee in Australia these are my contact details if you like so let's start so what is partitioning is anyone here don't know what is partitioning okay everyone knows what is partitioning right okay that's good we do have some touches of this partitioning here and there yesterday in many talks so stiffened mention about partitioning improvements in 11 and there were some other talks so I will not waste much time what partitioning is but you guys really know what partitioning is like it's dividing your parent table into the smaller smaller table so you can say partitions and if you have a database with this kind it's better to partition like that yeah this is a simple what in the simple word I can say is partitioning is like partitioning of dividing your data into s for your requirement into more relevant ranges or lists or as per your hash so today we are going to discuss about the partitioning will touch what is partitioning partitioning exceptions when not to partition when to partition what was the limitation of partitioning in 10 version what are the advantages and improvements in 11 and then we will touch some of the improvements in 11 with some still some limitation and then what is what are we expecting in it in 12 so what are we expecting in 2 n is actually copy/paste from Stephens presentation so don't blame me for that I'm okay partitioning postgis sequel as we already know that initially it was an inheritance based from prior to version 10 but from version 10 onwards it is declarative partitioning which means that there is no there is no need to maintain the tray and there's no need to write rules for your data to come and enter into the right partition so it can be done from the parent label and it's selective it's very easy to use and it's a convenience so in PG 10 Post PJs Polaroid so in PG 10 we were having a list partitioning and rich range partitioning we have actually added hash partitioning in version 11 yep so this is one of the change that but of the addition and new version it's not new version I should say because as Steven mentioned about the PG 12 already it's going to be in beta in April right in April April May so it's not new but it's good to know if you want to use partitioning in 12 so you should know the you know the differences in all the partitioning over the period of time and all the versions so why use partitioning so usually when you use partitions your queries are more faster so the speed the flexibility and these are cheaper these are the three I should say three main reasons to partition your data or your table so queries will access only the relevant partitions so your queries will not actually touch each and every partition in a table it'll just touch the partition which where the data is so in that way it's faster so reporting queries X is most of your most of all your data in the entire partition so if you know that your data is partitioned properly you will actually go and hit the hole hit only one partition so that's that's against again it's a fast it's again set up on the database level so it's transparent your application so your application needs to just fire a query and the database base will return the data according to the according to the partition so it's transparent to your application you don't have to do anything at the application level it's flexible easy to main you can attach you can D attach the partitions you can create the partitions so it's very flexible you can have you can archive the partitions by D attaching them you can archive the partitions and in that way it's flexible to take the backups if you want to take a backup of a partition which you are not using very frequently you can take it separately so that's what it's flexible in that way it's cheaper because there is no need to maintain the indexes on your partition so it's it's cheaper when used properly so yeah you can use actually you can keep your one partition on fast san storage you can keep another partition which is not which you are not using on the slower one so in that way it's cheaper you can actually switch your storage but it is not one fit for everything so you have to actually think before using partitioning whether it's a requirement for you and all divide and rule always not work in a positive prediction so you have to think whether what if any when you are trying to divide is it actually going to rule or is it actually going to ruin your system performance so when to partition so we had some suggestions last last days well when to partition definitely when your table size is very big how big we had the discussion yesterday but PostgreSQL documentation says the rule of thumb is if your table size is growing more than the physical memory of server it's when you need to decide to go for partitioning right so it's not one fit solution for each requirement you really need to understand when to partition another thing is when you when you know your data data archiving is a requirement when you want to keep the data for like 20 years but you are not accessing that data on the daily basis that's when your archiving is in is important so you can actually archive that historical data without even deleting it from your main table right so that's when your archiving is required you can think of partitioning so if you can see in the example I have a sales table I have three partition based on the ear 20 18 19 and 20 20 and I no longer need 20 17 data I'm not reporting I am not accessing 20-17 data but I don't want to delete that that data even right so I can actually make it as a make this as a archive one put it on the table space which is on a slow storage right and rest everything you can keep it on the tape or save on the table space which is on fast storage so that way you can actually maintain your data and achieve the performance this is again the same thing if you need a better IO you can actually keep your partitions on the separate table space depending upon your business requirement how frequent you want to access the data there are some partitioning exceptions that you should also keep in mind while deciding when to partition or not so if you have any of such requirement which is there in the exception partitioning is not your choice right like if you want to truncate if you want to use a function if you want to use truncate only on the potential table you can't do that because truncate only on the partition table partition table is the parent table right and the partitions are the child so on the partition table you can use truncate only because the partition table is having child's underneath right so there's a dependency so only to add or drop a constraint you can't use only to drop or add any constraint on your partition table because now it's a hierarchy so you have to think about all these exceptions your column should be same in part table and partitions so columns should be same if there's a requirement to have a different columns on the chair child table partitioning will not fulfill your requirement so you have to be aware of all the exceptions before actually thinking of partitioning another thing is before you actually think of a partitioning you should also know be aware that you should know on what partitioning key you are going to partition your data so you can take advantage of partitioning if you are using your partition key in where clause if you are not using your partition key in where clause you are not going to get any benefit right so these are the few things that you should keep in mind before even deciding to go for partitioning let's see how it is and these are very important things right that's why I have put them in my slide because see partitioning is very cool we can do that we can divide that but at some point time you should not think I have done a very big mistake it's a blunder so you should have all these things I believe before going for anything rather than having positive out effects you should have negative of its first what could be the side effects what could be the you know impact of if I choose this so if you have those exceptions those constraint in mind you can actually take a good decision how to partition your data when not to partition so this is again related to the same thing we note the table size is very small you can't go for partitioning like we are having a talk yesterday you can't go for partitioning for one gig of table right so you should know how your data is growing if you can say okay today my data is five gig but two years down the line I'm going to be a 5 TB you can plan in advance so you should know the growth of your data you should know what is your current requirement and what is your future need so according to that you have to take a decision you are not facing any performance issue your queries are performing super fast giving you results in milliseconds right you don't unnecessarily complicate the things by partitioning because it's cool I can say okay my organization is using partitioning the thousand partitions in place no it's nothing to feel proud about it's how well you you actually think about using any functionality in any database in any technology so groundwork should always be there your application is mostly read/write intensive so you are actually reading and writing the data every day basis so that's where you need to think so are if you want to have a if there is a need to archive your data keep it like don't delete it from the system you might need it maybe after two years so that way you you need to think that your application is right in the same right intensive write intensive it's writing writing writing then you can actually divide your order according to your requirement so your when archiving of your data is the requirement you can actually think is not the requirement you can actually think of not to partitioning not to use partitioning so we know that we started partitioning from version 5 9 5 click inherit inheritance 9 5 9 6 or from mine 6 is from 10 yeah yeah so inheritance was there based on the rules based on the triggers you can actually do difficult to maintain difficult to maintain but from version 10 onwards we have declarative partitioning right but they are still some limitations in version 10 which actually community looks seriously and improved in version 11 so let's see what are the limitations that we had in version 10 so in my example I'm taking a sales table you cannot actually create a primary key foreign key and indexes on a partition table I am talking about partition table not about a table there is a difference right you cannot you can have partition partitions can have index but your partition table the index on the partition table which is on the top level cannot be created if you have partitions and if you are creating and that index on the partition table will not be applicable on partitions as well so this this is missing in version 10 you can't have primary key foreign key indexes on your partition table once your table is partition you can have it on individual on on on single table if it is not partition that's that's not the point here there is no concept of hash keys you can't have hash partitioning in version 10 there is no concept of default partition which we were discussing yesterday so if you are trying to insert a data which is actually falling beyond the bounds that you have defined for the partition in version 10 you will get an error but in version 11 if you have defined a default partition that data will go to the default default partition so it's missing in version 10 again there is no concept of partition pruning so when you do a select on the on the partition table which it will actually scan through all the partitions right in version 10 automatically there is no partition pruning in version 10 right it's there yeah so the point here is there is nothing like partition pruning enable intend which is there in 11 by default yeah but not the way it's working in 11 it's different in 11 so that's that's the whole point so here what we are talking about the simplicity you know so it's different in 11 you can achieve you can achieve everything here in version 10 one way or the other right but it depends how simple is it it depends how are you going to achieve it so this is the same thing that I have just mentioned in the diagram so you will see many repetitions in my presentations because I always refer to the pictorial thing and then I have a text for your reference so this is the same thing when there's no default partitioning no support of hash partitioning in PG @ PG temp so based on this let's see what what are the improvements in version 11 so same table you can actually create primary key foreign key and indexes on the parent table you can have child you can have a different default partition you can create a partition based on the hash key you can actually update the data from you can actually update the data which can go from one partition to another which was not possible in version 10 you can't do a update on a data which is in one partition you will get an error so this is possible now with version 11 and again same thing with the with the partition pruning you can actually your planner will actually go and hit the write partition where the data is so again the same thing so in general now the partitions have the same capabilities like ordinary table where you can create an index where you can create a primary key foreign key and then you can do the update and in sudden all sort of things so now I'm going through the all these things and all these improvements in detail for the sake of simplicity I have taken the example of range partitioning so I'm creating a table parent table and I am creating few two partitions here inserting something and here now on you will see some differences in version 10 and 11 so you need to pay attention so you can have there's a concept of sub partitioning as well you can actually create a partition and then create a create a partial partition table the child or the partitions and then you can create a separate table and attach it as a child of your already partition table right so that's called sub partitioning so it's it's it's possible in Postgres in that way so you can have an inheritance partitioning like this you have one table then you create separate table this this EMP one ones usually is a separate entirely separate table which is partition and I have attached this this table as a party as a child to already partition table so this is called your sub partitioning yep so if you can see in the example in PG 11 if you query your partition table you can see that this EMP 1 1 0 is further partitioned right but same thing is not not here so now the view is more friendly so it's the same thing here as well I have actually created this table as a sub partition but I cannot see when I'm when I'm seeing the describing the table ID I really can't figure out which table is for the partition but in you can actually see that that table is for the partition this is it's very minor but it's very you know useful friendly it's the same thing if you engage it and you can create an index on your partition table but in eleven you can actually create an index it will be applicable for your all current partitions and will be applicable for your future partitions as well so whenever you are creating an index on a parent table that index will be applicable for your partitions your child as well and when in future you will be creating new child you will be adding say twenty twenty one partition right so that index of the partition the partition table will be applicable to that twenty twenty one partition as well so as you can see in version 10 you can't create an index on your parent table will give you an error but in 11 you can actually create an index on your parent table so then this is a concept of your global indexes and local indexes yeah yeah and this will be applicable for future partitions as well yeah so I will be attaching automatically to the future partitions the index on your parent so if you're using index extensively this this is very good for your performance foreign keys absolutely so foreign key support now in PG eleven you have a foreign key support so you're actually what you can do if you can see the example I have created this stable department and now I am creating a table called department head and referring the department head as a and giving reference to this table and this is um this is my partition table but if you have a partition table and try to create a new table giving reference to it is not possible so you can create a partition table with the foreign key reference to another table but you can't create a new table with the foreign key reference to the partition table so the vice versa is not possible is it going to be better problem okay it's not my fault it's a default partition this is very handy yet come with with some limitations as we were as we were discussing yesterday but still very handy it won't actually just draw your data away it will accept your daughter irrespective whether you have created a particular partition for that other or not so if you can see I have created and you can see the statement here so you can create you can create your table like I'm creating EMB default partitions of table called EMP and you don't have to define any limit here if you can see I'm not defining any limit to create this default partition I'm just saying default right so this is how you create a partition and you are trying to insert a data into a table where there is no defined range if you can see that well you won't fit in any partition right so where that value will go this insert this value will go into the default partition actually right so this is how it is and when you query the partition table you can see this is default so this is how you can identify this is this is my default partition this is an example when you in eleven when you in ten when you try to insert a data which is goal which is going beyond the bounds of your partition it will actually give you okay well something okay no so yeah I'll take my words but so what I'm trying to do here is I'm actually I have inserted a row into my default partition and now I'm going to create a partition which is actually having that value which I have already inserted in my default partition then if you try to do that you will actually get get this error violation error right so the right way to do right way to move that inserted row from default partition to the proper partition is you need to first detach your default partition from the partition table you have to create a table you have to basically create a partition which is which is which is having the bounds that you want to define for the value which you have entered in default partition you have to move the data from the default partition to this partition and then you attach this partition to the partition table and then you can reattach the default partition to the partition table so there are few steps that we need to actually follow when you are moving the data from default partition to the partition that you have created according to your requirements and I'm not sure we are going to we are going to solve this in version 12 or not but you haven't defined any bounds to the default partition right you have to define the bounds mean image yeah yeah yeah that's that's [Music] yeah so here it's only one row right we know what what the data is so as yeah as you mentioned if we having like hundred odd rows 100 or odd rows in this default partition and we really don't want to move or we can't rename that partition because that falls in the different bounds so you have to actually come up with a strategy where you have to define the bounds you move the data from default partition to the proper partitions but still it's it's - yeah yeah but at least something is better than nothing we have something it's not throwing back here data its accepting your data at first place and I think you can you can have some triggers of those sort of things to do this yeah yeah that's manual thing is always there but I believe personally as a user I think it should be improved in the further version so that this can be simplified yeah yep at the moment it's not yeah it doesn't it doesn't yeah might be in future versions it should do that it should be smart enough to smarter way yeah I just think about you have thousand odd rows in your default partition it will be really a difficult task to you know move d-rose to different partition and it's it's TJ TJ just asked yeah yeah so this is again you can attach so this is the whole process that we have discussed just now directly yeah once you attach the partition which is actually falling into those yeah yep agree yeah that'll be good same thing so this is a new addition in version 11 which is hash partitioning so what this hash partitioning actually does it if you are not sure that how you are going to divide your data you can still have partitioning in place using hash partitioning so what hash partitioning does it it actually separates your data randomly in a random way so it has a partitions and more or the less partitions are having similar amount of equal amount of data into the partitions so this is fairly useful when you want to actually divide the data but you don't know how to divide it so you can make use of hash partitioning in those scenarios so this partitioning hash partitioning is basically based on your modules and the remainders so if you can see in this I have a table hash I have a table and I have actually partition it based on the modules and the remainders so the modules is actual number of partitions that you are going to do and your remainders should not be less than your modules so this is how you need to keep in mind when you are using hash partitioning so yeah this is the syntax to partition so you just it's same as the range in list you just need to say partition by hash and then you need to you will be able to do the partitioning you have to define your modules and the remainders the key thing here to notice your module Clause should always be greater than your remainder as I mentioned and your module is like your number of partitions that you are going to do if you have modules less than your remainder when you'll try to insert the data you will actually get an error so that thing you should mention you should keep in mind how many partitions actually you want and how you want to divide your data so there are some advantages the main advantages when you really don't know how to divide your data you can actually utilize helsh partitioning it's not screwing your your data into the one partition making your one partition bigger it's actually kind of equally dividing your data into number of partitions available at least this is what it says right yeah yeah so yeah this is some this is you can say some and some advantages over your range partitioning but this has been added in version 11 which was not there in version 10 of Postgres so roam aggression on upgrade as I mentioned in mind that that that visual that when you try to migrate some data which is falling into one partition to the another partition somehow your requirement changed and you want to move update or move that row through the different partition you are not able to do it in version 10 but now you can actually do that in version 11 you can move your data across the partitions so if you can see in an example I am actually updating this record where you're joining date is this so I'm actually changing the employee ID so in version 10 this will give me an error but in version 11 actually I can update the record so this is a improvement in version 11 in as compared to version 10 partition cloning as we are discussing so it's enabled by default in version 11 so what partition pruning does it it actually performs it actually it actually enables your planner to look for the particular partition which is s for your partition key so this improves your performance it will not go through each and every partition it will just go through the exact partition its I should not say it's it's some something new in eleven it's something simple in 11 and this this is a parameter so this is an example when partition when I have enabled partition truly in version 11 if you can see my key I'm looking for 2018 and it is actually going for some partitions right but if I'm in a in a if I disable this for the same thing it is actually going for 2017 partition as well 2016 partition as well so this is I have done on version 11 when you enable and disable the partition pruning so this is a difference you can say okay so this is what we have in I think still there might be something I might be missing here but these are the the you can say the major changes or improvements in version 11 as compared to version 10 as far as partitioning is concerned and for the sake of simplicity I have just put some some useful commands here if you need to know how your rows are distributed across the partitions you can actually use this query if you know which partition contains the row you don't know how the partitions how you have partition but you want to know your the in which partition your record is you can actually use this query to go and see which partition your record is containing and it might be useful when you want to update something right so you know you you need to know where where it is and where where it is falling so these are the few commands that you can use these are the few they couldn't they can be many so I have just listed down what I have came across while doing this preparing for this presentation so as we are discussing there are still limitations in PG 211 and I am Not sure I'm not the right person to say whether going to be covered in version 12 or not but yeah there are few things we should be looking for like foreign key thing it should be vice versa as well referring to your parent for partition table and the and the table default partitioning it's good to have but need to destroy trash so we need to improve it in the future releases so that we can efficiently use utilize the default partitions what's next in PG 12 is actually copy/paste from Stephens slide I have no I have I have added some you know okay so it like still more to be here in the future it's my thing so I haven't you do this so I think how many partition you should have is actually again something you should look into whether having thousands of partition will improve your performance or will decrease your degrade your performance right so what I can actually think of moving onwards we should have we should not have a there's no limitation though but we should have not a constant or number of partition compromising on the performance so what I prefer Lee I personally want community to look into is as we grow we should not compromise with the performance so maybe thousands of partition with better performance is what we should look for inside performance should be improved we yesterday saw deep we can we are expecting the attach and attach in version 11 concurrently might insert during the copy is also what stiffened mentioned PG partition tree function is also what we are expecting in version 12 to be there so that we can have a better view of a partition nevertheless we have so this is something which is better than the declarative the inheritance partition that we were having in past and as Postgres is growing the features are getting mature and I'm sure by version 13 and 14 partitioning will be matured in Boswell's and should be utilized no efficiently and automatically many things should be there I have just compiled as we were discussing about PG partment yesterday so I have something you might need to correct me sniffing so this is something I have actually googled yeah and as per my experience as well what is there in nine six ten eleven and PG partment so PG partment is a good tool which most of the companies were using where the requirement was there for partitioning but I think PG eleven is fitting into the gaps now as far as you know partitioning is concerned so I'm hoping like in future version twelve and thirteen will have everything tick-tick-tick against yeah so yeah I go and see what PG Patman does what natively supported in Postgres have some comparison because definitely nothing is better than native support it's in the engine you don't have to hook in anything you don't have to think about any further complexity of using any further tool maintaining its version you know all those things if it's native it's best it's in the engine so go and have a look of what your apartment gives you what PG eleven is if you have any questions anything you can always connect me so am i running out of time all I think final minutes she has yeah yeah oh okay doesn't use it okay yeah that's where there is a star the little star there [Laughter] yeah yeah so it's really I felt it difficult to put it that's why wherever I was like half confident I put star and the tick so it is it's being done maybe in a different way or maybe like updated in the new version that I haven't played around with so nothing is available as such on the internet anywhere where there is a version by a version comparison feature by feature feature comparison in ten and eleven partment so you might just need to update it as per your experience what you have gone through what you are using what you will be using in the future so that's a beauty of any tool around pH Postgres that they do update it according to Li the server version according to the server you know the features and enhancements so that's why the tools are up to date but I personally am personally a native kind of person who likes to have everything in some greedy you can say yeah so you can have yeah I think you can have partitions on the fly using PG part Munroe you can script it to create it automatically as as per your define thing yeah so that's that's the main advantage of peach apartment any question anyone

Info

Channel: PGConf APAC

Views: 1,203

Rating: 5 out of 5

Keywords: Pgconf apac, Postgres confernce singapore, postgres, postgresql, database, open source database, RDBMS, DBMS

Id: yl_YkPafcfs

Channel Id: undefined

Length: 45min 49sec (2749 seconds)

Published: Sun Apr 07 2019