Daniel Silk: 10 Things I've Learnt About PostGIS

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
thank you very much Martin it's great to have you all here today so yeah obviously I'm going to be running through ten things I've learned about Pocius relatively quickly and there's there is quite a bit of SQL in some of these slides so we'll definitely try and make the slides available later so we can never look back at them it's my clicker working yes cool so just brief introduction post just at lane information New Zealand has been in use for quite a long time in 2004 the geodetic team started using post juice from about 2014 we've had a number of systems which have been developed with cue juice as a front end and for data manipulation and post careers post juice is the back end so it started with the gazetteer which is our place name status hit and has continued through addressing and road data management and more recently building outlines in aerial imagery surveys so the first thing that I've learned about post juice is that spatial indices are your greatest asset so adding a spatial index is relatively straightforward and then updating the query planner so that it makes best use of that spatial index especially in it is effectively a very simple representation of your geometry being stored in your database in order to do rapid analysis so this is to Papa Tongarewa and Wellington New Zealand it's quite a complex shaped building button up a little bit of building next to it but its representation as a bounding box is just a square and so the entire water Wellings from waterfront looks something like this those bounding boxes are then grouped into progressively larger rectangles and so when when doing you know an intersect or something like that the database can very efficiently compare these very simple geometries before it needs to get to a point to actually extract that complex geometry and do the final comparison so the next thing I've learned is that it's fast and it's mostly because of the previous point these this is a representation this is three hundred thousand building outlines that cover the Wellington region if we do something like find the centroid of those building outlines in queue just three it takes about 23 seconds and if we do it in post just it takes about three seconds however now that I've put this slide together Niall will probably rewrite this in troy'd implementation in QGIS and make it ten times faster than what it is at the moment but that's competition that I'm quite happy to see continue so there's just a bunch of centroids on our building outlines and the Wellington CBD what I've also learnt though is that if it's slow there's probably a better way if a post just query one runs for like two minutes then I just been it and start again because I probably did something wrong or forgot to add a spatial index or something but this particular case part of our light our process is part of the QA process for our ladder surveys is analyzing the grids of ladder data they have been supplied to us so something really simple that our ladder analyst wants to do is find out which of these 1.1 million grid squares that make up the whole of New Zealand are part of the latest lidar survey so just a simple list en6 should get the right result but in this particular case it was taking a couple of minutes just to do that one large polygon gone against a whole bunch of small polygons so using st subdivide we can make that one large polygon a whole bunch of small polygons that have a more equal distribution like number of vertices and so if we zoom in a little closer at the edge of this lidar survey where there's more dense points there's smaller squares and if we then find the intersecting polygons from the previous the previous grids do the exact same intersects but on the subdivided instead of the larger polygons it takes a third of a second instead of two minutes so that's just making really good use of the spatial indexes on both of those layers the fourth thing I've learnt is that consistency is key if I am doing a bunch of analysis of different of trying to figure out some spatial relationships and stuff and as I go through the process I really want to make sure that I've got a primary key and I've got a spatial index the whole way through otherwise I might find some inefficient processes of in amongst there so in post just when doing different spatial analysis often end up in a situation where you know in this case if I take a polygon and I want its vertices then I end up with five rows and if I run the query that's showing that I get the same ID the original ID for that bottom five times so I can't use that as a primary key so you can make use of the row numbers that post-career stores so on the previous slide we've got one two three four five on the side which is a representation of that so if we wrap our original select statement with a window function and select the row number over as ID then we get one two three four five and can immediately use that as our primary key so we also obviously make our own functions that have our business logic and store those in the database so combining the spatial operations from from post juice to get the result that we want to get for the task at hand and in many cases we can find similar tasks that we can reuse one function over and over again for different purposes this is pretty heavy in terms of code that basically if we want to find out what suburb intersects a polygon then we've got some sort of business rules around how that works I was run through it step by step these building outlines again in Auckland harbour so this area down the bottom here is the end of a wharf and there's actually is the wharf actually continues a little bit further because it's changed recently and we've got a few building outlines on the end there so as they stand they don't exist within any of our suburbs but what our function does is first searches for all suburbs within a kilometre of that building outline using STD within which makes really good use of our spatial indexes again to drill down to just the five suburbs that intersects that area and then we can get the distance in ascending order so we can find the suburb that is closest this function also handles the case where a building crosses over the boundary of two different suburbs and we use the area of the intersection if we if it's a case where we just want to assign one suburb and forget about the other so the final part of that whole process is limiting it to just the one these store functions can then be used just as part of other tasks that we need to do for example inserting a bunch of building outlines in a bulk import the more complex code that was on the previous slide is just part of the suburb underscore intercept and it's called polygon function about each parts broken down and it's not you know one large piece of code stored in one place the sixth thing I've learned is that geometry types posters can be very freeing about what you can store in a geometry column so if you just say I've got a geometry column and and that's it then you can do something like insert a point align string and a polygon which is cool but not particularly practical if you can't display it in anything you just obviously needs those three things to be separated or you need to load it in three times and assign which geometry type you you want to display that's adding a geometry type with this syntax as point means that if we repeat the same insert as before the first time it hits something that's not a point it's going to air out so understanding what you can and can't do and where it's appropriate to add that kind of type constraint is obviously really important to maintaining the integrity of your data geometries that constraints can get a lot more complicated than that or richer than that if you don't want to allow overlaps between polygons in your database then you can create a function which uses stn 6 n not touches which is close approximation of this to of I don't want this thing to overlap and then you can add that constraint as a chick constraint to your database and in every single time you add a new polygon you'll have a check make sure that this is not overlapping any of the polygons that I already have in my database so again if your business rules determine the X Y Z H shouldn't happen you can probably build something in at the database level to really make sure that it doesn't happen because the post-race community is an open-source community there's also a whole bunch of functions that are already out there that you can go and have a look at there might not be part of the core code at the moment so there's a cool one called normalize geometry which pretty much has its own website caspari's created a whole bunch of different parameters that can feed in for how you want it to behave but some of the things that it can do is remove spikes from existing geometries so the top line is just showing geometries that have like an out and back and then cleaned up and it can also simplify the number of vertices so this is a circle that's slowly becomes you know coarser and coarser depending on the parameters that you use with this function so if you've got overly detailed information then that's one way that you can simplify linz also has a whole bunch of open-source Postgres extensions and indeed building outlines so some of our systems that we use to store post just data in our own github and you can have a look at how we're using it there the night thing I've learned is that recently PGM before got a geometry viewer so in PGM 4 you can now select ID and giome and if you have a geometry column you'll get this little blue eye icon and if you click on that you actually see your geometries within PGM if they are in a coordinate system that's not 4 3 2 6 then you just get a blank background but if you transform the selection - for three to six wgs84 then you get some OpenStreetMap contextual information in the background and you actually choose between there's five different maps to show on the background the EPG Edmund for geometry viewer also has the unique distinction of being able to show you that mixture of polygons lines strings and points so if you are debugging some intersection where you've returned polygons and line strings and you and you wanted polygons you can actually view it in the air and see what's going on the tip thing is that I've learned that I really enjoy solving spatial problems with posters I think it's really satisfying when you find the way to do something that works really efficiently and with obviously at Lynne's continue to expand our use of posters for that reason and yeah I I think the list of projects will just continue to grow that air that is being stored in a post queries posters database [Applause]
Info
Channel: FOSS4G SotM Oceania
Views: 5,733
Rating: 4.9560437 out of 5
Keywords:
Id: 93bX1AO95nY
Channel Id: undefined
Length: 13min 57sec (837 seconds)
Published: Tue Nov 27 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.