GopherCon Europe 2020: Johan Brandhorst - A Journey to Postgres Productivity with Go

Captions
[Music] Welcome back! I hope the break was good and that you were able to get all the refreshments you need. Just make yourselves at home, basically. Before the break I asked you about this quiz: here's some code, there's a bug in it, and I asked if you could debug it. If you have a look here, dereferencing this variable is not needed. Instead, if we put this down here on this function, it will work, because essentially the request was being set on a copy: this wasn't a pointer receiver. The fix was to make it a pointer receiver, so that when we set the request it's set on the original struct, not on a copy. Congratulations to anybody who was able to get that.

Our next speaker absolutely loves databases, always storing things in there. You know what I mean; it's Go, Johan, you're storing stuff all the time. "Certainly, yes, all the time." Before we get into your talk, which is about databases, could you tell me a little bit about Viva, the charity you've chosen to donate your speaker swag to today? "Yeah, Viva is a charity that works to promote veganism in the UK, and it's something that's dear to me, because I think animals are being mistreated in the animal industry, and it's important to me that those voices are heard." Great, I think it's a great choice. And is it more driven by a love for animals, or is it that you really hate plants? "Yeah, plants can go stuff it, to be honest. They really have to go." Okay, well, without further ado, I'm going to let you do your talk. This is "A Journey to Postgres Productivity with Go". Please do actually clap, because we will be checking on people at home. Welcome, Johan Brandhorst! "Thank you. Okay, let me share my screen... there we go. Okay, hi everyone."
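For readers following along without the video, the quiz fix described above can be boiled down to a sketch like this. The struct and method names here are invented, since the quiz code itself isn't in the captions; only the value-vs-pointer-receiver point is from the talk.

```go
package main

import "fmt"

// client wraps a request field that a setter method mutates.
type client struct {
	request string
}

// setRequest uses a pointer receiver, so it mutates the original
// struct. With a value receiver, the method would operate on a copy
// and the assignment would be silently lost.
func (c *client) setRequest(r string) {
	c.request = r
}

func main() {
	c := client{}
	c.setRequest("GET /users")
	fmt.Println(c.request) // prints "GET /users"
}
```

With `func (c client) setRequest(...)` instead, `c.request` would still be empty after the call, which is exactly the bug described.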
I'm Johan. I'm a maintainer of various open source projects, including the gRPC-Gateway and Improbable's gRPC-Web. I've contributed to the Go standard library, and I run a blog, writing mainly about Go and gRPC. I just started working at Buf, where we think about Protobuf so you don't have to. This is my first GopherCon EU, and it's not quite what I expected. I'm here today to talk about state and Postgres, so let's dive straight in.

We're going to start with the question why: what do we mean by state management, and why do we need it? We're going to move on to a few different state management solutions and discuss their various strengths and weaknesses. We're then going to dive deeper into my state management solution of choice, Postgres. Once there, we're going to explore some of the tooling available and narrow it down to the pieces I consider essential to becoming productive with Postgres. At the end of this talk, we'll all be Postgres rockstars.

So why do we need all of this? State management is the source of a lot of the problems we have to deal with as programmers. If we didn't need to worry about state, we'd be able to effortlessly parallelize computation, with no need for synchronization, no worrying about chronological ordering, et cetera. It's basically the reason why we can't have nice things. However, it's also the reason we can have some things: Facebook and Twitter wouldn't be much without the ability to share text and images, and Google would struggle to be useful if it had to search the whole internet every time you made a search. State management is, unfortunately, here to stay.

So we need to store data. How do we store data? Well, we could just store things in memory: create a struct like this, store some data in a map or slice, and you're done. Storing in memory also means it's super fast to access. But what about when we need to restart the application to update it? All of your data is lost. And what about when you want to run several applications because you have a high load? They can't share memory with each other.
Storing data in memory is sometimes useful, but it's not a great solution for many problems. So in-memory is out. What about just storing files on the file system? Files on the file system persist across application restarts and can be shared between processes. Problem solved! Sure, things aren't as fast as when storing data in memory anymore, but that was a compromise we had to accept. Using the file system is not a terrible idea for some problems, but it also has its shortcomings. As you can see in this snippet, we've already had to make a decision about how to store the data on the file system. And what about when we want to find all users who are over a certain age, or all users whose name starts with T? This and many other problems are solved for us when we use a purpose-built data store.

Generally, as a programmer, any time you need to store something for longer than the lifetime of your application, you should turn to a data store. These are applications that have solved all the hard problems associated with persisting large amounts of data: ensuring safe concurrent access, easy access to read data, filtering of results, et cetera. Let's take a look at a few different types of data stores.

Data stores can be broadly split into three categories, all with different properties and use cases: in-memory key-value stores, document stores, and relational databases. There are of course exceptions to these classifications, but we're only going to cover these three today.

So what is a key-value store? In-memory key-value stores are typically implemented to allow quick access to small amounts of data tied to a specific key; you can think of them as a distributed Go map. They are often used as caches because of their speed, but they are not meant to hold very complex data structures and often don't support sophisticated filtering of results. They are also obviously limited to the size of your machine's memory. Examples of in-memory key-value stores are Memcached, written by Brad Fitzpatrick, an ex-member of the Go team, as well as Redis and more.
Having said all of that, a key-value store may well suit your needs, whatever they are. Speaking personally, as someone who has implemented sophisticated logic in Redis using its Lua engine: go nuts, but make sure you're optimizing for the right thing. I was not.

Document stores typically use the file system and sit somewhere between a key-value store and a relational database in terms of performance. These are the so-called NoSQL data stores, which means they don't require a predefined schema to work. Basically, you insert some JSON and you've got a data store; you can insert more JSON of the same format and quickly read it out again. These data stores are normally built with large-scale deployments in mind and automatically split data among their servers. The compromise some document stores make to provide high performance is that some data may be missing or incorrect. MongoDB, Couchbase, and Elasticsearch are examples of document stores. MongoDB especially has gained some fame recently for its dubious claims about data consistency. The renowned database researchers at Jepsen (jepsen.io) had the following to say about it: MongoDB lost data and violated causality by default. I don't want to throw shade on all document stores, but don't use MongoDB if you can help it.

Finally, we have relational databases. These use the file system for storing data, provide strong data guarantees, use the Structured Query Language (SQL) for inserting and reading data, and are often backed by decades of development and testing. They also tend to provide so-called ACID transactions, which basically means that your data will not get corrupted if something unexpected happens while you're writing. Naturally, speed is a secondary concern after data integrity, so they are normally not as fast as document stores. Oracle, MariaDB, MySQL, Postgres, and CockroachDB are examples of relational databases.
All right, with that background out of the way, let's talk business: which data store should we use? Postgres, obviously, because that's the name of my talk. For most problems, Postgres is a great alternative. It's been around for over 20 years, it's free, it's open source, and it's used by thousands of companies worldwide. Importantly for us, it also has excellent Go library support. For basically anything but in-memory caches, I tend to lean on Postgres. If you need to scale your database to several machines, CockroachDB implements the same wire protocol as Postgres, so it can often be used as a drop-in replacement. Postgres can be run on its own in the cloud, or you can use one of the managed Postgres solutions that exist in pretty much every cloud provider. I like that defining the database schema makes you think about how the data fits together; it's a bit like going from writing Python to writing Go, and it can save you a lot of trouble in the long run.

So how do we use Postgres with Go? Unfortunately, even after choosing a database, we're presented with a myriad of options for tooling. Should we use an ORM? Should we generate the code? What driver should we use? Let's take a quick look at some of the tools available.

You've probably heard of object-relational mapping (ORM) libraries, of which there are many. These try to provide an interface to the database that doesn't require writing SQL, which is often very appealing to beginners. However, ORMs tend to become messy when working with more complicated data structures, and they can lead to very inefficient queries being made to the database. Furthermore, the lack of generics in Go means an ORM often has to sacrifice static typing. ORMs are also often unable to support all the features of the databases they abstract away, since they naturally aim to provide a database-agnostic interface. Working with an ORM can feel a bit like magic when it works, but I recommend learning to use SQL directly. It will pay dividends in the long run.
Then there are generator libraries. These will generally require you to provide some sort of markup to define your data structures; it could be SQL, it could be Go code, or maybe something else. They will then generate structs and helper functions for writing and reading data to the database. They are able to provide a type-safe interface to the database, as opposed to an ORM, but none of the generators I have seen in Go can support the range of features supported by Postgres, so they're usually a non-starter for me. Having said that, generator libraries show a lot of promise, and if you know of a good one, please let me know.

All right, so we're not using an ORM and we're not using a generator library. What tooling are we going to use? I know you didn't come to this talk to hear about theory, so I'm going to give you my own recommendations. After years of trying libraries and methods for interacting with Postgres, I have identified four tools that complement each other very well, each solving a separate issue: pgx, golang-migrate, Squirrel, and dockertest. pgx is the database/sql driver and provides database types; golang-migrate solves the initial setup and migration of the database schema; Squirrel solves the issue of writing queries; and dockertest helps us gain confidence in our code by letting us test against a real database in our tests. Let's take a closer look at each one.

pgx is a pure Go driver for the standard library database/sql interface that also implements its own interface, to squeeze that extra bit of speed out when you need it. I tend to use the slightly slower standard library interface for compatibility. It has support for over 70 Postgres types, such as UUID, hstore, JSON, byte array, interval, and arrays. It can be used with sql.Open, but it also supports a custom type for configuring things like TLS and debug logging of queries, as can be seen in this snippet. The logging interface logs all queries and has adapters for popular logging libraries such as logrus, zap, zerolog, and log15.
You probably don't want this turned on all the time, but it can be a great help when debugging queries. At any time, if you need to do something that could take advantage of the extra speed offered by the custom API, such as inserting a large amount of data, you can acquire a pgx conn from an sql.DB, as seen in this snippet. You can then do the intensive operation while maintaining the use of the normal database/sql interface elsewhere, for familiarity and compatibility.

Before we dive into the next tool, let's talk a little bit about database migrations. When I first started learning about databases, it wasn't clear to me why you would need migrations. Migrations are, in the simplest case, useful for migrating from nothing to an initial database schema, and back to nothing; we're going to look at an example of this in a minute. The second use case for migrations is when you have some existing data but need to evolve the schema, such as adding a column to an existing table or adding another table altogether, and you need to do something to the existing data, such as setting a value in the new column based on current values, or inserting a row in the new table. The third thing they can be useful for is when a planned change didn't go as well as desired and you need to roll back the changes.

My choice of migration library is golang-migrate. It supports reading migration files from a number of sources and writing to a number of different databases, though of course we only care about Postgres for this example. It can run both via a command line interface and inside an application at startup; the examples we're going to look at today use it inside the application. Note that this means we are tying the version of the database schema to the application, so you have to be careful if you're running several applications against the same database at the same time. If you're using a single client per database, it is the easiest way of managing migrations, though.
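For illustration, a minimal up/down migration pair in the style described next in the talk might look like this; the table name and columns are invented, not taken from the slides:

```sql
-- 001_add_users_table.up.sql
CREATE TABLE users (
    id   UUID PRIMARY KEY,
    name TEXT NOT NULL
);

-- 001_add_users_table.down.sql
DROP TABLE users;
```

The down file is the exact inverse of the up file, so running migrations down to zero leaves an empty database.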
First up, we have to write migration files. In the simplest case this is just the initial database schema; if your application is small and doesn't need to change, this may be all you need. Even if you don't anticipate needing to migrate data in the future, I would still recommend using migrations from the start, just because they allow you to easily control the state of the database schema. The initial up migration creates the first table or tables, and the initial down migration is an exact inverse of that: it deletes the tables again, so there's nothing in the database. This adds another benefit, which is that you can clean out the database state by just running the migrations down to zero.

So let's add an up migration that creates a single table, and a down migration that deletes it again. Note the naming of the files: they need to be named so that the migrations can be ordered lexicographically, which is often accomplished by using a numbered prefix like here. So if you added another file, you would name it 002_whatever. The syntax itself is just normal SQL.

Next up, we have the question of how to integrate the migrations into our application, and for that we will use the bindata generator. If you've never heard of go-bindata before, it's basically a way of taking some files on the file system and making their content importable as a Go package. This means we no longer depend on the file system, so the application can still be shipped around as a single executable. We can take this command, which runs go-bindata to create the bindata file, stick it in a Makefile or a go:generate directive, and then import it and use it in our migrations like in the snippet. This reuses an existing sql.DB to do the migration to version one, creating our table.

Now that we have the database schema created, we can start thinking about writing some queries, and that's where Squirrel comes in.
Ever found yourself writing a huge fmt.Sprintf expression, mapping column names or dynamic data into an SQL query? I've been there. Squirrel is a query builder utilizing the builder pattern to add properties to the query. The builder pattern isn't something you see often in Go, because it sacrifices static typing a lot of the time, and Squirrel is no exception. I would generally not recommend the builder pattern, but Squirrel strikes a very good balance between power and type safety.

The code on the screen shows a very simple example of using Squirrel. Parameters automatically become positional parameters, which prevents SQL injection vulnerabilities. It aims to support most of the SQL standard, and this can be further extended to arbitrarily complex queries with joins, Postgres-specific suffixes, et cetera. This example creates a query which selects all rows from the users table that have the name "johan"; we then convert it into a query string and positional variables with ToSql.

Let's have a look at a few more examples. This one shows a typical insertion with Squirrel. Notice how it allows you to map from column name to value directly in a map, and how we're using the Suffix method to add the Postgres extension RETURNING, which returns the data from the new row. Also notice how we've ended up with an empty interface in this map, since we want to be able to put anything that can be used as a variable in Postgres inside of it. SetMap can be used in both insert and update statements. Finally, we also see how Squirrel can be used to execute the queries directly, without having to use ToSql.

In this example we see how Squirrel makes conditional filtering a trivial and pleasant experience. Note how we're able to use the native time type directly when interacting with the database, and that we can easily convert a duration to a native Postgres type before handing it to Squirrel, using one of the powerful types provided by pgx. Using squirrel.Gt, we say that we want values with a create time strictly greater than the provided value.
Using squirrel.Expr allows us to add arbitrary SQL expressions into our filters. Note also the use of the question mark for positional variable interpolation. And this is all only a small glimpse of what Squirrel provides.

Now we've got the application up and running and talking to the database, using pgx, writing migrations with golang-migrate, and writing queries with Squirrel. But there is still one piece of the puzzle: how do we test that our code does what we want it to? There exist several solutions to testing code that interacts with the database, such as mocking the database/sql interface, or defining an interface and testing against that. But I'm here to tell you that neither of these solutions is adequate. Fortunately, there is a better option: what if we could test against the real database?

dockertest does exactly what it says on the tin: it allows you to test against a Docker container, which lets us run tests against a real Postgres server. It can be called from normal test code like any Go library, and it uses the Docker API under the hood, so all you need is a local Docker socket. You can configure the tests to automatically clean up containers after finishing, or leave them up to allow for debugging after the tests run. It comes with a convenience function to wait until a container has started, and it supports advanced use cases like uploading files to the test containers and reading the log output for debugging. It works just as well in local testing and CI, so there's no more need for CI-specific container configuration. The only requirement is that your CI runner has access to a local Docker socket, so that it can both start containers and access them via the local network. In CI this can mean using custom runners, or using something like a CircleCI machine runner.

Let's take a look at a real example in my example repo. Now I'm going to be changing my presentation to show some code, and if you want to, you can take a quick screenshot of this and clone it yourself.
Everything that I've talked about so far is covered as an example in this repo, and it should come as a surprise to no one that my Postgres example repo is also a gRPC server. So let's take a look; I'm going to change my presentation.

Okay, so what we've got here is the initial schema migration. We saw an example of that earlier, and in this case we're doing basically exactly the same thing as before, but we're also creating an enum here. We don't really need to worry about that; it's a little bit fancier, to show that you can do that sort of thing with the pgx types. But this is just creating a normal table.

If we jump to users.go, here we can see that we're creating a new directory, where a directory is something that interacts with the database. It needs a URL to connect to, and it takes a logger. We can see that we're using the pgx types here to parse the config from the URL, setting a logger, and then using the pgx OpenDB function to create the database. The validate schema function here, if we take a quick look, does the migrations we discussed before, and it's locked to version one, because that's the version that we have. If you wanted to add another migration, you would add a file, regenerate the bindata file, and then set this to version 2, and it would automatically apply both version 1 and version 2 the next time it connects.

If we scroll down, we can see some examples of Squirrel interactions. We're inserting into the users table, setting the role of the user to the parameter that comes straight from the gRPC server, actually, and then it does a little bit of fancy handling here to convert to a database value type. If we take a quick look, you can see that this type implements the driver.Valuer and sql.Scanner interfaces, so that you can both write and read it. And then we're using the suffix here to return the id, role, and create time of the new row.
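The Valuer/Scanner pattern mentioned can be sketched like this; the Role type and its values are invented for illustration and are not the repo's actual code:

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// Role mirrors the kind of enum wrapper described in the talk: a Go
// type that maps to and from a Postgres value.
type Role string

// Value implements driver.Valuer, so a Role can be written to the
// database as a plain string.
func (r Role) Value() (driver.Value, error) {
	return string(r), nil
}

// Scan implements sql.Scanner, so a Role can be read back from a
// row; drivers may hand us either a string or a []byte.
func (r *Role) Scan(src interface{}) error {
	switch v := src.(type) {
	case string:
		*r = Role(v)
	case []byte:
		*r = Role(v)
	default:
		return fmt.Errorf("cannot scan %T into Role", src)
	}
	return nil
}

func main() {
	var r Role
	if err := r.Scan("admin"); err != nil {
		panic(err)
	}
	v, err := r.Value()
	if err != nil {
		panic(err)
	}
	fmt.Println(r, v) // prints "admin admin"
}
```

With both interfaces implemented, the type can be passed directly to query arguments and to row.Scan.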
Then we have a helper function for scanning directly from a row returned by QueryRowContext. We have a delete user here as well: again we're using an SQL DELETE where the id equals the given id, and we return the row that we deleted. We can also do some special error handling here, to detect if the id that was provided was not a valid UUID. The list users function is a little bit more interesting: we're doing a select, but we're also doing the conditional filtering we talked about. You can take some time to look through this later, because I'm running out of time and I want to get to the tests.

If we look at the user test here, we have a function called start database, which is a testing helper; you can mark it as a helper and it will be removed from the backtraces if there's an error. You can see that we're creating a Postgres URL, creating a pool, and setting up the environment variables that will be necessary to start the Postgres container. Then we just call pool.Run, Postgres with the tag 13-alpine, with these environment variables, and we can use the new Go 1.14 Cleanup feature to automatically clean things up as soon as the test has finished. Then the magic is here, where we ask the Docker socket: what is the IP address of the container you just started on my local Docker socket? And we can set that as the URL host. There's a little bit of magic here to work around the fact that Darwin has a special implementation of Docker. We're also attaching the container logs when running; you can skim over this a little bit, and you can see we're cleaning it up afterwards. And then down here we have the function that automatically retries a database connection until the database has started, so you don't have to worry about the database taking some time to spin up. As soon as the start database function has returned, you know that a database is running and online.
So then we can use start database when creating a new directory; as you remember, the new directory takes a URL, and start database returns a URL type directly, which is the URL of the container that we just spun up. Here we have a couple of tests that we run in parallel. You can see we can do things like add a user and delete a user, and check that all of that works. Because these tests are completely independent of each other, we can run them on the same database at the same time, which means that the tests are faster. In general, we have several tests here, but you can see that each one of these tests uses a separate database, so the tests can also be run in parallel, because they just spin up separate containers. And again, we can use Cleanup to close our directory afterwards.

So you're probably saying: okay, great, we can run tests against a container, but I've written integration tests before, and they take flipping ages to run. So I'm going to see if I can convince you that it's actually worth doing, by showing you just how quickly these tests run. Oh, okay, this has to actually pull the Postgres image in the background now, because I'm on a new computer, so it's going to take a little bit longer; I'm going to rerun it in a second and you'll see just how quick it is. Ah, it can't take that long... Okay, that took 12 seconds, and that includes pulling the image. But if I run this again now: 3.6 seconds to spin up two containers, run several integration tests against them, and then tear them down again. I think you probably have time to do that within your testing. And as you can see, this just runs go test. You can run this in CI as long as you have access to a Docker socket; it just works. And you're testing against the real database, so you know that it will test that all of your queries actually work, all the data is being read back, and so on.
Okay, let's jump back to the presentation and quickly review what we went over today. In summary: we started out by looking at state management and why we need data stores. We learned about a few different data storage solutions, such as Redis, Elasticsearch, and Postgres. We learned about the four libraries I consider essential to becoming proficient with Postgres in Go: pgx, golang-migrate, Squirrel, and dockertest. And we looked at an example that incorporated all the different tools. Putting it all together, we become Postgres rockstars, ready to be productive with Postgres the next time we need to store some state. I hope this talk has given you the confidence and the desire to try out Postgres with Go; it's something I enjoy every time I do it. Thank you, and now I think we have time for some questions.

"Hello! Thank you, Johan, great stuff. I can't believe how fast that test ran." Right, it surprised me as well the first time I learned about it. It's super cool, because you expect it to take ages to spin up the database and get started, but actually it starts up really quickly. "Was that doing the Docker thing as well?" Yep, that was everything, all at once. "Because that was going to be my question: if you're using Docker as sort of a developer environment, what's involved in that, and how much time does it take to spin up? I'm shocked. Does it leave those Docker containers running in the background?" That actually included deleting them immediately afterwards as well, but you can remove the pool.Purge function that was being called in the helper and leave them up, if you want to query the database afterwards. "Right, that's awesome. I'm just checking the list for questions... Oh, someone's asking: why would you use a Docker container, compared to just spinning up an actual instance of the database with your continuous integration?"
Yeah, if you do that sort of thing, which I know GitHub and CircleCI support natively, where you can say "I want the test to run with this container next to it", the problem is that you can't run that locally; you have to manually run the container when you run tests locally, and that adds extra developer friction. The benefit of this approach is that it works both in CI and locally: you just run go test, and you don't have to worry about anything. "And I suppose all the developers can have different development machines, even." Exactly, all you need is Docker. That's the beauty of it. "Yeah, very cool. Well, Johan, thank you so much, that was brilliant. I'm going to take over the screen now, if you could stop sharing yours. I just wanted to call out that there's this networking tab where you get two minutes and 40 seconds of time with random people; it's a great kind of replacement for actual conferences, so check it out. And Johan, if you want, you could go and explore there, and people would have a chance of running into you, which would be nice for them, wouldn't it? So we're going to have a quick break, a bit longer this time. I'll leave you with this quiz; I'm also tweeting these quizzes, because I know the screen doesn't stay up. So here's the next one: there's a bug in this code which is making it misbehave. If you can find out what that bug is, you can tweet at Mat Ryer; I'm going to be looking for answers there, and I'll tweet them. We'll be back at 20 minutes to the hour, 40 minutes past this current hour. I think we should do relative time from now on, since we're all in different time zones. So yeah, we'll see you in about 10 minutes, a bit less. Thank you!"
Info
Channel: GopherCon Europe
Views: 3,242
Keywords: golang, gophercon, database, db, postgres
Id: ZRUEJX1fqYc
Length: 32min 27sec (1947 seconds)
Published: Sat Jul 11 2020