Learning Golang: Relational Databases - Bulk Insert with PostgreSQL

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello my name is mario welcome to another learning go video in today's episode i'm going to be showing you how to deal with bulk insertion in relational databases specifically using positive sql as usual the link to the code will be in the description of this video so feel free to check it out i also have an example using my sql as well first of all we have this file called data csv we're going to be reading this csv we're going to be reading each one of these lines and we are going to be inserting the values into our database i want to show you how you can do this using the standard library and how how you can use this or how you can do this using the concrete third-party package and the difference is significant because when you're using the third-party package you're actually using the implementation of the engine in in the database so we have there just like i show you a data csv nothing really out of the ordinary what i have here is uh sort of i'm reusing what i show you when i uh i was describing the transactions in in database sql and specifically the connection function that i have right here what i'm going to be doing is specifically in this case i'm not going to be using database equal however the methods that i'm going to be using are basically equivalent to using database sql but i want to use pgx because it's a little bit easier using the transactions again what i'll show you before the transactions method in the connection package now because i want to show you how long it takes i want to show you also a way to indicate how long it took so the simple way to do this is using the time package that you can literally take and now and then use scenes and it will indicate how long it took since e the noun was initialized to when you call scenes okay so we have error all right we are going to be implementing a function called insert users that receives the connection and returns an error if error is different than nil we're going to return we're going to return and say failed with the error that we receive now before we do that we need to actually run our postgres and then if you don't have any posters running you can use docker but if you can use your own local you know database maybe you install it via homebrew or maybe using linux or windows or whatever but i like using docker because honestly it's kind of easier to manage when you're running different versions and whatnot so i have my p sql that i'm going to be using for creating the user and the db name all right to create the actual table that i want to use and again all these instructions are in the readme so you can follow along now i have my users table that is empty i just created i'm running my docker container that i can use now if you remember when we were discussing about transactions in in specifically when using postgres there is a method called to transaction tx begin tx func are similar i can recall exactly but we will we will show you that in a while that is going to be useful in this case because we're going to be reading the file we're going to be reading each one of those lines and then inserting each one of those lines as via or rather using the insert command the sequencer command are going to we're going to see how long it takes and we're going to be comparing that when you're using the actual concrete implementation that is used using the copy command in pgx so we have this uh we're going to be reading the file so for the file we need to use the os open all right the os package the os package is that has a function called open we have a path package that indicates it's going to be used for joining the file that in the end we're going to be using for reading so so far we open the file we concatenate it in a way that is you know equivalent for different operating systems if there is an error we return the error as usual fmp rf os open and then we wrap the error as we have been doing uh obviously for a while now now how can we read the actual csv file there is a csv package in the standard library called encoding csv if you notice in the in the you know in the pop-up so you read the cs you open the csv using the new reader function in the csv package it receives a reader and the you returns a cr file all right you see our oh no it doesn't return scr distance an instance of the reader type in the csv package the way it works with this is that you need to use the for loop for reading each one of those records now the cr type has a next that indicates um right here all right not next a read function that you can use for reading each one of the lines each one of this every time you read these or every time you call the read function it will return a line which is a slice of strings and an error in order to determine if you finish reading the file you have to check if the errors is an ioe of eof that indicates an endo file if there is an eno file you re you break because there is nothing else to read if error is different than nil maybe something else happened you have to obviously return there that happen in that case which could be you know so many things so we can say see our read and again we wrap the error and so far we have this error all right this record variable that represents the lines or the columns for each one of the lines that we're reading from the csv what we have to do next is wrap this data into the con begin dx function that i was mentioning you before that receives a context background and also pgxt options and the function that is pxtx and returns an error so if you remember what i was describing you before this is uh in my opinion a clean way to handle transactions when you happen to be using the pgx package and we can copy all of this from here i missed that cr line so let me copy that over all right let's put it up here so what is going on so it's complaining about perhaps this one is 59 declaration i know and you know what you know what so you have this function and actually this has to be wrong here so we can get rid of this we have so what we have to do is basically move the content of the function to this one what the heck is going on why is taking so long okay so there you go he's complaining it's fun okay let's do something when i'm having these kind of issues what i like to do is take the the easy route oh it's an except you know not what i call this parentheses yes parenthesis so i have it here so i'm supposed to be returning nil okay so everything is back to normal so again i open the file the csv and reading each one of the lines i'm determining if this is the end of the line and if it is i just return writer i break and if i break is the end of the file so i return now if i go back to this an error f error f and then i can say con begin x begin tx and error so everything seems to be making sense if i compile this there's an error insert users and this is because i miss an s right here so i compile it says insert users there you go it means two ss so record is declared and not used okay i know why we are why is this failing because we use we need to use this variable for actually inserting the records into the database so if you remember con has an exact function that received a context background and again ideally in the perfect world you should be using an actual context that is coming from your the function that is calling you so you can you know handle propagation errors errors propagation cancellation those kind of things but in this case we just we're just going to be um using a context background just for the sake of uh simplicity so you have user's first name last name values and again placeholder one placeholder two close and then we take record zero because that's the if you remember the way in the csv how is implemented there are two columns that are represented in the slides as the zero and the one in the slide so zero position zero will be the first column position one would be the second column so if you also remember the way except works is that it returns a command tag that indicates the id and some other information that we don't need for this specific case and the error so we're going to be assigning that to an empty value if there is an error during the insert we return an fmt error f con exact now all of this is used to show you how you can um determine why doing this is not efficient and why you shouldn't be doing this in the first place so everything is fine there are no records if i decide to run this what is going to happen is that it's going to take 77 milliseconds and as expected well i have all my records in the database i do a count star and there are 50 records as expected and you can argue with me that okay 78 milliseconds is not that much but let me clean this up before we continue and show you the actual example using the pgx implementation and actually using the copy command so give me a few so what is a better way to do what we did before using the copy command and therefore the copy method in the connection type in the pgx package so what i'm going to show you next i'm going to go into the pgx folder i have what we did before i'm going to be refactoring this and show you how what we did before can be useful for implementing what we're we should be doing so insert users stays the same one important thing is that the connect the cone rider the con type in pdx defines this function called copy from and copy from defines a few different arguments that we're going to be using and it's really simple okay so you have let me for the sake of you know because i don't forget i'm going to copy um here the the api oh what the heck is going on okay there you go ah come on so i have it here just for for showing you that the copy from uh method in the cone package receives a context a table name and identifier column names and receives a type that implements the copy from source type again let me jump into the implementation copy from we're going to be following what the signature of that api indicates this is a table table name called users oh no actually it has to be a pgx identifier the columns will be first name and last name so we have uh what the heck okay first name will be first name all right the first column will be first name last name will be the second one and then we're going to have a hypothetical row sears and then row source is the type that we're going to be implementing uh for doing this so what does a source what does raw source do so before we do that let me close this in a way that if error is different than nil return fmp rf because basically the whole the whole interesting thing about this implementation is that we're going to be implementing the row source type all right so what is going on here what is this complaint so operand oh you know i forgot to close the slice so that's it so everything seems to be okay so what does the copy from source type implement so the copy row copy from source type in the pgx package defines three methods next values and error now if you remember i want to pick up and then just for the sake of keeping this simple i'm going to be copying the implementation and then i'm going to be implementing a type that i definitely define here that doesn't exist yet called row s r c so i can define a type called row src extract and i'm going to be copying or implementing these three functions so we have row src next that returns let's so false just use for now so it my my editor doesn't complain too much we have values that returns the values we have finally the error method that returns an error so if i compile this everything is is oh of course because copy come from copy from returns also oh you know what let's do the same count an error because it actually the copy from indicates how many rows were inserted into our database so i want to print that out because it's important so i want to be fmp println let me move this up so you can see rows count so everything seems to be working according to the compiler that is is not actually working in real life but i want to show you that what is important here is we're going to be taking what we did before we're going to be refactoring that and making sure that the implementation that we had before using this begin from transaction that is right here let me add a use that comment to separate these two things will be what we are going to be using next now if you remember there are three functions one called next values and error so what we have to have here is in a way a way to um receive the csv reader and then read that from the functions that are exported and that are used by the pgx con type via the copy from method so we have a cr so csv reader then we are going to here have here values which will be an interface and a slice of an interface we're going to have an error and if you notice most those three the uh fields that we're going to be defining in this type are going to be used in the methods that we have right here so values will be returned literally values and error nothing else obviously referring to the actual type error will be returning error and next is actually the one doing the actual work and this is where we're going to be taking what we implemented before and moving in up we're going to be extracting what we had in this implementation that i have right here highlighting it and moving up to the actual new type so next is sort of like what we did before is literally reading each one of these lines so we're going to be doing hey if there is a record all right if there is the end of the file we return false because there is nothing else to read if there is an error we're going to be doing is assigning the error that we have local to the type and return false okay and next we're going to be doing the values values will be a make of an interface an empty interface and we know that it's two because there are only two columns in each one of the rows so our values zero will be equal to record zero and our values one will be equal to record one so with this in place we can actually get rid of all of the things below and that will be the implementation that we have to have for using the copy command and therefore the copy from method in the con package let's go look at it again so we have a function already we have a type called row source that implements three methods next values right here an error and these three methods are internally used by a function uh or a a method in the con type called copy from and copy from receives that type that is right here to read the data that we're going to be supposing so we're supposed to be inserting into the database so if we compile this oh cr is failing because i missed i was referring to the actual uh field in the type so another error that i have here so this will be the reader they call it reader or what did i call i forgot see i actually call it cr so i need to rename this to cr and with that everything seems to be compiling let's run again we don't have any users in our database in our table if i go and do run a main it will take seven milliseconds let's run uh what did happen so i have uh something something went wrong so it says something it's most likely was in the insert rather in the next so oh if oh of course i'm returning false where i should be returned actually true because there was no error so if i run this again and now i have and it's taking me seven milliseconds i run this instead of running that select start let me do a select count and now i have 50 rows now remember what we had before with 47 milliseconds to 7 milliseconds now think about this in terms of millions of records right you can use insert for sure and that is going to work but if you use the command that is meant to be used for loading data for doing bulk insertion insertions into postgresql specifically in this case it will be significantly faster now i highly encourage you to take a look at the method that i just grabbed you and and it's really i mean i in my opinion is really easy to use because you just define sort of like what you will be doing if you using packages like the io is super similar and i think the interface is the user experience and the api that they define in the pdx package is pretty straightforward and and again any questions of any or any comments about bulk insertion using possible sequel please let me know in the comments section below i will talk to you next time and please take care and stay safe see
Info
Channel: Mario Carrion
Views: 374
Rating: undefined out of 5
Keywords: golang, golang bulk insertion, golang bulk insert, golang copy postgresql, golang copy, golang database bulk, golang copy command, golang postgresql copy, golang etl postgresql
Id: 7m7EKTKeaK8
Channel Id: undefined
Length: 22min 11sec (1331 seconds)
Published: Thu Dec 02 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.