Import Records From CSV File (or any data file) to SQL Server (or any database system) With Python

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

Saved!

šŸ‘ļøŽ︎ 2 šŸ‘¤ļøŽ︎ u/Rogue_Of_Life šŸ“…ļøŽ︎ Dec 03 2020 šŸ—«︎ replies

Dudee Iā€™m working on a project that involves exactly this! The csv part is almost done with pandas and a gui i made. Now I need to learn the SQL part.

Thanks

šŸ‘ļøŽ︎ 2 šŸ‘¤ļøŽ︎ u/mdr7 šŸ“…ļøŽ︎ Dec 03 2020 šŸ—«︎ replies
Captions
[Music] hey how soon guys in this python tutorial we're going to learn how to import records from a csv file to microsoft sql server so the script can be adapted to import data set from excel file a text file or any df file that you are going to be working with so it's not only limited to csv file i'm using csv file just because this is the most commonly used data file today and the database is not limited to microsoft sql server you can use this script for other database systems as well such as mysql oracle or even microsoft access now let's take a look at the data set source so i'll be using this data set real-time traffic incident reports and he'll have an export button which i can export the data set to a csv file and that'll be the file that i'll be using to import the data set to my sql server database system and you can download this dataset from data.austintexas.gov i'll post the link in the description below so that we can also download the data set as well now let's take a look at the table itself so this table has nine columns traffic id published date issue reporter location letter 2 longer to address and there are two more so here let me display all the fields status status date it will have 196 thousand records that will be importing in this exercise and here's a preview of the table i already downloaded the cc file surnamed file real-time traffic incident reports and just extra file now we have the data stores taken care of we need to work on our database system so i'm using microsoft sql server just because i know this database isn't the best and here's the sql statement to create the table that will be storing the data set and name the table i'll stand traffic instead and here let me recreate this table if i run the select statement we have our table with zero record let's go into our python editor i'll be using pypyodbc library to connect to my sql server database system i'll name this library odbc oh and you can install this library by using the command pip install pypy odbc and to clean out the data i'll be using pandas library oh and this one would be pip install pandas and those are the only two libraries that i'll be using for this task all right so let's create our drive variable for sql server the driver name is going to be sql server if you don't know the driver name for your database system on windows you can use the odbc data source tool if you go to drivers here's the list of drivers that are available on your pc and the driver name on the name column is the driver name that you'll be inserting to the driver vehicle next we need to assign the server name to get the server name in sql server so here we will type select 4 by 2n symbol server name and i'll return the server name i'm going to copy paste the server name and the next item is the database name so name this variable database name and my database is going to be called jj if we look at our sql server object explorer so this is my database name that i'll be connecting to i also have a different database called youtube now we have created our variables that we need to provide when we connect to sql server database system now we need to import the data using the pandas library and since my data set is stored in a cfc file so i'm going to grab the file name and from the pandas library i'll insert the method with csv followed by the file path and i'll save the data set as df and this is going to be step one importing data set from csv in step number two is we're going to do some fake basic data cleanup now let me import the library first so around these two lines and input the data set to my data frame object now if i print the df option that's going to return 196 357 records let me open the csc file so we can take a quick look at the table itself let me zoom in a little bit alright so the next step is we need to figure out which column do we want to import to our database system if you look at my table creation statement i'm only inserting one two three four five six seven i'm only inserting seven columns from the cfc file and the last column created is the date time when a record gets created oh and here i need to provide a default date time value so i'm going to fix the sql statement a little bit so i'm going to assign t file i'm going to insert the get dates function to insert the current timestamp i'll recreate this table [Music] right now let's go back if i take a look at this csv file i want to import traffic report id column publish date issue reporter location and i'll skip added to any longer two columns just because i can create a function to extract the value from location column address status and status date so those are the seven columns i want to import to my table in sql server but first we need to clear some data so from the csv file we have two daytime column status date and publish date if i simply create the data from publish oops publish date and we'll grab the first five records and notice that the data type is an object that means when i try to export the datetime value to sql server this is going to recognize this regular string and i need to convert that to actual data and valida single server recognizes and to do that in pandas there's a function called to datetime and this is one of my favorite function when i work with dates or time values so we can simply insert the publish date value to this function and that will actually convert the uh string to date times uh format so here let me show you if i run this theme right here just give it a second [Music] now we have daytime 64 data type of our values and that's what we want so that's something that we can uh convert to the format that sql server recognizes in here i'm going to type dt and it's going to convert the value this date time 64r format value to the proper datetime format now we can convert the formatting for the formatting pattern it's going to be uppercase y hyphen lowercase in hyphen lowercase d i also want to grab the time value so it's going to be hour minute and a second and we're going to replace the current publish date column which is a new set of values now let me run line 12 now if i create the publish date column again and that's what uh oops so here my mistake so the s didn't get oh i forgot the percentage symbol so here let me re-import the data set to my data from object and you know i made a mistake before now i fix it so i can basically do the same for the other daytime column i remember the name was status date all right so i'll run these two lines individually now the next data cleaner test is remove all the bad data i already worked with this data set before if we look at the location column and there are some empty fields not views as empty values so if i do a search and look for empty string um rows are 117 so here's one of the empty location to remove bad records i will reference the data frame object now is the job method to delete the records and to create the band records i'll use the query method inside the query method we'll supply the column name in string format so the column name is location you can use the is now function so this will return all the records where location column value is empty now i'm going to insert a pipe symbol and the pipe symbol represents all operator i want to insert another condition so i want to reference status column this time and i know some of the values are empty so it's the same function is now to create those empty records and here i need to type the index to return the index number of those rows and set the in place value to 2 to replace the record to this df diaphragm object all right so that's the cleanout test and step number two actually this will be step number three i'm going to name this step 2.2 for step 2.2 we want to specify the columns we want to import i will create a list option i'll name this object columns i'm going to type all the columns i want to import so the first comment is traffic report id and since ids uppercase yup uppercase here i'm going to just copy all the columns it should come and hit us i'm going to just copy and paste oh sure i forgot i can just a another way i can do it so i can reference the the diaphragm object i can set the columns property let me see all right i'll grab this list and we'll just copy and paste and remove ladder 2 and longer 2 from my list here i'm going to create another data frame archer except that this object is going to contain the records and the commands i want to export to sql server so from df option i'll insert the column names now i'm going to round these three lines actually should be i see i ran line 12 and 13 before now i'm going to run line 21 to 24. if i print df data option that will give us 194 221 records and 7 columns because when we use odbc library to insert records to a database system we need to provide nestless object so that means we need to convert our data from data set to a list option and to do that i'll convert df data data set to values first then to list now if i run line 26 oh this should be plural so i made a mistake this should be reckless now if i print the first 10 records of the records option that will convert each row into a list now we are finished preparing the data now we need to work on creating the sql server connection to communicate with python so this is going to be step three actually let's do step 3.1 create sql server connection stream it is always good practice to use your code so here i'm going to create a function selling in this function connection string and this function is going to have three parameters driver server name database name inside this function we need to construct our connection string oops should be a string i am creating an f string so just be aware of that and the string is going to be driver this equals two and you'll need to insert two clay brackets followed by the argument name semicolon idiom server we only need to insert one set of curly bracket and follow by the server name argument and database one set of curly bracket followed by database name argument you also need to make sure that we set the trust connection property to yes in case if logging into your sql server with a user id and a password they need to insert two more properties so the first property is actually not password the first property is going to be username then we'll type your username id and followed by the semicolon so username and the second property is going to be password and here's where you're going to insert your password and since i don't need to provide my user credential so i'm going to delete those two properties and i want to return the connection string and here's some made a mistake so this should be three sets of curly bracket not two now let me create this function in memory now if i insert the connection string function and i'll provide my driver server name and database this should be server name database name if i'm getting driver is not defined oh this should be silver okay so if someone is good so this is what i'm looking at the driver value server value database name and trust connection is yes in case if you are wondering when you are creating a connection stream for a different database system you can go to the site connectionstreams.com this website has a list of all the available database systems connection stream reference now let's go back now we can create our connection option and this will be step 3.2 create database connection object actually i'm going to name this instance i'll grab the odbc library dot connect and all we need to do is we just need to provide our connection stream to the connect method name the output is connection content and here i need to insert try accept block just because it's very likely that you're going to run into some sort of air in the beginning i'm going to insert the first handle except odbc database air let's see and just it happens i want to print message database here then i want to print the a message if other than the database here then i want to just print a very generic error message and this will be connection here and open the end message now let's run this code block and here's my first error the last first name cannot found no default driver let me take a look so i run into this issue yesterday so me replace the single quote with double quote oops try again okay now this time work anyway uh let's take a look at the connection string so if we print the connection object and then return us pypyodbc connection option now about connection created we now need to create a cursor connection and this will be step 3.3 create a cursor connection again i'm going to insert try accept block and this time going to use exception to capture audios inside the try block i'll instantiate my cursor connection if you don't know what a cursor is a cursor is basically an identifier associated with all the rows of records in a database system and to insert our records we need to prepare a sql statement to do that sum name the sql statement sql insert and the insertion statement is pretty straightforward so insert into followed by the table name austin traffic instant oh actually uh here we remove the default statement i want to recreate my table i'll show you why in a second here for the values argument we have 7 parameters so one two three four five six seven and the eighth column is going to be the date and time when a record is created and for that i'm going to insert the function inside this sql statement and the reason is because if i decide to use this sql statement to insert the default stamp then i need to merely type the column name normally run the sql inser string from the cursor option there is a method called soq menu and this method allows us to execute a batch of values from an array and the first parameter is the sql statement in the second parameter is the nest list array object able to create that before once all the records got inserted to our table we need to commit the transaction however if we run into an error then we need to roll back the transaction and finally if the transaction is successful then we now have finished uh inserting all the rackets now we can close everything so we need to close the cursor connection and the connection connection the reason i didn't want to close the connection object and the cursor object inside this air block is because when we run into here i want to keep all the connections open that way we can re-attempt to insert the brackets again now this is everything we need to write for the python code now let's give it a try so i'm going to save this script i'm going to press f5 to run the script looks like the script has finished and we should insert print message to indicating that the test is complete now let's go to our sql server database i'm going to run a slide statement and looking at this table i have 194 229 records in this outstanding traffic incident table all right so this is everything i want to show in this video if you have any feedback or any comment please leave them in the comment section below and as always see you guys watching i'll see you guys on the next video
Info
Channel: Jie Jenn
Views: 18,324
Rating: 4.9846153 out of 5
Keywords: sql server python, database python, insert record to database with pyhton, insert csv file to sql servre with pyhton, insert table to sql server with python
Id: 0E4AuCqgN90
Channel Id: undefined
Length: 26min 39sec (1599 seconds)
Published: Wed Dec 02 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.