Python for Beginners: CSV Parsing (Part 1) - Parsing a Simple CSV File

Video Statistics and Information

Captions
The biggest question I always get is, "How do I begin learning to program in Python?" My answer is how I learned to do it, which is parsing CSV files. When I was an undergraduate, I had a research project that required me to parse thousands of CSV files, and if there was an error in the program, I had to do it all over again. So it was either learn how to code or parse thousands of files manually, sometimes for nothing.

CSV parsing is really important for developing the basics of programming. It teaches you how a computer thinks and how to think logically. It teaches you how to work with indexes, generators, and lists, how to write for loops, how to set flags, and how to break out of for loops. So in this video I'm going to show you the basics of how to parse a really simple CSV file, and I'm going to go to great lengths to explain what the code is doing, so first-time viewers and first-time programmers can really develop a sense of how Python works. And if you're a veteran programmer, well, then you get a nice brush-up on your skills.

So here's our CSV file. It's called studentgrades.csv, and it has four columns. The header row, which describes the contents of each column, has first name, last name, year, and grade. This is the file that we're going to parse. I've already set up a document here to write code in, and studentgrades.csv is in the active directory.

I'm going to go ahead and import the csv library, which is the library that we're going to start out using. The first action is to actually open the CSV file with Python so we can begin operating on it. Generally you'll see this as "with open", and the name of our file, of course, is studentgrades.csv. We open it in read mode, as opposed to write mode or read-write mode, "as infile". This is just a fancy way of opening the contents of the file and assigning it to a variable.

Next, you'll generally see someone read infile into a csv reader function. Generally it looks like this: reader equals csv.reader(infile). This isn't necessary, but I always like to specify it: we can pass our delimiter as a parameter. We have a comma-separated file, so our delimiter, the thing that's spacing out the columns, is a comma. If we had a space-separated file, which we could also use the csv library to parse, we could put a space in here, or a tab if we had a tab-separated file. But of course we have a comma.

The csv reader is a generator. Instead of opening the entire file and reading it into memory, the generator allows us to go one row at a time. Every time we access reader to do something with it, it moves to the next line.

So let's say we want to just capture the header row. How would we go about doing that? We have a generator, so we can't necessarily index it easily. Well, we haven't actually advanced through the reader yet, so the header is the first row on deck, essentially. What we can do is say header equals next(reader). What next does is take the current row, convert it to a list, and then advance reader to the next row, which would be row two here.

If we print that out and run the code, you see we get a list with the header row, in the exact same order as in the input file. And if I wanted to get the name of the first column, I can index it. Because we start counting from zero in Python, as opposed to one, the first value in header corresponds to index 0, which is first name. So if I print that out, of course we just get the string back: first name. Easy enough.

So let's start looping through our file. We're going to use a for loop and say "for row in reader", which marches down each row of the file: for each row in reader, we do something and then move to the next one. Using our standard index values, we'll say student first name equals row[0], student last name equals row[1], student year equals row[2], and student grade equals row[3]. Again, these index values just correspond to the ordering of the columns. If we print all of this out in order (student first name, student last name, student year, and student grade) and run the code, it loops through the file and gives us back each item in the order that we're printing it.

So what if we actually want to do something with this data? Let's say we're interested in just reading in the student first name and student last name and writing them to a different CSV file. I'm going to operate on this existing code. We start by creating a new CSV file to write to. We're going to call it outfile, and we're going to open, let's call it student names.csv, in write mode. This file doesn't have to exist already; Python is going to create it automatically, and we're going to write to it.

Each line in our new CSV file is going to have the student first name and the student last name. What I'm doing here is using these curly braces to insert the student first name and the student last name into this pre-existing string. Because we have a comma-separated file, we separate these two things with a comma, and then we end with a newline, so that the next time the computer writes to or reads the file, it goes to a new line. Then we just say outfile.write and write this line to the out file. Every time we iterate through a row, it reads in all the data, we produce the line ahead of time, and then we write it to our new out file. When this is all done and we're ready to terminate the program, we can say outfile.close, and it'll cleanly close the out file so we can access it through a different program.

I'm going to run the code, and it finishes. You can see that student names.csv has appeared in our working directory, and when we look at it, sure enough, here are our first names and last names, comma separated. Our newline character has worked: every time it writes a last name, it moves to a new line. Let's delete this file quickly.

Now let's say we wanted to write a header ahead of time. That's pretty easy to do. We could just say outfile header equals student first name, comma, student last name, and of course we have to have the newline character there. Then we do outfile.write(outfile header) and run this one more time. And here we go: there are our header names. So we can add a header to our new CSV file.

That brings us to the end of this video. In the next video, I'm going to show you a much more complex and convoluted real-world example of a really complicated CSV file that I had to parse when I was learning how to write in Python. We'll see you in part two.
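The steps described in the captions can be sketched roughly as follows. This is a reconstruction from the spoken description, not the code shown on screen. The sample rows, the exact header text, the output filename studentnames.csv, and the function name extract_names are all assumptions made for illustration. Note also that csv.reader returns an iterator object (what the video loosely calls a generator), and that the csv documentation recommends opening input files with newline="".

```python
import csv
import os
import tempfile

# Hypothetical sample data: the transcript names the four columns but not
# their exact spelling, and it never lists the rows.
SAMPLE = (
    "First Name,Last Name,Year,Grade\n"
    "Ada,Lovelace,1,A\n"
    "Alan,Turing,2,B\n"
)


def extract_names(in_path, out_path):
    """Copy the first two columns of in_path into out_path; return the input header."""
    # Open the output file in write mode; Python creates it if it does not exist.
    outfile = open(out_path, "w")

    # Write a header line first (the wording here is an assumption).
    outfile_header = "Student First Name,Student Last Name\n"
    outfile.write(outfile_header)

    with open(in_path, "r", newline="") as infile:
        reader = csv.reader(infile, delimiter=",")

        # The reader has not advanced yet, so next() returns the header row
        # as a list and moves the reader on to the first data row.
        header = next(reader)

        for row in reader:
            # Index values follow the column order in the file.
            student_first_name = row[0]
            student_last_name = row[1]

            # Curly braces insert the values; "\n" ends the line.
            line = "{},{}\n".format(student_first_name, student_last_name)
            outfile.write(line)

    # Close the output file so other programs can read it cleanly.
    outfile.close()
    return header


if __name__ == "__main__":
    # Work in a temporary directory so the demo does not touch real files.
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "studentgrades.csv")
    out_path = os.path.join(workdir, "studentnames.csv")

    with open(in_path, "w") as f:
        f.write(SAMPLE)

    print(extract_names(in_path, out_path))
    with open(out_path) as f:
        print(f.read())
```

The output file is opened and closed by hand here because that is how the video describes it. Wrapping it in a second "with open" block would work as well and would close the file even if an error occurred partway through the loop.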
Info
Channel: Scott Hull
Views: 17,422
Rating: 4.9741101 out of 5
Id: _r0jzrlcDPM
Length: 10min 1sec (601 seconds)
Published: Mon Aug 10 2020