HOW TO PARSE NESTED JSON AND CONVERT TO DATAFRAME | STOCK EXAMPLE 3 DIFFERENT WAYS | PYTHON

Captions
Welcome, everybody, you're watching Mr Fugu Data Science. Today we're going to parse some nested JSON stock data and convert it into a DataFrame. This was a question from a viewer who needed a little help, and here's the question they put on Stack Overflow; I was too lazy to create an account to answer it there. Let's get started. Feel free to hit me up on Instagram and Twitter. If you'd like to help support the channel, here's my Patreon, or consider buying me a coffee.

You only need these two imports for now; we'll have one more later on. I don't know where these data actually came from, whether they were parsed from an API or what, but we have some form of timestamp, we have this "o", and I don't know what it means, we have a high, a low, a close, and then the volume, and then we have the ticker name. We need to convert these into a DataFrame, and I'm going to show three examples.

Before we move on, I want to mention something about the structure. For each ticker, all of the values are held within a list, and within that list you have two separate dictionaries. Inside each dictionary you have all these key-value pairs. That's the nesting we have to deal with, and it can be quite confusing. The objective here is to do this three different ways.

Let's get into the first one. Let's create a list to store our variables and iterate over our stocks variable, which I'm converting into items, which are tuples, for ease of parsing. Let's look at what this gives us: index zero is the key, and index one is all of our values, the list of our nested objects. So let's go inside of this, save everything, and convert it into a DataFrame. Now this is kind of what we're looking for. Let's just make up a variable name.

All right, we still have to address this problem here: we have a list. There's a function we can use, pandas explode, that flattens it, but then we'll still have the nested objects inside of it to deal with. Let's do this in a couple of steps. All right, I'm just
going to make up some variable name, because I'm too lazy to think of what it should be called. Let's use explode to flatten the list, and that's what we produce here. Now we have to deal with the JSON objects. How are we going to deal with that? We have to use something called json_normalize, but there are a few steps. We first convert the DataFrame into JSON, oriented as records. From there, we take that and use json.loads, where the "s" stands for string, and then we call pandas json_normalize, which spreads everything out for us. This is one way, and logically it's how I would set these data up, but it isn't what the client wanted; that will be the last example.

From there, we rename these columns, so let's do that real fast. I'll just cut and paste for simplicity: old column, new name; old column, new name; chuck it in. What is this called? I'll just call it df1. And there we go. I didn't know what this one was called, so I just left it as b. There's the change of our names.

Let's move on to something else. Wait, we could have done this more simply; we had too many redundant steps. How should we have handled this to simplify the process? We could have taken in the original data like this, then created the DataFrame, named the columns just like we did before, and then used pandas explode. Note that explode only works on one column at a time; if you're using multiple columns, consider using an apply function. I've shown that in a lot of videos, but we don't need it in this case.

Now we're almost there. What else do we need to do? We need to do the JSON conversion that we did before. This is getting kind of long, so let's name this df1 better. Then we scroll up, copy all of this, come back down, and paste it right here. Then we get
the same thing. Then all you need to do is rename these.

Let's get into the second example. Let's just take the stocks, and here we have the columns, which are the ticker symbols, or stock names. Let's call this df2 and run our json_normalize. What I did was the same JSON conversion and json_normalize steps we used before, but you'll notice this looks kind of goofy: it's two rows with everything split out wide, and this could explode on you really fast. We only have, I think, three or four entries for stocks, so this isn't really something you want to do; it causes a lot of problems and isn't a good fit for this particular data set.

So we need to move on to another example of what we're trying to achieve. This one is more challenging, and we should think about what's really going on. There are quite a few steps, so let me run through them. We need to bring in collections, because we're going to import defaultdict. We'll also create a separate list to store our variables, and then we iterate over our stock data, converted into tuples of keys and values. Then we iterate once more, because we want to go inside each one of these lists. What does this give us? All right, now we're inside each of the lists, so we need to go a little further, since we're dealing with dictionaries again.

Now what are we going to do? Let's look at the format the client actually wanted; give me a second. This is what the client wants for the formatting: from what I'm seeing, they want the closing price. So let's get into that. I want to show you something before we move on: the timestamps here and here line up. For a given bar, the timestamp is the exact same value for each ticker, because the prices were taken at the same time, however they were pulling this data from some API. Since we're going inside of
this once more, let's see what it looks like. Okay, we have our tuples, but now we have to do comparisons to get the timing correct for each one of these. I originally took the open and close, but really I just need the closing values. We need to go inside all of this and use conditional statements to take our information one by one. First, we take the time and, if it's not already inside the list we're creating, put it in that list. Oh, I made a mistake: we didn't set up our defaultdict, which is what should be here. Thanks to that check, we won't get repeated times. Then we put in our key and append the values. Then we need an elif statement that takes in the closing price (originally I did the open and close price for each day), and we chuck that in. I don't even need this, so let's take our key and append our values. Okay, sounds good: we have the closing price for each day for each one of our stocks, and then we have the time.

Let's put this into a DataFrame. Okay, cool, we have the format they wanted on Stack Overflow. Let's call this df3 and chuck this in. We need to rename the column for the time, the "t"; we'll just call it date, I guess. Now we need to convert the timestamps, so let's look at this piece by piece, since we're dealing with timestamps, which are just those jumbled-looking values. We convert each entry into a date format so we can actually interpret it as humans. We iterate through this: I convert it into values, loop over a range of its length so we go row-wise, pass in each entry from the column, and convert it from a timestamp into a datetime. There we are. Then we put in the date column, print everything off, and we're done. So now we've solved the problem from that Stack Overflow question for the viewer who needed a little bit of
help. As always, feel free to hit me up on Instagram and Twitter. If you'd like to help support the channel, here's my Patreon, or consider buying me a coffee. I hope this brought utility to someone. Feel free to reach out if you have any questions or suggestions. See you in the next video, bye.
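The first approach described in the transcript (collect the key/value tuples, build a DataFrame, explode the list column, round-trip through JSON records, then json_normalize and rename) can be sketched in pandas as follows. The video never shows the raw data, so `stocks` below is a hypothetical stand-in: a dict keyed by ticker whose values are lists of two bar dictionaries with the keys `t`, `o`, `h`, `l`, `c`, `v` (timestamp, presumably open, high, low, close, volume). The Unix-second timestamps, the prices, and the new column names are all assumptions made for this sketch.

```python
import json

import pandas as pd

# Hypothetical sample data (the video's raw data is not shown): ticker -> list of bars.
stocks = {
    "AAPL": [
        {"t": 1610323200, "o": 128.5, "h": 129.7, "l": 126.9, "c": 128.8, "v": 100000},
        {"t": 1610409600, "o": 129.2, "h": 131.4, "l": 128.5, "c": 130.9, "v": 120000},
    ],
    "MSFT": [
        {"t": 1610323200, "o": 213.5, "h": 216.0, "l": 212.6, "c": 214.9, "v": 90000},
        {"t": 1610409600, "o": 215.0, "h": 216.4, "l": 213.2, "c": 213.0, "v": 80000},
    ],
}

# Step 1: iterate over the (key, value) tuples and collect them in a list.
rows = []
for ticker, bars in stocks.items():
    rows.append((ticker, bars))
df = pd.DataFrame(rows, columns=["ticker", "bars"])

# Step 2: explode the list column so each bar dictionary gets its own row.
df = df.explode("bars", ignore_index=True)

# Step 3: convert to JSON records, parse the string back, then flatten the nested dicts.
records = json.loads(df.to_json(orient="records"))
flat = pd.json_normalize(records)  # columns: ticker, bars.t, bars.o, ...

# Step 4: give the flattened columns readable names (names chosen for this sketch).
df1 = flat.rename(columns={
    "bars.t": "timestamp",
    "bars.o": "open",
    "bars.h": "high",
    "bars.l": "low",
    "bars.c": "close",
    "bars.v": "volume",
})
print(df1)
```

The result is in long format: one row per ticker and bar, which matches the "logical" layout the transcript describes rather than the layout the client asked for.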
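The transcript points out that the first version had redundant steps. One possible shorter version is sketched below: it builds the DataFrame straight from `stocks.items()` and explodes it, as in the video, and then goes a step further than the video by passing the exploded dictionaries directly to `pd.json_normalize`, which skips the `to_json`/`json.loads` round trip. As before, `stocks` is hypothetical sample data (ticker to a list of bar dicts with keys `t`, `o`, `h`, `l`, `c`, `v`), the column names are my own, and the name `df1_better` is modeled on the name spoken in the video.

```python
import pandas as pd

# Hypothetical sample data (the video's raw data is not shown): ticker -> list of bars.
stocks = {
    "AAPL": [
        {"t": 1610323200, "o": 128.5, "h": 129.7, "l": 126.9, "c": 128.8, "v": 100000},
        {"t": 1610409600, "o": 129.2, "h": 131.4, "l": 128.5, "c": 130.9, "v": 120000},
    ],
    "MSFT": [
        {"t": 1610323200, "o": 213.5, "h": 216.0, "l": 212.6, "c": 214.9, "v": 90000},
        {"t": 1610409600, "o": 215.0, "h": 216.4, "l": 213.2, "c": 213.0, "v": 80000},
    ],
}

# Readable names for the single-letter keys (an assumption for this sketch).
COLUMN_NAMES = {"t": "timestamp", "o": "open", "h": "high", "l": "low", "c": "close", "v": "volume"}

# Build the frame directly from the (ticker, list-of-bars) pairs and explode the list column.
df = pd.DataFrame(list(stocks.items()), columns=["ticker", "bars"])
df = df.explode("bars", ignore_index=True)

# json_normalize accepts a list of dicts directly, so no JSON string round trip is needed.
bars = pd.json_normalize(df["bars"].tolist())

# Both frames share the same 0..n-1 index (ignore_index=True), so they align side by side.
df1_better = pd.concat([df[["ticker"]], bars], axis=1).rename(columns=COLUMN_NAMES)
print(df1_better)
```

When the video was published, `DataFrame.explode` accepted a single column only; since pandas 1.3 it also accepts a list of columns to explode together.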
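The second approach turns the tickers into columns before normalizing, which is what produces the short, wide table described in the transcript. A sketch, again using hypothetical sample data (ticker to a list of bar dicts with keys `t`, `o`, `h`, `l`, `c`, `v`; values made up):

```python
import json

import pandas as pd

# Hypothetical sample data (the video's raw data is not shown): ticker -> list of bars.
stocks = {
    "AAPL": [
        {"t": 1610323200, "o": 128.5, "h": 129.7, "l": 126.9, "c": 128.8, "v": 100000},
        {"t": 1610409600, "o": 129.2, "h": 131.4, "l": 128.5, "c": 130.9, "v": 120000},
    ],
    "MSFT": [
        {"t": 1610323200, "o": 213.5, "h": 216.0, "l": 212.6, "c": 214.9, "v": 90000},
        {"t": 1610409600, "o": 215.0, "h": 216.4, "l": 213.2, "c": 213.0, "v": 80000},
    ],
}

# Each ticker becomes a column; each row holds one bar dictionary per ticker.
df2 = pd.DataFrame(stocks)

# Same JSON round trip as the first approach, then flatten:
# every (ticker, field) pair becomes its own column, e.g. "AAPL.c".
wide = pd.json_normalize(json.loads(df2.to_json(orient="records")))
print(wide.shape)  # (2, 12): 2 bars, 2 tickers x 6 fields
print(list(wide.columns))
```

The column count grows as tickers times fields, and `pd.DataFrame(stocks)` raises a `ValueError` if the tickers have lists of different lengths, which is part of why this layout scales poorly.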
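The third approach produces the layout the Stack Overflow poster asked for: one row per timestamp and one closing-price column per ticker. The sketch below follows the steps in the transcript (a `defaultdict(list)`, a membership check so each shared timestamp is stored once, and an `elif` branch for the close) on hypothetical sample data shaped as ticker to a list of bar dicts with keys `t`, `o`, `h`, `l`, `c`, `v`. It assumes every ticker has bars at the same timestamps in the same order, as the video observes, and that `t` is in Unix seconds. For the date conversion it swaps the video's row-by-row loop for one vectorized `pd.to_datetime(..., unit="s")` call; a feed using milliseconds would need `unit="ms"` instead.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data (the video's raw data is not shown): ticker -> list of bars.
stocks = {
    "AAPL": [
        {"t": 1610323200, "o": 128.5, "h": 129.7, "l": 126.9, "c": 128.8, "v": 100000},
        {"t": 1610409600, "o": 129.2, "h": 131.4, "l": 128.5, "c": 130.9, "v": 120000},
    ],
    "MSFT": [
        {"t": 1610323200, "o": 213.5, "h": 216.0, "l": 212.6, "c": 214.9, "v": 90000},
        {"t": 1610409600, "o": 215.0, "h": 216.4, "l": 213.2, "c": 213.0, "v": 80000},
    ],
}

# defaultdict(list) creates an empty list the first time a key is used.
data = defaultdict(list)
for ticker, bars in stocks.items():
    for bar in bars:
        for key, value in bar.items():
            if key == "t":
                # Timestamps are shared across tickers, so keep each one only once.
                if value not in data["t"]:
                    data["t"].append(value)
            elif key == "c":
                # Keep only the closing price, one column per ticker.
                data[ticker].append(value)

df3 = pd.DataFrame(data).rename(columns={"t": "date"})

# Convert Unix-second timestamps to datetimes in one vectorized call.
df3["date"] = pd.to_datetime(df3["date"], unit="s")
print(df3)
```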
Info
Channel: Mr Fugu Data Science
Views: 2,395
Rating: 4.9166665 out of 5
Keywords: how to parse json, how to parse nested json, json to dataframe, mr fugu, Mr Fugu, Mr Fugu Data Science, mr fugu data science
Id: KZbU-edZ8_w
Length: 9min 18sec (558 seconds)
Published: Fri Jan 15 2021