HOW TO PARSE JSON FROM AN API: USING PYTHON

Captions
Welcome everybody, you're watching Mr Fugu Data Science. Today we're parsing JSON data using one of the New York Times APIs. We'll make a connection and work through two examples: the first is a blind extraction of all the raw data, throwing it into a DataFrame; in the second we strategically parse the data and take specific key/value pairs. You only need to import pandas, json, and requests today. If you haven't used this API before, feel free to go to this website here; it gives you a brief rundown, and it's extremely easy to set up. It's probably the easiest API you could start with for the basics.

We're going to use the Most Popular API today. Each API call has a similar form: first the base URL, then which API you're using, then which parameter of that API you'd like, such as viewed, emailed, or shared, and then a time frame of interest. I think this one is one day. We request it as JSON and append the key, your specific key. Think of it as your password; you don't want some pirate stealing your password, right! Let's zoom out a little bit. You specify what you'd like from the API with this URL and pass in that key of yours. We make a GET request with that URL and take the result as JSON. And here's what it looks like. I took a list() of this response to see what the outermost keys are.

Then let's see what this looks like and just store it as a dummy DataFrame. Here we go. Let's zoom out a little bit, once again. So here's all of our information; we blindly took everything. We scroll over and see some strings, lists of strings, and a list of dictionaries. We'll take those so we can work with some nested data. Of interest, I'm taking anything that has string data we can use for a little project later.
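The connection and blind extraction described above can be sketched like this. This is a minimal sketch: the URL follows the Most Popular API pattern, `YOUR_KEY` is a placeholder for your own key, and the sample response below is invented so the snippet runs offline, not real API data.

```python
import pandas as pd

# The real call would look like (YOUR_KEY is a placeholder for your own
# api-key from developer.nytimes.com):
#   import requests
#   url = "https://api.nytimes.com/svc/mostpopular/v2/viewed/1.json"
#   data = requests.get(url, params={"api-key": "YOUR_KEY"}).json()
# To keep this runnable offline, here is a tiny stand-in response:
data = {
    "status": "OK",
    "num_results": 2,
    "results": [
        {"source": "New York Times", "title": "Example A", "media": []},
        {"source": "New York Times", "title": "Example B",
         "media": [{"caption": "A photo caption"}]},
    ],
}

print(list(data))                    # the outermost keys of the response
df = pd.DataFrame(data["results"])   # blind extraction: everything at once
print(df.shape)
```

The `media` column lands in the DataFrame as raw lists of dictionaries, which is exactly the nested data the second example digs into.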
In that nested media portion we would ultimately like to get the captions, which are our strings. Here are all the strings that are of interest today. We save that, storing our connection under the variable name popular_articles. I'd like to show you what the media field looks like: some of them are absent, and the ones that have data hold a list of dictionaries. And not all of them have captions.

Now, let's start parsing through this. This is going to be lengthy; we have a lot going on here. We basically need to iterate and, for each of our keys, check whether it's actually in our data, and append its value to a list if it is. Some of this will be cut and paste, and some will need else statements; it just depends on what we're dealing with. So let's do a cut and paste for two of these. Why did this happen? Okay, the second one we'll do is the publish date, and after the publish date we'll do the adx keywords. A little cut-and-paste monotony, so not fun, and if you mess up, it breaks everything else you're doing.

Let's get our byline, and this will be our first else statement, because in this case the key isn't always there. We take care of the byline by appending a None value if it's missing. So we're about halfway; we need our title next. Let's get our fine and dandy title; it's fun to type all this out. What do we have left? We need the abstract as well. Take that abstract. Wonderful, now we're getting a lot closer. We've got two more fields that need else statements: des_facet and then per_facet, which sound kind of obscure, but you'll see in a second. If you're doing an else statement, just cut and paste this. Now we take this and cut and paste it for per_facet, and then we're off to the races.
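The key-by-key extraction with else statements can be sketched as follows. The sample `results` list is made up for illustration; with the real API you would iterate over `data['results']` from the connection above.

```python
# Strategic parsing: pull specific key/value pairs from each article,
# appending None when a key (like 'byline' or the facets) is missing,
# so every column stays the same length.
results = [
    {"source": "NYT", "published_date": "2020-09-04", "adx_keywords": "Python",
     "byline": "By Someone", "title": "Example A", "abstract": "First article.",
     "des_facet": ["Data"], "per_facet": []},
    {"source": "NYT", "published_date": "2020-09-05", "adx_keywords": "JSON",
     "title": "Example B", "abstract": "Second article."},  # no byline/facets
]

fields = ["source", "published_date", "adx_keywords",
          "byline", "title", "abstract", "des_facet", "per_facet"]
parsed = {f: [] for f in fields}

for article in results:
    for f in fields:
        if f in article:
            parsed[f].append(article[f])
        else:
            parsed[f].append(None)   # key absent: keep the columns aligned
```

One caveat worth noting: re-running an append loop like this without re-initializing `parsed` doubles every list.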
The last thing we have, which is totally different from everything else, is the media, and the media is handled like this. We append each item, as you noticed from the top, and if there's no media, just the null value as usual. Let's go back and check that we did everything correctly with no overlapping. The sources look good: publish date, adx keywords, byline, title, abstract. Okay, looks like we might be in the clear. We'll find out now. All of these have the same problem. What a delight, okay. That looks good.

Now, what do we do with this hunk of junk? Let's look at it. As you see with the media, sometimes articles have something and sometimes they don't. When they do, it's a list of dictionaries, and inside that list of dictionaries we want something called the caption. If we're dealing with an empty list, we append None; otherwise we just append our trusty old iterator. Let's see what this looks like. Uh oh, that didn't help. And why is that? Because we need to iterate further inside of this and take the caption. So if we have a caption, let's go ahead and iterate over that bad boy, and if we don't have it, same as usual, we just append a big old None. And there we are.

But we still have some empty strings to take care of in the next step. Let's handle those now, just like before, but this time we fill them in within the media dictionary, okay. Oops, None; getting ahead of myself, getting excited. So, Mr. Media friend: now we can take in our good old iterator and see what it looks like. There we are, it's looking pretty good. Let's check that length so I can verify. I'll do that in the next step. I did it twice. And there we go. Be careful when you're running this: if you run it again, for some reason the lists double, and then the next thing you know it tells you you're outside your parameter specs.
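The caption handling, digging one level further into the nested media, can be sketched like this. The sample `articles` list is invented for illustration; the real items come from the API's `results`.

```python
# Media is either an empty list or a list of dictionaries. Dig inside for
# the 'caption' key; fall back to None when media is empty or the caption
# is missing or an empty string.
articles = [
    {"title": "A", "media": [{"type": "image", "caption": "Skyline at dusk"}]},
    {"title": "B", "media": []},                                 # no media
    {"title": "C", "media": [{"type": "image", "caption": ""}]}, # empty caption
]

captions = []
for article in articles:
    media = article.get("media", [])
    if media and media[0].get("caption"):
        captions.append(media[0]["caption"])   # a real caption string
    else:
        captions.append(None)   # no media, or caption missing/empty
```

Checking `len(captions)` against the number of articles is the quick verification step mentioned in the video: one entry per article, with None filling the gaps.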
That's really annoying, and you have to clear everything and reload the page to do it over again. So here we are. Let's check out our good old media column, which originally was a nested list of dictionaries that we parsed through using the API's data directly before sending it to a DataFrame. We have all of our strings, and what can we do with this later for a little project? Sentiment analysis. Ooh, doesn't this look wonderful? I'd like to say thank you for watching this video. Please like, share, and subscribe, and if you subscribe, turn on that notification bell. Thanks a lot. See you next video. Bye!
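Sending the strategically parsed columns to a DataFrame might look like this. The values are illustrative stand-ins for the lists built during parsing, not real API data.

```python
import pandas as pd

# Stand-ins for the parsed lists: one entry per article in each column
parsed = {
    "title":  ["Example A", "Example B"],
    "byline": ["By Someone", None],
    "media":  [None, "A photo caption"],   # the extracted captions
}

df = pd.DataFrame(parsed)
print(df.shape)                    # rows x columns
print(df["media"].notna().sum())   # how many articles actually had a caption
```

Because every column was padded with None where keys were missing, the lists are all the same length and the DataFrame constructor accepts them directly; the string columns are then ready for a follow-up project like sentiment analysis.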
Info
Channel: Mr Fugu Data Science
Views: 16,983
Keywords: mr fugu, mr fugu data science, Mr Fugu, Mr Fugu Data Science, parse json, parsing json, Python API, python api, how to parse json, how to python api
Id: iaN1FxjBuGk
Length: 8min 47sec (527 seconds)
Published: Fri Sep 04 2020