Welcome, everybody, you're watching Mr. Fugu Data Science. We're parsing JSON data today using one of the New York Times APIs. We're making a connection and working through two examples today. The first will be a blind extraction of all the raw data, throwing it into a DataFrame. In the second, we will strategically parse these data and take specific key-value pairs. You only need to import pandas, json, and requests today.
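A minimal sketch of that setup, nothing more than the three imports:

```python
import json       # handy for inspecting raw payloads

import pandas as pd
import requests
```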
If you do not have, or have not used, this API, feel free to go to this website here; it'll give you a brief rundown, and it's extremely easy to set up. It's probably the easiest API you could start with for your basics. Now, we're going to use the Most Popular API today.
Each API will have a similar call, such as this, where you have the first portion, which is the website's API; then which API you're deciding to use; then what parameter of the API you would like, such as viewed, emailed, or shared; and then a time frame that you have of interest. I think this is one day. We're taking it as JSON data, and you're throwing in the key. Your specific key: think of it as your password, and you don't want some pirate stealing your password, right! You throw in your password. Zoom out a little bit. You call what you would like from your API with this web page, throw in that password of yours, and we make a GET request with this URL of ours and take it as JSON.
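As a sketch of that call, assuming the Most Popular endpoint layout described above (YOUR_API_KEY is a placeholder for your own key):

```python
import requests

# first portion: the site's API; then which API (mostpopular), the
# parameter (viewed, emailed, or shared), and the time frame (1 day)
url = ('https://api.nytimes.com/svc/mostpopular/v2/'
       'viewed/1.json?api-key=YOUR_API_KEY')

data = requests.get(url).json()   # make the GET request, take it as JSON
```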
And here's what it looks like. I took a list of this request call to see what the outermost keys are. Then let's see what this looks like: let's just store this as some dummy DataFrame. Here we go. Let's zoom out a little bit, once again. So here's all of our information; we blindly took everything. We scroll over and we see we have some strings, lists of strings, and a list of dictionaries. We're going to take this so we can work with some nested data. Of interest, I'm taking anything that has string data that we can use for a little project later.
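A sketch of that blind extraction, assuming the articles sit under a results key in the response:

```python
import pandas as pd

print(list(data))              # the outermost keys of the response

# blind extraction: throw every raw record straight into a DataFrame
dummy_df = pd.DataFrame(data['results'])
dummy_df.head()
```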
In that media portion that was nested, we would ultimately like to get the captions, which would be our strings. Here are all of the strings that we have that are of interest today. We save that: we store our connection under this variable name, popular_articles. I would like to show you what the media looks like. Some of them are absent. The ones that have data will have this list of dictionaries. Okay, and not all of them have captions.
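Peeking at a few media entries shows that shape; here popular_articles stands in for the video's variable, which I'm assuming holds the parsed records:

```python
# store the parsed records under one variable name
popular_articles = data['results']

# some media values are empty; the populated ones are a list of
# dictionaries, and not every dictionary carries a caption
for article in popular_articles[:3]:
    print(article['media'])
```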
Now, let's start parsing through this. But this is going to be very lengthy: we've got a lot going on here. We basically need to iterate and do a comparison for each one of our keys, to see if it's actually in our data, and append it to the list of values if it is. Some of this is going to be cut and paste; some of it is going to be writing else statements; it just depends on what we're dealing with in our data. So let's do a cut and paste for two of these.
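The per-key pattern looks roughly like this sketch, with one list per key (the key names follow the API's snake_case fields):

```python
sources, publish_dates = [], []

for article in popular_articles:
    # compare against each key to see it's actually in the data
    if 'source' in article:
        sources.append(article['source'])
    if 'published_date' in article:
        publish_dates.append(article['published_date'])
```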
Why did this happen? Okay, the second one we're going to do is the publish date, and then after the publish date we're going to do the adx keywords. Do a little cut and pasting of this monotony, so not fun. And if you mess up, it screws up everything else you're doing. Let's get our byline, and this will be our first else statement, because in this circumstance it's not always there. So we'll just take care of the byline by saying: if it's not there, just append a None value. So we're about halfway; we need to get our title next. So let's get our fine and dandy title, and it's fun to type all this out.
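The byline, with its first else statement, might look like this sketch:

```python
bylines, titles = [], []

for article in popular_articles:
    if 'byline' in article:
        bylines.append(article['byline'])
    else:
        bylines.append(None)   # byline isn't always there
    if 'title' in article:
        titles.append(article['title'])
```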
So what do we have left? We need the abstract as well. Take that abstract. Wonderful; now we're getting a lot closer. We've got two things that are going to need some else statements: the des_facet, and then there'll be the per_facet, which sound kind of obscure, but you'll see in a second. If you're doing an else statement, just cut and paste this. Now we'll take this and cut and paste it for the per_facet. And then we're off to the races, because the last thing we have, which is totally different than anything else, is the media. And the media is handled like this: we append it, if you notice from the top, and if there's no media, just the null value as usual.
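A first pass over media, following the same pattern (this sketch just keeps the raw value for now):

```python
media = []

for article in popular_articles:
    if 'media' in article:
        media.append(article['media'])   # still the raw nested list
    else:
        media.append(None)               # no media: null value as usual
```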
Let's go back and look and see if we did everything correctly, with no overlap. So the sources look good; publish date, adx keywords, byline, title, abstract. Okay, looks like we might be in the clear. We'll find out at the end. We'll find out now. Find out now: all of these have the same length. What a delight, okay. That looks good. Now, what do we do with this hunk of junk?
Let's look at it. So, as you see with the media: sometimes they have something, sometimes they don't. If they have something, it's a list of dictionaries, and inside that list of dictionaries we want something called the caption. If we're dealing with an empty list, we're going to append None; else, just append our trusty old iterator. See what this looks like. Uh oh, that didn't help. And why is that? Because we need to iterate further, inside of this, and take the caption. So if we have a caption, then let's go ahead and iterate that bad boy. And what happens if we don't have it? Same as usual: we just append a big old None. And there we are.
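One reading of that caption pull, as a sketch that reaches inside the first media dictionary (assuming a 'caption' key, which not every entry has):

```python
captions = []

for entry in media:
    if not entry:                        # empty list or None: no media
        captions.append(None)
    else:
        # iterate further inside the list of dictionaries for the caption
        first = entry[0]
        captions.append(first.get('caption'))
```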
But we have this empty string we need to take care of in the next step. Now, let's take care of that in this step, just like usual. Just fill it all in; but this time we're going to fill it into the media dictionary, okay. Oops, None; getting ahead of myself, getting excited. So, Mr. Media friend: now we can take in our good old iterator, take in a little iterator, and see what it looks like. There we are.
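Mapping that leftover empty string to None can be a one-liner (a sketch):

```python
# an empty caption string is as good as missing, so fill it in as None
captions = [c if c else None for c in captions]
```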
It's looking pretty good. Let's check that length, so I can verify. Uh, I'll do that in the next step. I did it twice. And there we go. Be careful when you're running this: if you run it again, for some reason it doubles, because the loops keep appending to the same lists. And then, the next thing you know, this tells you you're out of your parameter specs. It's really annoying, and you have to re-clear and do everything once again; reload the page once again.
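A quick guard against that, as a sketch: verify the lengths, and re-initialize the lists in the same cell as their loop so a re-run can't double them.

```python
# every column list should match the article count; if a re-run
# doubled one, clear it and run its loop exactly once more
print(len(popular_articles), len(sources), len(bylines), len(captions))
```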
So here we are. Let's check out our good old media column, which originally was a nested list of dictionaries that we parsed through using the API's data directly, and then sent to a DataFrame. We have all of our strings, and what can we do with this later, for a little project? Sentiment analysis. Ooh, this looks wonderful, doesn't it?
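Assembled, the strategic parse lands in a DataFrame along these lines (a sketch; the remaining lists, like the abstracts and facets, would be built the same way as above):

```python
# gather the hand-parsed lists into one tidy DataFrame
parsed_df = pd.DataFrame({
    'source': sources,
    'published_date': publish_dates,
    'byline': bylines,
    'title': titles,
    'media': captions,   # the caption strings dug out of the nesting
})
parsed_df.head()
```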
So I'd like to say thank you for watching this video. Please like, share, and subscribe, and if you subscribe, turn on that notification bell. Thanks a lot. See you next video. Bye.