Apache Nifi | Aviation Data Flow | RESTful API with InvokeHTTP Processor | Part 1

Captions
I'm Steven, and welcome back to Apache NiFi. Today we're going to start a new NiFi flow, so let's get started. First I want to create a process group, and we're going to call this one "Aviation Data". That's our new flow, or at least our process group name. Create that, and let's dive into it.

From here, we're going to collect some aviation data, and we're going to get it from an API. I came across a site called aviationstack.com, which offers a REST API where we can consume data from their endpoints and get information that will be great for what we're trying to do. The nice part is that you can start on the free tier, so you don't have to pay anything: 500 requests per day is not a problem. It gives us aviation data and real-time flights, which is the endpoint we'll be playing with right now. I created an account, got that all set up, got my API key, and I'm good to go.

So let's jump back over to NiFi and start. The main processor we're going to use is InvokeHTTP. We want to trigger this processor from another processor, because we need something to generate a flow file, so we'll also put a GenerateFlowFile processor out there.

Now let's look at the text we're going to need. To use the API, go to the documentation, where there's a getting-started example showing the endpoint. Note that free users can only use the http version, not https. The first thing we need to do is tell it which endpoint we want information from. In this case we want real-time flights, so that is the flights endpoint. Then we have to provide an access key, using the key that was generated for us (we can reset it at any time). After that, there are a few parameters we can add for the different information we want.

Going back up to real-time flights, we can see that the endpoint, besides the basic information, also accepts a limit. Because we're on the free version, we can only get 100 results back per request. Offset tells it where you want to start. What we're going to bring in is active flights only, not landed or scheduled ones, because active flights carry some additional data we want to get out of them when it's available. There are other options too, such as filtering results by departure city or arrival city, and a whole bunch of others. I think we're just going to use limit. We don't need offset, and we don't need flight date: I've already played around with it, we don't get that option, and because these are active flights it provides today's anyway. I think we'll be okay; if I change my mind, we'll see later.

Further down you can see an example of the result that comes back. The first part tells you the total count of records available and how many are being sent to you. You're limited to 100, so you can get 100 out of there. Then comes the data portion, which has a block of information for every flight. The first flight starts here: we have the departure data, the arrival data, the airline data, the flight, the aircraft, and, if it has live data available, that too. Then the next record moves on to the next flight.

Let's get to setting this up in NiFi. The first thing we'll set up is InvokeHTTP, and we'll give it a friendlier name, "Get Flights from API aviationstack", so I know where the data is coming from. For scheduling, since we get 500 requests, I'm going to do about one pull every three minutes. That should keep us from exceeding our daily allowance, so we won't use them all up right away, and we'll get a steady stream every three minutes with 100 results each, because we're going to get all the results we can.

We're going to use the GET HTTP method. For the URL, one option is to hard-code it in the Remote URL property. We take the URL from the documentation (we know we can't use the s), add flights, which is the endpoint we're looking for, then a question mark, then access_key= followed by the access key that was provided to us (I'll reset it later so I get a new one). Next comes &flight_status=active, and then &limit=100, which is the maximum we can do. I think that's all we're really going to use: we just want active flights from around the world, because the time of day determines how many flights there are in each country. We could hard-code it this way and it would work. Let's apply that; we don't have any other settings we want to change in there.

Let's grab a processor to dump the output into so we can connect it up, and route the Response relationship there. We don't want anything else right now; we're not going to handle anything other than the response at this point. Now we can run it, and we got a response back. Let's look at the queue to see whether we did it right, and yes, we did. Just like the example showed, we have the summary information: there's a total of 4,219 records for the active endpoint, and our results are down below, starting with the first flight. One thing we notice right away is that not every flight has live data. The next one doesn't either, but some of them do every once in a while. I'm not sure of the reason for that, but as long as some of them do, it will be enough to play with later, and it'll be easier to figure out what's going on then.

Let's close that. So we know that works, but maybe I don't want it hard-coded like this. Maybe I want some of these properties to be filled in from values I already have. For example, they do have options for looking up flight information by country code or by airport. Maybe you had a list of airports you cared about and wanted to collect data for flights going into those airports using the active endpoint. We could do that, and I'll show you how.

GenerateFlowFile will be our example, so let's edit its options. We'll leave Custom Text alone. What we want in this case is to create a couple of attributes for the flow file. The first is the access key, because we know we need that, so let's create one called access_key with our access key as its value. We also know we want flight status, so we'll add flight_status with the value active. And just because we did it earlier, let's add limit as well. So we're bringing in these three. We could add one for every single parameter that's available, but that doesn't mean we have to use all of them.

So how are we going to use this? Every three minutes, GenerateFlowFile will create a new flow file, and that's how we'll control, for now, when we make requests through InvokeHTTP. That means we change InvokeHTTP's run schedule back to 0 sec so it runs whenever input arrives. By connecting the two, we're creating an incoming relationship, which means InvokeHTTP will only trigger once it receives something from that relationship, and that something is a flow file.

To save some space and compact things a little, I'm going to move the queue off to the side. Let's clear out the last flow file in the queue and make our changes to InvokeHTTP. Inside it, we're going to change the URL to use the NiFi Expression Language: the incoming flow file will have an attribute called access_key, and we reference it as ${access_key} so that InvokeHTTP uses its value to populate that part of the URL. Same thing with ${flight_status}, and the last one is ${limit}. One advantage here is that the string we have to deal with is much smaller, and all these values are populated dynamically, so what gets requested depends on what's inside the flow file.

A good example of why this can be really helpful, and I do this at work too: I have APIs that are secured, so I have to provide a bearer token. In that case, my access key could be the bearer token I pass on. But the token might expire every 120 minutes, or every 30 minutes, or whatever the timeline is, so I have to refresh it and retrieve a new one. I can use the same token during that window, so I want to reuse it. I don't want to create overhead by requesting a new token for every single request, especially if I'm making thousands or hundreds of thousands of requests to an API, where getting a new token each time would put additional pressure on the service I get tokens from. I want to use each token for its entire lifecycle, and having a dynamic part in my URL makes that easier to manage, along with any other values I want to be able to change. In the example I just described, I might have a select statement up front that retrieves the token from a database where I temporarily stored it, add it to my flow file, and pass it on down to InvokeHTTP. That's an example of why it's really nice to be able to create a dynamic link like this.

Let's see if it works by creating our first flow file. We see it got created, and if we check its attributes, flight_status and access_key are there, so we're good to go. Now it should pass into InvokeHTTP, InvokeHTTP should run, and when it's done we refresh. We got a response, so let's see if it's a favorable one. There we go: we got the same kind of response as before, except now with fresher data, because we just requested it, and it's for active flights.

That gives an example of how to use InvokeHTTP. If we were using https, it would be the same thing: we'd put the s on the end, it would communicate securely with the API, and it would just work. We don't have to do anything special in here, and we don't need any other options right now.

Let's end it here; we got quite a bit done with InvokeHTTP. Real quick: I've been looking at the comments on the videos. If you have questions, or there's something you want to see, I should be able to squeeze some of it into these videos, including processors I haven't gotten to yet or didn't plan to cover. Feel free to use the comments, and I'll see what I can do about working your questions into the flows. Our next video will cover a couple of different ways to move this data around, clean it up a little using some familiar processors and some new ways of doing it, and answer a couple of questions. I'll catch you next time.
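For reference, the hardcoded request from the walkthrough can be reproduced outside NiFi. This is a minimal Python sketch, assuming the base URL `http://api.aviationstack.com/v1/flights` from the aviationstack documentation (plain http, as the free tier requires) and the `access_key`, `flight_status`, and `limit` parameters used in the video; `build_flights_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base URL for the real-time flights endpoint; the free tier only
# accepts plain http, not https.
BASE_URL = "http://api.aviationstack.com/v1/flights"

# Per-request cap on the free plan, as described in the video.
MAX_LIMIT = 100


def build_flights_url(access_key: str, flight_status: str = "active",
                      limit: int = MAX_LIMIT) -> str:
    """Build the same URL that gets typed into InvokeHTTP's Remote URL property."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    # urlencode joins key=value pairs with '&' and percent-encodes values,
    # so a key containing special characters cannot break the query string.
    query = urlencode({
        "access_key": access_key,
        "flight_status": flight_status,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_flights_url("YOUR_ACCESS_KEY"))
```

Running it prints `http://api.aviationstack.com/v1/flights?access_key=YOUR_ACCESS_KEY&flight_status=active&limit=100`, the kind of string pasted into Remote URL with GET as the HTTP method.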
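The three-minute schedule can be checked with a bit of arithmetic. A day has 1,440 minutes, so one request every three minutes is 480 requests, just under the 500-per-day allowance quoted in the video; the shortest interval that stays within 500 is 1,440 / 500 = 2.88 minutes. The quota figure comes from the video and is not verified here against aviationstack's current plans; the function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

# Daily request allowance for the free plan, as quoted in the video.
DAILY_QUOTA = 500


def requests_per_day(interval_minutes: float) -> float:
    """Requests issued in 24 hours when the scheduler fires every `interval_minutes`."""
    return MINUTES_PER_DAY / interval_minutes


def min_safe_interval(quota: int = DAILY_QUOTA) -> float:
    """Shortest run interval, in minutes, that keeps one day's requests within `quota`."""
    return MINUTES_PER_DAY / quota


if __name__ == "__main__":
    print(requests_per_day(3))   # 480.0
    print(min_safe_interval())   # 2.88
```

At up to 100 records per request, a three-minute schedule yields at most 48,000 flight records per day.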
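The check done in the queue viewer (how many records exist, how many came back, and how many carry live data) can also be scripted. This sketch assumes the response shape shown in the aviationstack documentation example: a `pagination` object with `total` and `count`, and a `data` array whose entries may have `live` set to `null`. The sample payload is invented for illustration and trimmed to a few fields, and `summarize_flights` is a hypothetical helper.

```python
import json

# Invented, heavily trimmed payload shaped like the documented response.
SAMPLE_RESPONSE = """
{
  "pagination": {"limit": 100, "offset": 0, "count": 3, "total": 3},
  "data": [
    {"flight": {"iata": "XX100"}, "live": {"latitude": 51.5, "longitude": -0.1}},
    {"flight": {"iata": "XX200"}, "live": null},
    {"flight": {"iata": "XX300"}, "live": null}
  ]
}
"""


def summarize_flights(payload: str) -> dict:
    """Count total, returned, and live-tracked flights in one API response."""
    doc = json.loads(payload)
    flights = doc.get("data") or []
    # Not every active flight has live position data; a missing or null
    # "live" block counts as "no live data".
    with_live = [
        f.get("flight", {}).get("iata")
        for f in flights
        if f.get("live")
    ]
    return {
        "total_available": doc["pagination"]["total"],
        "returned": len(flights),
        "with_live_data": len(with_live),
        "live_flights": with_live,
    }


if __name__ == "__main__":
    print(summarize_flights(SAMPLE_RESPONSE))
```

Treating `null` and a missing `live` key the same way keeps the count honest whichever form the API uses for flights without tracking data.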
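Once GenerateFlowFile supplies the attributes, InvokeHTTP's Remote URL becomes a template that NiFi Expression Language fills in for each flow file. The Python sketch below only imitates the simplest case, bare `${attribute}` references with no EL functions, to show what request each flow file produces. The attribute names `access_key`, `flight_status`, and `limit` follow the video, the base URL is assumed from the aviationstack documentation, `evaluate_url` is hypothetical, and the sketch deliberately raises an error when an attribute is missing rather than sending an incomplete request.

```python
import re

# Remote URL as it might look after switching to Expression Language.
URL_TEMPLATE = (
    "http://api.aviationstack.com/v1/flights"
    "?access_key=${access_key}&flight_status=${flight_status}&limit=${limit}"
)

# Matches bare attribute references such as ${access_key}; EL functions
# such as ${limit:plus(1)} are intentionally not supported.
_ATTRIBUTE_REF = re.compile(r"\$\{([A-Za-z0-9_.-]+)\}")


def evaluate_url(template: str, attributes: dict) -> str:
    """Replace each ${name} with the matching flow-file attribute value."""
    def substitute(match):
        name = match.group(1)
        if name not in attributes:
            raise KeyError(f"flow file has no attribute {name!r}")
        return str(attributes[name])
    return _ATTRIBUTE_REF.sub(substitute, template)


if __name__ == "__main__":
    flow_file_attributes = {
        "access_key": "YOUR_ACCESS_KEY",
        "flight_status": "active",
        "limit": "100",
    }
    print(evaluate_url(URL_TEMPLATE, flow_file_attributes))
```

Changing only the attribute values (say, a different `limit`) changes the request without touching the processor configuration, which is the point of the dynamic URL.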
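The bearer-token scenario comes down to caching a token and fetching a new one only when the current one is about to expire. The video stores the token in a database and selects it into the flow file; this minimal Python sketch swaps in an in-memory cache to show the same reuse idea. `TokenCache`, the `fetch` callable, the 120-minute lifetime, and the 60-second safety margin are all hypothetical, and the clock is injectable so the refresh logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Reuse one bearer token for most of its lifetime (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], str], lifetime_s: float,
                 margin_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # calls the token service
        self._lifetime_s = lifetime_s  # how long a token stays valid
        self._margin_s = margin_s      # refresh this long before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one only when it is (nearly) expired."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime_s
        return self._token

    def auth_header(self) -> Dict[str, str]:
        """Build the Authorization header for one API request."""
        return {"Authorization": f"Bearer {self.get()}"}


if __name__ == "__main__":
    issued = []

    def fetch_token() -> str:
        # Stand-in for the real call to the token service.
        issued.append(len(issued) + 1)
        return f"token-{len(issued)}"

    cache = TokenCache(fetch_token, lifetime_s=120 * 60)
    for _ in range(1000):
        cache.auth_header()
    print(len(issued))  # 1: one token serves all 1000 requests
```

Refreshing slightly before the stated expiry avoids sending a token that expires while the request is in flight; the margin size is a judgment call.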
Info
Channel: Steven Koon
Views: 13,635
Keywords: apache nifi, steven koon, apache nifi examples, data streaming, data flow processing, learn nifi, nifi training, nifi training videos, nifi training online, nifi training courses, nifi for beginners, learn nifi in a day, big data, data science, data scientist, dataflow, etl, apache kafka, streaming, data, data engineer, datascience, bigdata, kafka, spark, data pipeline, data pipelines, data engineering, Kinesis, nosql, aws, azure, api, data analytics, database, rest api, rest client
Id: SZhX_gce63E
Length: 16min 31sec (991 seconds)
Published: Wed Jul 22 2020