Module 1: Apache NiFi

Captions
Hello and welcome to HadoopExam Learning Resources. In this session we are starting a new training on NiFi, also known as NiagaraFiles, or, if you go with Hortonworks, HDF (Hortonworks DataFlow). We will install it on the Hortonworks platform as well as standalone, and these are the concepts we will initially discuss to get introduced to it. Before starting, let me show you the NiFi UI, where we will be creating data flows.

First, let us introduce ourselves to data flow. What exactly is a data flow? I will not go through all the content here; this is the full set of topics I will be covering. A data flow engine is all about data processing: on one side is a producer, on the other a consumer. When data is produced, it usually needs to be transformed into some other form. Say you are reading log files and want to produce CSV, JSON, or XML files; that is the kind of transformation you will be doing. You may also enrich the data: if something is missing, you can join it with another data set to create enriched data. If there is an issue with the data, you filter it out and send it to an error log. If the data is correct and everything is fine, you apply your calculations, and once they are done you store the result in a repository, from where it can be used further for analytics such as reports and graphs. That, in general, is a data flow; various other steps can be involved, but this highlights what a data flow is and how it works. We can create this kind of workflow in NiFi. The one on screen is a completely different workflow, shown just so you can see how a workflow looks in NiFi; we will cover NiFi installation, creating workflows, and everything else in the advanced modules as we move ahead.
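To make those stages concrete, here is a minimal Python sketch of one pipeline step: parse a log line, route errors aside, enrich the rest from a catalog, and emit JSON. This is plain illustration code, not NiFi; the log format, catalog, and file names are all invented for the example.

```python
import json

# Hypothetical example, not NiFi: one stage of a data flow. Raw log lines
# are parsed, error lines are routed to an error log, and the rest are
# enriched from a catalog and emitted as JSON, i.e. the transform, enrich,
# and filter steps described above.

PRODUCT_CATALOG = {"P100": "laptop", "P200": "phone"}   # enrichment source

def process_log_line(line: str):
    level, product_id, message = line.split("|", 2)     # parse: LEVEL|ID|text
    if level == "ERROR":
        with open("error.log", "a") as err:             # filter: errors aside
            err.write(line + "\n")
        return None
    return json.dumps({                                 # transform: to JSON
        "level": level,
        "message": message,
        "product": PRODUCT_CATALOG.get(product_id, "unknown"),  # enrich
    })

for raw in ["INFO|P100|page viewed", "ERROR|P200|payment failed"]:
    out = process_log_line(raw)
    if out is not None:
        print(out)      # downstream, this would land in a repository
```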
So the producer system generates the data, and various steps are followed before consuming it. This is not a new requirement in the industry: if you have worked with business process management (BPM), workflow engines, integration tools from the Java/J2EE world, or ETL, you know this is a very basic requirement. So why is a new system like NiFi required when existing solutions are present? There are some reasons. Every product is developed either because existing products do not support everything or for competitive reasons, but here the clear-cut reason is the set of new requirements coming into the industry. With the Internet of Things, millions of devices are sending data to one centralized location; I will show you in a little while how IoT problems can be solved with NiFi. "Internet of Anything" is a similar term covering sources beyond sensors. Then there is the huge volume of data; Hadoop came up for that, and Hadoop is already a decade-old technology that keeps maturing, with new components appearing around the framework every day. For real-time analysis, existing products are also available. Data provenance is a term you might be hearing for the first time: it is about how the data was generated and auditing that data, so that you can replay it; this is an out-of-the-box feature of NiFi. Guaranteed delivery: most systems provide it, but the question is how complex the configuration is to achieve it. In NiFi all of this is provided out of the box and is easy to configure, which is why NiFi is becoming popular.

So what exactly is NiFi? NiFi is an engine for constructing data flows. As shown here, you can create workflows as complex as you want; this one is a very simple workflow with three processors and two connections, and I will talk about processors and connections later on. NiFi is more powerful and takes care of more of the requirements we have discussed, and it can be deployed in a clustered environment. Most integration tools, business process management tools, and workflow engines run as a single instance, whereas NiFi can be used in a cluster, which brings a lot of benefits: higher throughput, more parallel processing, and various other advantages.

Now, the common data flow challenges: what do you face when working with a data flow engine? System failures, first of all; no system is fail-proof. There are various places where a system can fail (network failure, disk failure, software failure, human error), and this has to be taken care of in the system. NiFi provides support for handling these failures; you just have to learn how to use the features. NiFi can be deployed in a cluster to survive disk, software, or network failures, and if a human error occurs, you can record it, correct it, and replay the data.

The producer is faster than the consumer: another problem. Say the producer is pushing data very hard and the consumer is slow in comparison. What comes out of this? You would lose data. NiFi provides a connection buffer in between: if the consumer is slow and the data is not being processed, the connection buffer holds the messages and prevents data loss. This is out-of-the-box functionality in NiFi; you do not have to configure anything explicitly.
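As an illustration of that buffering idea, here is a minimal Python sketch of a bounded queue between a fast producer and a slow consumer: when the buffer fills, the producer blocks rather than dropping messages. This is a standalone sketch, not NiFi itself; NiFi's connection queues behave analogously.

```python
import queue
import threading
import time

# A bounded queue between a fast producer and a slow consumer. When the
# buffer is full, put() blocks instead of dropping data: backpressure.

buffer = queue.Queue(maxsize=10)        # bounded, like a connection queue

def producer():
    for i in range(50):
        buffer.put(f"message-{i}")      # blocks whenever the buffer is full
    buffer.put(None)                    # sentinel: no more data

def consumer():
    while True:
        msg = buffer.get()
        if msg is None:
            break
        time.sleep(0.05)                # consumer is slower than the producer
        print("processed", msg)

threading.Thread(target=producer).start()
consumer()                              # no message is ever lost
```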
Boundary conditions also happen: too-big data, too-small data, data arriving too fast or too slow, data in the wrong format. These are all common problems in a data flow engine, and NiFi takes care of them as well.

Data priority changes: let us say you were previously producing data from your J2EE servers; an e-commerce website is deployed, for example. Initially you were interested only in the error logs and warning logs: if an error or warning came up, as a developer you would look into the logs and solve the problem. That was the old requirement. Nowadays, whatever logs are generated also need to be processed by the analytics team, because they want to know what each user is accessing and which products a user is interested in, so they can do analysis and recommend products to end users, among other analytics. So the requirement has changed: previously you were interested only in error and warning logs, but now the analytics team is also interested in the info-level and trace-level logs. Your priority has changed, and you need those logs in the pipeline as well. In NiFi it is just a small change in the configuration to capture all the different log levels, again supported out of the box.

System evolution: previously there were not so many sensors sending data; only a few devices existed, and the focus was more on things like website hosting and enterprise solutions. Now the requirement is changing across the entire industry towards processing data that comes from edge locations with sensors. Let me give you an example with this diagram. Consider a logistics company with thousands of trucks, each running and delivering goods. Each truck has telematics sensors and the like installed, which keep sending signals to a mobile tower, and from the mobile tower the data is received at your data centers. This is one example of how requirements are changing, and it is not just a thousand trucks; there can be far more devices. The refrigerator in your house can keep sending signals to the mobile tower and on to the data center. Analytics can then be done on this data: detecting brake failure in a truck, rising heat, and so on. The places where the sensors are installed are known as edge locations.

Protocols are not supported: some devices and sensors support only C++-style interfaces or send data only over UDP, and such protocols are not supported by traditional integration tools.
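For context, hand-rolling ingestion of UDP sensor data looks like the minimal Python sketch below: a raw datagram socket. This is the kind of low-level plumbing that NiFi-style processors (for example, ListenUDP) wrap for you; the port and payload format here are invented for the example.

```python
import socket

# A raw UDP listener: the plumbing a data flow engine hides from you.
# Port 9999 and the text payload are hypothetical choices for this sketch.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))

while True:
    payload, addr = sock.recvfrom(4096)   # one datagram per sensor reading
    print(f"reading from {addr}: {payload.decode(errors='replace')}")
```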
So how would this be supported? NiFi provides a solution for exactly this: they have created another project based on NiFi known as MiNiFi (mi-ni-fi), built for the edge-location use case. We will discuss it in more detail later on; right now I am just giving you an overview of NiFi from 5,000 feet, so you can understand its purpose. Massive numbers of devices, millions of them, are possible, and signals and data are coming in from all of them. The tools you were previously using as workflow and data flow engines would not support this kind of functionality, and that is the reason NiFi was introduced. NiFi is also not a new technology: it was adopted by Hortonworks and made open source, but it existed before that and was used privately by the National Security Agency. We will talk about the history as well. That is the path the sensor data would take.

Compliance and security issues: if you are processing data from various geographies, it is not only about your own country. You have data centers receiving data globally (one data center in China, one in India, one in the US, one in Europe), with everything sent to one centralized location. But every country has its own rules, regulations, and compliance requirements. China may say it will not allow certain data to be sent out of the country; India may be more open to innovation and allow data to be sent out for further analysis, or may restrict certain kinds of data; and data may move between various sites as well. Since every country has its own compliance regime, you need a different configuration for the data flow in each location, and this is very easy to configure with NiFi. New laws keep coming, agreements between businesses change, and system interfaces keep changing: a system that previously supported UDP may now say it supports HTTP, so you have to change accordingly. All of this can be handled in NiFi, and even where something is not supported directly, changing the configuration is not a difficult task. Data provenance is also required: the auditing of the data, how it was generated, and everything about its history.

Limited resources: what I mean by limited resources is very low bandwidth at remote edges and intermittent connectivity. Take the trucks example: they are running across a country or a city, and sometimes the bandwidth is simply not there and the signal is intermittent, because bandwidth is not easily available everywhere. So intermittent delivery also needs to be supported, and it is well supported in NiFi. This is a challenge data flows face: data does not arrive continuously, and this kind of problem is also solved and handled using NiFi. So that is all about the data flow challenges.
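Coming back to data provenance for a moment: the sketch below shows, in plain Python, the kind of audit trail a provenance facility records for one piece of data as it moves through a flow, which is what makes replay possible. NiFi's real provenance events are richer than this (event types include RECEIVE, ROUTE, and SEND); the fields and trail here are illustrative only.

```python
from dataclasses import dataclass, field
import time
import uuid

# A sketch of what a provenance/audit record might capture for one piece
# of data moving through a flow. Fields are illustrative, not NiFi's API.

@dataclass
class ProvenanceEvent:
    event_type: str     # e.g. "RECEIVE", "ROUTE", "SEND"
    component: str      # which processor acted on the data
    data_id: str        # identifies the piece of data being tracked
    timestamp: float = field(default_factory=time.time)

flowfile_id = str(uuid.uuid4())         # one piece of data in the flow

trail = [                               # the replayable audit trail
    ProvenanceEvent("RECEIVE", "ListenUDP", flowfile_id),
    ProvenanceEvent("ROUTE", "RouteOnAttribute", flowfile_id),
    ProvenanceEvent("SEND", "PutHDFS", flowfile_id),
]
for event in trail:
    print(event)
```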
So what is NiFi? NiFi was previously known as NiagaraFiles. It is a simple event-processing data flow engine, as we have discussed: data is generated at one point, and if there is an error it is routed in one direction, while data that is processed successfully is routed in another. It automates the movement of data between different systems; it is not just about processing data inside some small component, it can connect different systems together. You can create different processors (adapters, you could say) to bring data from various other systems and hardware into NiFi. It handles all the situations mentioned above, and you design the entire data flow on a single web UI known as the canvas. On this canvas you design flows as complex as you want; the UI is very simple to work with, and once you get acquainted with it, it feels quite comfortable. Keep in mind that NiFi is not a replacement for a message bus like Kafka; we will discuss this in more detail while comparing it with Kafka. NiFi can use Kafka on the producer or the consumer side, as a source or a sink. So that is all about NiFi and data flow.

I will discuss NiFi, HDF, and everything else in the next session. Thanks for watching, and I hope you liked this session, the first one on NiFi. As we move ahead we will go in depth: we will create a simple workflow, then a complex workflow, deploy in cluster mode, and also try to deploy NiFi on the Hortonworks platform, along with various other things. So please follow all the training videos; we keep adding new ones. If you are watching on YouTube, please do not forget to subscribe, because we keep launching free sessions. And if you want email alerts whenever anything is updated on the HadoopExam site, click the subscribe button, enter your email id and name into the form, and we will send you updates regularly. These are the other products from hadoopexam.com; please keep visiting, as we keep updating every day. It will be helpful for your career development. Thank you.
Info
Channel: HadoopExam Learning Resources
Views: 10,089
Keywords: NiFi, hortonworks Data Flow, NiFi Training, HDPCA, NiFi Certification
Id: k-PbR0VJ6do
Length: 16min 14sec (974 seconds)
Published: Wed Dec 20 2017