Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Training | Edureka

Video Statistics and Information

Captions
[Music] Hello everyone, and welcome to this interesting session on text mining and NLP. Before moving forward, let's have a quick look at the agenda for today's session. I'll start off by explaining the importance of language and its evolution, then we'll understand what text mining is, and moving forward we'll see how text mining and NLP are connected. NLP here stands for natural language processing. After that we'll see the various applications of NLP in the industry and the different components of NLP along with a demo, and finally we'll end the video with an end-to-end demo where we use NLP along with machine learning. So let's get started.

Now, the success of the human race is because of the ability to communicate and share information, and that is where the concept of language comes in. Many such standards came up, resulting in many languages, with each language having its own set of basic shapes called alphabets; the combination of alphabets results in words, and the combination of these words, arranged meaningfully, results in the formation of a sentence. Each language has a set of rules that is used while building these sentences, and this set of rules is also known as grammar.

Coming to today's world, that is the 21st century, according to industry estimates only 21 percent of the available data is present in a structured format. Data is being generated as we speak, as we tweet, as we send messages on WhatsApp, Facebook and Instagram or through text messages, and the majority of this data exists in textual form, which is highly unstructured in nature. In order to produce significant and actionable insights from text data, it is important to get acquainted with the techniques of text analysis.

So let's understand what text analysis, or text mining, is. It is the process of deriving meaningful information from natural language text, and it usually involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. Compared with the kind of data stored in databases, text is unstructured, amorphous and difficult to deal with algorithmically; nevertheless, in modern culture, text is the most common vehicle for the formal exchange of information. As text mining refers to the process of deriving high-quality information from text, the overall goal is to turn the text into data for analysis, and this is done through the application of NLP, or natural language processing.

So let's understand what natural language processing is. NLP refers to the artificial intelligence method of communicating with an intelligent system using a natural language. By utilizing NLP and its components, one can organize massive chunks of textual data, perform numerous automated tasks, and solve a wide range of problems such as automatic summarization, machine translation, named entity recognition, speech recognition and topic segmentation.

Now let's understand the basic structure of an NLP application, taking a chatbot as an example. First we have the NLP layer, which is connected with the knowledge base and the data storage. The knowledge base is where we have the source content, that is, the chat logs containing a large history of all the chats, which are used to train the particular algorithm. Then we have the data storage, where we keep the interaction history and the analytics of those interactions, which in turn helps the NLP layer generate meaningful output.
Now, if we have a look at the various applications of NLP, first of all we have sentiment analysis, a field where NLP is used heavily. We have speech recognition, and here we are also talking about voice assistants like Google Assistant, Cortana and Siri. Next we have the implementation of chatbots, which I discussed just now; you might have used the customer care chat services of an app, which also use NLP to process the data entered and provide a response based on the input. Machine translation is another use case of natural language processing; the most common example here would be Google Translate, which uses NLP to translate data from one language to another, and that too in real time. Other applications of NLP include spell checking; keyword search, which is also a big field where NLP is used; extracting information from a particular website or document; and one of the coolest applications of NLP, advertisement matching, which basically means recommending ads based on your history.

Now, NLP is divided into two major components: natural language understanding, also known as NLU, and natural language generation, also known as NLG. Understanding involves tasks like mapping the given natural language input into useful representations and analyzing different aspects of the language, whereas natural language generation is the process of producing meaningful phrases and sentences in the form of natural language; it involves text planning, sentence planning and text realization. NLU is usually considered harder than NLG. You might be thinking that even a small child can understand a language, so let's see what difficulties a machine faces while understanding a particular language.

Understanding a natural language is very hard. Taking English into consideration, there is a lot of ambiguity, and that too at different levels: we have lexical ambiguity, syntactic ambiguity and referential ambiguity. Lexical ambiguity is the presence of two or more possible meanings within a single word; it is also sometimes referred to as semantic ambiguity. For example, let's consider these sentences and focus on the italicized words. "She is looking for a match": what do you infer from the word match? Is it that she's looking for a partner, or is it that she is looking for a match, be it a cricket match or a rugby match? In the second sentence, "The fisherman went to the bank", is this the bank where we go to collect our cheques and money, or is it the river bank we are talking about? Sometimes it is obvious that we mean the river bank, but it might be that he's actually going to a bank to withdraw some money; you never know.

Coming to the second type of ambiguity, which is syntactic ambiguity. In English grammar, syntactic ambiguity is the presence of two or more possible meanings within a single sentence or sequence of words; it is also called structural ambiguity or grammatical ambiguity. Taking these sentences into consideration, we can clearly see the ambiguities involved. "The chicken is ready to eat": is the chicken ready to eat its food, or is the chicken ready for us to eat? Similarly, we have a sentence like "Visiting relatives can be boring": are the relatives boring, or is visiting the relatives boring? You never know.
Coming to the final ambiguity, which is referential ambiguity. This ambiguity arises when we refer to something using pronouns. "The boy told his father about the theft; he was very upset." Now I'm leaving this up to you: who does "he" stand for here? Is it the boy, is it the father, or is it the thief?

So coming back to NLP, first we need to install the NLTK library, that is, the Natural Language Toolkit. It is the leading platform for building Python programs to work with human language data, and it provides easy-to-use interfaces to over 50 corpora and lexical resources. We can use it to perform functions like classification, tokenization, stemming, tagging and much more. Once you install the NLTK library you will see the NLTK downloader, a pop-up window; in it you have to select the "all" option and press the download button, and it will download all the required files: the corpora, the models and all the different packages available in NLTK.

Now, when we process text there are a few terminologies that we need to understand. The first one is tokenization. Tokenization is the process of breaking strings into tokens, which in turn are small structures or units. Tokenization involves three steps: breaking a complex sentence into words, understanding the importance of each word with respect to the sentence, and finally producing a structural description of the input sentence. If we look at the example here, the sentence "Tokenization is the first step in NLP", when we divide it into tokens we get seven tokens. NLTK also allows you to tokenize phrases containing more than one word.

So let's go ahead and see how we can implement tokenization using NLTK. Here I'm using a Jupyter notebook to execute all my practicals and demos; you are free to use any IDE which supports Python, it's your choice. Let me create a new notebook and rename it "text mining and NLP". First of all, let us import the necessary libraries; we're importing os, nltk and nltk.corpus. As you can see, we have various files present here: different types of word lists, different types of functions, samples of Twitter data, the sentiment WordNet, product reviews, movie reviews, non-breaking prefixes and many more files. Let's have a look at the Gutenberg collection and see which files are present in it. Inside it we have many different text files: Austen's Emma, Shakespeare's Hamlet, Moby Dick, Carroll's Alice and many more. This is just one collection, and NLTK provides a lot of them. Let's consider a document of type string and understand the significance of its tokens. If we look at the elements of Hamlet, you can see it starts with "The Tragedie of Hamlet by William Shakespeare"; looking at the first 500 elements of this particular text file, we see "The Tragedie of Hamlet by William Shakespeare 1599, Actus Primus", and so on. We can use a lot of these files for analysis and text understanding purposes, and this is where NLTK comes into the picture; it helps a lot of programmers learn about the different features and applications of language processing.
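As a quick sketch of the setup just described (assuming NLTK has already been installed, for example with pip install nltk), the downloader and the Gutenberg corpus can be used roughly like this:

    import nltk

    # The video selects "all" in the interactive downloader; fetching just the
    # Gutenberg corpus is enough to follow this part of the walkthrough.
    nltk.download("gutenberg")

    from nltk.corpus import gutenberg

    print(gutenberg.fileids())                        # austen-emma.txt, shakespeare-hamlet.txt, ...
    hamlet = gutenberg.words("shakespeare-hamlet.txt")
    print(hamlet[:15])                                # the opening tokens of Hamlet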
Here I have created a paragraph on artificial intelligence, so let me just execute it. This AI paragraph is of the string type, so it will be easier for us to tokenize; nonetheless, any of the corpus files could be used, and for simplicity I am taking a string here. The next thing we are going to do is import word_tokenize from the nltk.tokenize library; this will help us tokenize all the words. We run the word_tokenize function over the paragraph and assign the result a name, so here I'm creating AI_tokens by applying word_tokenize to the paragraph. Let's see the output of AI_tokens: as you can see, it has divided all the input provided into tokens. Now let's have a look at the number of tokens; in total we have 273 tokens. These tokens are the words and special characters of the paragraph, appearing as separate items of a list.

In order to find the frequency of the distinct elements in the given AI paragraph, we are going to import the FreqDist function, which falls under nltk.probability. Let's create fdist using the FreqDist function; basically, what we are doing here is finding the word count of every word in the paragraph. As you can see, we have the comma 30 times, the full stop nine times, "accomplished" once, "according" once and so on, and "computer" five times. Here we are also converting the tokens to lowercase, so as to avoid counting a word written in uppercase and in lowercase as two different tokens. Now suppose we were to select the top 10 tokens with the highest frequency: you can see that we have the comma 30 times, "the" 13 times, "of" 12 times and "and" 12 times, whereas among the meaningful words we only have "intelligence", which appears six times, and "intelligent".

There is another type of tokenizer, the blankline tokenizer. Let's use the blankline tokenizer over the same string to tokenize the paragraph with respect to blank lines. The output here is 9, and this 9 indicates how many paragraphs we have; the paragraphs are separated by blank lines, so although it might look like one paragraph, it is not, and the original structure of the data remains intact.

Other important key terms in tokenization are bigrams, trigrams and ngrams. What do these mean? Tokens of two consecutive written words are known as a bigram, tokens of three consecutive written words are known as a trigram, and likewise we have ngrams for n consecutive written words. So let's go ahead and run a demo based on bigrams, trigrams and ngrams. First of all, we need to import bigrams, trigrams and ngrams from nltk.util. Let's take a string on which we'll use these functions: "The best and most beautiful things in the world cannot be seen or even touched, they must be felt with the heart." First we split the sentence into tokens, and for that we use word_tokenize; as you can see, we now have the tokens. Let us now create the bigrams of the list containing the tokens: for that we use nltk.bigrams, pass it all the tokens, and wrap the result in the list function to see the output. As you can see, we get "the best", "best and", "and most", "most beautiful" and so on; the tokens come out in pairs of two consecutive words.
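Here is a minimal sketch of these tokenization and frequency steps; the AI string below is just a short placeholder for the longer paragraph used in the video.

    import nltk
    from nltk.tokenize import word_tokenize, blankline_tokenize
    from nltk.probability import FreqDist
    from nltk.util import bigrams

    nltk.download("punkt")                       # tokenizer models used by word_tokenize

    AI = "Artificial intelligence is intelligence demonstrated by machines."  # placeholder paragraph

    AI_tokens = word_tokenize(AI)
    print(len(AI_tokens), AI_tokens)

    # Word counts, lowercased so that "Intelligence" and "intelligence" count as one token
    fdist = FreqDist(token.lower() for token in AI_tokens)
    print(fdist.most_common(10))                 # top-10 tokens by frequency

    print(len(blankline_tokenize(AI)))           # number of blank-line separated paragraphs

    quote = "The best and most beautiful things in the world cannot be seen or even touched."
    print(list(bigrams(word_tokenize(quote))))   # pairs of consecutive tokens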
Similarly, if we want to find the trigrams, all we need to do is replace bigrams with trigrams, and as you can see we get the tokens in groups of three words. If you want to use ngrams, let me show you how it's done: for ngrams we need to specify a particular number, so instead of n let's say four, and as you can see we get the output in groups of four tokens.

Now, once we have the tokens, we usually need to normalize them, and for that we have stemming. Stemming refers to normalizing a word into its base or root form. If we look at the words here, we have "affectation", "affects", "affections", "affected", "affection" and "affecting"; as you might have guessed, the root word here is "affect". One thing to keep in mind is that the result may not always be a proper root word. A stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes found in inflected words. This indiscriminate cutting can be successful on some occasions, but not always, and that is why this approach has some limitations.

So let's see how we can perform stemming on a given data set. There are quite a few stemmers. Starting with the Porter stemmer, we need to import it from nltk.stem. Let's get the output for the word "having" and see what its stem is; as you can see, the output is "have". Next, we have defined the words to stem as "give", "giving", "given" and "gave". Let's use the Porter stemmer and see the output for these words: it gives "give", "give", "given" and "gave", so we can see that for "giving" the stemmer removed the "ing" and replaced it with an "e". Now let's do the same with another stemmer, called the Lancaster stemmer. You can see it has stemmed all the words down to "giv"; as a result, you can conclude that the Lancaster stemmer is more aggressive than the Porter stemmer. Which of these stemmers to use depends on the task you want to perform; for example, if you want to check how many times the stem "giv" is used above, you can use the Lancaster stemmer, and for other purposes you have the Porter stemmer as well. There are many stemmers; there is also the Snowball stemmer, where you need to specify the language you are working with and then use the stemmer.
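Here is a small sketch of the stemmer comparison described above, using the same give/giving/given/gave example.

    from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

    words_to_stem = ["give", "giving", "given", "gave"]

    pst = PorterStemmer()
    lst = LancasterStemmer()
    sbst = SnowballStemmer("english")        # Snowball needs the language specified

    for word in words_to_stem:
        # Lancaster is typically the most aggressive of the three
        print(word, "->", pst.stem(word), lst.stem(word), sbst.stem(word))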
Now, as we discussed, a stemming algorithm works by cutting off the end or the beginning of the word. Lemmatization, on the other hand, takes the morphological analysis of the word into consideration. In order to do so, it is necessary to have a detailed dictionary which the algorithm can look into to link a form back to its lemma. What lemmatization does is group together the different inflected forms of a word under its lemma. It is somewhat similar to stemming, as it maps several words onto a common root, but one of the most important things to note is that the output of lemmatization is a proper word, unlike stemming, where in some cases we get an output such as "giv", which is not a real word, just a stem. For example, if lemmatization is applied to "go", "going" and "went", they all map to "go", because that is the root of all three words. So let's see how lemmatization works on the given input data; for that we import the lemmatizer from NLTK. We also import WordNet here because, as I mentioned, lemmatization requires a detailed dictionary; its output has to be a proper word, not just any truncated string, and to find that proper word it needs a dictionary. So here we are providing the WordNet dictionary and using the WordNet lemmatizer, passing the word "corpora" into it. Can you tell me what the output of this one will be? I'll leave it up to you; I won't execute this cell, so let me remove it. Tell me in the comments below what the lemmatization of the word "corpora" will be, and what its stemming will be; execute that and let me know in the comments section. Now let's take the words "give", "giving", "given" and "gave" and see what the output of lemmatization is. As you can see, the lemmatizer has kept the words as they are, and this is because we haven't assigned any POS tags, so it has assumed all the words are nouns. You might be wondering what POS tags are; I'll explain them later in this video, so for now let's keep it simple: POS tags tell us what exactly a given word is, whether it is a noun, a verb or some other part of speech. POS stands for parts of speech.

Did you know that there are several words in the English language, such as "I", "at", "for", "above" and "below", which are very useful in the formation of sentences, and without which a sentence wouldn't make any sense, but which do not provide much help in natural language processing? This list of words is known as stop words. NLTK has its own list of stop words, and you can use it by importing it from the nltk.corpus module. So the question arises, are they helpful or not? Yes, they are helpful in the creation of sentences, but they are not very helpful in the processing of the language. Let's check the list of stop words in NLTK: from nltk.corpus we import stopwords, and if we ask for the stop words of the English language, you can see the list of all the stop words defined for English, 179 in total. As you can see, we have words like "few", "more", "most" and "other"; these words are necessary in the formation of sentences, you cannot ignore them, but for processing they are not important at all.

If you remember, we had the top 10 tokens from the AI paragraph I mentioned earlier, stored as fdist_top10. Looking at it, you can see that except for "intelligent" and "intelligence", most of the tokens are either punctuation or stop words, and hence can be removed. We'll use compile from the re module to create a pattern that matches any digit or special character, and then we'll filter the tokens with it. If you look at the output of post_punctuation, you can see the punctuation and special characters are gone, and its length is 233, compared to 273, the length of AI_tokens. This step is very important in language processing, as it removes the unnecessary tokens which do not hold much meaning.
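A short sketch of the lemmatization, stop-word and punctuation steps described above; the regular expression below is one reasonable pattern for stripping digits and special characters, not necessarily the exact one used in the video.

    import re
    import nltk
    from nltk.tokenize import word_tokenize
    from nltk.stem import WordNetLemmatizer
    from nltk.corpus import stopwords

    nltk.download("punkt")
    nltk.download("wordnet")
    nltk.download("stopwords")

    word_lem = WordNetLemmatizer()
    for word in ["give", "giving", "given", "gave"]:
        # Without a POS argument the lemmatizer treats every word as a noun,
        # which is why these come back unchanged
        print(word, "->", word_lem.lemmatize(word))

    print(len(stopwords.words("english")))       # English stop-word list (179 words in the video)

    AI = "Artificial intelligence is intelligence demonstrated by machines."  # placeholder paragraph
    AI_tokens = word_tokenize(AI)

    # Drop tokens that are nothing but digits or punctuation
    punctuation = re.compile(r"[-.?!,:;()|0-9]")
    post_punctuation = [w for w in AI_tokens if len(punctuation.sub("", w)) > 0]
    print(len(post_punctuation))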
Now, coming to another important topic of natural language processing and text mining, which is parts of speech. Generally speaking, the grammatical type of a word, whether it is a verb, noun, adjective, adverb or article, indicates how the word functions in meaning as well as grammatically within the sentence. A word can have more than one part of speech depending on the context in which it is used; for example, in the sentence "Google something on the Internet", "Google" acts as a verb, although it is a proper noun. As you can see here, there are many types of POS tags, and we have descriptions for each of them: CC for a coordinating conjunction, CD for a cardinal number, JJ for an adjective, MD for a modal, NNP for a proper noun in singular, different tags for the different kinds of verbs, tags for interjections and symbols, and the wh-pronoun and wh-adverb. POS tagging is a statistical NLP task: it distinguishes the sense in which a word is used, which is very helpful in text realization, it is easy to evaluate in terms of how many tags are correct, and it lets you infer semantic information from the given text.

So let's look at some examples of POS tagging. Take the sentence "The dog killed the bat": here "the" is a determiner, "dog" is a noun, "killed" is a verb, and "the bat" is again a determiner and a noun respectively. Now consider another sentence, "The waiter cleared the plates from the table"; as you can see, every token here corresponds to a particular part-of-speech tag, which is very helpful in text realization.

Now let's take a string and check how NLTK performs POS tagging on it. Let's take the sentence "Timothy is a natural when it comes to drawing". First we tokenize it, and then we use the pos_tag function available in NLTK, passing it all the tokens. As you can see, we get "Timothy" as a noun, "is" as a verb, "a" as a determiner, "natural" as an adjective, "when" as a wh-adverb, "it" as a pronoun, "comes" as a verb, "to" as "to", and "drawing" as a verb again. So this is how you obtain the POS tags; the pos_tag function does all the work here. Now let's take another example, "John is eating a delicious cake". Here you can see that the tagger has tagged both "is" and "eating" as verbs because it considered "is eating" as a single verb phrase; this is one of the few shortcomings of POS taggers, and an important thing to keep in mind.
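A minimal sketch of the pos_tag step just shown, using the same example sentence; the download names are the resources NLTK's classic perceptron tagger expects.

    import nltk
    from nltk.tokenize import word_tokenize

    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")   # model behind nltk.pos_tag

    sent = "Timothy is a natural when it comes to drawing"
    print(nltk.pos_tag(word_tokenize(sent)))
    # e.g. [('Timothy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('natural', 'JJ'), ...]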
After POS tags, there is another important topic, which is named entity recognition. So what does it mean? The process of detecting named entities such as person names, location names, company names, organizations, quantities and monetary values is called named entity recognition, or NER. Under named entity recognition we have three phases. First is noun phrase identification: this step deals with extracting all the noun phrases from a text using dependency parsing and parts-of-speech tagging. Then we have phrase classification: this is the step in which all the extracted noun phrases are classified into their respective categories, such as locations, names, organizations and so on; apart from this, one can curate look-up tables and dictionaries by combining information from different sources. Finally, we have entity disambiguation: sometimes entities are misclassified, so creating a validation layer on top of the results is very useful, and knowledge graphs can be exploited for this purpose. The popular knowledge graphs are the Google Knowledge Graph, IBM Watson and Wikipedia.

So let's take a sentence into consideration: "The Google CEO Sundar Pichai introduced the new Pixel at the Minnesota Roi Centre event." As you can see, Google is tagged as an organization, Sundar Pichai as a person, Minnesota as a location, and the Roi Centre event is also tagged as an organization. For using NER in Python we have to import ne_chunk from the NLTK module. So let's consider our text data and see how we can perform NER using the NLTK library. First we import ne_chunk; let's take the sentence "The US president stays in the White House". We need to go through the earlier steps again: tokenize the sentence first, then add the POS tags, and then use the ne_chunk function, passing it the list of tuples containing the POS tags. Let's see the output: "US" is recognized as a named entity, and "White House" is clubbed together as a single entity and recognized as a facility. This is only possible because of the POS tagging; without it, it would be very hard to detect the named entities among the given tokens.

Now that we have understood what named entity recognition, or NER, is, let's go ahead and understand one of the most important topics in NLP and text mining, which is syntax. So what is syntax? In linguistics, syntax is the set of rules, principles and processes that govern the structure of sentences in a given language; the term syntax is also used to refer to the study of such principles and processes. What we have are certain rules as to which part of the sentence should come in which position, and with these rules one can create a syntax tree for any input sentence. A syntax tree, in layman's terms, is a tree representation of the syntactic structure of a sentence or string. It is also a way of representing the syntax of a programming language as a hierarchical tree structure; that structure is used for generating symbol tables for compilers and for later code generation, and the tree represents all the constructs in the language and how they relate to each other. Let's consider the statement "The cat sat on the mat". As you can see, the input sentence has been classified into a noun phrase and a prepositional phrase; the noun phrase is again classified into an article and a noun, then we have the verb, which is "sat", and finally we have the preposition "on" and the article and noun "the" and "mat".

Now, in order to render syntax trees in our notebook, you need to install Ghostscript, which is a rendering engine. This takes some time, so let me just show you where you can download Ghostscript: type "download Ghostscript" and select the latest version. As you can see, there are two types of licenses, the general public license and the commercial license; since creating and following syntax trees is an important task, it is also available under a commercial license. I'm not going to go much deeper into what a syntax tree is and how to render it.
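Going back to the ne_chunk demo for a moment, here is a minimal sketch of that NER step; the two extra download calls fetch the chunker model and word list that ne_chunk relies on.

    import nltk
    from nltk import ne_chunk, pos_tag, word_tokenize

    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")
    nltk.download("maxent_ne_chunker")
    nltk.download("words")

    sent = "The US president stays in the White House"
    ner_tree = ne_chunk(pos_tag(word_tokenize(sent)))
    print(ner_tree)          # named entities appear as labelled subtrees of the result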
So now that we have understood what syntax trees are, let's discuss another important concept with respect to analyzing sentence structure, which is chunking. Chunking basically means picking up individual pieces of information and grouping them into bigger pieces, and these bigger pieces are also known as chunks. In the context of NLP and text mining, chunking means grouping words or tokens into chunks. Let's look at the example here. The sentence under consideration is "We caught the black panther": "we" is a pronoun, "caught" is a verb, "the" is a determiner, "black" is an adjective and "panther" is a noun. What chunking has done here, as you can see, is that "black", which is an adjective, "panther", which is a noun, and "the", which is a determiner, are chunked together into a noun phrase.

So let's see how we can implement chunking using NLTK. Let's take the sentence "The big cat ate the little mouse who was after fresh cheese". We'll tokenize it and apply the POS tags; as you can see, we have the tokens and the POS tags. What we do next is create a grammar for a noun phrase and mention, within the curly braces, the tags that we want in our chunk; that will be our grammar_np, a regular-expression matching string. We then have to parse the chunks, so we create a chunk parser and pass our noun-phrase grammar to it. As you can see, we get an error, and let me tell you why: it occurred because we did not install Ghostscript, so the syntax tree cannot be drawn, but in the final output we still have the tree structure, just not in visual form. As you can see, we have an NP, a noun phrase, for "the little mouse", and again a noun phrase for "fresh cheese"; because "fresh" is an adjective and "cheese" is a noun, the parser has treated these two words as a noun phrase. So this is how you execute chunking with the NLTK library.
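A sketch of the chunking step just shown; the noun-phrase grammar below is one simple pattern (an optional determiner, any number of adjectives, then a noun), and the exact grammar used in the video may differ slightly.

    import nltk
    from nltk import pos_tag, word_tokenize, RegexpParser

    sent = "The big cat ate the little mouse who was after fresh cheese"
    tagged = pos_tag(word_tokenize(sent))

    grammar_np = r"NP: {<DT>?<JJ>*<NN>}"     # NP = optional determiner, adjectives, noun
    chunk_parser = RegexpParser(grammar_np)
    chunk_result = chunk_parser.parse(tagged)
    print(chunk_result)                      # NP subtrees such as (NP the/DT little/JJ mouse/NN)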
By now we have learnt almost all the important steps in text processing, so let's apply them all in building a machine learning classifier on the movie reviews from the NLTK corpora. For that, first let me import the libraries we need, pandas and numpy; these are the basic libraries needed in any machine learning workflow. We also import the CountVectorizer; I'll explain why it is used a little later, for now let's just import it. Again, if we look at the different elements of the corpora, as we saw at the beginning of the session, there are many files in the NLTK corpora, and now we'll access the movie reviews corpus; for that we import movie_reviews from nltk.corpus. If we look at the different categories of the movie reviews, there are two, negative and positive. Looking at the positive category, we can see a thousand text files, and similarly the negative category has a thousand files containing the negative reviews. Let's take one particular positive file into consideration, the cv002 file; you can take any one of the files, it doesn't matter. As you can see, the file is already tokenized. Tokenization is generally useful, but this pre-tokenization has actually increased our work here, because in order to use the CountVectorizer and TF-IDF we must pass strings instead of tokens. To convert the tokens back into strings we could use the detokenizer within NLTK, but that has some licensing issues as of now with the conda environment, so instead we can use the join method to join all the tokens of each list into a single string, and that's what we are going to do here.

First we create an empty review list and append the joined reviews to it, removing the extra spaces and stray commas from each review while appending, and we perform the same for both the positive and the negative reviews. We do this for the negative reviews first, and then for the positive reviews as well. If you look at the length of the review list after the negative reviews, it's a thousand, and the moment we add the positive reviews the length should reach two thousand. So let me define the positive reviews, execute the same loop for them, and check the length of the review list again; it is 2,000, which is good.

Let us now create the targets before creating the features for our classifier. While creating the targets, we encode the negative reviews as zero and the positive reviews as one: we create an empty list and add a thousand zeros followed by a thousand ones to it. We then create a pandas Series from the target list; the type of y must come out as a pandas Series, and if you look at the output of type(y), it is indeed pandas.core.series.Series. If we look at the first five entries of the series, since it is a thousand zeros followed by a thousand ones, the first five entries are all zeros.

Now we can start creating features using the CountVectorizer, or bag of words, and for that we need to import the CountVectorizer. Once we have initialized the vectorizer, we fit it onto the review list. If we look at the dimensions of the resulting matrix, it is two thousand by sixteen thousand two hundred and twenty-eight. We then create a list with the names of all the features from the vectorizer, and as you can see we have our list. Next we create a pandas DataFrame by passing the SciPy CSR matrix as the values and the feature names as the column names. Checking the dimensions of this DataFrame, it is the same, two thousand by sixteen thousand two hundred and twenty-eight. Looking at the top five rows of the DataFrame, we have sixteen thousand two hundred and twenty-eight columns with five rows, and all the entries shown are zero.

Now we split the DataFrame into training and testing sets and examine them. As you can see, the test size is defined as 0.25, that is, the test set is 25% and the training set gets 75% of the DataFrame; if you look at the shapes, X_train has 1,500 rows and X_test has 500 rows, which is how the data has been split. Next we'll use the Naive Bayes classifier for text classification over the training and testing sets. Most of you might already be aware of what a Naive Bayes classifier is: it is basically a classification technique based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
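Here is a hedged sketch of these feature-building steps, joining the pre-tokenized reviews into strings, building the 0/1 targets, vectorizing with CountVectorizer and splitting into train and test sets; the random_state value is my own addition for reproducibility.

    import pandas as pd
    import nltk
    from nltk.corpus import movie_reviews
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split

    nltk.download("movie_reviews")

    # Join each pre-tokenized review back into a single string and build the 0/1 targets
    rev_list, targets = [], []
    for label in ["neg", "pos"]:
        for fileid in movie_reviews.fileids(label):
            rev_list.append(" ".join(movie_reviews.words(fileid)))
            targets.append(0 if label == "neg" else 1)

    y = pd.Series(targets)                    # 1000 zeros followed by 1000 ones

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(rev_list)    # sparse bag-of-words matrix, 2000 rows
    print(X.shape)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
    print(X_train.shape, X_test.shape)        # roughly 1500 and 500 rows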
To know more, you can watch our Naive Bayes classifier video, the link to which is given in the description box below; if you want to pause at this moment and quickly check what a Naive Bayes classifier does and how it works, you can watch that video and come back here. Now, to implement the Naive Bayes algorithm in Python, we'll use the following library and functions: we import GaussianNB from the sklearn library, which is scikit-learn, instantiate the classifier, and fit it with the training features and the labels. We also import the multinomial Naive Bayes, because we do not have just two feature values here, we have multinomial count features. We fit the multinomial Naive Bayes on the training data, and then we call the predict function and pass in the features. Let's check the accuracy of this model: as you can see, the accuracy is 1, which is highly unlikely; since it has given 1, it means the model is overfitting and is overly accurate. You can also check the confusion matrix for the same: for that, you use the confusion_matrix function on the variables y_test and y_predicted. Although it has predicted 100% accuracy here, an accuracy of 1 is very unlikely, and you might get a different output; I got 1.0, you might get 0.6, 0.7 or any number between 0 and 1.

So guys, this is it for today's session. I hope you understood a lot of things regarding text mining and natural language processing: tokenization, stemming, lemmatization, POS tagging, named entity recognition, chunking, syntax and why it is important, syntax trees and how they are created, and finally the end-to-end demo where we used the Naive Bayes classifier on top of all the NLP operations we performed. Thank you so much for watching; I hope you enjoyed listening to this video. Please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist, and subscribe to the Edureka channel to learn more. Happy learning!
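To round off the walkthrough, here is a minimal sketch of the classification and evaluation step, continuing from the variables in the previous sketch; unlike the demo, which appears to score on the training features (hence the accuracy of 1), this sketch scores on the held-out test set.

    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import accuracy_score, confusion_matrix

    # Multinomial Naive Bayes works directly on the sparse count features
    clf = MultinomialNB()
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)
    print(accuracy_score(y_test, y_pred))     # an accuracy of exactly 1.0 usually signals overfitting or leakage
    print(confusion_matrix(y_test, y_pred))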
Info
Channel: edureka!
Views: 245,536
Rating: 4.8910756 out of 5
Keywords: yt:cc=on, natural language processing tutorial, nlp tutorial, nltk tutorial, nlp using nltk, text mining tutorial, text mining, text mining using python, text mining nltk, text mining techniques, text mining examples, text mining applications, nltk tutorial in python, nlp training, natural language processing training, natural language processing training using nltk, nltk training, natural language processing with python certification course, edureka, nlp edureka
Id: 05ONoGfmKvA
Length: 40min 28sec (2428 seconds)
Published: Sun Oct 07 2018