Coding Challenge #40.1: Word Counter in JavaScript

Video Statistics and Information

Captions
Hello, and welcome to another coding challenge. In this coding challenge I am going to build a word counting application. So what can you do with a word counting application? A lot of things. It forms the basis of text analysis operations. You can make all sorts of visualizations, a word cloud being the sort of classic, perhaps cliched, example, but I'm sure a lot of you will hopefully come up with creative outputs from this. I'm just going to do the core word counting application.

Two things I'll mention. I'm going to use something called an associative array. If you don't know what that is, I just made a video all about associative arrays; you can go back and watch that. And I'm also going to do another follow-up: I'm doing this one in JavaScript using p5.js, and I'm going to do this exact same thing in Processing, in the Java programming language, right after it. So if you're more interested in that, go and skip ahead and watch that video.

Okay, so let's just get started. I need a text. I'm going to load text from a text file; I just copy-pasted the description of rainbows from Wikipedia. So I'm going to load this particular text and count all the word occurrences, count all the word frequencies, in this text. But you could pull text from an API, Project Gutenberg is a great place to find a lot of public domain text that you can work with, or you could even have users upload files, as you've seen in previous examples. So you might take your own spin on where you're getting your data from, but I want to focus on the core algorithm of counting words.

Before I can actually count the words, though, I need to write the code to load the text file. I'm going to do this in the simplest way, by adding a preload function to my p5 code. I don't need the draw function; there's not going to be any animation in this, though there certainly could be. I'm going to create a variable called txt for the raw text. I'm going to say txt equals loadStrings, rainbow.txt, so this is me loading
that file in. And if I now go to my page and type txt here, we can see what it loaded: it actually loads everything as an array, each element of that array being one line from that text file. That's how the loadStrings function works in p5. So one of the first things I want to do is create a variable where I join all of that together. I just want to have one long string, because ultimately I'm going to chop up that long string into word tokens. So I get a text file, I have it as an array of lines, and I want to join all those lines together so that I can split up the whole thing by word tokens.

Then, after I have allwords, I can say var tokens equals allwords.split. If you recall, split is a JavaScript function you can call on a string that says: here's a delimiter, this is the space in between words, and I want you to separate this text into an array where each token, each word in this case, is a separate element of that array. And I can use a regular expression (see my videos on regular expressions) to define the delimiter. The simplest thing I could do would be to split by space, but, and this isn't going to be so great, I'm actually going to split by this regular expression. What this regular expression means is: slash, backslash W. Backslash w is a metacharacter for any alphanumeric character, abc123, but that's lowercase w; capital W means anything that's not a through z or 0 through 9. So essentially I want to split by all spaces, all punctuation, that sort of thing. There could be some problems with this: "can't" will split into "can" and "t". But we could refine this later and get a better regular expression in there; mostly I just want to work on the word counting part.

So that's all set. Now let's make sure this aspect of it is working and look at that array in the console. We can see it is working: you can now see that I have an array where each word is a separate element of
that array. Okay, what's next? What I need to do now is look at every word, one at a time. First of all, how would I do this if I didn't have a computer? People used to do these text-based concordances all the time. If I had a book, like How Computers Work, I could just start reading this book: "the", okay, "the": one. "Changes", "change": one. "In": one. Okay, etcetera, etcetera, etcetera. Ah, oh, I already found that, okay: two. So you can see this is the algorithm, right? And I encourage you to do your algorithms manually, by hand, not always, but when you can, when you feel like it, to understand how they work. I need this pen.

So what is the algorithm? Look at each word. Is it a new word? Yes: add it and set its count to 1. If it's not a new word, increase its existing count. This is the algorithm we now need to implement in our code. I'm sorry for the chicken scratch and the focus box, focus box, focus box, but this is the algorithm that we want to implement: look at each word; is it a new word? Yes: add it to the dictionary, the associative array thingy, the JavaScript object. No: increase its existing count.

Okay, come back over here, let's start doing that. How do I iterate through all the words? For var i equals 0, i is less than tokens.length, i plus plus. And now I'd say, okay, let's look at each word: tokens index i. Now I need to have this dictionary that I'm talking about, the dictionary of counts. So I'm going to create a global variable (it doesn't need to be global necessarily, but I'm going to do that just so I can play with it in the console if I need to) called counts. And what should counts be? counts should be an empty object, because what I'm going to do is put the words as the keys, the properties of the object, and the counts as the values in the object. The first thing I need to do is determine: is it a new word? The way that I can check if it's a new
word is to look at whether it's a property of counts. Now, there are some things I can do in JavaScript, I think, and say something like if counts.hasOwnProperty or something like that, but an easier way for me to do this is just to reference the property. If that property doesn't exist, it will be undefined. In other words, let me come over here and look at this counts object: if I try to say counts "the", I'm going to get back undefined. undefined evaluates in JavaScript to false, or I can be really explicit about it and test if it equals undefined. Let's do that. So if counts word equals undefined (again, that's a little bit of overkill there), what do I do? Set its count to 1. If it's not undefined, what do I do? And that should be word, not words. Increase its count by 1. Of course, I could say counts word plus plus, but I'm being very long-winded here. If it's undefined, it's a new word; if not, increase its count by one.

So now let's take a look at what we got. Ooh boy, what did I do wrong? I definitely made a mistake, because the keys are all numbers. Let's see what's going on here. word equals tokens index i... what did I do wrong here? console.log word, that's right. counts, it's called counts, right, it's an object. If it's undefined it equals one, otherwise it equals plus one. What's going on here? What did I do? Oh, no, I didn't do anything wrong; there's just a ton of numbers in the text. And by the way, it's actually sorted in alphabetical order, which was why I thought, oh, why are all these numbers here, something must be wrong.

So I could maybe say, actually, I want to ignore numbers, and let's do that. Let's say I wanted to ignore numbers; what would be some ways that I could do that? Well, one thing I could do is write a regular expression, something like this, so
if... I believe this is right, so let's test this in the console. A regular expression like backslash d, oops, this is any number of digits. I should get false; I should get true. So I could ignore anything that is only a string of digits. I could put a little if statement around everything, so as long as it is not just a string of digits, then I'll actually count this word. And if I do this again... oops, I have a syntax error; I guess I need another parenthesis. And I say counts: now we can see this is a bit better, this is more what I was expecting. I've still got some junk in there, etcetera.

Now, one thing you'll notice is that there are a lot of capitalized words and a lot of lowercase words, but there might be a capital "The" and a lowercase "the", and I want to count those as the same thing. So let's fix that. One way I could do that is, as I'm going through every single word, I can say toLowerCase. I want to convert all those tokens to lowercase so that, as I'm filling up this dictionary, this associative array, this JavaScript object, I'm just including the lowercase versions as the keys. So now if I go to counts, there we go: "a" 206, "able" 3, "about" 10, "above" 14, "absent", "accepts", "accord", etcetera, etcetera. So you can see that in many ways I'm done. I now have a JavaScript object that has every single word as a key and the number of times it appeared in that document as the count.

Now here's the thing: it looks like it's in alphabetical order there, which, weirdly, by the way, is not the order it's kept in. So let's try to understand what's going on here. I'm going to make a test object (let me make this a little bigger). So this is an empty object. I'm going to say test.b equals 100 and test.a equals 50, and now I want to look at this object. One thing that's confusing here is that a JavaScript object isn't ordered; the properties don't exist in an order. Now, they do kind of
have this inherent order, and you can see that inherent order there: I added them to the object in an order, I said b first and then I said a. So in some ways that's an order that JavaScript maintains for you. And there is a way to iterate over all the properties of an object, and this is the next thing I need to do, but I also need to sort that order, and there's no real way to sort the order of keys in a JavaScript object easily. Now, it looks like they're also in alphabetical order here; my guess is that's actually just the Chrome developer console deciding, ah, maybe you want to look at them in alphabetical order, I'll put them in alphabetical order for you.

Whenever a question like this comes up, I think to myself: could I add some extra redundancy to my code and just not worry about it, but store all the keys in another data structure that has order to it, and track my data in both places? So if I need the order, I have them over here, and if I don't need the order, I have them over here. Now, there might be reasons someday why you don't want to do that, but this is going to be a perfectly good, and I think kind of excellent, solution for us right now.

So what's a data structure that has an order to it? A structure that has an order to it is an array. That's what I actually want in this example, because ultimately I want to sort everything from highest count to lowest count, and there's no way to do that really just in the JavaScript object itself without some acrobatics, as much as I want to try some acrobatics. So what I want to add, actually, is a variable called keys, and that's going to be an array. The idea here is that I'm going to store all of the pairs, the keys and the values, in the counts object, but I'm going to store separately a list of all of the unique words that I found, because that's going to have an order: I can iterate over it, I can sort it, I can do all sorts of stuff with it. So how will I do that? If I have this empty
array, when do I want to add a word to that array? Looking through the algorithm here, I only want to add words to that array when I've discovered a new word that I haven't encountered before, which is happening right here, when it doesn't exist already in the dictionary, the JavaScript object, the associative array, hash table thing. So I'm going to say here keys (and I'm using the word key because we can think of this JavaScript object as a collection of key-value pairs, the values being the counts, the words being the keys, and so I'm calling the array keys): keys.push word. If I do that, now let's look at what I have. I have my counts, which are all the words and their counts, and then I also have my keys, and in keys there are 1,343 words. You can see here they are; it's a big array, and these are all the unique words in the order that I found them.

Now here's the thing: we can now sort that array. But first of all, what's the point of that array? The main point of that array is that I can now iterate over it. Let's say I just wanted to create a div for every single word. What I can do here is loop over that array (we're actually going to have a problem here, which I will try to remedy). I'm going to iterate over all the keys and I'm going to say createDiv, and this is p5 DOM syntax, you could use document.createElement or whatever, but createDiv keys index i. So let's just look at what that does. Okay, I'm going to refresh this page, and there we go, here are all the keys.

Now what if I want to also show the counts for those words? I'm iterating over the keys, but the counts are stored in the counts dictionary. So what I want to do is say var key equals keys index i, and I want to create a div that's both the key and the count for that key. The key is the thing that I'm getting from the array, but then I use it as the lookup into the counts JavaScript object. So this now allows me to say this: I now have all the words and their
counts, and I can make this a little bit bigger. Now, they're not sorted in any order, but I can now sort that array. Let's say I do this: keys.sort. That's going to sort that array. It's sorted; look, it's in alphabetical order. Now, if it so happens you want your keys in alphabetical order: done. Turn off the video, go outside, fly a kite, skip to your Lou, whatever you want to do, you are done.

But most likely you're going to want to sort that array by some other mechanism. JavaScript has this sort function for arrays, and it's just like, okay, you said sort, I guess I'll sort alphabetically. It has one way it knows how to sort the array; with numbers it'll sort by ascending or descending order, something like that, probably. But what I want to do is sort this by the counts, and the counts aren't in the array. So how do I sort the array by the counts, when the counts are in the object but not in the array?

Guess what: JavaScript has this neat little feature which allows you to pass a function into the sort function. sort without any argument will sort by default, but if I pass in a variable name called compare, that means I can write a function called compare, and that function's job is to return a positive or negative number based on the order it wants. a and b are any two elements in that array, and JavaScript, behind the scenes, is going to sort the array based on your comparison of those two elements. So what I could do now is say, okay, count a equals counts index a, count b equals counts index b. In this function, which is telling keys how to sort, I can reach into the dictionary, pull out those counts, and now I can get a positive or negative number. I could send it positive one if count a is bigger than count b, I could send it negative one if count b is bigger than count a, and actually an even simpler thing I can do is just say return count a minus count
b, right? Because 10 minus 5 is 5 and 5 minus 10 is negative 5, so that should work. Let's put that in there, let's run it, and look: oh, well, I sorted it the wrong way, because now I have the lowest ones first. So I'll just say count b minus count a. I can never remember which one is which, so I always just do it one way and then reverse it if it's the wrong way. And there we go: "the" appears 537 times, 248 times, "a" 206 times, "rainbow" 100 and 660 times, etcetera, etcetera, etcetera.

So this now gives you a solid foundation for how to build a word counting application in JavaScript, particularly using p5. If you're watching this video and thinking, well, what should I do next, here's what I would ask of you. Number one, think of some interesting text that's meaningful to you, or just something you want to play with, that you want to do word counting with, and give that a try. Another aspect to think about: this isn't so great, this is a literal list of the words and their counts. What type of creative idea might you have? How might you animate the results of this? How might you animate it while it's computing? How might you visualize these word counts? How might you do a concordance of multiple texts and compare them? So please, if you make something, share it with me.

The last thing that I'm going to mention: if I go back to my course website, under text analysis, I just want to show you this particular example. This is essentially the same example that I just built, but it has a few more features, and you can take a look at this code if you want: you can drag a file in, you can use a text sample, it has a submit button, and it formats the result as a list. So you might want to take a look at this for some more features. I also want to mention that what this example does is take everything that has to do with word counting and package it in a JavaScript object that you can use generically. There's one more example I'll show you that's also on my
website, which is, whoops, which is the parts of speech concordance. You can also look at this one, which uses RiTa.js, which I showed in another video, to not just count the word frequencies but actually count how many nouns were in there and how many verbs. These are parts of speech tags, short indicators of the part of speech. So you can take a look at that and think about other things: count how many verbs there were, count how many words with six letters there were. What other kinds of word counting applications can you make where you're not just counting the word frequency itself?

One other thing I'd like to mention, thanks to Alvaro in the chat, who just brought this up: even though I made that keys array myself, one thing you can do in JavaScript, if I have this counts object, is say Object.keys counts, and this will actually return an array of all the keys in that object. So you can actually get this automatically and then sort that array. But I still just like to build it myself, since I'm reading through everything anyway.

So I hope you enjoyed this video. See you in another one. In the next video I'm going to do this exact same word counting application, but in Processing, using the Java programming language. Later I'm also going to do a word counting application that does keyword generation using TF-IDF, an algorithm known by that name, which I'll explain, and at some point look at Bayesian text analysis as well, to do text classification, which also employs word counting. Okay, thanks very much, and see you soon.
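For reference, the steps described in the captions above (split into tokens, skip digit-only tokens, lowercase, count in an object, keep a separate array of keys, sort by count) can be sketched in plain JavaScript roughly as follows. This is a minimal sketch, not the code from the video: it leaves out the p5.js loading and DOM parts, and the names countWords and sampleText are made up for illustration. It also assumes Object.create(null) rather than an empty {} for the counts object, so that a word such as "constructor" is not mistaken for an inherited property.

```javascript
// Sketch of the word-counting steps from the captions, without p5.js.
// countWords and sampleText are hypothetical names used only for this example.
function countWords(text) {
  // Assumption: a prototype-less object, so inherited properties such as
  // "constructor" or "toString" never look like already-counted words.
  const counts = Object.create(null);
  const keys = []; // unique words, in the order they were first seen

  // Split on runs of non-word characters (spaces, punctuation, ...).
  const tokens = text.split(/\W+/);

  for (let i = 0; i < tokens.length; i++) {
    const word = tokens[i].toLowerCase();

    // Skip empty tokens and tokens made up only of digits.
    if (word === "" || /^\d+$/.test(word)) {
      continue;
    }

    if (counts[word] === undefined) {
      // New word: start its count at 1 and remember the key.
      counts[word] = 1;
      keys.push(word);
    } else {
      // Known word: increase its existing count.
      counts[word] = counts[word] + 1;
    }
  }

  // Sort the keys from highest count to lowest by looking up each count.
  keys.sort(function (a, b) {
    return counts[b] - counts[a];
  });

  return { counts: counts, keys: keys };
}

const sampleText =
  "The rainbow is an arc. The arc has 7 colors; the colors can't mix.";
const result = countWords(sampleText);

// Print each word with its count, most frequent first.
for (const key of result.keys) {
  console.log(key + " " + result.counts[key]);
}
```

In a p5.js sketch, the text argument would presumably be the joined array returned by loadStrings, and the final loop would call createDiv instead of console.log. As in the video, "can't" still splits into "can" and "t" with this regular expression.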
Info
Channel: The Coding Train
Views: 44,884
Keywords: word counter, word counter regex, word counter javascript, word counter js, associative arrays javascript, patreon, creative coding, coding challenge, javascript (programming language), daniel shiffman, javascript, tutorial, programming challenge, javascript string object, js string object, text analysis, programming from a to z, data and apis, data javascript, coding, word counting, concordance, word counting javascript, word frequency, term frequency
Id: unm0BLor8aE
Length: 21min 54sec (1314 seconds)
Published: Mon Oct 10 2016