Coding Challenge #75: Wikipedia API

Video Statistics and Information

Captions
Hello! I am going to do a coding challenge with the Wikipedia API. I don't think I've done this before, and I'm not entirely sure what I'm going to build. In a way it might be better for me to just do a tutorial about the Wikipedia API, but let's make this a challenge. So this is what I'm thinking: there are a lot of interesting things you could do if you can get access to the contents of Wikipedia. You can make poetry machines and all sorts of experimental text things, but one thing I know that people have done effectively is make games that you can play by sort of crawling and moving around Wikipedia, and there are lots of interesting examples of that, hopefully I'll get some from the chat. So what I'm gonna attempt to do, I think, is create a sketch where I'm going to create a little user input box, and I'm gonna say userInput equals... I'm going to use the p5 function createInput. So now, if I go to the code, we should have a nice little input box. And you know what, let me do this differently: instead of using the create function, let me get rid of the styling stuff and actually put it in the HTML. So, "Wikipedia coding challenge thing". I'm hoping to make something not so interesting, and that you, the viewer, will learn and see how I did it but have a more creative mind to make something more interesting and exciting. So I'm gonna say "word", and then I'm gonna say input id equals userinput. I just prefer to put the elements that I want on the web page in the HTML. So now what I can do, instead of this, is say select. I'd rather just select it, and now we have access to what's written in it. So what did I do? Did I do something wrong? Is there something wrong with my HTML here? Oh yeah, this should be h1, thank you. So there we go. And actually, let me make it h2, because it's large. There we go. So I want to enter a word here, "hello", and hit enter or something.
So when I hit enter, I can handle that event by saying userInput.changed(goWiki). What I want to do is have a function that runs when the user hits enter or tab, which will signal that there's been a change in the text field. I could also add a button or something, but I'm not worrying about interface design here; you could worry about that when you redo what I do. And now I'm just going to write a function, and I'm gonna call it goWiki. The first thing I want to do is just get what is in the input, so I want to say let term equal userInput.value(), then console.log(term). I don't know what I just did there above. OK. So again, this is my sort of friendly style of writing JavaScript; you could do this in 15 other ways. The other thing I'm gonna do is add "rainbow" in here, so it looks... that's not right, I guess I say value equals rainbow, so it's in there. And then: "changed" of null. So the idea here is userInput, and in here I said select... oh, "userinput", so it'd be nice if I put a # in there. So there we go. OK, so now if I hit enter or tab... oh, it hasn't changed, actually. Yeah, so "rainbows". The funny thing is, since it started with "rainbow", when I hit enter it didn't change. I should probably have a button, a submit, whatever. I'm actually gonna bypass all of that, and when the code starts I'm just gonna immediately call goWiki. I want to set up an interface to make this dynamic with different words, but I just want it to run every time I refresh the page, to test what's working. So every time I run it I should see a console log of what's in there, and I could change it to a different word. Now, the first thing I want to do is look for a list of articles with that term in it. I've already done the research about the Wikipedia API and how it works. Normally in these coding challenges I like to kind of figure that out as I go, but I have to admit
the Wikipedia API documentation is a bit difficult to read. But here it is, and I can see this is basically what I'm using here. I think this URL tells it to send you the content of a page, but there's also a URL to do a search. You can look through and read all about how to do this, but I already have a URL right here; this one is for a search. So what I'm gonna do here is say var url equals... and actually, I have to use let, not var. Let me put this back up at the top there. I'm gonna say let searchUrl equal, and this is the full URL. I'm gonna go all the way to the end and close the quotes, but I'm gonna take the search term out, because what I want is for the URL to end there and for the search term to be filled in dynamically. So I'm gonna say let url equal searchUrl plus term. Let's just console.log that URL to see that it's working. Ah, there we go. Now you can see that's the full URL, it's searching for "rainbow", and that's dynamic, because if I were to put in here, you know, "unicorn" and hit enter, now I'm gonna get "unicorn". So the first thing that I want to do is just see what's in that. I want to copy this URL and look at it. I could just click on it instead of copying it, but I'm gonna put it in here, and look: this is the JSON (JavaScript Object Notation) formatted data that's coming from the Wikipedia API, and it's giving me a list of all the possible pages with the term "rainbow" in it. So let me just pick the first one. How do I even get that? First of all, I have to request it from my code; right now I'm looking at it in the browser. So I want to say loadJSON(url, gotData). Let's make sure this comes through. I'm gonna write a function called gotData, and that's the callback for when the data comes through. And I'm kind of entering into a
dangerous territory of having all these nested callbacks. First there's a callback for the input box, then there's a callback for the data, then I'm gonna do something else with a callback. So someday I might want to refactor this using a more thoughtful approach to sequencing all these callbacks, but I'm just gonna do it willy-nilly right now: console.log(data). Let's see what happens here. Ah, look at this. This is like the bane of everybody's existence on the Internet: "Access-Control-Allow-Origin header not present", CORS, blah blah blah. This is really unfortunate: the Wikipedia API doesn't seem to have something called CORS, which is cross-origin resources or something like that, enabled, which would allow us to fetch data from a server other than the one where our code is running. Some APIs will let you do this, some won't. I'm pretty sure, since I've used this before in code, that there's a way around this. It's called JSONP; I made a video about this somewhere else. It's JSON with padding, and p5 has a nice option for it. So often what I do, and I've said this before, is: it didn't work, I got a CORS error, let me just add another argument here, "jsonp". So if I add this third argument, let me try it with JSONP and run it again, and there we go, it came through. We could say a lot more about that, but in this case we're just lucky: we tried it first without JSONP, now we try it with JSONP. So what I'm looking for... it comes in as an array. I want the first element of the first element of the array, so I'm going to see if I can say console.log(data[0][0]). Let's see what I get. I could pick a random one, which might be more interesting, but let me just pick the first one. I got "r". Hmm. Oh, zero, zero... no, sorry, I want data[1], the first element of the second element. There we go: "Rainbow". Let's try putting other things in here, like "unicorn", and let's
just try a few characters, like "are". I guess there's a Wikipedia page for "are". "Puy"... OK, what's going on here? Let's look at the full data. "Puy"... oh, there is a Wikipedia page for "Puy", which is the French [unclear]. The point is, if I entered a search term that was just part of a page, I wouldn't get it back, but you'll think of something. So let's change it to random. What I'm actually gonna do is say let len equal data[1].length; I think it's gonna give me 10 by default. Then I'm gonna say let index equal floor(random(len)), so I pick a random index into that array, and I'm going to say data[1][index]. This should give me a random article; that'll be better. Let me go back to my thing here. So now I got "Rainbow (South Korean band)". If I type in, what was it, "puy", I got "Puy", but there are other ones that... I've got to have the submit button. "Unicorn": I got "Unicorn Bubble". So you can see I'm getting a random one each time. Let's do "rainbow" again: I got "Rainbow Brite". Let's do "unicorn" again, and I got "Unicorns (cricket team)". OK, so we're getting a random article each time, that's good. I forgot what I'm doing here exactly, but now let's at least get the article's data. So the next thing that I want to do, once I have that title... really what I'm saying is var title equals this. Now, one thing about Wikipedia titles: let's say I try to actually look for "Unicorns cricket team" on Wikipedia. I'm gonna search "unicorns cricket team Wikipedia" and see if I can get that page. Here it is. Look at the URL, at the title: the way Wikipedia works is it actually uses underscores wherever there are spaces, so I'm gonna need to account for that. So one thing I'm gonna do is now say title equals title.replace. This is a JavaScript function that allows me to use a regular expression (you can watch my regular expression tutorials) to match any whitespace. I could have
just done this with a single space, but I'm gonna be smart about this: replace any one or more whitespace characters in a row with an underscore. Now let me console.log that title and see what this looks like. You can see I got "Rainbow flag". Oh, you know what, I forgot: it only replaced the first space, so I need to have the global flag here so it replaces all the spaces. And now you can see it replaced all the spaces: "Rainbow trout", et cetera, "Rainbow Brite", et cetera. OK, so here we go. What am I gonna do now? I need to ask Wikipedia for the content, and I'm gonna call this the content URL; this is the URL for asking for the content. You can see at the end here it's "titles equals", so I think you can ask for the content of multiple articles, but I'm just gonna do one. So what I'm going to do now is go back into my code, down here, and you can see what I mean about this nested callback thing I've done being a bit of a problem. I'm going to say let url equal the content URL plus the title (I called it contentUrl with lowercase "rl"), and then loadJSON(url), and then "got"... now I need another callback. Most people writing JavaScript would put an anonymous function right in there as the callback, or use some other technique to handle nested callbacks, like async or promises. I'm gonna be pretty terrible about it: I'm gonna call this one gotSearch and this one gotContent, and just write another function, call the argument data, and say console.log(data). OK, so now I want to say console.log "Querying: " plus title. Oh, there's that new fancy backtick way in ES6 of doing strings; I've got to do that at some point. OK, so let's do this. Uh-oh. It went for "Rainbow Brite", but it didn't get the content of "Rainbow Brite". So what I'm going to do is say
why? Because I forgot about JSONP. So I have to do JSONP; I had that cross-origin error again. And does this look like I got stuff? This looks weird. Actually, this is working; I just find it totally confusing to look at, but the content is in there. I should have thought to investigate how the JSON is formatted first. I can copy this link and look at it, and by the way, I have a Chrome extension installed, it's called something like JSON Formatter, so let me look at the stuff this way. So really, I've got to look under query, pages... huh, I have to know that ID. Hold on a second here. So: pages, page ID, revisions, oh my goodness, three items, content format, star. This is what I want. The Wikipedia API is organized strangely, and maybe there's another way to do this, but I have to think about this: if the page ID is gonna be different each time, how do I know what to look up? Looking at this, I have query, pages, and then some number, which is the page ID, which is going to be different for every article. So I have to somehow pull that out dynamically in order to get down here to this stuff, which is the content, which is what I want. So what I'm going to do is change this to console.log(data.query.pages). Let's see what I get. Oops, I'm in the wrong place. Hit refresh. So I got it. How do I know what this number is? Can I just use an index? I could get the key, I could say Object.keys, but what if I just say index 0, will that work for the first object? No, undefined. So I can say Object.keys to give me all the keys that are in this, and look, I've got that. This is pretty weird, what I'm doing, and I don't know if there's a better way to do this: pageId equals Object.keys(data.query.pages)[0]. Let's see if that gives me the page ID. Whoops, I have too many brackets. What did I mess up here? Too many
parentheses. OK, here we go, lots of errors here. Let's try to get this right. This deserves a drumroll: am I gonna see the page ID? Yes, and I'm gonna see different page IDs. Look at that, so I'm able to get the page IDs. So now what I want is the content. Let page equal data.query.pages, and now I want to say page[pageId], right, because I want to look at pages, page ID, revisions, star: .revisions, then star. Now here's the thing: you can say dot and the name of the thing if it's a valid variable name, like revisions, but I can't say dot star, because star is not a valid character for a variable name, so I've got to use these brackets. Boy, I'm really off my rocker here with this crazy, convoluted use of Wikipedia. OK, here we go. What's the chance that I got something here? Let's look: console.log(content). Undefined. Hmm. Let's look at revisions. OK, got something there. Revisions, zero... oh, it's an array, so [0], then star. Oh my goodness. This is like detective work; it's not even programming, just figuring out how this thing works. There, look: there's the content of the Wikipedia page. I finally got the content of the Wikipedia page! Now what do I do with that? Oh no. So let's first do a couple of things. Right before I ask for the content, let me say createP(title). So I got that, and now I got the content. If I type "unicorn" in here, I get the content. So now the question is what I want to do. Well, I could pick a random word and then search for articles with that random word in them, why not, right? I could look for links, like special links to other Wikipedia pages, but let's be really simple about this and just use another regular expression. I've got to move on, I've got to finish this up and wrap this
challenge up somehow. So let's use a regular expression. I want to say wordRegex equals any valid sequence of word characters, one or more... and actually, let's say it has to be four or more, so it has to be at least a four-letter word. And I could put word boundaries around it, I guess, and then make it global. And then how do I do this again? Do I say wordRegex.exec or match? Do I say content.match? I can't even remember my own [unclear]. Right: content.match with the word regular expression. By the way, I'm kind of off my rocker here in the sense that I'm now using all sorts of stuff like regular expressions; you could watch my regular expressions tutorial. But what I'm trying to do here is match a given word, and that word is any sequence of word characters, four or more, I'm deciding. Then I want to look at this. I don't know if I did this right; let's see if this gives me a bunch of words on the page. Look, it did: it gave me two thousand four hundred seventy-four words, like "vandalism" and "expiry". The nice thing about p5 is I can now say var words equals the results of this regular expression, then var word equals a random one: the p5 random function, if you give it an array, will pick a random one out. I'm going to say console.log(word) and run this, and it gave me "average". So now what I'm gonna do is call goWiki... you know what, I'm going to give goWiki an argument, word or term. So it starts with userInput.value(), but goWiki(term) just does it with that thing. So now I can say goWiki(word). Now, I'm probably gonna crash my browser, because I'm gonna be doing this over and over again, and I should put something in that stops it, but it'll actually just go back and do this whole sequence again. So let's see what happens. Look at this: a Wikipedia random article crawler! I need to stop. How do I stop it? OK, let's at least
figure out... oh good, at least I got an error at some point; that's how I stopped it. Look how far I got: I got to "Return of Saturn". So somehow we got to "Femme Fatale", "Made in America", et cetera, from "Rainbow (South Korean band)". So I should probably at least add some type of protection here. I'm going to do something sort of silly, which is say let counter equal zero (I should come up with a better way of doing this), and I'm gonna say counter equals counter plus 1, and if counter is greater than 10... oh, less than 10, so I'll only do this 10 times. And I'm just gonna say createDiv, just so there's a little less of all that space. Now let's try this one more time. It did it 10 times and it stopped. In theory the counter should get reset any time the input is changed, but you know what, forget it, it'll only do it once; that will be easily fixable. So now I'm gonna put something in here like "unicorn" and... oh, so what happened there? Whoops. OK, goWiki is the thing that starts it, of course. So I need another function: I'm going to create a function called startSearch that calls goWiki(userInput.value()) and also sets the counter equal to zero. There we go, perfect, now everything is done. So I can change this to "unicorn" and it will crawl a bunch of random articles, and then I could give it "happy" and it will crawl a bunch of random articles and end with "Music". OK everybody, thank you for watching this random Wikipedia article coding challenge thing. It required a lot: regular expressions, callbacks, loadJSON, JSONP, and how the Wikipedia API works. I hope you make something from this, and actually be more thoughtful and have an idea of why you might want to crawl Wikipedia and make some kind of strange thing with it. So share with me what you make, and I look forward to seeing you in a future video. [Music]
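The data-handling steps walked through in the captions (build the search URL, pick a random title and swap spaces for underscores, dig the content out from under the page ID, pull out words of four or more characters) can be sketched roughly as below. This is a minimal sketch and not the code from the video: the function names, the injectable `rand` parameter, and the use of `encodeURIComponent` are assumptions made for illustration, and the two URLs are standard MediaWiki API queries rather than strings quoted in the captions. The network request itself is left out so the helpers run without p5.js.

```javascript
// Sketch only: pure helpers for the steps described in the captions above.
// ASSUMPTION: these endpoint strings are standard MediaWiki action API
// queries; the exact URLs pasted in the video are not in the captions.
const SEARCH_URL =
  'https://en.wikipedia.org/w/api.php?action=opensearch&format=json&search=';
const CONTENT_URL =
  'https://en.wikipedia.org/w/api.php?action=query&prop=revisions' +
  '&rvprop=content&format=json&titles=';

// Append the search term. encodeURIComponent is an addition here; the
// captions describe plain string concatenation.
function buildSearchUrl(term) {
  return SEARCH_URL + encodeURIComponent(term);
}

// An opensearch response is an array whose second element lists titles.
// Pick one at random and replace each run of whitespace with an underscore.
// `rand` is a hypothetical parameter so the random choice can be controlled.
function pickTitle(data, rand = Math.random) {
  const titles = data[1];
  if (!titles || titles.length === 0) return null;
  const index = Math.floor(rand() * titles.length);
  return titles[index].replace(/\s+/g, '_');
}

// Build the URL that asks for one article's content.
function buildContentUrl(title) {
  return CONTENT_URL + encodeURIComponent(title);
}

// The page ID key differs per article, so read the first key dynamically,
// then follow revisions[0]['*'] down to the article text.
function extractContent(data) {
  const pages = data.query.pages;
  const pageId = Object.keys(pages)[0];
  return pages[pageId].revisions[0]['*'];
}

// Words of four or more word characters. match() returns null when nothing
// matches, so fall back to an empty array.
function extractWords(content) {
  return content.match(/\b\w{4,}\b/g) || [];
}
```

In a p5.js sketch these helpers would sit inside the callbacks passed to `loadJSON(url, callback, 'jsonp')`, with the counter check deciding whether to call the search again with a random word from `extractWords`.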
Info
Channel: The Coding Train
Views: 280,001
Keywords: JavaScript (Programming Language), live, programming, daniel shiffman, creative coding, coding challenge, tutorial, coding, challenges, coding train, the coding train, live stream, itp nyu, challenge, javascript, p5.js, p5.js tutorial, programming challenge, wikipedia, wikipedia crawler, wikipedia web crawler, web crawler, wikipedia api, wiki api, regular expressions, json, load json, api, js, p5js, tutorials, rest api, wiki, asynchronous requests, network requests, restful api
Id: RPz75gcHj18
Length: 24min 51sec (1491 seconds)
Published: Mon Sep 25 2017