It's Alive! Machine Learning Writes Your Code! | Dominic Elm & Uri Shaked | #AngularConnect

Captions
So, hello. Thank you. Do you see my screen? No. Whoo! Now you see my screen. It's alive. What's alive? We're alive, I guess? >> We're alive. >> From the stage. >> Great times. Hopefully machine learning can write code for us, as you will discover soon. She did some spoilers, so we won't go over this again. But Dominic and I met here at AngularConnect last year. We attended a very interesting talk. Do you recognize this guy? URI: I think I recognize him. If I look closely, I think his name is Asim. That's at AngularConnect last year, almost one year ago. And, I don't know, you probably know him from some of the projects he did. Can you remember what that project was about? DOMINIC: Yeah. He showed a lot of exciting demos with machine learning. One was taking a photo and running it through an algorithm that will tell you what the emotion conveyed in that photo is. And then replacing that photo in the present state with a modifier. Worked quite well. DOMINIC: Yeah, it worked well. URI: And he had another interesting concept. DOMINIC: Yeah. He talked about AI in JavaScript and showed a couple of examples. One that we remember well was this form. The goal was to generate an HTML form by putting in a description of that form, so we could say, hey, we want to have an input field and two check boxes and a radio button and all these things. And automatically it will generate that form for us. Isn't that insane? But guess what happened next? URI: He released it as open source? DOMINIC: Not quite. I don't know. URI: Click the submit button and then you can load it. [ Music ] He fooled everybody. So, I guess we're not here for dancing. DOMINIC: This song is called Never Gonna Give You Up. We're not going to give up on that. It's not possible. URI: Thank you to Asim for inspiring us to do this talk. DOMINIC: We wouldn't be on this stage otherwise. We had a lot of inspiration from Asim. URI: After the talk, we started brainstorming: how could we bring this into reality? What could we do?
We had a couple of ideas, like sketching an app in a notebook, scanning it, and getting code out of that. But eventually we had one idea that clicked. DOMINIC: Exactly. We went with this idea of automatically generating code for us. Basically, taking away our job. That's our job, writing code, right? So, what we thought was: given a function signature, that's what we see there, we pass it into this black box that we call the model. And then we wanted to get back the implementation of this function. And in this case, it's just adding A and B together. That's what we wanted it to do. URI: So, basically, given the start of the function, write the function. DOMINIC: Yeah. URI: And what's the thing in the middle? The model? Can you explain a bit? DOMINIC: This model, we don't have to dive into all the details. But think of the model, this black box, as a JavaScript-ish function that has one input. That could be anything. We pass in an input, it does computational magic. We don't have to worry about the magic. And then it returns an output. That's what the model does. URI: Basically, it's magic. DOMINIC: It is, but we don't have to do this computational magic ourselves. We let the machine do the function. URI: That's what machine learning is all about. Getting the content of that magic model function figured out. DOMINIC: Exactly. URI: All right. So, speaking of magic, can you hold this for me? DOMINIC: I can. I don't know what's going to happen. URI: That's machine learning. It doesn't work. DOMINIC: Machine learning doesn't work. URI: Try that again. It's broken. We will try that again later. DOMINIC: You might be thinking, we've just seen that generating an HTML form from a description is not really possible. You might be like, oh, come on. Stop fooling us, it's not possible. But... URI: We actually went ahead and tried this and gave a talk about it a few months ago at a great conference. And you will be able to see in a moment what it looked like in that talk. We tried, right?
DOMINIC: That's a glimpse. Let's have a look. Hopefully the audio works. URI: Yeah. You were really surprised by the implementation. Like, I wouldn't say it's the most straightforward implementation you would come up with. DOMINIC: Well, I can see flush after. It kind of subtracts A from B. It could be, I think, setting the thing... it's very creative, I would say. URI: But, yeah. The model got really creative. It did manage to create valid code, which is quite amazing. And very creative indeed. And after this talk, we were excited. And we summarized our learnings. So, first of all, as you have probably just seen, we learned that automatic code generation is hard, right? DOMINIC: Right. We also learned about data processing. We talked about this a lot at ngVikings. Data processing, cleaning and gathering the data, all those things, make up a huge chunk of the work. A large percentage of the work is just focusing on the data, because the data is so important for machine learning. URI: And basically, we really love working with machine learning. We found it hard, difficult, but fascinating. And we decided to give it another go. This time we had a different goal, though. We decided to think, what else can we do with code? DOMINIC: I could think of a couple of things. But let's not try to take away our job. Let's try and create this synergy between us and machines. URI: I have an idea. Do you like to write comments? Do you write comments? Do you people write comments? DOMINIC: I have to. URI: Who writes comments? DOMINIC: Who likes to write comments? URI: Who likes to write comments? DOMINIC: There's someone. That's good for you. URI: Wow. You can come and work together with our machine learning model. For the rest of us, we decided to try, again, given a function, this time the entire function, to just predict what the comment would be. Have this model function return us the comment for that function. DOMINIC: That is great.
I mean, that really makes our lives as developers much easier. We don't have to document our code. We just have this model which generates the comments for us and summarizes the code. That is fantastic. Does it work? URI: So, we also had another goal. Before we show you if it worked: we decided this time we were going to use machine learning with JavaScript. Because traditionally machine learning is done in other languages such as Python. And then if you want to take your machine learning models and use them everywhere, basically on the web or in Node.js, it's not very straightforward if they are written in Python. Python doesn't run on the web. So, we decided to try and use JavaScript this time. We'll tell you about it in a bit. So, the general process looks like this. We start by gathering data. DOMINIC: We need a lot of data. URI: Like a lot. Like how much? DOMINIC: Well, maybe 300,000 examples? Or even a million examples. So, it's a lot of data. URI: And then I see we need to clean it and prepare it. DOMINIC: Yes. URI: So it's easy for the model to find out how to approach code. And then we need to train the model. DOMINIC: Exactly. But this is basically what we've talked about already at ngVikings. As we know, we wanted to add this layer of JavaScript to it to make it a little bit more practical for us, so we could consume it. What we did is we added another layer to this. So, taking a trained model, we're gonna see what it looks like, and then somehow using TensorFlow.js to consume it in JavaScript. URI: So, TensorFlow, that is a machine learning library for Python. But it has a special property. There are also versions for JavaScript, for Android, for iOS. So, a bit like Angular. It's universal. You can run your models anywhere. You can create them in Python, run them in JavaScript. DOMINIC: Exactly. And then once we have it in the JavaScript world, we can use the tools that we use, such as Visual Studio Code, which is awesome.
And then we can create an extension for it so that we can consume the model in an extension and then predict the comment. URI: All right. So, let's start with gathering the data. We need, you said, like... DOMINIC: A lot of it. URI: A lot of comments. So, we have like 500 people in the audience. If everybody writes comments for us, how long will it take us? Do you perhaps have a better idea how we can get a lot of comments? GitHub! Yeah. So, yeah. Can you go to GitHub and just download... DOMINIC: Well, I know what I typically do. Press this button. Go through all the repositories and click the download button. URI: Hundreds of thousands of times. DOMINIC: It's gonna take me a while. But, sure. URI: I have a better way. Let me show you. So, there is this thing called BigQuery. It's a large-scale data warehouse from Google with a lot of buzzwords in the name. But basically, what it means is that it can run SQL queries on an enormous amount of data in a matter of seconds. DOMINIC: Can you show me what that looks like? URI: Yes. It also happens that BigQuery has the entire open source code from GitHub as a dataset that we can query. And let me take you to my laptop where I can show you BigQuery in action. Spoiler: another spoiler. Yeah. So, this is basically BigQuery. And I have a query here. I'm going to run it in a moment. You can see it's going to... no, I didn't want developer tools. I want full screen. You can see it's going to process 2.3 gigabytes of source code when I run it. I'm clicking the run button. It's running in the cloud. It's going to take one minute. DOMINIC: Wait a second. Is that just SQL? The SQL we would use for databases? URI: You can recognize SQL elements like SELECT or WHERE. But BigQuery has an interesting feature: you can run JavaScript as part of your SQL query, as you see above. DOMINIC: I love that.
URI: Basically, we are using the TypeScript compiler inside of BigQuery to process all the source files, find all the methods and functions, and extract the comments and the bodies of the functions and methods. DOMINIC: I see. So, we use the TypeScript compiler and take advantage of the AST to find the information we need from those files, such as the functions and all these things. URI: Right. And we do this in the cloud, really fast. It's gonna finish in a few seconds. And if you are not familiar with the AST, the abstract syntax tree, you can find a talk with the same title at AngularUP. We have the results. Let's hide the editor. And as you can see, we have here a big table with basically the comments and the text. Some... oh, check it. "We should." "Should attach." Let's go to the last page. See if there is anything interesting there. So, basically, we have here... yeah, we have here 300,000 comments in our dataset. DOMINIC: In just a minute. URI: Yeah. Wow. This is like a really long comment. Oh. This is español. DOMINIC: Sí, yo hablo español [Yes, I speak Spanish]. URI: We have this large dataset, this big JSON file with 300,000 comments and functions. Are we ready? DOMINIC: No. I'm gonna stop you right there. We are not done yet. We have all the data. That's great. But the next step is actually to clean the data. Because the data, well, it's text. Right? And we have to do a couple of things. It's not as tidy as we want it to be. URI: We have seen some comments in Spanish. DOMINIC: Yes, exactly. And that's why the first step would be to turn everything into lower case, because we want to remove the noise from the dataset. We also remove URLs, because they also just add noise to the dataset and have no added value, really. And we remove non-English comments. Because learning one language is difficult, right? And learning all these languages would be even more difficult.
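The three cleaning steps listed above (lower-casing, removing URLs, dropping non-English comments) can be sketched in a few lines of TypeScript. This is only an illustration and not the speakers' code: the function names are hypothetical, and the "is it English?" check is a deliberately crude assumption (ASCII-only text), so an unaccented Spanish comment would still slip through. A real pipeline would likely use a proper language detector.

```typescript
// Hypothetical helpers; the talk does not show the actual cleaning code.

/** Remove anything that looks like an http(s) URL. */
function stripUrls(text: string): string {
  return text.replace(/https?:\/\/\S+/gi, " ");
}

/**
 * Crude stand-in for language detection (an assumption, not the speakers'
 * method): treat a comment as "probably English" if it is pure ASCII.
 */
function looksEnglish(text: string): boolean {
  return /^[\x00-\x7F]*$/.test(text);
}

/**
 * Lower-case a comment, strip URLs and collapse whitespace.
 * Returns null when the comment should be dropped from the dataset.
 */
function cleanComment(raw: string): string | null {
  if (!looksEnglish(raw)) return null;
  const cleaned = stripUrls(raw).toLowerCase().replace(/\s+/g, " ").trim();
  return cleaned.length > 0 ? cleaned : null;
}

/** Clean every comment and keep only the ones that survive. */
function cleanDataset(comments: string[]): string[] {
  return comments
    .map(cleanComment)
    .filter((c): c is string => c !== null);
}
```

For example, `cleanComment("Adds Two Numbers. See https://example.com/docs for details")` yields a lower-cased comment with the link removed, while a comment containing accented characters is dropped.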
And then, also to reduce the noise and complexity, we replace function names and arguments with very generic placeholders. And we also do that on an AST basis. URI: So, basically the comments no longer contain the function names and arguments. There is a function name placeholder, an argument one placeholder, and so on. So, it's easier for the model to understand that a specific word in the comment is actually one of the arguments of the function. And once we are done cleaning, we start... DOMINIC: Almost. It's like preparing dinner. It's like a recipe, and you have certain steps, things you have to do, right? If you want to make a tomato soup, you have to cut the tomatoes in a specific shape for a tomato sauce or soup. That's what preparing is all about. For instance, we have text. And obviously machines don't work with text. They work very well with numbers. URI: Numbers! DOMINIC: Exactly. That's what we have to do. URI: We have to take the text of the comment and convert it into a list of numbers. So, we create some kind of a dictionary, like you would have an English dictionary. But in this case, it's not English to Spanish, it's English to number. DOMINIC: And also the other way around. So, it's both ways. URI: So, when we feed the input into the model, we use the dictionary to convert the text into a list of numbers. When the model predicts a comment, we use the dictionary to convert the numbers the model predicted back into text. DOMINIC: That's right. URI: That's right. So, are we ready to train it? DOMINIC: No. URI: Why not? DOMINIC: Not yet. We have to come up with a model architecture. There's a couple of things you can do. But for training a model, you would typically use Python, as we already mentioned. And there's a library called TensorFlow which lets you create and train the models.
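The placeholder substitution and the two-way word/number dictionary described above can be sketched as follows. This is a simplified illustration under stated assumptions: the speakers say they do the substitution on the AST, whereas this sketch uses plain regular expressions on the comment text, and the placeholder spellings (`functionname`, `arg0`, `arg1`) and the `<unk>` token for unseen words are hypothetical choices, not names taken from their repository.

```typescript
/** Escape a string so it can be used literally inside a RegExp. */
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Replace the function's own name and argument names with generic placeholders. */
function replaceNames(comment: string, fnName: string, args: string[]): string {
  let result = comment.replace(
    new RegExp(`\\b${escapeRegExp(fnName)}\\b`, "g"),
    "functionname"
  );
  args.forEach((arg, i) => {
    result = result.replace(new RegExp(`\\b${escapeRegExp(arg)}\\b`, "g"), `arg${i}`);
  });
  return result;
}

/** Two-way dictionary between words and integer ids. */
class Vocabulary {
  static readonly UNKNOWN = "<unk>";
  private readonly tokenToId = new Map<string, number>();
  private readonly idToToken: string[] = [];

  constructor(sentences: string[]) {
    this.add(Vocabulary.UNKNOWN); // id 0 is reserved for words never seen in training
    for (const sentence of sentences) {
      for (const token of Vocabulary.tokenize(sentence)) this.add(token);
    }
  }

  /** Split on whitespace; real tokenizers are usually more careful. */
  static tokenize(text: string): string[] {
    return text.split(/\s+/).filter((t) => t.length > 0);
  }

  private add(token: string): void {
    if (!this.tokenToId.has(token)) {
      this.tokenToId.set(token, this.idToToken.length);
      this.idToToken.push(token);
    }
  }

  /** Text to numbers: what gets fed into the model. */
  encode(text: string): number[] {
    return Vocabulary.tokenize(text).map((t) => this.tokenToId.get(t) ?? 0);
  }

  /** Numbers to text: how a prediction is turned back into a comment. */
  decode(ids: number[]): string {
    return ids.map((id) => this.idToToken[id] ?? Vocabulary.UNKNOWN).join(" ");
  }
}
```

Building the vocabulary from the training comments and calling `encode` gives the list of numbers the model consumes; `decode` is the reverse direction used on the model's output.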
And you can think of TensorFlow as this box with a lot of building blocks. You can think of the blocks as Lego blocks, and mix and match and stack them on top of each other to build the model you need. URI: How do you know how to do this? DOMINIC: You would look at similar problems. You wouldn't start from scratch. You would look at similar problems from other problem domains. URI: Which is to go to code? DOMINIC: For example, translation. Say, German into Hebrew. Here, you want to translate code into English. URI: I will do it live, translating Hebrew to English. Number one... DOMINIC: Meaning, I have a bird. URI: In German, that would be [speaking German]. DOMINIC: You could say that. I wouldn't. URI: All right. DOMINIC: Anyways. So, what you do is you look at very similar problems, and then you can start with models that are already defined, try to tweak them, and stack the building blocks. We're not going into too many details here. But that's basically what you would do. URI: We took an encoder-decoder architecture. You can find it in the repository. We open sourced everything. Unlike... And then let me show you how we train this model. So, we could install Python on our computer. But that wouldn't be the way I would recommend to somebody who is just getting started with machine learning. Because, you know, setting up your computer is like half a day of work. There is something better. It's called Google Colab. DOMINIC: It is awesome. If you know StackBlitz, it's like StackBlitz for machine learning. It's not a code editor, but think of it as StackBlitz for machine learning. URI: And I have the worksheet. I'm going to run the command that clones the repo. It runs the commands somewhere in the cloud. It copies our code and creates a small dataset. Let's look at the dataset. It's a super small dataset, just five comments and their functions.
The functions have been translated into a representation that is more complex for us, but simpler for the computer: the abstract syntax tree. And you can see the comments, where we replaced the text with argument number one, argument number zero. That was the preparation. DOMINIC: Obviously those are very important to predict meaningful and correct comments. For the first step of this model, we just replaced them to reduce the noise and the complexity. URI: We downloaded the code and the model, and we can run ls -l to see... oh, we need an exclamation mark. We can run ls on the remote machine. And we can see it has all those files from our repo. And we are going to run the training, which will probably take a moment or two. So, basically, right now what are we doing? DOMINIC: So, we've just started the training process. And as we can see, there is something called epoch one of five, and an epoch is just another name for an iteration. We have a dataset that contains five pairs of an AST and the comment for that function. And one epoch is going through all the data points, all the data entries we have in our dataset. And the training is really just an iterative process going through all of these examples over and over again until that function, this model, produces, well, somewhat good results on a variety of inputs. URI: Right. So, it's almost done. And while it's working: we have trained it in Python, right? But we do need to figure out some way to get it into JavaScript. DOMINIC: We can't run Python in the browser or from within JavaScript. It makes no sense. We have to come up with this missing link in between. I think we've already mentioned something that we can use, right? URI: Yes. So, basically there is TensorFlow.js, which is TensorFlow for JavaScript. And we have one more thing we need to do so we can use our model inside TensorFlow.js, which is basically to convert it into a JavaScript model. Which is a JSON file.
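To make the idea of an epoch described above concrete, here is a toy training loop in TypeScript. It is deliberately not the encoder-decoder network from the talk: the "model" is a single number `w` that learns `y = w * x` by plain gradient descent, and all names and values (`train`, the learning rate, the five data points) are assumptions chosen for illustration. What it does share with the real training run is the structure: one epoch is one full pass over every example, and the loss shrinks as the passes repeat.

```typescript
interface Example {
  x: number;
  y: number;
}

/**
 * Fit a one-parameter model y ≈ w * x with stochastic gradient descent.
 * One epoch = one full pass over all examples in the dataset.
 */
function train(
  data: Example[],
  epochs: number,
  learningRate: number
): { w: number; lossPerEpoch: number[] } {
  let w = 0; // the single "weight" the machine has to figure out
  const lossPerEpoch: number[] = [];

  for (let epoch = 1; epoch <= epochs; epoch++) {
    let totalLoss = 0;
    for (const { x, y } of data) {
      const error = w * x - y; // prediction minus expected output
      totalLoss += error * error; // squared error for this example
      w -= learningRate * 2 * error * x; // gradient of (w*x - y)^2 with respect to w
    }
    lossPerEpoch.push(totalLoss / data.length);
  }
  return { w, lossPerEpoch };
}

// Five examples, like the tiny demo dataset; the hidden rule is y = 2x.
const toyData: Example[] = [1, 2, 3, 4, 5].map((x) => ({ x, y: 2 * x }));
const result = train(toyData, 5, 0.01);
```

After five epochs `result.w` is close to 2 and `result.lossPerEpoch` decreases from one epoch to the next, which is the same "epoch 1 of 5 ... epoch 5 of 5" progression shown in the notebook output, just on a much simpler problem.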
So, here I'm running this script to convert the model and create a ZIP file with the model that has been exported. DOMINIC: You can think of it like this: remember that this model exists, and the machine comes up with the implementation. This is what we basically do. We export the implementation of this model to a format that we can easily consume from JavaScript. URI: And the implementation is just a bunch of numbers. We can download the ZIP file. Whoa. That's fast. I would have extracted it, but I already did. Do you know Wallaby.js? DOMINIC: It's an awesome tool that allows TDD. URI: We are not using it for TDD. We are going to use what we just downloaded. We have the TS comment predictor package, which has a comment predictor class that knows how to load this model using TensorFlow.js. You can explain what I'm doing while I'm doing it. So, first of all, we need to load the predictor. DOMINIC: He's already explaining it. URI: Comment predictor. DOMINIC: I don't have a job. So, yeah. URI: Sorry. DOMINIC: We wrap everything, all the TensorFlow.js stuff, in an abstraction. It's a thing called the comment predictor. We're loading the model that we just saved and converted into JSON, so that we can easily load it. And that uses generators, JavaScript generators. And now what we do is create a function. One example that we want to feed in, that is the input to the model. And it's just a string, right? And then in the next step... awesome function, technology is awesome. Good. URI: Log the result of calling the predictor's predict with the function. DOMINIC: And we're feeding it in. What do you think is gonna happen? Hopefully it produces some comment. URI: It's thinking, you see? DOMINIC: Let's see. Remember, this is a very, very, very small dataset. It was only five examples and, well, do you think that the results are gonna be good? Bad? URI: Or no results? DOMINIC: Or no results at all, because machines never work? URI: Yeah. Seems like my computer is tired. Let's give it one more try.
Maybe it should be doesn't like this kind of... so, yeah. Our model is lazy. We created a lazy model. What have you done? DOMINIC: Just like us. I would say. URI: Yeah. Anyway, so, to be honest, this model was not very impressive. It wouldn't to a lot of stuff because we only tried it on five functions. And apparently, it's all lazy. So, it would probably predict some nonsense. DOMINIC: Maybe that's a sign. Maybe there is no comment because URI: There is no DOMINIC: There is no comment. URI: Yeah, there is no. Now it worked. And as you can see, it's not very intelligent. What it says is technology, technology, technology, technology. TT. DOMINIC: Very important. TT. URI: Very important. DOMINIC: That's typically how I end my comments as well. URI: It still is impressive because the model learned there is slash in the beginning of comments and space between words, I guess. DOMINIC: It has learned something. URI: From five examples. Not too bad. DOMINIC: Let's take a look at a real thing. URI: Yeah. So, we trained the model for an hour on a dedicated Google has this dedicated machine learning hardware called TPU. So, if we train this model regularly on our machine, I did it overnight, it took seven hours just for one iteration and we had to do hundreds of iterations in order to get it good. But with the TPU, it takes like 5 seconds. DOMINIC: machine dies. It can't do it. URI: We saw this worked and we wanted to get it into the Visual Studio Code extension. How would you do it? DOMINIC: Angular has schematics, scaffolding. And there's a tool called Yeoman. It's a scaffolding code, yo, code. >> Rob San Martin: We'll call it... DOMINIC: Lazy octopus. Just to match the theme. URI: Yes, there's Angular released names. No description, no repository, NPM. And basically, at this point we have VSCode extension. The code for us. DOMINIC: Awesome. This is machine learning. URI: And as soon as it finishes, we also want to install here our model. And... 
DOMINIC: Which we published to npm. All this shows you that it is not that difficult after all to consume these models. You can publish them to npm. You can use the tools. Don't be afraid of machine learning. You don't have to do the data science yourself. Or find someone who knows the data science. You can get started and stay in your comfort zone. URI: Just download TensorFlow and start installing it. And if you have ever created a Python environment, you know how much time it takes to get this installed. Here we get it in three seconds from npm. You don't have to do anything. Just npm install. Now we have this lazy octopus. And we can open the new project in Visual Studio Code. And if we look at the package.json, in VS Code extensions, you can see it defines a command. A command is what you see when you open the command palette. When you create a new extension, you can add something to the list of commands. We'll add a command... DOMINIC: Add comment to function. URI: Thank you. And then we would like to implement it. Actually, let's look at the implementation that comes by default. It should show an information message saying, hello, world. Let's just do a quick check to see if that works. I'm going to hit F5. And hopefully this is going to build the extension and open. Oh. Errors. DOMINIC: npm watch. What is it? URI: Oh, so many errors. DOMINIC: Okay. Skip the check. There's a flag in the config file that lets you skip the check. URI: Read about it at home. It's magic. This is the real magic. [ Laughter ] DOMINIC: We appreciate that. That's good. URI: All right. So, now we can press F5 and we have another Visual Studio Code inside ours. Inception. And then we can press... oh! We see a new command added. And when we click it, it says hello world. DOMINIC: That is awesome. URI: So, the only thing we have to do now is just to implement it. DOMINIC: Right.
We just installed all these packages that we've published to npm, which contain the trained model and the abstraction. So, we can easily just, you know, implement everything here inside the Visual Studio Code extension. So, let's do that. URI: Yeah. Let's start. Yeah. I need some utility functions. Let me create these utility functions. So, this is cheating. I just coded it beforehand. But basically, these are functions that use the compiler to get the function you are currently looking at in the editor. And we have code that calls the functions. Basically, one returns the current node, the current TypeScript element you are looking at, and find parent function returns the function you are currently looking at. So, let's do a quick demo of that where we just display the function we are looking at. Go back. And reload this. And now when we run the command, we can see that as soon as you click add comment to function, it activates the extension, and it gives us the function we were looking at. The same function. And... DOMINIC: This is exactly what we need for our model. URI: Yeah. That's the input the model gets. And let's load the model, right? DOMINIC: Yes, let's do it. URI: So, importing and creating the model from the npm package. And then the last thing we need to do: we are going to show a progress bar, because this is going to take a few seconds to predict the comment. It says, predicting comment. And this will call the predictor, the predict function. The one we used in the test we did. Function test. And then it will insert that comment at the starting position of that function. So, just before the function, the comment will be automatically inserted. Are we ready to see this in action? DOMINIC: Yes, I want to see this. URI: Okay. DOMINIC: And by the way, this is nothing that we can rehearse. It's machine learning after all. We don't know what's going to happen.
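The last step described above, inserting the predicted comment just before the function, can be sketched without any editor at all as a pure string operation. This is a hedged illustration rather than the extension's real code: `formatComment` and `insertCommentBefore` are hypothetical names, and an actual Visual Studio Code extension would apply the change through the editor's edit API instead of rebuilding the whole source string.

```typescript
/** Wrap predicted text in a single-line doc comment. */
function formatComment(text: string): string {
  return `/** ${text.trim()} */`;
}

/**
 * Insert `comment` on its own line(s) directly above the line containing
 * `functionStart` (a character offset into `source`), reusing that line's
 * leading whitespace so the comment lines up with the function.
 */
function insertCommentBefore(
  source: string,
  functionStart: number,
  comment: string
): string {
  // Find the beginning of the line on which the function starts.
  const lineStart =
    functionStart === 0 ? 0 : source.lastIndexOf("\n", functionStart - 1) + 1;
  // Leading spaces or tabs on that line become the comment's indentation.
  const indent = source.slice(lineStart, functionStart).match(/^[ \t]*/)?.[0] ?? "";
  const commentBlock =
    comment
      .split("\n")
      .map((line) => indent + line)
      .join("\n") + "\n";
  return source.slice(0, lineStart) + commentBlock + source.slice(lineStart);
}

// Example: annotate a method inside a class.
const sampleSource = "class A {\n  add(a, b) {\n    return a + b;\n  }\n}\n";
const annotated = insertCommentBefore(
  sampleSource,
  sampleSource.indexOf("add"),
  formatComment("adds two numbers")
);
```

Here `annotated` contains the same class with `/** adds two numbers */` on a new, equally indented line above `add(a, b)`. In the extension, the offset would come from the function node found via the TypeScript compiler, and the text from the model's prediction.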
URI: We are going to run this function at time series. Will it work? Predicting comments! What is that? DOMINIC: "Satisfaction offer constraint to a function. All of these invocations are unless otherwise noted." [ Applause ] URI: Shall we try another one? DOMINIC: Creative, I would say. URI: Okay. Let's try another one. I would say this... oh! Speedy function. "A speedy JS function has the async modifier and contains use speedy JS directive." It doesn't have it. If I run it and run it again, will that be the same thing? DOMINIC: Well, I would guess, yeah, that makes sense. "Observe returns true when objects are shallow equal." Yes. That's what I would expect from observe. I guess. URI: Magic time. Let's see. DOMINIC: I hope it's not gonna break this time. Whoo! Very nice. Okay. URI: So, I think we can summarize. What have you seen here? DOMINIC: So, before we talk about our learnings, I guess there are things we can still improve. And we definitely want to continue working on this because it's just fun and fascinating. So, overfitting is a problem. Overfitting means that the machine really just memorizes the entire dataset. That's why it's so bad at actually predicting proper comments for something that it hasn't seen. So, that's something we should definitely fix. URI: Yeah, I mean, we only trained on 2,000 functions. So, with a small dataset, it can memorize it. But once we go bigger, well, it won't be able to memorize everything, so it will have to be smart and learn. DOMINIC: Exactly. But then there's the problem of memory constraints. If you have a very, very large dataset, you have to come up with ways to reduce the memory requirements and all of these things. So, that's another thing we have to do. URI: Right. But that's not your problem. Because I think the important thing here is that you can use the help of a data scientist for the dataset. You can just do the JavaScript integration. Like, integrate what they worked so hard to create into your application.
DOMINIC: Yeah. And basically, what we've seen is how easy it is to get started with machine learning. Even in the JavaScript world, right? We can easily use TensorFlow.js to convert Python models and integrate them into a Visual Studio Code extension and all of these things. URI: Yes. And basically, we know that text summarization, which is what we tried, summarizing the code into a comment, is possible. It's an active area of research and people are doing it. So, we believe that with some more work we will get perhaps less funny, but more useful results. And we think that tight integration with existing tools like Visual Studio Code is crucial to success. This is where you come in. Where you can take machine learning and integrate it into your environment, your application. And... DOMINIC: I guess the main takeaway is that machine learning is exciting. URI: And it's coming to JavaScript. DOMINIC: And it's coming to JavaScript, and you can all get started. URI: Yeah. So, you can see all the code that we created and the commits we did last night and this morning and after this morning and five minutes ago in the repository. The slides are here. We'll also publish everything on our Twitters. And with that... DOMINIC: Thank you very much.
Info
Channel: AngularConnect
Views: 1,886
Rating: 4.647059 out of 5
Id: eWhd48A3j6Y
Length: 34min 47sec (2087 seconds)
Published: Fri Sep 27 2019