So, hello. Thank you. Do you see my screen? No? Whoo, now you see my screen. It's alive. What's alive? We're alive, I guess? >> We're alive. >> And hopefully, machine learning can write code for us, as you will discover soon. She did some spoilers in the introduction, so we won't go over this again. But Dominic and I met here at AngularConnect
last year. We attended a very interesting talk. Do you recognize this guy?
URI: I think I recognize him. If I look closely, I think his name is Asim. That was at AngularConnect last year, almost
one year ago. And it looks like, I don't know, you probably
know him from some of the projects he did. Can you remember what that project was about? DOMINIC: Yeah. He showed a lot of exciting demos with machine
learning. One was taking a photo and running it through an algorithm that tells you what emotion is conveyed in that photo, and then replacing that photo in the presentation accordingly. URI: It worked quite well. DOMINIC: Yeah, it worked well. URI: And he had another interesting concept. DOMINIC: Yeah. He talked about AI in JavaScript and a couple
examples. One that we remember well was this form demo. The goal was to generate an HTML form just by putting in a
description of that form so we could say, hey, we want to have an input field and two
check boxes and a radio button and all these things. And automatically it will generate that form
for us. Isn't that insane? But guess what happened next? URI: He released it as open source? DOMINIC: Not quite. URI: You click the submit button and then... well, he fooled everybody. So, I guess we're not here for dancing. DOMINIC: This song is called Never Gonna Give
You Up. We're not going to give up on that. It's not possible. URI: Thank you to Asim for inspiring us to
do this talk. DOMINIC: We wouldn't be on this stage without him. We got a lot of inspiration from Asim. URI: After the talk, we started brainstorming:
how could we bring this into reality? What could we do? We had a couple of ideas like sketching an
app on a notebook, scanning it, and generating the app from that. But eventually we had one idea that clicked. DOMINIC: Exactly. We went with this idea of automatically generating
code for us. Basically, taking away our job. That's our job, writing code, right? So, what we thought was given a function signature,
that's what we see there, we pass it into this black box that we call a model. And then we wanted to get out the implementation
of this function. And in this case, it's just adding A and B
together. That's what we wanted it to do. URI: So, basically given the start of the
function, write a function. DOMINIC: Yeah. URI: And what's the thing in the middle? The model? Can you explain a bit? DOMINIC: This model, we don't have to dive
into all the details. But think of the model, this black box, as
a JavaScript-ish function that has one input. That could be anything. We pass in an input, it does some computational
magic. We don't have to worry about the magic. And then it returns an output. That's what the model does. URI: Basically, it's magic. DOMINIC: It is, but we don't have to do this computational magic ourselves. We let the machine figure out the function. URI: That's what machine learning is all about: getting the content of that magic model function
figured out. DOMINIC: Exactly.
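(For illustration, here is a minimal sketch of that mental model in TypeScript. It is not code from the talk: the model is just a black-box function from one input to one output, and machine learning is what fills in the body.)

```typescript
// The mental model only: a black box with one input and one output.
// Machine learning is the process that figures out what happens inside.
type Model = (input: string) => string;

// A hand-written stand-in so the sketch runs; the real body would be learned, not written.
const model: Model = (signature) => `${signature} { return a + b; }`;

console.log(model('function add(a: number, b: number)'));
```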
URI: All right. So, speaking of magic, can you hold this for me? DOMINIC: I can. I don't know what's going to happen. URI: That's machine learning. It doesn't work. DOMINIC: Machine learning doesn't work. URI: Let's try that again. It's broken. We will try that again later. DOMINIC: You might be thinking: we've just
seen that generating an HTML form from description is not really possible. You might be like, oh, come on. Stop fooling us, it's not possible. But... URI: We actually went ahead and tried this
and gave a talk about it a few months ago at a great conference. And you will be able to see in a moment what it looked like in that talk. We tried it, right? DOMINIC: Here's a glimpse. Let's have a look. Hopefully the audio works. URI: Yeah. You were really surprised by the implementation. Like, I wouldn't say it's the most straightforward
implementation you would come up with. DOMINIC: Well, looking at it, it kind of subtracts A from B instead. It's very creative, I would say. URI: But, yeah. The model got really creative. It did manage to create valid code, which
is quite amazing. And very creative indeed. After that talk, we were excited, and we summarized our learnings. So, first of all, as you have probably just seen in that clip, we learned that automatic code generation is hard, right? DOMINIC: Right. We also learned that data processing (we talked about this a lot at ngVikings), the gathering and cleaning of the data, all those things, makes up a huge chunk of the work. A large percentage of the work is just focusing on the data, because the data is so important for machine learning. URI: And basically, we really love working
with machine learning. We found it hard and difficult, but fascinating. And we decided to give it another go. This time we had a different goal, though. We started thinking: what else can we do with
code? DOMINIC: I could think of a couple of things. But let's not try to take away our job. Let's try and create this synergy between
us and machines. URI: I have an idea. Do you like to write comments? Do you write comments? Do you people write comments? DOMINIC: I have to. URI: Who writes comments? DOMINIC: Who likes to write comments? URI: Who likes to write comments? DOMINIC: There's someone. That's good for you. URI: Wow. You can come and work with our machine learning
model together. For the rest of us, we decided to try to,
again, given a function, this time the entire function, just predict what the comment would
be. Have this model function return us the comment for that function. DOMINIC: That is great. I mean, that really makes our lives as developers
much easier. We don't have to document our code. We just have this model which generates the
comments for us and summarizes the code. That is fantastic. Does it work? URI: So, we also had another goal. Before we show you if it worked, we decided
this time we were going to use machine learning with JavaScript. Because traditionally machine learning is
done in other languages such as Python. And then if you want to take your machine
learning models and use them everywhere, basically on the web or in Node.js, it's not very straightforward
if they are written in Python. Python doesn't run on the web. So, we decided to try and use JavaScript this
time. We'll tell you about it in a bit. So, the process the general process looks
like this. We start by gathering data. DOMINIC: We need a lot of data. URI: Like a lot. Like how much? DOMINIC: Well, maybe 300,000 examples? Or even a million. So, it's a lot of data. URI: And then we need to clean it and
prepare it. DOMINIC: Yes. URI: So that it's easier for the model to figure out how to approach the code. And then we need to train the model. DOMINIC: Exactly. But this is basically what we've talked about
already at ngVikings. But as we know, we wanted to add this layer
of JavaScript to it, to make it a little bit more practical for us, so we could consume it. What we did is add another layer to this. So, taking a trained model, we're gonna see
what it looks like. And then somehow use TensorFlow JS to consume
it in JavaScript. URI: So, TensorFlow, that is a machine learning
library for Python. But it has a special property. There are also versions for JavaScript, for
Android, for iOS. So, a bit like Angular. It's universal. You can run your models anywhere. You can create them in Python, run them in
JavaScript. DOMINIC: Exactly. And then once we have it in the JavaScript
world, we can then use the tools that we use every day, such as Visual Studio Code, which is
awesome. And then we can create an extension for it
so that we can consume the model in the extension and then predict the comment. URI: All right. So, let's start with gathering the data. We need, you said, like...
DOMINIC: A lot of it. URI: A lot of comments. So, we have like 500 people in the audience. If everybody writes comments for us, how long
will it take us? Do you perhaps have a better idea? How can we get a lot of comments? GitHub! Yeah. So, can you go to GitHub and just download... DOMINIC: Well, I know what I typically do. Press this button, go through all the repositories and click the download button. URI: Hundreds of thousands of times. DOMINIC: It's gonna take me a while. But, sure. URI: I have a better way. Let me show you. So, there is this thing called BigQuery. It's a large-scale data warehouse from Google
with a lot of buzzwords in the name. But basically, what it means is that it can run SQL
queries on an enormous amount of data in a matter of seconds. DOMINIC: Can you show me what that looks like? URI: Yes. It also happens that BigQuery has the entire
open source code from GitHub as a dataset that we can query. And let me take you to my laptop where I can
show you BigQuery in action. DOMINIC: Another spoiler. URI: Yeah. So, this is basically BigQuery. And I have a query here. I'm going to run it in a moment. You can see it's going to... no, I didn't want developer tools, I want full screen. You can see it's going to process 2.3 gigabytes of source code when I run it. I'm clicking the run button. It's running in the cloud. It's going to take about one minute. DOMINIC: Wait a second. Is that just SQL? The SQL we would use for databases? URI: You can recognize SQL elements like SELECT
or WHERE. But BigQuery has an interesting feature: you can run JavaScript as part of your SQL query, as you see above. DOMINIC: I love that. URI: Basically, we are using the TypeScript compiler
inside of BigQuery to process all the source files, find all the methods and functions
and extract the comments and the bodies of the functions and methods. DOMINIC: I see. So, we use the TypeScript compiler, and we take advantage of the AST to find the information we need in those files, such as the functions and all these things. URI: Right. And we do this in the cloud, really fast. It's gonna finish in a few seconds. And if you are not familiar with the AST, the abstract syntax tree, there are good talks about it that you can look up.
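(A rough sketch of that extraction idea: pulling function/comment pairs out of a source file with the TypeScript compiler API. Inside BigQuery this logic runs as a JavaScript UDF over every file in the dataset; the helper below is illustrative, not the exact query from the talk.)

```typescript
import * as ts from 'typescript';

interface Example { comment: string; code: string; }

// Walk the AST of one source file and collect every function or method
// together with the comment that directly precedes it.
function extractExamples(source: string): Example[] {
  const sourceFile = ts.createSourceFile('file.ts', source, ts.ScriptTarget.Latest, true);
  const examples: Example[] = [];

  const visit = (node: ts.Node) => {
    if (ts.isFunctionDeclaration(node) || ts.isMethodDeclaration(node)) {
      const ranges = ts.getLeadingCommentRanges(source, node.getFullStart()) ?? [];
      if (ranges.length > 0) {
        const { pos, end } = ranges[ranges.length - 1];
        examples.push({ comment: source.slice(pos, end), code: node.getText(sourceFile) });
      }
    }
    node.forEachChild(visit);
  };
  visit(sourceFile);
  return examples;
}
```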
URI: We have the results. Let's hide the editor. And as you can see, we have here a big table with basically the comments and the code. Let's scroll to the last page and see if there is anything interesting there. So, basically, we have here around 300,000 comments in our dataset. DOMINIC: In just a minute. URI: Yeah. Wow. This is a really long comment. Oh, and this one is in Spanish. DOMINIC: Yes, I speak Spanish. URI: We have this large dataset, this big
JSON file with 300,000 comments and functions. Are we ready? DOMINIC: No, I'm gonna stop you right there. We are not done yet. We have all the data, that's great. But the next step is actually to clean the
data. Because the data, well, it's text. Right? And we have to do a couple things. It's not as tidy as we want it to be. URI: We have seen some comments in Spanish. DOMINIC: Yes, exactly. And that's why the first step would be to
turn everything into lower case because we want to remove the noise from the dataset. We also remove URLs because that also just
adds noise to the dataset and has no added value, really. And we remove non-English comments, because learning one language is difficult enough, right? And learning all these languages would be
even more difficult. And then also to reduce the noise and complexity,
we replace function names and arguments with very generic placeholders. And we also do that on an AST basis. URI: So, basically the comments no longer contain the actual function names and arguments, just placeholders like 'function name' and 'argument one'. That way it's easier for the model to understand that a specific word in the comment is actually one of the arguments of the function.
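(A simplified sketch of that cleaning step. The real pipeline also rewrites identifiers on the AST level; this only shows the plain-text side, and the non-English filter here is deliberately naive.)

```typescript
// Clean a single comment, or return null if we want to drop it entirely.
function cleanComment(comment: string): string | null {
  const lowered = comment.toLowerCase();

  // Drop URLs; they only add noise and carry no value for the model.
  const withoutUrls = lowered.replace(/https?:\/\/\S+/g, '');

  // Very naive non-English filter: skip comments containing characters outside
  // basic Latin (a real pipeline would use a proper language detector).
  if (/[^\x00-\x7F]/.test(withoutUrls)) {
    return null;
  }
  return withoutUrls.trim();
}

console.log(cleanComment('Returns the current user. See https://example.com/docs'));
// -> "returns the current user. see"
```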
URI: And once we are done with cleaning, we can start... DOMINIC: Almost. It's like preparing dinner. So, it's like a recipe and you have certain
steps, things you have to do, right? If you want to make a tomato soup, you have
to cut the tomatoes in a specific shape, depending on whether you are making a sauce or a soup. That's what preparing is all about. For instance, we have text. And obviously machines don't work with
text. They work very well with numbers. URI: Numbers! DOMINIC: Exactly. That's what we have to do. URI: We have to take the text of the comment
and convert it into a list of numbers. So, we create some kind of a dictionary, like
you would have an English dictionary. But in this case, it's not English to Spanish,
it's English to number. DOMINIC: And also the other way around. So, it works in both directions. URI: So, when we feed the input into the model, we use the dictionary to convert the text into a list of numbers. And when the model predicts a comment, we use the dictionary to convert the numbers the model predicted back into text.
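(A minimal sketch of that dictionary idea, purely illustrative: words map to numbers in one direction and numbers map back to words in the other.)

```typescript
// Build a two-way word/number dictionary from a list of texts.
function buildDictionary(texts: string[]) {
  const wordToIndex = new Map<string, number>();
  const indexToWord = new Map<number, string>();

  for (const text of texts) {
    for (const word of text.split(/\s+/)) {
      if (!wordToIndex.has(word)) {
        const index = wordToIndex.size;
        wordToIndex.set(word, index);
        indexToWord.set(index, word);
      }
    }
  }

  const encode = (text: string) => text.split(/\s+/).map(w => wordToIndex.get(w) ?? 0);
  const decode = (numbers: number[]) => numbers.map(n => indexToWord.get(n) ?? '').join(' ');
  return { encode, decode };
}

const { encode, decode } = buildDictionary(['returns the current user']);
console.log(encode('returns the user')); // [0, 1, 3]
console.log(decode([0, 1, 3]));          // "returns the user"
```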
DOMINIC: That's right. URI: So, are we ready to train it? DOMINIC: No. URI: Why not? DOMINIC: Not yet. We have to come up with a model architecture. There are a couple of things you can do. But for training a model, you would typically
use Python as we already mentioned. And there's a library called TensorFlow which
lets you create and train models. And you can think of TensorFlow as a box with a lot of Lego blocks: you can mix and match the blocks and stack them on top of each other to build the model you need. URI: How do you know how to do this? DOMINIC: You wouldn't start from scratch. You would look at similar problems from other problem
domains. URI: Which domain is close to code? DOMINIC: For example, translation. Say, German into Hebrew. And here, you want to translate code into English. URI: I will do it live, translating Hebrew to English. [Speaking Hebrew] DOMINIC: Meaning: I have a bird. URI: In German, that would be [speaking German]. DOMINIC: You could say that. I wouldn't. URI: All right. DOMINIC: Anyways. So, what you do is you look at very similar
problems, and then you can start with models that are already defined, try to tweak them, and stack the building blocks. We're not going into too many details here. But that's basically what you would do. URI: We took an architecture called an encoder-decoder. You can find it in the repository; we open-sourced everything. Unlike...
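(To give a flavour of that block-stacking, here is a heavily simplified sequence-to-sequence sketch using the TensorFlow.js layers API. The real encoder-decoder from the talk is trained in Python and lives in the repository; this is not that architecture, and the vocabulary size and sequence lengths below are made-up numbers.)

```typescript
import * as tf from '@tensorflow/tfjs';

const VOCAB_SIZE = 5000; // assumed size of the word-to-number dictionary
const INPUT_LEN = 100;   // assumed max token length of a function
const OUTPUT_LEN = 20;   // assumed max token length of a comment

// Stack the "Lego blocks": embed tokens, encode the sequence, then decode
// a fixed-length comment, one softmax over the vocabulary per output word.
const model = tf.sequential();
model.add(tf.layers.embedding({ inputDim: VOCAB_SIZE, outputDim: 64, inputLength: INPUT_LEN }));
model.add(tf.layers.lstm({ units: 128 }));                          // encoder
model.add(tf.layers.repeatVector({ n: OUTPUT_LEN }));
model.add(tf.layers.lstm({ units: 128, returnSequences: true }));   // decoder
model.add(tf.layers.timeDistributed({
  layer: tf.layers.dense({ units: VOCAB_SIZE, activation: 'softmax' }),
}));
model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy' });
model.summary();
```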
URI: And now let me show you how we train this model. So, we could install Python on our computer. But that wouldn't be the way I would recommend to somebody who is just getting started with machine learning. Because, you know, setting up your computer
is like half a day of work. There is something better. It's called Google Colab. DOMINIC: It is awesome. If you know StackBlitz, it's like StackBlitz for machine learning. It's not a code editor, but think of it as StackBlitz for machine learning. URI: And I have the notebook here; I'm going to
run the commands to clone the repo. It runs the commands somewhere in the cloud, copies our code and creates a small dataset. Let's look at the dataset. It's a super small dataset, just five comments and their functions. The functions have been translated into a representation that is more complex for us, but simpler for the computer: the abstract syntax tree. And you can see that in the comments we replaced the text with argument number one, argument number zero; that was the preparation. DOMINIC: Obviously the real names are very important
to predict meaningful and correct comments. But for this first version of the model, we just replaced them to reduce the noise and the complexity. URI: We downloaded the code and the model, and we can run ls -l to see... well, let's just run ls to see what's on the remote machine. And we can see it has all those files from
our repo. And we are going to run the training that
will take probably a moment or two. So, basically, right now, what are we doing? DOMINIC: We've just started the training process. And as we can see, there is something called 'epoch 1 of 5', and an epoch is just another name for an iteration. We have a dataset that contains five ASTs and the comment for each function. And one epoch is going through all the data
points, all the data entries we have in our dataset. And the training is really just an iterative process: going through all of these examples over and over again until that function, this model, produces, well, somewhat good results on a variety of inputs.
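(Just to illustrate what an epoch is in code: in TensorFlow.js terms it would be the `epochs` option passed to `model.fit`. The actual training in the talk happens in Python on Colab, so this is only a sketch.)

```typescript
import * as tf from '@tensorflow/tfjs';

// One epoch = one full pass over every example in the dataset.
// `inputs` and `targets` would be tensors built from the encoded functions and comments.
async function train(model: tf.LayersModel, inputs: tf.Tensor, targets: tf.Tensor) {
  await model.fit(inputs, targets, {
    epochs: 5, // the "epoch 1 of 5" we see scrolling by in the training log
    callbacks: {
      onEpochEnd: async (epoch, logs) => console.log(`epoch ${epoch + 1}, loss: ${logs?.loss}`),
    },
  });
}
```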
URI: Right. So, it's almost done. And while it's working: we have trained it in Python, right? But we do need to figure out some way to get it into JavaScript. DOMINIC: We can't run Python in the browser or from within JavaScript; that makes no sense. We have to come up with the missing link
in between. I think we've already mentioned something
that we can use, right? URI: Yes. So, basically, there is TensorFlow.js, which
is TensorFlow for JavaScript. And there is one more thing we need to do so that we can use our model inside TensorFlow.js, which is to convert it into a JavaScript model, which is basically a JSON file. So, here I'm running the script to convert
the model and create a ZIP file with the exported model. DOMINIC: You can think of it like this: remember that the model is a function, and the machine comes up with the implementation. That is what we basically do here. We export the implementation of this model to a format that we can easily consume from JavaScript. URI: And the implementation is really just a bunch of numbers, the weights of the model.
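(For reference, loading such a converted model from JavaScript is essentially one call in TensorFlow.js. A sketch, assuming a layers-format export named model.json:)

```typescript
import * as tf from '@tensorflow/tfjs-node';

async function loadExportedModel() {
  // In Node, tfjs-node understands file:// URLs; in the browser you would
  // point this at an HTTP URL serving model.json and its weight shards.
  return tf.loadLayersModel('file://./exported-model/model.json');
}
```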
URI: We can download the ZIP file. Whoa, that's fast. I would have to extract it, but I already did. Do you know Wallaby.js? DOMINIC: It's an awesome tool that lets you do TDD. URI: We are not using it for TDD. We are going to use what we just downloaded. We have the TS comment predictor package, which has a CommentPredictor class that knows how to load this model using TensorFlow.js. You can explain what I'm doing while I'm doing
it. So, first of all, we need to load the predictor. DOMINIC: He's already explaining it himself. URI: The comment predictor. DOMINIC: I don't have a job anymore. So, yeah. URI: Sorry. DOMINIC: We wrap everything, all the TensorFlow.js stuff, in an abstraction. It's a class called CommentPredictor. We're loading the model that we just saved
and converted into JSON, so that we can easily load it. And that uses JavaScript generators, by the way. And now what we do is create a function, one example that we want to feed in. That is the input to the model, and it's just a string, right? In this case, an 'awesome' function that returns the string 'technology is awesome'.
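(What we type in the demo looks roughly like the following. The package and method names are approximations of the abstraction in the talk's repository, so treat this as a sketch rather than the exact API.)

```typescript
// Hypothetical usage of the CommentPredictor abstraction from the demo.
import { CommentPredictor } from 'ts-comment-predictor'; // assumed package name

async function main() {
  // Loads the converted model.json via TensorFlow.js under the hood.
  const predictor = await CommentPredictor.load('./model/model.json');

  // The input is just the function source as a plain string.
  const source = `function awesome() { return 'technology is awesome'; }`;

  const comment = await predictor.predict(source);
  console.log(comment);
}

main();
```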
URI: Good. And now we await the result of calling the predictor's predict function on it. DOMINIC: So we're feeding the function in. What do you think is gonna happen? Hopefully it produces some comment. URI: It's thinking, you see? DOMINIC: Let's see. Remember, this was trained on a very, very, very small
dataset. It was only five examples and, well, do you
think that the results are gonna be good? Bad? URI: Or no results? DOMINIC: Or no results at all because machines
never work? URI: Yeah. Seems like my computer is tired. Let's give it one more try. Maybe it just doesn't like this kind of... so, yeah. Our model is lazy. We created a lazy model. What have you done? DOMINIC: Just like us, I would say. URI: Yeah. Anyway, to be honest, this model was not
very impressive. It wouldn't do a lot of good stuff, because we only trained it on five functions. And apparently, it's also lazy. So, it would probably predict some nonsense. DOMINIC: Maybe that's a sign. Maybe there is no comment because... URI: There is no... DOMINIC: There is no comment. URI: Yeah, there is no comment. Oh, now it worked. And as you can see, it's not very intelligent. What it says is technology, technology, technology,
technology. TT. DOMINIC: Very important. TT. URI: Very important. DOMINIC: That's typically how I end my comments
as well. URI: It's still impressive, because the model learned that there is a slash at the beginning of comments and a space between words, I guess. DOMINIC: It has learned something. URI: From five examples. Not too bad. DOMINIC: Let's take a look at the real thing. URI: Yeah. So, we trained the model for an hour on a
dedicated machine: Google has this dedicated machine learning hardware called a TPU. If we train this model regularly on our own machine, and I did that overnight, it takes seven hours just for one iteration, and we had to do hundreds of iterations in order to get it good. But with the TPU, an iteration takes like 5 seconds. DOMINIC: My machine just dies. It can't do it. URI: We saw this worked, and we wanted to get
it into a Visual Studio Code extension. How would you do it? DOMINIC: Angular has schematics for scaffolding. And here there's a tool called Yeoman; it's a scaffolding tool, and you run 'yo code'. URI: We'll call it... DOMINIC: Lazy octopus. Just to match the theme. URI: Yes, like the Angular release names. No description, no repository, npm. And basically, at this point it has scaffolded a VS Code extension for us. DOMINIC: Awesome. This is machine learning. URI: And as soon as it finishes, we also want
to install our model here. And... DOMINIC: Which we published to npm. All this shows you that it is not that difficult after all to consume these models. You can publish them to npm. You can use the tools you already know. Don't be afraid of machine learning. You don't have to do the data science yourself; you can find someone who knows the data science. You can get started and stay in your comfort
zone. URI: Just download TensorFlow and start installing
it. And if you ever created a Python environment,
you would know how much time it takes to get this installed. Here we get it in three seconds from npm. You don't have to do anything. Just npm install. Now we have this lazy octopus. And we can open the new project in Visual
Studio Code. And if we look at the package.json, this is how VS Code extensions work, you can see it defines a command. A command is something that shows up when you open the
command palette. When you create a new extension, you can add
something to the list of commands. We'll add a command called... DOMINIC: Add comment to function. URI: Thank you. And then we would like to implement it. Actually, let's look at the implementation that comes by default. It just shows an information message saying 'Hello World'.
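(The generated skeleton looks roughly like this. The command id has to match whatever is declared under contributes.commands in package.json; the 'lazy-octopus' id below is just our example name.)

```typescript
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  // The command id must match the entry in package.json's "contributes.commands".
  const disposable = vscode.commands.registerCommand('lazy-octopus.addCommentToFunction', () => {
    // Default behaviour of the freshly scaffolded extension: just say hello.
    vscode.window.showInformationMessage('Hello World!');
  });
  context.subscriptions.push(disposable);
}

export function deactivate() {}
```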
URI: Let's just do a quick check to see if that works. I'm going to hit F5. And hopefully this is going to build the extension and open it. Oh. Errors. DOMINIC: npm watch. What is it? URI: Oh, so many errors. DOMINIC: Okay, skip the check. There's a flag in the config file that lets
you skip the check. URI: Read about it at home. It's magic. This is the real magic. [ Laughter ] DOMINIC: We appreciate that. That's good. URI: All right. So, now we can press F5 and we have another
Visual Studio Code inside our Visual Studio Code; it's like Inception. And then we can press... oh! We see our new command in the list. And when we click it, it says 'Hello World'. DOMINIC: That is awesome. URI: So, the only thing we have to do now
is just to implement it. DOMINIC: Right. We just installed all these packages that
we've published to npm which contain the trained model and the abstraction. So, we can easily just, you know, implement
everything here inside the Visual Studio Code extension. So, let's do that. URI: Yeah, let's start. I need some utility functions, so let me create them. This is cheating: I just coded them beforehand. But basically, these are functions that use the TypeScript compiler to get the function you are currently looking at in the editor. We have one helper that returns the current node, the current TypeScript element you are looking at, and findParentFunction, which returns the function you are currently looking at.
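(A rough sketch of what such a helper can look like, using the TypeScript compiler API inside the extension. The real utilities live in the talk's repository; names and details here are illustrative.)

```typescript
import * as ts from 'typescript';
import * as vscode from 'vscode';

// Parse the current editor contents and return the innermost function-like
// node that contains the cursor position.
export function findParentFunction(editor: vscode.TextEditor): ts.FunctionLikeDeclaration | undefined {
  const sourceFile = ts.createSourceFile(
    editor.document.fileName,
    editor.document.getText(),
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true,
  );
  const offset = editor.document.offsetAt(editor.selection.active);

  let found: ts.FunctionLikeDeclaration | undefined;
  const visit = (node: ts.Node) => {
    if (node.getStart() <= offset && offset <= node.getEnd()) {
      if (ts.isFunctionDeclaration(node) || ts.isMethodDeclaration(node) || ts.isArrowFunction(node)) {
        found = node;
      }
      node.forEachChild(visit);
    }
  };
  visit(sourceFile);
  return found;
}
```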
URI: So, let's do a quick demo of that, where we
just display the function we are looking at. Let's go back and reload this. And now when we run the command, we can see that as soon as you click 'Add comment to function', it activates the extension and gives us the function we were looking at. The same function.
DOMINIC: And this is exactly what we need for our model. URI: Yeah. That's the input we feed to the model. And now let's load the model, right? DOMINIC: Yes, let's do it. URI: So, I'm importing and creating the model
from the npm package. And then the last thing we would need to do
is this: we are going to show a progress bar, because it's going to take a few seconds
to predict the comment. It says 'Predicting comment'. And this will call the predictor's predict function, the one we used in the test we did earlier. And then it will insert that comment at the starting position of that function. So, just before the function, the comment will be inserted automatically.
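(Put together, the command body looks something like this sketch. `findParentFunction` is the helper from before and `CommentPredictor` is the abstraction from the npm package, so the names and the import path are illustrative rather than the exact code.)

```typescript
import * as vscode from 'vscode';
import { findParentFunction } from './utils'; // the AST helper sketched earlier (assumed path)

// `predictor` is the loaded CommentPredictor instance.
export async function addCommentToFunction(predictor: { predict(src: string): Promise<string> }) {
  const editor = vscode.window.activeTextEditor;
  if (!editor) { return; }

  const fn = findParentFunction(editor);
  if (!fn) { return; }

  await vscode.window.withProgress(
    { location: vscode.ProgressLocation.Notification, title: 'Predicting comment...' },
    async () => {
      const comment = await predictor.predict(fn.getText());
      const insertAt = editor.document.positionAt(fn.getStart());
      // Insert the predicted comment just before the function.
      await editor.edit(edit => edit.insert(insertAt, `// ${comment}\n`));
    },
  );
}
```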
URI: Are we ready to see this in action? DOMINIC: Yes, I want to see this. URI: Okay. DOMINIC: And by the way, this is nothing that
we can rehearse. It's machine learning, after all. We don't know what's going to happen. URI: We are going to run this on a time series function. Will it work? Predicting comment... What is that? DOMINIC: 'Satisfaction offer constraint to a function. All of these invocations are, unless otherwise noted.' [ Applause ]
URI: Shall we try another one? DOMINIC: Creative, I would say. URI: Okay. Let's try another one. Oh! 'A speedy JS function has the async modifier and contains the use speedyjs directive.' It doesn't have it, but still. If I run it, and run it again, will it predict the same thing? DOMINIC: Well, I would guess, yeah, that makes sense. 'Observe: returns true when objects are shallowly equal.' Yes, that's what I would expect from observe, I guess. URI: Magic time. Let's see. DOMINIC: I hope it's not gonna break this
time. Whoo! Very nice. Okay. URI: So, I think we can summarize. What have we seen here? DOMINIC: So, before we talk about our learnings,
I guess there's things we can still improve. And we definitely want to continue working
on this because it's just fun and fascinating. So, overfitting is a problem. Overfitting means that the machine really
just memorizes the entire dataset. That's why it's so bad at actually predicting
proper comments for something that it hasn't seen. So, that's something we should definitely
fix. URI: Yeah, I mean, we only trained it on 2,000 functions. So, with such a small dataset, it can just memorize
it. But once we go bigger, well, it won't be able
to memorize everything so it will have to be smart and learn. DOMINIC: Exactly. But then there's the problem of memory constraints. If you have a very, very large dataset, you
have to come up with ways to reduce the memory requirements and all of these things. So, that's another thing we have to do. URI: Right. But that's not your problem. Because I think the important thing here is that
you can use the help of a data scientist for the dataset. You can just do the JavaScript integration. Like integrate what they worked so hard to
create into your application. DOMINIC: Yeah. And basically, what we've seen is how easy it is to get started with machine learning, even in the JavaScript world, right? We can easily use TensorFlow.js to convert Python models, integrate them into a Visual Studio Code extension, and all of these things. URI: Yes. And basically, we know that text summarization,
which is what we tried here, summarizing the code into a comment, is possible. It's an active area of research and people
are doing it. So, we believe that with some more work we
will get perhaps less funny, but more useful results. And we think that tight integration with existing tools, like Visual Studio Code, is crucial to success. This is where you come in: you can take machine learning and integrate
it into your environment, your application. And
DOMINIC: I guess the main takeaway is that machine learning is exciting. URI: And it's coming to JavaScript. DOMINIC: And it's coming to JavaScript and
that you can all get started. URI: Yeah. So, you can see all the code that we created
and the commits we made last night and this morning and five minutes
ago in the repository. The slides are here. We'll also publish everything on our Twitters. And with that... DOMINIC: Thank you very much.