TensorFlow for Deep Learning Research - Lecture 1

Video Statistics and Information

Captions
We are going to be learning TensorFlow together. I have been wanting for a while to make some video tutorials on TensorFlow, because what I found is that even though there are tons of tutorials on YouTube, they are either too shallow or they don't hit the points I wanted to hit. Recently I found that Stanford is offering this course, CS 20SI, in the winter of 2017, and they have made the slides available to anybody. The videos are not online, but the slides are, so I thought I would just use these great slides, which were made by Chip Huyen from Stanford, and follow along with them. The general idea here is that I will take the slides from the Stanford class as they become available, and since I have been working with TensorFlow for the past year or so, I will add my own thoughts on top of them. Hopefully this is a good way to follow the Stanford course if you are following the slides, or simply to learn TensorFlow from scratch.

So, without further ado, we will follow the same agenda: an overview of TensorFlow, and then graphs and sessions, two of the most important and, when you are starting out, probably the most mind-bending concepts you will come across. Thanks a lot, Chip, for the slides and for making them available to the public. If there are slides that are not directly applicable, I will just go through them very quickly.

If you are here you probably know what TensorFlow is, but briefly, it is a library for numerical computation that has been open-sourced by Google. The Google Brain team uses TensorFlow internally and keeps a separate branch for their own use, but they have open-sourced quite a bit of it; the delta between what they use and what they have released is mostly down to the complexity of making that codebase open source. The basic idea is that we create graphs and run numerical computations on top of those graphs. TensorFlow has a huge suite of functions and classes that let you build all kinds of models, deep or shallow: logistic regression, linear regression, pretty much anything, from scratch.

TensorFlow has simply exploded since it was released slightly over a year ago. There are many deep learning libraries: TensorFlow is one of them; Torch is another, from Facebook, and just two days ago Facebook open-sourced a framework called PyTorch, which is a Python API for Torch. One of the things they point out in the release is that PyTorch, and Torch in general, lets you build dynamic neural networks, whereas TensorFlow is more suitable for static models, where you know the model up front, build it out, and then run computation on it. There are others, such as Theano, Caffe, Microsoft's CNTK, and so on, but what we are going for here is obviously TensorFlow.

One of the best things about TensorFlow, I feel, is that it has a Python API, and I love coding in Python. The other thing is that you can deploy the model anywhere: on CPU or GPU. I have a GPU box with four Titan X GPUs and TensorFlow works very well with it; it can use one or more GPUs in parallel. It can also be deployed on Android, iOS, and all kinds of other platforms.
Checkpoints are something that, in one form or another, are available in most deep learning frameworks. The whole idea is that as the model runs it saves its state every now and then, so that if you are running a multi-week or multi-day job and your power goes out, you don't have to start everything from scratch; checkpoints save the state of the model so it can resume from that point.

Automatic differentiation is something pretty much every deep learning framework has to have; without it, it's not much fun. I don't know if you have tried to write the backpropagation layers for your networks from scratch. I have done that and it is extremely illuminating, so I would really advise everybody who has not done it to go ahead and write the forward and backward passes for various layers by hand when getting started, so that you have an insight into how much easier frameworks like TensorFlow make your life.

TensorFlow is being used by a huge community, lots of companies are using it, and the list keeps growing. Let's go through some of the projects. The first is neural style transfer, an amazing paper where you take two images, say a photograph of a lion and a classic painting, and the network takes the style from the masterpiece (I'm sorry, I don't know who the painter is here) and the content from the photograph of the lion, and comes up with this kind of fused image. The network just does it for you; it's an amazing paper that came out early last year. There is also a lot of work happening in generation, for example the handwriting generation shown in this slide. And recently I came across a paper called StackGAN, stacked generative adversarial networks, which takes in text such as "a bird with a red beak and a yellow breast" and just generates an image of that bird. There has been a lot of work on GANs, and the StackGAN paper actually generates higher-resolution images; an amazing paper. WaveNet has also been in the news quite a bit recently. Hopefully the class gives everybody the ability to piece together projects like the ones we just saw.

In this lecture we will go through the computational graph model that TensorFlow uses, we will go through some of the built-in functions, and we will learn how to structure these models from a deep learning perspective. Some books are mentioned in the slides, but what I have found is that the API is really fluid and changes very fast, so your best bet is to go to tensorflow.org; whatever you want is out there, so use that as the source for the latest APIs.

Once you set up TensorFlow, and I will assume you can just follow the instructions laid out on the TensorFlow website, you simply import tensorflow as tf in your Python shell. There are many libraries offering high-level abstractions on top of TensorFlow, such as TF Learn, TF-Slim, Keras, and PrettyTensor.
For simple things, where everything is under your control, these high-level abstractions really work well, and they are getting better and better in the sense that they can handle more and more complex models. But initially I would suggest you just learn the darn thing from scratch, because that will give you the best bang for the buck in terms of understanding what is happening under the hood, so that you can piece things together yourself. You don't just want to implement off-the-shelf models; you want to be able to build models that you came up with, from scratch, and learning TensorFlow from the ground up gives you the ability to do that.

So let's talk about graphs and sessions. One of the major things about TensorFlow is that it separates how you define your computation from the actual execution of that computation. What does that mean? Basically, when you write a model in TensorFlow you first assemble a computational graph, and once you have done that you use a session to execute whatever operations you laid out in the graph. This will become very clear in the next few minutes, so just hang on.

Let's start with the basics. A tensor is basically an n-dimensional array. If it is 0-dimensional it is just a number, say 1 or 2.5; that is a scalar, a zero-dimensional tensor. A 1-D tensor is a vector, which from a Python perspective is just a list of numbers like 1, 2, 3, 4. A 2-D tensor is a matrix, and the dimensionality keeps increasing from there; any n-dimensional array is a tensor.

Now let's start with how the construction of the computational graph works. Once you import TensorFlow as tf, you write a statement like a = tf.add(3, 5), where add is an operator. If you were to visualize that with TensorBoard (by the way, TensorBoard is an amazing visualization platform that is already built into TensorFlow, so highly recommended; we will get into how to use it as the lectures progress, but it lets you visualize your graph very easily, so start using it right away), you would see something like this: two nodes come up in the visualization, x and y, and then an additional node that says add. For now just assume that x and y are 3 and 5. In the TensorFlow visualization the nodes are the operators, variables, constants, and so on, and the actual tensor values travel along the edges.

Now, when you actually run a = tf.add(3, 5) and you try to print a, you will see that it does not come back with 3 plus 5; you would expect, if the computation had gone through, that it would come up with something like 8, but it does not. At this point we are only creating the graph we have been talking about. A node has been created, Add, and the shape inferred from the addition of 3 and 5 is a scalar, a zero-dimensional tensor whose shape is just empty brackets, with dtype int32. And if I write b = tf.add(3.0, 5.0), another node is added; it is still a scalar, but now the type is float: the graph construction is inferring whatever data type this tensor b can hold, and in this case it is float32.
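As a minimal sketch of what is described above (not from the slides themselves; it assumes the TensorFlow 1.x API this lecture is based on, and the exact tensor names in the printout may differ):

    import tensorflow as tf  # TensorFlow 1.x, as used in this lecture

    # Building the graph only creates nodes; nothing is computed yet.
    a = tf.add(3, 5)
    print(a)   # something like: Tensor("Add:0", shape=(), dtype=int32), not 8

    # With float inputs, the dtype of the new node is inferred as float32.
    b = tf.add(3.0, 5.0)
    print(b)   # something like: Tensor("Add_1:0", shape=(), dtype=float32)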
So, again, the nodes in the graph are what are called operators, variables, and constants, and the edges are tensors; we will go into the definitions of all these different types of nodes shortly. This is what I just showed you on the Python console: when we add 3 and 5, you see that a node has been added, Add, a scalar of type int32, which is the default integer type in TensorFlow.

Now, if we really wanted to get the value of a, if we really wanted that 8, what we would do is create what is called a session. You instantiate a session and you run the computation within that session. The whole idea of a session is that once you create it, it takes care of getting all the resources in place, even across multiple GPUs, to run a particular computation, and when you then run an actual node of the graph within that session, it computes the graph and sends the value back. So here you instantiate a session with tf.Session(), and when you call sess.run(...), whatever you put in the parentheses is either the whole graph or a certain node within it. Here we gave it the node a, so it actually runs that computation and the print statement prints out the value 8; then you can close the session. In other words: sess = tf.Session(); sess.run(a) comes back with the value 8; and then we close the session to release all the resources.

What happens at this point, when we issue sess.run(a), is that TensorFlow looks at the node a and, since it has the whole graph available to it, it looks back at whatever is necessary to make that computation happen, goes ahead and computes it, and finally computes the value of a. You can imagine that if there were many, many more nodes, it would go all the way back, compute whatever is required, and only then give you the value of 8. More on that soon.

One of the tips shown in the slides is that instead of calling session.close() explicitly, we can use Python's with statement: with tf.Session() as sess: sess.run(a). Once you come out of the with clause the session is automatically closed, similar to what you do with files in Python. tf.Session() gives you a session object, and it basically encapsulates the entire resource environment in which all the operators execute and the tensor objects are evaluated.

Let's take another example, where x and y are 2 and 3. We define op1 = tf.add(x, y), op2 = tf.mul(x, y), and op3 = tf.pow(op2, op1), so basically op2 raised to the power op1. Then, with tf.Session() as sess, we call sess.run(op3).
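A rough sketch of those two snippets, again assuming the TensorFlow 1.x API (tf.mul from the slides was later renamed tf.multiply):

    import tensorflow as tf

    a = tf.add(3, 5)

    # Explicitly creating and closing a session:
    sess = tf.Session()
    print(sess.run(a))   # 8
    sess.close()

    # The same pattern with a with-block, which closes the session automatically:
    x, y = 2, 3
    op1 = tf.add(x, y)          # x + y
    op2 = tf.multiply(x, y)     # x * y
    op3 = tf.pow(op2, op1)      # (x * y) ** (x + y)
    with tf.Session() as sess:
        print(sess.run(op3))    # 7776, i.e. 6 ** 5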
From TensorFlow's perspective, this graph was created when we went through those four or five statements, even before the with tf.Session() line. Now, as soon as we try to run op3, which in this case is the pow node, TensorFlow knows that it needs mul, and for mul it needs x and y, so it will compute op2; it also needs op1, which is tf.add(x, y), so it computes x plus y using the add node; and once it has those two values it computes tf.pow(op2, op1). So here you see an example where all these calculations happen only when sess.run(op3) is called. It is not as if all these values were pre-calculated by TensorFlow and cached somewhere; that is not the way TensorFlow works. It waits, lazily, for the need for these calculations to show up, and then it runs through the graph and makes the calculation.

Another interesting thing: the graph can be very complex, with many, many nodes, and the power of this lazy evaluation comes from the fact that not everything needs to be calculated. Take another example where op1 is again the add, op2 is the mul of x and y, and there is this other node, we are calling it useless here, which is op1 multiplied by x, that is x times (x + y), and op3 is the same tf.pow(op2, op1) we discussed on the slide above. Now if you calculate op3 with sess.run(op3), when TensorFlow gets to op3 it sees that all it needs is op2 and op1, and for op1 it needs to do the add and for op2 the mul using x and y, so it computes op1 and op2, and right then everything necessary for the recipe of op3 is in place. At that point it is not going to go ahead and calculate the node called useless, the tf.mul(x, op1); it is not needed. That is where the power of this dataflow graph computation starts revealing itself: we only calculate what is needed, and that saves a bunch of computation as we follow along deeper into the models.

So what if we go ahead and provide a whole list of operators? Look at what sess.run takes. If you look at the definition of tf.Session.run in the actual TensorFlow code (and sometimes it is very illustrative to do that; whenever you are not comfortable with a certain API, just go to the source code, it is all available on GitHub and it is extremely well documented and well commented, so you can get a deeper insight into what that particular API is trying to do), you will see that the first argument it expects is a list called fetches. Once you have created the graph, whatever nodes you need, say useless, op3, op1, op2, you just provide as a list in fetches and pass it to run, and TensorFlow will compute all those different nodes and give you the results back as a list. For example, if you did want to calculate useless, you could pass in op3 and useless together as a list to sess.run, and you would get back op3 and whatever we name the other result, say not_useless.
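A small sketch of the lazy evaluation and the fetches list; the names mirror the slide, and not_useless is just what we call the fetched result:

    import tensorflow as tf

    x, y = 2, 3
    add_op = tf.add(x, y)
    mul_op = tf.multiply(x, y)
    useless = tf.multiply(x, add_op)   # never needed to compute pow_op
    pow_op = tf.pow(mul_op, add_op)

    with tf.Session() as sess:
        # Only the subgraph feeding pow_op is evaluated; 'useless' is skipped.
        z = sess.run(pow_op)
        # Passing a list of fetches returns a matching list of results,
        # and here 'useless' is computed because we explicitly asked for it.
        z, not_useless = sess.run([pow_op, useless])
        print(z, not_useless)   # 7776 10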
The other beauty of modeling our computations this way is that, since it is a graph, you can literally pin various nodes to various CPUs or GPUs, and that computation will run in parallel across multiple CPUs, GPUs, or devices. This is another thing we get out of modeling our computation as a graph: any of the subgraphs can be farmed off to a separate compute device, which can make a lot of these things extremely parallelizable. Here is an example of how to put a specific part of the graph on a specific GPU, and I believe we will go into a lot more detail on this down the line. You use tf.device: if you have a machine with multiple GPUs, they show up as gpu:0, 1, 2, 3 and so on, and you can place a certain subgraph on a certain GPU with the statement with tf.device('/gpu:2'). Whatever you instantiate inside that block, the node elements, which are the tf.constant ops here, and the operators, which is the tf.matmul (matrix multiplication), will run on GPU 2. Then, when you create a session, this is the recipe for creating a session whose resources for those ops are sourced from that particular GPU, and yet the rest of the computation proceeds exactly the same way: sess.run(c) computes the tf.matmul. It looks like we will go into a lot more detail on this down the line.
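A rough sketch of the device placement idea; the constants here are made up, and the allow_soft_placement and log_device_placement config options (which fall back to an available device if /gpu:2 does not exist, and log where each op actually ran) are standard session options, not something from the slide:

    import tensorflow as tf

    # Pin these ops to the third GPU; devices show up as /gpu:0, /gpu:1, /gpu:2, ...
    with tf.device('/gpu:2'):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
        b = tf.constant([[1.0, 1.0], [0.0, 1.0]], name='b')
        c = tf.matmul(a, b)

    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(c))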
Now, what if we want more graphs? Whenever we start putting in various operators and building up our graph, TensorFlow provides a default graph, and that is where our nodes, variables, and so on actually go; when we do sess.run on an operator like the op3 we used earlier, the session is actually using that default graph. Chip seems to have a very strong bias towards not having multiple graphs. One use case where I can see wanting multiple graphs is ensembles, a common use case in deep learning and machine learning in general. For one particular task you might have trained completely different models, say one CNN and one pixel-based RNN, and at inference time you want to run all these different models in parallel; say you have five such models running and you take a majority vote to find the actual classification. That is something I would think can fall into the use case of having different graphs, because all these models are completely separate and each will require its own session, which goes ahead and gets all the resources required for running that particular graph.

One of the things about these sessions is that they are very greedy. On my machine, when I just run a session with the defaults while I am trying to run another training job, it will go ahead and try to hog all of my GPUs in one shot. There are ways of making sure your program only looks at one or two GPUs, but that is something you have to take care of: if you explicitly run multiple sessions, every one of them is going to try to get as many resources as possible. The other thing is that these different sessions are completely separate execution environments, and if you want to pass data between them you have to come back to the Python/NumPy layer, which becomes pretty hacky when you are trying to make it work in a distributed setting. So it looks like Chip really wants us to use one session, and I will go with that.

The way you create a graph, say you want something other than the default graph, is with the API tf.Graph(). When you want to add operators to that graph you set it as the default, and the way to do that is g.as_default(); that is the API you use. If you are not using any of this, if you don't instantiate a graph, there is a default graph, so none of this is required: x = tf.add(3, 5) will simply add that node to the default graph. But if we want this to happen explicitly within the context of a certain graph, we first instantiate the graph, make it the default, and then add whatever we want inside that context. For the session, when you instantiate it you pass the keyword argument graph, tf.Session(graph=g), with our lowercase g, and now the session will look at this particular graph g, so when we run the node x it executes within that graph.

Another example: we instantiate a new graph, set it as default, add some operators, a = 3, b = 5, x = tf.add(a, b); then sess = tf.Session(graph=g), so our session looks at this graph, creates the entire execution environment around it, you run whatever you want in there, and then you close it. If you just want a handle on the default graph, you use the API tf.get_default_graph() and save the result in a variable called g. So if at any point you want a handle on the graph you have been building without instantiating anything, or on a graph you created and set as default, this API gives you that handle back when it is needed.

Another important thing here is not to mix the two graphs. Let's see what we are doing in this example: we instantiate what we might call a user graph, g = tf.Graph(), but we do not set it as default, and then we add a = tf.constant(3), so that op gets added to the default graph that TensorFlow provides; then we set the new graph g as default and add b in the context of that graph. This is a bit hacky, because it can get confusing very, very easily. If you want to do something like that, first get a handle on the default graph, g1 = tf.get_default_graph(), the one TensorFlow provides, and then create another graph, g2 = tf.Graph(); now you explicitly use the with statement with g1.as_default() to add ops to the first graph, and with g2.as_default() to add ops to the second. This is better, but if you are doing simple things like this it is usually possible to get by with one graph instead of multiple graphs.
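A minimal sketch of keeping the default graph and a user-created graph apart, as discussed above:

    import tensorflow as tf

    g1 = tf.get_default_graph()   # handle on the graph TensorFlow provides
    g2 = tf.Graph()               # a second, user-created graph

    with g1.as_default():
        a = tf.constant(3)
    with g2.as_default():
        b = tf.constant(5)

    # Each session is bound to exactly one graph.
    with tf.Session(graph=g1) as sess:
        print(sess.run(a))   # 3
    with tf.Session(graph=g2) as sess:
        print(sess.run(b))   # 5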
One of the biggest advantages of graphs is that we can run the computation of a subgraph, and a subgraph can go all the way down to one particular op. Say there is an op we have put in, x = tf.constant(5.0), and it sits inside what might be a 20-layer network, but I want to execute just that piece for whatever reason; then I can just call sess.run(x) and it will run that one part of the graph rather than the whole thing. That is one of the biggest benefits. The other reasons are more internal: describing the computation as a graph helps in writing the library's automatic differentiation, and, as we saw, different subgraphs can be forked off onto different devices and different GPUs; TensorFlow also has a distributed mode where it can run across different machines. Once you have the computation described this way, a lot of these things come for free.

So we will get on to the next class. The major takeaways from this session, no pun intended, are sessions and graphs: the way we do things in TensorFlow is to set up a graph of computation, then instantiate a session and run a subgraph of the graph we created; and by "subgraph" I mean that, by definition, the full graph can also be run like that. That is the major takeaway, and we will get into more fun next week. Thanks a lot.
Info
Channel: Labhesh Patel
Views: 97,336
Rating: 4.9468083 out of 5
Keywords: tensorflow tutorial, graphs, sessions, tensorflow, deep learning
Id: g-EvyKpZjmQ
Length: 37min 28sec (2248 seconds)
Published: Sun Jan 29 2017