deeplearning.ai's Heroes of Deep Learning: Yann LeCun

Video Statistics and Information

Captions
Andrew: Yann, you've been such a leader in deep learning for so long. Thanks a lot for doing this with us.

Yann: Well, thanks for having me.

Andrew: So, you've been working on neural nets for a long time. I'd love to hear your personal story. How did you get started working on neural networks?

Yann: I was always interested in intelligence in general — the emergence of intelligence in humans. I got interested in human evolution when I was a kid in France, in middle school or so, and I was interested in technology, space, things like that. My favorite movie was 2001: A Space Odyssey, which had intelligent machines, space travel, and human evolution as its themes, and the concept of an intelligent machine really appealed to me.

Then I studied electrical engineering, and when I was maybe in my second year of engineering school I stumbled on a book that was actually a philosophy book: a debate between Noam Chomsky, the computational linguist at MIT, and Jean Piaget, the cognitive psychologist of child development in Switzerland. It was basically a nature-versus-nurture debate, with Chomsky arguing that language has a lot of innate structure, and Piaget arguing that a lot of it is learned. It was a transcription of the debate, and each of them had brought a team of people to argue for their side. On Piaget's side was Seymour Papert from MIT, who had worked on the perceptron model, one of the first machines capable of learning. I had never heard of the perceptron, and here was an article describing a machine capable of learning — that sounded wonderful. So I started going around to several university libraries, searching for everything I could find that talked about the perceptron, and I realized there were a lot of papers from the fifties, but that the field had pretty much stopped at the end of the sixties with a book co-authored by that same Seymour Papert.

Andrew: What year was this?

Yann: This was 1980, roughly.
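[For readers who have never seen it: the perceptron Papert worked on learns a linear classifier with a one-line update rule. Below is a minimal illustrative sketch in Python; the toy data is made up, and none of this code is from the interview.]

```python
import numpy as np

# Rosenblatt-style perceptron: nudge the weights toward any
# misclassified example until everything is on the right side.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])  # labels in {-1, +1}, linearly separable

w, b = np.zeros(2), 0.0
for epoch in range(10):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:  # misclassified (or on the boundary)
            w += yi * xi            # move the decision boundary toward xi
            b += yi

print(np.sign(X @ w + b))  # reproduces y: [ 1.  1. -1. -1.]
```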
Yann: I did a couple of projects with some of the math professors at my school, essentially on neural nets, but there was no one I could talk to who had worked on this, because the field had basically disappeared in the meantime — in 1980, nobody was working on it. So I experimented a little bit, writing simulation software of various kinds and reading about neuroscience. When I finished my engineering studies — I had studied chip design, VLSI design, so something completely different — I really wanted to do research on this, and I had already figured out that the important question was how to train neural nets with multiple layers. It was pretty clear from the literature of the sixties that this was the important question that had been left unsolved — the idea of hierarchy and all that. I had read Fukushima's article on the Neocognitron, which was a hierarchical architecture very similar to what we now call convolutional nets, but without a backprop-style learning algorithm.

Then I met people in a small independent lab in France who were interested in what was called, at the time, automata networks. They gave me a couple of papers on Hopfield networks — not very popular anymore, but the first associative memories built with neural nets, and the work that revived the interest of some research communities in neural nets in the early eighties, mostly physicists, condensed-matter physicists, and a few psychologists. It still wasn't quite acceptable for engineers and computer scientists to talk about neural nets. They also showed me another paper that had just been distributed as a preprint, titled "Optimal Perceptual Inference" — the first paper on Boltzmann machines, by Geoff Hinton and Terry Sejnowski. It talked about hidden units, about learning multi-layer neural nets that are more capable than mere linear classifiers. I said to myself, I need to meet these people, because they are already interested in the right problem.

A couple of years later, after I started my PhD, I took part in a workshop in Les Houches organized by the people I was working with, and Terry was one of the speakers. So I met Terry Sejnowski in early 1985, at that workshop in Les Houches, in France. A lot of people from the early neural net field were there, along with a lot of people working on theoretical neuroscience and things like that — it was a fascinating workshop. I also met a couple of people from Bell Labs who eventually hired me there, though that was several years before I finished my PhD.

I talked to Terry Sejnowski and told him about what I was working on, which was some version of backprop — this was before backprop was a paper. Terry was working on NetTalk at the time. The Rumelhart, Hinton, and Williams paper on backprop had not been published yet, but Terry was friends with Geoff, so the information was circulating, and he was already trying to make it work for NetTalk. He didn't tell me that, though. He went back to the US and told Geoff, "there's some kid in France working on the same stuff we're working on."

A few months later, in June, there was another conference in France where Geoff was a keynote speaker. He gave a talk on Boltzmann machines — of course, he was working on the backprop paper at the time — and afterwards there were fifty people around him wanting to talk. The first thing he said to the organizer was, "do you know this guy Yann LeCun?" He had read my paper in the proceedings, which was written in French; he could sort of read French, and he could see the math and figure out that it was some sort of backprop. So we had lunch together, and that's how we became friends.

Andrew: I see — so multiple groups basically reinvented backprop independently, at more or less the same time.

Yann: Right. Or realized the whole idea of the chain rule — or what the optimal control people call the adjoint state method, which is really the context in which backprop was first invented, optimal control back in the early sixties. The idea that you can do gradient descent through multiple stages is what backprop really is, and it popped up in various contexts at various times.
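[To make "gradient descent through multiple stages" concrete, here is a minimal backprop sketch in plain NumPy: a two-stage network where the chain rule is applied by hand, from the last stage back to the first. The XOR task, layer sizes, and learning rate are made up for illustration.]

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)   # stage 1 parameters
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)   # stage 2 parameters
lr = 0.5

for step in range(2000):
    # Forward pass through the stages.
    h = np.tanh(X @ W1 + b1)                  # stage 1
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # stage 2: sigmoid output
    # Backward pass: chain rule, last stage first.
    dz2 = (p - y) / len(X)          # grad of mean cross-entropy w.r.t. pre-sigmoid
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dz1 = dz2 @ W2.T * (1 - h**2)   # propagate back through stage 1's tanh
    dW1, db1 = X.T @ dz1, dz1.sum(0)
    # Gradient descent step on every stage's parameters.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(p.round(2).ravel())  # should approach [0, 1, 1, 0]
```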
Yann: But I think the Rumelhart, Hinton, and Williams paper is the one that popularized it.

Andrew: Then, fast-forwarding a few years, you wound up at AT&T Bell Labs, where you invented, among many other things, LeNet, which we talk about in the course. I remember that way back, when I was a summer intern at AT&T working with Michael Kearns and a few others, I heard about your work even then. So tell me more about your AT&T years and LeNet.

Yann: So what happened is that I actually started working on convolutional nets when I was a postdoc at the University of Toronto with Geoff Hinton. I wrote the first code there and ran the first experiments there, which showed that if you had a very small data set — there was no MNIST or anything like that back then, so I drew a bunch of characters with my mouse on my Amiga personal computer, the best computer ever, and used data augmentation to expand the set — you could compare things like a fully connected net, a locally connected net without shared weights, and shared-weight networks. That last one was basically the first convolutional net, and it worked really well for relatively small data sets: you could show that you get better performance, and no overtraining, with the convolutional architecture.

When I got to Bell Labs in October 1988, the first thing I did was scale up the network, because we had faster computers. A few months before I joined, my boss at the time, Larry Jackel, who became my department head at Bell Labs, had said, "we should order a computer for you before you come — what do you want?" I said, "well, here in Toronto there's a Sun 4, the latest, greatest machine; it would be great if we had one." And they ordered one just for me — in Toronto there had been one for the whole department, and here I had one to myself. Larry told me, "you know, at Bell Labs you don't get famous by saving money." That was awesome.

They had already been working on character recognition for a while, and they had this enormous data set called USPS that had 5,000 training samples. So I immediately trained a convolutional net — LeNet 1, basically — on this data set, and got really good results, better than the other methods that they and other people had tried on it. So we knew we had something fairly early on; this was within three months of my joining Bell Labs.

That first version of the convolutional net used convolutions with stride; it did not have separate subsampling and pooling layers — each convolution subsampled directly. The reason is that we simply could not afford to compute a convolution at every location; there was too much computation. The second version had separate convolution and pooling/subsampling layers, and I guess that's the one that's called LeNet-1. We published a couple of papers on this, in Neural Computation and at NIPS.
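[The difference between the two versions described above, sketched in modern PyTorch. The filter counts and sizes are made up for illustration; these are not the original LeNet hyperparameters.]

```python
import torch
import torch.nn as nn

# Version 1: convolution with stride, so each convolution subsamples
# directly -- the filter is only evaluated at every other location.
v1 = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5, stride=2), nn.Tanh(),
    nn.Conv2d(4, 12, kernel_size=5, stride=2), nn.Tanh(),
)

# Version 2 ("LeNet-1" style): convolution at every location, followed
# by a separate subsampling / pooling layer.
v2 = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(4, 12, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
)

x = torch.randn(1, 1, 28, 28)      # dummy grayscale image
print(v1(x).shape, v2(x).shape)    # both: torch.Size([1, 12, 4, 4])
```

[Both designs yield feature maps of the same size, but version 1 evaluates each filter at only a quarter of the spatial locations, which is why it was the affordable choice on late-1980s hardware.]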
Yann: So I gave a talk at NIPS about this work, and Geoff Hinton was in the audience. When I came back to my seat — I was sitting next to him — he said, "there's one bit of information in your talk, which is that if you do all the sensible things, it actually works."

Andrew: And LeNet went on to make history: these ideas became widely adopted for reading checks.

Yann: They became widely adopted within AT&T, but not very much outside, and I think it's a little difficult for me to really understand why, but there were several factors. This was back in the late eighties, and there was no internet — we had email and FTP, but no internet really. No two labs were using the same software or hardware platform: some people had Sun workstations, others had other machines or PCs. There was no such thing as Python or MATLAB; people were writing their own code. Léon Bottou and I — we were two students working together — had spent a year and a half basically just writing a neural net simulator. And at the time, because there was no MATLAB or Python, you had to write your own interpreter to control it, so we wrote a Lisp interpreter. LeNet was written in Lisp, using a numerical back-end very similar to what we have now — blocks that you can interconnect, automatic differentiation, all the things we're now familiar with from Torch, PyTorch, TensorFlow, and the like.

Then we developed a bunch of applications. We got together with a group of engineers — very smart people, some of them theoretical physicists who had turned engineer at Bell Labs. Chris Burges, who later had a distinguished career at Microsoft Research, was one of them, along with Craig Nohl and a few others. We collaborated with them to make these things practical, and together we developed character recognition systems that integrated convolutional nets with things similar to what we now call CRFs, for interpreting sequences of characters rather than just individual characters.

Andrew: Right — the LeNet paper is partly about the neural network and partly about the machinery that puts the sequence together.

Yann: That's right. The first half of the paper is on convolutional nets, and that's what the paper is mostly cited for. The second half, which very few people have read, is about sequence-level discriminative learning — basically structured prediction with global, sequence-level normalization rather than per-character normalization. It's very similar to a CRF; in fact, it preceded CRFs by several years.
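[For readers unfamiliar with CRFs: the linear-chain conditional random field objective, in the later notation of Lafferty et al. (2001), captures the kind of sequence-level, globally normalized training described above. The LeNet paper's graph-transformer formulation differs in its details; this is only the CRF form it anticipated.]

```latex
% Score a whole label sequence y = (y_1, ..., y_T) for input x, then
% normalize over all candidate sequences rather than per character:
P(y \mid x) =
  \frac{\exp\big(\sum_{t=1}^{T} s(y_t, x) + \sum_{t=2}^{T} A(y_{t-1}, y_t)\big)}
       {\sum_{y'} \exp\big(\sum_{t=1}^{T} s(y'_t, x) + \sum_{t=2}^{T} A(y'_{t-1}, y'_t)\big)}
```

[Here s(y_t, x) is a per-position score, for instance a convnet's output for character class y_t, and A is a learned transition score; training maximizes log P of the correct sequence, so every alternative reading competes with the right one.]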
Yann: So that was very successful, up to a point. We worked with the group I mentioned, which did the engineering of the whole system, and with a product group in a different part of the country that belonged to a subsidiary of AT&T called NCR — National Cash Register. They built ATM machines and large check-reading machines for banks, so they were our customers, if you will: they used our check-reading system and deployed it at a major bank — I can't remember which one — and in the ATM machines of a French bank, so the machines could read the checks you deposited.

And we were all at a fancy restaurant, celebrating the deployment of this thing, when the company announced that it was breaking itself up. This was 1995, and AT&T announced that it was splitting into three companies: AT&T, Lucent Technologies, and NCR. NCR was spun off and Lucent Technologies was spun off; the engineering group went with Lucent, and the product group, of course, with NCR. The sad thing is that the AT&T lawyers, in their infinite wisdom, assigned the patent — there was a patent on convolutional nets, which has thankfully expired; it expired in 2007 — to NCR. But there was nobody at NCR who even knew what a convolutional net was. So the patent was in the hands of people who had no idea what they had, and we were in a different company that could not really develop the technology; our engineering team was in yet another company, because we stayed with AT&T, the engineering went with Lucent, and the product group went with NCR. It was a little depressing.

Andrew: So in addition to your early work, when neural networks were hot, you kept persisting with neural networks even through a sort of winter. What was that like?

Yann: Well, I persisted and didn't persist, in some ways. I was always convinced that eventually those techniques would come back to the fore, that people would figure out how to use them in practice, and that they would be useful — I always had that in the back of my mind. But in 1996, when AT&T broke itself up and all of our work on character recognition was winding down because the product group had gone its separate way, I was promoted to department head and had to figure out what to work on. These were the early days of the internet — this was 1995 — and I had the idea that one big problem raised by the emergence of the internet was going to be bringing all the knowledge we had on paper into the digital world. So I started a project called DjVu, which was about compressing scanned documents, essentially, so they could be distributed over the internet. That project was really fun for a while and had some success, although AT&T didn't really know what to do with it.

Andrew: I remember that — it helped the dissemination of online research papers.

Yann: That's right, exactly. We scanned the entire proceedings of NIPS and made them available online to demonstrate how it worked. We could compress high-resolution pages down to a few tens of kilobytes.

Andrew: So convnets, starting from some of your much earlier work, have now pretty much taken over the field of computer vision and are starting to encroach significantly on other fields as well. Tell me how you saw that whole process.

Yann: Let me tell you how I thought it was going to happen, early on. First of all, I always believed that this was going to work. It required fast computers and a lot of data, but I always believed, somehow, that it was the right thing to do. What I thought originally, when I was at Bell Labs, was that there would be sort of continuous progress along these lines as machines became more powerful. We had even been designing chips to run convolutional nets at Bell Labs — Bernhard Boser, and separately Hans Peter Graf, had two different chips for running convolutional nets really efficiently.
So we thought there was going to be a pickup, growing interest, and sort of continuous progress. But in fact, because interest in neural nets died out in the mid-nineties, that didn't happen. There was a dark period of six or seven years, roughly between 1995 and 2002, when basically nobody was working on this. There was a little bit of work — some work at Microsoft in the early 2000s on using convolutional nets for Chinese character recognition—

Andrew: Patrice Simard's work.

Yann: Yes, exactly. And there was some other small work, on things like face detection, in France and various other places, but it was very small. I actually discovered recently that a couple of groups came up with ideas essentially very similar to convolutional nets, but never quite published them the same way, for medical image analysis. Those were mostly in the context of commercial systems, so they never quite made it out into the research community; they weren't really aware of the work on convolutional nets, and it developed in parallel a little bit. So several people hit on similar ideas, several years apart.

But then I was really surprised by how fast interest picked up after ImageNet in 2012. At the end of 2012 there was a very interesting event at ECCV in Florence: a workshop on ImageNet. Everybody knew that Geoff Hinton's team — Alex Krizhevsky and Ilya Sutskever — had won by a large margin, so everybody was waiting for Alex Krizhevsky's talk. Most people in the computer vision community had no idea what a convolutional net was. They had heard me talk about it — I actually had an invited talk at CVPR in 2000 where I talked about it — but most people had not paid much attention. The senior people knew what it was, but the more junior people in the community really had no idea. So Alex Krizhevsky gives this talk, and he doesn't explain what a convolutional net is, because he comes from machine learning and assumes everybody knows. He says, here is how everything is connected, here is how we transform the data, here are the results we get — assuming everybody knows what it is. A lot of people were incredibly surprised, and you could see people's opinions changing as he gave his talk, very senior people in the field included.

Andrew: So you think that workshop was the defining moment that swayed a lot of the computer vision community.

Yann: Yeah, definitely. That's where it happened, right there.

Andrew: So today you retain a faculty position at NYU, and you also lead FAIR, Facebook AI Research. I know you have a pretty unique point of view on how corporate research should be done. Do you want to share your thoughts on that?

Yann: One of the beautiful things I managed to do at Facebook over the last four years is that I was given a lot of freedom to set up FAIR the way I thought was most appropriate, because it was the first research organization within Facebook. Facebook is an engineering-centric company, and until then it had been focused on short-term things, on survival. But Facebook was about to turn ten years old, had had a successful IPO, and was basically thinking about the next ten years.
Mark Zuckerberg was asking what was going to be important for the next ten years, and the survival of the company was no longer in question. This is the kind of transition where a large company can start to think further ahead — it was not such a large company at the time, Facebook had 5,000 employees or so, but it had the luxury of thinking about the next ten years and what would be important in technology. Mark and his team decided that AI was going to be a crucial piece of technology for connecting people, which is the mission of Facebook. They explored several ways of building an effort in AI. They had a small internal engineering group experimenting with convolutional nets that was getting really good results in face recognition and various other things, which piqued their interest. They explored the idea of hiring a bunch of young researchers, or acquiring a company, or things like that, and they settled on hiring someone senior in the field and setting up a research organization.

It was a bit of a culture shock initially, because the way research operates in a company is very different from engineering: you have longer timescales and horizons, and researchers tend to be very conservative about where they want to work. I made it very clear very early on that research needed to be open — that researchers needed not only to be encouraged to publish but even mandated to publish, and to be evaluated on criteria similar to those used to evaluate academic researchers. Mark, and Mike Schroepfer, the CTO of the company, who is my boss now, said Facebook is a very open company — we distribute a lot of stuff in open source. Schrep, the CTO, comes from the open source world; he was at Mozilla before that, and a lot of people came from that world, so it was in the DNA of the company. That made us confident that we could set up an open research organization. And the fact that the company is not obsessive-compulsive about IP, as some other companies are, makes it much easier to collaborate with universities and to have arrangements in which a person can have a foot in industry and a foot in academia.

Andrew: And do you find that valuable yourself?

Yann: Oh, absolutely. If you look at my publications over the last four years, the vast majority are with my students at NYU, because at Facebook I did a lot of organizing the lab, hiring, setting the scientific direction, advising, and things like that — but I don't get involved in individual research projects to get my name on papers. I don't care about getting my name on papers anymore.

Andrew: It's enabling someone else's great work.

Yann: Exactly. You want to stay behind the scenes; you don't want to put yourself in competition with the people in your lab.

Andrew: I'm sure you get asked this a lot, but I hope you can answer for all the people watching this video as well: what advice do you have for someone wanting to get involved in AI, to break into AI?

Yann: It's such a different world now than when I got started. I think what's great now is that it's very easy for people to get involved at some level — the tools that are available are so easy to use.
With TensorFlow, PyTorch, whatever, you can have a relatively cheap computer in your bedroom and basically train your convolutional net or your recurrent net to do whatever you want, and there are a lot of tools. You can learn a lot from online material without it being very onerous. So you see high school students playing with this now, which is great, I think, and there is certainly a growing appetite among students to learn about machine learning and AI. It's very exciting for young people, and I find that wonderful.

My advice is: if you want to get into this, make yourself useful. Make a contribution to an open-source project, for example, or make an implementation of some standard algorithm whose code you couldn't find online but that you would like to make available to other people. Take a paper that you think is important, reimplement the algorithm, and then put it up in an open-source package, or contribute to one of those open-source packages. If the stuff you write is interesting and useful, you will get noticed. Maybe you'll get a nice job at a company you really wanted a job at, or maybe you'll get accepted to your favorite PhD program, or things like that. I think that's a good way to get started.

Andrew: So open-source contributions are a good way to enter the community.

Yann: Yeah, that's right.

Andrew: Thanks a lot, Yann. Even though I've known you for many years, it's still fascinating to hear the details of all these stories that have unfolded over the years.

Yann: Yeah — there are many stories like this where, at the moment they happen, you don't realize what importance they might take on ten or twenty years later.

Andrew: Thank you.

Yann: Thanks.
Info
Channel: DeepLearningAI
Views: 21,509
Id: JS12eb1cTLE
Length: 27min 49sec (1669 seconds)
Published: Wed Apr 04 2018