Heroes of Deep Learning: Andrew Ng interviews Yann LeCun

Captions
Andrew Ng: Yann, you've been such a leader in deep learning for a long time. Thanks a lot for doing this with us.

Yann LeCun: Well, thanks for having me.

Andrew Ng: So, you've been working on neural nets for a long time. I'd love to hear your personal story. How did you get started in AI, and how did you get started working with neural networks?

Yann LeCun: I was always interested in intelligence in general — the emergence of intelligence in humans. That got me interested in human evolution when I was a kid in France, in middle school or so. I was interested in technology, in space. My favorite movie was 2001: A Space Odyssey — intelligent machines, space travel, and human evolution were the themes I was fascinated by. And the concept of a Turing machine really appealed to me.

Then I studied electrical engineering, and when I was maybe in my second year of engineering school, I stumbled on a book that was actually a philosophy book: a debate between Noam Chomsky, the computational linguist at MIT, and Jean Piaget, the cognitive psychologist of child development in Switzerland. It was basically a debate between nature and nurture, with Chomsky arguing that language has a lot of innate structure and Piaget saying that a lot of it is learned, and so on. Each of them had brought a team of people to argue for their side, and on the side of Piaget was Seymour Papert from MIT, who had worked on the perceptron model, one of the first machines capable of learning. I had never heard of the perceptron, and here was an article about a machine capable of learning — that sounded wonderful. So I started going to several university libraries and searching for everything I could find that talked about the perceptron. I realized there were a lot of papers from the fifties, but it kind of stopped at the end of the sixties, with a book co-authored by the same Seymour Papert.

Andrew Ng: What year was this?

Yann LeCun: This was 1980, roughly. I did a couple of projects with some of the math professors at my school on neural nets, essentially, but there was no one I could talk to who had worked on this, because the field had basically disappeared in the meantime — in 1980 nobody was working on it. I experimented with it a little bit, writing simulation software of various kinds and reading about neuroscience. When I finished my engineering studies, I had studied chip design — VLSI design, at the time — so something completely different, and I really wanted to do research on this. I had already figured out that the important question was how you train neural nets with multiple layers. It was pretty clear in the literature of the sixties that that was the important question that had been left unsolved — the idea of hierarchy and so on. I had read Fukushima's article on the neocognitron, which was a hierarchical architecture very similar to what we now call convolutional nets, but without a backprop-style learning algorithm. Then I met people in a small independent lab in France who were interested in what they called, at the time, "automata networks."
They gave me a couple of papers on Hopfield networks — not very popular anymore, but that was the first work on associative memories with neural nets, and it's what revived the interest of some research communities in neural nets in the early eighties. It was mostly physicists — condensed-matter physicists — and a few psychologists; it was still not quite acceptable for engineers and computer scientists to talk about neural nets. They also showed me another paper that had just been distributed as a preprint, whose title was "Optimal Perceptual Inference." This was the first paper on Boltzmann machines, by Geoff Hinton and Terry Sejnowski. It talked about hidden units, about learning multi-layer neural nets that are more capable than mere linear classifiers. So I said: I need to meet these people, because they are already attacking the right problem.

A couple of years later, after I started my PhD, I participated in a workshop in Les Houches organized by the people I was working with, and Terry was one of the speakers. So I met Terry Sejnowski in early 1985 at that workshop, and a lot of people were there from the early neural-net community — John Hopfield, people working on theoretical neuroscience, and so on. It was a fascinating workshop. I also met a couple of people from Bell Labs there, who eventually hired me at Bell Labs — but that was several years before I finished my PhD. I talked to Terry Sejnowski and told him about what I was working on, which was some version of backprop at the time. This was before backprop was a paper — before the Rumelhart, Hinton and Williams paper on backprop had been published. Terry was working on NetTalk at the time; he was friends with Geoff, the information was circulating, so he was already working on trying to make this work for NetTalk. But he didn't tell me. He went back to the US and told Geoff, "There's some kid in France working on the same stuff we're working on."

A few months later, in June, there was another conference in France where Geoff was the keynote speaker. He gave a talk on Boltzmann machines — of course, he was actually working on the backprop paper — and afterward there were fifty people around him wanting to talk to him, and the first thing he said to the organizer was, "Do you know this guy Yann LeCun?" He had read my paper in the proceedings, which was written in French; he could sort of read French, and he could see the math and figure out that it was a form of backprop. So we had lunch together, and that's how we became friends.

Andrew Ng: I see — so multiple groups basically independently reinvented backprop around the same time.

Yann LeCun: Right — or realized that the whole idea is the chain rule, or what the optimal-control people call the adjoint state method, which is really the context in which backprop was invented, back in the early sixties. The idea that you can use gradient descent through multiple stages is what backprop really is. It popped up in various contexts at various times, but I think the Rumelhart, Hinton and Williams paper is the one that popularized it.
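[Editor's note: the following is our addition, not part of the interview. The "gradient descent through multiple stages" Yann describes is the standard adjoint recursion; in textbook notation (our choice of symbols), for a layered computation x_{k+1} = f_k(x_k, w_k) with loss L(x_K), backprop computes:]

```latex
\lambda_K = \frac{\partial L}{\partial x_K}, \qquad
\lambda_k = \Big(\frac{\partial f_k}{\partial x_k}\Big)^{\top} \lambda_{k+1}, \qquad
\frac{\partial L}{\partial w_k} = \Big(\frac{\partial f_k}{\partial w_k}\Big)^{\top} \lambda_{k+1}.
```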
Andrew Ng: Then, fast forward a few years, you wound up at AT&T Bell Labs, where you invented — among many other things — LeNet, which we talk about in the course. I remember way back, when I was a summer intern at AT&T Bell Labs working with Michael Kearns and a few others, hearing about your work even then. So tell me more about your AT&T years and LeNet.

Yann LeCun: OK, so what happened is that I actually started working on convolutional nets when I was a postdoc at the University of Toronto with Geoff Hinton. I wrote the code and did the first experiments there, and they showed that this worked even with a very small dataset. There was no MNIST or anything like that back then, so I drew a bunch of characters with my mouse — I had an Amiga personal computer, the best computer ever — used data augmentation to expand the set, and used that to test performance. I compared things like fully connected nets, locally connected nets without shared weights, and then shared-weight networks — basically the first convnet — and it worked really well for relatively small datasets: you could show that you got better performance and no overtraining with the convolutional architecture.

When I got to Bell Labs in October 1988, the first thing I did was scale up the network, because we had faster computers. A few months before I arrived, my boss at the time, Larry Jackel, who became my department head at Bell Labs, said, "We should order a computer for you before you come — what do you want?" I said, "Well, here at Toronto there's a Sun-4, the latest, greatest machine — it would be great if we had one." And they ordered one — and it wasn't one for the entire department, it was one just for me. Larry told me, "You know, at Bell Labs you don't get famous by saving money." Which was awesome.

They had already been working for a while on character recognition, and they had this enormous dataset called USPS that had 5,000 training samples. So immediately I trained a convolutional net — LeNet-1, basically — on this dataset, and got really good results: better than the other methods they had tried on it, and that other people had tried on this dataset. So we knew we had something fairly early on — this was within three months of my joining Bell Labs. That first version of the convolutional net had convolutions with stride, and we did not have separate subsampling and pooling layers: each convolution subsampled directly, because we simply could not afford to compute a convolution at every location — there was too much computation. The second version had separate convolution and pooling/subsampling layers — I guess that's the one that's really called LeNet-1. We published a couple of papers on this in Neural Computation and at NIPS.

An interesting story: I gave a talk at NIPS about this work, and Geoff Hinton was in the audience. When I came back to my seat next to him, he said, "There's one bit of information in your talk, which is that if you do all the sensible things, it actually works."
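[Editor's note: a minimal PyTorch sketch — our illustration, not the original Lisp code, with made-up channel counts and sizes — of the two designs Yann contrasts: subsampling done by the convolution's own stride, versus a stride-1 convolution followed by a separate pooling layer.]

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # one grayscale image; sizes are illustrative

# Early design: the convolution itself subsamples (stride 2), so the
# filter is only evaluated at every other location -- cheaper to compute.
strided = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, stride=2)
print(strided(x).shape)        # torch.Size([1, 4, 12, 12])

# Later design: convolve at every location (stride 1), then reduce
# resolution with a separate pooling/subsampling layer.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, stride=1)
pool = nn.AvgPool2d(kernel_size=2)
print(pool(conv(x)).shape)     # torch.Size([1, 4, 12, 12])
```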
Andrew Ng: And that line of work went on to make history, because these ideas became widely adopted — for reading checks, for example.

Yann LeCun: They became widely adopted within AT&T, but not very much outside, and I think it's a little difficult for me to really understand why. There were several factors. This was back in the late eighties, and there was no internet — we had email and FTP, but no internet really. No two labs were using the same software or hardware platform: some people had Sun workstations, others had other machines or PCs, whatever. There was no such thing as Python or MATLAB; people wrote their own code. Léon Bottou — he was still a student then — and I spent a year and a half, basically, just writing a neural-net simulator. And at the time, because there was no MATLAB or Python, you had to write your own interpreter to control it, so we wrote our own Lisp interpreter. All of LeNet was written in Lisp, using a numerical back-end very similar to what we have now — blocks that you can interconnect, automatic differentiation, all the things we're now familiar with from Torch, PyTorch, and TensorFlow.
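[Editor's note: the "blocks you can interconnect" with automatic differentiation that Yann describes is the same idea modern frameworks are built on. A tiny PyTorch sketch of the concept — our illustration; the layer sizes are arbitrary.]

```python
import torch
import torch.nn as nn

# Interconnectable "blocks" composed into a network; automatic
# differentiation tracks the whole computation graph.
model = nn.Sequential(
    nn.Linear(16, 32),   # block 1: affine map
    nn.Tanh(),           # block 2: pointwise nonlinearity
    nn.Linear(32, 1),    # block 3: affine map to a scalar score
)

x = torch.randn(8, 16)             # a batch of 8 input vectors
loss = model(x).pow(2).mean()      # an arbitrary scalar objective
loss.backward()                    # gradients flow back through every block

print(model[0].weight.grad.shape)  # torch.Size([32, 16])
```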
Yann LeCun: So then we developed a bunch of applications. We got together with a group of engineers — very smart people; some of them were theoretical physicists who had turned engineer at Bell Labs. Chris Burges was one of them, who went on to a distinguished career at Microsoft Research, and Craig Nohl and others. We collaborated with them to make this technology practical, and together we developed these character-recognition systems. That meant integrating convolutional nets with things similar to what we now call CRFs, conditional random fields, for interpreting sequences of characters rather than just individual characters.

Andrew Ng: Right — the LeNet paper was partly about the neural network and partly about that sequence machinery, right?

Yann LeCun: That's right. The first half of the paper is on convolutional nets, and that's the part the paper is mostly cited for. Very few people have read the second half, which is about sequence-level discriminative learning — basically structured prediction without normalization, very similar to CRFs; in fact, it predates CRFs by several years. That work was very successful. We had worked with the group I mentioned that did the engineering of the whole system, and with a product group in a different part of the country that belonged to a subsidiary of AT&T called NCR — the National Cash Register company; they build ATM machines and large check-reading machines for banks. They were the customers, if you like: they used our check-reading system and deployed it in a major bank — I can't remember which bank it was — and they also deployed it in ATM machines in a French bank, so they could read the checks you deposited. So we were all at a fancy restaurant, celebrating the deployment of this thing, when the company announced that it was breaking itself up. This was 1995, and AT&T split into three companies: AT&T, Lucent Technologies, and NCR. The engineering group went with Lucent Technologies, and the product group, of course, went with NCR. And the sad thing is that the AT&T lawyers, in their infinite wisdom, assigned the patent — there was a patent on convolutional nets, which thankfully expired in 2007 —

Andrew Ng: I see — years ago.

Yann LeCun: — they assigned that patent to NCR. But there was nobody at NCR who actually knew what a convolutional net even was. So the patent was in the hands of people who had no idea what they had, while we were in a different company that now could not really develop the technology, and our engineering team was in yet another company — we went with AT&T, the engineering went with Lucent, and the product group went with NCR. So it was a little depressing.

Andrew Ng: So beyond your early work, when neural networks were hot, you kept persisting with neural networks even when there was a sort of winter for them. What was that like?

Yann LeCun: Well, I persisted and didn't persist, in some ways. I was always convinced that eventually those techniques would come back to the fore, that people would figure out how to use them in practice and that they would be useful, so I always had that in the back of my mind. But in 1996, when AT&T broke itself up and all of our work on character recognition fell apart because the product groups went their separate ways, I was also promoted to department head, and I had to figure out what to work on. This was the early days of the internet — we're talking 1995 — and I had the idea that one big problem with the emergence of the internet was going to be bringing all the knowledge we had on paper into the digital world. So I started a project called DjVu, which was about compressing scanned documents so they could be distributed over the internet. That project was really fun for a while and had some success, although AT&T really didn't know what to do with it.

Andrew Ng: I remember that — it helped the dissemination of online research papers.

Yann LeCun: That's right, exactly. We scanned the entire proceedings of NIPS and made them available online, to demonstrate how it worked. We could compress high-resolution pages down to a few kilobytes.

Andrew Ng: So convnets, starting from some of your much earlier work, have now pretty much taken over the field of computer vision, and are starting to encroach significantly into other fields as well. Tell me how you saw that whole process.

Yann LeCun: I'll tell you how I thought it was going to happen, early on. First of all, I always believed this was going to work. It required fast computers and lots of data, but I always believed, somehow, that it was the right thing to do. What I thought originally, when I was at Bell Labs, was that there would be some sort of continuous progress along these directions as machines got more powerful. We had even been designing chips to run convolutional nets at Bell Labs — Bernhard Boser and Hans Peter Graf, actually, had two different chips for running convolutional nets really efficiently. So we thought there was going to be a pickup of this, a growing interest, continuous progress.
But in fact, because interest in neural nets died in the mid-nineties, that didn't happen. There was a dark period of six or seven years, roughly between 1995 and 2002, when basically nobody was working on this. Well, there was a little bit of work — some work at Microsoft in the early 2000s on using convolutional nets for Chinese character recognition —

Andrew Ng: Patrice Simard, right?

Yann LeCun: Exactly. And there was some other small work, on things like face detection, in France and in various other places, but it was very small. I actually discovered recently that a couple of groups came up with ideas essentially very similar to convolutional nets but never quite published them the same way — for medical image analysis, mostly in the context of commercial systems — so it never quite made it into the literature. That was after our first work on convolutional nets; they were not really aware of it, but it developed in parallel a little bit. So several people got similar ideas, several years apart. But then I was really surprised by how fast interest picked up after ImageNet 2012.

At the end of 2012, there was a very interesting event at ECCV in Florence: a workshop on ImageNet. Everybody already knew that Geoff Hinton's team — Alex Krizhevsky and Ilya Sutskever — had won by a large margin, so everybody was waiting for Alex Krizhevsky's talk. Most people in the computer vision community had no idea what a convolutional net was. They had heard me talk about it — I actually gave an invited talk at CVPR in 2000 where I talked about it — but most people had not paid much attention. The senior people knew what it was, but the more junior people in the community really had no idea. So Alex Krizhevsky just gives this talk, and he doesn't explain what a convolutional net is, because he assumes everybody knows — he comes from machine learning. He says: here is how everything is connected, here is how we transform the data, here are the results we get — assuming everybody knows what it is. A lot of people were incredibly surprised, and you could see people's opinions changing as he was giving his talk — very senior people in the field.

Andrew Ng: So you think that workshop was the defining moment that swayed a lot of the computer vision community.

Yann LeCun: Yeah, definitely. That's what happened, right there.

Andrew Ng: So today you retain a faculty position at NYU, and you also lead FAIR, Facebook AI Research. I know you have a pretty unique point of view on how corporate research should be done. Do you want to share your thoughts on that?

Yann LeCun: Yeah. One of the beautiful things I've managed to do at Facebook in the last four years is that I was given a lot of freedom to set up FAIR the way I thought was most appropriate, because it was the first research organization within Facebook. Facebook is an engineering-centric company, and so far it had really been focused on survival, on short-term things. Facebook was about to turn ten years old, had had a successful IPO, and was basically thinking about the next ten years. Mark Zuckerberg was asking: what is going to be important for the next ten years?
The survival of the company was not in question anymore, so this was the kind of transition where a large company can start to think further ahead — well, it was not such a large company at the time; Facebook had 5,000 employees or so — but it had the luxury of thinking about the next ten years and what would be important in technology. Mark and his team decided that AI was going to be a crucial piece of technology for connecting people, which is the mission of Facebook. So they explored several ways to build an effort in AI. They had a small internal engineering group experimenting with convolutional nets that was getting really good results on face recognition and various other things, which piqued their interest. They explored the idea of hiring a bunch of young researchers, or acquiring a company, or things like that, and they settled on the idea of hiring someone senior in the field and setting up a research organization.

It was a bit of a culture shock initially, because the way research operates in a company is very different from engineering: you have longer timescales and horizons, and researchers tend to be very conservative about the choice of places where they want to work. I made it very clear very early on that research needed to be open: researchers needed to be not just encouraged to publish but even mandated to publish, and to be evaluated on criteria similar to those used to evaluate academic researchers. What Mark and Mike Schroepfer — the CTO of the company, who is my boss now — said was that Facebook is a very open company: we distribute a lot of stuff in open source. Schrep, the CTO, comes from the open-source world — he was at Mozilla before that — and a lot of people came from that world, so it was in the DNA of the company. That made me confident that we could set up an open research organization. And the fact that the company is not obsessive-compulsive about IP, as some other companies are, makes it much easier to collaborate with universities and to have arrangements by which a person can have a foot in industry and a foot in academia.

Andrew Ng: And you've found that works for you personally.

Yann LeCun: Oh, absolutely. If you look at my publications over the last four years, the vast majority of them are with my students at NYU, because at Facebook I did a lot of organizing of the lab — hiring, scientific direction, advising, things like that — but I don't get involved in individual research projects to get my name on papers. I don't care about getting more papers anymore. It's about letting other people do great work. You want to stay behind the scenes; you don't want to put yourself in competition with the people in your lab.

Andrew Ng: I'm sure you get asked this a lot, but I hope you can answer for all the people watching this video as well: what advice do you have for someone wanting to break into AI?

Yann LeCun: It's such a different world now than when I got started. What's great now is that it's very easy for people to get involved at some level. The tools that are available are so easy to use — TensorFlow, PyTorch, whatever.
You can have a relatively cheap computer in your bedroom and basically train a convolutional net or a recurrent net to do whatever you want, and there are a lot of tools. You can learn a lot from online material without it being very onerous. So you see high-school students now playing with this, which is great, I think. There's certainly a growing interest from the student population in learning about machine learning and AI, and it's very exciting for young people — I find that wonderful.

So my advice is: if you want to get into this, make yourself useful. Make a contribution to an open-source project, for example, or make an implementation of some standard algorithm whose code you couldn't find online and that you'd like to make available to other people. Take a paper that you think is important, re-implement the algorithm, and put it up in an open-source package, or contribute to one of those packages. If the stuff you write is interesting and useful, you'll get noticed: maybe you'll get a nice job at a company you really want to work for, or maybe you'll get accepted into your favorite PhD program. I think that's a good way to get started.

Andrew Ng: So open-source contributions are a good way to enter the community.

Yann LeCun: That's right.

Andrew Ng: Thanks a lot. Even though I've known you for many years, it's fascinating to hear the details of all these stories from over the years.

Yann LeCun: Yeah, there are many stories like this where, at the moment they happen, you don't realize what importance they might take on 10 or 20 years later.

Andrew Ng: Thank you.

Yann LeCun: Thanks.
Info
Channel: Preserve Knowledge
Views: 47,487
Rating: 4.9436092 out of 5
Keywords: machine learning, data science, neural networks, Geoffrey Hinton, Yoshua Bengio, Andrej Karpathy, Andrew Ng, Ian Goodfellow, GANs, Deep learning, mathematics, lecture, Terry Tao, Convolution, generative, AI, Artificial intelligence, Robot, Self driving cars, Google Brain, Alphago, Yann LeCun, CMU, Facebook, Google, Microsoft, Research, Big data, Bitcoin, Blockchain, programming, computer science
Id: Svb1c6AkRzE
Length: 27min 49sec (1669 seconds)
Published: Sat Apr 07 2018