[ML News] Stanford HAI coins Foundation Models & High-profile case of plagiarism uncovered

Video Statistics and Information

Captions
High-profile case of plagiarism shocks the machine learning world, Tesla has an AI Day extravaganza, and all of Stanford writes a single paper. Welcome to ML News!

Stop! Before the rest of the video: this video is sponsored by Weights & Biases. Weights & Biases builds developer tools for machine learning: for researchers, for practitioners, for juniors, for seniors, whatever your favorite flavor of yogurt is. They don't care, they build products for you. Except cherry. Who likes cherry?

Today I want to talk to you about a feature called Artifacts. Artifacts are essentially files in the cloud, but you're probably going to use them mostly for two things: data and models. Both of these are notoriously tricky to work with. Datasets are too large to check into Git, we need to keep them up to date, and we may have different versions of them. Models even more so: we want to save the outputs of our runs as models that we can use later and maybe introspect, and these things are also versioned, and we want to depend on them. When I did this myself, I had to save the model to some special folder, then go grab it from that folder, put it on all the machines in the correct folder, and then reference that folder from all my scripts that would consume the model. With Artifacts, this gets a lot easier. Here, we first upload the original dataset to an artifact. Then we consume that artifact, split the data into train, validation, and test sets, and emit those as artifacts in turn. So if a new version of the raw data becomes available, I can simply run the same script, depending on the same thing, and it will create new versions of the train, validation, and test data. You can make this arbitrarily complex, but I hope you can see the point.

The same goes for models. If your run outputs and saves some kind of model, you can log it as an artifact, and from then on you can consume that model in all subsequent runs. Here's one of my models. It's a CNN, and you can see it's already at version 116. All I have to do to use this model in any script in the future is call the download method on the artifact, and it will be available locally. As I said, you can do this with any file, but since this is a model from a deep learning framework, Weights & Biases understands it and gives me a neat viewer where I can introspect the model and look at the shapes and even the weights of my CNN. I think this is incredibly powerful. These things quickly get complicated, with versions and scripts building upon other scripts, and the artifact framework really helps you make sense of all of it. There's even the possibility that the data stays in specific private buckets with access controls, so that not everyone on your team has access to all of the data.
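As a rough sketch of what that workflow looks like in code (the project name, artifact names, and file paths here are made up for illustration; check the Weights & Biases docs for the authoritative API):

```python
import wandb

# --- Run 1: log the raw dataset as an artifact ---
run = wandb.init(project="my-project", job_type="upload")
raw = wandb.Artifact("raw-data", type="dataset")
raw.add_file("data/raw.csv")               # hypothetical local file
run.log_artifact(raw)
run.finish()

# --- Run 2: consume the raw data, emit train/val/test splits ---
run = wandb.init(project="my-project", job_type="split")
raw_dir = run.use_artifact("raw-data:latest").download()
# ... split the contents of raw_dir into data/splits/ on disk ...
splits = wandb.Artifact("data-splits", type="dataset")
splits.add_dir("data/splits")              # folder holding the three splits
run.log_artifact(splits)
run.finish()

# --- Run 3: train on the splits and log the model as an artifact ---
run = wandb.init(project="my-project", job_type="train")
run.use_artifact("data-splits:latest").download()
# ... training happens here, model saved to model.pt ...
model_art = wandb.Artifact("my-cnn", type="model")
model_art.add_file("model.pt")
run.log_artifact(model_art)
run.finish()

# Any later script: download a specific version (say v116) and use it.
run = wandb.init(project="my-project", job_type="inference")
model_dir = run.use_artifact("my-cnn:v116").download()
```

Re-running the split script against a new version of `raw-data` produces new versions of `data-splits`, which is exactly the dependency chain described above.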
Of course, Artifacts are only one of the features of Weights & Biases. If you're interested, please check them out: free accounts are free, academic accounts are free, and enterprise accounts cost a bit. That's it for this week's sponsor spot, thanks a lot to Weights & Biases. Let's get into the video. [Music]

So, on a lonely August evening, I received the following text on Twitter: paper A plagiarized paper B, and was accepted to ICCV. Now, if you know anything about the academic world, and especially the machine learning world, it's that everyone copies from everyone, but I gave the papers a look to confirm for myself. Here is paper A, the first paper, the quote-unquote original paper, called "Momentum Residual Neural Networks". It's by a bunch of researchers from ENS, CNRS, and Google Research. The basic idea is to bring some form of momentum to a residual neural network: since a ResNet somewhat resembles an iterative process, the idea of momentum seems applicable here. The question is how exactly you do that.
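For intuition (this is my paraphrase of the idea from memory, not the paper's exact formulation), a plain ResNet updates its hidden state as x plus f(x), while a momentum version carries an additional velocity term:

```python
import torch

def resnet_step(x, f):
    # Standard residual update: x_{n+1} = x_n + f(x_n)
    return x + f(x)

def momentum_step(x, v, f, gamma=0.9):
    # Momentum residual update, roughly:
    #   v_{n+1} = gamma * v_n + (1 - gamma) * f(x_n)
    #   x_{n+1} = x_n + v_{n+1}
    # Carrying the velocity v is what makes the forward pass invertible
    # in closed form, one of the selling points of such architectures.
    v = gamma * v + (1 - gamma) * f(x)
    return x + v, v

# Toy usage with a random "residual block":
f = torch.nn.Linear(8, 8)
x, v = torch.randn(2, 8), torch.zeros(2, 8)
for _ in range(4):
    x, v = momentum_step(x, v, f)
```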
In the paper there is a visualization of their idea, the formulas are there, there's lots of mathematical analysis, there are experiments with these concentric rings and what happens to them, and there is a table comparing the method to previous approaches, and so on. I'm looking at version one of the paper, for anyone who's following along.

Jumping to the other paper: I'm not going to reveal the name of the accused author right here, because I don't want to point fingers or anything; I simply want to talk about the problem at hand. The paper is called "m-RevNet: Deep Reversible Neural Networks with Momentum", and it has quite a similar idea. In fact, there is a visualization of this flow, there are experiments with concentric rings being deformed, there is a neat little table comparing it to previous approaches, and in general the structure, and even the sentences of entire passages, appear in parts to be just reformulations of one another. I've looked further into this and realized that the first paper open-sourced its code, and its submission history reveals that the authors probably tried to submit it to multiple conferences and failed a bunch of times before it got accepted. So the first paper was out early, hadn't managed to get published, its code was out, and then the second paper appeared. After looking at this carefully, I got the strong impression that the second paper simply copied the first, ran its code with a bunch of different hyperparameters and maybe a different random seed, and essentially wrote the same paper again, possibly hoping to get through peer review before the first paper did, or that it would simply never be noticed at all.

I first told my Discord community and contacted the authors; a bunch of people from my community also contacted the authors and got a hold of them, at which point they became aware and made a statement on Twitter. Pierre Ablin writes, "Imitation is the sincerest form of flattery", simply posting the two links. They followed up with a piece-by-piece comparison of the two papers, essentially laying out a case of plagiarism. At this point Twitter, Reddit, and the various forums sprang into action and looked not only into this paper but also into previous papers by the same author, and dug up some worrisome conduct. And not only the Western internet, but the Chinese internet too. Without revealing too much: the author in question happens to be studying at a Chinese university and working for Chinese companies, so Chinese social media also sprang into action, comparing papers by this author with previous works, and generally revealing an approach to research where you take a paper and redo the visualizations, often in what is actually a better way, but it's nevertheless a copy. Besides the first paper, there's a strong case for a second paper also being plagiarized, but that case is much harder to make; people have pointed out things like similarities in formulas, similarities in the signal patterns used in the visualizations, and so on.

In response to this, the co-authors of that first author, as well as the supervisors, quickly distanced themselves from him, saying they didn't know, they weren't careful enough when looking at the work, they weren't that involved. The first author responded by taking his personal homepage offline (though you can still access it via the Internet Archive) and retracting the paper from arXiv with the comment "given idea overlapped with existing work". Yet by the rules of arXiv, a retracted paper is still visible: if you simply go to v1 of the paper, you can see the original version. The first author then went on social media and issued a somewhat-apology, saying that he had made serious omissions, that he had conducted the literature review for the paper before the other paper was out, and that he hadn't noticed at the time of publication that the ideas overlapped. In general, he tried to give an account of why the two papers are so similar and how this came about by just chance, people having the same kinds of ideas, and so on. Safe to say, this usually flies: most cases of academic plagiarism, especially in machine learning, are never caught or even pursued, because you can always make the case that it's merely a similar idea, that the papers are a bit different, and whatnot. In this case, though, the evidence was so clear that I think the pressure became overwhelming, and the author edited the post to essentially admit that he plagiarized the two papers in question: he apologizes, he will stop doing it, he will learn from it, and so on.

Needless to say, this has generated a giant amount of discussion. As I said, the Twitter post by Pierre Ablin became very widely spread, Reddit was on fire, and Chinese social media talked about this at length. I was generally impressed with the amount of work people put into analyzing the similarities between papers. However, the best comment goes to a combination of one user (I don't know who it is) and Google Translate. It starts with: "After eating melon for a few days, you have already said a lot about this matter." This is so cool. This is my new go-to saying. I guess it's probably a colloquial way of saying "after watching this unfold for a few days" or something like that, but this is going to become my new go-to sentence: "After eating melon for a few days, I've decided..." Excellent. I love it.

In addition to that, other people have come out with various stories of plagiarism, for example Shah Huasan, about code and papers that he reportedly only ever submitted to blind review, yet other papers appeared that are essentially a copy of his work. That is even more shocking: it's not simply a person going on arXiv and pulling down publicly available information without citing it, but essentially abusing their position as an anonymous peer reviewer. As I said, the number of things happening like this is uncountable, and most of it will never get out, nor will anything be done about it. The authors of the second paper here have retracted it from ICCV; ICCV has confirmed that the paper will not be published and asked everyone not to call it "the ICCV paper", which is why I dub it "the paper formerly known as the ICCV paper". If you get this reference, you're old.

So, is this the end of the story? I don't know. As I said, plagiarism is still widespread and most of it goes undetected, and even from this particular author, the apology is specific to these two papers; people have pointed out similarities in other works as well. Given that he first tried to simply go silent, then denied, and only now admits to these two papers, combined with the fact that this author has had a record number of papers in a very short amount of time, it could be that this is simply a case of someone who let themselves be "inspired" by concurrent work a few times, and, seeing how successful this was and that they weren't getting caught, became more and more blunt in the plagiarism as time went on. I can't state that for sure; I don't know, and no one will ever be able to prove anything like this, so we'll just have to live with the fact that it is what it is. It goes on pretty much everywhere: I've personally witnessed quite a number of cases of people borrowing each other's ideas and even code, and what are you going to do? Nothing. Needless to say, this isn't a problem we can solve easily with simple plagiarism checkers, which usually check for some sort of n-gram overlap. And even if we had a more sophisticated checker, it wouldn't help: as soon as people know it exists, they're going to game it. So we'll have to live with this for the foreseeable future.
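To illustrate how crude that n-gram check is, here's a toy sketch of the general idea (not any particular plagiarism checker's actual algorithm):

```python
def ngrams(text, n=3):
    """Set of word n-grams in a text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def ngram_overlap(doc_a, doc_b, n=3):
    """Jaccard overlap of word n-grams: the kind of signal a simple
    plagiarism checker relies on. Trivially defeated by paraphrasing."""
    a, b = ngrams(doc_a, n), ngrams(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# A light paraphrase already drives the score to zero:
original = "momentum can be added to the residual update of a resnet"
reworded = "one may augment a resnet's residual update with momentum"
print(ngram_overlap(original, reworded))  # 0.0, despite the same idea
```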
There's a new paper called "On the Opportunities and Risks of Foundation Models", by everybody at Stanford. Every person has a say in this; there are many, many authors on this paper. It's sort of a position paper on what they call foundation models. A few things. What it actually is, mostly, is a literature review. On what, you might ask? Well, foundation models. "Foundation models" is this paper's framing of models that are large, pre-trained on large data, and then transfer-learned; think BERT, GPT-3, CLIP, which they also name in the text. They say: "A foundation model is any model that is trained on broad data at scale and can be adapted to a wide range of downstream tasks."

Now, I have multiple problems with this 200-page monstrosity. The first one is with the authorship itself. How do so many people work together on a single paper? The answer is: they don't. Two people were sort of the integrators, and I guess the writers of the introduction and so on, and then the individual sections of the paper were each authored by a subgroup of people. These subsections are even labeled with their individual authors, and even contain things like joint first-authorship of a subsection. In general, I'll say: hey, it's a free world, do whatever you like. But this seems to be a bit of a gaming of the citation system in academia. Citations aren't weighted by the number of authors or by how much you contributed to anything; if your name is on there, you get a citation. And this paper, ironically, might serve as a sort of foundation, to be cited from many, many other papers. Ask yourself: if someone wrote the section about adaptation of foundation models, should they really get a citation when someone cites the section on misuse, authored by a completely different set of authors? My personal opinion is no. This isn't a paper; it's a collection of papers, like a compendium, a book. So it seems appropriate that when we cite this work, we cite the individual section, along with only the authors who wrote that section.

Another problem that I (and other people) have is that it's not really a new thing per se. Essentially, these people simply rebrand large pre-trained models as "foundation models". It's a very shaky definition, and it seems like a kind of grab at a particular field or subfield by this particular group of people, rather than simply contributing to the research landscape as a participant. There's a serious disconnect between the definition they give for foundation models ("a foundation model is any model that is trained on broad data at scale and can be adapted to a wide range of downstream tasks")
and what they actually talk about. Usually, in technical subjects, we put up a definition of something and then derive our conclusions, our experiments, our hypotheses, and so on from that definition. This paper does something completely different: essentially none of the opportunities and risks they mention are consequences of the definition. For example, there's a section on loss of accessibility. Why? If foundation models are simply models that can be adapted to things, how does that necessitate a loss in accessibility? How does it necessarily impact the environment? I can see that the large language models we have today do these things, but how do you derive that from the definition? You can't. And how does the definition justify 200 pages? Essentially, you would have to amend the definition of foundation models to say something like: there are efforts that cost a lot of money, and a lot of other things are built upon these efforts, which means anything built on top of them inherits all their properties, including all the problems and all the design decisions; and since it's costly to produce them, it's also costly to change them, so there are opportunity costs and dangers of centralization. And even that's about it, with the extended definition.

If you think about the actual definition, what comes to mind for me is something like a ResNet-50. A ResNet-50 pre-trained on ImageNet is used throughout the world, in so many applications; a lot of people build on it. Yet the number of people who actually fine-tune GPT-3 outside of OpenAI is zero, and the number of actual products built on in-context learning is very limited. So if GPT-3 counts as a foundation model, certainly a ResNet-50 does. After all, it is a model trained on broad data at scale. Here is the paper on the ImageNet dataset: "large-scale", ergo at scale; "diversity", ergo broad data. They say collecting ImageNet was a challenging task, so not exactly cheap, and they describe the data-collection scheme, and so on. And let's not forget the centrality, bias, and data-quality questions around a ResNet-50: the ImageNet dataset contains literal pornographic material; I've discussed this in previous videos. So if a ResNet-50 doesn't count as a foundation model, then I don't know what does. Just because it's a few years old and doesn't cost as much as today's models, it still fits every bit of the definition of a foundation model. Yet ResNet-50 is mentioned exactly once in this 200-page document, only to contrast it with CLIP. It's pretty clear what they actually mean: GPT-3, namely, which is mentioned over and over and over, 65 times in the entire document, topped only by BERT, which is mentioned a whopping 174 times (though sometimes as a sub-part of another word). So rather than deriving conclusions from the definition, the paper is actually a series of anecdotes about some models that happen to fit the definition. To me, that doesn't justify the new term, especially if you stray that far from the definition. That's like me writing a paper on "the opportunities and risks of groupian models", which is any model containing an abelian group, and writing 200 pages about how bad GPT-3 is, because, after all, GPT-3 surely contains an abelian group somewhere in there.
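For what it's worth, the "adapted to a wide range of downstream tasks" part of the definition is exactly the everyday transfer-learning recipe that a pre-trained ResNet-50 enables. A minimal sketch with torchvision (the class count and the random tensors are placeholders for a real downstream dataset):

```python
import torch
import torch.nn as nn
from torchvision import models

# A ResNet-50 pre-trained on ImageNet: "trained on broad data at scale".
model = models.resnet50(pretrained=True)

# Adapt to a downstream task: freeze the backbone, swap the classifier head.
for param in model.parameters():
    param.requires_grad = False
num_classes = 10                       # placeholder downstream task
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head gets trained.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One toy training step on random tensors, standing in for a dataloader:
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, num_classes, (4,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```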
Now, with all the grumpiness (I know it can get a bit much): the paper is actually a great literature review on models such as GPT-3, DALL-E, and CLIP, in general the current models that are trained on large-scale data and might not be entirely accessible to everyone. I'm not trying to deny that there are dangers in that. But let's keep in mind that, for example, GPT-2 was also considered incredibly expensive and inaccessible, and, if you remember, even too dangerous to release at the time, yet those dangers never actually materialized. And as far as centralization of models and choke points go, I'm pretty sure it has happened before in machine learning that pretty much everyone used the same couple of two or three really well-working algorithms... no, can't think of any. None of them. Well, okay. So the community will have to decide whether it accepts this new term, foundation models, or whether we just call GPT-3 and BERT by their names. [Music]

Okay, next news: the NeuralHash story continues. There are various projects to create collisions or to run NeuralHash by itself; there's even one in the browser. I also have one; if you want, watch the video. We also now have reports, via Roboflow, that ImageNet contains naturally occurring hash collisions: here you can search ImageNet for images that produce the same NeuralHash. Apple has responded by saying that there's another server-side check to prevent wrong collisions, and so on. Safe to say, this NeuralHash system isn't the most effective: you can evade it easily, and you might be able to force collisions. Still, we have a report from KRON4 that a Bay Area doctor was found with 2,000 images and videos of child pornography. We don't know whether this is already a result of this system; if it is, you know, good job, it works as intended. That makes me happy that it worked here; it still doesn't make me more comfortable with the privacy implications of NeuralHash in general.

Next news: Facebook AI Research releases a new paper called "Control Strategies for Physically Simulated Characters Performing Two-Player Competitive Sports". This is a reinforcement learning framework for control applications, mostly humanoids doing sports, but essentially the core parameters are that there are a lot of degrees of freedom, in some sort of two-player game, in a continuous environment. I just love that the algorithm seems to come up with actually cool strategies and good control policies. It's not easy for these characters to balance themselves in the first place, and then to fight a boxing match where each one tries to punch the other to the ground is quite difficult. You can see the difference between this new framework and a comparison framework; I'd argue the baseline is the more interesting one, certainly. Oh, no. If you're interested in control and two-player games, check it out. [Music]

Tesla had its AI Day. This was a big presentation where they talked about all their advancements in AI. I don't know if I should make an entire reaction video to that; I think I will. In the meantime, Lex Fridman has made an excellent overview of the most important things that happened there, and I highly recommend you check that out. And we have... we have to talk about the Tesla Bot. The idea is that all the technology Tesla is developing for its cars can also be deployed, in a more general way, in a humanoid robot to do manual labor. This is from an article in IEEE Spectrum, showing the slide Tesla had up displaying the Tesla Bot. Besides the applications ("eliminates dangerous, repetitive, and boring tasks"), it's also supposed to be friendly. You've gotta love Elon Musk.
Now, needless to say, this is probably over-promised, both in whether it's doable at all with current or near-future technology, and in the timeline they give, which is, I think, something like a year; it's probably not going to happen as advertised. But I've come to think that Musk sometimes does things just to provoke exactly the reactions we're getting: "Elon Musk has no idea what he's doing with the Tesla Bot", "humanoid robots are way harder than Musk seems to think". Sometimes I wonder if he's like: what if I just tell them I'm going to build a robot in a year? Also, the way he introduced the robot: first, of course, there's just a mock-up slide, but then he actually brought a human in a robot suit up on stage, and the human starts acting robotic but then, of course, increasingly gets less robot-ish, and you just see Elon smiling back there. You can imagine him sitting there planning this out, like: what if we just get a human... And then the world gets to decide whether this is funny or not. I think it's hilarious. This is 100% hilarious.

As far as competitors go, George Hotz revealed the Comma Three, which, unlike Tesla's self-driving approach, is a device you can put into a lot of different cars: essentially one mounted unit with cameras on it that is also supposed to do driving assistance, and, I think, something like full self-driving in the near future. There's also a big, long presentation about the specs of the Comma Three, the problems with self-driving, with navigation in general, and with covering all of the edge cases. And unlike Tesla, Comma takes an open-source approach, where it actively wants the community of developers to help develop the product further. So if you're interested, the Comma Three dev kit is available to order.

Next news: CRN writes "Intel says it's winding down RealSense camera business". Intel was developing cameras, sensors, and so on for computer vision applications; now it's shutting that down to focus on its core business. A bit of a loss if you had one of these or were planning on getting one. We've seen companies in the past say they're going to focus on their core business, and it's never really clear what that means: for some companies it means they're on the edge of bankruptcy, while for others it means they just want to make even more cash. Needless to say, if you're looking into sensors and vision hardware, Intel is no longer the place to do so.

But IBM might be. PR Newswire writes "IBM unveils on-chip accelerated artificial intelligence processor". Okay, this is not a camera or a sensor; I just thought it was a great segue into the next segment. IBM unveiled the Telum processor, which has an AI accelerator on chip, essentially a matrix multiplier. Their idea is to bring the compute to where the data is, and so on. It's good to see a bit of competition in the market for accelerator chips.

Okay, Kaggle has a new competition up called Lux AI. This is essentially a two-player game where you control units and have to collect as many light sources as possible to survive the night. If you're interested in game-playing agents, give the Lux AI challenge a try. Or, if you're interested in game-playing agents in a very large world together with lots of other agents, look into AIcrowd's Neural MMO challenge: here you deploy an agent into a world with not just one other player but many other players, over longer periods of time. The goal is to collect resources and at the same time keep others from collecting theirs. It's very cool to see these kinds of challenges. You don't have to use reinforcement learning or anything; you can just script your bot if you want to (see the sketch below), but it's usually cool to see which approaches win in the end in these very open-world challenges. Very cool; give it a try.
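To make "just script your bot" concrete, here's a toy rule-based agent. The observation fields and `Action` names are hypothetical stand-ins, not the actual Lux AI or Neural MMO starter-kit APIs; each competition ships its own interface:

```python
from enum import Enum

class Action(Enum):
    MOVE_TO_RESOURCE = 1
    COLLECT = 2
    RETURN_TO_BASE = 3

def scripted_policy(obs):
    """A hand-written priority list instead of a learned policy.

    `obs` is assumed (hypothetically) to expose the unit's cargo,
    its position, and the nearest resource tile.
    """
    if obs["cargo"] >= obs["cargo_capacity"]:
        return Action.RETURN_TO_BASE       # full: drop off what we carry
    if obs["position"] == obs["nearest_resource"]:
        return Action.COLLECT              # standing on a resource: mine it
    return Action.MOVE_TO_RESOURCE         # otherwise walk toward one

# Toy call with a fake observation, standing in for the real env loop:
obs = {"cargo": 0, "cargo_capacity": 100,
       "position": (3, 4), "nearest_resource": (5, 4)}
print(scripted_policy(obs))  # Action.MOVE_TO_RESOURCE
```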
Okay, at this point I want to shout out dribnet, who has been taking a step in a bit of a different direction, using the CLIP model and its image-generation capabilities for pixel art, and this looks very, very cool. He's been generating various skylines and going through the ABCs with various words: zygote and zoo, there's Wellington, a yacht and a yakuza, x-ray and xenomorph. I love the idea that going to pixel art blurs the line between human-created and machine-created even more. A lot of these pictures look absolutely fantastic. This can be used to create funny pictures, but it could also be combined with other tools, for example to create video-game assets and various other things where pixel art is generally used.

Okay, following up a bit on the plagiarism issue: the reinforcement learning subreddit saw a big post saying that multi-agent reinforcement learning papers at top conferences are ridiculous, essentially alleging that the entire field has a problem with unfair experimental tricks, or cheating. Essentially, what you do is implement really crappy baselines and then have your model be bigger, more powerful, take a longer time, have more information, and do a better hyperparameter search. That's essentially what we're used to from the entire field of machine learning, but the subfield of multi-agent reinforcement learning, because it's super noisy and the experiments are mostly not standardized, apparently has a particularly large problem with this. People have chimed in saying they've published in this field and this is absolutely true, and also that papers with solid experiments aren't getting published because, I guess, they're not as flashy as the papers with the tricked experiments. Needless to say, this is another bit of evidence that you shouldn't take the experimental results, or any individual paper's claims, at face value.

Benzinga writes "Elon Musk, Lex Fridman see language evolving with help of artificial intelligence". Wow. This sounds like they interviewed Elon Musk, analyzed years of his work, integrated everything... No. No. They just looked at two tweets. They looked at two tweets and made a news article out of that. Alright: "AI helps a lot of people." Tweeting this right now. Tweeting this right now! I want a news article tomorrow. You hear that? Tomorrow.

Alright, now we come to our segment of AI news questions, which I answer absolutely without any context or reading the article. Here we go. ZDNet asks: can AI improve your pickup lines? Wait, actually, I need to... here's what it comes up with: "Do you want to have a cup of coffee?" Wow. You know, I guess for most people who use pickup lines, simply saying "please don't use pickup lines, just ask them for coffee" is an improvement. So the answer is: yes. The Inquirer asks: what if The Simpsons were voiced by artificial intelligence? I don't care. As long as Bart is still in Scientology, all is good. Pressenza asks: artificial intelligence or human intelligence? I don't know. Probably depends on the task you want to solve. Analytics Insight asks: which career should you choose, data science versus artificial intelligence? Just learn to program, you'll be fine. Just learn to program. The BBC asks: is AI biased? Yes. The answer is yes, but probably not in the ways that the loudest people tell you.
It's probably biased in a bit more of a boring way, and probably a bit less in an "oh my god, this is terrible" way. Ricochet asks: when will artificial general intelligence actually arise? To this technology summit here: I don't know, but neither do they. Design News asks: how smart can a machine get? I don't know. What kind of question is that? Like, seven smart? A machine can probably get seven smart. Cool. And Forbes asks: is artificial intelligence contributing positively to parenting? Let's check it out. Google: what to do if my baby turns blue? "If your baby is turning blue, calling 911 is very appropriate." Thanks, AI. I guess the answer is yes. Alright, that was it for our news questions. If you see a news question and want it answered without me reading anything, let me know.

Okay, a few last shout-outs. If you're old like me, you remember the good old days of Blobby Volley. Well, here's a 3D volleyball reinforcement learning environment built with Unity ML-Agents. Check it out. Also, enliteAI releases Maze, an applied reinforcement learning framework for real-world problems. It doesn't really have anything to do with an actual maze; it is yet another RL framework. But RL frameworks are kind of like... there are many of them, and most of them get something wrong and something right, and if you haven't yet found one that fits you, maybe give this one a try. And lastly, Metaphor releases Wanderer 2, a large language model that lets you search through 2.5 million articles posted on Hacker News. And yes, Hacker News has a notoriously crappy search function, so: thank you. Cool.

This was it for this week's ML News. Thank you so much for checking in, and for checking out Weights & Biases. That being said, have a great rest of the week. I'll see you next Monday. Ciao! [Music]
Info
Channel: Yannic Kilcher
Views: 24,065
Rating: 4.926641 out of 5
Keywords: deep learning, machine learning, arxiv, explained, neural networks, ai, artificial intelligence, paper, plagiarism, research plagiarism, ml plagiarism, foundation models, tesla ai day, comma three, comma 3, george hotz, elon musk, stanford, stanford ai, stanford hai, resnet, momentum resnet, lux ai, neural mmo, lex fridman, dribnet, clip pixelart, pixelart, ai art, ai pixelart, deep learning tutorial, what is deep learning, introduction to deep learning, ml news, mlnews
Id: tunf2OunOKg
Length: 32min 36sec (1956 seconds)
Published: Fri Aug 27 2021