The Narrated Transformer Language Model
Video Statistics and Information
Channel: Jay Alammar
Views: 60,002
Rating: 4.9402986 out of 5
Id: -QH8fRhqFHM
Length: 29min 29sec (1769 seconds)
Published: Mon Oct 26 2020
Is it just me or is NLP way overhyped?
As someone who worked in NLP at a tech company for 2 years, I was blown away by how little of our data was cleaned, prepped and ready for analysis. One project I worked on was training an LSTM to extract skills and education from resumes. The company refused to force customers to tag these tokens in their text, so the burden was passed on to employees. This became a massive bottleneck!!
People act like the tech that makes billions for Google, Facebook, etc. is equally relevant to their companies, and that’s just not the case. You need to trick the customer into doing data prep for you (Google had users transcribe scanned New York Times pages into text; Facebook lets you tag your friends’ faces; etc.). Without tasking your customers like that, you’ll never accrue the volume of data that’s truly necessary to take advantage of SOTA deep learning methods.
Not to say that transformers aren’t really powerful and impressive - they are! But I think the applicability of these tools to all companies has been grossly overinflated in public perception.
Bayesian models tend to learn far more from limited data than a neural network does, so I think they’re more applicable to the average tech company than whatever state-of-the-art paper Google pushes out. (Bayesian models are far too slow on truly massive data sets, but again, that’s usually not the situation the average tech company is in.)
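To make that small-data point concrete, here’s a toy sketch (the data, labels, and model choice are mine, not the commenter’s) of fitting a simple Bayesian classifier on just a handful of labeled examples - the kind of setup where a large neural network would have almost nothing to learn from:

```python
# Toy illustration: a Naive Bayes text classifier trained on only four examples.
# The Bayesian model only has to estimate word counts and class priors,
# so it can still produce sensible predictions from very little data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great product", "terrible service", "loved it", "awful experience"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()  # word counts + Laplace smoothing, no millions of parameters
clf.fit(X, labels)

print(clf.predict(vec.transform(["awful service"])))  # -> [0]
```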
Hi r/learnmachinelearning,
In this video, I present a simpler intro to transformers than my post "The Illustrated Transformer". I hope it encourages people who are new to the field to feel more comfortable digging in and learning more.
Language modeling is easier as transformer intro material because you don't have to worry about 'encoder' and 'decoder' components. I've also used two distinct examples to showcase the value of the two major components of a transformer block (self-attention and the FFNN).
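For readers who want to see those two components side by side, here is a minimal, illustrative PyTorch sketch of a decoder-style transformer block - not code from the video, just one common way to arrange a self-attention sub-layer followed by a position-wise FFNN (the dimensions are GPT-2-small-like and purely for illustration):

```python
# A minimal sketch of the two sub-layers discussed in the video:
# masked self-attention followed by a position-wise feed-forward network (FFNN),
# arranged as a decoder-only (language-model-style) transformer block.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        # Self-attention lets each position gather information from other positions.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # The FFNN then processes each position independently.
        self.ffnn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: a language model must not attend to future tokens.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)       # residual connection + layer norm
        x = self.ln2(x + self.ffnn(x))   # residual connection + layer norm
        return x

# Example: a batch of 2 sequences, 10 tokens each, embedding size 768.
block = TransformerBlock()
out = block(torch.randn(2, 10, 768))
print(out.shape)  # torch.Size([2, 10, 768])
```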
Hope you find it useful. Please let me know what you think.
I thought it was a big brain meme lol