What are Transformers (Machine Learning Model)?

Captions
No, it's not those Transformers, but they can do some pretty cool things. Let me show you. So, why did the banana cross the road? Because it was sick of being mashed. Yeah, I'm not sure that I quite get that one, and that's because it was created by a computer. I literally asked it to tell me a joke, and this is what it came up with.

Specifically, I used GPT-3, or a generative pre-trained transformer model. The three here means that this is the third generation. GPT-3 is an autoregressive language model that produces text that looks like it was written by a human. GPT-3 can write poetry, craft emails, and evidently come up with its own jokes. Off you go. Now, while our banana joke isn't exactly funny, it does fit the typical pattern of a joke, with a setup and a punch line, and it sort of, kind of, makes sense. I mean, who wouldn't cross the road to avoid getting mashed?

But look, GPT-3 is just one example of a transformer: something that transforms one sequence into another. Language translation is a great example. Perhaps we want to take the sentence "Why did the banana cross the road?" and translate that English phrase into French. Well, transformers consist of two parts: there is an encoder and there is a decoder. The encoder works on the input sequence, and the decoder operates on the target output sequence.

Now, on the face of it, translation seems like little more than a basic lookup task: convert the "why" in our English sentence to the French equivalent, "pourquoi". But of course, language translation doesn't really work that way. Things like word order and turns of phrase often mix things up. The way transformers work is through sequence-to-sequence learning, where the transformer takes a sequence of tokens, in this case words in a sentence, and predicts the next word in the output sequence. It does this by iterating through encoder layers: each encoder layer generates encodings that define which parts of the input sequence are relevant to each other, and then passes these encodings to the next encoder layer. The decoder takes all of these encodings and uses their derived context to generate the output sequence.

Now, transformers are a form of semi-supervised learning. By semi-supervised, we mean that they are pre-trained in an unsupervised manner with a large unlabeled data set, and then they're fine-tuned through supervised training to get them to perform better.

In previous videos, I've talked about other machine learning algorithms that handle sequential input like natural language, for example recurrent neural networks, or RNNs. What makes transformers a little bit different is that they do not necessarily process data in order. Transformers use something called an attention mechanism, and this provides context around items in the input sequence. So rather than starting our translation with the word "why" because it's at the start of the sentence, the transformer attempts to identify the context that brings meaning to each word in the sequence. It's this attention mechanism that gives transformers a huge leg up over algorithms like RNNs that must run in sequence. Transformers run multiple sequences in parallel, and this vastly speeds up training times.

So, beyond translation, what can transformers be applied to? Well, document summaries are another great example. You can feed in a whole article as the input sequence and then generate an output sequence that's really just a couple of sentences that summarize the main points. Transformers can create whole new documents of their own, for example write a whole blog post. And beyond just language, transformers have done things like learn to play chess and perform image processing that even rivals the capabilities of convolutional neural networks.

Look, transformers are a powerful deep learning model, and thanks to how the attention mechanism can be parallelized, they are getting better all the time. And who knows, pretty soon maybe they'll even be able to pull off banana jokes that are actually funny.

If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
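
As a rough illustration of the attention mechanism described in the captions, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is not taken from the video: the function names `softmax` and `self_attention`, the three example tokens, and their two-dimensional embeddings are all made up for illustration. It also assumes identity projections, whereas a real transformer would apply learned query, key, and value projections before this step.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections (an assumption).

    Each output vector is a weighted average of all input vectors. The weights
    for a token come from the dot product of its vector with every vector in
    the sequence, scaled by sqrt(dim) and normalized with softmax.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional embeddings for three tokens, invented for this sketch
tokens = ["why", "banana", "road"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. No row depends on the result of another row, which is why this step can be computed for all positions at once rather than one after another as in an RNN.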
Info
Channel: IBM Technology
Views: 159,123
Keywords: AI, Software, Machine Learning, Artificial Intelligence, transformers, attention transformer, deep learning
Id: ZXiruGOCn9s
Length: 5min 50sec (350 seconds)
Published: Fri Mar 11 2022