BERT and GPT in Language Models like ChatGPT or BLOOM | EASY Tutorial on Large Language Models LLM

Video Statistics and Information

Captions
Hello community. Transformer language models: BERT versus GPT, which model is better? Both are based on the Transformer architecture. You have here on the left side your encoder and on the right side your decoder architecture, but if you look closely you see they share the same elements: a multi-head attention and a feed-forward network in each layer, in the encoder and in the decoder. The only difference is that you have an additional masked multi-head attention in the decoder stack. So you guessed it: the encoder stack is BERT from Google, and the decoder stack is GPT from OpenAI.

Now there is a significant difference because of this additional masked multi-head attention. GPT is unidirectional: it looks back only at the previous words to predict the next word. BERT is bidirectional: it considers the words that come both before and after the masked term and predicts what that word should be (see the first sketch below).

Also, in BERT you have a pre-training and a fine-tuning, so you can continue learning with BERT, while in GPT you have the massive pre-training and that is it; it doesn't learn anything new, so everything has to be included in the pre-training. In GPT you have the pre-training and then only few-shot learning with a prompt, but in BERT you have a pre-training and a fine-tuning for the downstream tasks. In addition, if you have new data it is easy: you add an additional layer to the BERT architecture and train the network again on the new data. You see, in GPT you would have to pre-train the whole system again, but in BERT you just add a fine-tuning layer. Now, it was found that the last layers in the BERT architecture are the most task-specific, so maybe you can even freeze the first layers (see the second sketch below).

RoBERTa is an optimization of BERT: the masked language modeling (MLM) is dynamic, the next-sentence prediction is dropped, a byte-pair encoding tokenizer is used, and larger mini-batches are used, since BERT was significantly under-trained (see the third sketch below).

With GPT-3.5 it's a question of sheer size, and its black-box nature can be restrictive. If you look at the leaderboard, you see BERT almost everywhere. And if you ask, so how can GPT learn? Well, you add reinforcement learning from human feedback. If you're interested in the ChatGPT system by OpenAI, I have a specific video for you. The next video will be in the future.
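A minimal sketch of the masking difference described above, assuming PyTorch (the video names no framework): the decoder's masked attention hides future positions, while the encoder attends in both directions.

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw attention scores, i.e. QK^T / sqrt(d)

# GPT-style causal mask: position i may only attend to positions <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
causal_attn = torch.softmax(scores.masked_fill(~causal_mask, float("-inf")), dim=-1)

# BERT-style bidirectional attention: no mask, every token sees every other token.
bidirectional_attn = torch.softmax(scores, dim=-1)

print(causal_attn)         # upper triangle is zero: no looking at future words
print(bidirectional_attn)  # dense: context from before and after each position
```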
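A hedged sketch of the fine-tuning idea above, assuming the Hugging Face transformers library (the video names no library): a new classification layer is added on top of pre-trained BERT, and the lower encoder layers are frozen, since the last layers are the most task-specific.

```python
from transformers import BertForSequenceClassification

# Load pre-trained BERT with a fresh task-specific classification head on top.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze the embeddings and the first 8 of the 12 encoder layers; only the
# top layers and the new head are updated on the downstream task data.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

# Training would then proceed with any standard loop or Trainer on labeled data.
```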
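And a minimal sketch of RoBERTa's dynamic masking, in plain Python with a hypothetical `dynamic_mask` helper and the usual 15% masking rate: instead of masking each sentence once during preprocessing as BERT did, the mask is re-sampled every time the sequence is seen.

```python
import random

MASK_TOKEN = "[MASK]"

def dynamic_mask(tokens, mask_prob=0.15):
    """Return a freshly masked copy of `tokens`; call once per epoch or batch."""
    return [MASK_TOKEN if random.random() < mask_prob else t for t in tokens]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(dynamic_mask(tokens))  # a different masking on every call
print(dynamic_mask(tokens))  # so the model sees varied MLM targets
```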
Info
Channel: code_your_own_AI
Views: 13,860
Id: ewjlmLQI9kc
Length: 2min 58sec (178 seconds)
Published: Thu Jan 19 2023