Sparse is Enough in Scaling Transformers (aka Terraformer) | ML Research Paper Explained
Video Statistics and Information
Channel: Yannic Kilcher
Views: 23,055
Keywords: deep learning, machine learning, arxiv, explained, neural networks, ai, artificial intelligence, paper, terraformer, scaling transformers, nli, nlp, natural language processing, transformers memory, deep learning memory, fast transformer, fast transformers, attention, attention mechanism, attention is all you need, bert, gpt-3, google research, reversible layers, reformer, sparse attention, sparse feedforward, low-rank
Id: hgSGHusDx7M
Length: 57min 6sec (3426 seconds)
Published: Thu Dec 02 2021