Speech-to-Speech Demo of TinyChatEngine on Nvidia Jetson Orin Nano

Video Statistics and Information

Hello, my name is Jimmy Sean. I'm currently an intern at the MIT HAN Lab.

LLMs are a very powerful tool. However, there are many pain points, such as the amount of computation, memory, and energy required to use them. TinyChatEngine demonstrates that LLMs can be more accessible and can run on more memory-constrained hardware, such as edge devices. Having AI on the edge has many advantages, including increased security, lower latency, and lower cost.

So right now I'm going to present a speech-to-speech demo of Llama 2 7B on a Jetson Orin Nano. Our demo has three parts: a speech-to-text model, the language model, and a text-to-speech model.

[To the device] "Name one tourist attraction in San Francisco."

The model we are using is Llama 2 7B Chat, which originally requires 14 gigs of memory, and we are deploying it on a Jetson Orin Nano, which has seven gigs of memory. So we need to compress the model so that it fits on the device. To do so, 4-bit integer quantization is used, where we reduce the bit width of the weights to 4 bits to cut down the size.

[To the device] "Name one difference between MIT and Stanford."

As I was saying, 4-bit integer quantization is used to reduce the bit width. However, doing so usually leads to a drop in accuracy, so we developed activation-aware weight quantization (AWQ).

[Model response, partially captured] "... a range of academic offerings, including business, law, and medicine."

We then implemented TinyChatEngine, a universal low-bit inference library designed for efficient deployment of quantized LLMs on edge devices. Its flexibility allows it to support many different quantization techniques and different instruction sets and accelerators. TinyChatEngine can pave the way for more research to be done so that LLMs can be accessible everywhere and anywhere.

To learn more, check out AWQ and TinyChatEngine in the MIT HAN Lab GitHub repos. Thank you.
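Editor's note: the memory figures quoted in the talk can be checked with a few lines of arithmetic, and the idea of reducing weights to 4 bits can be illustrated with plain round-to-nearest quantization. The sketch below is an illustration only, assuming 16-bit weights for the original model and ignoring the storage overhead of scales and offsets. The function names are hypothetical and do not come from TinyChatEngine or AWQ, and the activation-aware part of AWQ is not shown.

```python
# Illustrative sketch only: not TinyChatEngine or AWQ code.
# All function names here are hypothetical.

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Ignores quantization metadata (scales, offsets) and activations.
    """
    return num_params * bits_per_weight / 8 / 1e9


def quantize_group(weights, bits=4):
    """Round-to-nearest quantization of one group of floats to `bits` bits.

    Returns integer codes in [0, 2**bits - 1] plus the scale and offset
    needed to map the codes back to approximate float values.
    """
    qmax = (1 << bits) - 1            # 15 for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0   # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


if __name__ == "__main__":
    print("16-bit weights:", weight_memory_gb(7e9, 16), "GB")
    print(" 4-bit weights:", weight_memory_gb(7e9, 4), "GB")

    group = [-0.8, -0.1, 0.0, 0.3, 0.75]
    codes, scale, lo = quantize_group(group)
    print("codes:", codes)
    print("reconstructed:", dequantize_group(codes, scale, lo))
```

Under these assumptions, 7 billion weights take about 14 GB at 16 bits and about 3.5 GB at 4 bits, which is consistent with the talk's point that the compressed model fits in the device's memory. The gap between the original and reconstructed values is the rounding error behind the accuracy drop the speaker mentions.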
Channel: MIT HAN Lab
Views: 3,836
Id: Bw5Dm3aWMnA
Length: 3min 16sec (196 seconds)
Published: Sun Sep 10 2023