Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API

Video Statistics and Information

Captions

Breaking news: today, Deepgram is announcing Nova-2, the world's best-performing ASR model to date. Now, I know what you're thinking: best performing? What does that mean? Best by what metrics? Great question. When we discuss Nova-2, exactly three metrics come to mind: accuracy, speed, and cost. And while the original Nova-1 model that we announced four months ago is still the world's leader in the ASR market, we here at Deepgram are never truly satisfied. So we built a better model.

Nova-2 is clearly the winning speech recognition model across all metrics and all domains, and it's extremely simple to use. You can either dive right in and use our API to write some code that will allow you to make the most of your audio data right away, just like this, or you can test out our models through our API playground, which will allow you to get a complete transcription of any audio you'd like in seconds, just like this. No coding required. See how fast that was? That was Deepgram's AI in action, and we'll walk you through a step-by-step demo towards the end of this video. But first, let's talk about those metrics. Ready? Let's go.

Let's begin with accuracy. If you look at this graph, you'll notice that Nova-2 has the lowest word error rate of any of these ASR models. And just as a reminder, the word error rate, or WER, is a gold-standard metric for AI transcription. It's a way for us to quantify how accurate a model's output is: the lower the word error rate, the fewer mistakes the AI makes. And as we can see here, Deepgram's Nova-2 makes the fewest mistakes out of any of these models.

And that's just the beginning. If you check out this chart, you'll see that even if you stratify by domain, Nova-2 is the clear winner. When it comes to podcasts, for example, Nova-2's word error rate hovers around six percent. Meanwhile, other models have word error rates that hover anywhere between one and five percentage points higher, meaning they make one to five percent more mistakes than Nova-2 does. Or, to speak more rigorously, this chart reveals that Nova-2 has the lowest median word error rate among all its competitors across all four main audio domains: podcasts, videos, meetings, and phone calls.

But that's for batch transcription, a.k.a. pre-recorded transcription. What if you're trying to transcribe audio in real time? What if you don't have something already recorded and you need to use AI to transcribe you live? For example, if waiting 30 seconds to get a full hour's worth of transcription is too long, you can try transcribing yourself in real time, you know, so the words appear on screen as you say them. If that's your use case, as usual, Deepgram's got you covered. Here are the stats for real-time streaming word error rate. Once again, Deepgram has the lowest median word error rate across various domains. So if accuracy is something that you care about as a user, whether you've already recorded your audio or you're doing it live, Nova-2 is the obvious choice.

But all right, what about speed? Well, let's measure how long it takes for each ASR model to transcribe an entire hour's worth of audio. Looking at this chart, we can see that, at their slowest, models from other providers could take anywhere between 10 and 15 minutes to transcribe an hour's worth of audio. Down the middle of the road, speed improves a bit, with these mid-tier models hovering at a little over four minutes. But Deepgram? Under 30 seconds. Or, to frame these numbers in a more high-level analysis, here's the punch line: Nova-2 is fast. Very fast. And if you're, say, a podcast host with a three-hour recording, or a call center with thousands of hours of phone calls per day, or a company that needs to incorporate AI-driven speech recognition into your product, then our under-30-second transcription is the foremost choice for those who value productivity and efficiency.

But what about price? Is such high-quality AI extremely expensive? Not at all. As usual, Deepgram's AI models are the most affordable ones on the market: less than half a cent per minute of transcription. As we've mentioned before, you won't have to break the bank to use Deepgram's models. We value affordability as much as we value accuracy and efficiency, and that philosophy especially applies to Nova-2, which is anywhere between two and a half and five and a half times more affordable than other providers. And as we said in the past, this was not an accident. This was a design choice. We wanted Nova-2 not only to be the best AI out there, but also the most accessible and most affordable.

How did we do it? Well, it's all about our unique, patented AI architecture and training process. Without giving away the secret sauce, here's how Deepgram's AI works. First, Nova-2's architecture goes far beyond the classic vanilla CNN-Transformer model. While you may find an RNN or a CNN within our architecture, the way in which these neural networks are wired to each other is what gives Deepgram the edge over other ASR models. Second, the training data we use to train our models is unique itself. Whether we gather audio from YouTube, various podcasts, public-domain resources, and so on, the way Deepgram labels this data is truly unique to us. So even though some of the audio in our training set may be the same as audio in the training sets of other ASR providers, Deepgram's efficient and accurate data labeling style is what really pushes us beyond what these other providers are able to do. After all, we've been fleshing out our data labeling processes basically since we were founded in 2015, and over the past near-decade, all those years of iterations and improvements manifest themselves in Nova-2.
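The three metrics discussed above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not Deepgram's evaluation code: it computes word error rate in the conventional way (word-level edit distance divided by the number of reference words), the speed-up over real time implied by "an hour of audio in under 30 seconds", and an upper bound on cost at "less than half a cent per minute". The function names, the lowercase-and-split-on-whitespace normalization, and the sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than playback speed the transcription ran."""
    return audio_seconds / processing_seconds


def max_cost_usd(audio_minutes: float, price_per_minute: float = 0.005) -> float:
    """Upper bound on cost, using half a cent per minute as the ceiling."""
    return audio_minutes * price_per_minute


# One wrong word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
# One hour of audio in 30 seconds; "under 30 seconds" means at least this.
print(speedup_over_real_time(3600, 30))  # 120.0
# A three-hour podcast at the half-cent ceiling, in dollars.
print(round(max_cost_usd(180), 2))  # 0.9
```

By the same arithmetic, the 10 to 15 minutes quoted for the slowest providers corresponds to roughly 4 to 6 times real time, and a little over four minutes to a bit under 15 times.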
The result? Well, you've already seen it. In other words, Deepgram outperforms everyone else because our models not only listen to audio in a unique way, but what they listen to is unique as well. And we don't take for granted just how difficult it is to achieve all three of these feats. Remember, it's simple to create an accurate model if you're willing to sacrifice speed. It's simple to create a fast model if you're willing to allow it to make multiple mistakes. And it's also decently feasible to create a fast and accurate model, as long as it's expensive. However, we at Deepgram don't want to make any of these sacrifices for you, our fellow developers, business leaders, and AI enthusiasts. We work as hard as possible to build AI that achieves this trinity of features.

It takes an extremely skilled and dedicated research team to build an AI with this level of accuracy. It takes an equally strong engineering team to make such clever and thorough optimizations for speed. And of course, it takes a meticulous business approach, alongside a dedicated community of customers from the independent level to the enterprise level alike, to achieve the affordability that we do. That's why Deepgram is extremely proud to be able to deliver what no other provider can: the perfect, near-impossible combination of accuracy, speed, and affordability.

But all right, that's enough talking about Nova-2.
Let's go over how to use it. Here, I've compiled a little Colab notebook to get started. However, you can use whatever IDE you desire. This notebook, by the way, is readily available on our website. All you have to do is open it up. Note that today's demo is in Python, but we offer multiple software development kits, or SDKs, from Python to Node to .NET and Go, so that you can use our API in your language of choice.

Nevertheless, to get started, all I have to do is head over to deepgram.com and log in. I'll create an API key and then plug it into the variable called dgkey. If you want to be a bit safer, you can use an environment variable, but for the sake of this demo, I'll just plug it in here. Now I'll just have to fill in a few other details. The audio file that I want to transcribe is an MP3, so I don't have to change this MIME type variable, but just know that Deepgram can transcribe these types of audio files on screen. All right, the audio that I want to transcribe will be placed in the root directory, so I'll just set the directory variable to a dot. And now I just have to run the cells. The first cell installs dependencies, the second cell reminds me to upload my audio or audios of choice, and the final cell is where the magic happens. This line is the one that calls the AI to transcribe, and once this cell has completed running, we'll have a pretty JSON that has a complete transcription alongside metadata, word-level timestamps, and more. If we just want to see the finished transcript, however, we can run this bottom cell, and voilà: we've just used the world's most advanced speech recognition AI.

And finally, if you just want to test out the models without having to delve into any code, check out our API playground. It's probably the easiest way to try out our AI right now. Just head over to playground.deepgram.com, upload an audio file or choose one of the default audios we provide, and hit run. Your transcription should be available in seconds. Note that there are a few other options that you can check out, like selecting which model you want or specifying whether you want us to summarize your audio in addition to transcribing it. But overall, the playground should be simple to explore. Test it out with any audio you have on hand. After all, you have the world's best speech recognition AI right at your fingertips.

And that is the power of Nova-2. So whether you're a YouTuber trying to streamline your creation process, or an avid Zoom user who needs to record multiple calls, whether for business or pleasure, or even an enterprise, call centers and Silicon Valley startups alike, you'll have affordable access to a blazing-fast speech recognition AI with incredible accuracy. And if you sign up for Deepgram today, you'll receive $200 in free credits without even having to put a credit card down. That's up to 45,000 minutes of free transcription. All you have to do is create a username and a password. Sign up today to see the best that the AI community has to offer, and as always, follow Deepgram for more AI content.
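The notebook from the demo is not reproduced in the captions, so the following is only a rough sketch of what the transcription step might look like, not the notebook's actual code. It uses Python's standard library rather than the Deepgram SDK, and it assumes the pre-recorded endpoint `https://api.deepgram.com/v1/listen`, the `model=nova-2` query parameter, the `Token` authorization scheme, and the `results.channels[0].alternatives[0].transcript` response layout as described in Deepgram's public REST documentation; check the current docs before relying on any of them. The environment variable name `DEEPGRAM_API_KEY`, the file name `episode.mp3`, and all function names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed endpoint for pre-recorded (batch) transcription.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio: bytes, api_key: str, mimetype: str = "audio/mpeg",
                  model: str = "nova-2") -> urllib.request.Request:
    """Build the POST request: raw audio in the body, options in the query string."""
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=audio,
        headers={"Authorization": f"Token {api_key}", "Content-Type": mimetype},
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Pull the plain transcript out of the (assumed) response layout."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe_file(path: str, api_key: str, mimetype: str = "audio/mpeg") -> dict:
    """Send one audio file and return the parsed JSON response."""
    with open(path, "rb") as audio_file:
        request = build_request(audio_file.read(), api_key, mimetype)
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def main() -> None:
    # Reading the key from an environment variable is the "safer" option
    # mentioned in the demo; the variable name here is hypothetical.
    dg_key = os.environ["DEEPGRAM_API_KEY"]
    # "." mirrors the demo's choice of the root directory; the file name is made up.
    result = transcribe_file(os.path.join(".", "episode.mp3"), dg_key)
    print(json.dumps(result, indent=2))   # full JSON: metadata, word timings, etc.
    print(extract_transcript(result))     # just the finished transcript
```

Calling `main()` with a valid key set and an MP3 in the working directory would make one network request; for another audio format, the `mimetype` argument plays the role of the MIME type variable mentioned in the demo.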

Info

Channel: Deepgram

Views: 7,265

Id: PSaVX6ST-FM

Length: 9min 15sec (555 seconds)

Published: Tue Sep 19 2023