Implement Speech-To-Text on Android with .NET MAUI

Video Statistics and Information

Captions
In this video, we're going to learn how to implement speech-to-text on Android in your .NET MAUI application. Much earlier on this channel, I recorded a video on text-to-speech: you input text and have it spoken out loud by your application. That was very easy to implement back then with Xamarin.Essentials, and Xamarin.Essentials is now just part of .NET MAUI; it's just another API you can use in your MAUI application. If you want to know more about that, I highly recommend you check out the video popping up on your screen right now. But a lot of people under that video asked: hey, I want to know about speech-to-text, so kind of like the other way around. Some people also had the requirement to not go to services like Azure Cognitive Services or other AI services. While those are a perfectly good option, this video, which will be one of multiple, starting with Android, focuses on the APIs that are available on the device. Now, it's not guaranteed that this will not send the speech to back-end services, especially for Android. The documentation is kind of vague: it might be sent to external services, and it probably will be. That's going to cost bandwidth and battery, so don't just go listening all day and send everything over; that's definitely not something you want to do. But with this, you can use the platform APIs to implement speech-to-text on your device. Let's dive in.

So I just created a File > New .NET MAUI application. I did make some tweaks already so that I'm not boring you with all the plumbing, because we have lots to cover here with implementing the speech-to-text. I have it running on my physical device so that I can use the microphone, and you can see that screen mirrored on the right so you can follow everything that's going on. Now, if you've been following this channel for a while, you might know that I like my simplistic designs, or rather I'm just bad at designing: I usually just put in a couple of labels and buttons, and this one is no different. I have a VerticalStackLayout with a Label and two Buttons inside of it. The Label is going to show the recognition results, so the actual text of what I'm saying, and the Buttons are there to start and cancel the listening session.

This is already something that's good to be aware of: we're going to implement this for Android in this video; there are other videos for iOS and Windows as well, so stick around to the end to find out where those are. The behavior is a little different between the platforms. On Android, the listening stops after a short timeout by default: if you stop talking and it doesn't detect any sound anymore, it just stops. On iOS, I think it keeps listening until you cancel it, although there are probably options to stop on detected silence there as well. So be aware of these little differences between the platforms that you might want to tweak for your own application.
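For reference, here is a rough C# sketch of the layout just described. The video defines this page in XAML instead, and the binding names (RecognitionText, ListenCommand, ListenCancelCommand) are assumptions based on what is set up later in the video.

```csharp
// Minimal sketch of the page described above: a Label that shows the recognized
// text and two Buttons to start and cancel the listening session.
public class MainPageLayoutSketch : ContentPage
{
    public MainPageLayoutSketch()
    {
        var resultLabel = new Label();
        resultLabel.SetBinding(Label.TextProperty, "RecognitionText");

        var listenButton = new Button { Text = "Listen" };
        listenButton.SetBinding(Button.CommandProperty, "ListenCommand");

        var cancelButton = new Button { Text = "Cancel listening" };
        cancelButton.SetBinding(Button.CommandProperty, "ListenCancelCommand");

        Content = new VerticalStackLayout
        {
            Spacing = 10,
            Children = { resultLabel, listenButton, cancelButton }
        };
    }
}
```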
Now, having said that, we're going to write some platform-specific code, and this is based on a blog post by my good friend Vladislav Antonyuk; I hope I'm pronouncing that correctly. I like to use his full name, but I can also just call him Vlad. He is with me on the .NET MAUI Community Toolkit team and he's done an amazing job writing this blog post, so go check it out in the links below. He also has plans to wrap this into a plugin, or maybe even the .NET MAUI Community Toolkit, so by the time you're watching this you might be able to just pull in that plugin and use it in your own application. But it's always good to know what's going on under the hood, and that's what we're going to learn today.

First, before we write any code, let's go into our Solution Explorer and open the Android manifest, because we need permissions for this to work. Like I said, this might or might not send something to a back-end service, so you'll want the INTERNET permission in here; that's pretty much there by default in the templates. We also want to add the RECORD_AUDIO permission, which is the permission for the microphone.

With that in place, we go to the Solution Explorer once more, right-click the project and add a new item. I'm going to add an interface first, under Code > Interface, and call it ISpeechToText. We're going to implement platform-specific code, but on our abstract, shared code layer we need a contract that we can use. So let's make this a public interface, and I'm going to copy in the two methods that go into it. The first is RequestPermissions. It's not technically necessary, but it's nice to have so the permissions are requested automatically; you can also do that wherever you want in your code. The other one is the important one: the Listen method. It has a couple of parameters. The CultureInfo specifies the culture that's going in; in most cases, at least for this demo, it's going to be en-US, so it knows English is the language to use when turning the speech into text. If I speak Dutch while this is set to English, it will try to convert those words to English and only garbage comes out, which is not great. Then there's IProgress, a nice mechanism for providing progress updates; you'll see in a little bit how it works. Basically, whenever a bit of speech has been turned into text, it's fed back through this progress callback and we can do something with it, probably show it on our main page. And finally the CancellationToken: if we want to cancel our speech-to-text session, we cancel this token, the session is cut short, the microphone stops, and nothing happens anymore.
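As a sketch, based on the description here and on Vlad's blog post (so exact member names may differ slightly), the interface could look like this:

```csharp
using System.Globalization;

// Shared-code contract for the platform-specific speech-to-text implementations.
public interface ISpeechToText
{
    // Requests microphone permission and checks that speech recognition
    // is actually available on this device.
    Task<bool> RequestPermissions();

    // Listens until the platform decides the utterance is complete (or the token
    // is cancelled) and returns the final recognized text. Partial results are
    // reported through recognitionResult while the user is still speaking.
    Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}
```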
Listen returns a string, and RequestPermissions returns a boolean telling you whether the permissions are granted and everything is in order. So this is the interface we want to work with. The next thing to do is actually give this an implementation. Let's go over to the Solution Explorer, and because of the way .NET MAUI and the single project are set up, this is very easy to do. You can argue whether it's the nicest, cleanest thing to do code-wise, and you can definitely do this in multiple ways, but I'm going to take the simple route: go into the Platforms folder. Whenever you go into Platforms/Android, iOS or Mac Catalyst, you can suddenly write platform-specific code, and it's all in one project, at least the way it's set up right now. So I can just go into one of these platform folders and use that code from the rest of my shared code as well. It sounds kind of cryptic, but you'll see it in a little bit.

In my Android folder, I right-click, Add > New Item, make it a class this time, and name it SpeechToTextImplementation. Let's make this a public class as well, and have it implement ISpeechToText — not text-to-speech, I sometimes mix those up. Now I need to implement those two methods, so let IntelliSense help me with Implement Interface. We get both methods, each throwing a NotImplementedException.

Let's start with the easy one: RequestPermissions. Again I'm going to copy a few lines from off screen, and boom, our RequestPermissions is basically done. We're using the Permissions API from Xamarin.Essentials, which became .NET MAUI Essentials, which is now just an API in .NET MAUI; the Essentials name doesn't exist anymore. So in .NET MAUI we just have Permissions.RequestAsync, and we request Permissions.Microphone. I need to add async to my method so all the squigglies go away. Then we check the status, and we also check whether the speech recognizer service is available on this device. If both of these are true — the status is granted and the recognizer is available — we return true, and RequestPermissions is good to go.

Now, you'll see there is one weird thing here: Android.App.Application.Context isn't recognized, because there's a naming clash, which is not great. We have the namespace MauiSpeechToTextSample.Platforms.Android — my project name plus Platforms.Android, which is the default way Visual Studio generates these namespaces — and that clashes with the Android namespace, so it now tries to find Application.Context inside our own namespace. The easiest fix is to just remove the Android part from our own namespace, which also has benefits for the platform-specific versus shared code setup, as we'll see in a little bit. Remove that platform-specific bit, and now everything resolves and we've implemented RequestPermissions. So that's great.
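As a sketch, and assuming the namespace clash above has been resolved so the Android types can be fully qualified, the Android RequestPermissions might look like this:

```csharp
using Android.Speech;

// "partial" is only used here to split the sketch across snippets.
public partial class SpeechToTextImplementation : ISpeechToText
{
    public async Task<bool> RequestPermissions()
    {
        // The .NET MAUI (formerly Xamarin.Essentials) permissions API.
        var status = await Permissions.RequestAsync<Permissions.Microphone>();

        // Check that a speech recognition service exists on this device.
        var isAvailable = SpeechRecognizer.IsRecognitionAvailable(
            Android.App.Application.Context);

        return status == PermissionStatus.Granted && isAvailable;
    }
}
```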
Now the Listen method — let's implement that, and we need a lot for it. I'm going to copy a lot of code, piece by piece, so don't be afraid; I'll walk you through all of it. Instead of the throw new NotImplementedException, I paste in the first bit. This gives a couple of errors because not everything is in place yet, but what it does is instantiate a listener, which is a private field, so let's add that as well. I'm adding two private fields: a SpeechRecognitionListener, which is something we're going to define ourselves in a minute, and a SpeechRecognizer, the API from Android that makes all of this possible.

For the listener field, we create a new SpeechRecognitionListener; we'll get to that class in a minute, so let's skip over that error for now. Then we have the speech recognizer, created through the Android SpeechRecognizer API. We have to provide it with our application context, which is an Android-specific thing. If that fails for some reason and the recognizer is null, we throw an exception saying the service is not available — which should never happen, because we checked it in RequestPermissions, but better safe than sorry. We set the recognition listener, the thing we still need to create, and then we start listening with the culture that was specified, so it knows what language to listen for. And then we await using the cancellation token. A CancellationToken is a concept in C# that you can use to cancel operations; if you want to know more about anything you're seeing here that I'm not explaining thoroughly enough, let me know down in the comments. Whenever cancellation is requested on the token that was passed into this method, we call StopRecording, which we also still need to implement, and that stops the recording.

I see a couple of awaits here, so the first thing to do is make the method async again so the red squigglies go away, and then we can implement the rest. Let's start easy with StopRecording: it's just a private method that calls StopListening on the speech recognizer and then destroys it, so it releases its resources and we don't get any memory leaks. Boom, implemented. Then there is StartListening, which is called with CreateSpeechIntent. Intents are another Android concept: they provide extra data for the operation you're about to start, and there can be a lot of extras for this speech recognition stuff, so let's create a little method for that as well.
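Putting together what's been described so far, a sketch of Listen and StopRecording could look like this. CreateSpeechIntent and the SpeechRecognitionListener class follow in the next snippet, and the TaskCompletionSource wiring is my assumption about how the final result is surfaced back to the caller.

```csharp
using System.Globalization;
using Android.Speech;

public partial class SpeechToTextImplementation : ISpeechToText
{
    SpeechRecognitionListener? listener;
    SpeechRecognizer? speechRecognizer;

    public async Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken)
    {
        var taskResult = new TaskCompletionSource<string>();

        // Our own IRecognitionListener implementation (next snippet) feeds its
        // callbacks back through these actions.
        listener = new SpeechRecognitionListener
        {
            Error = error => taskResult.TrySetException(
                new Exception($"Failure in speech engine: {error}")),
            PartialResults = sentence => recognitionResult.Report(sentence),
            Results = sentence => taskResult.TrySetResult(sentence)
        };

        speechRecognizer = SpeechRecognizer.CreateSpeechRecognizer(
            Android.App.Application.Context);

        if (speechRecognizer is null)
            throw new ArgumentException("Speech recognizer is not available");

        speechRecognizer.SetRecognitionListener(listener);
        speechRecognizer.StartListening(CreateSpeechIntent(culture));

        // When the caller cancels the token, stop the microphone and bail out.
        await using (cancellationToken.Register(() =>
        {
            StopRecording();
            taskResult.TrySetCanceled();
        }))
        {
            return await taskResult.Task;
        }
    }

    void StopRecording()
    {
        // Stop listening and release the recognizer's resources
        // so we don't leak anything.
        speechRecognizer?.StopListening();
        speechRecognizer?.Destroy();
    }
}
```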
Visual Studio should help generate that CreateSpeechIntent method, but in this case you want to add the using for Android.Content at the top. Now we have this Intent, and we use ActionRecognizeSpeech, so this new intent is about recognizing speech. Then you can put all these extras on it — check out the Android documentation, there is a lot to cover here. You have the extra language preference and the language tag, for which we pass the culture name again, plus some more language-related extras; I've got a couple of things in comments that you'll want to check out. There's the extra for the speech input complete silence length: that's the timeout for when you stop talking, where it tries to detect whether you're done. And this one is important: ExtraPartialResults. By default that value is false, but you want to set it to true if you want that scrolling, popping-up text effect; while you're speaking, progress is reported back in little chunks and we make that show up on our screen. So we've set all the extras for our intent.

The only thing still left is the SpeechRecognitionListener, which is an implementation of an Android interface. I'm going to copy the full class and paste it at the top of the same file, in the same namespace. It's a whole lot, but a lot of it is also not implemented. That's just how it works on Android: because we implement this interface, Android detects at runtime which of these callbacks exist and calls them, probably through reflection or however that works, so all of these methods need to be there even if they're empty. We only really implement a few of them. In OnError we invoke an Error action with the error, so we can catch whatever is going wrong and give a little information to our users. In OnPartialResults we send the partial results back to our shared code so we see them pop up, and in OnResults we do the same when the full result is done; both feed into the same SendResults overload. There we check that the matches are not null and not empty, and send the result back to our action — the action that we register from our shared code.

If we scroll back down, we now have this class that captures the results on the Android level, and if we go back to our SpeechToTextImplementation, the red squigglies are gone. We create the new SpeechRecognitionListener and set a couple of properties on it: Error, PartialResults and Results. They're all actions, so we can invoke them whenever something happens, and you can see those actions being triggered from right here in the Android implementation of IRecognitionListener. That's how it all ties together; I hope that's reasonably clear.
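As a sketch, the intent helper and the listener class described here could look like this; the exact set of extras and the silence-timeout value are illustrative, so check the RecognizerIntent documentation for your own needs.

```csharp
using System.Globalization;
using Android.Content;
using Android.OS;
using Android.Runtime;
using Android.Speech;

public partial class SpeechToTextImplementation
{
    static Intent CreateSpeechIntent(CultureInfo culture)
    {
        var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);

        // Language hints: pass the culture name (e.g. "en-US") as the language tag.
        intent.PutExtra(RecognizerIntent.ExtraLanguagePreference, culture.Name);
        intent.PutExtra(RecognizerIntent.ExtraLanguage, culture.Name);
        intent.PutExtra(RecognizerIntent.ExtraLanguageModel,
            RecognizerIntent.LanguageModelFreeForm);

        // How long a pause counts as "done talking" (the value is an assumption).
        intent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 1500);

        // Without this you only get the final result, not the text
        // scrolling in while you speak.
        intent.PutExtra(RecognizerIntent.ExtraPartialResults, true);

        return intent;
    }
}

// Android calls every member of IRecognitionListener, so they all need to exist,
// even though we only care about errors, partial results and final results.
class SpeechRecognitionListener : Java.Lang.Object, IRecognitionListener
{
    public Action<SpeechRecognizerError>? Error { get; set; }
    public Action<string>? PartialResults { get; set; }
    public Action<string>? Results { get; set; }

    public void OnError([GeneratedEnum] SpeechRecognizerError error) => Error?.Invoke(error);
    public void OnPartialResults(Bundle? partialResults) => SendResults(partialResults, PartialResults);
    public void OnResults(Bundle? results) => SendResults(results, Results);

    static void SendResults(Bundle? bundle, Action<string>? action)
    {
        var matches = bundle?.GetStringArrayList(SpeechRecognizer.ResultsRecognition);
        if (matches is null || matches.Count == 0)
            return;

        action?.Invoke(matches[0]);
    }

    // Required by the interface, but not needed for this sample.
    public void OnBeginningOfSpeech() { }
    public void OnBufferReceived(byte[]? buffer) { }
    public void OnEndOfSpeech() { }
    public void OnEvent(int eventType, Bundle? @params) { }
    public void OnReadyForSpeech(Bundle? @params) { }
    public void OnRmsChanged(float rmsdB) { }
}
```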
So whenever there's an error, we try to set an exception on our task result, a new exception with the details. For the partial results, whenever a partial result comes in as a string, we report it through the recognition result progress, sending along the text that was detected, and we do much the same for the final results. That's how they surface to our shared code layer, if you will.

With all of this in place, we can start consuming it. Because I've implemented this through an interface, we can use it with dependency injection, and dependency injection is built into .NET MAUI by default now. So I go to my MauiProgram and register things there. Let's scroll down so you can see it a little better. First builder.Services.AddTransient for MainPage, because I want to inject the service into my main page, so the page itself needs to be registered in the dependency injection container too. Then builder.Services.AddSingleton — I think our speech-to-text implementation can definitely be a singleton — registering ISpeechToText with SpeechToTextImplementation. What we're saying is: whenever an ISpeechToText is requested, provide a SpeechToTextImplementation. Did I spell it correctly? I hope so.

But now it's not recognizing the type, so we need to add the using for our Platforms namespace, and you can see it already starts complaining that this using is unnecessary — because the target framework box in the top left is set to iOS. For iOS we didn't implement it, so it can't find this class, and if you hover over it you can see it's available for Android but not for iOS, Windows or Mac Catalyst. We're going to work on those in the next videos, so make sure to stick around till the end; for now we just make sure it runs on Android and implement the rest later. If you switch the context in the top left to Android, you're basically saying: show me whatever is built when I target Android, and you can see it starts recognizing these types. But if I don't wrap this in something like #if ANDROID, it's going to break, because when you build, it builds for all platforms, and this can only compile for Android. So let's make sure this code only compiles for Android, like this, and now only the Android bits are compiled and we're good to go on that front as well.

Now the last piece of the puzzle is hooking up the UI that you can see on the right. We need to inject the ISpeechToText. Go to the Solution Explorer, open MainPage.xaml.cs, and in the constructor add an ISpeechToText parameter. This will be injected, which works because I registered everything in the dependency injection container. I want to use it later on, so let's make a private field for the ISpeechToText.
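Before moving on to the page itself, here's a sketch of the MauiProgram registration just described, with the conditional compilation; the project and namespace names (MauiSpeechToTextSample, and its Platforms namespace after dropping ".Android") are assumptions.

```csharp
using Microsoft.Extensions.DependencyInjection;
#if ANDROID
using MauiSpeechToTextSample.Platforms;   // hypothetical namespace after removing ".Android"
#endif

namespace MauiSpeechToTextSample;

public static class MauiProgram
{
    public static MauiApp CreateMauiApp()
    {
        var builder = MauiApp.CreateBuilder();
        builder.UseMauiApp<App>();

        // MainPage gets ISpeechToText injected through its constructor.
        builder.Services.AddTransient<MainPage>();

#if ANDROID
        // Only the Android build knows about SpeechToTextImplementation;
        // the other platforms get their own implementations in later videos.
        builder.Services.AddSingleton<ISpeechToText, SpeechToTextImplementation>();
#endif

        return builder.Build();
    }
}
```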
Let's name that field speechToText as well, which is maybe not the best naming convention, but oh well. So we assign this.speechToText = speechToText, and now we have our speech-to-text implementation inside my main page, ready to use. I already set up a ListenCommand, which goes to the Listen method right here, and a ListenCancelCommand. To actually be able to cancel, I need that cancellation token, so let's also set up a private CancellationTokenSource that we can use to cancel things. The ListenCommand goes to the Listen method, the ListenCancelCommand goes to the cancel method, and I set the BindingContext so the page knows to get the RecognitionText property and the commands from right here.

For Listen, I'm going to copy some code again. First we check whether we have permission, because we have that method in our interface, so why not use it. Let's make this method async while we're at it, so the rest of the squigglies go away. If we're not authorized, we show a DisplayAlert saying there's a permission error and we can't do anything here. On the happy path, we set the RecognitionText string to the result of await speechToText.Listen with a hard-coded CultureInfo; I'm just going to set it to en-US — you might have different ways of doing this, but I'll do it this way. The second parameter is a new Progress, for the progress reporting; inside it we specify the action to take whenever new progress is reported from the Android side, and there we simply set our RecognitionText to the partial text coming in. Again, there will be small differences in behavior on iOS and Windows, but for Android you can just assign the partial text directly, because it accumulates the partial text for you. There's a little bit of optimization you can still do here, but for now I'll stick with this; a nice assignment for you at home to figure out how this fits in your application. And then we have the cancellation token. I see this was a little different in the sample code I prepared, so I'm going to pass in tokenSource.Token. In the cancel method I can then just call tokenSource.Cancel to cancel the session, and I create a new CancellationTokenSource afterwards to make sure things don't break next time.

So now we have everything in place: whenever I tap Listen, it starts listening and shows me the text, and whenever I press Cancel, it cancels. The app has been running this whole time, and this is probably not going to be picked up by .NET hot reload and XAML hot reload, so let's stop and restart quickly and see what actually happens, and whether this speech-to-text is actually going to work in our .NET MAUI Android application. Here we are; the application is coming back up on my physical device and on the mirrored screen.
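While the app restarts, here's a sketch of how the MainPage code-behind described above could fit together; the XAML is assumed to bind Label.Text to RecognitionText and the two buttons to ListenCommand and ListenCancelCommand, and the try/catch around Listen is my assumption for handling cancellation and recognizer errors.

```csharp
using System.Globalization;
using System.Windows.Input;

public partial class MainPage : ContentPage
{
    readonly ISpeechToText speechToText;
    CancellationTokenSource tokenSource = new();
    string recognitionText = string.Empty;

    public MainPage(ISpeechToText speechToText)
    {
        InitializeComponent();

        this.speechToText = speechToText;

        ListenCommand = new Command(async () => await Listen());
        ListenCancelCommand = new Command(ListenCancel);

        // The XAML bindings pick up RecognitionText and the commands from here.
        BindingContext = this;
    }

    public ICommand ListenCommand { get; }
    public ICommand ListenCancelCommand { get; }

    public string RecognitionText
    {
        get => recognitionText;
        set
        {
            recognitionText = value;
            OnPropertyChanged();
        }
    }

    async Task Listen()
    {
        var isAuthorized = await speechToText.RequestPermissions();
        if (!isAuthorized)
        {
            await DisplayAlert("Permission Error", "No microphone access", "OK");
            return;
        }

        try
        {
            // Hard-coded to en-US for this demo; partial results simply overwrite
            // the label text because Android accumulates the sentence itself.
            RecognitionText = await speechToText.Listen(
                CultureInfo.GetCultureInfo("en-US"),
                new Progress<string>(partialText => RecognitionText = partialText),
                tokenSource.Token);
        }
        catch (Exception ex)
        {
            // Cancellation and recognizer errors both end up here in this sketch.
            await DisplayAlert("Error", ex.Message, "OK");
        }
    }

    void ListenCancel()
    {
        tokenSource.Cancel();

        // Create a fresh source so the next Listen call is not already cancelled.
        tokenSource = new CancellationTokenSource();
    }
}
```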
So we should see our minimalistic design again, and whenever I press Listen, you should see the label showing everything that I'm saying. Let's press Listen — oh, actually we get the permission prompt first, of course, and I'm in the way here. So we grant the permission, "While using the app", which is fine, and then I can say: hello, my name is Gerald and this is actually the speech-to-text on Android. You can see that it picked up on some of it, not all of it, but it does the speech-to-text fairly well. Maybe there's something wrong with the way I handle my partial results, or maybe it's just me talking into all kinds of microphones at once, but you can see that this works: we've now implemented speech-to-text on Android together with .NET MAUI.

Now we have Android done. Of course, there are the other parts; I've already mentioned them a couple of times. iOS and Windows are coming, and Mac Catalyst should work together with the iOS implementation, maybe with a few tweaks here and there, but we'll see that in the other videos. So make sure you catch those as well, where I implement the rest of this, and you'll also learn a little about how to do these platform-specific implementations with .NET MAUI while still sharing the maximum amount of code. I just want to give a quick shout-out to Vlad for writing the blog post this code is based upon. Vlad, thank you so much for all your hard work and the great stuff that you do, also together with me on the .NET MAUI Community Toolkit. If you've liked this video, please click the like button so that it spreads to more people on YouTube and more people learn about .NET MAUI speech recognition on Android. Subscribe to my channel if you haven't done so already, click here to go to the next video on iOS, and check out this playlist to fully discover more about .NET MAUI. See you for the next one!
Info
Channel: Gerald Versluis
Views: 5,558
Keywords: .net maui, net maui, dotnet maui tutorial, dotnet maui, .net maui tutorial, speech to text, speech-to-text, speech-to-text android, maui speech to text, android speech to text, c# maui, c# maui tutorial, c# maui android, speech to text app, how to convert speech to text in android, stt, .net maui speech to text, net maui tutorial, speech to text android app, android speech to text tutorial
Id: CI-Fx8_0oYo
Length: 24min 48sec (1488 seconds)
Published: Tue Dec 20 2022