Implement Speech-To-Text on iOS with .NET MAUI

Captions
In a previous video, we've seen how to enable speech-to-text in an Android .NET MAUI application. In this video, we're going to learn how to do the same, but now for iOS. If you've been following this channel, then you might have seen the previous video where I enabled an Android device for speech-to-text by writing some platform-specific code. This is not something that is surfaced to the .NET MAUI layer, so we have to write some platform-specific code ourselves. But you can totally do that with .NET MAUI. And my good friend Vladislav created a beautiful blog post where he implements this for Android, iOS and Windows, so all the platforms that are supported by .NET MAUI. I highly recommend that you watch the Android one first, because there we laid the foundation for everything that we're going to see here as well. So if you're just here for the iOS part, watch the Android one first, where we set up the generic pieces for all the platforms that we're going to use. But now we're going to focus on the iOS implementation. And if you want to read all about it, I would suggest you go down below and get Vladislav's blog post as well. Without any further ado, let's just hop into Visual Studio and see how to do the same for iOS. On the left you can see Visual Studio 2022. On the right you can see my physical iOS device, which is mirrored onto my Windows machine. It's connected through a Mac: there's a Mac machine on my network, Visual Studio connects to that, it builds in conjunction with the Mac build host, and the app ends up on my iOS device, with a lot of magic going on there. But it all works; it's really amazing technology. So let's start looking at the code that I've created in the previous video. Again, I highly recommend that you watch the Android video first, because there are some generic parts that we are going to reuse here as well. So here we actually have our main page.
So let's start with that. Let's go to MainPage.xaml. You can see our very minimalistic layout: we have this label which is going to show our recognition text, we have a button to start listening, and we have a button to cancel the listening. As I already explained in the last video, for iOS the behavior is a little bit different than on Android. On Android there's a little timeout where it stops listening after some time, or whenever it detects silence. You can totally do that on iOS too, but I think by default it will keep listening until you cancel, basically. So it's these minor things that are a little bit different between platforms, and it's up to you to tweak it to how you need it inside of your own application. So we've got that in place. Then in the code-behind, MainPage.xaml.cs, we have, through data binding, the listen command. So we have this Listen method where we're going to check: hey, do we have the right permissions? Because you need permission to at least access the microphone, right? And if we have that, then we can do the speech-to-text listen with a certain culture. We're just setting that to English; you can use a couple of other ones as well, if that's what you want. Whenever we catch the progress, we're going to get a new partial text. And again, here we're going to find a little bit of difference between the different platforms and implementations. We're going to add that to our recognition text, which is going to be shown in the label on the screen. Whenever something goes wrong, we're going to show a little display alert. And whenever we cancel, we have this cancellation token and we're going to cancel on that. So that's this implementation. Here in the Solution Explorer, under Android, you will find the speech-to-text implementation.
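The shared Listen flow described above can be sketched like this. Member names such as `speechToText` and `RecognitionText` are assumed from the sample's pattern; treat this as a sketch, not the exact repository code:

```csharp
using System.Globalization;

// Sketch of the shared Listen logic in MainPage.xaml.cs (assumed names).
private async Task Listen(CancellationToken cancellationToken)
{
    // Ask the platform implementation for speech/microphone permissions first.
    var isAuthorized = await speechToText.RequestPermissions();
    if (!isAuthorized)
    {
        await DisplayAlert("Permission Error", "No microphone access", "OK");
        return;
    }

    try
    {
        // Listen with a fixed culture; each partial result is appended to
        // the text that is bound to the label on screen.
        RecognitionText = await speechToText.Listen(
            CultureInfo.GetCultureInfo("en-US"),
            new Progress<string>(partialText =>
            {
                RecognitionText += partialText + " ";
                OnPropertyChanged(nameof(RecognitionText));
            }),
            cancellationToken);
    }
    catch (Exception ex)
    {
        // Anything that goes wrong (including cancellation) surfaces here.
        await DisplayAlert("Error", ex.Message, "OK");
    }
}
```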
That's the one that I handled in the previous video. For the generic, shared part, we have the ISpeechToText interface. We have this RequestPermissions method: it's going to check if we have the permissions and, if not, it's going to request them. And we have this Listen method, in which we're going to write the platform-specific code to actually listen for the speech and translate that into text. Now, for both Android and iOS this goes to remote services: your speech is going to be uploaded to the Apple services. But these are the actual platform APIs that are available for iOS, and likewise for Android, and we're going to see the ones for Windows as well. So no Azure services or anything like that, but still, your speech is being sent to external services where it's going to be processed. For iOS, we have the option to do it on the device since iOS 13; I'm going to show you that in a little bit. All stuff to take into account if you're going to work with this. Okay, so we now need to implement this interface for iOS, because we already did it for Android. If we go to our MauiProgram.cs, where we bootstrap our application, you can see that I added this compiler directive. Right now it only does it for Android: it's going to do builder.Services.AddSingleton, so whenever an ISpeechToText is being requested, we're handing out this SpeechToTextImplementation. Because of the single-project approach of .NET MAUI, this works perfectly, but you have to implement all the platforms first for this to work nicely. So the first thing I want to do here is say: if Android or iOS, because we're going to enable iOS right now, or Mac Catalyst, because this also works for Mac Catalyst. Let's save that, and it's going to immediately say: hey, I don't have this type for iOS or Mac Catalyst, right?
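The shared interface described above can be sketched as follows. The method shapes follow the video's description; exact parameter names are assumptions:

```csharp
using System.Globalization;

// The shared contract: one method to check/request permissions and one to
// do the actual platform-specific listening.
public interface ISpeechToText
{
    // Returns true when the user has granted the required permissions.
    Task<bool> RequestPermissions();

    // Listens for speech in the given culture, reports partial results
    // through recognitionResult, and completes with the final text.
    Task<string> Listen(
        CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}
```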
And you can see here in the tooltip: hey, we have it for Android. So it's going to help you and show you that we have it for Android, but not for iOS and Mac Catalyst. And in the video after this one, we're also going to add Windows; we'll see that in the next video. So right now we have this one. If we just name this thing SpeechToTextImplementation again, and make it live in the same Platforms namespace, it's automatically going to be picked up, because it's all in a single project. That is really one of the powers of .NET MAUI. So let's go ahead and do that. I'm going to save this one. Then in my Solution Explorer, I'm going to go to my Platforms folder. Here's the one for Android: SpeechToTextImplementation.cs. Everything in the Platforms folder lets us write platform-specific code. So in the Android folder, I can use Android namespaces, Android types, Android objects. For iOS, I can do the exact same, but now for iOS. And they won't hurt each other: they can be in the same namespace, with the exact same names, but each one is only going to be compiled whenever you target that specific platform. That's how it works, and it makes it very easy to avoid all kinds of if statements, or having to name things differently: you can just do this for the different platforms, and each file will only be compiled when you build for that platform. So I can go here to iOS, right-click and say Add, New Item. And what did I actually call it now? Oh my gosh, I should go back. I'll just copy it to not make any mistakes here. Copy that, and again for iOS: Add, New Item. I'm going to paste that name in here. It's going to be a class, which will be the SpeechToTextImplementation. Okay, there we have it; boom, we have it in our iOS folder. I'm going to make this public, just to make our lives a little bit easier.
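The registration in MauiProgram.cs then looks roughly like this once all three Apple/Android targets are enabled; this is a sketch of the compiler-directive pattern described above:

```csharp
// In MauiProgram.cs: register the platform implementation behind the
// shared interface. Each platform folder compiles its own
// SpeechToTextImplementation with the same name and namespace, so this
// single registration covers Android, iOS and Mac Catalyst.
#if ANDROID || IOS || MACCATALYST
        builder.Services.AddSingleton<ISpeechToText, SpeechToTextImplementation>();
#endif
```

Because the types only exist for the targets inside the `#if`, the registration compiles cleanly per platform without any runtime checks.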
And this should implement the ITextTo... no, I always mix them up: speech-to-text, not text-to-speech. ISpeechToText. And it's going to say: hey, you need to implement all this stuff, right? We have the two methods: one for our permissions and the actual Listen one. So let's use IntelliSense here and implement the interface. And this is basically all that we need to implement. Well, we have to write a bunch of code to actually make it work, but this now is our iOS implementation. So if I go back to my MauiProgram, you will see that it should pick this up. Well, it doesn't yet, because I didn't change the namespace. Visual Studio automatically puts the class in a namespace based on the folder where you create it, and I want it to be in Platforms. So let's just change that. I'm also going to add a semicolon so it becomes a file-scoped namespace. And now it's in the same namespace as the Android one. Let's pull up the Android one as well: you can see the namespace. Oh, I didn't make that one file-scoped, but you can see it's the same namespace, Platforms; here we have Platforms as well. And if I now go to my MauiProgram, you will see, hopefully, that it picks up on that. Why doesn't it do that? Should I save the file right here? Does that work? Maybe it needs a rebuild; it probably needs a rebuild to pick this up. But it should start working here: the red squigglies should go away. We'll see that in a little bit, or I will figure out why it doesn't work. But that is the way to ensure that you don't have to rename all kinds of things and do all kinds of magic to make this work. So here we have that, and now we need to implement our code. Let's start with the easy one: RequestPermissions. I'm going to copy some code from off-screen here.
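The freshly generated iOS class, before any real code goes in, looks something like this; the namespace name is an assumption based on the sample's structure:

```csharp
using System.Globalization;

// Platforms/iOS/SpeechToTextImplementation.cs — same type name and
// namespace as the Android version; only compiled when targeting iOS.
// (Namespace name assumed; a file-scoped namespace ends with a semicolon.)
namespace MauiSpeechToTextSample.Platforms;

public class SpeechToTextImplementation : ISpeechToText
{
    // IntelliSense generates these stubs; both get replaced below.
    public Task<bool> RequestPermissions() =>
        throw new NotImplementedException();

    public Task<string> Listen(
        CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken) =>
        throw new NotImplementedException();
}
```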
So let's get the RequestPermissions code here and put it in instead of this NotImplementedException, because that's going to break our application. What we're going to do here is call SFSpeechRecognizer.RequestAuthorization: we're asking iOS for authorization to do speech recognition. And then we're going to return the result, basically: hey, did the user actually allow this, yes or no? And it needs to be Authorized, right? So that's what we need. While we're talking about permissions anyway, let's go to our Solution Explorer again. In our iOS application we have this Info.plist, where we need to add the permissions to actually make this work. So let's right-click here and do Open With. We have a graphical editor, but that doesn't have an editor for permissions yet. So let's do Open With and then the XML (Text) Editor, so we can just edit this as an XML file. And again, I'll just copy this in here, because the key names are too complicated to remember. I'm going to scroll down, and inside of that dict node, the dictionary node, I'm going to paste a couple of things. We need this NSSpeechRecognitionUsageDescription: that's the speech recognition permission, letting the user know, hey, we're going to do speech recognition on whatever you say to our application. And we're going to add the NSMicrophoneUsageDescription, requesting the permission to actually use the microphone. For both, you can add a string where you say: hey, what is the reason we're requesting this? What are we going to use this for?
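The authorization call described above can be sketched like this. `SFSpeechRecognizer.RequestAuthorization` is the callback-based iOS API from the `Speech` binding; wrapping it in a `TaskCompletionSource` so it can be awaited is the pattern the video describes:

```csharp
using Speech;

// Ask iOS for speech-recognition authorization and return whether the
// user allowed it. Only Authorized counts as a "yes".
public Task<bool> RequestPermissions()
{
    var taskResult = new TaskCompletionSource<bool>();

    SFSpeechRecognizer.RequestAuthorization(status =>
    {
        taskResult.SetResult(
            status == SFSpeechRecognizerAuthorizationStatus.Authorized);
    });

    return taskResult.Task;
}
```

The microphone permission itself is requested by the system the first time the microphone is used, as long as the Info.plist entry is present.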
So make sure that you put something useful in there, because the user is going to read this and is going to ask himself: hey, why should I allow this? And if you say something along the lines of "I'm going to use speech recognition to make my app more accessible and understand what you're saying, because I'm building a personal assistant", then it makes sense, if that's what your app is about, and they're more likely to allow these things. So that is what's going on with these permissions. Our permissions are set, and we have the actual implementation to check for our permissions in code, so that's set up as well. Then it's time for our Listen method. So let's go back to my SpeechToTextImplementation, and now I'm going to copy a whole lot of code. Basically, I'm going to need a couple of private fields here, so let's start with that. This is just everything that is needed: the objects to actually do the speech recognition, get the results, do a request, all kinds of things. And let me just paste in the implementation of the Listen method right here. Let me check what that all is. There we have it; we also need a couple of helper methods for that, but I'm going to paste this in and walk you through it. Let's see what's going on here. We have this speech recognizer, the SFSpeechRecognizer. We're going to create that and initialize it with the culture name that we got passed in. In my case, it was hardcoded to en-US for English, but you can pass in a couple of other things here. You might want to check the Apple documentation for what is supported. If you're going to send this speech to the online services of Apple, there are a lot of languages that can be recognized: they do the server-side processing, so they don't have all the limitations of the device.
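The two Info.plist entries discussed above look roughly like this; the description strings are placeholders you should replace with your own reason:

```xml
<!-- Inside the <dict> node of Platforms/iOS/Info.plist -->
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to turn what you say into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app needs the microphone to hear what you say.</string>
```

If either key is missing, iOS terminates the app the moment it tries to use the corresponding feature, so add both before running.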
So there's a bunch of languages supported there. If you're going to do the offline processing, and I'll get to that in a minute, then the list of languages is smaller; I don't know, I think it's about ten languages, and English is always going to be supported. So that's what's going on here. Then we're going to check whether the recognizer is available, yes or no. If not, we're going to throw an exception, because we need it for our functionality. We're going to check the permissions once more, just to be safe, and throw an exception if that's not all right. And then we're going to initialize the other things: the audio engine, to actually get to our microphone and do that kind of stuff, and the live speech request, because we're going to request the system to analyze our live speech. Here you can set that toggle for online or offline: it's available for iOS 13 and up, and you set RequiresOnDeviceRecognition on that live speech request. Then you're requiring the offline recognition. If that's something you want, if you're concerned about privacy or whatnot, then you want to use this. Or maybe you just don't have a connection, right? That's something that could happen as well. Then we're going to set some specifics for how the audio buffer should work, what the bit rate should be, all kinds of technical things. You can mostly just use this code, unless you really know what you're doing. Then we're going to call Prepare on the audio engine and StartAndReturnError. If some error comes back, then we're again going to throw an exception. And then it gets into kind of a loop until we stop it: until we recognize some text and surface those results, or until we stop listening and cancel that task. So again, some error handling, and then it's going to stop recording and set an exception. Or, if there are results, then we're going to report them through our progress reporter, right?
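Condensed, the iOS Listen implementation described above looks roughly like this. The field names and the exact buffer size are assumptions; the API calls (`SFSpeechRecognizer`, `AVAudioEngine`, `InstallTapOnBus`, `GetRecognitionTask`) are the iOS bindings the video is talking about:

```csharp
using System.Globalization;
using AVFoundation;
using Foundation;
using Speech;

public async Task<string> Listen(CultureInfo culture,
    IProgress<string> recognitionResult, CancellationToken cancellationToken)
{
    // Create the recognizer for the requested culture, e.g. "en-US".
    speechRecognizer = new SFSpeechRecognizer(
        NSLocale.FromLocaleIdentifier(culture.Name));

    if (!speechRecognizer.Available)
        throw new ArgumentException("Speech recognizer is not available");

    if (SFSpeechRecognizer.AuthorizationStatus !=
        SFSpeechRecognizerAuthorizationStatus.Authorized)
        throw new Exception("Permission was not granted");

    audioEngine = new AVAudioEngine();
    liveSpeechRequest = new SFSpeechAudioBufferRecognitionRequest
    {
        // iOS 13+: set to true to force offline, on-device recognition.
        // RequiresOnDeviceRecognition = true,
    };

    // Tap the microphone input and feed the audio buffers to the request.
    var node = audioEngine.InputNode;
    var recordingFormat = node.GetBusOutputFormat(0);
    node.InstallTapOnBus(0, 1024, recordingFormat,
        (buffer, _) => liveSpeechRequest.Append(buffer));

    audioEngine.Prepare();
    audioEngine.StartAndReturnError(out var error);
    if (error is not null)
        throw new Exception(error.LocalizedDescription);

    var taskResult = new TaskCompletionSource<string>();
    recognitionTask = speechRecognizer.GetRecognitionTask(liveSpeechRequest,
        (result, err) =>
        {
            if (err is not null)
            {
                StopRecording();
                taskResult.TrySetException(
                    new Exception(err.LocalizedDescription));
            }
            else if (result.Final)
            {
                StopRecording();
                taskResult.TrySetResult(result.BestTranscription.FormattedString);
            }
            else
            {
                // Report each partial chunk back to the shared code.
                recognitionResult?.Report(
                    result.BestTranscription.FormattedString);
            }
        });

    // Stop everything when the caller cancels.
    cancellationToken.Register(() =>
    {
        StopRecording();
        taskResult.TrySetCanceled();
    });

    return await taskResult.Task;
}
```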
We have that recognition result, all the way up here: we pass that in with our IProgress interface, and we say, hey, we have a piece of result that we want to report. We pass that along, and then we catch it in our main page, we'll see that in a little bit, and put it in that label. So that's kind of how that works. Then you can see the StopRecording. Also, whenever we get that cancellation token and it gets canceled, we do the StopRecording as well. So let's look at StopRecording, which is not very impressive, but let's paste it in here because we're going to need it. It's a private method, so that's why it's not in our interface. We're going to tell the audio engine to remove the tap on the bus, so that we don't listen to that live audio anymore. We're going to stop the engine, we're going to end the live speech request, and we're going to cancel the recognition task. So basically everything that we can cancel, clean up and dispose, that's what we're doing here in StopRecording. You can see we still have a couple of red squigglies, which is because we are using an await, so I should make this method async so that goes away as well. And we've got that. And I think we're still missing one other thing, aren't we? Just checking my code here: we have this DisposeAsync. It doesn't have a direct call, but it's still very important, of course. So besides stopping all of this with StopRecording, we also have this DisposeAsync: we need to dispose of all the unmanaged resources that are iOS-specific. So you should call this manually from your code to clean up everything. Whenever you're done with the speech-to-text recognition, you want to call DisposeAsync and get all of that out of here. And this is very specific to iOS.
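The two cleanup pieces described above, StopRecording and DisposeAsync, can be sketched like this (same assumed field names as before):

```csharp
// Tear the session down: stop tapping the microphone, stop the engine,
// end the request, and cancel the recognition task.
private void StopRecording()
{
    audioEngine?.InputNode.RemoveTapOnBus(0);
    audioEngine?.Stop();
    liveSpeechRequest?.EndAudio();
    recognitionTask?.Cancel();
}

// Dispose the unmanaged iOS resources. There is no direct caller in the
// sample; invoke this yourself when you're completely done with
// speech-to-text.
public ValueTask DisposeAsync()
{
    audioEngine?.Dispose();
    speechRecognizer?.Dispose();
    liveSpeechRequest?.Dispose();
    recognitionTask?.Dispose();
    return ValueTask.CompletedTask;
}
```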
So you could bubble this up to your interface and add it there, because you're basically only talking to an interface; that's probably what you want whenever you're using dependency injection and such. And then on the other platforms, you could just have it not do anything: basically just return ValueTask.CompletedTask, it doesn't really matter. Then for iOS it will clean up all this stuff, and for Android it will not do anything, but you can still call the exact same code. That choice is basically up to you. Now, we've wired all this up, but our MauiProgram is still giving me errors. Oh, for Mac Catalyst it's not available, right? So let's just take that out of here. This code should also work for Mac Catalyst, but I should maybe double-check. And you can see, whenever I remove Mac Catalyst now, this goes grayed out, because of this combo box in the little top left: I've got it set to Mac Catalyst, and it shows all the code that is relevant to Mac Catalyst. Let's switch it to iOS, and now it's in iOS land. And now, yay, everything is implemented and works as it should. So let's just run this application. In my run menu, I go to my iOS Remote Devices: you can see my Apple Watch is connected, my iPhone is connected. I can just start running, and it's going to deploy, and we should see how the speech-to-text is going to work on my iPhone. The app is deployed to my phone, so let's pick up my physical device right here. It's being deployed; you can see it on the right. Let me get myself out of the way so I'm sure not to block anything here. And we have our minimalistic design, with the Listen and Listen Cancel buttons. So let's press that Listen button, and then we should see all of our permission prompts coming up. You can see: it would like to access speech recognition.
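One way to bubble DisposeAsync up to the shared interface, as suggested above, is to make it async-disposable and let platforms with nothing to clean up return a completed ValueTask; a sketch:

```csharp
using System.Globalization;

// The shared interface, now async-disposable so callers can clean up
// through the abstraction regardless of platform.
public interface ISpeechToText : IAsyncDisposable
{
    Task<bool> RequestPermissions();
    Task<string> Listen(
        CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}

// On platforms with no unmanaged resources (e.g. Android), the
// implementation is a no-op:
// public ValueTask DisposeAsync() => ValueTask.CompletedTask;
```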
"Speech data from this app will be sent to Apple to process your requests", et cetera, et cetera. And then at the very bottom it shows my line, the usage description text, saying, hey, I want to use this to do whatever in my app. So we've talked about that. Okay. And then it comes with my microphone: it's going to access my microphone, which is a separate permission, but it requests it all the same. And that's because in my main page we already have that generic code set up, right? I already had that for my Android implementation, and it now also works for my iOS implementation; I didn't need to change anything. So it's requesting these permissions, and whenever that's done, it's going to use that recognition text and get those partial texts right here. And as you remember, something was going to be different here between platforms: our results might be a little bit different from what we've seen on Android. But let's just press OK and see what comes up. So, okay: "Hello. Hello. The text recognition should work now." Ah, see, so it only reports word by word, but it still kind of works. And I could use this to actually add captions to my videos, right? You would have to read it here, and this would be the captions for my video. Okay, enough with this fun; I will just stop running. Actually, let me see if I can do this with hot reload. If I change it to += and add a little space at the end, we'll actually see all the speech that we get in here. So let's save that, .NET Hot Reload, and see if that actually works. No, I need to really stop this. Okay, so stop it and redeploy. Whenever we do it again now, it will be the existing text plus the partial text. For iOS it really gets reported as partial text: we get it word by word, you've seen that, so we want to stitch those together.
Or you can go about it however you want. But for Android, I would get the full result each time, the full result of the listening session, so there I had to replace the text every time. So two things can be going on here: either my Android implementation isn't great or my iOS implementation isn't great, depending on how you look at it. So you can tweak those to make this better. Here, you probably want to do something like: hey, make it a switch statement. We switch on Device.RuntimePlatform, and we implement something like case Device.Android, where we do our other thing: just the plain assignment, no extra space, because on Android the partial text is already the whole result and the space would mess things up here. Break, and then case Device.iOS. I might be using obsolete, deprecated stuff here. Yes, I am, so you should do this differently. The code that is actually going into my GitHub repository for this example, which is linked down below, will have the correct, non-deprecated code; I didn't prepare this, so I'm going off-script, but for now I'm just going to roll with this. And for iOS we do the append. All right, so now we do the right thing for Android and the right thing for iOS. So let's try this again. I'm just going to tap Listen, and it's now not going to request the permissions again, because those are already granted. And now we should see all the words coming up, word by word. You can see the progress coming in, nicely stitched together in the label right here. And while I'm here, maybe you also want to subscribe to the channel that you're watching, because that's what you do whenever you have a favorite YouTuber, right? All right, so this is pretty cool. Let me just stop listening.
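The non-deprecated version of this platform check uses DeviceInfo.Platform from MAUI Essentials instead of the obsolete Device.RuntimePlatform; a sketch of the progress handler with that check (member names assumed as before):

```csharp
// Handle the platform difference in how partial results are reported:
// Android sends the full text each time, iOS sends word-by-word chunks.
// DevicePlatform values aren't constants, so if/else is used rather than
// a switch with case labels.
new Progress<string>(partialText =>
{
    if (DeviceInfo.Platform == DevicePlatform.Android)
    {
        // Android: each report is the full recognized text, so replace.
        RecognitionText = partialText;
    }
    else if (DeviceInfo.Platform == DevicePlatform.iOS)
    {
        // iOS: each report is a partial chunk, so append with a space.
        RecognitionText += partialText + " ";
    }
    OnPropertyChanged(nameof(RecognitionText));
});
```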
I think it will come up with a little error: the task was canceled. But this is really cool, right? We've now implemented speech-to-text for iOS as well. I think these videos about speech-to-text don't only teach you about speech-to-text; they also really show you the power of the single-project approach of .NET MAUI. We've now implemented two platforms already. We implemented the one for Android first and laid all the groundwork there: we added that interface and added it to our dependency injection container. And now, without too much hassle, we can just roll out that iOS implementation, and our shared code doesn't need any changes. It's suddenly implemented for iOS as well, and it just works. It takes a fair amount of code for iOS, but the shared code doesn't need any changes. That is really, really cool. Thank you so much for watching one of my videos. Please click that like button so it will spread to more people on the YouTubes, of course. Stick around for the Windows video as well; you can find that one right here. I also have a full playlist with all the .NET MAUI videos, which you can find right here. And I will see you in the next video. Keep watching!
Info
Channel: Gerald Versluis
Views: 1,752
Keywords: .net maui, net maui, dotnet maui tutorial, dotnet maui, .net maui tutorial, speech to text, speech-to-text, maui speech to text, c# maui, c# maui tutorial, speech to text app, how to convert speech to text in android, stt, .net maui speech to text, net maui tutorial, speech to text ios, speech-to-text ios, ios speech to text, ios speech recognition, ios speech recognition example, dotnet maui ios, net maui ios, voice to text iphone
Id: kxUsmctDyko
Length: 22min 22sec (1342 seconds)
Published: Tue Dec 27 2022