In a previous video, we've seen how to enable speech-to-text in an Android .NET MAUI application. In this video, we're going to learn how to do the same, but now for iOS. If you've been following this channel, then you might have seen the previous video where I enabled an Android device for speech-to-text by writing some platform-specific code. This is not something that is surfaced to the .NET MAUI layer, so we have to write some platform-specific code ourselves. But you can totally do that with .NET MAUI. And my good friend Vladislav created a beautiful blog post where he implements this for Android, iOS, and Windows, so all the platforms that are supported by .NET MAUI. And I highly recommend that you watch the Android one first, because there we laid the foundation for everything that we're going to see here as well. So if you jump straight into the iOS part, you'll want to watch the Android one first, where we set up the generic pieces for all the platforms that we're going to use. But now we're going to focus on the iOS implementation. And if you want to read all about it, then I would suggest you go down to the comments and get Vladislav's blog post there as well. Without any further ado, let's just hop into Visual Studio and see how to do the same for iOS here.
On the left you can see Visual Studio 2022. On the right you can see my physical iOS device, which is mirrored onto my Windows machine right here. And it's connected through a Mac: there's a Mac machine on my network, Visual Studio connects to that, it builds in conjunction with the Mac build host, and the app ends up on my iOS device, with a lot of magic going on in between. But it all works; it's really amazing technology. So let's start by looking at the code that I created in the previous video. Again, I highly recommend that you watch the Android video first, because there are some generic parts that we are going to reuse here as well. So here we actually have our main page, so let's start with that. Let's go to MainPage.xaml, where you can see our very minimalistic layout. I have this label which is going to show our recognition text, we have a button to start listening, and we have a button to cancel the listening, right? So you can start the listening. As I already explained in the last video, for iOS
the behavior is a little bit different than on Android. On Android there's a little timeout where it stops listening after some time, or whenever it detects silence. You can totally do that on iOS too, but I think by default it will keep listening until you cancel, basically. So it's these minor things that are a little bit different between platforms, and it's up to you to tweak it to how you need it inside of your own application. So we've got that in place. Then in the code-behind, MainPage.xaml.cs, we have, hooked up through data binding, the Listen command. So we have this Listen method where we're going to check: hey, do we have the right permissions? Because you need permissions to at least access the microphone, right? And if we have that, then we can do the speech-to-text listen with a certain culture. We're just setting that to English; you can use a couple of other ones as well if that's what you want, but we're going to do English. Whenever we catch the progress, we're going to get a new piece of partial text. And again, here we're going to find a little bit of difference between the different platforms, the different implementations. We're going to add that to our recognition text, which is going to be shown in the label on the screen. Or, whenever something goes wrong, we're going to show a little display alert. And whenever we want to cancel, we have this cancellation token and we're going to cancel on that. So that's this implementation.
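Roughly, that Listen command looks like this. This is a sketch based on what's described in the video: the speechToText field (an injected ISpeechToText) and the bound RecognitionText property are assumptions, not necessarily the exact sample code.

```csharp
// MainPage.xaml.cs -- sketch of the Listen command described above.
// Assumes: ISpeechToText speechToText (injected) and a bindable
// string property RecognitionText; needs using System.Globalization;
async Task Listen(CancellationToken cancellationToken)
{
    // You need permission to at least access the microphone.
    var isAuthorized = await speechToText.RequestPermissions();
    if (!isAuthorized)
    {
        await DisplayAlert("Permission Error", "No microphone access", "OK");
        return;
    }

    try
    {
        // Listen with a hard-coded English culture; partial results arrive
        // through IProgress<string> and end up in the bound label.
        // (Later in the video this line becomes platform-specific.)
        RecognitionText = await speechToText.Listen(
            CultureInfo.GetCultureInfo("en-US"),
            new Progress<string>(partialText => RecognitionText = partialText),
            cancellationToken);
    }
    catch (Exception ex)
    {
        // Whenever something goes wrong (or listening is canceled),
        // show a little display alert.
        await DisplayAlert("Error", ex.Message, "OK");
    }
}
```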
Here in the Solution Explorer, under the Android folder, you will find the speech-to-text implementation; that's the one that I handled in the previous video. But for the generic, shared part, we have the ISpeechToText interface. It has this RequestPermissions method, which is going to check if we have the permissions and, if not, request them. And we have this Listen method, in which we're going to write the platform-specific code to actually listen for the speech and translate that into text.
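As a reference, the shared interface looks roughly like this. A minimal sketch, assuming the shapes used in the video; Vladislav's blog post linked below has the definitive version:

```csharp
// ISpeechToText.cs -- the shared contract each platform implements.
public interface ISpeechToText
{
    // Checks for speech/microphone permissions and requests them if needed.
    Task<bool> RequestPermissions();

    // Listens for speech in the given culture, reporting partial results
    // through recognitionResult until the token is canceled.
    Task<string> Listen(
        CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}
```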
Now, for both Android and iOS, this goes to remote services: your speech is going to be uploaded to, in this case, the Apple services. But these are the actual platform APIs that are available for iOS, and also the ones for Android; and we're going to see the ones for Windows as well, each specific to its platform. So no Azure services or anything like that, but still, your speech is being sent to external services where it's going to be processed. For iOS, we have an option to do it on the device since iOS 13; I'm going to show you that in a little bit. But it's all stuff to take into account if you're going to work with this. Okay, so we now need to implement this
interface for iOS, right? Because we already did it
for Android. If we go to our MauiProgram.cs, where we bootstrap our whole application, you can see that I added this compiler directive, so right now it only does this for Android: it calls builder.Services.AddSingleton, so whenever an ISpeechToText is being requested, we hand out this SpeechToTextImplementation. Now, because of the single project approach of .NET MAUI, this works perfectly, but you have to implement all the platforms first for this to work nicely. So the first thing I want to do here is say: if Android or iOS, because we're going to enable iOS right now. Or Mac Catalyst; this also works for Mac Catalyst, so we'll add one for Mac Catalyst as well. Let's save that, and it's immediately going to say: hey, I don't have this type for iOS or Mac Catalyst, right? And you can see here in the tooltip: hey, we have it for Android. So it's going to help you and show you that we have it for Android, but not for iOS and Mac Catalyst. And in the video after this, we're also going to add Windows, but we'll see that in the next video. So right now we have this one. And if we just name the new class SpeechToTextImplementation again, and make it live in the MauiSpeechToTextSample.Platforms namespace, it's automatically going to pick it up, because it's all in a single project. So that is really one of the powers of .NET MAUI.
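Here's a minimal sketch of what that registration in MauiProgram.cs ends up looking like; the namespace and class names follow the sample's naming as heard in the video, so treat them as assumptions:

```csharp
// MauiProgram.cs -- registering the per-platform implementation.
#if ANDROID || IOS || MACCATALYST
using MauiSpeechToTextSample.Platforms;
#endif

public static class MauiProgram
{
    public static MauiApp CreateMauiApp()
    {
        var builder = MauiApp.CreateBuilder();
        builder.UseMauiApp<App>();

#if ANDROID || IOS || MACCATALYST
        // Each platform folder provides its own SpeechToTextImplementation;
        // only the one for the current target gets compiled.
        builder.Services.AddSingleton<ISpeechToText, SpeechToTextImplementation>();
#endif

        return builder.Build();
    }
}
```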
So let's go ahead and do that. I'm going to save this one. Then, in my Solution
Explorer, I'm going to go to my Platforms folder. So here's the one for Android, right? It's SpeechToTextImplementation.cs, which is fine, because everywhere in the Platforms folder we can write platform-specific code. So in the Android folder, I can use Android namespaces, Android types, Android objects. For iOS, I can do the exact same, but now for iOS. And they won't hurt each other, right? They can be in the same namespace, they can have the exact same names, but each one is only going to be compiled whenever you target that specific platform. So that's how that works, and it makes it very easy to avoid all kinds of if statements, or having to name things differently; you can just do this for each platform, and it will only be compiled whenever you build for that platform. So I can go here to iOS, right-click, and say Add, New Item. And what did I actually call it? Oh my gosh, I should go back. I'll just copy the name to not make any mistakes here. Just copy that and, again, on iOS, Add, New Item, and paste that in here. It's going to be a class, which will be the SpeechToTextImplementation, right? Okay, there we have it, and boom, we have it in our iOS folder. I'm going to make this public just to
make our lives a little bit easier. And this should implement the ITextTo... no, I always mix them up: speech-to-text, not text-to-speech. ISpeechToText. And it's going to say: hey, you need to implement all this stuff, right? We have the two methods: the one for our permissions and the actual Listen method. So let's use IntelliSense here and implement the interface. And this is basically all that we need to implement. Well, we have to write a bunch of code to actually make it work, but this now is our iOS implementation. So if I go back to my MauiProgram, you will see that
it should pick this up. Well, it doesn't yet, because I didn't change the namespace: Visual Studio automatically puts the class in a namespace based on the folder where you create it, that's just how it works, and I want it to be in Platforms. So let's just do this. I'm also going to add a semicolon so it becomes a file-scoped namespace. And now it's in the same namespace as the Android one. Let's pull up the Android one as well: you can see the namespace. Oh, I didn't make that one a file-scoped namespace, but you can see the same namespace, Platforms; here we have Platforms as well. And if I now go to my MauiProgram, you will see, hopefully, that it picks up on that. Why doesn't it do that? Using Platforms... should I save the file right here? Does that work? Maybe it needs a rebuild; it probably needs a rebuild for us to pick this up. But it should start working here: the red squigglies should go away. We'll see that in a little bit, or else I will figure out why it doesn't work. But that is the way to ensure that you don't have to rename all kinds of things and do all kinds of magic to make this work. So here we have that.
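In other words, the iOS file ends up looking something like this stub before we fill in the real code; the namespace here is an assumption based on the sample's project name:

```csharp
// Platforms/iOS/SpeechToTextImplementation.cs -- the empty starting point.
// The file-scoped namespace matches the Android implementation, so the
// registration in MauiProgram.cs resolves on every platform.
namespace MauiSpeechToTextSample.Platforms;

public class SpeechToTextImplementation : ISpeechToText
{
    public Task<bool> RequestPermissions()
        => throw new NotImplementedException();

    public Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult, CancellationToken cancellationToken)
        => throw new NotImplementedException();
}
```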
Now we need to implement our code. So let's start with the easy one: RequestPermissions. I'm going to copy some code from off-screen here, get the request permissions in, and put it in place of this NotImplementedException, because that's going to break our application. What we're going to do here is call SFSpeechRecognizer.RequestAuthorization. So we're requesting authorization from iOS to do speech recognition, and then we're going to set the result. So we're basically returning: hey, did the user actually allow this, yes or no? And it needs to be authorized, right? So that's the thing that we need.
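Here's a sketch of that permission check. SFSpeechRecognizer.RequestAuthorization is the actual iOS API; the surrounding shape is my approximation of the sample:

```csharp
// Platforms/iOS/SpeechToTextImplementation.cs
// Needs: using Speech;
public Task<bool> RequestPermissions()
{
    var taskResult = new TaskCompletionSource<bool>();

    // Ask iOS for speech recognition authorization; the callback tells us
    // whether the user actually allowed it.
    SFSpeechRecognizer.RequestAuthorization(status =>
        taskResult.SetResult(status == SFSpeechRecognizerAuthorizationStatus.Authorized));

    return taskResult.Task;
}
```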
Actually, while we're talking about permissions anyway, let's go to our Solution Explorer again, because in our iOS application we also have this Info.plist where we need to add the permissions to actually make this work. So let's right-click here and do Open With. We have a graphical editor, but that doesn't have an editor for permissions yet, so let's do Open With and then pick the XML text editor, so we can just edit this as an XML file. And again, I'll just copy this in here, because the key names are too complicated to actually remember.
So I'm just going to scroll down, and here inside of that dict node, the dictionary node, I'm going to paste a couple of things. We need this NSSpeechRecognitionUsageDescription: this is the permission to actually do speech recognition. We're letting the user know: hey, we're going to do speech recognition on whatever you're going to say to our application. And we're going to add the NSMicrophoneUsageDescription, where we request the permission to actually use the microphone. And for both, you can provide a string where you say: hey, what is the reason that we're requesting this? What are we going to use it for? So make sure that you put something useful in here, because the user is going to read this and is going to ask themselves: hey, why should I allow this? And if you say something along the lines of: I'm going to use speech recognition to make my app more accessible and understand what you're saying, because I'm building a personal assistant, then it makes sense, if that's what your app is about, and they're more likely to allow these things, right? So that is what's going on here with these permissions.
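For reference, the two Info.plist entries look roughly like this; the keys are the real iOS ones, but the description strings here are placeholders, so word them for your own app:

```xml
<!-- Info.plist -- inside the top-level <dict> node -->
<key>NSSpeechRecognitionUsageDescription</key>
<string>Used to transcribe what you say into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>Needed to hear what you are saying.</string>
```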
So our permissions are set, and we have the actual implementation to check for our permissions in code, so that's set up as well. Then it's time for our Listen method, right? So we're going back to my SpeechToTextImplementation, and now I'm going to copy a whole lot of code. Basically, I'm going to need a couple of private fields here, so let's start with that. These are just all the objects that are needed to actually do the speech recognition: get the results, do a request, all kinds of things. And let me just paste in the implementation of the Listen method right here, together with a couple of helper methods that we need for it. So I'll paste this in and walk you through it. Let's see what's going on here.
So we have this speech recognizer, an SFSpeechRecognizer. We're going to create it and initialize it with the culture name that we got passed in, right? In my case, it was hard-coded to en-US for English, but you can pass a couple of other things in here; you probably want to check the Apple documentation for what is supported. Now, if you're going to send this speech to the online services of Apple, there are a lot of languages that can be recognized, right? They can do the server-side processing; they don't have all the limitations of the device. So there's a bunch of languages supported there. If you're going to do the offline processing, and I'll get to that in a minute, then the list of languages is shorter; I don't know exactly, I think it's around ten languages, right? And English is always going to be supported. So that's what's going on here. Then we're going to check whether it's available,
yes or no. If not, we're going to throw an exception, because we need this for our functionality. We're going to check the permissions once more, just to be safe, and throw an exception if that's not all right. And then we're going to actually initialize the other things: the audio engine, to actually get to our microphone and do that kind of stuff, and the live speech request, because we're going to ask the system to analyze our live speech. So here is where you can set that toggle for online or offline. It's available for iOS 13 and up: you set RequiresOnDeviceRecognition on that live speech request, and then you require the offline recognition. If that's something that you want, if you're concerned about privacy or whatnot, then you want to use this. Or maybe you just don't have a connection, right? That's something that could happen as well. Then we set some specifics for how the audio buffer should work, what the bit rate should be, all kinds of technical things. You can mostly just use this code as-is, unless you really know what you're doing. And then we're going to tell the audio engine to prepare, then start it and return any error. So if some error is there, then we're going to, again, throw an exception. And then it gets into kind of a loop until we stop it: until we recognize some text and report those results, or until we stop listening and cancel that task. So again, some error handling, and then it's going to stop recording and set an exception. Or, if there are results, we're going to report them through our progress reporting, right? We have that recognition result, that's all the way up here: we pass it in with our IProgress interface, and we say, hey, we have a piece of result that we want to report here. So we pass that along, and then we fetch it in our main page, we'll see that in a little bit, and put it in that label. So that's kind of like how that works.
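Here's a sketch of that Listen implementation, including the private fields. The structure follows what's walked through above, but consider Vladislav's blog post linked below the definitive version; details like the exact reporting behavior are assumptions:

```csharp
// Inside Platforms/iOS/SpeechToTextImplementation.cs
// Needs: using AVFoundation; using Foundation; using Speech;
AVAudioEngine audioEngine;
SFSpeechRecognizer speechRecognizer;
SFSpeechAudioBufferRecognitionRequest liveSpeechRequest;
SFSpeechRecognitionTask recognitionTask;

public async Task<string> Listen(CultureInfo culture,
    IProgress<string> recognitionResult, CancellationToken cancellationToken)
{
    // Create the recognizer for the culture that was passed in (en-US here).
    speechRecognizer = new SFSpeechRecognizer(NSLocale.FromLocaleIdentifier(culture.Name));
    if (!speechRecognizer.Available)
        throw new ArgumentException("Speech recognition is not available");

    // Check the permissions once more, just to be safe.
    if (!await RequestPermissions())
        throw new Exception("Speech recognition permission was not granted");

    // The audio engine gets us to the microphone; the live speech request
    // asks the system to analyze our live speech.
    audioEngine = new AVAudioEngine();
    liveSpeechRequest = new SFSpeechAudioBufferRecognitionRequest
    {
        ShouldReportPartialResults = true,
        // iOS 13+: uncomment to require offline, on-device recognition.
        // RequiresOnDeviceRecognition = true,
    };

    // Tap the input bus and feed the audio buffers into the request.
    var recordingFormat = audioEngine.InputNode.GetBusOutputFormat(0);
    audioEngine.InputNode.InstallTapOnBus(0, 1024, recordingFormat,
        (buffer, _) => liveSpeechRequest.Append(buffer));

    audioEngine.Prepare();
    audioEngine.StartAndReturnError(out var error);
    if (error is not null)
        throw new Exception(error.LocalizedDescription);

    var taskResult = new TaskCompletionSource<string>();

    // Loops until we stop it: report partial results, finish on error or final.
    recognitionTask = speechRecognizer.GetRecognitionTask(liveSpeechRequest, (result, err) =>
    {
        if (err is not null)
        {
            StopRecording();
            taskResult.TrySetException(new Exception(err.LocalizedDescription));
        }
        else if (result.Final)
        {
            StopRecording();
            taskResult.TrySetResult(result.BestTranscription.FormattedString);
        }
        else
        {
            // Report a partial result; exactly what each callback carries
            // is platform-specific, as we'll see in the demo.
            recognitionResult.Report(result.BestTranscription.FormattedString);
        }
    });

    // Whenever the cancellation token gets canceled, stop recording too.
    cancellationToken.Register(() =>
    {
        StopRecording();
        taskResult.TrySetCanceled();
    });

    return await taskResult.Task;
}
```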
Also, whenever that cancellation token gets canceled, we do the StopRecording as well. So let's look at StopRecording, which is not very impressive, but let's paste it in here, because we're going to need it. It's a private method; that's why it's not in our interface. We're going to tell that audio engine to remove the tap on the bus, so that we don't listen to that live audio anymore, we're going to stop the audio engine, we're going to end the live speech request, and we're going to cancel the recognition task. So basically, everything that we can cancel, clean up, and dispose, that's what we're going to do here in StopRecording.
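A sketch of that cleanup helper, under the same assumptions as above:

```csharp
// Tear down the audio pipeline and cancel the in-flight recognition.
void StopRecording()
{
    audioEngine?.InputNode.RemoveTapOnBus(0);
    audioEngine?.Stop();
    liveSpeechRequest?.EndAudio();
    recognitionTask?.Cancel();
}
```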
recording, you can see we got a couple of reds quickly still
here, which is because we are using a wait. So I
should make this method a sync so that that goes away as well.
And we got that. And I think we're still missing one
other thing. Aren't we not just checking my code here? We have this
dispose async. So that's kind of like that. It doesn't have a
direct call, but it's still very important, of course. So besides stopping all of
this with the stop recording, we also have this Dispose Async. So we
need to dispose of all the unmanaged resources that
are iOS specific, right? So we want to dispose all of that. So you
should call this manually from your code, basically to clean up everything
whenever you're done with the speech to text recognition, you want to dispose
Async and get all of that out of here. And this is
very specific to iOS. So you could bubble this up to your interface
and add it there because you're basically only talking to an interface. That's
probably what you want whenever you're using dependency injection and such.
And then on the other platforms, you could just have
it not do anything. Basically just return value, task, complete a task, it
doesn't really matter. But then for iOS, it will clean up all this stuff. And then
for Android, it will not do anything. But you can still call the exact same
code, right? So that choice is basically up to you. Now, we've wired all this
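A sketch of what that disposal might look like on iOS, assuming the same fields as before and an IAsyncDisposable-style signature:

```csharp
// Dispose the iOS-specific unmanaged resources when you're done listening.
public ValueTask DisposeAsync()
{
    audioEngine?.Dispose();
    speechRecognizer?.Dispose();
    liveSpeechRequest?.Dispose();
    recognitionTask?.Dispose();
    return ValueTask.CompletedTask;
}
```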
Now we've wired all this up, but our MauiProgram is still giving me errors. Oh, whenever I'm going to build this for Mac Catalyst, it's not available, right? So let's just get that out of here; this code should also work for Mac Catalyst, but I should maybe double-check. And you can see, whenever I remove Mac Catalyst now, this code goes grayed out because of this combo box here in the top left: I've got it set to Mac Catalyst, and you'll see all the code that is relevant to Mac Catalyst. Let's switch it to iOS, and now it's in iOS land. And now, yay, everything is implemented and works as it should. So actually, let's just run this application here. In my run menu,
I went to my iOS remote devices. You can see my Apple Watch is connected, my iPhone is connected, and I can just start running here, and it's going to deploy. And then we should see how the speech-to-text is going to work on my iPhone. The app is deployed to my phone, so let's pick up my physical device right here. It's being deployed; you can see it right here on the right. Let me get myself out of the way so I'm sure not to block anything here. And we have our minimalistic design: we have the Listen and Listen Cancel buttons. So let's press that Listen button, and then we should see all of our permissions coming up, right? You can see: would like to access speech recognition; speech data from this app will be sent to Apple to process your requests, et cetera, et cetera. And then, at the very bottom, it shows my line, right? That's the usage description, the text that says: hey, I want to use this to do whatever in my app. We've talked about that. Okay. And then it comes up with my microphone, right? It's going to access my microphone, which is a separate permission, but it requests it all the same. And that's because, in my main page, we already had that generic code set up, right? I already had that for my Android implementation, and it now also works for my iOS implementation; I didn't need to change anything. So it's now requesting these permissions. And whenever that's done,
it's going to use that recognition text, and it's going to get those partial texts right here. And as you remember, something was probably going to be different here between platforms, right? So our results might be a little bit different from what we've seen on Android, but let's just push OK and see what comes up here. So, okay: hello, hello, the text recognition should work now. Oh, see, so now it only does it word by word, but it still kind of works. And I could use this to actually add captions to my videos, right? You will have to read here, and this will be the captions for my video. Okay, enough with this fun, I will just stop running. Actually, let me see if I can do this with hot reload. So if I change the = to += and then add a little space at the end, so that we actually see all the speech that we get in here, let's save that, .NET hot reload, and see if that actually works. No, I need to really stop this. Okay, so stop this and redeploy it again.
And whenever we do it again now, it will be the existing text plus the partial text, right? Because for iOS, it really gets reported as partial text right here: we get it word by word, you've seen that, so we want to stitch those together. Or you can go about it however you want. But for Android, I would get the full result each time, kind of like the full result of the whole listen session, and that's why I had to replace the text each time. So two things could be going on here: either my Android implementation isn't great, or my iOS implementation is great, depending on how you look at it. So you can tweak those to make this better. Or here, you can make it a switch statement: switch on Device.RuntimePlatform, and implement something like case Device.Android, where we do our other thing, right? So for Android it was just the = and no space, because the += is going to mess up my text here; then break, and case Device.iOS. I might be using obsolete, deprecated stuff here. Yes, I am, so you should do this differently. I didn't prepare this, so I'm going off-script; I will make sure that the code that actually goes into my GitHub repository for this example, which is linked down below, will have the correct, non-deprecated code. But for now, I'm just going to roll with this one. And for iOS, we do the +=. All right, so now we should do the right thing for Android and the right thing for iOS.
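Since Device.RuntimePlatform is deprecated in .NET MAUI, the non-deprecated version of that per-platform handling would look something like this sketch, using DeviceInfo.Platform instead:

```csharp
// Inside the IProgress<string> callback: stitch or replace the text
// depending on how each platform reports partial results.
new Progress<string>(partialText =>
{
    if (DeviceInfo.Platform == DevicePlatform.Android)
    {
        // Android reports the full result of the session each time.
        RecognitionText = partialText;
    }
    else if (DeviceInfo.Platform == DevicePlatform.iOS)
    {
        // iOS reports word by word, so stitch the pieces together.
        RecognitionText += partialText + " ";
    }
});
```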
So let's try this again. I'm just going to tap Listen, and it's now not going to request the permissions again, because those are already set, right? So I'm just going to tap Listen, and now we should see all the words coming up here, word by word. And you can see the progress coming up, nicely stitched together in the label right here. And while I'm here: maybe you also want to subscribe to this channel that you're watching, because that's what you want to do whenever you have a favorite YouTuber, right? All right, so this is pretty cool. Let me just stop listening. I
think it will come up with a little error: the task was canceled. But this is really cool, right? We've now implemented the speech-to-text for iOS as well. So I think these videos about speech-to-text don't only teach you about speech-to-text; they also really teach you what the power of the single project approach of .NET MAUI is. Because now
we've implemented two platforms already. We've implemented first the
one for Android, and we've laid all the groundwork there. We added that
interface, we added it to our dependency injection container. And
now without too much hassle, we can just roll out that iOS implementation and
all of our shared code doesn't need any changes. It's just suddenly implemented for iOS as well, and it just works, right? It needs a fair amount of code for iOS, but the shared code doesn't need any changes here. So that is really, really cool. Thank you so much for watching another one of my videos. Please click that like button so it will spread to more people on the YouTubes, of course. Stick around for the Windows video as well; you can find that one right here if you want. I have a full playlist with all the .NET MAUI videos, which you can find right here. And I will see you for the next video. Keep watching.