In this third and last video about implementing speech to text in your .NET MAUI application, there is only one platform left to implement, and that's Windows. We're going to finalize our speech to text functionality in our .NET MAUI app. In the past videos, where we implemented iOS and Android, we saw how we defined our shared API
with an interface. In those videos, which I highly recommend you watch first (at least the very first one, because that's where we also implement our shared code), we've seen how to implement the platform-specific code through the power of .NET MAUI single project. We're going to finalize this implementation, so that speech to text works on all the platforms, by implementing it for Windows. Like I said, I highly
recommend that you check out the other videos. They should pop up on your screen while I'm talking, or you can find them in the links and comments down below, so be sure to check those out. And with that, let's just head over to Visual Studio and get to it. So here we are
in Visual Studio 2022. This is the sample code, the sample
app that I have been using. You can find the link down below; like all my videos, this one has a GitHub repository attached with all the sample code. So if you're only finding that out now, go to my GitHub profile, follow me and check out all those repositories. Now,
let me walk you through the code that we've implemented so far. So I
have this very minimalistic design with a label where we're going to show our
recognized text. We have a button to start the listen session and
a button to cancel the listen session, right? So I've
mentioned this in the other videos as well: the behavior on the different platforms is a little bit different. You will find that a lot if you're going into cross-platform development. For iOS, it will go word by word, and whenever a word is recognized, it will put that in the label. Android will basically wait for the full text, or for some short timeout. And that's also the case for Windows: it will wait for a timeout until you've stopped speaking, and then it will suddenly pop up with a lot of text. That's what we're going to see here, so take that into account for your own application. In
our main page code-behind, we have some code already implemented. We have this Listen command, the ListenCancel command and the recognition text. We're working with data binding here, so we've got all that set up. For the Listen, we're going to talk to the ISpeechToText interface, because we have that shared interface, which is our code contract. We can use that in our shared .NET MAUI code, but the implementation will each time be platform-specific code. That's really the power of .NET MAUI: if something is not surfaced to the .NET MAUI abstraction layer, you can just reach into the platform-specific APIs and write your own code, still in C# and .NET, and surface it yourself. That's basically what we're doing here. So we have this ISpeechToText interface.
We're going to request the permissions first for Windows. There's not much to
request here, so we're just going to fake that a little bit.
Then we're going to call the speech-to-text Listen, and we're going to specify that we want to listen for English; you can do other things here as well. Whenever there is progress, we use this little action here and we capture that progress. Actually, you can see a problem right here, because we're only doing it for Android and iOS. So for Windows, nothing is going to happen here. Let's just fix that right now: we're going to do the same as for iOS, so let's add 'or DeviceInfo.Platform is DevicePlatform.WinUI'. Now this will also work for Windows, or rather for WinUI, right? WinUI is the actual UI platform that we're using here, but Windows is the system that we're running on. So now we can't forget it; I was bound to forget it later in this video. Now it will do this and attach the result to the text that we already have in here. So it's doing that. And then we have this
token source and we can cancel that token, the
cancellation token. Whenever an exception happens, we're going to display this alert. Or
whenever the permissions are not authorized, then we are going to also
display a little alert, right? So that's what we're going to do. And
the listen cancel, we're just going to say token source cancel here as
well. So that's all set up. That's our shared code.
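For reference, here is a rough sketch of what that shared Listen logic looks like. This is not a verbatim copy of my sample; names like speechToText, RecognitionText and tokenSource are assumptions based on what you see on screen, and I'm leaving out the property-changed plumbing for the data binding, so check the repository for the exact code. It needs using System.Globalization for CultureInfo.

```csharp
// Rough sketch of the shared Listen/ListenCancel logic in MainPage.xaml.cs.
// ISpeechToText is our own shared interface; the platform-specific class is
// resolved through dependency injection (see MauiProgram later on).
async Task Listen()
{
    var isAuthorized = await speechToText.RequestPermissions();
    if (!isAuthorized)
    {
        await DisplayAlert("Permission Error", "No microphone access", "OK");
        return;
    }

    try
    {
        tokenSource = new CancellationTokenSource();

        // The final result replaces the partial text when the session ends.
        RecognitionText = await speechToText.Listen(
            CultureInfo.GetCultureInfo("en-US"),
            new Progress<string>(partialText =>
            {
                // This check originally only covered Android and iOS; adding
                // WinUI makes partial results show up on Windows as well.
                if (DeviceInfo.Platform == DevicePlatform.Android ||
                    DeviceInfo.Platform == DevicePlatform.iOS ||
                    DeviceInfo.Platform == DevicePlatform.WinUI)
                {
                    RecognitionText += partialText + " ";
                }
            }),
            tokenSource.Token);
    }
    catch (Exception ex)
    {
        // Anything that goes wrong in the platform code surfaces here.
        await DisplayAlert("Error", ex.Message, "OK");
    }
}

void ListenCancel() => tokenSource?.Cancel();
```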
That's basically how we use the interface. The interface definition itself is right here; it's nothing that will blow you away. It has RequestPermissions to request the permissions, because we need permissions to do speech
recognition. And also for the microphone, right? At least for Android and iOS.
So we need to have those. And then we have this Listen method to actually start the listening session. So we've got all that.
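In case you want to see it spelled out, the interface looks roughly like this; the exact signatures are in the repository:

```csharp
using System.Globalization;

// The shared contract that every platform folder implements.
public interface ISpeechToText
{
    // Ask for microphone / speech recognition permission where applicable.
    Task<bool> RequestPermissions();

    // Start a listening session for the given culture, reporting partial
    // results through IProgress until the cancellation token is cancelled.
    Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}
```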
Then, in our Platforms folder, we already have the SpeechToTextImplementation for Android, and for iOS the same thing, a SpeechToTextImplementation. And now
we're going to do the same thing for Windows. So let's add a new class, Add, New Class, and we're going to name it SpeechToTextImplementation. You'll notice that I now have the same class three times with the same name. How does that work? Well, that's all due to the power of .NET MAUI single project. Each folder in here, iOS, Android, Mac Catalyst, Windows, Tizen, is kind of like its own little separate world. Everything outside of the Platforms folder is shared code, so if I created the same class three times there, that wouldn't work. But because I'm inside a platform folder, they don't know about each other, right? Each one only gets compiled when you run on that platform, whether that's Android or Windows. That's why I can have the same class basically three times. So what I want to do is change this
namespace, because if you look here in my MauiProgram, that's where I register my service for dependency injection. I'm using dependency injection here, and you'll see that we have this SpeechToTextImplementation. And here in the top left, if I switch to the Android context, you'll see that this starts to work and we have this SpeechToTextImplementation, right? By changing the namespace to nothing platform-specific, so if I just make it Platforms, I can leave all this code intact, basically. Because now I have a using for the sample's Platforms namespace, not Windows or Android specifically, and I have that same class name, right? So I have the same thing. I don't need to change any code here, or do #if WINDOWS or #if IOS and use slightly different class names in there; no need for all that. I can just reuse the same names and the same namespace here.
That's really powerful as well. So I've got that set up. Let's make this a public class, and this is of course going to implement the ISpeechToText, right? So let's do that, and when we do, let's have IntelliSense help us with Implement Interface. We now have these two methods that we want to implement.
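So the new file looks something like this; note that the namespace deliberately leaves out Windows. The root namespace is an assumption on my part, so adjust it to whatever your project is called:

```csharp
// Platforms/Windows/SpeechToTextImplementation.cs
using System.Globalization;

// Same namespace as the Android and iOS implementations, nothing
// platform-specific in it (the root namespace here is an assumption).
namespace SpeechToTextSample.Platforms;

public class SpeechToTextImplementation : ISpeechToText
{
    public Task<bool> RequestPermissions()
    {
        throw new NotImplementedException();
    }

    public Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken)
    {
        throw new NotImplementedException();
    }
}
```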
The RequestPermissions one reminds me that I actually need to set the permissions for Windows. So let's go back to our Solution Explorer and go to the Package.appxmanifest. If I double-click that,
you will get into a graphical editor. You can also edit
it in the XML editor, but you can go here to the
capabilities. We want to have the Microphone one and, I think, the Internet (Client) one, so let's check those; we need those two. Let me actually verify in the XML variant that those are the right ones. So let's save this. You can also do right-click, Open With, and then pick the XML text editor; scroll down and you can see these capabilities here: internetClient and microphone. Those are the ones that I want. All right, that's all set up then.
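For reference, this is roughly what ends up in the manifest; I'm only showing the two relevant entries inside the existing Capabilities element:

```xml
<!-- Platforms/Windows/Package.appxmanifest, inside the <Capabilities> element -->
<Capability Name="internetClient" />
<DeviceCapability Name="microphone" />
```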
Back in our RequestPermissions method, that code is not really very interesting, because we don't really have a way to check these permissions at runtime on Windows. So we don't have a lot to do here; we just say return Task.FromResult(true), basically, and that's it.
So there we have that. Oh, I see, I copied something wrong. What's going on here? I actually have this weird extra namespace with stray brackets here; we don't need that. All right, there we go. This is better. I don't know how that happened, but we're back to a building
state. So RequestPermissions is just going to return true. You want to make sure that the permissions are declared, but otherwise there's not much to do here; we can't really check these permissions at runtime.
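So the whole method is basically just this:

```csharp
// No runtime permission check available on Windows, so just report success.
public Task<bool> RequestPermissions()
{
    return Task.FromResult(true);
}
```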
Then we have to implement the listening. Now, before we do that, there is a library that we're going to use here, which is System.Speech. It is by Microsoft and it helps you with these speech recognition kinds of things. So let's go to our Solution Explorer, right-click on our project and do Manage NuGet Packages, and I'm going to search for System.Speech. At the time of recording, the version number is 7.0.0, and it's probably tied to .NET 7. So by the time you're watching this and .NET 8 has been released, it probably has an 8.0 version. We just want to install that and click OK. When that's installed, we
have all the capabilities, all the APIs, to do something with speech recognition. The thing that's a bit weird here is that we also have the Windows.Media.SpeechRecognition namespace, I think, so there are a couple of duplicate types, which will mess up our code a little bit. But I'm going to show you, don't worry. So here we have this Listen with the culture that's going to be passed in, the IProgress with which we actually report the progress, and the cancellation token to actually cancel our listen session, right? So let's start implementing our
listening. I'm going to add a couple of private fields here first, so that we have all this, and you can already see that it adds the right namespaces so that it recognizes these types. So we have the SpeechRecognitionEngine and the SpeechRecognizer here. I'm not sure if that last one is going to be the right one; we'll find out in a little bit whether it picked it up from System.Speech.Recognition or we need the Windows.Media one. I don't know, we'll see, because they have different constructors and it will start complaining. We'll see that in a little bit.
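The two fields look something like this (the field names are mine; yours may differ):

```csharp
// One engine for the offline path (System.Speech NuGet package) and one
// recognizer for the online path (Windows.Media.SpeechRecognition).
SpeechRecognitionEngine speechRecognitionEngine;
SpeechRecognizer speechRecognizer;
```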
Now, for our Listen, we're going to do something interesting: we're going to check if we actually have internet, and if we do, we do ListenOnline, and otherwise we do ListenOffline. If you've watched the other videos, you know that on the other platforms we have somewhat limited offline access, or none at all. But here it seems we have full offline access that you can implement, so that's pretty cool. Let's go with the ListenOnline one first.
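As a rough sketch, assuming helper methods named ListenOnline and ListenOffline, the dispatching Listen method could look like this:

```csharp
// Pick the online (WinRT) or offline (System.Speech) path based on
// whether we currently have an internet connection.
public async Task<string> Listen(CultureInfo culture,
    IProgress<string> recognitionResult,
    CancellationToken cancellationToken)
{
    if (Connectivity.Current.NetworkAccess == NetworkAccess.Internet)
    {
        return await ListenOnline(culture, recognitionResult, cancellationToken);
    }

    return await ListenOffline(culture, recognitionResult, cancellationToken);
}
```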
I'm going to paste in a bit of code here, so don't be alarmed. ListenOffline, ListenOnline, sorry, so we want to have that one in here. Boom, there's the code. Like I said, here you can see the SpeechRecognizer: it doesn't contain a constructor that takes one argument, so there is something wrong here. Let's see if IntelliSense can actually fix this for us. It probably cannot. So we want to figure out which SpeechRecognizer this is, and I think we can do that with a using alias: using SpeechRecognizer =, and then we can say something like System.Speech.Recognition.SpeechRecognizer. So we can really point it to the type, like, hey, this is the one that you need to use. And it still doesn't recognize it, so we probably need the other one. We're going to point it to Windows.Media.SpeechRecognition.SpeechRecognizer instead, so probably we want that one. See? Boom, now this is the right one.
So we're just going to say: hey, whenever you see this type name, you're going to have to use this one. That's a little trick that you can do in C# and Visual Studio.
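In code, that trick is just a using alias at the top of the file:

```csharp
// Both System.Speech and the WinRT APIs define a SpeechRecognizer type;
// this alias points the short name at the Windows.Media one explicitly.
using SpeechRecognizer = Windows.Media.SpeechRecognition.SpeechRecognizer;
```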
So now we've got this one. There's another type here that it doesn't recognize, so let's use IntelliSense again: we need using Windows.Media.SpeechRecognition, so we're going to import that and it will add it here. And now we've got all the types here set
up as well. We are missing a couple of things: we still miss the ListenOffline right here, and we also have the StopRecording right here. So those are the things that we still want to implement. Let's see if we can get those in here as well. So let's start with the ListenOffline. Now, my formatting is a little bit messed up here, so let me fix that. There we go. The ListenOffline is kind of like the same as the ListenOnline, which I realize just now I didn't really take you through. So, ListenOnline: we pass in the culture, the progress again and the cancellation token, so it's kind of like the same method that we have here. What we're then going to do is use that SpeechRecognizer: we're going to create a new one with the language that we want to listen for, and then we're going to use that continuous recognition session, right? Whenever there is a result, we're going to add it and report it through our IProgress right here, and on that continuous recognition session we also handle whenever it's completed. So this one is for whenever there's an intermediate result, and this one is for whenever the session is completed. In that completed event we're going to say: hey, is it a success? Okay, then set the result. Did the user cancel? Then we set it to canceled. Otherwise we're going to throw an exception, right? So that's what's going to happen. Then we're going to start that recognition session: we call StartAsync, so that session actually listens to the speech. And then, whenever that cancellation token gets canceled, we're going to do StopRecording, which we still need to implement, and we call TrySetCanceled on the task result that we are returning from here. So that's what's going on here.
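Here is a rough sketch of that ListenOnline method, assuming the speechRecognizer field and the StopRecording helper from before; it relies on the usings we just added plus System.Text for the StringBuilder. The exact version is in the repository.

```csharp
// Online recognition through the WinRT Windows.Media.SpeechRecognition APIs.
async Task<string> ListenOnline(CultureInfo culture,
    IProgress<string> recognitionResult, CancellationToken cancellationToken)
{
    var recognizedText = new StringBuilder();
    var taskResult = new TaskCompletionSource<string>();

    // The WinRT recognizer wants a Windows.Globalization.Language.
    speechRecognizer = new SpeechRecognizer(
        new Windows.Globalization.Language(culture.Name));
    await speechRecognizer.CompileConstraintsAsync();

    // Intermediate results: report each recognized chunk through IProgress.
    speechRecognizer.ContinuousRecognitionSession.ResultGenerated += (s, e) =>
    {
        recognitionResult?.Report(e.Result.Text);
        recognizedText.Append(e.Result.Text).Append(' ');
    };

    // Session completed: success, user cancellation, or an error.
    speechRecognizer.ContinuousRecognitionSession.Completed += (s, e) =>
    {
        switch (e.Status)
        {
            case SpeechRecognitionResultStatus.Success:
                taskResult.TrySetResult(recognizedText.ToString());
                break;
            case SpeechRecognitionResultStatus.UserCanceled:
                taskResult.TrySetCanceled();
                break;
            default:
                taskResult.TrySetException(new Exception(e.Status.ToString()));
                break;
        }
    };

    await speechRecognizer.ContinuousRecognitionSession.StartAsync();

    // When the shared code cancels the token, stop the session and
    // mark the pending task as cancelled.
    using var registration = cancellationToken.Register(() =>
    {
        _ = StopRecording();
        taskResult.TrySetCanceled();
    });

    return await taskResult.Task;
}
```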
Then the ListenOffline is kind of like the same thing. We have the same culture, the same IProgress for the reporting and the
cancellation token. But now we're going to use the SpeechRecognitionEngine with the culture that we have here. We're going to load some grammar; all right, I don't really know exactly what that does, but it's necessary. Then there's the SpeechRecognized event, so we're going to use that as well and report the result through our recognition result. So we're doing kind of the same thing, but a little bit different depending on whether we're online or offline. Of course, if you only want to do offline, just implement this one; if you only want to do online, do the other one. But this just shows you both of these.
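And here is a rough sketch of the offline counterpart with System.Speech; again, the field, helper and completion behavior are assumptions on my part, and the repository has the exact version.

```csharp
// Offline recognition through the System.Speech.Recognition engine.
Task<string> ListenOffline(CultureInfo culture,
    IProgress<string> recognitionResult, CancellationToken cancellationToken)
{
    var recognizedText = new StringBuilder();
    var taskResult = new TaskCompletionSource<string>();

    speechRecognitionEngine = new SpeechRecognitionEngine(culture);

    // Free-form dictation instead of a fixed set of phrases.
    speechRecognitionEngine.LoadGrammar(new DictationGrammar());

    // Report every recognized phrase through IProgress.
    speechRecognitionEngine.SpeechRecognized += (s, e) =>
    {
        recognitionResult?.Report(e.Result.Text);
        recognizedText.Append(e.Result.Text).Append(' ');
    };

    speechRecognitionEngine.SetInputToDefaultAudioDevice();
    speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

    // Cancelling stops the engine and hands back whatever was heard so far.
    cancellationToken.Register(() =>
    {
        StopOfflineRecording();
        taskResult.TrySetResult(recognizedText.ToString());
    });

    return taskResult.Task;
}
```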
So now we still need StopRecording and the stop for the offline recording. Let's get both of those in here as well; I'm just going to copy some code again, so let's paste that in here. We have the StopRecording, which basically just takes the SpeechRecognizer object that we created for the online path right here, and says: hey, that continuous recognition session, we want to stop that, please. We wrap that in a try-catch, so if something happens, it might already have been stopped or whatever, we don't really care; we're going to assume that it stops. For the stop of the offline recording, we take that SpeechRecognitionEngine and we also cancel that one.
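In code, the two stop methods are tiny; roughly:

```csharp
// Stop the online session; it may already have been stopped, so swallow errors.
async Task StopRecording()
{
    try
    {
        if (speechRecognizer is not null)
        {
            await speechRecognizer.ContinuousRecognitionSession.StopAsync();
        }
    }
    catch
    {
        // Already stopped, or never started; either way we don't care.
    }
}

// Stop the offline engine.
void StopOfflineRecording()
{
    speechRecognitionEngine?.RecognizeAsyncCancel();
}
```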
Now there's one thing that we still want to do, which is not really wired up here, but we want to implement DisposeAsync so that we can really clean things up, right? The unmanaged resources, the managed resources, we want to dispose those, which is something you want to do whenever you are completely done listening to speech and whatnot, so you clean up all the resources and don't end up with a memory leak. Check out the GitHub repository, where all this code is set up, and you can see exactly how to use it.
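As a rough sketch, assuming the implementation (or the shared interface) also implements IAsyncDisposable, that DisposeAsync could look something like this:

```csharp
// Stop whatever is still running and release both recognizers.
public async ValueTask DisposeAsync()
{
    await StopRecording();
    StopOfflineRecording();

    speechRecognizer?.Dispose();
    speechRecognitionEngine?.Dispose();

    speechRecognizer = null;
    speechRecognitionEngine = null;
}
```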
So basically we've implemented all our code here. Well, almost, because if we go back to our MauiProgram, you would have seen that we have these compiler directives here for #if ANDROID and IOS. So if I switch the context here to Windows, you see that this is grayed out; it's going to say, hey, I don't have any implementation for the speech to text, right? So let's add WINDOWS to this as well.
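So the registration in MauiProgram.cs ends up looking roughly like this (the service lifetime is up to you; I'm using a singleton here as an example):

```csharp
// Thanks to the shared Platforms namespace, the same registration line works
// for every platform; the directive just skips targets without an implementation.
#if ANDROID || IOS || WINDOWS
        builder.Services.AddSingleton<ISpeechToText, SpeechToTextImplementation>();
#endif
```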
And you'll notice that Mac Catalyst is missing; I'll get to that at the end of the video, so stick around to see what's up with that. In my main page, I already set up the code for adding the actual result to our text. So with this, I think we
can start running our Windows application, which is what
I'm going to do, so let's deploy that. I have a little
microphone hooked up to this. So whenever we start the
listening session, then it should start listening to the microphone right
here. And everything that I will say should get transcribed on the
screen. But like I said, with Windows, it's kind of
interesting. It waits a little bit until I stop talking for it to actually show up. So
bear with me. Hello. This is the speech to text on
Windows. See, it actually works. So as long as
I keep talking here, it doesn't really add
the result here. I don't know if that's an implementation detail or if that's just how Windows works, but you can see that if I stop talking, then suddenly, boom, the result gets added here, right? So that's why I kind of pause in the middle of my sentences here. But with this, we've also implemented it on Windows, which is pretty amazing. So now you know how to implement speech to
text on iOS, Android and Windows. But what's up with
Mac Catalyst? So that wraps up our little series. You've now learned how to implement speech to text on Android, iOS and Windows. There's one left: Mac Catalyst. Mac Catalyst is basically the same code as iOS, but it
adds a little bit more code. And there is a
funny thing with the compilation right there. So again, I'm not going to
create a full video on that. Just check out the GitHub repository for all the
code there and you will figure out all the bits that belong to Mac Catalyst.
If you don't know how to do it, let me know in the comments and I'll
be sure to answer your questions there or still make a follow up video if you
really can't figure it out. This Windows speech to text topic was actually something that was requested under my videos as well. If you have another topic that you'd like to see, let me know down in the comments below and I'll be sure to get
back to that, hopefully. Thank you again for watching one of my
videos. Please click the like button so that it will spread to
other people and we can grow this channel and be an even bigger family.
Please subscribe to this channel if you haven't done so
already, I would really appreciate that as well. Thank you so much. In the meantime, there's a full playlist of .NET MAUI videos already on my channel; go check that out here. The other videos for speech to text are
right here. And I'll be seeing you for my next video.