In this video, we're going to learn
how to implement speech-to-text in your .NET MAUI Android application. Much, much earlier on this channel,
I recorded a video on text-to-speech. So that is inputting text and then
having that spoken out by your application. That was
very easy to implement back then with Xamarin Essentials.
Xamarin Essentials is now just part of .NET MAUI. It's just another API that
you can use in your .NET MAUI application right now. If you want to know more
about that, I'd highly recommend that you check out the video that's popping up
on your screen right now. All about implementing that, but a lot of people
under that video asked, hey, I want to know about speech to
text. So kind of like the other way around. Now some people also had
the requirement to not go to services like Azure
Cognitive Services or other AI services. So while that is
a perfectly good option, this video is going to be one of multiple,
starting with Android. These videos are going to focus on the
APIs that are available on the device. Now, it's not guaranteed that
this will not send the speech to back-end services, especially for
Android. The documentation is kind of vague. It might be sent to external services.
It probably will actually. So that's going to cost bandwidth,
that's going to cost battery. So don't just go listening all day and send
everything over. That's definitely not something that you want to do. But
with this, you can use the APIs to actually implement speech-to-text on
your device. Let's dive in. So I just created a File > New .NET MAUI
application. I did make some tweaks already so that I'm not boring
you with all the plumbing here, because we have lots to cover here with
implementing the speech-to-text already. I have it running here on my physical
device so that I can use the microphone right. And you can see that
screen mirrored here on the right so that you can see everything that's going on
here. Now if you've been following this channel for a longer period of time,
then you might know that I like my simplistic designs, or rather I'm just
bad at designing. And I just put in a couple
of labels and buttons and that's usually it. So this one is no different. I
have this vertical stack layout right here with a label
and two buttons inside of it. And that label is going to capture the
recognition results, right? So that's going to get the actual results of
what I'm going to say and put that into text and put that here on
the screen and the buttons is to start the listening and cancel the
listening session, basically. Now this is already something that's
good to be aware of: we're going to implement it for Android in this video.
There are other videos for iOS and Windows as well.
Stick around to the end of the video to figure out where those are. And the
behavior is a little bit different between the
different platforms. So here the listening kind of stops after a short timeout. By
default, if you stop talking and it doesn't detect any
sound anymore, it's just going to stop. For iOS, I think it just keeps listening until
you cancel it. There are probably options to also
cancel it whenever it detects silence; I'm pretty sure there are options for
that as well. So be aware of these little differences
right here between the platforms that you might want
to tweak for your own application. Now having that said, we're going to
write some platform specific code and this is based on a blog post
by my good friend Vladislav Antonyuk. I'm hoping
I'm pronouncing that correctly. I like to use his full name, but I
also can call him Vlad. He is with me on the .NET
MAUI Community Toolkit team and he's done an amazing job at
writing this blog post. So go check it out in the links below. And
he also has plans to wrap this into a plugin or maybe even the
.NET MAUI Community Toolkit. So maybe by the time you're
watching this, you can just put in that plugin and use it inside of
your own application. But it's always good to know what is
going on under the hood, right? So that's what we're going to learn
today. Now first, before we go to write any code, let's go into our
Solution Explorer and we want to go into our Android manifest because we
need permissions for this to actually work. So like I said,
this might or might not send something to a back
end service. So you might want to have the Internet permission in here
that's kind of like by default in the templates. It's very common for
that to have, and let's just copy and paste that one. And we
want to also add RECORD_AUDIO in here. That's the
permission for the microphone so that we can use that
one as well. Now with that in place, we're going to go to our
Solution Explorer once more and right click on your project and add a new item. And I'm going to add an interface
first. So I'm going to find the interface here under
Code > Interface. And I'm going to call that ISpeechToText because
we're going to implement platform specific code,
right? But we need on our abstract layer, our shared code layer, we need
something, a contract that we can use to share our code and
see what's going on there. So we have this interface.
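To give you the full picture up front, here's roughly what the finished interface will look like, based on Vlad's blog post. This is a sketch: the exact signatures may differ slightly, and the namespace assumes the sample project is called MauiSpeechToTextSample.

```csharp
using System.Globalization;

namespace MauiSpeechToTextSample; // assumed project name

public interface ISpeechToText
{
    // Requests microphone permission and checks that a speech
    // recognizer service is actually available on this device.
    Task<bool> RequestPermissions();

    // Starts a listening session in the given language. Partial
    // results are reported through recognitionResult, the final
    // transcript is returned, and the cancellation token stops
    // the session.
    Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult,
        CancellationToken cancellationToken);
}
```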
Let's make this a public interface. Right here it is: ISpeechToText,
and I'm going to copy a little bit of code
from off screen here, which is going to be the two methods
that we are going to put into this interface. So let's put them
here. And this is going to be the request
permissions. It's not technically necessary to have that, but it's nice to have so
that it will request the permissions automatically. Or you can do that
wherever you want in your code. And the other one is the important
one, which is going to be our listen method, right? So we have a couple of
parameters going in here. We have the culture info. So with that
you specify what the culture is that's going in. So in most cases,
at least for this demo, it's going to be en-US, right? It's going to be
English so that it knows that this is the language that it's going to use
to translate the speech to text. And if I'm going to speak Dutch
and this is set to English, it's going to try and convert
those words to English and only garbage comes out, right? So
that's not great. We have this IProgress thing. IProgress is a nice kind of
mechanism for providing progress updates. You don't need
to really worry about that. You'll see in a little bit how it
works. But basically how that works is whenever there is a bit of progress.
So whenever the speech has been put into text, it's going to feed
that back through this result and we can do something. Probably put that in
our main page, right? We make it show up there. And the cancellation token,
if we want to cancel our speech to text session, we set this cancellation
token to cancel. And then our speech to text session is
cut short and it's going to stop the microphone and it's not going to
do things anymore. Of course, the Listen method returns a string, and RequestPermissions returns a
boolean telling us whether the permissions were granted and everything is in
order, yes or no. So this is the interface that we want to work
with. Now the next thing we want to do is actually
give this an implementation. So let's go over to our solution
explorer. And because the way .NET MAUI and the single projects are
set up, this is very easy to do. You can argue if it's like the
nicest cleanest thing to do code wise, but you can
definitely do this in multiple ways, right? But I'm going to kind of take
the simple route here and the way we're going to do that is go
into this platforms folder. And whenever you go into the platforms Android or
iOS or Mac Catalyst, you can suddenly write platform specific code.
And it's all in one project, at least the way it's set up
right now. So I can just go into one of these platforms folders
and use that in the rest of my shared code as
well. It sounds kind of cryptic, but you'll see it in a little bit. So
in my Android folder, I'm now going to right click and do add and do
new item. I'm going to make it a class this time
and I'm going to name this SpeechToTextImplementation. So this is going to be the
implementation. Let's make this a public class as well. And I'm going to make this
implement ISpeechToText. Oops, speech-to-text, not text-to-speech; I
sometimes mix those up. So I need to, of course, now implement
those two methods, right? So let IntelliSense help me here a
little bit and implement interface. So we got all that and it's
going to throw the not implemented exception, right? So let me put this
here on screen for you. Let's start with the easy one. Let's
start with the request permissions. So again I'm going to copy a little bit
something here off screen. So let's just copy the couple of lines
here and boom, we've got our request
permissions kind of done, right? So we are using here the permissions APIs
from Xamarin Essentials, which is now just an API in .NET MAUI, right? The Essentials name
doesn't exist anymore. So in .NET MAUI we just have Permissions.RequestAsync,
and we're going to request Permissions.Microphone. So that's just
an API that you can use. I need to add the async keyword here to my
method so that all the squigglies go away. We're going to check the
status, and we're going to check whether speech recognition is available for the speech
recognizer, if that's a service that's on this device. And if both of these are
true, so if the status is granted and the recognizer is available, we're
going to return that, and the request permissions is then good to go. Now
you see that there is one weird thing here: the
Android.App.Application.Context. It doesn't recognize this because there is a
naming clash here, which is not great. So we have this namespace
MauiSpeechToTextSample.Platforms.Android; that's my project name plus Platforms.Android, which
is the default way Visual Studio handles these
namespaces. But now this clashes with Android. So it's now trying to find
this App.Application.Context in this Android namespace. So the
easiest thing to do here is just remove this Android part
from here, which also shows one of the benefits of this kind of platform-specific
versus shared code implementation, as we'll see in a
little bit. So just remove that platform-specific bit from here and we're good
to go. Now it starts recognizing this and we've implemented our request
permissions. So that's great. Now the listen
function, the listen method, let's implement that and we need a lot
for that. So I'm going to copy a lot of code for this first.
Well piece by piece so don't be afraid. I'm going
to walk you through all of this and let me put in the first
bit here. So instead of this throw new not
implemented exception I'm going to paste this in here. Now
this is going to give a couple of errors because I don't have
everything in here, but what this does is instantiate this
listener, which is apparently some kind of private field.
So let's actually get that as well. So those won't give us any
errors anymore. So I'm in this class, I'm
going to add these two private fields, which is a speech recognition
listener, which is something that we're going to define ourselves in a
minute. And we have this Speech Recognizer, which is the API from Android, to
actually make all of this possible. So we have these two fields. Now for this
listener, we're going to create our new speech recognition
listener. So we'll see that in a minute. We have an error here; well, actually, I'm
going to handle that in a little bit. So let's just skip over this for now.
Then we're going to have this speech recognizer, which is going to create a
new SpeechRecognizer, the API that Android has to recognize speech. Basically, we have to provide that
with our application context, which is an Android-specific thing,
the context of our application. That can fail for some reason,
so whenever this is null, we're going to throw this exception like, hey,
apparently we got here, but this service is not available, which should never happen,
right, because we checked it here as well. But you never know. Better safe
than sorry, right? We're going to set this recognition
listener, which is that thing that we are actually going to still create.
And then we're going to start listening, right, with the culture
that we specified. So that culture info is going in there so that it knows what
language to listen for. And then it's going to await using cancellation
token. So it's going to get that cancellation
token, that's a concept in C#. If you want to know
more about anything that you want seeing here, and I'm not explaining
thoroughly enough, let me know down in the comments what you want to see videos
about. But the cancellation token is a concept in C# that you can use to
cancel operations. So whenever a cancel is requested on
the cancellation token that we get provided here in
this method, then we're going to say stop recording, which is also
something we still need to implement. And then it stops the recording, right? So the
first thing I see a couple of awaits here. So the first thing that we want
to do is make this a sync again so that those red squigglys will
go away and then we can start implementing all the other things.
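For orientation, here's a sketch of the whole implementation class we're building toward, based on Vlad's blog post, including the SpeechRecognitionListener we still need to write. Treat names as assumptions from this sample (the MauiSpeechToTextSample namespace, the minimal intent), not the definitive implementation.

```csharp
using System.Globalization;
using Android.Content;
using Android.OS;
using Android.Speech;

namespace MauiSpeechToTextSample.Platforms; // assumed namespace

public class SpeechToTextImplementation : ISpeechToText
{
    SpeechRecognitionListener? listener;
    SpeechRecognizer? speechRecognizer;

    public async Task<bool> RequestPermissions()
    {
        var status = await Permissions.RequestAsync<Permissions.Microphone>();
        var isAvailable = SpeechRecognizer.IsRecognitionAvailable(
            Android.App.Application.Context);
        return status == PermissionStatus.Granted && isAvailable;
    }

    public async Task<string> Listen(CultureInfo culture,
        IProgress<string> recognitionResult, CancellationToken cancellationToken)
    {
        var taskResult = new TaskCompletionSource<string>();
        listener = new SpeechRecognitionListener
        {
            Error = e => taskResult.TrySetException(
                new Exception("Failure in speech engine - " + e)),
            PartialResults = sentence => recognitionResult.Report(sentence),
            Results = sentence => taskResult.TrySetResult(sentence)
        };

        speechRecognizer = SpeechRecognizer.CreateSpeechRecognizer(
            Android.App.Application.Context);
        if (speechRecognizer is null)
            throw new ArgumentException("Speech recognizer is not available");

        speechRecognizer.SetRecognitionListener(listener);
        speechRecognizer.StartListening(CreateSpeechIntent(culture));

        // When cancellation is requested, stop the microphone
        // and complete the pending task as cancelled.
        await using (cancellationToken.Register(() =>
        {
            StopRecording();
            taskResult.TrySetCanceled();
        }))
        {
            return await taskResult.Task;
        }
    }

    // Minimal version; the extra flags are discussed below.
    Intent CreateSpeechIntent(CultureInfo culture)
    {
        var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
        intent.PutExtra(RecognizerIntent.ExtraLanguage, culture.Name);
        intent.PutExtra(RecognizerIntent.ExtraPartialResults, true);
        return intent;
    }

    // Release the microphone and native resources to avoid leaks.
    void StopRecording()
    {
        speechRecognizer?.StopListening();
        speechRecognizer?.Destroy();
    }
}

// Bridges Android's IRecognitionListener callbacks to simple actions.
public class SpeechRecognitionListener : Java.Lang.Object, IRecognitionListener
{
    public Action<SpeechRecognizerError>? Error { get; set; }
    public Action<string>? PartialResults { get; set; }
    public Action<string>? Results { get; set; }

    public void OnError(SpeechRecognizerError error) => Error?.Invoke(error);
    public void OnPartialResults(Bundle? b) => SendResults(b, PartialResults);
    public void OnResults(Bundle? b) => SendResults(b, Results);

    static void SendResults(Bundle? bundle, Action<string>? action)
    {
        var matches = bundle?.GetStringArrayList(SpeechRecognizer.ResultsRecognition);
        if (matches is null || matches.Count == 0)
            return;
        action?.Invoke(matches[0]);
    }

    // Required by the interface even when unused; Android calls them anyway.
    public void OnBeginningOfSpeech() { }
    public void OnBufferReceived(byte[]? buffer) { }
    public void OnEndOfSpeech() { }
    public void OnEvent(int eventType, Bundle? @params) { }
    public void OnReadyForSpeech(Bundle? @params) { }
    public void OnRmsChanged(float rmsdB) { }
}
```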
Well, again, let's start easy with this stop recording thing right here. So
what we want to do with that is just a private method right
here, stop recording. And we're going to say
speechRecognizer.StopListening. And we're going to destroy it as well
so that it releases any resources and we don't have any memory leaks, right? So
stop recording: boom, implemented. Now this one, the StartListening
with CreateSpeechIntent. So intents are another Android
concept that provide extra data for the operation
that you're about to do. So let's create a little method for
that as well because there can be a lot of
intents for this speech recognition stuff. So we have
this intent. We should be able to implement this.
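As a preview, the finished intent-building method might look roughly like this sketch. The silence-timeout values here are illustrative assumptions, not recommendations; check the Android RecognizerIntent documentation for what fits your app.

```csharp
using System.Globalization;
using Android.Content;
using Android.Speech;

Intent CreateSpeechIntent(CultureInfo culture)
{
    var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);

    // Tell the recognizer which language to listen for, e.g. "en-US".
    intent.PutExtra(RecognizerIntent.ExtraLanguagePreference, culture.Name);
    intent.PutExtra(RecognizerIntent.ExtraLanguage, culture.Name);
    intent.PutExtra(RecognizerIntent.ExtraLanguageModel,
        RecognizerIntent.LanguageModelFreeForm);

    // Optional: tune how long a silence counts as "done talking"
    // (1500 ms here is just an example value).
    intent.PutExtra(
        RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 1500);
    intent.PutExtra(
        RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, 1500);

    // false by default; true gives us the scrolling partial results.
    intent.PutExtra(RecognizerIntent.ExtraPartialResults, true);

    return intent;
}
```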
Visual Studio should do that automatically. But in
this case you want to add the using Android.Content. It's going to add that at the
top. And we have this intent now and we're going to use this action
recognize speech, right? So we have this new intent which is about
recognizing speech. And now you can put all these flags in here.
So check out the documentation for Android. There is a lot to cover here, but you
have the extra language preference. You have the language tag. So we're
going to say that culture name again. Some extra language things
here. I've got a couple of things in comments that you want to check out. So the extra
speech input complete silence, right? And possibly complete
silence; that's kind of like that timeout whenever you stop
talking, and it kind of tries to detect whether you're done, yes or
no. And this one is kind of important, because the extra partial results, by
default this value is false. But you want to set it to true if you
want to kind of like have that scrolling, popping up of
text. We'll see that in a little bit. Whenever you're
speaking, it's going to report back in progress in little chunks and it's
going to make that show up on our screen. So we've got all that. We've set all
of our flags for our intent. And the only thing that we
still have to do now is add this speech recognition listener,
which is also an implementation of something within Android. So I'm
going to copy this full class and actually let's just
paste this here on top in the same namespace. So I'm just
going to paste this here again. It's a whole lot, but a lot is also
not implemented. That's the way it works for Android because we are
implementing this interface and Android at runtime is going to
kind of detect to see which of these are implemented. And
it's going to try to call them, probably through reflection or
however that works. So you need to have all of these
methods in here, unfortunately, because it tries to call them regardless of whether
there's something in there, yes or no. But we only implemented some of these,
right? So on error, we are invoking this error thing right
here with the error so that we can catch back whatever is going
wrong and we can give a little information to our
users. We're going to send those partial results, right? So
we're going to send the results back to our shared code and we're going to
see them pop up. We're going to catch that whenever our full result is
done, we're going to send that same result. So that's
all going to go the same way. And again, we have another
overload of the SendResults. So these are kind of going
to feed into this one. And we're going to check if the
matches are not null and the match count is not zero. And we're going to send back those results to our action,
right? This is our action that we register in our
application. So if we scroll back down, we now have this class that captures
the results on the Android level. And if we now go back
here to our speech to text implementation, we see
that the red squigglies are gone. We have this new speech recognition
listener. We are setting that error, right? We have a couple of
properties here, so we have this Error
one. They're all actions, so we can invoke actions whenever this is
happening. And you can see these actions are
triggered from right here in this Android implementation of the IRecognitionListener,
right? So that's how it all kind of ties together. I
hope that's kind of clear. So whenever we have an error, we're
going to TrySetException on our task result, and we're
going to do new exception with the details for that,
the partial results. So whenever a partial result is there,
we have this thing, this object going in here, which is a
string and we're going to send that to our recognition result
report and we're going to send along that string that we've detected
and kind of like the same thing for our results. So that's how we surface them
to our shared code layer, if you will. So we've got all
this in place and what we can do now is actually
start consuming this. So I've implemented this through this
interface so we can use this with dependency injection. So dependency injection is
built in with .NET MAUI by default now. So I can go to my MauiProgram and I
can register this for dependency injection. So let's do
builder. Let's scroll down so that you can see it a little bit better.
builder.Services.AddTransient. We're first going to add my
MainPage in here, because I want to inject it in my main page.
And then I also need to register that in my dependency
injection container. We need builder.Services.AddSingleton;
I think our speech-to-text
implementation can definitely be a singleton. So let's
just do that. And I want to register this as the ISpeechToText,
and I want this to be the
SpeechToTextImplementation, right? So what we're saying now is,
hey, whenever the ISpeechToText is requested, I
want you to provide a
SpeechToTextImplementation. Did I spell it
correctly now? I hope so. But now it's not recognizing it, right?
So what we need to do is add the using MauiSpeechToTextSample.Platforms here. And you can
see that it already starts complaining
here like, hey, this is unnecessary. And we have this box right here in the
top left is set to iOS. So for iOS, we didn't implement it
right so it can't find this class. And you can see whenever you hover
over it like, hey, for Android it's available, but for iOS and Windows and
Mac Catalyst, it's not available. We're going to work on that in next
videos. Make sure to stick around till the end. But for now, we're just going
to make sure that it runs for Android and we're going to implement
the rest later on. And if you switch here in the top left to
the Android context, that's basically what you're doing
here. You're saying, hey, now I want to see whatever is built whenever I
target Android, and you can see that it starts recognizing these things, right? But
if I don't do something like this #if ANDROID, then it's probably going to break,
because whenever you start running and building, it's going to try and
build it for all platforms. And now this can only be done for Android, right?
So let's make sure that this only runs for Android. Do it like
this. And now we make sure that only the Android
bits are being compiled and we're good to go on this front as well. So
we've got that set up. Now the last piece of the puzzle is
hooking up the UI that you can see here on the right. So we need to
actually inject this ISpeechToText. So let's do that. Go to your
Solution Explorer. We're going to go to my main page, and the
MainPage.xaml.cs. Inside of that, in the constructor, I want to say
ISpeechToText speechToText. So now this
will be injected, which works because I
registered everything in the dependency injection container. I want to use
this later on. So let's make a private field for the
ISpeechToText. And let's name that speechToText
as well, which is maybe not the best thing to
do here with the naming convention, but
oh well. So you make sure that this.speechToText = speechToText.
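Sketched out, the constructor wiring looks something like this. RecognitionText and the commands are the pieces already set up in the sample's MainPage, so treat this as an approximation of that plumbing.

```csharp
public partial class MainPage : ContentPage
{
    readonly ISpeechToText speechToText;
    CancellationTokenSource tokenSource = new();

    // ISpeechToText gets injected because both MainPage and the
    // implementation are registered in the DI container.
    public MainPage(ISpeechToText speechToText)
    {
        InitializeComponent();
        this.speechToText = speechToText;

        // The XAML binds to RecognitionText and the commands on this page.
        BindingContext = this;
    }
}
```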
And now we have our Speech to text implementation inside of my
main page that I can use. So let's do that. And I
already set up this listen command. So this is
going to go to this listen method right here. And this listen cancel.
And to actually be able to cancel, I need that cancellation token, right?
So also let's set up this private CancellationTokenSource, tokenSource. And we can use that to
actually cancel our things. So listen command goes to the listen
method right here. Listen cancel goes to this method
right here. And then I'm setting the binding context so that it knows that it needs
these properties from the recognition text and listen command,
all from right here. Now for the listen, I'm going to
actually copy some code here again. So let's do that. Here we are.
So first we're going to check like if we have
the permission, right? Because we have that method in our code as well
to request a permission. So why not do that? Let's
make this async, since we're awaiting here, so that all the red squigglies go away.
So we're going to check if authorized. If not, then we're going
to show this display alert like, hey, we have the permission
error. We can't do anything here. But if we do our happy path, then
we're going to try and set that recognition text. So that's the string
to the await speechToText.Listen with a hard-coded culture info. So I'm just going to set this to en-US.
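Put together, the two button handlers might look roughly like this sketch. RecognitionText is assumed to be a bindable property on the page that the label updates from, and the catch block is there because cancelling the token makes the awaited Listen call throw.

```csharp
// Requires: using System.Globalization;
async Task Listen()
{
    var isAuthorized = await speechToText.RequestPermissions();
    if (!isAuthorized)
    {
        await DisplayAlert("Permission Error",
            "No permission to use the microphone", "OK");
        return;
    }

    try
    {
        RecognitionText = await speechToText.Listen(
            CultureInfo.GetCultureInfo("en-US"),
            // Each partial result replaces the text shown on screen.
            new Progress<string>(partialText => RecognitionText = partialText),
            tokenSource.Token);
    }
    catch (OperationCanceledException)
    {
        // The user tapped Cancel; nothing more to do.
    }
}

void ListenCancel()
{
    // Cancel the running session, then create a fresh token source
    // so the next session doesn't start out already cancelled.
    tokenSource.Cancel();
    tokenSource = new CancellationTokenSource();
}
```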
You might have different ways of doing this, but I'm just going to do
it this way. And then our second parameter is the new
progress, right? So that's for our progress reporting. And then inside of that new
progress, we can specify an action that we need to take whenever a new
progress is reported. And that's happening on our Android side. And
then we're going to just set our recognition text to this partial text that's
coming in, right? So again here there's going to be little
differences in behavior for iOS and Windows. But for Android you can just
set this recognition text to the partial text, because each partial result
already contains the accumulated text. There is a little bit of optimization
that you can do here, but for now I'm going to stick with this. A nice
assignment for you at home: figure out how this fits in
your application, right? And we have this cancellation token. I
see that this was a little bit different in my sample code
that I prepared. So I'm going to say tokenSource.Token. This is going to
put in the token, and then in the ListenCancel, I
can just say tokenSource.Cancel, right? So we can cancel our thing right here. And I actually
need to create a new one here to make sure that things are not going to
break. So we have this cancellation token source. And there we go. So now
we have everything in place. Whenever I tap listen, it's
going to start listening. It's going to show me the things and
whenever you press cancel, it cancels. So the app has been
running the whole time. This is probably not going to pick up with all
the .NET Hot Reload and XAML Hot Reload. So let's just stop and restart really
quickly right here. And we're going to see what is
actually going to happen. And if this speech to text is actually going to
work in our .NET MAUI Android application. So here we are.
The application is coming back up, at least it is, yes, here on my
physical device, also on the mirrored screen. So we should see our minimalistic
design again. And whenever I start pressing listen, then you should see
the label popping up with everything that I'm saying here as well. So let's
press listen. Oh, actually we get the permission first, of course. So
I'm just going to do oh, I'm in the way here. So we're going to get this
permission, right. So you see while using the app, which is fine, and then I can
say, hello, my name is Gerald and this is actually the speech to
text on Android. And you can see that it picked up on
some of it, not all of it, but it does the speech to text
fairly well. So maybe there's something wrong with
the way I catch my partial results, or maybe something is
just not great with me talking into all kinds of microphones right
here. But you can see that this works. We've now implemented speech to
text on Android together with .NET MAUI. Now we have Android
done. Of course, there are the other parts; I've already
mentioned it a couple of times: iOS and Windows. Mac Catalyst should work together with
iOS as well, maybe with a few little tweaks here and
there. But we'll see that in other videos. So make sure that you catch
the other videos as well, where I implement the rest of this. And you
will also learn a little bit about how to do these platform-specific
implementations with .NET MAUI while still sharing the maximum amount
of code. I just want to give a quick shout-out to Vlad for writing
the blog post which this code is based upon. So Vlad, thank you so much
for all your hard work and doing the great stuff that you do, also
together with me on the .NET MAUI Community Toolkit. If you've liked this video,
please click the like button so that it will spread to more people on
YouTube and more people will learn about .NET MAUI, speech recognition, Android, all
the things right here. Subscribe to my channel if you haven't
done so already. And click here to go to the next video on iOS and see this
playlist to actually fully discover more about .NET MAUI. See you
for the next one.