AI Landmark Recognition With TensorFlow Lite and CameraX on Android

Captions
Hey guys, and welcome back to a new video. This one is especially exciting because we're going to use some AI on Android together with CameraX. In particular, we will build a so-called landmark recognizer: we can point our camera at a landmark and the app will tell us what it is. You can see I already have my camera open here; it's currently lying face down and is recognizing some kind of landmark, which is of course wrong in this case. But let's open Photoshop, because I've prepared some photos, and open up the emulator. If I now point my camera at the Stonehenge photo, you'll see it actually recognizes it as such; if we move on a bit, we get the Eiffel Tower, and a little further, the Big Ben. So that is working perfectly fine.

In this video you will not only learn how to use CameraX for image processing, so that we really go through every single frame, analyze it, and feed it into our AI model; you're also going to learn how to implement such a TensorFlow model in Android to classify images, all in real time.

You might wonder where I got that AI model from. It comes from tfhub.dev, a website where you can find tons of AI models that have already been trained; you can download them for free and use them in your app, for lots of different use cases. If you'd like me to make videos about other AI use cases (audio, text, video and so on), let me know below. In this video we're dealing with image processing, so you can click on "Problem domains" and then "Image", and we want image classification. Here you can already see the landmarks collection I've used. If we click on that, there are different sets, so depending on where you live you could use a different one. I downloaded the one for Europe, which is also the one I will include on GitHub, but if you're from Asia, for example, you can download the landmarks classifier for Asia so you can recognize Asian landmarks.

After you've chosen your landmark model, click on it and wait a moment until the page loads. It gives you a little sample, and you can also upload your own photos to see how the recognizer would classify them. If we scroll down, there is on the one hand a version for TensorFlow; TensorFlow is the library we use to run such pre-trained AI models, and it's very popular in Python, for example. On Android we can't use the full TensorFlow library; instead there is a TensorFlow Lite version for mobile devices. So if you want the TensorFlow Lite model, which is what you need for mobile devices, click on "TFLite" and hit download to get it onto your machine.

Once you've done that, jump into Android Studio and include the downloaded .tflite file. You can do this by right-clicking your app folder, creating a new directory, and searching for "assets"; we want the src/main/assets folder. The assets folder is just a folder where you can put any outside files you want to ship with your app, which is perfect for TensorFlow models. Here we paste the downloaded .tflite file from the website I just showed. Do I want to add it to Git? Actually yes, I will include it, so you can just clone the repo and use it as it is without needing to watch this tutorial.

With that added, we're good to start. I also recommend taking a look at the initial code in my GitHub repository below, because you need a few dependencies. On the one hand CameraX, which is hopefully not new to you if you watched the previous two videos about taking photos and recording videos with CameraX; in this video we take that to the next level and also process what CameraX shows us. And on the other hand, the bottom three dependencies are for TensorFlow Lite, which we need to classify the images with an AI model.
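As a rough sketch of that dependency setup: the transcript doesn't spell out the exact coordinates, so the artifact names and versions below are assumptions on my part; check the linked repository for the ones actually used.

```kotlin
// app/build.gradle.kts (versions are assumptions; see the repo for the exact ones)
dependencies {
    // CameraX: preview, camera2 backend, lifecycle binding, controller/view
    implementation("androidx.camera:camera-core:1.3.0")
    implementation("androidx.camera:camera-camera2:1.3.0")
    implementation("androidx.camera:camera-lifecycle:1.3.0")
    implementation("androidx.camera:camera-view:1.3.0")

    // TensorFlow Lite task library ("the bottom three dependencies")
    implementation("org.tensorflow:tensorflow-lite-task-vision:0.4.0")
    implementation("org.tensorflow:tensorflow-lite-gpu-delegate-plugin:0.4.0")
    implementation("org.tensorflow:tensorflow-lite-gpu:2.9.0")
}
```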
Okay, so the first thing I want to do, which we also did in the previous videos, is add a little composable to show our camera preview. I'll do this again for those of you who missed the previous videos; it's really quick. We create a new presentation package in our root package and a CameraPreview file in there, with a composable called CameraPreview. It takes in a controller; as I said in the last video, that is a LifecycleCameraController, which is used to control the camera: to take photos, to switch between front and back camera, these kinds of things. We also want to accept a modifier. Inside, we get a reference to the lifecycle owner with LocalLifecycleOwner.current, so we can attach it to our camera controller and make it aware of our activity's lifecycle. Then we use an AndroidView, since the PreviewView that comes with CameraX is not available as a composable, at least not yet. But we can easily work around that with an AndroidView whose factory creates that PreviewView from the View system. We pass the context, and with apply we set this.controller to the controller we passed in and call controller.bindToLifecycle with the lifecycle owner. Finally we assign the modifier, and we're already done in this file.
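Put together, the composable described above looks roughly like this (a sketch paraphrased from the walkthrough, not copied from the repo):

```kotlin
import androidx.camera.view.LifecycleCameraController
import androidx.camera.view.PreviewView
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.platform.LocalLifecycleOwner
import androidx.compose.ui.viewinterop.AndroidView

@Composable
fun CameraPreview(
    controller: LifecycleCameraController,
    modifier: Modifier = Modifier
) {
    val lifecycleOwner = LocalLifecycleOwner.current
    AndroidView(
        factory = { context ->
            // PreviewView has no Compose equivalent yet, so wrap the View version
            PreviewView(context).apply {
                this.controller = controller
                controller.bindToLifecycle(lifecycleOwner)
            }
        },
        modifier = modifier
    )
}
```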
So what would be the next thing? Next up I want to dive into our root package, create a domain package, and put two files in there. On the one hand we create a so-called Classification: that will be a data class representing nothing else than the output of our AI model. We feed something into the model, in this case a frame (an image), and we do that very frequently, frame by frame. The model processes each frame based on how it was trained, and the output is the classification: it spits out a few landmarks it thinks it sees in the image we fed in. On the one hand it gives us the name of the landmark, which is what we want to display; on the other hand it gives us a score, basically how sure the model is that the landmark really is what it thinks it is. For example, a score of 0.8 means it's 80% sure that what it's seeing is, say, the Eiffel Tower. I create this Classification class just so we can bundle that information a bit: the name of the classification and the score, which is a float. I won't display the score in the UI here, but I think it helps you to also learn how to access it, since it's quite important when you integrate a model into your app to get a feeling for how sure the model really is about what it tells you.

Cool. In domain we also create an interface called LandmarkClassifier. This interface exposes a function called classify, which takes in a bitmap, and not only the bitmap but also its rotation, and spits out a list of these classifications. This is the function we will call very frequently from our image analyzer later on, when we implement the CameraX-related functionality: every frame, or at least every few frames, we take the bitmap, feed it into this function together with its rotation, the function runs the model on that frame to classify it, and it returns a list of classifications. It might say: I'm 80% sure this is the Eiffel Tower, I'm 10% sure it's the Brandenburg Gate, and so on.
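In code, the two domain files described here could look like this minimal sketch:

```kotlin
import android.graphics.Bitmap

// domain/Classification.kt: the output of the AI model
data class Classification(
    val name: String,  // e.g. "Eiffel Tower"
    val score: Float   // model confidence, e.g. 0.8f means 80% sure
)

// domain/LandmarkClassifier.kt: called for (almost) every frame
interface LandmarkClassifier {
    fun classify(bitmap: Bitmap, rotation: Int): List<Classification>
}
```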
Now the interesting part is of course how we implement this, and that's actually not too complex. AI always sounds super complex, and it is if you train your own models and get deeper into it; but if you have a pre-trained model like we do here and just want to use it in your app, that is not a lot of code. Let's create a data package, since feeding something into an AI model is clearly data-related, and in there create TfLiteLandmarkClassifier. It takes a bunch of constructor parameters. We need the context, and then a threshold, which is in the end a float that I set to 0.5f; the threshold is the score starting from which we actually want to include classifications, so if the model is less than 50% sure, we ignore the result because the probability of a match isn't high enough. And I want a private val for the max results, which is an integer that I set to one in this case. You can adjust this if you want your list to contain more than one entry; just increase it to three, five, whatever, and the model will return more landmarks it might see in an image. In our use case we just want to show the landmark the model is most sure about. Then we make the class implement LandmarkClassifier and of course its classify function.

To classify something with TensorFlow Lite we need a private var classifier, which is an ImageClassifier that comes from TensorFlow; it's null by default, and we add a setupClassifier function that initializes it. In this function we can also configure the classifier. On the one hand we want some base options, which we get with BaseOptions.builder(): you can set the number of threads (I set that to two; you could also make it a constant), you can say you want to use the GPU (I'll leave that out here), and there are some other settings I'm not sure about. We just call build. Then we define some further options, the ImageClassifier.ImageClassifierOptions, again with a builder: first we set the base options to our base options, we set the max results to our maxResults, we set the score threshold to our threshold, and we call build. Then, inside a try/catch where we catch IllegalStateException and just print the stack trace when something goes wrong, we say, if everything goes right, our classifier is equal to ImageClassifier.createFromFileAndOptions. So we create the classifier from a file, which is our AI model, and the options we created before: we pass the context, then the model path. What is that? Well, the path to the model; it already uses the assets folder as the default location, so we can just specify "landmarks.tflite", or whatever name you chose in the assets folder, and then we pass in our ImageClassifierOptions.
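Here's a compact sketch of the class so far, assuming the model file in assets is named landmarks.tflite:

```kotlin
import android.content.Context
import org.tensorflow.lite.task.core.BaseOptions
import org.tensorflow.lite.task.vision.classifier.ImageClassifier

class TfLiteLandmarkClassifier(
    private val context: Context,
    private val threshold: Float = 0.5f, // ignore matches the model is <50% sure about
    private val maxResults: Int = 1      // raise this to get more candidate landmarks
) : LandmarkClassifier {

    private var classifier: ImageClassifier? = null

    private fun setupClassifier() {
        val baseOptions = BaseOptions.builder()
            .setNumThreads(2)
            .build()
        val options = ImageClassifier.ImageClassifierOptions.builder()
            .setBaseOptions(baseOptions)
            .setMaxResults(maxResults)
            .setScoreThreshold(threshold)
            .build()
        try {
            classifier = ImageClassifier.createFromFileAndOptions(
                context,
                "landmarks.tflite", // resolved relative to the assets folder
                options
            )
        } catch (e: IllegalStateException) {
            e.printStackTrace()
        }
    }

    // classify() is shown in the next snippet
}
```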
Then, in the classify function, we first check whether our classifier is null; if it is, we set it up with setupClassifier. After that the exciting part starts, since we now want to classify the frame. For that we need a so-called ImageProcessor, which we create with ImageProcessor.Builder(); we don't need to pass in or configure anything here, we can just call build. Next, this image processor takes our bitmap as a so-called TensorImage, which is simply TensorFlow's image format, so we convert it by calling imageProcessor.process, which takes a TensorImage and also returns one, and we get the input simply with TensorImage.fromBitmap and pass in the bitmap.

Next we also want to handle the rotation, since there are some predefined rotation constants we need to map. For that I use a private function called getOrientationFromRotation, which takes the rotation and returns an ImageProcessingOptions.Orientation, so the model also knows which part of the image is where and how to rotate it properly. I'm personally not fully understanding these orientation values and what this RIGHT_TOP refers to, because I also didn't get this function from my own brain; these are related to the so-called EXIF values, the image metadata, and I think RIGHT_TOP refers to the right top corner of our image. When the rotation is Surface.ROTATION_0, so the image isn't rotated at all, we return ImageProcessingOptions.Orientation.RIGHT_TOP, because the right top corner of an unrotated image would obviously be at the right top. If we move on to Surface.ROTATION_90, so the image is rotated by 90 degrees (counterclockwise, as always with these rotations), we return TOP_LEFT, and so on. I'll just paste the other values: 180 degrees would be RIGHT_BOTTOM (and please correct me if I'm wrong about how I read these), and last but not least 270 degrees would be BOTTOM_RIGHT. We make the final branch an else so we get rid of the compiler error. (Heads-up: near the end of the video I find a bug in this mapping and swap the first and last cases; the corrected version appears later.)

Then we can say val imageProcessingOptions = ImageProcessingOptions.builder(), set the orientation to getOrientationFromRotation, passing in the rotation we have, and build. To get our results we call classifier.classify, which is where the magic happens, passing in our tensor image and our image processing options. Those results are a list of Classifications objects, so a list of multiple classifications, and we now want to loop over that list and flatMap it; I'll explain in a moment what all that does. The flatMap gives us the Classifications (plural), and those have categories, which we now map: we get a reference to each category and map it to our domain model, the Classification, where the name is category.displayName and the score is category.score. We then call distinctBy { it.name }, and if the whole thing is null, we return an empty list.

So what happens here? First of all, we just go over all these categories and map them to our Classification model; each category contains the name of the classification, for example Stonehenge, and the score for how sure the AI model is that it's really Stonehenge. Since we might have multiple Classifications objects, we flatMap everything together into one single flat list, so we don't end up with a list of lists. Then distinctBy the name means that if there are duplicates in that list, we remove all of them except one. That's due to how this model seems to work: for some reason it sometimes classifies the same image multiple times as the same landmark, which I want to avoid, so we remove the duplicates and only show each landmark once.
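The classify function described above, continuing the class from the previous snippet, could look roughly like this; getOrientationFromRotation is the helper just discussed (shown later in its corrected form):

```kotlin
import android.graphics.Bitmap
import org.tensorflow.lite.support.image.ImageProcessor
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.core.vision.ImageProcessingOptions

override fun classify(bitmap: Bitmap, rotation: Int): List<Classification> {
    if (classifier == null) {
        setupClassifier()
    }

    // Convert the bitmap into TensorFlow's TensorImage format
    val imageProcessor = ImageProcessor.Builder().build()
    val tensorImage = imageProcessor.process(TensorImage.fromBitmap(bitmap))

    // Tell the model how the image is rotated
    val imageProcessingOptions = ImageProcessingOptions.builder()
        .setOrientation(getOrientationFromRotation(rotation))
        .build()

    val results = classifier?.classify(tensorImage, imageProcessingOptions)

    // Flatten the nested categories into one list of our domain model
    // and drop the duplicate names the model sometimes produces
    return results?.flatMap { classifications ->
        classifications.categories.map { category ->
            Classification(
                name = category.displayName,
                score = category.score
            )
        }
    }?.distinctBy { it.name } ?: emptyList()
}
```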
And that's already it for our classifier. We can now go to presentation and create our LandmarkImageAnalyzer, which is the CameraX-related class that gets called for every single frame; it hands us the frame, we convert it to a bitmap and feed it into our classifier to get the outputs. Let's see how that works. On the one hand this class needs a classifier, a LandmarkClassifier, and we take an onResults lambda, which gives us a list of our Classification domain models and returns Unit; it gets called whenever there is a change in classifications. The class implements the ImageAnalysis.Analyzer interface, so we can override the analyze function, which gives us an ImageProxy that we can then convert to a bitmap and feed into our AI model.

Okay, so first we get the rotation degrees, which we did in the previous videos already: we get them from image.imageInfo.rotationDegrees. Then we get our bitmap by calling image.toBitmap(). But we're not done at this point; we can't just take the bitmap as it is. Well, we could, but it's not ideal.
If we take a look at the model page and scroll down to the kind of data it expects, for most models (hopefully for all models) you'll find something like this: the size of the image the model expects as input. Here it wants a square of 321 by 321 pixels. We have an image in portrait format, so we need to turn it into a square and save it at 321 by 321 pixels. How do we do that? We center-crop the image: as I showed in my demo app with the green square, we cut that square out of the picture, trimming a little bit off the edges as well, so that we end up with a 321 by 321 square somewhere in the middle. The original picture will already be scaled down by this image analyzer, since we don't want this function called for every single frame with a 4K or Full HD picture; it's already a smaller picture on its own.

Okay, what does that mean? In presentation I create a little bitmap extension file, and in there we extend Bitmap with a function that center-crops it. It takes a desired width and a desired height, which will be our 321. We first calculate the X start, so where we start cropping on the x-axis: that is the original width of the bitmap minus our desired width, divided by two, because we obviously have two edges to cut something off from; let's keep that an integer. We do the same for the Y start with the height minus the desired height, divided by two. If for some reason these values are invalid, so if the X start is less than zero, the Y start is less than zero, the desired width is larger than the actual width, or the desired height is larger than the actual height, we throw an IllegalArgumentException, "Invalid arguments for center cropping". If that's not the case, we return Bitmap.createBitmap, passing in this (the bitmap we called the function on), but cropping based on xStart and yStart with the desired width and the desired height. With that we effectively achieve a center-crop effect.

In the analyzer we can then call centerCrop with 321 by 321 pixels, get the results from classifier.classify with that bitmap and the rotation degrees, and then call onResults, passing the results we got from the classifier. Last but not least we want to close the image, to tell our analyzer that we fully processed this single frame and can move on.

One last thing I like to do here is skip some frames, so I add a private var frameSkipCounter, which is initially zero. The reason is simply that I don't want to analyze an image every single frame; that's too quick, the text jumps around very quickly and is hard to read, and it's totally fine if we just analyze an image once a second. So I want to skip 60 frames after analyzing one, which is not only much more performant but also results in a better user experience. If frameSkipCounter % 60 == 0, meaning 60 frames have passed, we do all of the analysis; if we have a frame rate of 60 frames per second, that means we analyze an image once a second. After the if statement we increment the frameSkipCounter by one.
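Sketched out, the extension and the analyzer described in this part could look like this; it assumes a CameraX version where ImageProxy.toBitmap() is available:

```kotlin
import android.graphics.Bitmap
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy

// presentation/BitmapExt.kt: center-crop a bitmap to the size the model expects
fun Bitmap.centerCrop(desiredWidth: Int, desiredHeight: Int): Bitmap {
    val xStart = (width - desiredWidth) / 2
    val yStart = (height - desiredHeight) / 2
    if (xStart < 0 || yStart < 0 || desiredWidth > width || desiredHeight > height) {
        throw IllegalArgumentException("Invalid arguments for center cropping")
    }
    return Bitmap.createBitmap(this, xStart, yStart, desiredWidth, desiredHeight)
}

// presentation/LandmarkImageAnalyzer.kt: called by CameraX for incoming frames
class LandmarkImageAnalyzer(
    private val classifier: LandmarkClassifier,
    private val onResults: (List<Classification>) -> Unit
) : ImageAnalysis.Analyzer {

    private var frameSkipCounter = 0

    override fun analyze(image: ImageProxy) {
        // Analyze only every 60th frame (roughly once per second at ~60 fps)
        if (frameSkipCounter % 60 == 0) {
            val rotationDegrees = image.imageInfo.rotationDegrees
            val bitmap = image
                .toBitmap()
                .centerCrop(321, 321) // the model expects 321x321 input

            val results = classifier.classify(bitmap, rotationDegrees)
            onResults(results)
        }
        frameSkipCounter++

        image.close() // signal that this frame is fully processed
    }
}
```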
Cool, so now the main part really is in place; we just need to use it in our UI. In here we use a Box with Modifier.fillMaxSize(), and in this box we use our CameraPreview. First we need to create the controller, just like in the last video, inside a remember block (Android Studio is pretty slow right now, Alt+Enter to import remember). In there we create our LifecycleCameraController, which takes the applicationContext, and call apply: we set the enabled use cases to CameraController.IMAGE_ANALYSIS, since that's all we want to do with our camera, we don't want to take any pictures or record videos; and we set the image analysis analyzer with setImageAnalysisAnalyzer, passing ContextCompat's main executor (so which executor, rather than which thread, handles all of this behind the scenes), the context, and our analyzer.

To create that analyzer we first need a list of our classifications as state: classifications by remember, and not a mutable list but mutableStateOf, so Compose state, with an empty list by default, typed as List<Classification> (Alt+Enter to import, and it imported, okay). Then we create our analyzer, also inside remember: our LandmarkImageAnalyzer, which takes the classifier (let's just construct a TfLiteLandmarkClassifier here, with context = applicationContext, and leave the rest as it is, since we have default parameters) and onResults, which is the function that gets called when our analyzer has classified something; there we get the classifications and update our state with classifications = it. With this analyzer in hand, we pass everything into our LifecycleCameraController, which then attaches to the CameraPreview with the controller and Modifier.fillMaxSize().

That makes sure our frames are analyzed, but of course we also want to display the results, which we can do with a little text. If the classifications are actually not empty, we loop over them, putting everything into a Column with Modifier.fillMaxWidth(), and I want to arrange this column at the top of our screen, so align it with Alignment.TopCenter. Then we loop over the classifications and display each as a Text; in my case, since we set the max number of classifications to one, there will only be one entry, but if you increase that you'll also see more. The text is it.name, the modifier is Modifier.fillMaxWidth() with a little bit of background (MaterialTheme.colorScheme.primaryContainer) and some padding of 8.dp (Alt+Enter to import dp; Android Studio is super slow right now, I have no idea why). The textAlign should be Center, the fontSize is 20.sp (also Alt+Enter to import), and I set the text color to MaterialTheme.colorScheme.primary. Okay, now we can display our classifications and see the camera preview.
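Wired together inside the activity's setContent, the UI described above could look roughly like this sketch:

```kotlin
import androidx.camera.view.CameraController
import androidx.camera.view.LifecycleCameraController
import androidx.compose.foundation.background
import androidx.compose.foundation.layout.*
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Text
import androidx.compose.runtime.*
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.text.style.TextAlign
import androidx.compose.ui.unit.dp
import androidx.compose.ui.unit.sp
import androidx.core.content.ContextCompat

// inside onCreate:
setContent {
    var classifications by remember {
        mutableStateOf(emptyList<Classification>())
    }
    val analyzer = remember {
        LandmarkImageAnalyzer(
            classifier = TfLiteLandmarkClassifier(
                context = applicationContext
            ),
            onResults = {
                classifications = it // update state whenever results change
            }
        )
    }
    val controller = remember {
        LifecycleCameraController(applicationContext).apply {
            // We only analyze frames; no photo capture or video recording
            setEnabledUseCases(CameraController.IMAGE_ANALYSIS)
            setImageAnalysisAnalyzer(
                ContextCompat.getMainExecutor(applicationContext),
                analyzer
            )
        }
    }

    Box(modifier = Modifier.fillMaxSize()) {
        CameraPreview(controller, Modifier.fillMaxSize())

        Column(
            modifier = Modifier
                .fillMaxWidth()
                .align(Alignment.TopCenter)
        ) {
            classifications.forEach {
                Text(
                    text = it.name,
                    modifier = Modifier
                        .fillMaxWidth()
                        .background(MaterialTheme.colorScheme.primaryContainer)
                        .padding(8.dp),
                    textAlign = TextAlign.Center,
                    fontSize = 20.sp,
                    color = MaterialTheme.colorScheme.primary
                )
            }
        }
    }
}
```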
I think the last thing that's missing is the camera permission, so we go to the manifest, add the uses-permission for the camera, and Alt+Enter to add the uses-feature tag declaring that we use the camera as a hardware feature. Then we need to quickly request it. Let's create a little function down in our activity: hasCameraPermission is ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA), with the Manifest from android, and if that is equal to PackageManager.PERMISSION_GRANTED, we know we have the camera permission. Then we scroll up again and do a very simplified permission handling here: if we don't have the permission, we request it with ActivityCompat.requestPermissions, passing an array of Manifest.permission.CAMERA and a request code of zero, and we should be fine.

So I'd say let's try this out and launch it. Well, I didn't show you how to make the little square overlay, but I also don't think that's the core of this video about AI; I can tell you in a moment how I did it, it's pretty easy. There we go, our app launched: "While using the app", and yes, camera access is already working, and you can see it actually recognizes my mouse pad as a landmark, which is of course not what we want. Let's open Photoshop with our sample landmarks and see if it recognizes them. It does not yet recognize Stonehenge; that's weird, maybe I did something wrong. Ah, that is the Eiffel Tower, the Big Ben, okay, it seems to work. Stonehenge, hm, not sure why Stonehenge doesn't work like this. I'll check off camera whether that's just the model or whether I missed something, but these two landmarks seem to work: Eiffel Tower, Big Ben. Okay, very cool, it seems to work, and I'll check the Stonehenge case; the other ones seemed very consistent, so I'll get back to you in a moment.

All right, I found the issue: there actually was one in this getOrientationFromRotation function, where I made a little mistake. We actually need to swap the first case with the last one: the first branch needs to be Surface.ROTATION_270, which results in BOTTOM_RIGHT, and in all other cases we want RIGHT_TOP. If we do that and relaunch the app, we should now also be able to recognize Stonehenge, and yes, you can see that it's working. If we move to the Eiffel Tower, that's working, and the Big Ben is also working. And if we increase the number of results in our classifier, for example to three, and relaunch, then you'll see what happens: we get multiple results. So it's most sure that it's the Big Ben, but it could also be one of these other two things; and here, okay, it's super sure it's the Eiffel Tower and nothing else, and here as well.
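For reference, here's a sketch of the two code-level pieces from this part: the simplified permission handling and the corrected rotation mapping (the version after the bug fix just described).

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import android.view.Surface
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import org.tensorflow.lite.task.core.vision.ImageProcessingOptions

// AndroidManifest.xml additionally needs something like:
// <uses-permission android:name="android.permission.CAMERA" />
// <uses-feature android:name="android.hardware.camera" android:required="true" />

// In the activity: very simplified permission handling
private fun hasCameraPermission() = ContextCompat.checkSelfPermission(
    this, Manifest.permission.CAMERA
) == PackageManager.PERMISSION_GRANTED

// In onCreate, before setContent:
// if (!hasCameraPermission()) {
//     ActivityCompat.requestPermissions(
//         this, arrayOf(Manifest.permission.CAMERA), 0
//     )
// }

// In TfLiteLandmarkClassifier: the corrected rotation mapping
private fun getOrientationFromRotation(
    rotation: Int
): ImageProcessingOptions.Orientation {
    return when (rotation) {
        Surface.ROTATION_270 -> ImageProcessingOptions.Orientation.BOTTOM_RIGHT
        Surface.ROTATION_90 -> ImageProcessingOptions.Orientation.TOP_LEFT
        Surface.ROTATION_180 -> ImageProcessingOptions.Orientation.RIGHT_BOTTOM
        else -> ImageProcessingOptions.Orientation.RIGHT_TOP
    }
}
```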
So that was probably your very first AI app on Android. You can find the entire source code below on my GitHub. And if you now want to take this knowledge to the next level and really get the skills you need in the industry as an Android developer, then check the first link in the description, because you'll find a lot of advanced Android premium courses there, which are not only a brilliant way to get these skills but also to support the future of this channel. Thanks a lot for watching, I wish you an amazing rest of your week, and see you in the next video. Bye-bye!
Info
Channel: Philipp Lackner
Views: 19,283
Id: ViRfnLAR_Uc
Length: 34min 27sec (2067 seconds)
Published: Wed Oct 11 2023