RustConf 2021 - Identifying Pokémon Cards by Hugo Peixoto

Video Statistics and Information

Captions

Hi everyone, my name is Hugo Peixoto and I'm here to talk to you about my Pokémon collection. I've been collecting these Pokémon cards for a while now, but I never really took the time to keep them organized. Recently I started playing the game again, so it has become a bit of a problem: every time I want to build a new deck, I need to know if I have the necessary cards to do it or if I need to go out and buy something.

So I decided to fix this. I started by building a website where I could go and manually enter how many cards I have of each, and that would get saved to a database. Now, this worked fine, but I wanted to try something different. I wanted to have a webcam pointed at my desk, put a card in there, and have the software automatically detect which card it was and store that in the database. So this talk is going to be about the algorithms that I used and the problems that I faced.

Here's a very high-level overview of the whole process. We start by grabbing a frame from the webcam's video stream. Then we take that frame and extract the card image from it. Finally, we go through a data set of known cards, search for the one that's most similar to the card image we just extracted, and add that to the database.

Initially I was working with frames like this one on the left, and you can see that there are some shadows, and the wood pattern of the desk causes some noise as well. These things were making my life a bit harder than I wanted, so I moved to a more controlled environment like the picture on the right. There you have a white background and there are no shadows, and things worked much better under these conditions.

Now, I mentioned that I needed a data set. There's this website, pokemon cards.com, and they have pretty much all the cards in there that were ever printed in English. Since they don't have an API, what I did was build a Ruby script that just scraped the whole thing and downloaded every card image to my computer, so I ended
up with about 14,000 cards.

Now let's focus on detecting and extracting the card image from the frame. We're working with 1080p color images. You might think of a color image as having three channels: the red, the green, and the blue one. What I found is that most computer vision algorithms only really care about the brightness of the pixels, or how light or dark each pixel is. This is equivalent to working with grayscale images, and that's what we're going to do.

Once we have this grayscale image, we want to crop it to the boundaries of the card while also fixing its perspective so that there's no rotation or skew. We want it to become as close to a 2D scan of the card as possible. The first step that I took to do this was to apply a Sobel operator. The Sobel operator highlights the edges of the image, both the outline of the card and the edges in the drawing itself, and it removes any sections of solid color. Since we're looking for the boundaries of the card, this helps highlight the edges, particularly in cases where the card doesn't have a thick black border like this one around it. Some cards have a yellow border, and this helps to normalize the levels a bit.

Now, how does the Sobel operator work? It's a kernel-based image filter, which means it's an image filter that follows a specific structure. You calculate each pixel independently of the others: you look at the respective pixel on the source image plus a small window around it. In this case I'm showing a three-by-three window, but it can be a window of any size. Then you take the pixels in that window and multiply them by a matrix of coefficients, and that matrix is called the kernel. You sum all those values together and you get a resulting pixel. The kernel is what defines the behavior of the filter: if you want to apply a Gaussian blur, which is a common operation in many image editing programs, you use one kernel, and if you're applying the Sobel operator, you use a different kernel. So if we
apply the Sobel operator to every pixel on this image, we end up with something like this. The values here range from zero to around a thousand, and to better understand what's going on, let's map this to a grayscale image where the zeros become white and the thousands become black. Here you can see that the corners and the center of the image are completely white, and this indicates that there's no edge in those regions, while around the circle a dark ring has formed, and that indicates that these pixels likely contain an edge. So if we apply this to our initial grayscale image, we get this.

The next step is to find the contour, or the outline, of the card. I did this using a simple algorithm: we scan each row from left to right, and when we hit a pixel that's above a given threshold, we stop, we mark that pixel, and move to the next row. If we do this for every row, and then we do the same thing from the other side, and from top to bottom and bottom to top, we end up with these marked pixels. So this is the contour of the image, and this works because the card doesn't have any holes or any concave structures or anything like that.

Once we have this, the next step is to turn this contour into four straight lines, and we do this using an algorithm called the Hough transform. Let's see how that works. We go through each pixel on the contour and we draw all the lines that go through that pixel. Not literally every line, because that's an infinite number of lines, but we pick a resolution, like every half degree or something like that, and we draw those lines. In this case I'm drawing eight lines. We store those lines, then we move to the next pixel and do the same thing, and you'll see that there's one common line between the two sets: if you take a look, the horizontal line happens twice. Since this line occurs more times than all the other ones, it's more likely that it is a real line. So we're going to do this for every
pixel on the contour, and we're going to keep only the lines that occur, let's say, 200 times or so, and discard all the rest. If we take those lines and draw them, we get this, and it's pretty close to what we wanted but not quite there yet. To get rid of this extra noise, we can cluster these lines together and average them out, so any lines that are similar enough get clustered together. If we do that, we get our intended result.

With these lines we can calculate their intersection points. We can do this by going through every pair of lines, calculating their intersections, and discarding any points that fall outside of our image. What these four points represent is the four corners of our card.

Now that we have the corners of the card, we can work on fixing the perspective to get something like this. This transformation is done, roughly speaking, by taking each pixel and moving it to another coordinate, and this movement is done by multiplying each pixel's coordinate by a matrix obtained by solving a system of equations that's based on those four corners. To do this I used a crate called nalgebra; they implement a bunch of linear algebra algorithms. Doing that, we finally got what we wanted, which is our 2D card image.

Now we can take this image and search for matches in our data set. Since each image has around 1 million pixels, we can't really compare them directly: since there are 14,000 cards, this would take forever. So we need to somehow reduce the amount of information that we're comparing, and we're going to do this using a perceptual hash. A perceptual hash is a smaller representation of the image that still keeps the essence of the image. The main idea is that similar images will have similar perceptual hashes. So we take each card in the data set and convert it to its perceptual hash, we do the same thing for the image that we're searching for, and we compare those instead. In this case I picked a
16 by 16 hash, that's 256 bits or 32 bytes, and comparing that is fast enough. So let's see how this hash is calculated. We take our image and resize it down to 16 by 16, and then we need to binarize it, by which I mean convert each pixel into a zero or a one. In this case, what I'm doing for that step is: each pixel becomes a one if it is darker than the pixel to its left, and it becomes a zero otherwise. There are different types of perceptual hashes, both in the resizing part and the binarization step, but this is a simple algorithm that gave me good results.

So here's an overview of the full process. We take the original image, we grayscale it and apply a Sobel operator, and then we find the contour, extract the four lines that represent it, and calculate the four corners of the card. With that we can fix the perspective of the grayscale image and apply a perceptual hash.

Now, this process gave me pretty good results, but there was one case that I needed to deal with. These cards over here look the same: they have the same name, the gameplay effect is the same, but there's one small difference. They were printed in different sets throughout the years, and you can see that the set symbol on the corner there is different. So I need to be able to tell these apart. The set symbols are so small that the perceptual hash doesn't pick up any differences in there, so I needed to find a different way of doing this.

So I implemented a technique called template matching. I created individual images for each of the set symbols, and we go through each possible position in that marked zone in the card there, and for each position we compare the set symbol with the pixels of the card, and if that error is below a certain threshold, we consider it to be a match. I had to use a different threshold for each set symbol because some of them are more complex than the others, and I had to tweak this a bit manually, and this solved my problem. So with these two techniques I was able to get a good
detection rate. So let me show you how this works. When I place a card there, you'll see the card on the left; that's the one that's being detected. This works even for cards like this one, and in this case you'll see on the left that the set symbol is also being detected.

Let me show you where the set symbol makes a difference. This Marnie here belongs to the Champion's Path expansion set, and you'll see that the correct symbol is shown on the left. There's another Marnie which was printed in a different set, the Sword & Shield base set, and you can see that it also gets detected correctly.

Now, there are some cards where this doesn't work so well, like this one. It's a foil card, and that means that there's a lot of reflective material in it, so the lights form all these bright patterns and the perceptual hash just isn't able to deal with it.

To finalize, here are some of the libraries that I used and some that I think are worth checking out. The last one, Rust CV, is an organization with many computer vision algorithms, so if you're interested in the area I think it's worth checking out. They also have some basic tutorials going through some of them. The code for this is available on my GitHub account, so if you want to, go and check it out. Feel free to ask me any questions, and that's all I have for you today, so thank you for listening.
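The kernel-based filtering described in the talk can be sketched in a few lines of Rust. This is a minimal illustration and not the speaker's code: the `Gray` type and function names are hypothetical, border pixels are handled by clamping to the nearest valid pixel, and the horizontal and vertical responses are combined with a Euclidean norm. The talk does not state either of those last two choices, so treat them as assumptions.

```rust
/// Hypothetical grayscale image: one brightness byte per pixel, stored row by row.
pub struct Gray {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u8>,
}

// The two standard 3x3 Sobel kernels (horizontal and vertical gradient).
const SOBEL_X: [[i32; 3]; 3] = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]];
const SOBEL_Y: [[i32; 3]; 3] = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]];

/// Multiply the 3x3 window centred on (x, y) by the kernel and sum the products.
fn apply_kernel(img: &Gray, x: usize, y: usize, kernel: &[[i32; 3]; 3]) -> i32 {
    let mut sum = 0;
    for (ky, row) in kernel.iter().enumerate() {
        for (kx, coeff) in row.iter().enumerate() {
            // Assumption: clamp coordinates so border pixels reuse their nearest neighbour.
            let sx = (x + kx).saturating_sub(1).min(img.width - 1);
            let sy = (y + ky).saturating_sub(1).min(img.height - 1);
            sum += coeff * i32::from(img.pixels[sy * img.width + sx]);
        }
    }
    sum
}

/// Edge strength per pixel; 0 means a flat region, large values mean a likely edge.
pub fn sobel(img: &Gray) -> Vec<u32> {
    let mut out = Vec::with_capacity(img.pixels.len());
    for y in 0..img.height {
        for x in 0..img.width {
            let gx = apply_kernel(img, x, y, &SOBEL_X);
            let gy = apply_kernel(img, x, y, &SOBEL_Y);
            // Assumption: combine both directions with the Euclidean norm.
            out.push(((gx * gx + gy * gy) as f64).sqrt().round() as u32);
        }
    }
    out
}
```

With 8-bit input, a sharp black-to-white step yields a response of about a thousand under these assumptions, which matches the value range mentioned in the talk. Swapping the kernel constants for blur coefficients would turn the same loop into a blur filter.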
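The voting step of the Hough transform might look roughly like the sketch below. It is an illustration under stated assumptions rather than the implementation from the talk: lines are represented in normal form (`x*cos(theta) + y*sin(theta) = rho`), `rho` is rounded to whole pixels, and the `Line` type, the function name, and all parameters are hypothetical.

```rust
/// Hypothetical line in normal form: x*cos(theta) + y*sin(theta) = rho.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Line {
    pub theta: f64,
    pub rho: f64,
}

/// Every contour point votes for each (theta, rho) line passing through it;
/// lines with at least `min_votes` votes are returned.
/// `max_rho` should be at least the image diagonal; points farther out are skipped.
pub fn hough_lines(
    points: &[(f64, f64)],
    theta_steps: usize,
    max_rho: f64,
    min_votes: u32,
) -> Vec<Line> {
    let offset = max_rho.ceil();
    let rho_bins = 2 * offset as usize + 1;
    let mut votes = vec![0u32; theta_steps * rho_bins];

    for &(x, y) in points {
        for t in 0..theta_steps {
            // Angles are sampled evenly over half a turn.
            let theta = std::f64::consts::PI * t as f64 / theta_steps as f64;
            let rho = x * theta.cos() + y * theta.sin();
            if rho.abs() > max_rho {
                continue;
            }
            // Shift rho so that negative distances map to valid bin indices.
            let bin = (rho + offset).round() as usize;
            votes[t * rho_bins + bin] += 1;
        }
    }

    let mut lines = Vec::new();
    for t in 0..theta_steps {
        for bin in 0..rho_bins {
            if votes[t * rho_bins + bin] >= min_votes {
                lines.push(Line {
                    theta: std::f64::consts::PI * t as f64 / theta_steps as f64,
                    rho: bin as f64 - offset,
                });
            }
        }
    }
    lines
}
```

Calling this with `theta_steps = 360` would correspond to the half-degree resolution mentioned in the talk. Neighbouring angles often collect as many votes as the true line, which is the extra noise that the clustering and averaging step is meant to remove.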
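The perceptual hash and the data set search can be sketched as follows. The input is assumed to be a thumbnail that has already been resized to 16 by 16 grayscale pixels; the resizing itself is omitted. The talk does not say what happens to the first column, which has no pixel to its left, so this sketch leaves those bits at zero. Comparing hashes by Hamming distance is also an assumption, and all names are hypothetical.

```rust
pub const SIZE: usize = 16;

/// 16 x 16 bits = 256 bits = 32 bytes.
pub type Hash = [u8; SIZE * SIZE / 8];

/// Each pixel becomes a 1 bit if it is darker (lower brightness) than the pixel to its left.
pub fn perceptual_hash(thumb: &[[u8; SIZE]; SIZE]) -> Hash {
    let mut hash = [0u8; SIZE * SIZE / 8];
    for y in 0..SIZE {
        // Assumption: column 0 has no left neighbour, so its bit stays 0.
        for x in 1..SIZE {
            if thumb[y][x] < thumb[y][x - 1] {
                let bit = y * SIZE + x;
                hash[bit / 8] |= 1u8 << (bit % 8);
            }
        }
    }
    hash
}

/// Number of bits that differ between two hashes (assumed similarity measure).
pub fn hamming_distance(a: &Hash, b: &Hash) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Index of the data set entry whose hash is closest to the query, if any.
pub fn closest_match(query: &Hash, dataset: &[Hash]) -> Option<usize> {
    dataset
        .iter()
        .enumerate()
        .min_by_key(|(_, h)| hamming_distance(query, h))
        .map(|(i, _)| i)
}
```

Because each comparison is 32 XORs and bit counts, a linear scan over roughly 14,000 precomputed hashes is cheap, which fits the "fast enough" remark in the talk.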

Info

Channel: Rust

Views: 2,878

Keywords: rust-lang, rust, rustlang

Id: BLy_YF4nmqQ

Length: 14min 6sec (846 seconds)

Published: Wed Sep 15 2021