Getting started with object detection for web using MediaPipe Solutions

Captions
[MUSIC PLAYING] JEN PERSON: The MediaPipe object detector task lets you detect the presence and location of multiple classes of objects within images or videos. For example, an object detector can locate dogs within an image. There are APIs available for Android, Python, iOS, and the web.

To get started using the object detection task for the web, first take a look at the available object detection models. You can see this list in the documentation linked in the description. This list might change over time, so definitely check the docs for the latest. There are three recommended models listed here, each available in two formats: EfficientDet-Lite0, EfficientDet-Lite2, and SSD MobileNetV2. All three models were trained on the COCO dataset, a large-scale object detection dataset that contains over 1.5 million object instances and 80 object labels. I've linked to a list of the COCO classes so you can see what labels are available.

The EfficientDet-Lite0 model is recommended because it strikes a balance between latency and accuracy; it is both accurate and lightweight enough for many use cases. The EfficientDet-Lite2 model is generally more accurate than EfficientDet-Lite0 but is also slower and more memory intensive, so it's appropriate for use cases where accuracy is a greater priority than speed and size. The SSD MobileNetV2 model is faster and lighter than EfficientDet-Lite0 but also generally less accurate; it's appropriate for use cases that require a fast, lightweight model that sacrifices some accuracy. For more details on each of these models, check the docs. If your use case requires a more unique object detection solution, you can customize a model using MediaPipe Model Maker. I've linked to a guide from the docs, but if you'd like a video on getting started, let me know in the comments.

Now that you've chosen a model, install the Tasks Vision package.
You can download the package using npm and use a JavaScript bundler like webpack, or you can import the package from a CDN. Note that under the hood, MediaPipe for the web uses WebAssembly, or WASM, a binary instruction format for a stack-based virtual machine. You don't need to be an expert on the ins and outs of WASM to use MediaPipe Solutions for the web. In simplest terms, WASM allows non-web-based code to run on the web. For the best user experience, you don't want to bundle your model or WASM binary into your website. Instead, you store them server side and provide links when initializing your object detector.

So let's explore the code for this. Here we have a function, createObjectDetector. First, we configure our WASM binary loading using the FilesetResolver.forVisionTasks method. Then we create the object detector using the ObjectDetector.createFromOptions method, passing the fileset resolver you just created and the model. You can also provide optional parameters like a score threshold, which indicates on a scale from 0 to 1 how confident the model must be to return a detection, and the running mode for inference, which is either image or video. Image is the default value.

To run object detection on an image, use the ObjectDetector.detect method, passing the image source. This method is synchronous, which is good to keep in mind when designing your UI. The source can be an HTML canvas element, HTML video element, HTML image element, ImageData, or ImageBitmap. The detect method returns an ObjectDetectorResult object containing a series of detections, ordered by how confident the model is that the detected object belongs to the given category. In this example, the first detection has a category name of dog and a confidence score of 0.73828. The next detection is also a dog, with a confidence score of 0.73047. We can be reasonably sure that there are two dogs in this image, which sounds like a great image to me.
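A minimal sketch of this setup, assuming the @mediapipe/tasks-vision package. The model and WASM URLs below are placeholders I've made up for illustration, not paths from the video; the FilesetResolver and ObjectDetector calls are shown as comments because they require the package and a browser environment, while the options object itself is plain JavaScript.

```javascript
// Sketch only: URLs below are placeholder assumptions -- substitute your own
// server-side hosted paths. In a real app you would first install the package:
//   npm install @mediapipe/tasks-vision
// and import:
//   import { FilesetResolver, ObjectDetector } from "@mediapipe/tasks-vision";

// Options for ObjectDetector.createFromOptions, as described above.
const detectorOptions = {
  baseOptions: {
    // Placeholder URL -- store the model server side and link to it here.
    modelAssetPath: "https://example.com/models/efficientdet_lite0.tflite",
  },
  // How confident (on a scale from 0 to 1) the model must be to return a detection.
  scoreThreshold: 0.5,
  // "IMAGE" is the default; use "VIDEO" for frame-by-frame detection.
  runningMode: "IMAGE",
};

// Browser-side initialization (commented out: needs the package + hosted WASM assets):
// const vision = await FilesetResolver.forVisionTasks(
//   "https://example.com/tasks-vision/wasm" // placeholder path to the WASM binaries
// );
// const objectDetector = await ObjectDetector.createFromOptions(vision, detectorOptions);
//
// Synchronous detection on an image source:
// const detectionResult = objectDetector.detect(imageElement);

console.log(detectorOptions.runningMode); // → "IMAGE"
```

Keeping the model and WASM binary out of your bundle, as the video suggests, means the options only carry links; the heavy assets are fetched at initialization time.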
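To make the result shape concrete, here is a hand-built mock mirroring the two-dog example above. It is not real model output, just a plain object with the same structure, so the access pattern can run anywhere; the video-mode call at the end is commented out because it needs a live detector and a video element.

```javascript
// Mock of an ObjectDetectorResult shaped like the two-dog example (not real output).
const detectionResult = {
  detections: [
    { categories: [{ categoryName: "dog", score: 0.73828 }] },
    { categories: [{ categoryName: "dog", score: 0.73047 }] },
  ],
};

// Name of the first result: first detection (index 0), first category (index 0).
const firstName = detectionResult.detections[0].categories[0].categoryName;
console.log(firstName); // → "dog"

// Iterate through the results to handle multiple detections.
for (const detection of detectionResult.detections) {
  const { categoryName, score } = detection.categories[0];
  console.log(`${categoryName}: ${score}`);
}

// For video frames (commented out: needs a detector in "VIDEO" running mode
// and an HTML video element):
// const nowMs = performance.now();
// const result = objectDetector.detectForVideo(videoElement, nowMs);
```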
You can access these detection results using detectionResult.detections. So if you want the display name of the first result, you would take detectionResult.detections, grab the first detection, which is of course index zero, get its first category, which is also index zero, and then read the categoryName property. You can iterate through the results to handle multiple detections.

To detect objects in the frames of a video, get the current time using performance.now. Then get the object detection result using the ObjectDetector.detectForVideo method, passing the video element and the current time.

And that's it. With this code, you can get started with object detection in your own web app. You can check out a complete code example on CodePen and view all the available solutions on the MediaPipe website, or get hands-on with solutions in MediaPipe Studio. All these great resources are linked in the video. Now that you have what you need to get started, I want to know what's next for you. Tell me what you're working on. Tell me what you learned. Tell me what you still want to know. Drop a comment here on YouTube or on LinkedIn. I can't wait to see what you build. [MUSIC PLAYING]
Info
Channel: Google for Developers
Views: 5,186
Keywords: Google, developers
Id: C3-WnwzsaJA
Length: 6min 22sec (382 seconds)
Published: Tue Nov 14 2023