Data Annotation

Video Statistics and Information

Captions
Welcome to the newest edition of the Innovation Coffee Break, today hosted by understand.ai. At understand.ai we automate annotations, and now I will tell you why.

For autonomous driving we need enormous amounts of data. To take passengers safely from A to B, a self-driving vehicle needs to master its surroundings perfectly, with all the challenges along the way. Since self-driving vehicles typically learn from data, we have to present them with all the weird situations that might occur in real life. And since we want self-driving vehicles to always behave in a safe and well-defined way, we need to present them with many, many such rare events.

A typical use case is that you have a driving function and you want to make sure that it complies with safety standards. So you put this driving function into your car together with all its sensors; these are the sensors-under-test. Then you put a second sensor set on the rooftop of your car, the so-called rooftop box, and this is the reference sensor set.

So let's do a little math. Say you have three hundred thousand kilometers to record, and a typical average speed would be forty-five kilometers per hour, since it includes city as well as highway driving. If you do that, you end up with six thousand seven hundred hours of driving. If you have cams and a lidar, a typical recording frequency is 10 frames per second, and with 10 frames per second we end up with 240 million frames. So there are 240 million frames, and they all have to be annotated. In each frame there are, on average, 15 objects: cars, buses, traffic signs and so on. So if you have those 240 million frames with 15 objects each, you end up at three point six billion objects to be annotated. This is a lot. And they need to be annotated at high quality, because otherwise you cannot make a meaningful comparison between the sensor-under-test data and the reference sensor data. So that's the job: annotate three point six billion objects at high quality. And this is what we call enormous.

So can this be done manually? Manually in the sense that you sit in front of your computer and draw objects one after another. The answer is no! This cannot be done manually. Not at this scale, and not even if you have the best labeling crew in the world available. It just doesn't work out. In the past it was done manually, because the projects were much smaller. But with ever more complex driving functionality and ever more complex validation requirements, the number of kilometers you have to annotate keeps growing. And at the scale we talked about, there's only one way to do it: through automation.

To understand automation, you first need to understand what manual labeling actually means. The first step is to navigate through the lidar scan and the cam footage. What you typically have is a lidar scan with corresponding cam footage. The lidar is a 360-degree lidar, and then you have a variety of cams, maybe eight or so, which cover the same area from a camera perspective. Typically there's a different angle and a different coverage for each cam, and also for the lidar, and this is what makes it somewhat tricky to orient yourself in this space. That's the first task, and we assume it takes around 20 seconds on average.

Then you move on to the second task. Now you know, at least roughly, where objects are located, and what you do next is object detection. Within your 3D world, like in a game, you navigate close to the object you detected. Let's say it's a car. Then you start drawing a box around this car, as tight as possible, in 3D space. So you're drawing a 3D box around that car. This task takes roughly 15 seconds on average.

Then you move on to the third task. It's typically not enough to just have a box that is classified as a car or a pedestrian or a motorcycle, for example. What you also need are certain attributes that describe the object in more detail. Let's say you have a car: you would also be interested in turn signals and brake lights. Is it an emergency vehicle or not? Are its orange caution lights on or not? These are the things you need to know in addition. Or, if it's a static rather than a moving object, is it parked, and so on and so forth. These are the attributes you assign, in addition to the box, to each of those objects. There's a long list, and you go through it one by one: you look at the cam footage and say, OK, blinkers on or blinkers off, and you move through that list. This task takes another 10 seconds on average.

But that's still not enough. There's at least one more step, which is absolutely important to achieve the desired quality level: we have a second person look at each of those annotations and make sure it is actually correct, so that it is of high enough quality. That person moves through every annotation, checking whether the box is tight enough, whether it's the right class, whether it has the right attributes, and so on and so forth. We assume this takes another 15 seconds on average.

If we sum that up, we end up at 60 seconds for a 3D bounding box with all annotations at the desired quality level. This is how long we assume it takes if you do one box after another. So let's do a little napkin calculation. From our example before, we ended up with three point six billion objects. If we now assume that it takes roughly 60 seconds, or one minute, per object to annotate it manually, we end up with 60 million hours of work.
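As a sanity check, the napkin math above can be reproduced in a few lines of Python. All figures are the rounded values quoted in the video:

```python
# Napkin math from the transcript, using the rounded figures quoted above.

KM_TO_VALIDATE = 300_000     # total test mileage
AVG_SPEED_KMH = 45           # mixed city and highway driving
FPS = 10                     # typical recording frequency for cams and lidar
OBJECTS_PER_FRAME = 15       # cars, buses, traffic signs, ...

hours = KM_TO_VALIDATE / AVG_SPEED_KMH     # ~6,700 hours of driving
frames = hours * 3600 * FPS                # ~240 million frames
objects = frames * OBJECTS_PER_FRAME       # ~3.6 billion objects

# Per-object manual effort in seconds, as broken down in the transcript.
navigate, draw_box, attributes, review = 20, 15, 10, 15
seconds_per_object = navigate + draw_box + attributes + review   # 60 s
manual_hours = objects * seconds_per_object / 3600               # 60 million h

print(f"{hours:,.0f} driving hours")                     # 6,667 driving hours
print(f"{frames / 1e6:,.0f} million frames")             # 240 million frames
print(f"{objects / 1e9:.1f} billion objects")            # 3.6 billion objects
print(f"{manual_hours / 1e6:.0f} million manual hours")  # 60 million manual hours
```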
Now let's talk automation. understand.ai developed the automation engine to bring that time down from 60 seconds to a significantly lower number, to speed up those projects and to bring down prices. I now want to take the chance to explain how that is done. We start with automated object detection, for cars, for example, via deep-learning-based networks. Then we track those objects across frames and link them into chains. Finally, we snap those objects to the perfect size via deep-learning-based regression. With the understand.ai automation engine, data can be processed up to 15 times faster than with manual labeling and up to five times faster than with traditional automation approaches. This saves you costs and time. Sounds interesting? Contact us if you want to learn more. dSPACE, your partner in simulation and validation.
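To make the three stages concrete (detect, track and link, snap to size), here is a minimal, purely illustrative sketch of such a pipeline. Every function here is a hypothetical stand-in, not the actual understand.ai automation engine API; the real stages would wrap deep-learning models:

```python
# Illustrative sketch of the three automation stages described above.
# All functions are hypothetical placeholders, not the understand.ai API.
from dataclasses import dataclass, field

@dataclass
class Box3D:
    center: tuple          # (x, y, z) in meters
    size: tuple            # (length, width, height) in meters
    heading: float         # yaw in radians
    label: str             # "car", "pedestrian", ...
    attributes: dict = field(default_factory=dict)

def detect_objects(frame):
    """Stage 1: object detection on a single lidar/camera frame."""
    # Stand-in: a real detector returns boxes predicted by a neural network.
    return [Box3D((12.0, 3.5, 0.8), (4.5, 1.8, 1.5), 0.1, "car")]

def link_tracks(detections_per_frame):
    """Stage 2: track objects across frames and link them into chains."""
    # Stand-in: naive association by detection index; a real tracker matches
    # boxes using motion and appearance cues.
    chains = {}
    for frame_idx, detections in enumerate(detections_per_frame):
        for obj_idx, box in enumerate(detections):
            chains.setdefault(obj_idx, []).append((frame_idx, box))
    return chains

def snap_to_size(box):
    """Stage 3: refine the box to a tight fit via learned regression."""
    # Stand-in: a regression network would adjust center, size and heading.
    return box

recording = [None, None, None]           # stand-ins for three recorded frames
detections = [detect_objects(frame) for frame in recording]
chains = link_tracks(detections)
refined = {chain_id: [(i, snap_to_size(box)) for i, box in chain]
           for chain_id, chain in chains.items()}
print(f"{len(refined)} object chain(s) across {len(recording)} frame(s)")
```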
Info
Channel: dSPACE Group
Views: 5,043
Rating: 4.94 out of 5
Keywords: dSPACE, understand.ai, UAI, data annotation, data enrichment, autonomous driving, AI, autonomous vehicle, self-driving car, AD, ADAS, automate annotation, Daniel Roedler, Innovation Coffee Break, Virtual Showroom
Id: WwsyGSFlgbQ
Length: 8min 14sec (494 seconds)
Published: Wed Oct 28 2020