Welcome to the newest edition
of the Innovation Coffee Break today, hosted by understand.ai. At understand.ai we automate annotations
and now I will tell you why. For autonomous driving we need
enormous amounts of data. To take passengers safely from A to B,
a self-driving vehicle needs to master its surroundings perfectly, with all the
challenges along the way. Since self-driving vehicles
typically learn from data, we have to present those self-driving
cars with all the weird situations that might occur in real life. Since we want self-driving vehicles
to always behave in a nice, well-defined way, we need to present them with many, many
such rare events. A typical use case is that
you have a driving function and you want to make sure that
it complies with safety standards. So you put this driving function into your
car and you have all the sensors and these are the sensors-under-test. And then you put a second sensor set
on the rooftop of your car, so that's the so-called rooftop box,
and this is the reference sensor set. So let's do a little math. So you have three hundred thousand kilometers
and a typical average speed would be forty-five kilometers per hour since
it includes cities as well as highway. And if you do that, you end up with
six thousand seven hundred hours of driving. If you have cams and a lidar,
a typical recording frequency is 10 frames per second. So with 10 frames per second,
we end up with 240 million frames. So there are 240 million frames
and they have to be annotated. So in each frame, there are on average
15 objects like cars, buses, traffic signs and so on. So if you have those 240 million frames,
with 15 objects each, you end up at three point six billion
objects to be annotated. This is a lot. And they need to be annotated at quality,
because otherwise you cannot make a meaningful comparison between
the sensor-under-test data and the reference sensor data. So that's a job: to annotate three
point six billion objects at quality. And this is what we call enormous. So can this be done manually? Manually, in the sense that you sit
in front of your computer and you draw objects one after another. And the answer is no! This cannot be done manually.
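To see the scale, the napkin math from above can be re-run in a few lines of Python. Every figure here is one of the round numbers quoted in this talk (300,000 km, 45 km/h average speed, 10 frames per second, 15 objects per frame), not measured data:

```python
# Napkin calculation: how many objects would have to be annotated?
# All inputs are the talk's round numbers, not measurements.

distance_km = 300_000        # total recorded distance
avg_speed_kmh = 45           # mixed city and highway driving
fps = 10                     # typical lidar/camera recording frequency
objects_per_frame = 15       # average objects (cars, buses, signs, ...) per frame

hours = distance_km / avg_speed_kmh           # ~6,700 hours of driving
frames = hours * 3600 * fps                   # ~240 million frames
objects = frames * objects_per_frame          # ~3.6 billion objects

print(f"{hours:,.0f} h, {frames / 1e6:,.0f} million frames, "
      f"{objects / 1e9:.1f} billion objects")
```

The object count alone, before anyone draws a single box, already shows why hand-labeling cannot keep up.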
Not at this scale and not even if you have the best labeling
crew in the world available. This just doesn't work out. In the past, it was done manually
because the projects were much smaller. With ever more complex driving functionality
and ever more complex validation requirements, there's also a growing number
of kilometers you have to annotate. And at the scale we talked about, there's
only one way you can do that. And this is through automation. To understand automation, you need to
understand what manual labeling actually means. The first step is that you have to navigate
through the lidar scan and through the cam footage. So what you typically have is
you have a lidar scan with corresponding cam footage. And the lidar scan
comes from a 360-degree lidar. And then you have a variety of cams,
maybe eight or so, which then cover the same area
from a cam perspective. Typically there's a different angle and
a different coverage for each cam and also for the lidar. And this is what makes it somewhat tricky
to navigate in this space. So once you've done that,
that's the first task, we assume it takes around
20 seconds on average, then you move on
to the second task. So now you know where objects
are located, at least roughly. And now what you do is
you do object detection. So this means within your 3D world,
like in a game, you are navigating close to
the object you detected. Let's say it's a car. And then you start drawing as tight
a box as possible around this car. You do that in 3D space, so you're drawing a 3D
box around that car. This task takes roughly 15 seconds
on average to draw that box. Then you move on to the next task. So that's now the third task. It's typically not enough to just
have a box which is classified to be a car or a pedestrian or a
motorcycle, for example. What you also need are certain
attributes which better describe the object. So let's say you have a car. You would also be interested
in turn signals, brake lights. Is it an emergency vehicle, or not? Are the orange caution
lights on, or not? So these are the things you
would need to know in addition. Or, is it a moving or a static object, and if static,
is it parked, and so on, so forth. And these are the attributes you assign in
addition to your box to each of those objects. So there's a long list and you
go through this list one by one. You look at the cam footage and you say,
OK, blinkers on or blinkers off, and you move through that list. So this task takes another
10 seconds on average. But that's still not enough. There's at least one more step,
which is absolutely important to achieve the desired quality levels:
we have a second person looking at each of those
annotations and making sure that each one is actually
correct, so that it is of high quality,
or at least of high enough quality. That person then moves through every annotation,
checking whether the box is tight enough, if it's the right class, if it has the right
attributes on and so on, so forth. We assume that this takes another 15
seconds on average. So if we sum that up, we end up at
60 seconds for a 3D bounding box with all annotations at the
desired quality level. This is how long we would assume it
takes if you do one box after another. Let's do a little
napkin calculation. From our example before we ended up
with three point six billion objects. If we now assume that it takes roughly
60 seconds, or one minute, per object to get it annotated manually,
we end up with 60 million hours of work. Now let's talk automation. understand.ai developed the automation engine
to bring the time down from 60 seconds to something significantly lower,
to speed up those projects and to bring down prices. I now want to take the
chance to explain how that is done. We start with automated object
detection for cars, for example, via deep learning based networks. Then we track those objects across
frames and link them into chains. Finally, we snap those objects
to the perfect size via deep learning based regression. With the understand.ai automation engine
data can be processed up to 15 times faster than manual labeling
and up to five times faster than traditional automation approaches. This saves you costs and time. Sounds interesting? Contact us if you want to learn more. dSPACE, your partner in
simulation and validation.