Mask R-CNN

Video Statistics and Information

Captions

Good morning, everyone. The title of this presentation is Mask R-CNN, and my name is Kaiming He. This work is presented by Facebook AI Research. Before I start, I want to highlight that the image in this slide is a Venetian mask, so I believe Venice is a very lucky city for Mask R-CNN.

In the past few years we have seen a lot of progress in the problem of object detection and also in the problem of semantic segmentation. However, the problem of instance segmentation is still very challenging. In instance segmentation we not only want to detect individual object instances, but we also want to have a mask of each instance. To see how challenging instance segmentation is, we can look at the number of entries in the COCO leaderboard and also in the Cityscapes leaderboard. In the COCO leaderboard there are about 30 entries for object detection, whereas there are only 5 entries for instance segmentation. Similarly, in the Cityscapes leaderboard there are about 60 entries for semantic segmentation, but again there are about 10 entries for instance segmentation.

For the problem of object detection, in my understanding, one of the reasons why it has been so popular in the last few years is that there are very successful meta-algorithms that can support many other improvements, so it has improved continuously over the past few years. One example in object detection is the Fast/Faster R-CNN system. These systems have good speed and good accuracy, and they are intuitive and easy to use, so I believe they can support a lot of research on object detection. For the same reason, in the problem of semantic segmentation we have the fully convolutional network, or FCN, system, which serves as a meta-algorithm for this problem. Actually, in my opinion, after the publication of FCN almost all semantic segmentation methods are some kind of fully convolutional network. This demonstrates the power of meta-algorithms.

So the goal of this paper is to present a method for instance segmentation. We hope this method will serve as a meta-algorithm, and we hope it will have good speed and good accuracy and be intuitive and easy to use, so that it can support future research on instance segmentation and many other instance-level understanding applications.

In the past few years, instance segmentation methods can be roughly categorized into two families. The first family is driven by the success of the R-CNN methods. For these methods, people usually start from some segment-level proposals, and after that some classifiers are trained to classify these proposals into semantic categories. The other family is driven by the success of FCN. These methods usually start from a full-image semantic segmentation result, and then they learn some cut to divide this result into individual instances.

So what is Mask R-CNN? Actually it is very simple: it is just a combination of the best of both worlds. Mask R-CNN is a Faster R-CNN system with a fully convolutional network that runs on each of the RoIs. It is very simple, and we hope it will serve as a meta-algorithm for this more challenging problem.

One property of the Mask R-CNN system is its parallel heads. We believe parallel heads are easy and fast to implement and use, and will help to facilitate research. If we go back a few years, the original slow R-CNN method was actually a two-step training system. In the first step, people trained a classifier to classify between foreground and background, or among different classes, and in the second step people used the features to train a bounding-box regressor. This two-step problem was greatly improved and simplified in the Fast R-CNN system, which trains parallel heads for classification and bounding-box regression. Mask R-CNN is designed in a similar spirit: in addition to the bounding-box classification and regression head, we simply add a parallel head for predicting the mask. We believe this is a simple implementation.

Another property of Mask R-CNN is an operation called RoIAlign. RoIAlign is just a simple improvement of the very popular RoIPool operation; however, there is no quantization in the RoIAlign operation. We simply map a region onto the feature map, and then we do bilinear interpolation to extract a fixed-dimensional output. There is no quantization, so we hope there is no information loss in this process. Comparing with the very popular RoIPool operation, we would like to note that the original RoIPool operation was not designed for segmentation. It was designed for object detection, so maybe it is OK there, because you just need a bounding box and you do not need pixel-to-pixel accuracy. However, the RoIPool operation, because of the pooling, may break the pixel-to-pixel alignment, so it may not be desirable for instance segmentation. We hope that the RoIAlign operation can help to solve this problem.

Another property of Mask R-CNN is the use of a fully convolutional head for predicting masks. Fully convolutional networks are naturally designed in a pixel-to-pixel aligned fashion, so we simply need to use a few extra convolutional layers on each region of interest, and then we can predict a very accurate mask. Here is one example. I show one image here, and I also show one region of interest, which has been warped into a rectangle. This is the fully convolutional network output at a resolution of 28 by 28. It is pretty good and very well aligned, because of the properties of FCN. Then we just need to resize the soft prediction, and after some thresholding we have the mask for this object.

As I mentioned, Mask R-CNN is a meta-algorithm, so it can support many implementations. We also hope it can be compatible with many other improvements, such as hard negative mining, or many improvements of the backbone architecture. In this paper we have used ResNet or ResNeXt as the feature backbone, and we also use the feature pyramid network, or FPN, as the backbone.

Next I am going to show some results. Here are the object instance segmentation results on the COCO dataset. The first three entries are the winners of the COCO competition in the last three years. Without [unclear], our Mask R-CNN entry can be 2 AP better than the winner of last year, and at the same time our method runs at about 200 milliseconds per image on a GPU, which is pretty good. Also, because our method is a meta-algorithm, it can easily support improvements of the backbone architectures. For example, if we replace the features from ResNet with ResNeXt, we see another improvement of about 1.4 on this task.

As a by-product, the Mask R-CNN system also improves object detection. Here are the results of bounding-box detection on the COCO dataset. If we compare the RoIAlign operation with its counterpart, the RoIPool FPN baseline, our method has about one point of improvement. If we train the mask jointly with the bounding box, we see another improvement of about one point, thanks to multitask learning. And again, our method can also improve by using better features. Actually, in the upcoming COCO competition this year, Mask R-CNN is used by many of the leading teams, to our knowledge. For our team, our Mask R-CNN entry has achieved a single-model result of [unclear] percent bounding-box AP and about 42% mask AP. More details will be disclosed at the COCO workshop.

Next I am going to show some examples. Here is one of the Mask R-CNN results. I believe this is a very challenging image, because, for example, there are objects that are surrounded by other objects of the same category. This can be a very challenging case for semantic segmentation. Here is another example of the Mask R-CNN results. We can see that there are disconnected objects in this case, so it may also present a challenge for those methods that are based on grouping. Here is another example of the Mask R-CNN results. We can see that our method can successfully detect and segment very small objects.

After seeing all these results, people may think computer vision is solved, but unfortunately it is not. There are still many failure cases. For example, the detection can still fail, and in case the detection fails, the segmentation may also fail. So there can be missing objects and there can be false masks. More importantly, recognition is not solved, so there are very often recognition errors. For example, Mask R-CNN recognizes this as a kite, but it is not.

Mask R-CNN can also be very easily extended to other tasks, such as human keypoint detection. The extension is very simple: we just need to view a single human keypoint as a one-hot mask, and then we can use our framework for this task. In other words, our method is a single framework that can support bounding-box detection, mask segmentation, and also keypoint detection.

Next I am going to show some Mask R-CNN results on video. This is just done frame by frame; there is no temporal smoothing. Here is another example of frame-by-frame Mask R-CNN.

In conclusion, we have presented Mask R-CNN, which we hope will serve as a meta-algorithm for the problem of instance segmentation and other instance-level recognition. Our code will be open-sourced as Facebook AI Research's detection platform, and we hope to do that after the CVPR deadline. That's all, thank you.

It is time for one question.

Well, I have one question. Have you tried this with classes that are not well localized, like sky or grass?

Yeah, I think in the upcoming COCO workshop we are going to present another semantic segmentation system, which is based on a similar pattern to Mask R-CNN. In that case, we believe we can just extend the system to be a multitask training system, which we hope can detect both objects and stuff.

[Applause]
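
The contrast the talk draws between the quantized RoI pooling operation and the interpolation-based one can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified pure-Python version that works on a single-channel feature map, takes one bilinear sample at the centre of each output bin (an assumption made here for brevity), and uses hypothetical function names (`bilinear_sample`, `roi_align`, `roi_pool_quantized`).

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at real-valued (y, x)."""
    height, width = len(feature), len(feature[0])
    # Clamp the sampling point so the four neighbours stay inside the map.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feature[y0][x0] + wx * feature[y0][x1]
    bottom = (1 - wx) * feature[y1][x0] + wx * feature[y1][x1]
    return (1 - wy) * top + wy * bottom


def roi_align(feature, roi, out_size):
    """Extract an out_size x out_size grid from a real-valued RoI without rounding.

    roi = (y_start, x_start, y_end, x_end) in feature-map coordinates.
    Assumption of this sketch: one sample at the centre of each bin.
    """
    y_start, x_start, y_end, x_end = roi
    bin_h = (y_end - y_start) / out_size
    bin_w = (x_end - x_start) / out_size
    return [
        [
            bilinear_sample(
                feature,
                y_start + (i + 0.5) * bin_h,
                x_start + (j + 0.5) * bin_w,
            )
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


def roi_pool_quantized(feature, roi, out_size):
    """Simplified quantized pooling: round the RoI to whole cells, then max-pool each bin.

    Assumes the rounded RoI lies inside the feature map.
    """
    y_start, x_start, y_end, x_end = (int(round(v)) for v in roi)
    roi_h = max(y_end - y_start, 1)
    roi_w = max(x_end - x_start, 1)
    pooled = []
    for i in range(out_size):
        r0 = y_start + math.floor(i * roi_h / out_size)
        r1 = y_start + math.ceil((i + 1) * roi_h / out_size)
        row = []
        for j in range(out_size):
            c0 = x_start + math.floor(j * roi_w / out_size)
            c1 = x_start + math.ceil((j + 1) * roi_w / out_size)
            row.append(
                max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
        pooled.append(row)
    return pooled


# A feature map whose value equals the column index, so outputs are easy to read.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
print(roi_align(ramp, (1.2, 1.2, 5.2, 5.2), 2))
print(roi_align(ramp, (1.4, 1.4, 5.4, 5.4), 2))
print(roi_pool_quantized(ramp, (1.2, 1.2, 5.2, 5.2), 2))
print(roi_pool_quantized(ramp, (1.4, 1.4, 5.4, 5.4), 2))
```

On this ramp, shifting the RoI by 0.2 of a cell shifts the interpolated outputs by the same amount, while the quantized version rounds both RoIs to the same cells and returns identical values. That lost sub-cell offset is the misalignment the talk refers to.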

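The last step of the mask head described in the talk, resizing the soft per-RoI prediction and turning it into a binary mask, might look roughly like the sketch below. It is only an illustration under stated assumptions: the 0.5 threshold, the integer box format `(y0, x0, y1, x1)` with exclusive ends, and the function names `resize_bilinear` and `paste_mask` are choices made here, not details given in the talk.

```python
def resize_bilinear(mask, out_h, out_w):
    """Resize a 2-D soft mask (list of rows of floats) with bilinear interpolation."""
    in_h, in_w = len(mask), len(mask[0])
    resized = []
    for i in range(out_h):
        # Map the centre of output row i back into input coordinates, then clamp.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = (1 - wx) * mask[y0][x0] + wx * mask[y0][x1]
            bottom = (1 - wx) * mask[y1][x0] + wx * mask[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        resized.append(row)
    return resized


def paste_mask(soft_mask, box, image_h, image_w, threshold=0.5):
    """Resize a soft RoI mask to its box and binarize it on an image-sized canvas.

    box = (y0, x0, y1, x1) in integer pixel coordinates, end-exclusive (assumed format).
    threshold = 0.5 is an assumption of this sketch.
    """
    y0, x0, y1, x1 = box
    resized = resize_bilinear(soft_mask, y1 - y0, x1 - x0)
    canvas = [[0] * image_w for _ in range(image_h)]
    for i, row in enumerate(resized):
        for j, probability in enumerate(row):
            if probability >= threshold:
                canvas[y0 + i][x0 + j] = 1
    return canvas


# A tiny 2x2 "soft mask" stands in for the 28x28 head output mentioned in the talk.
tiny_soft_mask = [[0.9, 0.1], [0.9, 0.1]]
for canvas_row in paste_mask(tiny_soft_mask, (1, 1, 3, 5), image_h=4, image_w=6):
    print(canvas_row)
```

Thresholding after resizing, rather than before, keeps the interpolated boundary soft until the final decision, which is one plausible reason to predict a soft mask in the first place.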
Info

Channel: ComputerVisionFoundation Videos

Views: 81,077

Keywords: ICCV17, Venice, Computer Vision, ICCV, Oral 4, ICCV#16

Id: g7z4mkfRjI4

Length: 12min 21sec (741 seconds)

Published: Thu Nov 16 2017