I have a question concerning object recognition, specifically recognizing car models. I am at the beginning of a project on identifying the same car model in different images. At the moment I think one of the best algorithms for 3D object recognition is SIFT, but after playing around a bit with a demo implementation I have the strange feeling that this algorithm has problems with shiny metal objects like cars, especially if they have different colors.
Does anyone know of any work in this area in general, or of a suitable algorithm for the task of finding the same car model in different images?
Thanks in advance for your help!
Answer
I would have a look at the so-called "bag of words" or "visual words" approach. It is increasingly used for image categorization and identification. This algorithm usually starts by detecting robust points, such as SIFT points, in an image. The region around each detected point, summarized as a descriptor (the 128-dimensional SIFT descriptor in your case), is then used as the feature.
In the simplest form, one can collect all descriptors from all images and cluster them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters. The centroids of these clusters, i.e. the visual words, can be used to build a new descriptor for the image. Basically, you hope that the set of clusters an image's descriptors fall into is indicative of the image category.
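The clustering step can be sketched in a few lines of Python. This is only a minimal illustration, assuming plain k-means (Lloyd's algorithm) written with the standard library; the function names, the seed, and the toy 2-D vectors standing in for real SIFT descriptors are all made up for this example. In practice you would probably use a library implementation of k-means instead.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(descriptor, centroids):
    """Index of the centroid (visual word) closest to the descriptor."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(descriptor, centroids[i]))

def build_vocabulary(descriptors, k, iterations=20, seed=0):
    """Cluster all descriptors with plain k-means; return the k centroids."""
    rng = random.Random(seed)
    # Start from k distinct descriptors picked at random.
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: group descriptors by their nearest centroid.
        members = [[] for _ in range(k)]
        for d in descriptors:
            members[nearest_centroid(d, centroids)].append(d)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        for i, group in enumerate(members):
            if group:
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

# Toy data (hypothetical): descriptors pooled from all images,
# forming two well-separated groups.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
               (10.0, 0.0), (10.1, 0.0), (10.2, 0.0)]
vocabulary = build_vocabulary(descriptors, k=2)
print(sorted(vocabulary))
```

Each returned centroid plays the role of one visual word. The number of words k is a free parameter that has to be chosen for the dataset at hand.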
Again, in the simplest case, you have a list of clusters and, per image, you count which of these clusters contain descriptors from that image and how many. This is similar to the term frequency / inverse document frequency (TF-IDF) method used in text retrieval. See this quick and dirty Matlab script.
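To make the counting step concrete, here is a rough Python sketch (it is not the Matlab script mentioned above). It assumes a vocabulary of centroids is already available and uses one common TF-IDF variant: normalized word counts multiplied by log(N / document frequency). The function names and the toy 2-D vectors are invented for this illustration.

```python
import math
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_word(descriptor, vocabulary):
    """Index of the visual word (centroid) nearest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))

def word_histogram(descriptors, vocabulary):
    """Count how many of an image's descriptors fall into each cluster."""
    counts = Counter(assign_word(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]

def tf_idf(histograms):
    """Weight each image's histogram by term frequency times log(N / df)."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: in how many images does each word occur?
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    idf = [math.log(n_images / df[w]) if df[w] else 0.0
           for w in range(n_words)]
    weighted = []
    for h in histograms:
        total = sum(h)
        weighted.append([(h[w] / total) * idf[w] if total else 0.0
                         for w in range(n_words)])
    return weighted

# Toy vocabulary and per-image descriptors (hypothetical values).
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_a = [(0.1, 0.2), (0.3, -0.1), (9.8, 10.1)]
image_b = [(9.9, 9.7), (10.2, 10.0), (0.2, 9.9)]

histograms = [word_histogram(img, vocabulary) for img in (image_a, image_b)]
weights = tf_idf(histograms)
print(histograms)
print(weights)
```

A word that occurs in every image gets weight zero under this weighting, since it does not help to tell the images apart. The resulting vectors can then be compared between images or fed to a classifier.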
This approach is an active research area, and many more advanced algorithms exist.
The VLFeat website contains a nice, more advanced demo of this approach, classifying the Caltech 101 dataset. Also noteworthy are the results and software from Caltech itself.