This is the mathematical expression for Harris corner detection:

$$E(u,v) = \sum_{x,y} w(x,y)\,\left[I(x+u,\, y+v) - I(x,y)\right]^2$$
But I have the following doubts:
- What is the physical significance of $u$ and $v$? Many references say it is the magnitude by which the window $w$ is shifted. So how much is the window shifted? One pixel or two pixels?
- Is the summation over the pixel positions covered by the window?
- Assuming simply $w(x,y) = 1$, is $I(x,y)$ the intensity of the single pixel at $(x,y)$, or the sum of the intensities inside the window centered at $(x,y)$?
- The Wikipedia article says the image is 2D, denoted by $I$, then asks the reader to consider an image patch over the area $(x,y)$ and uses the notation $I(x,y)$.

I am finding it hard to grasp the mathematical explanation. Does anyone have an idea?
Answer
The meaning of that formula is really quite simple. Imagine you take two same-sized small areas of an image, the blue one and the red one:
The window function equals 0 outside the red rectangle (for simplicity, we can assume the window is simply constant within the red rectangle). So the window function selects which pixels you want to look at and assigns relative weights to each pixel. (Most common is the Gaussian window, because it's rotationally symmetric, efficient to calculate and emphasizes the pixels near the center of the window.) The blue rectangle is shifted by (u,v).
Next you calculate the sum of squared differences between the image parts marked red and blue, i.e. you subtract them pixel by pixel, square the differences and sum up the results (assuming, for simplicity, that the window $= 1$ in the area we're looking at). This gives you one number for every possible shift $(u,v)$, i.e. the function $E(u,v)$.
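As a sketch of this pixel-by-pixel computation (hypothetical function and parameter names, assuming NumPy arrays and a constant window $w = 1$ inside the window, 0 outside):

```python
import numpy as np

def e_uv(image, cx, cy, half, u, v):
    """Sum of squared differences E(u, v) between the window of half-size
    `half` centered at (cx, cy) (the "red" patch) and the same-sized
    window shifted by (u, v) (the "blue" patch).
    Assumes w(x, y) = 1 inside the window and 0 outside."""
    red = image[cy - half:cy + half + 1,
                cx - half:cx + half + 1].astype(float)
    blue = image[cy + v - half:cy + v + half + 1,
                 cx + u - half:cx + u + half + 1].astype(float)
    return float(np.sum((blue - red) ** 2))
```

For a patch containing a vertical edge, this number stays near 0 when you shift along the edge and grows when you shift across it.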
Let's see what happens if we calculate that for different values of u/v:
First keep v=0:
This should be no surprise: The difference between the image parts is lowest when the offset (u,v) between them is 0. As you increase the distance between the two patches, the sum of squared differences also increases.
Keeping u=0:
The plot looks similar, but the sum of squared differences between the two image parts is a lot smaller when you shift the blue rectangle in the direction of the edge.
A full plot of E(u,v) looks like this:
The plot looks a bit like a "canyon": There's only a small difference if you shift the image in the direction of the canyon. That's because this image patch has a dominant (vertical) orientation.
We can do the same for a different image patch:
Here, the plot of E(u,v) looks different:
No matter which way you shift the patch, it always looks different.
So the shape of the function E(u,v) tells us something about the image patch:
- if E(u,v) is near 0 everywhere, there is no texture in the image patch you're looking at
- if E(u,v) is "canyon-shaped", the patch has a dominant orientation (this could be an edge or a texture)
- if E(u,v) is "cone-shaped", the patch has texture, but no dominant orientation. That's the kind of patch a corner-detector is looking for.
Many references say it is the magnitude by which the window $w$ shifted... so how much is the window shifted? One pixel or two pixels?
Normally, you don't calculate E(u,v) at all. You're only interested in the shape of it in the neighborhood of (u,v)=(0,0). So you just want the Taylor expansion of E(u,v) near (0,0), which completely describes the "shape" of it.
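For reference, that second-order Taylor expansion (with $I_x$, $I_y$ the image gradients) is the standard Harris derivation:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}.$$

The eigenvalues of $M$ describe the curvature of $E$ along its principal directions, which is what distinguishes the flat, canyon and cone cases above.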
Is the summation over the pixel positions covered by the window?
Mathematically speaking, it's more elegant to let the summation range over all pixels. Practically speaking, there's no point in summing pixels where the window is 0.