These notes list the points of the approach that interest me. This is NOT an exhaustive summary of the paper.
- First you calculate the gradients. They tested various ways of doing this and concluded that a simple [-1,0,1] filter was best. After this calculation you will have a direction and a magnitude for each pixel.
- Divide the gradient direction into bins (note that you can divide either 0 to 180 degrees, i.e. unsigned gradients, or 0 to 360 degrees, i.e. signed gradients). This is just a way to gather gradient directions into bins. For example, one bin can hold all the angles from 0 to 30 degrees.
- Divide the image into cells. Each pixel in the cell adds a vote to a histogram of orientations, using the angle bins from the previous step. Two really cool things to note here:
- You can avoid aliasing by interpolating votes between neighboring bins.
- The magnitude of the gradient weights the vote, so stronger gradients count for more in the histogram.
- Note that each cell is a histogram that contains the “amount” of all gradient directions in that cell.
- Create a way to group adjacent cell histograms and call it a block. For each block (group of cells) you will “normalize” it. The paper suggests something like v / sqrt(||v||^2 + e^2). Note that v is the vector made of the adjacent cell histograms of the block, ||v|| is the L2 norm of that vector, and e is a small constant.
- Now move through the image in block steps. Each block you create is to be “normalized”. The way you move through the image allows cells to be in more than one block (though this is not necessary).
- For each block in the image you will get a “normalized” vector. All these vectors placed one after another make up the HOG descriptor.
- Awesome idea: They used 1239 pedestrian images. The SVM was trained with the 1239 originals and their left-right reflections. This is so cool on so many levels. Of course!! The pedestrian is still a pedestrian in the reflected image. And this little trick gives double the information to the SVM with no additional storage overhead.
- They created negative training images from a database of images which did not contain any pedestrians. Basically they randomly sampled those non-pedestrian images to create the negative training set. They then ran the resulting classifier on the non-pedestrian images to look for false positives and added these false positives to the training set.
- A word on scale: To make detection happen they had to move a detection window through the image and run the classifier on each ROI. They did this for various scales of the image. We might not have to be so strict with this, since all the flowers are going to be within a small range of distances from the camera, whereas pedestrians can be very close to or very far from it. The point is that the pedestrian range is much larger.
- A margin of 4 pixels was left in the training images.
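
The gradient, binning and cell-histogram steps above can be sketched roughly as follows. This is only a minimal pure-Python illustration, not the paper's implementation: the function names `gradients` and `cell_histogram`, the choice of 6 unsigned bins (30 degrees each, matching the example above), and the way border pixels are handled are all my own assumptions.

```python
import math

def gradients(img):
    """Return (magnitude, angle_deg) grids using the centered [-1, 0, 1] filter.

    img is a list of rows of grayscale values. Border pixels reuse the
    nearest valid neighbor (an assumption, the notes do not cover borders).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            # Unsigned orientation: fold the angle into [0, 180).
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, n_bins=6):
    """Magnitude-weighted orientation histogram for one cell.

    Each vote is split linearly between the two nearest bin centers,
    wrapping around at 0/180 degrees.
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for mag_row, ang_row in zip(mag, ang):
        for m, a in zip(mag_row, ang_row):
            pos = a / bin_width - 0.5   # position measured in bin centers
            lo = math.floor(pos)
            frac = pos - lo             # share of the vote for the upper bin
            hist[lo % n_bins] += m * (1.0 - frac)
            hist[(lo + 1) % n_bins] += m * frac
    return hist

# Usage: a 4x4 cell whose brightness grows from top to bottom.
cell = [[10 * y for _ in range(4)] for y in range(4)]
mag, ang = gradients(cell)
print(cell_histogram(mag, ang))
```

Because every gradient in this toy cell points at 90 degrees, which sits on the edge between two 30-degree bins, the interpolation shares the votes between those two neighboring bins instead of dumping them all into one.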
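
Block normalization and the final concatenation could look something like the sketch below. The names `normalize_block` and `hog_descriptor`, the 2x2-cell block, the stride of one cell and the value of `eps` are illustrative assumptions on my part; the notes above only fix the formula v / sqrt(||v||^2 + e^2).

```python
import math

def normalize_block(v, eps=1e-5):
    """Return v / sqrt(||v||^2 + eps^2), with ||v|| the L2 norm."""
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]

def hog_descriptor(cell_hists, block_size=2, stride=1):
    """Concatenate the normalized blocks of a 2D grid of cell histograms.

    cell_hists[r][c] is the histogram (list of floats) of the cell at
    row r, column c. With stride < block_size the blocks overlap, so a
    cell contributes to more than one block.
    """
    rows, cols = len(cell_hists), len(cell_hists[0])
    descriptor = []
    for r in range(0, rows - block_size + 1, stride):
        for c in range(0, cols - block_size + 1, stride):
            block = []
            for dr in range(block_size):
                for dc in range(block_size):
                    block.extend(cell_hists[r + dr][c + dc])
            descriptor.extend(normalize_block(block))
    return descriptor

# Usage: a 3x3 grid of cells, each with a 6-bin histogram.
grid = [[[1.0] * 6 for _ in range(3)] for _ in range(3)]
print(len(hog_descriptor(grid)))            # overlapping blocks
print(len(hog_descriptor(grid, stride=2)))  # non-overlapping blocks
```

Setting `stride` equal to `block_size` gives the non-overlapping variant mentioned above, at the cost of a shorter descriptor.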
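
The two training-set tricks (left-right reflections and feeding false positives back in as negatives) can be illustrated with a small sketch. Everything here is hypothetical scaffolding: `with_reflections`, `mine_hard_negatives` and the `classify` callable are names I made up, and the real pipeline trains an SVM on HOG descriptors rather than on raw lists.

```python
def with_reflections(images):
    """Yield each image followed by its left-right mirror.

    Images are lists of pixel rows. The mirror is built on the fly, so
    nothing extra has to be stored alongside the originals.
    """
    for img in images:
        yield img
        yield [row[::-1] for row in img]

def mine_hard_negatives(classify, negative_windows):
    """Return the pedestrian-free windows that the classifier wrongly accepts.

    classify is any callable returning True for "pedestrian"; these
    false positives are what gets added to the negative training set.
    """
    return [w for w in negative_windows if classify(w)]

# Usage with toy data: one 2x3 "image" and a threshold "classifier".
positives = list(with_reflections([[[1, 2, 3], [4, 5, 6]]]))
print(len(positives))
hard = mine_hard_negatives(lambda score: score > 0.5, [0.1, 0.9, 0.7])
print(hard)
```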
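
The multi-scale detection loop from the note on scale might be enumerated like this. It is a sketch under assumptions: `sliding_windows`, the window size, the stride and the list of scales are placeholder values of mine, and a real detector would resize the image and run the classifier on each window rather than just listing coordinates.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride, scales):
    """Yield (scale, x, y) for every window position at every scale.

    The image is conceptually resized by each scale factor while the
    window keeps a fixed size, so small scales catch large objects.
    """
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield s, x, y

# Usage: objects at a narrow range of distances need only a few scales.
few = list(sliding_windows(128, 128, 64, 128, 8, [1.0]))
many = list(sliding_windows(128, 128, 64, 128, 8, [1.0, 1.5, 2.0]))
print(len(few), len(many))
```

For the flower case, a short `scales` list may be enough, which cuts the number of windows the classifier has to score.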