Took a closer look at things that need to be improved in the annotation process. Spent the day extracting specific examples of what went wrong and how to solve/avoid it. Here are my findings:
Improve the annotations themselves
I went through the annotation database again and found some annotations that were labelled incorrectly. The one in the left figure was annotated as Salix arctica Female, but the crop does not contain enough information to support that. It could just as well be a male flower, or a bud in a shady spot.
I have some ideas as to why this misclassification occurs. The main one is that the annotation process is tedious and boring. One gets tired quickly, which can lead to mistakes. Another is that the reviewing biologists have more information than what the annotation can show; they can, for example, say that this is a female based on the leaves that sit beside it, which are not contained in the final annotation crop.
An immediate solution is to go through all the annotations and mark the ones that don’t contain enough features as unknown. The best way of doing this is to look at the cropped images rather than at the annotations in the plot pictures. For future annotations, to increase reliability, we could have a third party double-check the cropped images.
Avoid resizing inequalities
The nature of our data (pictures of small elongated flowers) produces different shapes for the same object: if the flower is vertical and the picture is taken from above, the elongated flower will appear as a circle in the picture; if the flower is horizontal and photographed from the same angle, it will appear as an elongated rectangle.
The elongated rectangle becomes a problem when we calculate the HOG features, because we have to normalize every crop into a common shape (I chose a square). Resizing the original cropped annotation modifies the pixel content (see figure): the resize algorithm either adds pixels interpolated from the surrounding ones or collapses neighbouring pixels into one. This is acceptable as long as the aspect ratio of the image is maintained, but when we stretch one axis and leave the other unmodified (as in the figure), the HOG features calculated from the resized image will contain gradient relations that are not present in the original image.
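To make the effect concrete, here is a minimal sketch using only NumPy. It is not the resize or HOG code used in this project: `resize_nearest` and `mean_gradient_angle` are hypothetical helpers written for illustration, and the crop is a synthetic intensity ramp rather than a real annotation. It compares the gradient direction after a ratio-preserving resize with the direction after a resize that changes only one axis.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize: each output pixel copies the closest input pixel."""
    in_h, in_w = img.shape
    rows = ((np.arange(out_h) + 0.5) * in_h / out_h).astype(int)
    cols = ((np.arange(out_w) + 0.5) * in_w / out_w).astype(int)
    return img[np.ix_(rows, cols)]


def mean_gradient_angle(img):
    """Angle (degrees) of the mean intensity gradient, measured from the x axis."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.degrees(np.arctan2(gy.mean(), gx.mean())))


# Synthetic elongated crop (32 rows x 96 columns) whose intensity grows
# equally along x and y, so its gradient points at 45 degrees.
yy, xx = np.mgrid[0:32, 0:96]
crop = (xx + yy).astype(float)

same_ratio = resize_nearest(crop, 16, 48)  # both axes scaled by 1/2
squashed = resize_nearest(crop, 32, 32)    # only the x axis scaled (by 1/3)

angle_original = mean_gradient_angle(crop)
angle_same_ratio = mean_gradient_angle(same_ratio)
angle_squashed = mean_gradient_angle(squashed)

print(angle_original, angle_same_ratio, angle_squashed)
```

The original crop and the ratio-preserving resize both give 45 degrees, while the crop forced into a square gives about 18.4 degrees. Since HOG histograms are built from gradient orientations, a shift of this size moves the edge into a different part of the histogram, which is the kind of relation that is not in the original image.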
For the pedestrian detector [1], this was not a real issue because all the pedestrian pictures were of people standing upright. In other words, the elongated rectangle they used characterized their data nicely. They did analyse images at different scale levels, but they changed neither the aspect ratio nor the rotation of the training window (which was 64×128 pixels).
Note that the solution is not to change the training window shape from a square to a rectangle. That would cause the same problem in the other direction, when going from the square “atop” annotations to the elongated rectangle. One possible solution is to annotate the elongated flowers with squares only. This would mean either covering the elongated flower with several squares or annotating just the top part of the flower. If we were to implement the first option, we could have different types for different parts of the plant, for example Salix_Female_midsection and Salix_Female_top labels. The second option is also plausible, assuming that the top of an elongated flower looks similar to the images taken from above.
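The first option (covering an elongated annotation with several squares) could be sketched as follows. This is only an illustration under assumptions: the `(x, y, w, h)` box format with integer pixel values and the function name `cover_with_squares` are hypothetical, not taken from the annotation tool.

```python
import math


def cover_with_squares(x, y, w, h):
    """Cover the box (x, y, w, h) with squares whose side is the box's short side.

    Squares are spaced evenly along the long axis and overlap where needed,
    so the first and last squares touch the two ends of the box.
    """
    side = min(w, h)
    long_side = max(w, h)
    n = math.ceil(long_side / side)  # fewest squares that can span the long side
    if n == 1:
        offsets = [0]
    else:
        step = (long_side - side) / (n - 1)
        offsets = [round(k * step) for k in range(n)]
    if w >= h:
        return [(x + off, y, side, side) for off in offsets]
    return [(x, y + off, side, side) for off in offsets]


# A vertical 30x100 annotation becomes four overlapping 30x30 squares.
squares = cover_with_squares(10, 20, 30, 100)
print(squares)
```

Each square can then be resized to the training window without changing its aspect ratio. The box alone does not say which end is the top of the flower, so the annotator would still have to assign the top and midsection labels. The second option amounts to keeping only the square at the flower's top.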
Male / Female similarities
There is one type of Salix arctica Female that looks similar to the Salix arctica Male. Even for a human expert it is sometimes hard to distinguish between them. In the image we see a Male Salix arctica on the left and a Female on the right. The difference between them is subtle at this resolution: the Female is darker in the middle, while the Male has longer hair. They are difficult to tell apart because they have practically the same silhouette. This may or may not be a problem; it depends on the technique used to classify them (SVM or neural networks). For now I would just like to document the possible pitfall and wait.
If this becomes an issue, we could solve it by creating a flower type that encompasses Salix arctica Males and this type of Salix arctica Female (not all Females look like this). That would make the detection output a bit strange, but it’s a way out. I am not sure what else we could do here.
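As a rough sketch of what such a merged type would mean for the training labels: the label strings and the `looks_like_male` flag below are hypothetical, and the flag would have to be recorded by the annotator, since only some Females fall into the look-alike group.

```python
# Hypothetical label names, for illustration only.
MALE = "Salix_Male"
FEMALE = "Salix_Female"
MERGED = "Salix_Male_or_Malelike_Female"


def training_label(label, looks_like_male=False):
    """Map an annotated label to the label used for training.

    Males and the Females flagged as male-like share one merged class;
    every other label is passed through unchanged.
    """
    if label == MALE:
        return MERGED
    if label == FEMALE and looks_like_male:
        return MERGED
    return label


print(training_label(MALE))
print(training_label(FEMALE, looks_like_male=True))
print(training_label(FEMALE))
```

A detection of the merged class would then only say "Male or male-like Female", which is the strangeness mentioned above.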
[1] http://www.navneetdalal.com/files/CVPR2005_HOG.pdf?attredirects=0