I ran Principal Component Analysis (PCA) on the HOG features that I calculated (using Navneet Dalal’s code), and the result was not spectacular. I plotted the result and color-coded the types of flowers (bud: red, female: green, unknownSex: blue, hair: cyan, male: magenta). The figure suggests that the buds are easily differentiable from the rest, since they are nicely grouped to one side. It also highlights the lack of hair types (we don’t see much cyan). As could be expected, the unknown type is all over the place. Finally, the male and female types are spread out somewhere in the middle of the figure. I was expecting a bit more separation between male and female types.
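As a rough illustration of the projection step described above, the sketch below centers a feature matrix, projects it onto its first two principal components with NumPy's SVD, and maps each label to the plot color listed above. The random `hog_features` matrix, its 100 x 36 shape, and the function name `pca_project` are assumptions made for illustration, not the actual data or code behind the figure.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project the rows of `features` onto their top principal components."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA needs zero-mean columns
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    projected = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return projected, explained[:n_components]

# Hypothetical stand-in for the real HOG matrix: 100 samples x 36 features.
rng = np.random.default_rng(0)
hog_features = rng.normal(size=(100, 36))
labels = rng.choice(["bud", "female", "unknownSex", "hair", "male"], size=100)

points_2d, explained = pca_project(hog_features)

# Same color coding as in the figure.
colors = {"bud": "red", "female": "green", "unknownSex": "blue",
          "hair": "cyan", "male": "magenta"}
point_colors = [colors[label] for label in labels]
```

The two columns of `points_2d` can then be handed to any scatter plot together with `point_colors`, and `explained` gives a quick sense of how much of the variance the 2D view actually retains.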
Knowing that the neural network and/or the Support Vector Machine that I’ll use to classify the images could still behave well despite the PCA results, I have some observations that might be relevant as we move forward. These observations may lead to some changes in the way we do the annotations.
- Not all females are alike. There are three different types of females, each with a specific shape. We might get better results if we separate these into sub-types. Note that if we do this we might not have enough data (at the moment) to train and test.
- Looking at the cropped images, I notice that some of them look strange. Some instances are too dark or too mangled to identify at a glance. I still don’t know the reason for this. I suspect it’s because they have lost all context (surrounding pixels). They could also be wrongly annotated. The flowers are very small and the camera sometimes does not capture all the features needed.
- We need more data. The hair type has only 50 examples and the buds only 217. This is small compared to the approximately 800 that the male and female types have.
- There is a similarity between the shape of one of the types of female and the males. This similarity is even more pronounced given that the HOG features are calculated using only one color channel. If this becomes a problem we might want to use the color dimension in the form of a hue histogram (or something of the sort). We could also merge the flowers that look alike into a separate type; this would still allow us to separate one type of female from the merged type.
- Some annotations contain more background than flower. These are annotations of flowers that are L-shaped or positioned diagonally. To reduce the amount of unwanted background, we could adopt a policy of annotating only the top of the flowers. We could also split the annotation into smaller squares that better capture the whole flower.
- Some flowers are elongated; in some pictures they are annotated with elongated rectangles and in others with squares (seen from above). For the HOG features I “normalize” everything to a square. This means that some images get distorted and might lose their differentiating features. To avoid this we might want to do the same as in the previous point.
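The hue histogram idea mentioned in the list above could look roughly like the following sketch, which uses only Python's standard `colorsys` module. The bin count, the saturation threshold, and the function name `hue_histogram` are assumptions for illustration; the resulting vector would presumably be appended to the HOG descriptor of the same crop.

```python
import colorsys

def hue_histogram(pixels, bins=8, min_saturation=0.1):
    """Normalized hue histogram over an iterable of (r, g, b) pixels in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue  # hue is unreliable for near-gray pixels, so skip them
        # h is a fraction of the color wheel; the min() guards the top edge
        counts[min(int(h * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins

# Tiny hypothetical crop: two reddish pixels, one green, one gray.
crop = [(255, 0, 0), (200, 10, 10), (0, 255, 0), (128, 128, 128)]
hist = hue_histogram(crop)
```

Skipping low-saturation pixels is a design choice of this sketch: dark or washed-out background would otherwise contribute essentially arbitrary hues.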
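Splitting an elongated annotation into smaller squares, as suggested in the list above, might be sketched as follows. This assumes axis-aligned boxes with positive integer pixel dimensions, and the function name `split_into_squares` is hypothetical. It does not handle the L-shaped or diagonal cases, which would need the annotation itself to change.

```python
def split_into_squares(x, y, width, height):
    """Split an axis-aligned box into squares laid along its longer side.

    Returns (x, y, side) tuples. Each square's side equals the shorter
    dimension; squares are evenly spaced and overlap when the longer side
    is not an exact multiple of the shorter one.
    """
    side = min(width, height)
    long_side = max(width, height)
    count = -(-long_side // side)  # ceiling division: enough squares to cover the box
    if count == 1:
        offsets = [0]
    else:
        # Spread offsets from 0 to (long_side - side) so the last square ends at the edge.
        step = (long_side - side) / (count - 1)
        offsets = [round(i * step) for i in range(count)]
    if width >= height:
        return [(x + off, y, side) for off in offsets]
    return [(x, y + off, side) for off in offsets]

# A 90 x 30 annotation at (10, 20) becomes three 30 x 30 squares.
boxes = split_into_squares(10, 20, 90, 30)
```

Each square can then be resized for HOG without changing its aspect ratio, at the cost of producing several crops per flower.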