Squared Earth Mover's Distance Loss for Training Deep Neural Networks on Ordered-Classes
Le Hou¹, Chen-Ping Yu², Dimitris Samaras¹
¹ Stony Brook University
² Phiar Technologies, Inc
Abstract
In multi-class single-label classification, the loss function of a deep learning method compares the predicted class distribution with the ground truth class distribution. The commonly used cross-entropy loss ignores the inter-class relationships that often exist in real-life tasks such as age classification. We propose to leverage these relationships by training deep nets with the exact squared Earth Mover's Distance (EMD², also known as Wasserstein distance), assuming that the classes are ordered: all classes can be placed in a one-dimensional space such that the dissimilarities between classes are represented by the Euclidean distances between them. The EMD² loss uses the predicted probabilities of all classes and penalizes mispredictions according to the dissimilarities between classes. Our exact EMD² loss yields state-of-the-art results with limited computational overhead on age estimation and image aesthetics datasets.
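To make the idea concrete, here is a minimal NumPy sketch of a loss of this kind. It assumes the classes are evenly spaced on a line and uses one common closed form for ordered classes, the sum of squared differences between the two cumulative distributions. The function name `squared_emd_loss` is hypothetical, and the exact normalization used in the paper may differ.

```python
import numpy as np

def squared_emd_loss(pred, target):
    """Sum of squared CDF differences between two distributions over ordered classes.

    Assumptions (not taken from the abstract): adjacent classes are one unit
    apart, and both inputs are probability vectors along the last axis.
    Accepts shape (n_classes,) or (batch, n_classes).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Cumulative distributions along the ordered class axis.
    cdf_diff = np.cumsum(pred, axis=-1) - np.cumsum(target, axis=-1)
    return np.sum(cdf_diff ** 2, axis=-1)

# Ground truth is class 0 of three ordered classes.
target = np.array([1.0, 0.0, 0.0])
# Both predictions give the true class probability 0.1, so cross-entropy
# cannot tell them apart; they differ only in where the remaining mass goes.
pred_near = np.array([0.1, 0.8, 0.1])  # most mass on the adjacent class
pred_far = np.array([0.1, 0.1, 0.8])   # most mass on the farthest class

loss_near = squared_emd_loss(pred_near, target)
loss_far = squared_emd_loss(pred_far, target)
```

In this example the prediction that concentrates mass on the farther class receives the larger loss, which is the behavior the abstract describes: mispredictions are penalized according to how dissimilar the predicted class is from the true one.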