1. Cross Entropy Error
In the demo, a 4-7-3 NN (4 input, 7 hidden, and 3 output nodes) is instantiated and then trained using the back-propagation algorithm in conjunction with cross entropy error. After training completed, the NN model correctly predicted the species of 29 of the 30 test items (0.9667 accuracy).
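The demo program itself isn't reproduced in this post. As a rough stand-in, here is a minimal sketch of an equivalent 4-7-3 setup, assuming scikit-learn and its bundled Iris data; MLPClassifier minimizes cross entropy (log loss) for classification, and the layer size, learning rate, and iteration count below are illustrative choices, not the article's settings.

```python
# Minimal sketch (not the article's demo code) of a 4-7-3 classifier on the
# Iris data: 4 inputs, one hidden layer of 7 nodes, 3 output classes.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, random_state=0)   # hold out 30 items for testing

net = MLPClassifier(hidden_layer_sizes=(7,), activation='tanh',
                    solver='sgd', learning_rate_init=0.05,
                    max_iter=2000, random_state=0)
net.fit(X_train, y_train)                  # SGD back-propagation on log loss
print('test accuracy: %.4f' % net.score(X_test, y_test))
```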
The mathematics behind cross entropy (CE) error and its relationship to NN training are fairly complex but, fortunately, the results are remarkably simple to understand and implement. CE is best explained by example. Suppose you have just three training items with the following computed outputs and target outputs:

computed outputs | target outputs
0.1  0.3  0.6    | 0  0  1
0.2  0.6  0.2    | 0  1  0
0.3  0.4  0.3    | 1  0  0
Using a winner-takes-all evaluation technique, the NN predicts the first two data items correctly because the positions of the largest computed outputs match the positions of the 1 values in the target outputs, but the NN is incorrect on the third data item. The mean (average) squared error for this data is the sum of the squared errors divided by three. The squared error for the first item is (0.1 - 0)^2 + (0.3 - 0)^2 + (0.6 - 1)^2 = 0.01 + 0.09 + 0.16 = 0.26. Similarly, the squared error for the second item is 0.04 + 0.16 + 0.04 = 0.24, and the squared error for the third item is 0.49 + 0.16 + 0.09 = 0.74. So the mean squared error is (0.26 + 0.24 + 0.74) / 3 = 0.41.
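As a quick check of that arithmetic, here is a short NumPy sketch (the array names are mine, not from the article) that computes the per-item squared errors and their mean:

```python
import numpy as np

# The three training items from the example above.
computed = np.array([[0.1, 0.3, 0.6],
                     [0.2, 0.6, 0.2],
                     [0.3, 0.4, 0.3]])
targets  = np.array([[0, 0, 1],
                     [0, 1, 0],
                     [1, 0, 0]], dtype=float)

squared_errors = ((computed - targets) ** 2).sum(axis=1)
print(squared_errors)          # approx. [0.26 0.24 0.74]
print(squared_errors.mean())   # approx. 0.41
```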
Notice that in some sense the NN predicted the first two items with identical accuracy, because for both of those items the computed output that corresponds to the target value of 1 is 0.6. But observe that the squared errors for the first two items are different (0.26 and 0.24), because all three computed outputs contribute to each sum.
The mean (average) CE error for the three items is the sum of the CE errors divided by three. Written as a function, the CE error for a single item is CE = -Σ ( ln(o_i) * t_i ), where o_i is the i-th computed output and t_i is its corresponding target output.
In words this means, "Add up the product of the log to the base e of each computed output times its corresponding target output, and then take the negative of that sum." So for the three items above, the CE of the first item is -(ln(0.1)*0 + ln(0.3)*0 + ln(0.6)*1) = -(0 + 0 - 0.51) = 0.51. The CE of the second item is -(ln(0.2)*0 + ln(0.6)*1 + ln(0.2)*0) = -(0 - 0.51 + 0) = 0.51. The CE of the third item is -(ln(0.3)*1 + ln(0.4)*0 + ln(0.3)*0) = -(-1.20 + 0 + 0) = 1.20. So the mean cross entropy error for the three-item data set is (0.51 + 0.51 + 1.20) / 3 = 0.74.
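The same three items can be used to verify the CE arithmetic; a minimal NumPy sketch:

```python
import numpy as np

computed = np.array([[0.1, 0.3, 0.6],
                     [0.2, 0.6, 0.2],
                     [0.3, 0.4, 0.3]])
targets  = np.array([[0, 0, 1],
                     [0, 1, 0],
                     [1, 0, 0]], dtype=float)

# CE per item: negative sum of (target * ln(computed output)).
ce_errors = -(targets * np.log(computed)).sum(axis=1)
print(ce_errors)          # approx. [0.51 0.51 1.20]
print(ce_errors.mean())   # approx. 0.74
```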
Notice that when computing mean cross entropy error with neural networks in situations where the target outputs consist of a single 1 with the remaining values equal to 0, all the terms in the sum except one (the term whose target is 1) vanish because of the multiplication by 0. Put another way, cross entropy essentially ignores all computed outputs that don't correspond to a target output of 1. The idea is that when computing error during training, you really don't care how far off the outputs associated with non-1 targets are; you're only concerned with how close the single computed output that corresponds to the target value of 1 is to that value of 1. So, for the three items above, the CEs for the first two items, which in a sense were predicted with equal accuracy, are both 0.51.
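In code, this observation means that for a one-hot target the per-item CE collapses to the negative log of the single computed output at the position of the 1; for example:

```python
import math

computed = [0.1, 0.3, 0.6]   # first item from the example above
target   = [0, 0, 1]

# CE reduces to -ln(computed output at the target's 1 position).
ce = -math.log(computed[target.index(1)])
print(ce)   # 0.5108..., the same value as the full sum
```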
2. Computing the sigmoidCrossEntropyLoss layer in Caffe
Adapted from caffecn.
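The caffecn discussion isn't reproduced here, but for context: Caffe's SigmoidCrossEntropyLoss layer takes raw scores (logits), applies the sigmoid implicitly, and evaluates the cross entropy in a numerically stable form. The following NumPy function is a sketch of that stable formula (my own re-derivation, not Caffe's C++ code; the batch-size normalization is an assumed default):

```python
import numpy as np

def sigmoid_cross_entropy_loss(logits, targets):
    """Numerically stable sigmoid cross entropy on raw scores (logits).

    Per element: max(x, 0) - x*t + log(1 + exp(-|x|)), which equals
    -[t*log(sigmoid(x)) + (1-t)*log(1 - sigmoid(x))] but avoids overflow.
    """
    x = np.asarray(logits, dtype=float)
    t = np.asarray(targets, dtype=float)
    per_element = np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))
    # Normalize by the batch size (first dimension); assumed default here.
    return per_element.sum() / x.shape[0]

# Example: two items, each with three independent binary targets.
logits  = np.array([[ 2.0, -1.0,  0.5],
                    [-0.3,  4.0, -2.0]])
targets = np.array([[ 1.0,  0.0,  1.0],
                    [ 0.0,  1.0,  0.0]])
print(sigmoid_cross_entropy_loss(logits, targets))
```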
Original post: http://blog.csdn.net/iamzhangzhuping/article/details/51336400