In logistic regression we learn a family of functions $h$ from $\mathbb{R}^d$ to the interval $[0,1]$. However, logistic regression is used for classification tasks: We can interpret $h(\mathbf{x})$ as the probability that the label of $\mathbf{x}$ is $1$. The hypothesis class associated with logistic regression is the composition of a sigmoid function $\phi_{\mathrm{sig}} : \mathbb{R} \to [0,1]$ over the class of linear functions $L_d$. In particular, the sigmoid function used in logistic regression is the logistic function, defined as

$$\phi_{\mathrm{sig}}(z) = \frac{1}{1 + \exp(-z)}. \tag{6}$$

The hypothesis class is therefore (where for simplicity we are using homogenous linear functions):

$$H_{\mathrm{sig}} = \phi_{\mathrm{sig}} \circ L_d = \bigl\{ \mathbf{x} \mapsto \phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle) : \mathbf{w} \in \mathbb{R}^d \bigr\}. \tag{7}$$
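To make the definitions concrete, here is a minimal NumPy sketch of the logistic function (6) and a hypothesis from the class (7); the names phi_sig and h_w and the example vectors are illustrative choices, not part of the original text.

```python
import numpy as np

def phi_sig(z):
    """Logistic (sigmoid) function of Equation (6): 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h_w(w, x):
    """Logistic hypothesis of Equation (7): x -> phi_sig(<w, x>)."""
    return phi_sig(np.dot(w, x))

# Hypothetical weight vector and point in R^3, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 1.0, 2.0])
print(h_w(w, x))  # interpreted as the probability that the label of x is 1
```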
Note that when $\langle \mathbf{w}, \mathbf{x} \rangle$ is very large then $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle)$ is close to $1$, whereas if $\langle \mathbf{w}, \mathbf{x} \rangle$ is very small then $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle)$ is close to $0$. Recall that the prediction of the halfspace corresponding to a vector $\mathbf{w}$ is $\mathrm{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$. Therefore, the predictions of the halfspace hypothesis and the logistic hypothesis are very similar whenever $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is large. However, when $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is close to $0$ we have that $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle) \approx \tfrac{1}{2}$. Intuitively, the logistic hypothesis is not sure about the value of the label, so it guesses that the label is $\mathrm{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$ with probability slightly larger than $50\%$. In contrast, the halfspace hypothesis always outputs a deterministic prediction of either $1$ or $-1$, even if $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is very close to $0$.
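A quick numeric illustration of this contrast (a sketch with arbitrarily chosen inner-product values):

```python
import numpy as np

def phi_sig(z):
    # Logistic function from Equation (6).
    return 1.0 / (1.0 + np.exp(-z))

# Contrast the deterministic halfspace prediction sign(<w,x>) with the
# logistic probability phi_sig(<w,x>) for a large and a near-zero inner product.
for z in (6.0, 0.1):
    print(f"<w,x> = {z:4.1f}: halfspace -> {np.sign(z):+.0f}, logistic -> P[y=1] = {phi_sig(z):.3f}")
# z = 6.0 -> probability ~0.998 (essentially agrees with the halfspace);
# z = 0.1 -> probability ~0.525 (barely more confident than a coin flip).
```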
Next, we need to specify a loss function. That is, we should define how bad it is to predict some $h_{\mathbf{w}}(\mathbf{x}) \in [0,1]$ given that the true label is $y \in \{\pm 1\}$. Clearly, we would like $h_{\mathbf{w}}(\mathbf{x})$ to be large if $y = 1$, and $1 - h_{\mathbf{w}}(\mathbf{x})$ (i.e., the probability of predicting $-1$) to be large if $y = -1$. Note that

$$1 - h_{\mathbf{w}}(\mathbf{x}) = 1 - \frac{1}{1 + \exp(-\langle \mathbf{w}, \mathbf{x} \rangle)} = \frac{\exp(-\langle \mathbf{w}, \mathbf{x} \rangle)}{1 + \exp(-\langle \mathbf{w}, \mathbf{x} \rangle)} = \frac{1}{1 + \exp(\langle \mathbf{w}, \mathbf{x} \rangle)}. \tag{8}$$

Therefore, any reasonable loss function would increase monotonically with $\frac{1}{1 + \exp(y \langle \mathbf{w}, \mathbf{x} \rangle)}$, or equivalently, would increase monotonically with $1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)$. The logistic loss function used in logistic regression penalizes $h_{\mathbf{w}}$ based on the log of $1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)$ (recall that log is a monotonic function). That is,

$$\ell\bigl(h_{\mathbf{w}}, (\mathbf{x}, y)\bigr) = \log\bigl(1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)\bigr). \tag{9}$$
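Below is a small sketch of the logistic loss (9) in NumPy; the use of np.logaddexp(0, t) to evaluate log(1 + exp(t)) stably is a common implementation trick, not something prescribed by the text, and the example point is hypothetical.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Logistic loss of Equation (9): log(1 + exp(-y <w, x>)).

    np.logaddexp(0, t) computes log(1 + exp(t)) in a numerically stable way,
    avoiding overflow when -y<w,x> is large.
    """
    return np.logaddexp(0.0, -y * np.dot(w, x))

# Hypothetical example in R^2 with <w, x> = 0.5.
w = np.array([2.0, -1.0])
x = np.array([0.5, 0.5])
print(logistic_loss(w, x, +1))  # ~0.47: the prediction agrees with y = +1
print(logistic_loss(w, x, -1))  # ~0.97: the prediction disagrees with y = -1
```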
Therefore, given a training set $S = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)$, the ERM problem associated with logistic regression is

$$\operatorname*{argmin}_{\mathbf{w} \in \mathbb{R}^d} \ \frac{1}{m} \sum_{i=1}^{m} \log\bigl(1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)\bigr). \tag{10}$$
The advantage of the logistic loss function is that it is a convex function with respect to $\mathbf{w}$; hence the ERM problem can be solved efficiently using standard methods. We will study how to learn with convex functions, and in particular specify a simple algorithm for minimizing convex functions, in later chapters.
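As a rough sketch of one such standard method, the following code minimizes the empirical risk of Equation (10) with plain gradient descent on a tiny synthetic data set; the gradient formula follows by differentiating (9), while the data, step size, and iteration count are arbitrary choices made purely for illustration.

```python
import numpy as np

def erm_objective(w, X, y):
    """Empirical risk of Equation (10): (1/m) * sum_i log(1 + exp(-y_i <w, x_i>))."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def erm_gradient(w, X, y):
    """Gradient of the empirical risk; the i-th term is -y_i x_i * phi_sig(-y_i <w, x_i>)."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))   # = -y_i * phi_sig(-y_i <w, x_i>)
    return (coeffs[:, None] * X).mean(axis=0)

# Tiny synthetic training set in R^2 with labels in {+1, -1}; purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] - 0.5 * X[:, 1])

w = np.zeros(2)
eta = 0.5                                   # fixed step size, chosen arbitrarily
for _ in range(1000):
    w -= eta * erm_gradient(w, X, y)

print(w, erm_objective(w, X, y))            # the empirical risk decreases as w fits the data
```

Because the objective is convex in $\mathbf{w}$, such gradient-based methods (or any off-the-shelf convex solver) find a global minimizer rather than merely a local one.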
Original source: http://www.cnblogs.com/JenifferWu/p/5813453.html