When it's said that the Softmax is a multivariate generalisation of the "sigmoid" (i.e. logistic) function, the scalar logistic function is interpreted as a 2d function whose arguments $(x_0, x_1)$ are shifted by $-x_0$ (and hence the first is fixed at $0$), evaluated at $x = x_1 - x_0$.
Since the softmax function is translation invariant,[^1] this does not affect the output: $\operatorname{softmax}(x_0, x_1) = \operatorname{softmax}(x_0 - x_0,\, x_1 - x_0) = \operatorname{softmax}(0, x)$.
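As a quick sanity check of this invariance, here is a minimal sketch (the tensor values and the shift constant are arbitrary):

import torch

# Shifting every coordinate of a row by the same constant
# leaves the softmax output unchanged
z = torch.tensor([[0.4, 0.0], [0.2, 2.0]])
print(torch.allclose(torch.softmax(z, dim=1), torch.softmax(z + 3.7, dim=1)))
# True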
The standard logistic function is the special case for a 1-dimensional axis in 2-dimensional space, say the x-axis in the (x, y) plane. One variable is fixed at 0 (say $x_0 = 0$), so $e^{x_0} = 1$, and the other variable can vary, denote it $x_1 = x$, so

$$\operatorname{softmax}(0, x)_1 = \frac{e^{x}}{e^{0} + e^{x}} = \frac{1}{1 + e^{-x}} = \sigma(x),$$

the standard logistic function, and

$$\operatorname{softmax}(0, x)_0 = \frac{e^{0}}{e^{0} + e^{x}} = \frac{1}{1 + e^{x}} = 1 - \sigma(x),$$

its complement (meaning they add up to 1).
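As a quick numeric illustration of these two formulas (a sketch; the value $x = 0.4$ is arbitrary):

import math

x = 0.4
print(1 / (1 + math.exp(-x)))                     # sigma(x):        ~0.5987
print(math.exp(x) / (math.exp(0) + math.exp(x)))  # softmax(0, x)_1: ~0.5987
print(1 / (1 + math.exp(x)))                      # its complement:  ~0.4013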
Hence, if you wish to use PyTorch's scalar sigmoid (`torch.sigmoid`) as a 2d Softmax function, you must manually shift the input (evaluate at $x = x_1 - x_0$) and take the complement:
import torch

# A sample batch; values chosen to reproduce the outputs shown below
x_batch = torch.tensor([[0.4, 0.0],
                        [0.0, 0.4],
                        [2.0, 0.2],
                        [0.2, 2.0]])

# Shift each row so its first entry is 0 (relative to x0)
x_batch_scaled = x_batch - x_batch[:, 0].unsqueeze(1)

###############################
# The following are equivalent
###############################
# Softmax
torch.softmax(x_batch, dim=1)
# Softmax with all inputs shifted
torch.softmax(x_batch_scaled, dim=1)
# Sigmoid (and complement) of the shifted inputs
torch.stack([1 - torch.sigmoid(x_batch_scaled[:, 1]),
             torch.sigmoid(x_batch_scaled[:, 1])], dim=1)
tensor([[0.5987, 0.4013],
        [0.4013, 0.5987],
        [0.8581, 0.1419],
        [0.1419, 0.8581]])

tensor([[0.5987, 0.4013],
        [0.4013, 0.5987],
        [0.8581, 0.1419],
        [0.1419, 0.8581]])

tensor([[0.5987, 0.4013],
        [0.4013, 0.5987],
        [0.8581, 0.1419],
        [0.1419, 0.8581]])
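The three results can also be compared programmatically, a sketch reusing `x_batch` and `x_batch_scaled` from above:

out_softmax = torch.softmax(x_batch, dim=1)
out_shifted = torch.softmax(x_batch_scaled, dim=1)
out_sigmoid = torch.stack([1 - torch.sigmoid(x_batch_scaled[:, 1]),
                           torch.sigmoid(x_batch_scaled[:, 1])], dim=1)
# All three agree up to floating-point tolerance
assert torch.allclose(out_softmax, out_shifted)
assert torch.allclose(out_softmax, out_sigmoid)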
[^1]: More generally, softmax is invariant under translation by the same value in each coordinate: adding $c = (c, \ldots, c)$ to the inputs $\mathbf{z}$ yields $\operatorname{softmax}(\mathbf{z} + c) = \operatorname{softmax}(\mathbf{z})$, because it multiplies each exponent by the same factor, $e^{c}$ (because $e^{z_i + c} = e^{z_i} e^{c}$), so the ratios do not change: $$\operatorname{softmax}(\mathbf{z} + c)_j = \frac{e^{z_j + c}}{\sum_{k} e^{z_k + c}} = \frac{e^{z_j}\, e^{c}}{\sum_{k} e^{z_k}\, e^{c}} = \frac{e^{z_j}}{\sum_{k} e^{z_k}} = \operatorname{softmax}(\mathbf{z})_j.$$