Why is Softmax not used when Cross-Entropy Loss is used as the loss function during neural network training in PyTorch?

Shakti Wadekar
3 min read · Jan 3, 2021


Quick answer:

Cross-Entropy Loss (CELoss) with Softmax can be combined into a single, simplified equation. This simplified equation is computationally more efficient than calculating Softmax and CELoss separately. PyTorch’s nn.CrossEntropyLoss() uses this simplified equation, hence we can say that “CrossEntropyLoss() in PyTorch internally computes softmax”.

Simplified Cross-Entropy Loss with the Softmax equation substituted into it, for one input (no batch), where x_i is the raw output (logit) of the neuron for the true class:

CELoss = -x_i + \log\left(\sum_{j=1}^{N} e^{x_j}\right)
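To see this behavior in PyTorch, here is a minimal sketch (the logits and labels are made up for illustration) showing that nn.CrossEntropyLoss() is applied to raw logits, and that it matches applying log-softmax followed by NLLLoss:

import torch
import torch.nn as nn

# Hypothetical raw logits for a batch of 2 inputs and 3 classes
# (outputs of the last linear layer, with no softmax applied).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 1])  # ground-truth class indices

# nn.CrossEntropyLoss expects raw logits; it applies log-softmax internally.
ce_loss = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step computation: log-softmax followed by NLLLoss.
log_probs = torch.log_softmax(logits, dim=1)
nll_loss = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce_loss, nll_loss))  # True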

Detailed answer:

Cross-Entropy Loss: Multi-Class

For N classes (c = 1 to N), the cross-entropy loss for one input is:

CE = -\sum_{c=1}^{N} y_c \log(P_c)

‘y_c’ is the label/ground truth for class c; it takes the value 1 or 0. ‘P_c’ is the predicted probability for class c, which varies between 0 and 1.

Let’s take an example and go through the equations. The above equation calculates the loss for one input/image (no batch). The image belongs to exactly one class, so among all the ‘y_c’ labels only one is 1 and the rest are 0. We are therefore left with the following one-term equation:

CE = -\log(P_c), where c is the index of the class whose label is 1.
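As a quick worked example with hypothetical numbers: if y = [0, 1, 0] and P = [0.2, 0.7, 0.1], then CE = -(0·log(0.2) + 1·log(0.7) + 0·log(0.1)) = -log(0.7) ≈ 0.357.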

Here, P_c is the probability output of the neuron whose label is 1. All P_c values are obtained by applying softmax to the last layer of the neural network. P_c = \sigma(x_i), where x_i is the raw output (logit) of that neuron, is computed as follows:

\sigma(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}

\sigma(x_i) is the softmax output value of one neuron: \sigma(x_i) is the P_c value of the one neuron whose ‘y_c’ is 1. I am repeating and stressing this because it is important to understand it clearly.
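As a small sanity check (again with made-up logits), the softmax formula above matches PyTorch’s built-in torch.softmax:

import torch

# Hypothetical last-layer outputs x_i for one input (no batch), 3 classes.
x = torch.tensor([2.0, 0.5, -1.0])

# Softmax: sigma(x_i) = exp(x_i) / sum_j exp(x_j)
p_manual = torch.exp(x) / torch.exp(x).sum()
p_builtin = torch.softmax(x, dim=0)

print(torch.allclose(p_manual, p_builtin))  # True
print(p_builtin.sum())  # probabilities sum to 1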

Substituting this equation into the Cross-Entropy Loss, we get:

CE = -\log\left(\frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}\right)

Simplifying the above equation, using \log(a/b) = \log(a) - \log(b) and \log(e^{x_i}) = x_i:

CE = -x_i + \log\left(\sum_{j=1}^{N} e^{x_j}\right)

Finally, we arrive at the CELoss equation used in PyTorch, which combines CELoss and softmax into a single expression that is simpler to compute.
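As a final check (with hypothetical logits), the simplified equation gives the same value as PyTorch’s built-in cross entropy; torch.logsumexp computes the log-sum-exp term:

import torch
import torch.nn.functional as F

# Hypothetical raw logits x for one input with 3 classes.
x = torch.tensor([2.0, 0.5, -1.0])
c = 0  # index of the true class (the one with y_c = 1)

# Simplified equation: CE = -x_c + log(sum_j exp(x_j)).
manual_loss = -x[c] + torch.logsumexp(x, dim=0)

# PyTorch's built-in cross entropy on the same raw logits
# (unsqueeze adds the batch dimension it expects).
builtin_loss = F.cross_entropy(x.unsqueeze(0), torch.tensor([c]))

print(torch.allclose(manual_loss, builtin_loss))  # True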

I was curious to understand the reason for not using Softmax when Cross-Entropy Loss is used during training in PyTorch; digging into it led to this post. I hope you found it useful. If this material needs more clarification or any changes/improvements, let me know; I am happy to look into it.
