Shannon believed information entropy was a vital subject in information theory. His landmark paper included an entire section entitled “Choice, Uncertainty and Entropy” that rigorously explored this topic. Here we’ll cover just enough about information entropy for you to move forward in your study of cryptography.

Entropy has a different meaning in information theory than it does in physics. Within information theory, entropy is a measure of information content. People often encounter two issues in learning this concept. The first is confusing information entropy with the thermodynamic entropy that you may have encountered in elementary physics courses. In such courses, entropy is often described as “the measure of disorder in a system.” This is usually followed by a discussion of the second law of thermodynamics, which states that “in a closed system, entropy tends to increase.” In other words, without the addition of energy, a closed system will become more disordered over time. Before you can understand information entropy, however, you need to firmly understand that information entropy and thermodynamic entropy are not the same thing.

The second problem you might have with understanding information entropy is that many references define the concept in different ways, some of which can seem contradictory. We will walk through some of the typical ways that information entropy is explained so that you can see how these seemingly disparate explanations fit together.

In information theory, entropy is the amount of information in a given message. This is simple, easy to understand, and, as you will see, essentially synonymous with other definitions you may encounter. Information entropy is sometimes described as “the number of bits required to communicate information.” Put another way, if you represent a message in binary format, its entropy is the number of bits of actual information it contains. It is entirely possible that a message might contain some redundancy, or even data you already have (which, by definition, would not be information); thus the number of bits required to communicate the information could be less than the total number of bits in the message.
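To make “bits required to communicate information” concrete, here is a minimal sketch that estimates entropy with Shannon's formula, H = -Σ p·log2(p), written below in the equivalent form Σ p·log2(1/p). It rests on an assumption that is not part of the text above: each character is treated as an independent symbol whose probability equals its frequency in the message. The function name `entropy_bits_per_symbol` is purely illustrative.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message):
    """Estimate Shannon entropy (bits per symbol) from symbol frequencies.

    Assumption: symbols are independent and their probabilities equal
    their observed frequencies in this one message.
    """
    counts = Counter(message)
    total = len(message)
    # p * log2(1/p), with p = n / total, summed over distinct symbols.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(entropy_bits_per_symbol("aaaa"))  # 0.0 -> no uncertainty, no information
print(entropy_bits_per_symbol("abab"))  # 1.0 -> one bit per symbol
print(entropy_bits_per_symbol("abcd"))  # 2.0 -> two bits per symbol
```

Under this simple model, a message made of one repeated character carries no information per symbol, even though it still occupies eight bits per character when stored as ASCII.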

This is actually the basis for lossless data compression. Lossless data compression seeks to remove redundancy in a message to compress the message. The initial step is to establish the minimum number of bits required to communicate the information within the message or, put another way, to calculate the information entropy. Many texts describe entropy as “the measure of uncertainty in a message.” You may be wondering, how can both of these definitions be true? Actually, they are both saying the same thing, as I’ll explain.
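The link between redundancy and compressibility can be observed with Python's standard `zlib` module. This is only a rough illustration: `zlib` is a general-purpose compressor, not an entropy calculator, and the two sample messages (a repeated phrase and seeded pseudorandom bytes) are assumptions chosen for the demonstration.

```python
import random
import zlib

# A highly redundant message: the same 16-byte phrase repeated 64 times.
redundant = b"attack at dawn. " * 64

# A message of the same length with essentially no redundancy.
# The fixed seed only makes the run reproducible.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

redundant_size = len(zlib.compress(redundant, 9))
noisy_size = len(zlib.compress(noisy, 9))

print(len(redundant), redundant_size)  # original size vs. compressed size
print(len(noisy), noisy_size)

# Lossless: decompression recovers the original message exactly.
assert zlib.decompress(zlib.compress(redundant, 9)) == redundant
```

The redundant message shrinks to a small fraction of its original size, while the pseudorandom bytes barely shrink at all, because there is little or no redundancy to remove.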

Let’s examine the definition that is most likely causing you some consternation: entropy as a measure of uncertainty. It might help you to think of it in the following manner: only uncertainty can provide information. For example, if I tell you that right now you are reading this article, this does not provide you with any new information. You already knew that, and there was absolutely no uncertainty regarding that issue. However, the content you are about to read in other articles is uncertain, and you don’t know exactly what you will encounter. There is, therefore, some level of uncertainty, and thus information content. Put even more directly, uncertainty is information. If you are already certain about a given fact, no information is communicated to you. For a message to convey new information, there must be some prior uncertainty that the message resolves. Thus, the measure of uncertainty in a message is the measure of information in a message.
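The idea that certainty carries no information can be expressed numerically through self-information (sometimes called surprisal): an outcome with probability p conveys log2(1/p) bits. The sketch below uses a hypothetical helper name, `surprisal_bits`, and example probabilities chosen for illustration.

```python
import math


def surprisal_bits(probability):
    """Bits of information conveyed by an outcome of the given probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return math.log2(1 / probability)


print(surprisal_bits(1.0))       # 0.0  -> something you already knew
print(surprisal_bits(0.5))       # 1.0  -> a fair coin flip
print(surprisal_bits(1 / 1024))  # 10.0 -> a rare, surprising outcome
```

Entropy is the average of this quantity over all possible outcomes, which is why “measure of uncertainty” and “amount of information” describe the same number.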

In addition to information entropy, there are a few closely related concepts that you should understand:

**Joint entropy**

This is the entropy of two variables or two messages. If the two messages are truly independent, their joint entropy is just the sum of the two entropies.

**Conditional entropy**

If the two variables are not independent, but rather variable Y depends on variable X, then conditional entropy measures the uncertainty that remains in Y once X is known.

**Differential entropy**

This is a more complex topic and involves extending Shannon’s concept of entropy to cover continuous random variables, rather than discrete random variables.
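For the discrete case, joint and conditional entropy can be checked numerically. The following is a minimal sketch under stated assumptions: the two joint distributions are made-up examples, the helper names are hypothetical, and conditional entropy is computed with the standard chain rule H(Y|X) = H(X,Y) - H(X).

```python
import math
from collections import defaultdict


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


def marginal(joint, axis):
    """Collapse a joint distribution {(x, y): p} onto one variable."""
    totals = defaultdict(float)
    for outcome, p in joint.items():
        totals[outcome[axis]] += p
    return dict(totals)


def joint_entropy(joint):
    """H(X, Y): entropy of the pair taken as a single outcome."""
    return entropy(joint.values())


def conditional_entropy(joint):
    """H(Y | X) via the chain rule H(X, Y) - H(X)."""
    return joint_entropy(joint) - entropy(marginal(joint, 0).values())


# Independent: a fair coin (X) and a fair four-sided die (Y).
independent = {(x, y): 0.5 * 0.25 for x in "HT" for y in range(4)}

# Dependent: when X is 0, Y is always 0; when X is 1, Y is a fair coin.
dependent = {(0, 0): 0.5, (1, 0): 0.25, (1, 1): 0.25}

for name, joint in (("independent", independent), ("dependent", dependent)):
    h_x = entropy(marginal(joint, 0).values())
    h_y = entropy(marginal(joint, 1).values())
    print(name, h_x, h_y, joint_entropy(joint), conditional_entropy(joint))
```

In the independent case the joint entropy (3 bits) equals the sum of the individual entropies (1 + 2 bits). In the dependent case the joint entropy (1.5 bits) is less than that sum, and knowing X leaves only 0.5 bits of uncertainty in Y.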