The concept of confusion, as relates to cryptography, was defined in Shannon’s 1948 paper. Generally, this concept attempts to make the relationship between the statistical frequencies of the cipher text and the actual key as complex as possible. Put another way, the relationship between the plain text, cipher text, and the key should be complex enough that it is not easy to determine what that relationship is.

If you don’t have enough confusion, then someone might simply examine a copy of plain text and the associated cipher text and determine what the key is. This would allow the person to decipher all other messages that are encrypted with that same key.

Diffusion literally means “having changes to one character in the plain text affect multiple characters in the cipher text.” This is unlike historical algorithms (such as the Caesar cipher, Atbash, and Vigenère), where each plain text character affected only one cipher text character.

Shannon thought the related concepts of confusion and diffusion were both needed to create an effective cipher:

*Two methods (other than recourse to ideal systems) suggest themselves for frustrating a statistical analysis. These we may call the methods of diffusion and confusion. In the method of diffusion the statistical structure of M which leads to its redundancy is “dissipated” into long range statistics—i.e., into statistical structure involving long combinations of letters in the cryptogram. The effect here is that the enemy must intercept a tremendous amount of material to tie down this structure, since the structure is evident only in blocks of very small individual probability. Furthermore, even when he has sufficient material, the analytical work required is much greater since the redundancy has been diffused over a large number of individual statistics.*

These two goals are achieved through a complex series of substitution and permutation.

Suppose you have a simple Caesar cipher in which you shift each letter three to the right. This will provide a small degree of confusion, but no diffusion. Now assume you swap every three letters. This transposition will provide another small degree of confusion. Next, let’s apply a second substitution—this time two letters to the right. The two substitutions, separated by a transposition, provide minimal diffusion. Consider the following example:

Plain text: Atack at dawn

Step 1 (shift 3 right) dwwdfndwgdzq

Step 2 (swap 3 letter blocks) dfndwwdzqdw

Step 3 (shift right 2) fhpfyy fbsfy

Let’s try changing just one letter of plain text (though it will make for a misspelled plain text word). Change attack at dawn to attack an dawn:

Plain text: Atack at dawn

Step 1 (shift 3 right) Dwwdfndqgdzq

Step 2 (swap 3 letter blocks) dfndwwdzqdq

Step 3 (shift right 2) fhpfyy fbsfs

Now compare this cipher text to the one originally produced. You can see that only one letter has changed—the last letter—and instead of sfy you now have sfs. This provides only minimal confusion and still no diffusion! What is missing? Two things: The first is that, at least by modern standards, this simply is not complex enough. It is certainly an improvement on the basic Caesar cipher, but still it is not enough. The second problem is that there is no mechanism to have a change in one character of plain text change multiple characters in the cipher text. In modern ciphers, operations are at the bit level, not the character level. However, this example should give you the general idea of combining substitution and permutation.

**Avalanche**

A small change may create a sizable impact on the output, like an avalanche. This is Horst Fiestel’s variation on Shannon’s concept of diffusion. Fiestel’s ideas are used in many of the block ciphers. Clearly, a high avalanche impact is desirable in any cryptographic algorithm. Ideally, a change in 1 bit in the plain text would affect all the bits of the cipher text. This would be described as complete avalanche, but that has not been achieved in any current algorithm.

**Hamming Distance**

The Hamming distance is the number of characters that are different between two strings. This can be expressed mathematically as follows:

**h(x, y)**

Hamming distance is used to measure the number of substitutions that would be required to turn one string into another. In modern cryptography, we usually deal with binary representations rather than text. In that context, Hamming distance can be defined as the number of 1’s if you exclusive or (XOR) two strings.

The concept of Hamming distance was developed by Richard Hamming, who first described the concept in his paper “Error Detecting and Error Correcting Codes.” The concept is used widely in telecommunications, information theory, and cryptography.

Hamming distance works only when the strings that we compare are of the same length. One application is to compare plain text to cipher text to determine how much has changed. However, if two strings of different lengths are compared, another metric must be used. One such metric is the Levenshtein distance, a measurement of the number of single-character edits required to change one word into another. Edits can include substitutions (as with Hamming distance) but can also include insertions and deletions. The Levenshtein distance was first described by Vladimir Levenshtein in 1965.

**Hamming Weight**

The concept of Hamming weight is closely related to Hamming distance. It is essentially comparing the string to a string of all 0’s. Put more simply, it is how many 1’s are in the binary representation of a message. Some sources call this the population count, or pop count. There are actually many applications for Hamming weight both within cryptography and in other fields. For example, the number of modular multiplications required for some exponent e is computed by log2 e + hamming weight (e).