Entropy has a different meaning in information theory than it does in physics. In the information-theoretic framework, entropy is simply a measure of information content. People often encounter two issues when learning this concept. The first is confusing information entropy with the thermodynamic entropy you may have encountered in elementary physics courses. In such courses, entropy is often described as “the measure of disorder in a system.” This is usually followed by a discussion of the second law of thermodynamics, which states that “in a closed system, entropy tends to increase.” In other words, without the addition of energy, a closed system will become more disordered over time. Before you can understand information entropy, however, you need to firmly grasp that information entropy and thermodynamic entropy are not the same thing.

The second issue you might have with understanding information entropy is that many references define the concept in different ways, some of which can seem contradictory. We will walk through the typical ways that information entropy is explained so that you can see how these seemingly disparate explanations fit together.

In information theory, entropy is the amount of information in a given message. This is simple, easy to understand, and, as you will see, essentially synonymous with other definitions you may encounter. Information entropy is sometimes described as “the number of bits required to communicate information.” If you represent a message in binary format, its entropy is the number of bits of information the message actually contains. It is entirely possible that a message contains some redundancy, or even data you already have (which, by definition, would not be information); thus the number of bits required to communicate the information can be less than the total number of bits in the message.

This is actually the basis for lossless data compression. Lossless compression seeks to remove redundancy in a message in order to shrink it. The initial step is to establish the minimum number of bits required to communicate the information within the message, or, put another way, to calculate the information entropy. Many texts describe entropy as “the measure of uncertainty in a message.” You may be wondering how both of these definitions can be true. Actually, they are both saying the same thing, as I’ll explain.

Let’s examine the definition that is most likely causing you some consternation: entropy as a measure of uncertainty. It might help to think of it in the following manner: only uncertainty can provide information. For example, if I tell you that right now you are reading this article, this does not provide you with any new information. You already knew that, and there was absolutely no uncertainty regarding that issue. However, the content you are about to read in other articles is uncertain, and you don’t know what you will encounter—at least, not exactly. There is, therefore, some level of uncertainty, and thus information content. Put even more directly, uncertainty is information. If you are already certain about a given fact, no information is communicated to you. Receiving new information requires that there was uncertainty which the information resolved. Thus, the measure of uncertainty in a message is the measure of information in a message.
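The idea that entropy measures average information per symbol can be made concrete with Shannon’s formula, H = −Σ p·log₂(p). Here is a minimal Python sketch (the function name is my own) that estimates a message’s entropy from its empirical symbol frequencies:

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Average information content, in bits per symbol, of `message`.

    Treats the empirical symbol frequencies as probabilities and
    computes H = sum(p * -log2(p)) over each distinct symbol.
    """
    counts = Counter(message)
    total = len(message)
    return sum((n / total) * -math.log2(n / total) for n in counts.values())

# A message of one repeated symbol has no uncertainty, hence no information:
assert shannon_entropy("aaaaaaaa") == 0.0
# Four equally likely symbols require 2 bits each:
assert abs(shannon_entropy("abcdabcd") - 2.0) < 1e-9
```

This also illustrates the compression connection: a highly redundant message has low entropy per symbol, so fewer bits than its raw length are needed to communicate it.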

In addition to information entropy, there are a few closely related concepts that you should understand:

**Joint entropy**

This is the entropy of two variables or two messages. If the two messages are truly independent, their joint entropy is just the sum of the two entropies.

**Conditional entropy**

If the two variables are not independent, but rather variable Y depends on variable X, then instead of joint entropy you have conditional entropy.

**Differential entropy**

This is a more complex topic and involves extending Shannon’s concept of entropy to cover continuous random variables, rather than discrete random variables.
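The first two of these definitions can be made concrete with a short Python sketch (the helper names are my own). It computes entropies from a joint probability table and checks two standard facts: for independent variables, joint entropy is the sum of the individual entropies, and conditional entropy follows the chain rule H(Y | X) = H(X, Y) − H(X):

```python
import math
from collections import defaultdict

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return sum(p * -math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of one variable from a joint {(x, y): p} table."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[index]] += p
    return dict(m)

# Two independent fair coin flips: every (x, y) pair has probability 1/4.
independent = {(x, y): 0.25 for x in "HT" for y in "HT"}
H_joint = entropy(independent)           # H(X, Y) = 2 bits
H_x = entropy(marginal(independent, 0))  # H(X) = 1 bit
H_y = entropy(marginal(independent, 1))  # H(Y) = 1 bit
assert abs(H_joint - (H_x + H_y)) < 1e-9

# Conditional entropy via the chain rule: H(Y | X) = H(X, Y) - H(X).
# Here Y simply copies X, so knowing X leaves no uncertainty about Y.
dependent = {("H", "H"): 0.5, ("T", "T"): 0.5}
H_y_given_x = entropy(dependent) - entropy(marginal(dependent, 0))
assert abs(H_y_given_x) < 1e-9
```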


Cryptography is the art of protecting data by transforming it (i.e., encrypting it) into an unreadable format called ciphertext. Only those who possess the secret key can decrypt the ciphertext back into meaningful text. While older ciphers can often be broken, modern cryptographic methods are practically unbreakable. As the Internet and other forms of electronic communication become more common, digital security is becoming increasingly important. Cryptography is employed to safeguard email messages, credit card information, and corporate data. There is more to cryptography than just encrypting data, though. There are three principal security themes covered by cryptography, and various cryptographic primitives help you satisfy each one. These themes are: confidentiality, integrity, and non-repudiation.

**Confidentiality**

Confidentiality is what you typically associate with cryptography: taking a message or some other data and encrypting it so that the original data is completely unreadable. There are many cryptographic algorithms you can use, including RSA and the Advanced Encryption Standard (AES), as well as a couple of primitives (DES and Triple DES) that are not recommended for new code but that you may still encounter when writing code that interoperates with older legacy systems.
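As a toy illustration of the idea (this is not production cryptography; real code should use a vetted AES implementation, which Python’s standard library does not include), here is a one-time-pad style sketch: XORing the plaintext with a random key of equal length yields unreadable ciphertext, and XORing again with the same key recovers the original:

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the corresponding key byte."""
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"Attack at dawn"
key = secrets.token_bytes(len(plaintext))  # random key as long as the message

ciphertext = xor_bytes(plaintext, key)     # unreadable without the key
recovered = xor_bytes(ciphertext, key)     # same key restores the plaintext
assert recovered == plaintext
```

The roundtrip works because XOR is its own inverse; the confidentiality property is that, without the key, the ciphertext reveals nothing about the plaintext.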

**Integrity**

In information security, data integrity means maintaining the accuracy and consistency of information over its entire life cycle. This means that information can't be altered in a hidden or unauthorized way. Integrity is violated whenever a message is modified in transit. Systems typically provide data integrity in addition to data confidentiality. There are several cryptographic primitives you can use to help enforce data integrity, including hashing algorithms such as MD5, Secure Hash Algorithm (SHA)-1, SHA-256, and SHA-512, as well as hash-based message authentication codes (HMACs) built on those same algorithms. One thing worth remembering is that it is not a good idea to use plain, unsalted hashes to store passwords; note also that MD5 and SHA-1 are no longer considered collision-resistant.
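Python’s standard library includes the hashing and HMAC primitives mentioned above. Here is a brief sketch (the key and message values are made up) showing how an HMAC lets a receiver detect tampering:

```python
import hashlib
import hmac

key = b"shared-secret-key"  # known to both sender and receiver
message = b"transfer $100 to account 42"

# Sender computes a SHA-256 HMAC and transmits it alongside the message.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# Receiver recomputes the tag; compare_digest avoids timing side channels.
assert hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).hexdigest())

# Any modification in transit produces a different tag, so tampering is caught.
tampered = b"transfer $999 to account 13"
assert not hmac.compare_digest(tag, hmac.new(key, tampered, hashlib.sha256).hexdigest())
```

Note that an HMAC proves integrity and authenticity to someone who holds the shared key, but since both parties hold the same key, it does not by itself provide non-repudiation.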

**Non-repudiation**

Non-repudiation is the guarantee that someone cannot deny something. Usually, it refers to the ability to ensure that a party to a contract or a communication cannot deny the authenticity of their signature on a document or the sending of a message that originated with them. For several years, authorities have wanted to make repudiation impossible in certain circumstances. You can send registered mail, for instance, so the recipient cannot deny that it was delivered. Similarly, a legal document typically requires witnesses to its signing so that the person who signs it cannot later deny having done so. On the Internet, a digital signature is used not only to guarantee that a message or document has been electronically signed by the person who claims to have signed it but also, since a digital signature can be created only by the holder of the private key, to ensure that the signer cannot later deny having provided the signature.

.NET comes with a rich collection of cryptography objects that can help you provide better security in your applications. The cryptography objects in .NET all live within the System.Security.Cryptography namespace.
