Claude Shannon’s Information Theory Explained


Claude Shannon first proposed information theory in 1948. His goal was to find the fundamental limits of communication and signal processing operations, such as data compression. The theory has since been applied to thermal physics, quantum computing, linguistics, and even plagiarism detection.

We wouldn’t have the internet as we know it today without Claude Shannon’s information theory. The theory turned information into a mathematical quantity that could be measured, manipulated, and transmitted from one individual or machine to another.

Shannon worked at Bell Labs throughout much of his career as an electronics engineer and a mathematician. During World War II, he worked on processes that would make it easier to send messages securely and efficiently over great distances. Building on coding and mathematical principles, his work became the foundation of one of the most important theories in use today.

What Is Information, and Why Is It Important to Recognize?

Information theory is based on statistics and probability. It measures the probability distributions associated with random variables so that specific outcomes can be quantified. Just as your brain sees a tree and recognizes it to give you information, a computer can do the same thing using a specific series of codes.

Everything in our world today provides us with information of some sort. If you flip a coin, there are two equally likely outcomes every time. This provides less information than rolling a die, which has six equally likely outcomes, but it is still information nonetheless.

Before information theory was introduced, people communicated through analog signals. This meant pulses were sent along a transmission route and measured at the other end, where they were interpreted into words. The information would degrade over long distances because the signal weakened.

Shannon’s information theory quantifies information through its entropy. It defines the smallest unit of information that cannot be divided any further: the “bit,” short for “binary digit.” Strings of bits can be used to encode any message. Digital coding is built on bits and has just two values: 0 or 1.
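To make the idea concrete, here is a minimal Python sketch of encoding a message as a string of bits. The 8-bit ASCII mapping and the variable names are illustrative choices, not part of Shannon’s theory itself:

```python
# Encode a short message as a string of bits.
# Each character maps to its 8-bit ASCII code.
message = "HI"
bits = "".join(format(ord(ch), "08b") for ch in message)
print(bits)  # 0100100001001001  ('H' = 01001000, 'I' = 01001001)
```

Any message, however complex, reduces to such a string of 0s and 1s.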

This simplicity improves the quality of communication because it improves the reliability of the information that a transmission contains.

Why Is Communication Better with Digital Coding?

Imagine you want to communicate a specific message to someone. Which way would be faster? Writing them a letter and sending it through the mail? Sending that person an email? Or sending that person a text?

The answer depends on the type of information being communicated. Writing a letter communicates more than just the written word; it’s a communication of personal effort. An email is faster than a letter containing the same words, but it lacks the personal touch, so the information carries less weight with the recipient. A simple text is more like a quick statement, question, or request.

You’d send a text to someone to ask them to pick up a pizza for dinner on the way home. You wouldn’t write them a letter and drop it in the mail to ask them for this.

These differences in communication style are what digital coding has improved upon. Instead of trying to account for all of the variables in a scheme like Morse code, the 0s and 1s of digital coding allow long strings of digits to be sent without the same susceptibility to noise.

A 0, for example, can be represented by a specific low-voltage signal, and a 1 by a high-voltage signal. Because there are just two digits, each with a distinct, recognizable state, it becomes possible to reconstruct the information with great accuracy even after the signal has picked up extensive noise.
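A small Python sketch shows why two well-separated levels survive noise: any received voltage is simply snapped back to the nearer ideal level. The 0.5 V and 5.0 V levels and the noisy measurements here are made-up numbers for illustration:

```python
# Assumed ideal voltages for bit 0 and bit 1 (illustrative values).
LOW, HIGH = 0.5, 5.0
threshold = (LOW + HIGH) / 2  # decide each bit by the nearer level

received = [0.9, 4.2, 0.1, 5.6, 3.1]  # noisy measurements
decoded = [1 if v > threshold else 0 for v in received]
print(decoded)  # [0, 1, 0, 1, 1]
```

Even though no received voltage exactly matches 0.5 V or 5.0 V, every bit is recovered correctly, because the noise never pushes a signal past the midpoint.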

How Claude Shannon’s Information Theory Works

When we think about information, we usually take the view that we’re adding to our knowledge. In information theory, what we’re actually doing is reducing our uncertainty, one bit at a time.

Let’s go back to the coin toss. When you flip the coin, there is an equal 50% chance that it will land on heads or tails. Information theory uses base-2 logarithms to measure total information content. For a coin flip, log2(2) = 1, so observing the outcome yields exactly one bit: our uncertainty has been completely resolved.
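Here is that calculation in Python; `math.log2` computes the base-2 logarithm:

```python
import math

# Information in one observation of an event with equally likely
# outcomes: log2(number of outcomes) bits. A fair coin has two.
coin_bits = math.log2(2)
print(coin_bits)  # 1.0
```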

The same reasoning applies when a die is rolled. A six-sided die has an equal 1/6 chance of producing any specific result. Once that result is obtained, the uncertainty of the outcome is resolved, and log2(6) ≈ 2.58 bits of information have been received.

You could expand this to a twenty-sided die as well. Each number has an equal 1/20 chance of coming up. Once the die stops rolling, you’ve received log2(20) ≈ 4.32 bits, because the uncertainty about the outcome has been removed.
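The same base-2 logarithm gives the exact figures for both dice (a quick Python check):

```python
import math

# More equally likely outcomes means more bits per observation.
for sides in (6, 20):
    print(f"{sides}-sided die: {math.log2(sides):.2f} bits")
# 6-sided die: 2.58 bits
# 20-sided die: 4.32 bits
```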

This principle can then be used to communicate letters, numbers, and other symbols that we recognize. Take the alphabet, for example. It works like a 20-sided die, but with a 1/26 chance per outcome instead of 1/20. Each character transmitted resolves the uncertainty of which of the 26 letters it is, which takes log2(26) ≈ 4.70 bits.

When you add in a space, which is required to separate words, the English alphabet grows to 27 characters. This works out to log2(27) ≈ 4.75 bits per character. Thanks to the mathematics of information theory, we know that transmitting or storing English text in digital code requires about 4.75 bits for every character, assuming each character is equally likely.
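That bits-per-character figure comes straight from the base-2 logarithm of 27:

```python
import math

# Bits per character for a uniform 27-symbol alphabet
# (26 letters plus the space).
alphabet_bits = math.log2(27)
print(round(alphabet_bits, 2))  # 4.75
```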

How We Continue to Reduce Uncertainty in Information Theory

Now that we’ve covered the basics of information theory and how it works, it is time to address the probabilities that also affect how we consume information today. Probabilities help us further reduce the uncertainty in the messages we receive every day.

Let’s go back to the English language to discuss probabilities and how they can directly impact the quality of information that is being received.

  • English words that contain the letter “q” are almost always followed by the letter “u.”
  • We know that the five common vowels (a, e, i, o, and u) are more common in most words than the letters “x” or “z.”
  • There are words that can be shortened into contractions, such as “we have” becoming “we’ve,” which shortens the amount of information that must be transmitted.

If we add an apostrophe to our alphabet, each character’s odds change to 1/28, which slightly raises the cost to log2(28) ≈ 4.81 bits per character. But because contractions shorten the messages themselves, fewer characters, and therefore less data overall, need to be transmitted.

Once all of these probabilities are taken into account, the uncertainty about each upcoming character shrinks. With enough of them in place, the roughly 4.75 bits per character needed for English text can be reduced to about one bit per character, which was Shannon’s own estimate of the entropy of English. That means less time is needed to transmit the information, less storage space is required to keep it, and the process of communicating data speeds up.
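The general formula behind this is Shannon’s entropy, H = −Σ p·log2(p), which equals the uniform figure when all symbols are equally likely and drops as the distribution becomes uneven. A short Python sketch (the skewed distribution here is made up purely for illustration) shows the effect:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform 27-symbol alphabet: maximum uncertainty per character.
uniform = [1 / 27] * 27
print(round(entropy(uniform), 2))  # 4.75

# A skewed, made-up distribution: a few symbols dominate,
# so fewer bits per symbol are needed on average.
skewed = [0.5, 0.2, 0.1] + [0.2 / 24] * 24
print(round(entropy(skewed), 2))  # noticeably lower than 4.75
```

Real English letter frequencies are far from uniform, which is exactly why clever codes can get well below 4.75 bits per character.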

Why Information Theory Continues to Be Important Today

Claude Shannon created information theory to find a more practical way to build better, more efficient codes for communication. This has allowed us to establish the limits of how fast data can be transmitted reliably. Through digital signals, we have discovered that information can be processed extremely quickly and routed globally with great consistency.

It can even be translated, allowing one form of information to turn into another form of information digitally. Think of it like using Google Translate to figure out how to say something in Spanish, but you only know the English language. The information you receive occurs because bits of information were used to reduce the uncertainty of your request so that you could receive a desired outcome.

Every bit of digital information that we use today is the result of codes that have been created, examined, and improved with the help of Claude Shannon’s information theory. It is why computers are now portable instead of confined to one very large room. It is why we have increased data storage capabilities and the opportunity to compress that data to store more of it. Our media files, such as MP3s, MP4s, and even a standard JPG, would not exist without it.

Information helps us make decisions. It also helps us communicate, because it can be expressed as a mathematical quantity.