The Vigenère Cipher: Unraveling the Enigma of Le Chiffre Indéchiffrable
Delve into the history of cryptography and explore how the Vigenère cipher confounded codebreakers for centuries, paving the way for modern encryption methods.
This comprehensive text draws heavily from the insightful work of Simon Singh, specifically his valuable contribution to Chapter 1 of the book 'Colossus: The secrets of Bletchley Park's code-breaking computers' by B. Jack Copeland. I would like to express my deep respect and gratitude to Mr. Singh for his enlightening work.
THE SUBSTITUTION CIPHER
In Julius Caesar’s Gallic Wars, he describes how he sent a message to the besieged Cicero. The message was encrypted by substituting Greek letters for Roman letters, then delivered in the most dramatic way imaginable. The messenger, unable to reach the camp, hurled a spear with the letter fastened to it with a thong. Although the spear lodged itself in a tower, nobody spotted it for two days. Eventually, it was taken down and delivered to Cicero, who read out the vital news to the entire camp, bringing enormous joy to his troops.
This was the first documented use of a substitution cipher for military purposes. Substitution ciphers, as the name suggests, encrypt messages by replacing the original characters with different characters. This is in contrast to a transposition cipher, in which the characters remain the same, but they are transposed or rearranged to create an anagram.
One of the most famous substitution ciphers is the so-called Caesar cipher, which simply replaces each letter in the message with the letter that is, say, three places further down the alphabet. Cryptographers often think in terms of the plain alphabet, the alphabet used to write the original message, and the cipher alphabet, the letters that are substituted in place of the plain letters, both of which are shown below.
The plaintext is the technical name for the original message, while the ciphertext is the encrypted message. In this chapter, the plaintext is written in lower case and the ciphertext in upper case. Although this example involves a shift of 3, lesser or greater shifts are of course possible.
The Caesar cipher can be generally stated as substituting each plain letter with the letter that is x places later in the alphabet, where x is between 1 and 25. This illustrates one of the basic principles of cryptography, namely the relationship between the algorithm and the key. In the Caesar cipher, the algorithm is the general idea of replacing the original letters with those that lie a fixed number of places further along the alphabet.
The key, x, specifies the distance of the shift. In other words, the key is a flexible component within the algorithm, which needs to be specified in order to determine the exact method of encryption. The key is usually selected by the sender and has to be communicated to the receiver so that the message can be deciphered.
A codebreaker can crack the cipher by identifying the correct key. In this case, the codebreaker can check every key, known as a brute force attack, because there are only 25 keys. A stronger version of the substitution cipher is the general substitution cipher, which allows the cipher alphabet to be any rearrangement of the alphabet.
In this case, there are roughly 400,000,000,000,000,000,000,000,000 possible keys, because this is the number of ways to rearrange the alphabet. Although a large number of keys is not the sole requirement for a secure cipher, it certainly makes the codebreaker’s job harder, because checking every single possible key would be impractical.
Typically, a communications network will use a single algorithm for several months or years, but will employ a variety of keys. For example, a different key can be used each day. This means that if a key is captured by the enemy, then only one day’s communications are immediately jeopardised.
It is assumed that the enemy already knows the algorithm, because it is inevitable that details of the system will have been leaked or stolen. The significance of the key, as opposed to the algorithm, is an enduring principle of cryptography, and it was definitively stated in 1883 by the Dutch linguist Auguste Kerckhoffs von Nieuwenhof in his famous article ‘La Cryptographie Militaire’: the security of a cryptosystem must not depend on keeping secret the crypto algorithm; the security depends only on keeping secret the key. This is Kerckhoffs’ Principle.
CRACKING THE SUBSTITUTION CIPHER
One method for cracking a ciphertext is to guess the true meaning of part of the encrypted message, which is known as a crib. For example, the codebreaker might know that a ciphertext that begins ‘XBKJ . . .’ is a letter, so XBKJ might stand for ‘dear . . .’, which means that the true values for four letters have been established, which in turn might be helpful in deciphering the rest of the message.
Without a crib, and with so many keys, the general substitution cipher seemed impregnable for centuries, but eventually a flaw was revealed. One of the first scholars to exploit the weakness of the substitution cipher was the ninth-century Arab philosopher al-Kindi, who recorded his codebreaking technique, now known as frequency analysis, in ‘A Manuscript on Deciphering Cryptographic Messages’.
Frequency analysis focuses on the fact that the letters in the Arabic, Roman, or any other alphabet have a distinct variation in frequency. In English, for example, e is the most common letter, accounting on average for roughly 13 per cent of all letters in a piece of text. The next most common letters are t, a, and n, whereas letters such as x, q, and z are very rare, as any scrabble player will testify.
Al-Kindi realised that if a letter was substituted for another letter (or symbol), then the new letter would take on the frequency of the original letter. Therefore, by studying the frequency of the letters in the ciphertext, it should be possible to establish their true value. For example, if L is the most common letter in the ciphertext, then L probably represents e.
Each letter has a personality and its overall frequency is just one part of that personality. Other traits include its relationship to other letters (q is always followed by u) or how often it appears at the start of a word. No matter how a letter is disguised during substitution, it will continue to carry its personality and should still be recognisable.
Once news of frequency analysis spread, it was clear that a better form of encryption was required. An ideal cipher would generate a ciphertext that bore no relation to the plaintext, unlike simple substitution which carries the frequencies across to the ciphertext as a hint to help identify the true value of each letter. The perfect ciphertext should appear to be random, because the codebreaker is impotent unless there is the slightest pattern that can be recognised.
THE VIGENÈRE CIPHER
emerged as a solution. There were numerous attempts to improve the simple substitution cipher. The homophonic cipher usually has a cipher alphabet that consists of numbers, with several possible number substitutions for the most common letters and one number for each of the rare letters. This results in a much flatter frequency distribution for the elements of the cipher alphabet, which makes the cipher much more secure. However, the ciphertext still retains recognisable qualities that allow it to be cracked.
For example, if a number is always followed by a small set of other numbers, then the former is probably q and the latter is probably u. An alternative to a cipher is a code. Although the word is used loosely to cover a whole range of encryption techniques, a code is technically a system with only one key. A code book might be a dictionary that lists thousands of words and alongside each one a five-digit number.
A message would be encrypted by looking up each word of the plaintext and replacing it with the corresponding number. This system is relatively strong, because the number of elements in the ciphertext will be smaller and there will be fewer repeated elements, but the lack of keys is a drawback. The same code book will be used for months or years, because creating and distributing a new code book is a major undertaking.
During this lengthy period, it is highly likely that the code book will be deduced or stolen. While some continued to use the old-fashioned, weak ciphers, others used inflexible codes or a combination of the two known as a nomenclator. But a better solution was the Vigenère cipher, named after Blaise de Vigenère, a French diplomat born in 1523, one of the cryptographers who contributed to its development.
The strength of the Vigenère cipher relies on using not one, but 26 distinct cipher alphabets to encrypt a message. The first step in encipherment involves drawing up a so-called Vigenère square, shown below, a plaintext alphabet followed by 26 cipher alphabets, each one shifted by one more letter with respect to the previous one. Hence, row 1 represents a cipher alphabet with a Caesar shift of 1, row 2 represents a cipher alphabet with a Caesar shift of 2, and so on.
The top row of the square, in lower case, represents the plaintext letters, and you could encipher each plaintext letter according to any one of the 26 cipher alphabets. For example, if cipher alphabet number 2 is used, then the letter a is enciphered as C, but if cipher alphabet number 12 is used, then a is enciphered as M. If the sender were to use just one of the cipher alphabets to encipher a message, then this would be a simple Caesar cipher, which would be easy for an enemy interceptor to crack.
However, the Vigenère cipher involves using several rows of the Vigenère square (i.e. different cipher alphabets) to encrypt the letters of the message. In other words, the sender might encrypt the first letter according to row 7, the second according to row 24, the third letter according to row 21, and so on.
In order to communicate, the sender and receiver must agree on a system for switching between rows. This agreement is achieved via a keyword. To demonstrate the Vigenère cipher, let us use the keyword ‘WHITE’ to encrypt the message ‘Divert troops to east’. Before encrypting, the keyword is spelt out above the message, and repeated over and over again so that each letter in the message is associated with a letter from the keyword.
Then, to encrypt the first letter, d, begin by identifying the key letter above it, W, which in turn defines a particular row in the Vigenère square. The row beginning with W, row 22, is the cipher alphabet that will be used to find the substitute letter for the plaintext d. Hence, we identify the column headed by d and see where it intersects the row beginning with W, which turns out to be at the letter Z. Consequently, the letter d in the plaintext is represented by Z in the ciphertext.
To encipher the second letter of the message, i, the process is repeated. The key letter above i is H, so encryption involves a different row in the Vigenère square, namely the H row, the seventh row. To encrypt i, identify the column headed by i and see where it intersects the row beginning with H, which turns out to be at the letter P. Consequently, the letter i in the plaintext is represented by P in the ciphertext.
Each letter of the keyword indicates a particular cipher alphabet within the Vigenère square, and because the keyword contains five letters, the sender encrypts the message by cycling through five rows of the Vigenère square. The Vigenère cipher is strong because it is impregnable to simple frequency analysis. For example, a cryptanalyst applying frequency analysis to a piece of ciphertext would usually begin by identifying the most common letter in the ciphertext, which in this case is Z, and then assume that this represents the most common letter in English, e.
In fact, the letter Z represents three different letters d, r, and s, but not e. This is clearly a problem for the cryptanalyst. Equally confusing is the fact that a letter that appears several times in the plaintext can be represented by different letters in the ciphertext. For example, the oo in ‘troops’ is substituted by two different letters, namely H and S.
The Vigenère cipher is called polyalphabetic, because a single message is encrypted using a variety of cipher alphabets. This is in contrast to the substitution ciphers that have been previously discussed, which are known as monoalphabetic. The Vigenère cipher seemed so strong that it was dubbed le chiffre indéchiffrable. It remained unbroken for centuries.