Speakeasy: The Alphabet of Coding

New Scrabble words and the art of frequency analysis.

2,800 new words are now allowed in the game, including ‘ew’ and ‘ok’. (Photo: Getty Images/Thinkstock)

Scrabble has taken a giant step forward with the Collins Dictionary, which maintains the international word list for the board game, allowing the use of 2,800 new words, including “ew” and “ok”. It’s not exactly revolutionary, since the list is roughly the same as the one okayed last year by Merriam-Webster, which North American players follow. And, anyway, we’re talking about English, which is only one language out of the 29 in which the game is played, including l33t (“elite”, in which you show off), math (in which you write equations in place of words) and Esperanto. No one cares what’s going on in the dictionaries for Maori and Pinyin. Like the rest of reality, the Scrabble board is an unfair place.

Of the new additions, one would have expected “ew” to be controversial, since it may be spelled “eww” or “ewwww” according to taste and emphasis, but “ok” became the focus of stormy debates. Scrabble forbids abbreviations that are generally written in upper case, like OK, but it was finally agreed that “ok” has become commonplace enough to earn admission to the list. Do such debates matter at all, in the age of Twitter contractions like “dis” and “dat”, and l33t-speak, used by pimply youths trying to sound like hackers?

But OK has serious implications for scoring and, therefore, for game strategy. K, being one of the rarer letters, scores five points, and is surpassed only by J and X (eight points each) and Q and Z (10 points each). The distribution of Scrabble tiles was set by the inventor of the game, Alfred Mosher Butts, through frequency analysis of the occurrence of letters in words of varying length in the New York Times, the New York Herald Tribune, the Saturday Evening Post, and dictionaries. The most commonly used letters were assigned the lowest scores, and vice versa.
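A letter count of the kind Butts made can be sketched in a few lines of Python. This is only an illustration: the function name `letter_frequencies` and the sample sentence are assumptions made here, and a single sentence is far too small a sample to set tile values by.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, share) pairs for the letters A-Z in text, most common first."""
    # Keep only the 26 English letters, ignoring case, spaces and punctuation.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]


# Hypothetical sample text; Butts worked from newspapers and dictionaries.
sample = "The most commonly used letters were assigned the lowest scores"
for letter, share in letter_frequencies(sample)[:5]:
    print(f"{letter}: {share:.1%}")
```

Run over a large body of English text, a count like this puts letters such as E and T near the top and Q and Z near the bottom, which is the pattern the tile scores reflect.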

Over 80 years ago, Butts did his frequency analysis laboriously, by hand, on tabulated paper, just as al-Kindi, the father of Arab philosophy, had done it for another strategic objective: to crack substitution ciphers in 9th-century Baghdad. He had sought out instances of the high-frequency Arabic prefix “al-” (as in al-Kindi, or al-Hind), and used them as a starting point to figure out how letters had been substituted in a ciphertext. That was at the dawn of cryptography, when it was sufficient to shift letters by a certain number of places in the alphabet to mask a message. For instance, my given name, with a shift of three, reads Sudwln. Sounds like JRR Tolkien made it up. The cipher was good enough for Julius Caesar, to whom it is attributed, but the arrival of better math, and computers, made brute-force cracking ridiculously easy: there are only 25 possible shifts to try.
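The shift cipher is simple enough to sketch in Python, along with the brute-force attack that undoes it. The function names `caesar_shift` and `brute_force` are made up for this illustration, and the sketch handles only the 26 unaccented English letters.

```python
def caesar_shift(text, shift):
    """Shift each English letter by `shift` places, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            # Spaces, digits and punctuation pass through unchanged.
            result.append(ch)
    return "".join(result)


def brute_force(ciphertext):
    """Undo every possible shift; a human reader picks out the one that makes sense."""
    return {shift: caesar_shift(ciphertext, -shift) for shift in range(1, 26)}


print(caesar_shift("Pratik", 3))  # Sudwln
print(brute_force("Sudwln")[3])   # Pratik
```

With only 25 candidates to inspect, no frequency table is needed. Frequency analysis earns its keep against general substitution ciphers, where the letters are scrambled rather than merely shifted.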

Machine-based frequency analysis was instrumental in ending World War II, in which crucial tank battles were fought in North Africa between the Afrika Korps of Erwin Rommel, the Desert Fox, and the Desert Rats of Bernard Montgomery. The Rats were given an intelligence edge over the Fox by the codebreakers of Bletchley Park in the UK, among them the mathematician Alan Turing. Most of the staff there were unsung women. They used frequency analysis, among other techniques, to crack the code of the Enigma cipher machines which the German forces used for secret communications. The frequency of the letter “e” in German was known. Circumstantial evidence was also used: it could be assumed that early morning naval communications would refer to the weather and wind (“Wetter” and “Wind”, or “Luft”). Place names like El Alamein and Tobruk would have helped, too. This process had to be repeated every day, since the Enigma settings were changed daily.

In our times, such methods for cracking communications are somewhat redundant, since encryption now depends on abstract mathematics, and public-key systems in particular are based on prime numbers. “Strong” public-key encryption tools like Pretty Good Privacy (PGP) and its open source counterpart, GnuPG, are used by individuals. The Advanced Encryption Standard (AES) is considered good enough for encrypting classified information. On the other hand, the Wired Equivalent Privacy (WEP) of older wireless routers is easily broken. MD5 and SHA-1 hashes, commonly used to verify the integrity of downloaded data, are no longer considered secure either.
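Checking a download against a published hash looks roughly like this in Python, using the standard `hashlib` module. The helper names `digest` and `matches_published` are invented for this sketch, and the choice of SHA-256 as the default is an assumption made here, picked because MD5 and SHA-1 are the weak ones.

```python
import hashlib


def digest(data, algorithm="sha256"):
    """Return the hex digest of `data` (bytes) under the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def matches_published(data, published_hex, algorithm="sha256"):
    """Compare the digest of the downloaded bytes with the one the publisher lists."""
    return digest(data, algorithm) == published_hex.strip().lower()


# Hypothetical download; in practice the bytes would be read from the file on disk.
downloaded = b"contents of the downloaded file"
published = digest(downloaded)  # stands in for the checksum shown on a download page

print(matches_published(downloaded, published))         # True
print(matches_published(downloaded + b"x", published))  # False
```

A match shows only that the file is the one the hash was computed from. If an attacker can alter both the file and the published hash, the check proves nothing.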

But cryptography has always been a race between techniques to encode and strategies to crack encryption keys. Present encryption methods might be rendered redundant by the advent of quantum computing, which could one day put massive cracking power into the hands of governments, corporations, groups and even individuals. No one knows for certain how data will be protected then, but intelligence agencies are looking at a future opportunity. They are stashing away terabytes of intercepted data in the hope that what is garbage now will be distilled into intelligible messages by quantum computers.

Pratik Kanjilal lectures a surprisingly tolerant public on far too many issues.
