As the country marks the centenary of Alan Turing’s birth, two of his mathematical research papers, believed to have been written whilst he was at Bletchley Park during World War II, have been released by GCHQ to The National Archives
Because of continuing sensitivity the papers had been retained at GCHQ, but they have now been re-assessed as suitable for release.
Written in an era when typographical errors would be corrected by hand and mathematical notation handwritten, the papers are called “Paper on Statistics of Repetitions’ and ‘The Applications of Probability to Crypt’.
The first, ‘Paper on Statistics of Repetitions’ is an informal report in which Turing works out the best statistical means of testing whether two cipher messages use the same key in parts of the message. This was very important in the exploitation of such messages at Bletchley Park.
The second longer paper, ‘The Applications of Probability to Cryptography’, demonstrates that Turing was determined to apply rigorous probability analysis to a wide range of cryptanalytic problems of the day. A particular highlight is where Turing uses life expectancy to examine conditional probability. The associated example, “Hitler is now age 52”, adds piquancy and suggests that the paper was written between April 1941 and April 1942. Bletchley Park’s output of decrypts was almost certainly enabled by the techniques in this paper.
A spokesperson for GCHQ said: “We are delighted to release these papers showing more of Alan Turing’s pioneering research during his time at Bletchley Park. It was this type of research that helped turn the tide of war and it is particularly pleasing that we are able to share these papers during this centenary year.”Notes for Editors
1. Review of "Statistics of Repetitions" by A.M. Turing
The report begins by stating the task as being able to "obtain reliable estimates of the value of given repeats". It does not say what the purpose of this is, but an example is given of two messages of length 200 and 250 characters respectively, with an overlap of 105 characters. It is evident then that what is being sought is a means of telling whether two machine enciphered messages have different plaintext but the same encipherment key during the overlap (presumably at the end of one message and the beginning of the other) - in other words, an offset depth.
The paper uses polygraphic statistics, with care to avoid double counting of r-graphs within n-graphs, to get a more accurate estimate of the odds of a pattern of repeats occurring causally compared with randomly, using an "urn" model.
2. Review of "The Applications of Probability to Cryptography" by A.M. Turing
This paper first describes the use of "odds" and "log odds" to simplify cryptanalytic probability tests, and then applies this to 4 problems: "Vigenere", "A Letter Substitution Problem", "Theory of Repeats", and "Transposition Ciphers".
The introductory section presents conditional probability arguments in an elementary fashion without using much mathematical notation, and gives a number of examples from everyday life. One of these examples is life expectancy, and the example which uses "Hitler is now of age 52".
Bayesian principles, Bayes' Theorem (or Law) is not mentioned until page 5, where Turing calls it the "Factor Principle".
Nowadays we would write, if X are observations, H is a hypothesis with negation ~H, and P is probability, that the odds in favour of H given X are
----- x -------
The first factor is the prior odds, the second factor is the odds for the observations, and the combination is the posterior odds. Since Turing's time, mathematicians at GCHQ have become very familiar with this concept.
The section closes with an example of log odds expressed as "bans" or "decibans", with that terminology "introduced in honour of the famous town of Banbury".
The "Vigenere" section applies now standard scoring theory to the solution of those ciphers, and provides tables of language counts and log probabilities written in Turing's hand.
The "Letter Substitution Problem" shows how to score the fitting of a crib against the convolution of 3 slides giving a range of 0 to 27. Turing uses generating functions (viz. (1-x^10)^3/(1-x)^3) to calculate the probability of each slide, and uses the non-uniformity to help with scoring the crib placement.
The "theory of repeats" section is very similar to his paper "Statisitics of Repetitions".
The "transposition ciphers" section uses what we would call digraph scores (and he calls "bigramme scores") to score the selection of adjacent columns in the cipher. In the margin next to one of Turing's assertions is the comment "I doubt it S.W." which was probably written by Shaun Wylie, former Chief Mathematician at GCHQ. This shows that in some details even the leading mathematicians could disagree! The section also has a (long) derivation that all the eigenvalues of a particular Markov chain matrix are at most 1, which is now pretty standard.
3. The papers are available to view at http://www.nationalarchives.gov.uk/news/705.htm
, for anyone with a valid National Archives reader's ticket. Visit the website for further information on how to access public records: www.nationalarchives.gov.uk/visit
4. For images, contact firstname.lastname@example.org
or call 0208 392 5225
5. A new exhibition celebrating the life of Alan Turing will open on 21 June 2012 at the Science Museum. For more information see: http://www.sciencemuseum.org.uk/about_us/press_and_media/press_releases/2012/02/Codebreaker_Turing.aspx
6. For more information about the wartime activities at Bletchley Park see: http://www.bletchleypark.org.uk