Abstract of Paper to be Presented at Accio 2005

How statistics and computer-based visualisations contribute to our understanding of Harry Potter

Dr Tim Regan

In this paper I will show how statistics and computer-based visualisations may be used to aid our understanding of the Harry Potter novels.

These can help us with two aspects of our exploration of the Harry Potter series:

  1. Providing evidence for or against existing theories, and
  2. Predicting events and character developments in the final two books.

As good novels, and more importantly as an unfinished series of good novels, the Harry Potter books give ample scope for theorising about the true motives of the characters and the fates that J.K. Rowling has in store for them. The best of this theorising and speculation is based on repeated and detailed readings of the novels and on the surrounding pronouncements of J.K. Rowling and of other readers. But careful reading is not the only way to glean and analyse the content locked in the text of a book.

Alongside statistics, computers can be used to provide abstract, interactive visualisations of text. While a human reader is very good at gleaning the rhythm or poetic structure of a text, and the various plots and sub-plots, there are other structures or patterns present in written works of fiction, like the distribution of certain word clusters, which we may overlook. A computerised rendering of the text may help us to see these more abstract patterns.

Numerical or statistical analysis of text has a long history. It starts with the use of checksums by early religious scribes and progresses through the analysis of authorship, from studies of Shakespeare's plays up to current work using forensic stylistics to attribute leaked governmental secrets. More recently, computing has revolutionised information retrieval, so that huge corpora of texts can be reliably indexed, searched, and even to a certain extent summarised. These statistical methods include measures of style such as counting words, recording the length of sentences, or counting the new words in each sentence.
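As a minimal sketch of the style measures just listed (word counts, sentence lengths, and new words per sentence), the following Python may help to make them concrete. The function name `style_measures`, the naive sentence splitting, and the sample text are all assumptions made for illustration; they are not taken from any of the studies mentioned above.

```python
import re

def style_measures(text):
    """Return simple style measures for a passage of plain text."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # This is a naive split, not a robust sentence tokeniser.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    seen = set()                 # every distinct word met so far
    words_per_sentence = []      # length of each sentence in words
    new_words_per_sentence = []  # words not seen in any earlier sentence
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        words_per_sentence.append(len(words))
        new_words = set(words) - seen
        new_words_per_sentence.append(len(new_words))
        seen |= new_words
    return {
        "word_count": sum(words_per_sentence),
        "words_per_sentence": words_per_sentence,
        "new_words_per_sentence": new_words_per_sentence,
    }

# Invented sample text, used only to exercise the function.
sample = "The owl flew north. The owl was late! Was the letter lost?"
measures = style_measures(sample)
print(measures)
```

Even on so small a sample, the count of new words falls after the first sentence as vocabulary begins to repeat, which is the kind of pattern such measures are meant to expose.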

So how can we apply these techniques to J.K. Rowling's Harry Potter novels? We can compare the frequencies and distributions of uncommon words around each character and event. This provides insight into J.K. Rowling's conscious and unconscious choices as she writes. For example, the increased occurrence of the noun "beetle" through Harry Potter and the Goblet of Fire is easily missed on first reading, at least until Hermione uncovers Rita Skeeter's disguise, but a statistical analysis would not be diverted by such obfuscation. Though providing evidence for or against existing theories is valuable, predicting the outcome of the series would be far more exciting. We know that J.K. Rowling has plotted all seven books, and that some events in the existing five books are clues while others are red herrings. Can we use statistics and visualisations to tease these apart? I hope that my talk will show people how we can.
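The idea of tracking an uncommon word such as "beetle" through a book might be sketched as follows. This is only an illustration under stated assumptions: the function name `occurrences_by_chapter` is hypothetical, the rate per 1,000 words is one possible normalisation among many, and the three "chapters" are invented placeholder sentences, not quotations from the novels.

```python
import re
from collections import Counter

def occurrences_by_chapter(chapters, word):
    """Return the rate of `word` per 1,000 words for each chapter."""
    rates = []
    for chapter in chapters:
        tokens = re.findall(r"[a-z']+", chapter.lower())
        counts = Counter(tokens)
        # Normalise by chapter length so long and short chapters compare fairly.
        rate = 1000 * counts[word.lower()] / len(tokens) if tokens else 0.0
        rates.append(rate)
    return rates

# Invented placeholder text standing in for successive chapters.
chapters = [
    "A quiet morning in the library with nothing unusual at all.",
    "A beetle sat on the window sill while they talked.",
    "The beetle was there again, and another beetle in her hair.",
]
rates = occurrences_by_chapter(chapters, "beetle")
for number, rate in enumerate(rates, start=1):
    print(f"chapter {number}: {rate:.1f} per 1,000 words")
```

A rising rate across chapters is the sort of signal a reader may overlook but a count will not. Note that this sketch matches exact word forms only, so "beetles" would need to be counted separately or handled by stemming.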