The History Manifesto and Big Data

Book cover of The History Manifesto

The History Manifesto by Jo Guldi and David Armitage (Cambridge University Press, 2014).

In my last post I reviewed the provocative book, The History Manifesto. Written by history academics Jo Guldi (Brown University) and David Armitage (Harvard), it is a call to historians to turn their work towards investigating long periods of history (the longue-durée) in order to address the big issues affecting humanity such as inequality and climate change. I set aside one chapter in that review for special attention. In this post I consider chapter four, ‘Big questions, big data’.

There are many ways that technology can be used by the historian. The ‘Big Data’ chapter in The History Manifesto discusses the use of topic modelling tools to highlight the type of language most often used and the topics most widely discussed in the past. Guldi and Armitage also recognise the potential for digital tools to uncover the ‘invisible archives’, which include records that some person or institution in the past wanted to remain hidden. They give the example of The Declassification Engine, developed by a history professor and a professor in the field of statistics. This website explores the world of classified, redacted and declassified government documents and is a good demonstration of the potential of technology in history.

“Digitally structured reading means giving more time to counterfactuals and suppressed voices, realigning the archive to the intentions of history from below”, they observe (p. 93).  This type of historical research has the potential to reveal serious injustices and even lead to steps being taken to rectify a historic wrong. It is exciting to see the potential of digital research techniques to reveal invisible or hidden archives. However, the authors do not draw attention to the fact that most of the world’s archives are not digitised. Historians always need to be mindful of this.

I’m researching the beliefs of Australian soldiers as expressed in their diaries during World War I. In his book, The Broken Years, Bill Gammage has already noted that Australian soldiers didn’t discuss their beliefs much in their diaries. Information technology has assisted me enormously to find the scant comments and their context. Digital tools are fundamental to my research methods but close reading of the work of other historians and primary sources is an indispensable first step in identifying the research questions and issues that the digital tools can then help me explore. I still have to spend hours reading old handwriting as most of the primary sources are still not in machine readable format.
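
To give a sense of what finding scant comments and their context can look like in practice, here is a minimal keyword-in-context sketch in Python. It is an illustration only, not the method I actually use: the function name `keyword_in_context`, the search terms and the sample sentence are all made up for this example, and the sample is not taken from any real diary. It assumes the diaries have already been transcribed into plain text.

```python
import re


def keyword_in_context(text, terms, width=40):
    """Return (term, snippet) pairs for every whole-word match of any term.

    `width` is the number of characters of context kept on each side.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Collapse line breaks and repeated spaces so the snippet reads cleanly.
        snippet = " ".join(text[start:end].split())
        hits.append((match.group(1).lower(), snippet))
    return hits


# Invented sample text, standing in for a transcribed diary entry.
sample = (
    "Church parade this morning. The padre spoke well. "
    "Rain all afternoon and I prayed for dry boots."
)

hits = keyword_in_context(sample, ["padre", "prayed", "church"], width=20)
for term, snippet in hits:
    print(f"{term:>8}: ...{snippet}...")
```

A search like this only surfaces candidate passages. Deciding which terms to look for, and what the matches mean, still depends on the close reading described above.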

Digital tools can also lead to poor historical conclusions when used by researchers who do not understand how they were constructed and how they work. Guldi and Armitage rightly point to a problem with tools such as the Google Ngram Viewer (p. 90). The user can enter a particular word of interest and get a graph that demonstrates those periods when it was most often used in English language books. But… as vast as the Google English language digitisation program has been, the corpus of books that the Ngram Viewer searches is by no means all the books ever published in English. To understand the results produced by the Ngram Viewer we would need to understand the list of books that have been searched, their significance to readers and those that are missing from the collection. As a researcher I don’t find the Google Ngram Viewer useful. I prefer to work with collections I have built myself, as do other historians. The History Manifesto introduces the reader to Paper Machines, a tool that Guldi helped to create which enables historians to explore their collections.
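
The same kind of question can be asked of a collection the researcher has built, where the list of documents is known. The following is a minimal sketch, assuming each document is available as plain text with a known year. The function name `relative_frequency_by_year` and the tiny sample collection are invented for illustration, and this is not how the Ngram Viewer or Paper Machines is implemented.

```python
import re
from collections import Counter


def relative_frequency_by_year(documents, word):
    """Count how often `word` appears per 1,000 words in each year.

    `documents` is an iterable of (year, text) pairs.
    """
    hits = Counter()
    totals = Counter()
    target = word.lower()
    for year, text in documents:
        # Very simple tokenisation: runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Invented sample collection, not real diary text.
documents = [
    (1915, "god help us all"),
    (1915, "mud and rain"),
    (1917, "mud mud and more mud"),
]

freq = relative_frequency_by_year(documents, "mud")
for year, rate in freq.items():
    print(year, round(rate, 1))
```

The numbers are only as meaningful as the collection behind them. With three short documents the jump between years says nothing about the period, which is the same caution that applies to the Ngram Viewer on a much larger scale.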

Many disciplines are historical disciplines at their core. Economics, geology, climate science are just a few. Guldi and Armitage recognise the need for historians to breach the false dichotomy between the world of humanities and words, and science and numbers.

As historians begin to look at longer and longer time-scales, quantitative data collected by governments over the centuries begin to offer important metrics for showing how the experiences of community and opportunity can change from one generation to the next.

p. 96

As much as I am comfortable with the notion of using numbers and technology as part of my toolset I feel that this puts me on the fringe in the history world. Interest in exploring this potential is certainly present amongst historians who are part of the digital humanities field but such enthusiasm is hard to find at general history conferences and in many history journals. From their positions the authors find reason to be optimistic that this will change.

There is so much to agree with in Guldi and Armitage’s discussion of the enormous possibilities of big data for historians, but as elsewhere in The History Manifesto their enthusiasm causes them to over-reach:

The arbitration of data is a role in which the History departments of major research universities will almost certainly take a lead; it requires talents and training which no other discipline possesses.

p. 107

Economists, accountants, statisticians, biologists and meteorologists all have sound training in the use of historical quantitative data pertinent to their field. Computer scientists have developed the whole field of big data. How could historians possibly trump this training when they receive no training in statistics or technology? Guldi and Armitage say that historians should be leading historical big data research because of their scepticism and their ability to detect bias, but environmental scientists have demonstrated this quality for years, as have some economists like Keynes and Piketty. Historians do have something important to contribute, especially in interpreting non-numerical data, but I see that contribution as being made within collaborative multi-disciplinary teams, not in directing them.

Guldi and Armitage are correct in observing the institutional bias inherent in some scientific research, but this is a problem all disciplines have. At last year’s Australian Historical Association conference Professor Shurlee Swain commented that the historians who wrote commissioned histories for schools, churches and charitable organisations often wrote solely from the perspective of the institution. They did not write about the perspective of the members of those communities who may not have had such rosy memories of the leaders of these organisations. There is a fundamental ethical difficulty when the institution that a researcher is investigating is also paying for that research. Professional historians are not paid a salary by a university like academic historians such as Guldi and Armitage. What can be done to assist them to address this ethical problem when it occurs? In our current market-based economies, the institution has the ability to shop around until they find a researcher who will write in the manner desired by the institution.

Guldi and Armitage dream of the day when “[h]istorians may become tool-builders and tool-reviewers as well as tool-consumers and tool-teachers”, of consulting with Silicon Valley startups and providing “data analysis for legislative committees” (p. 114). Yet for this dream to become reality historians need to be trained in the use of technology.

The problem of training historians in the use of technology and numbers is a significant one that is not mentioned in The History Manifesto. The myth of the ‘digital native’ prevails in too many educational institutions and today’s students from Arts faculties can graduate with poor skills in technology. They may be adept at organising a party using Facebook or sharing LOL cats on Twitter, but my observation in Australia is that many history departments fail to train undergraduate history students in writing for the public, let alone writing blogs and using social media effectively as a professional tool.

Increasingly historians have to develop skills using technology if they are to produce insightful history and grasp opportunities in research that were previously out of reach. Over the last few decades electronic record-keeping has gradually become the primary form of record-keeping for many western governments and organisations. The implications for historians are profound. The United Kingdom’s Foreign and Commonwealth Office first introduced electronic record-keeping in 1992 and from the year 2000 it became the primary medium to store the department’s records (p. 9, ‘FCO Records: Policy and Practice’, 2013). The Head of the FCO’s Knowledge and Information Management Team, Carryl Allardice, has suggested historians prepare for the future. “At the heart of the challenge”, she observed, “I would imagine the Historians and Researchers in the future will need excellent digital search skills to be able to do the kind of data mining required to extract information in a coherent way…” (p. 25, ‘FCO Records: Policy and Practice’, 2013).

Clearly historians in training need to develop good skills with technology in order to research the history of the 1990s and the twenty-first century. However, I argue that even for research into pre-digital era history, researchers need to improve their technical skills if they wish to produce ground-breaking history. Even though most historical resources are not digitised and available in machine-readable form, there are many productivity gains and research insights to be made by good use of technology to organise research work and search for sources.

Are there any history departments at Australian universities advising their undergraduates that they should enrol in a computer science unit in order to develop skills they need in their future career, whatever that may be? I don’t know of any Australian history department that teaches undergraduate students HTML, how to use spreadsheets and databases, and how to do sophisticated searches of the internet. I would like to be wrong on this point, but my observation is that the few history students who seek training in information technology can be deterred by the kind of bureaucratic barriers that my daughter faced when trying to enrol in a first year computer science unit as an Arts student. I have written about her experience on my digital humanities blog, Stumbling Through the Future.

Information technology skills need to be taught. Very few people learn by osmosis. The fact that so many people born after Bill Gates in 1955 are not adept at using information technology should dispel the myth of the ‘digital native’. We are failing our younger generations by ignoring their need for proper training in the use of information technology.  We also need to provide training opportunities for older historians and other professionals who have missed the technology revolution and are struggling to do more basic tasks. Use of technology is a literacy and some are talking about it being a human right.  No-one is too old to learn how to use technology and increasingly historians will be professionally marginalised if they do not have these skills.

Historians are already entering the field of digital humanities and starting to do sophisticated computational research. I don’t advocate that all historians use big data research techniques, but I agree with Guldi and Armitage that there is much potential for applying big data research techniques to produce longue-durée history. However, increasingly it is essential that all historians receive some form of training in more basic use of technology.

Until the discipline starts encouraging the systematic training of all historians in the necessary skills, Guldi and Armitage’s aspirations for historians to become leaders in the area of big data will remain a dream.

Further Reading

There have been a lot of reviews and comments on social media from historians about The History Manifesto:

  • Richard Blakemore, ‘Some Thoughts on the History Manifesto’: This is a succinct review of the book that caught my eye. I also like the whimsy of Blakemore’s online name – History Womble.
  • Tim Hitchcock, ‘Big Data, Small Data and Meaning’: Hitchcock is a Professor of Digital History. While this post is not a review of The History Manifesto, Hitchcock does discuss Guldi and Armitage’s arguments about big data and history. He discusses the macroscope and the importance of small data as well as big data.
  • David Armitage has maintained a list of reviews of The History Manifesto and related pieces.

I’ve added the following some time after this post was published:

  • Ian Milligan, ‘AHA Talk: The Promise of Web ARChive Files’: This is a paper given at the 2015 American Historical Association conference. Milligan argues that historians need to learn technical skills such as programming now. He demonstrates the richness of online sources from the 1990s and early twenty-first century that the historian can access if they have good technical skills and address other issues hampering the discipline in this area.

In the third post in this series about The History Manifesto I reflect on my research of the beliefs of Australians during World War I, the millions of words in the diaries I’m examining and how The History Manifesto has reminded me of why I embarked on this research.


8 thoughts on “The History Manifesto and Big Data”

  1. Thanks for your review- I don’t think that I’ll read the book but it’s interesting and useful to hear your perspective on it. Their comment about the ‘arbitration of data’ is certainly a strange one, and as you point out, there are many other disciplines well (and probably better) qualified to do so. I find even the idea of ‘arbitrating’ big data quite strange.


    • I hadn’t picked up on the phrase ‘arbitration of data’ but you are right, it is odd. I innately think of it referring to the sifting of data to determine whether it is useful and how it is useful. I asked my cloud computing hubby and he thought it referred to data validation.


  2. A thoughtful review of a subject about which I know little and have shied away from – to my chagrin. Yes…how does one capture the thoughts of those soldiers in the fields of war whose comments are buried within so many letters. Your/ the authors’ point that training in the use of technology for historians is essential is well made.


    • Thank you Christine. Using these big data techniques I’ve been able to capture the merest wisps of thoughts which on further digging in conventional sources have revealed some poignant stories. But I always remind myself that I’m searching through a small fraction of the words the soldiers in World War I wrote. So many letters and diaries are still not available in machine readable form. I’m very grateful to the State Library of New South Wales and their volunteers who have spent years transcribing the diaries I’m working on.


  3. Reblogged this on Stumbling Through the Future and commented:

    I published this review of the ‘Big Data’ chapter in The History Manifesto written by historians, Jo Guldi and David Armitage on my history blog last month. It is now on the reading list of HIST4170, Exploring Digital Humanities, a course offered by the history department at the University of Guelph, Canada.


  4. Ladurie was into climate change in the 1950s using grape data from monastic sources in the late medieval. Stats and big data form the basis of the Annalist historiography. I agree with you on the need for history departments to teach basic stats, and courses on subjects like R and GIS, Python programming, using APIs, sorting data from the HTML and XML that collects around it. I have spent countless hours labouring over spreadsheets with stuff like rainfall, the price of cattle, squatters’ accounts, labouring over GIS and other modern techniques. It is hard stuff to learn by oneself, especially hard to learn it so one can assess how useful it is in one’s research. My favourite source is The Great Meadow: Farmers and the Land in Colonial Concord, by Brian Donahue. He uses a lot of data over a long time, makes nice maps, and writes well: a good historian.


    • All I can say is where would we be without the blogs of digital humanists. I’ve learned everything I know about using technology in history from blog posts that are shared on Twitter plus the wonderful forums that answer every question you can possibly ask about programming. As an aside I liked this “Advice from an Old Programmer” I found on the Learn Python the Hard Way website the other day. Thanks for sharing about that book. For those who are interested, you can find further details on the Yale University Press website.

      What are you researching Kevin?


