[Unpublished: written c. 1992]

The computer revolution and local history

Alan Macfarlane, King's College, Cambridge

Introduction

    The first programmable electronic computer was invented in 1943; the first vacuum tube computer in 1946; the first electronic delay storage automatic calculator (EDSAC) in Cambridge in 1949. Modern electronic computers are therefore less than fifty years old. This article describes developments that have occurred over the second half of the 42 years since the first modern computer was invented in Cambridge.

   In 1972, Sarah Harrison and I decided to try to put abstracts of some of the records of two English villages into a computer. Later this was to be expanded into a project to put complete transcripts of all of the records of the Essex parish of Earls Colne, between 1400 and 1750, into the machine. (1) The project was completed in 1983. It had taken approximately thirty man/woman years to complete, and was the longest funded project in the history of the SSRC/ESRC. Since 1983 we have built on and extended these methods in a further project to computerize more recent historical records concerning peoples on the border of India and Burma. (2) It is worth reflecting, on the basis of this continuing involvement in a rapidly evolving technology, on how one might approach a local historical project if one were starting it now, rather than twenty years ago.

From mainframe to laptop.

    When we started, and for some fifteen years, there was no real alternative to working on a large 'mainframe' computer. Though powerful enough, such a machine was unavailable to most local historians. Furthermore, the mainframes of the 1970's were cumbersome machines, with poor documentation and difficult user interfaces, and liable to collapse, although the 'Phoenix' system and 'Tripos' operating system at Cambridge were among the best in the country. Even on these mainframes there were resource constraints. The Earls Colne data, with the indexes it generated, was originally too large to fit on the Cambridge computer. Only the introduction of the "double density disc pack" in the later 1970's solved this problem. In reality, the mainframe made local history computing the preserve of well-placed and funded academics.

    The development of microcomputers in the second half of the 1980's, to a point where a machine costing a thousand pounds has as much storage as, and is nearly as fast as, the mainframe we started on, has totally transformed the situation. Since about 1989, the advent of "laptop" computers, which can be taken to the local archive or be worked with anywhere, has changed it again. We would clearly have used such machines to enter data and to interrogate it, if they had been available. A "dedicated" machine to hold one's files makes an enormous difference. The presence of such machines would, alone, have cut by a third the time it took to complete the Earls Colne project.

From punched cards to optical character recognition.

    When we started on the project, it was still widely believed that computers were numerical devices; the idea of putting many words into the machine was not widely recognized. Hence most material at that time was put into the machine on punched cards or paper tape. We started with a "flexiwriter", whose weight nearly broke the taxi in which it was collected, which typed in upper case and had to be put into "Shift" for lower case, and which made such a noise that one had to wear "ear defenders". A great step forward was the paper-tape punching IBM golf-ball machine, though it took us several months to find a golf-ball with the right characters. (3) This was all painfully slow. A great advance occurred in the later 1970's when it became possible to type material directly into the computer using a visual display unit (VDU), and to edit it using a very expensive university graphics computer (PDP 11) with a light-pen.

   Yet even when these were invented there was no screen editor or word-processing package. Very near the end of the Earls Colne project Dr Tim King produced an editor (ED) for us, one of the first screen editors in England. Thus, for most of the project, data input and correction were very slow and complicated.

   Two major developments have been and are occurring. The first was the arrival of commercially available word-processors in the 1980's. We have watched the generations of WordStar from 1 to 6, and the emergence of Word, WordPerfect etc. These make input and cleaning of historical records infinitely easier. We would clearly have used them if they had been available.

    A second advance, which is only now under way, is the possibility of optically scanning historical documents into the computer. We are now on the edge of a revolution whereby not only printed and typed material can be scanned in, but even manuscripts. This is potentially of great benefit to the local historian, though there is no way in which optical scanners will be able to make sense of many of the more complex manuscripts. If we were starting now, we would invest in a flat-bed scanner and see what could be done. Certainly many graphics (including maps and diagrams) can be scanned in very efficiently now.

From sequential searching to probabilistic retrieval

    When we started, there were no 'database' systems as we know them now. The only way to find anything in a mass of textual records was to search through them all from start to finish, looking for the 'shape' of a word, just as one does with 'Find' on a word processor. This was quite fast for one word on a mainframe, but what if one wanted to jump about through dozens of records? Here we had to invent our own software.

    We started with an "indexed sequential" system written by Charles Jardine, which at least found a word or several words more or less instantaneously. In the later 1970's, the earlier philosophy concerning databases, which was based on "hierarchical" searching, was just giving way to a new approach, the "relational" database. There were, however, no relational databases available either commercially or even through academic channels. We therefore collaborated with Dr. Tim King, who wrote for us one of the first academic relational databases, the CODD database (COroutine Driven Database). This was very powerful and fast. (4)
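
The difference between scanning every record and consulting a prebuilt index can be illustrated with a minimal sketch in Python. It is not the indexed sequential system written for the project, whose internals are not described here: it uses a simple word-to-record index, and the sample records and function names are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample records, invented for illustration only.
RECORDS = [
    "Henry Abbott and Jone his wife do claim to hold a tenement in Church Street",
    "John Smith holds a croft near the mill",
    "Jone Abbott widow surrenders a tenement in Church Street",
]


def words(text):
    """Split a record into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def sequential_search(records, word):
    """Scan every record from start to finish, as 'Find' does."""
    word = word.lower()
    return [i for i, text in enumerate(records) if word in words(text)]


def build_index(records):
    """Read the records once and note which records contain each word."""
    index = defaultdict(set)
    for i, text in enumerate(records):
        for w in words(text):
            index[w].add(i)
    return index


def indexed_search(index, *query_words):
    """Look up each word and keep the records that contain all of them."""
    hits = [index.get(w.lower(), set()) for w in query_words]
    return sorted(set.intersection(*hits)) if hits else []


INDEX = build_index(RECORDS)
print(sequential_search(RECORDS, "tenement"))
print(indexed_search(INDEX, "Abbott", "tenement"))
```

The sequential search reads every record on every query, whereas the index is built once and each later query is a lookup, which is why several words can be found together almost at once.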

    Now, of course, relational database systems have become standard and can be bought off the shelf in any computer store. Whether commercial (e.g. DBase, Paradox, Foxbase, ideaList) or more specialist and academic (e.g. 'Clio', AskSam, Ethnograph), they all provide the local historian with Database Management Systems which were not dreamt of at the start of our project. We were developing the software alongside putting in the data, which often meant re-inputting very large chunks of the data several times, an enormously time-consuming task which a contemporary local historian is spared.

    Furthermore, we are just on the eve of another major shift from this current generation of 'relational' databases. These find what you know is in the texts and put it out as a set of records. The new systems, including 'probabilistic' databases such as the Cambridge Database System, based on the 'Muscat' system developed by Dr Martin Porter, are more interesting for the historian since they tend not only to confirm what one already knows, but also to suggest links and connections to what one does not know. Moreover, the restrictions on size of records and length of fields and amount of text editing are lifted. They are more flexible and intuitive, allowing the historian and computer to interact. (5)
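
The general idea of ranked rather than exact retrieval can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a simple rarity weighting for each word, not the weighting used by Muscat or the Cambridge Database System, and the records and function names are invented.

```python
import math
import re
from collections import Counter

# Hypothetical records, invented for illustration only.
RECORDS = [
    "Henry Abbott holds a tenement in Church Street",
    "Henry Abbott is fined at the manor court",
    "John Smith holds a croft near the mill",
    "Jone Abbott widow surrenders a tenement",
]


def words(text):
    """Return the set of lower-case words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_weights(records):
    """Weight each word by rarity: words found in few records count for more."""
    n = len(records)
    doc_freq = Counter(w for text in records for w in words(text))
    return {w: math.log((n + 1) / df) for w, df in doc_freq.items()}


def ranked_search(records, query):
    """Score every record by the summed weights of the query words it shares."""
    weights = term_weights(records)
    query_words = words(query)
    scored = []
    for i, text in enumerate(records):
        score = sum(weights[w] for w in words(text) & query_words)
        if score > 0:
            scored.append((round(score, 3), i))
    # Best score first; ties are broken by record order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


for score, i in ranked_search(RECORDS, "Abbott tenement Church"):
    print(score, RECORDS[i])
```

A record that matches only part of the query is still returned, lower down the list, which is how such a system can point towards material the searcher did not know to ask for.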

    Thus, if we were starting now, we would get hold of the latest database of this probabilistic kind and see how it could deal with our Earls Colne records. We have already tried it on twenty‑five thousand ethno‑historical records concerning the Naga peoples over the period 1850‑1950, where it works excellently. It seems likely that it would be appropriate for earlier historical documents as well.

From book to CDROM

    One aim of our project was to make available a version of all the material which we have transcribed, as well as an index to this material. An enormous effort had been put into transcribing the complete records of an English village, something which had never been attempted before, and it seemed a pity if this were not made available to the scholarly community as a whole, not only in this country but abroad.

     Yet publishing this raw material in the conventional paper form, as a series of what would be at least thirty volumes of 250 pages each, would make it very expensive and very bulky. It was unlikely to find its way into libraries and hence not likely to find a publisher. At the start of the project there was no obvious way to overcome this problem, but towards the end advances in photographic techniques had made it possible to envisage publishing the whole collection of documents as Computer Output Microfiche. This cut the cost by a factor of at least four and the size by a very much greater amount. (6)

    Yet microfiche and indexes are still quite difficult and slow to use. If we were starting now, we would think of distributing our original texts and the indexes as both text files and as a database file on an electronic medium. This might be floppy discs, to be put onto people's hard disc, or it may soon be possible to do so on CDROM (Compact Disc Read Only Memory). This is a form of storage which was only invented in 1985, but which will hold over 500 megabytes of information (several thousand books) on one disc like a music Compact Disc. CDROM players will soon be commonplace attachments to computers.

   Some form of electronic dissemination would enable other historians and educational institutions to interrogate the records in the form of a set of texts or even a database, rather than somewhat cumbersomely by hand.

From hand flagging to computer indexing.

     When we started there was very little experience of text processing. At first we were told to code everything, otherwise the computer would not understand what was put in. (7) The error of this soon became apparent and we decided to make highly selective abstracts. We did this for a while, but it was obvious that many of the facts we would be interested in would be left out of the records. We were then advised to put in all of the original text. This meant re-doing much of the work, but it was a decision which turned out to be correct. Almost always the historian finds that he or she has left out the very pieces that are most important for later historical research, as well as changed the meaning during the abstracting or coding procedure.

    Yet having decided to put the whole text into the machine, we encountered the very considerable task of labelling everything so that the computer could make some sense of the list of symbols which it received. We devised a complex system of nested brackets, which broke up the documents. To a certain extent, we tried to replicate the semantic structure of the English language within the machine.

    This did allow the possibility of tracing genealogical links and the descent of land-holdings. But it was an immense job. It was done several times by Sarah Harrison, later with the assistance of Jessica King, as the bracketing system was developed.

     If we were starting now, we would use a greatly simplified input structure and use the increasing power and sophistication of the computer to provide the indexes. This change in approach is based not only on the experience of the Earls Colne Project, but also on the subsequent project on ethno-historical records.

    Previously we divided the texts into records, which were then flagged or coded to indicate complex hierarchical bracket types. The types we used were Person, Name, Kin, Relationship, Land, Land Name, Land Number, Hold, From, To, Goods, Bequest, Verb, Conditional, Date. There were also Back Reference, Composite, Reverse Direction and Data Moving brackets, to deal with the idiosyncrasies of English grammar and the past tense. A relatively simple example of one entry in a rental illustrates what had to be put in:

&(P (P *1 (N Henry Abbott ) ) and (P (N Jone ) (K (1 his ) wife))

(H do claim to hold (L a tenement in Church Street) ) &)

This explains to the computer that there is a person with the name Henry Abbott and his wife Jone, linked by the kinship relation of wife, who have a holding relationship to a piece of land in Church Street.

All this had to be checked by hand from diagnostic output to see that the brackets matched and that the nesting of brackets was right. It can be seen that to deal with some thirty thousand records like this is something that most local historians could not contemplate.
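
The kind of check involved can be sketched in Python, using the rental entry quoted above. This is a present-day illustration, not the project's own diagnostic program. It assumes only two bracket pairs, '&(' with '&)' and '(' with ')', and ignores the type letters that follow each opening bracket; the function name is invented.

```python
# The rental entry quoted above, joined into one string.
ENTRY = (
    "&(P (P *1 (N Henry Abbott ) ) and (P (N Jone ) (K (1 his ) wife)) "
    "(H do claim to hold (L a tenement in Church Street) ) &)"
)


def check_brackets(text):
    """Return (ok, message, max_depth) for a bracketed record.

    Assumes two bracket pairs: '&(' ... '&)' and '(' ... ')'.
    """
    stack = []      # open brackets not yet closed, with their positions
    max_depth = 0
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if two == "&(" or two == "&)":
            token, step = two, 2
        elif text[i] in "()":
            token, step = text[i], 1
        else:
            i += 1
            continue
        if token in ("&(", "("):
            stack.append((token, i))
            max_depth = max(max_depth, len(stack))
        else:
            expected = "&(" if token == "&)" else "("
            if not stack or stack[-1][0] != expected:
                return False, f"unexpected {token!r} at position {i}", max_depth
            stack.pop()
        i += step
    if stack:
        token, pos = stack[-1]
        return False, f"{token!r} opened at position {pos} is never closed", max_depth
    return True, "brackets match", max_depth


print(check_brackets(ENTRY))
print(check_brackets("(P (N Henry Abbott )"))
```

A check of this kind confirms only that the brackets pair up and nest; whether the right type of bracket was used around the right words still had to be judged by a person.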

     On the basis of our two major projects, we would now simplify the field structure. Originally, the material had to be broken up into many different fields. Now, with more powerful retrieval systems, although a few other fields might be needed for special purposes, most local historical projects could be dealt with by using the following: Date, Field name/number, House name/number, Place name, Person name, Medium (e.g. parish register), Reference (archival reference). There would also be one or more 'free text' fields. Some of these would just contain un-indexed longer texts, perhaps with an indexed caption, others would have every word indexed. It may even be possible to simplify this still further in the future, putting the records in as they are found and abstracting names, places, dates and other information automatically.

    Having set up a template of these fields, which automatically sets the flags or codes, it is relatively easy to put in the material and one does not have to worry about the coding.
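
Such a template might be sketched in Python as follows. The attribute names, the example values (including the date and the archival reference) and the output layout are all invented for illustration; they are not those of any particular database package.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class LocalRecord:
    """One record using the simplified fields; all names are illustrative."""
    date: str = ""
    field_name: str = ""        # field name or number
    house_name: str = ""        # house name or number
    place_names: List[str] = field(default_factory=list)
    person_names: List[str] = field(default_factory=list)
    medium: str = ""            # e.g. parish register
    reference: str = ""         # archival reference
    free_text: str = ""


def flagged_lines(record):
    """Emit each non-empty field with its label, so nobody types codes by hand."""
    lines = []
    for f in fields(record):
        value = getattr(record, f.name)
        if isinstance(value, list):
            value = "; ".join(value)
        if value:
            lines.append(f"{f.name.upper()}: {value}")
    return lines


# A hypothetical record; the date and reference are invented.
EXAMPLE = LocalRecord(
    date="1598",
    place_names=["Church Street"],
    person_names=["Henry Abbott", "Jone Abbott"],
    medium="rental",
    reference="hypothetical-ref-1",
    free_text="Henry Abbott and Jone his wife do claim to hold "
              "a tenement in Church Street",
)

for line in flagged_lines(EXAMPLE):
    print(line)
```

The person entering the data fills in only the values; the labels come from the template, and empty fields are simply left out.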

From number crunching to multimedia.

    At first, computers were regarded as number processing devices. The Earls Colne project developed during the 1970's and early 1980's as it became clearer that computers could be useful in text retrieval. But a certain amount of non-textual material, maps, diagrams, old photographs, had to be left on one side.

     During the 1980's a further revolution occurred as it was realised that computers were 'multimedia' devices. At first this became apparent by linking them to peripherals such as optical discs (videodiscs). The first commercially available videodiscs in this country appeared in 1982 and within a few years the BBC Domesday Project had realised some of their potential for storage of both textual and visual materials. (8) Now it is becoming increasingly obvious that computers themselves can hold digital material of all kinds ‑ texts, sounds, pictures.

    Already they can hold a little sound and a few pictures and graphics in digital form, or much more when joined to a compact disc (Compact Disc Interactive). But it is likely that during the next few years new storage devices and data compression will mean that it is possible to input not only texts of the normal historical kind, but also sounds (oral history, folk music etc.), maps and other graphics, photographs and even, as compression improves, video and movie film.

    If we were starting again, we would consider whether any of these sources are available or could be generated by the historian and how they could be integrated with the texts. For instance, would a set of local documents be enriched if accompanied by a series of maps and photographs?

Nominal record linkage

    Quite early on in the project we became interested in the idea of trying to use the computer to link together records, reconstructing individual lives from disparate records, or, as it was technically known, "nominal record linkage". At first we believed that the computer would be able to do the record linkage automatically. We knew that it was possible to do this using just the christenings, marriages and burials in parish registers, since the machine could be programmed to obey the rules upon which hand linkage was based. It seemed likely that as more information became available through using all the sources, the links would become firmer and the job easier.

     After a great deal of effort, we proved to our own satisfaction that automatic record linkage of multi-source records was too difficult for the computer. Of course, the computer could easily sort the records into name sets and deal with a certain number of the cases. But we found that whatever strategy we used, many of the links were so complex and required so much historical background knowledge that it was impossible to write a program that would replicate the human mind. (9)
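
The easy first step, sorting references into name sets, can be sketched in Python. The references, the table of spelling variants and the crude surname rule below are all invented for illustration and assume names of exactly two words; they are not the rules used in the project.

```python
from collections import defaultdict

# Hypothetical name references (source, year, name), invented for illustration.
REFERENCES = [
    ("parish register", 1580, "Henry Abbott"),
    ("court roll", 1612, "Henry Abbot"),
    ("rental", 1598, "Henrie Abbott"),
    ("parish register", 1640, "Jone Abbott"),
]

# A tiny hand-made table of spelling variants; a real one would be far longer.
FORENAME_VARIANTS = {"henrie": "henry", "joan": "jone"}


def name_key(name):
    """Reduce a two-word name to a standard (forename, surname) key."""
    forename, surname = name.lower().split()
    forename = FORENAME_VARIANTS.get(forename, forename)
    # Crude surname normalisation: collapse doubled letters (abbott -> abot).
    collapsed = []
    for ch in surname:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return forename, "".join(collapsed)


def name_sets(references):
    """Group every reference under its standardised name."""
    sets = defaultdict(list)
    for source, year, name in references:
        sets[name_key(name)].append((source, year, name))
    return dict(sets)


SETS = name_sets(REFERENCES)
for key, refs in sorted(SETS.items()):
    print(key, len(refs))
```

Grouping of this kind says only that several references share a name. Deciding which of them refer to the same historical individual is the step that needed background knowledge and defeated automatic linkage.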

    It is possible, of course, that artificial intelligence research into "intelligent" systems has advanced so far in the last ten years that what was then impossible could now be done. But it would require a massive amount of programming to do so. Certainly we were forced to abandon the attempt. Having created a name index automatically with the machine (that is every reference to a name), Sarah Harrison used the hand index in tandem with the machine index to sort these names into historical individuals. Even the human being was occasionally stuck, unable to select between the eight Henry Abbotts, for instance, when allocating a passing reference in a court roll. But on the whole the human being could do what the machine could not.


 NOTES

1. The project and records have been described in Alan Macfarlane et al., Reconstructing Historical Communities (Cambridge Univ. Press, 1977) and Alan Macfarlane, A Guide to English Historical Records (Cambridge Univ. Press, 1983). A short description of the computer aspects is contained in T.J. King, 'The use of computers for storing records in historical research', Historical Methods, vol. 14, no. 2, Spring 1981, and Alan Macfarlane et al., 'Reconstructing Historical Communities by Computer', Current Anthropology, December 1979. The project was financed by King's College, Cambridge and the Social Science Research Council (later Economic and Social Research Council). The project team over these years consisted of Sarah Harrison, Charles Jardine, Jessica King, Tim King and Alan Macfarlane.

2. The project is described briefly in Alan Macfarlane, 'The Cambridge Experimental Videodisc Project', Anthropology Today, vol. 6, no.1, 1990. For a fuller account of the Nagas, produced alongside the videodisc, museum exhibition and computer database, see Julian Jacobs et al., The Nagas; Hill Peoples of Northeast India (Thames and Hudson, 1990).

3. The world of computing in those days, and even parallel searches for golf-ball type faces, is amusingly described in Ben Ross Schneider, Jr., Travels in Computerland (Addison-Wesley, 1974).

4. A brief description of the system is given in T.J.King and J.K.M.Moody, 'Design and Implementation of CODD', Software‑Practice and Experience, vol.13, 1981.

5. The system is based on the Museum Cataloguing System, MUSCAT, originally written by Dr. Martin Porter and documented in The Muscat Manual (4th edn: 1990) and Introduction of Muscat (2nd edn: 1989), both published by the University of Cambridge Computing Service. This is currently under development as the Cambridge Database System (CDS), enquiries about which should be made to Cambridge Multimedia, St Andrews, North Street, Burwell, Cambridge, CB5 0BB.

6. The records have been published by Chadwyck-Healey Ltd., Cambridge Place, Hills Road, Cambridge, as 'Records of an English Village; Earls Colne 1400-1750'.

7. This, for instance, was the advice given in the first popular book on historical computing, Edward Shorter, The Historian and the Computer (Prentice Hall, New Jersey, 1971).

8. The BBC Domesday project is described in Peter Armstrong and Mike Tibbetts, Domesday Video Disc User Guide (BBC Publications, 1986) and in Alan Macfarlane, 'BBC Domesday: The Social Construction of Britain on Videodisc', Society for Visual Anthropology Review, vol. 6, no. 2 (Fall, 1990).

9. Full technical documentation on the Earls Colne Project is contained in 'Reconstructing Historical Communities with a Computer: Final Report to the Social Science Research Council' (1983) by Sarah Harrison et al. which is deposited in the British Lending Library. The difficulties of automatic record linkage are described on pp.28‑30 of the Report.