Second Part

0:09:07 I did not have a Fellowship at King's when I returned but was offered a Directorship of Studies, and also had a job at the Computer Laboratory as Senior Assistant in Research; this was paid at the same level as an assistant lecturer with no guarantee of tenure or commitment on either side, but a chance to work in a fascinating environment; the lab knew they were going to have to buy a new mainframe computer; they had built Titan themselves, a prototype Atlas 2 machine, but technology was moving so fast that there was no way they could keep up using home-built machines; naturally at that time they would be looking at IBM kit, so in some sense I was the purchaser of the future; I lived in King's, in free accommodation, in the Keynes Building [only just completed]; my friend Alan Robiette was then a teaching Fellow at Corpus Christi, as was another old friend from King's, Haroon Ahmed; they suggested that as Corpus was short of mathematicians I should take a Fellowship there; King's heard the rumour and made me a Fellow in 1969; by then Edmund Leach was Provost; I soon found myself on College committees, including the Council, and made very good friends there.

4:50:24 I got to know Oliver Zangwill quite well as he lived on the same staircase; I had hitch-hiked at least 100,000 miles by the time I first had a car, just before I went to IBM, and I would therefore pick up hitch-hikers; after a squash match, coming back from Hertfordshire, I saw a slightly dishevelled figure in the headlights; I picked up young Barry who, it became clear, was on the run from an approved school, Kneesworth House; I took him to Cambridge and asked where he was going to stay; he did not want to go to the police station, which fuelled my suspicions; I let him sleep in my sitting room; next morning I contacted Gill Macfarlane who told me to take him back to Kneesworth House, which I did; about a year later Zangwill knocked on my door as Barry was again on the run and in his rooms; we decided to route him back through friends he had made while on the run; Zangwill had found him waiting outside my rooms and had entertained him until I returned [Oliver was Professor of Experimental Psychology, so an expert witness to Barry's state of mind - "rather disturbed"].  Experience of Edmund Leach on the Council; his antagonism towards Bob Young; Adrian Wood's warning of a Leach outburst.

14:15:21 When I joined the Computer Lab in 1968 I was working with one particular computing language called Snobol4, which was based on a different model of computation from the classic lambda calculus model; for text processing applications this gave it completely different capabilities from a language like Fortran, which was good at doing sums but no good for character strings and character manipulation; there had been a number of earlier languages - Snobol1 to 3 - based on Markov algorithms; by Snobol4 the language had become a hotchpotch of all known ideas in computing, and so had functional programming embedded in it; it had been developed by a group including Ralph Griswold, who had moved from Bell Labs to the University of Arizona at Tucson; he provided a portable implementation in the form of a macro-processed specification; it was a lot of work to implement it, and this was the first job I did in the Computer Lab; I liked Snobol4; we provided an implementation for Titan and it worked well, and was soon installed on the Atlas 2 at the CAD Centre; as a technical trick the thing that I am proudest of in all my computing career was porting the Snobol implementation from the Atlas 2 back to the Atlas 1 at Chilton, where I had worked when I was with IBM; we ported it in binary; the operating systems of the two machines were quite different, so I took the specification of the Atlas 1 operating system, generated a version symbolically on the Atlas 2, then dropped it into memory with a large patch area where we could fix bugs; there were only three bugs; I worked with Eric Thomas, who had done the Cambridge Diploma before going off to work at the Chilton Atlas, and was very well disposed to Cambridge; we had finished by lunchtime; Snobol gave great service there, because at Cambridge we had plenty of other language support, but Chilton had nothing for text manipulation, so [Snobol4] was very widely used there.
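
[Editorial sketch: a minimal illustration, in Python rather than Snobol4, of the kind of pattern-directed matching and rearrangement of character strings that such languages made easy and Fortran did not; the record text and field names are invented for the example.]

```python
# Pattern-directed text processing in the spirit of Snobol4 (Python sketch).
# The record text and the fields extracted from it are invented examples.
import re

record = "Baptised 12 March 1678, John son of William Smith, husbandman"

pattern = re.compile(
    r"Baptised (?P<day>\d+) (?P<month>\w+) (?P<year>\d+), (?P<child>[\w ]+?) "
    r"son of (?P<father>[\w ]+?), (?P<occupation>\w+)"
)

m = pattern.search(record)
if m:
    # Rearrange the matched fields into a new string, as a Snobol replacement would.
    print(f"{m.group('father')} ({m.group('occupation')}): "
          f"child {m.group('child')}, baptised {m.group('year')}")
```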

19:37:01 Snobol had no relationship to BCPL, which was a minimalist language in the lambda calculus tradition; I shared an office with Martin Richards who developed it; BCPL was the language that led to B at Bell Labs, which was followed rapidly by C[; when the Lab purchased an IBM mainframe in the early 70s I was the natural person to help Martin port BCPL to the IBM 370 architecture].  I first thought that what [Alan Macfarlane was] doing with historical data was inherently interesting when I saw your card index in King's in 1973; you showed a set of cards relating to a particular village family; I was surprised that from early demographic records one could get such a dense coverage of family relationships and lives; I was interested as I like practical problems; I thought it would be a little computer project, and also I had begun to get interested in database management systems; the language side was interesting because the languages we were working with at that time did not have good support for text processing, or, like Snobol, were a bit of a mess; there were two things to be addressed - managing this very large set of data and finding languages that would support social scientists; I really enjoyed the problem, and in terms of career development I was very lucky to have access to an excellent undergraduate, Tim King, who just as we were beginning to formulate the project was finishing his degree; we had interest from IBM, who had a prototype relational DBMS called PRTV [Peterlee Relational Test Vehicle], which was under development at the IBM Scientific Centre at Peterlee; we set up a joint project; no one had experience of using relational databases for such a complex set of character string data - about 13,000,000 English words - an awful lot of data for the time; also it was from a wide variety of parish sources, and the problem of telling the story and extracting information without damaging the nature of the data was something that really intrigued me; I remember insisting that whatever was done we should retain access to the original text of the records; it would have been possible to encapsulate snippets onto punch cards, but that would not have led to the greatest benefit; right from the start we aimed to put the records in and abstract the natural language meaning by marking the records to say what was meant while retaining the word structure; with the benefit of hindsight that was obviously right, but at the time it looked slightly ambitious given the size of the data.
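
[Editorial sketch: a hypothetical Python illustration of the principle described above - keep the original wording of each record intact and layer the extracted meaning on top as annotations over spans of the text, rather than reducing the record to pre-digested snippets; the record, labels and field names are invented.]

```python
# Keep the untouched original text, and attach meaning as span annotations.
# All field names, labels and the record itself are invented for illustration.
original = "12 March 1678 baptised John son of William Smith of Earls Colne"

record = {
    "source": "parish register, baptisms",   # provenance of the record
    "text": original,                         # the original wording, retained
    "annotations": [                          # (start, end, label) over spans
        (0, 13,  "date"),
        (23, 27, "person:child"),
        (35, 48, "person:father"),
        (52, 63, "place"),
    ],
}

# The original text can always be recovered, and queries work on the labels.
for start, end, label in record["annotations"]:
    print(f"{label:15s} -> {original[start:end]!r}")
```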

28:05:19 Charles Jardine [a former King's Maths student], who worked on the mark-up and query languages, was clever, zany, highly innovative; as far as I can tell totally unambitious, but intrigued by problem solving; I always found him a delight to work with because he repeatedly made you laugh; Tim was not laid back but brought enormous abilities; he had sufficient competence and confidence as a programmer to say that PRTV was not working as it ideally might; we had an architectural model derived from PRTV that we really liked, namely the use of relational algebra as the basic language representation; PRTV was one of a number of prototype relational database management systems in the mid-1970's.  The relational model had been spun off from a single paper, an intellectual idea published by Ted Codd in 1970; the amount of software required to drive the model is vast; by the mid-1970's there was a system called Ingres at Berkeley, developed by Michael Stonebraker, also an IBM project, System R, being developed at the San Jose Lab, as well as PRTV; Ingres and System R were each developing around query languages [QUEL and SQL respectively] related to the relational calculus, essentially a record-by-record model for analysing relational data; PRTV had gone instead to a relational algebra model that allows you to write queries in terms of whole relations, that is all of the records satisfying particular constraints or in a particular format; for technical reasons, if you have got a large number of updates and you are going to have to manage concurrency control, a relational calculus approach is preferable; the great beauty of the match between our problem and PRTV was that we were interested in setting up a database that was the model of the records describing Earls Colne, then exploring the database to derive a model of what Earls Colne might have been like; the further beauty of this is that there is no update - it is fundamentally a read-only database; for our purposes a relational algebra model was far more suitable.
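
[Editorial sketch: a toy Python illustration of the whole-relation style of the relational algebra - each operator consumes and produces complete relations (here, sets of tuples) rather than working record by record as a calculus-style query does; the data and column positions are invented.]

```python
# Whole-relation operators: each takes relations and returns a relation.
# The parish-style data and column choices are invented for the example.
baptisms = {("John Smith", 1678), ("Mary Smith", 1681)}
burials  = {("John Smith", 1742)}

def select(relation, predicate):
    """sigma: keep the tuples satisfying the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, *columns):
    """pi: keep only the chosen columns of every tuple."""
    return {tuple(t[c] for c in columns) for t in relation}

def join(r, s, key_r=0, key_s=0):
    """Join on one column of each relation."""
    return {a + b for a in r for b in s if a[key_r] == b[key_s]}

# "Names of people baptised before 1680 for whom we also have a burial record"
result = project(select(join(baptisms, burials),
                        lambda t: t[1] < 1680), 0)
print(result)   # {('John Smith',)}
```
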
The way in which they had set up the implementation of relational algebra was to use recursive calls down a tree representing the parsing of a relational expression; for various reasons this is not nice, because it interferes with the natural concurrency of data flow during evaluation; Tim had the idea that we should retain relational algebra, exploit the parse tree in a rather similar way, but instead of making recursive calls from the top of the tree we should set up an active co-routine structure and allow the data to flow as it naturally wished to; that is what we did; it meant re-engineering the whole of PRTV, which we duly did in BCPL; Martin Richards and I devised the co-routine structure and wrote a paper in 'Software: Practice and Experience'; we have a research student currently working on similar problems and he has developed extensions to Java; out of curiosity I showed him this 1979 paper and the resonance with what he is doing now is surprisingly close; I think the computer science that came out of the project was real and good; quite recently I went to Cardiff for the 25th anniversary of the British National Conference on Databases; I found that there on the list of the issues that would be discussed in the review of twenty-five years of British computing was CODD, the Co-routine Driven Database that Tim developed, and CHIPS, the Cambridge Historical Information Processing System, the rather over-elaborate high-level language that we designed to support queries to this database; so we certainly made a mark on the community; there was a later paper describing the database implementation on top of the co-routine support; it taught me a lot about things like how you get grants; it made me publish something at last; it was excellent experience in terms of introducing database technology into the Lab; the fact that there was national-level profile for the work pushed the Lab towards databases, of which they previously had no experience; once again I go back to the fact that the work is driven by the data - the inherent interest in that made us all excited; Jessica King and Sarah Harrison's work on the data quality and preparation sides.
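
[Editorial sketch: a rough modern analogue in Python, not the original BCPL, of the co-routine idea - each node of the parse tree becomes a generator, and tuples flow lazily through the tree as they are demanded at the root, instead of being produced by recursive calls driven from the top; the data are invented.]

```python
# Co-routine-style query evaluation: each operator node is a generator that
# pulls tuples from its children on demand. Data are invented for illustration.
def scan(rows):
    """Leaf node: yield stored tuples one at a time."""
    for row in rows:
        yield row

def select(child, predicate):
    """Filter node: pass on only the tuples satisfying the predicate."""
    for row in child:
        if predicate(row):
            yield row

def project(child, *columns):
    """Projection node: keep only the chosen columns."""
    for row in child:
        yield tuple(row[c] for c in columns)

# Build the tree once; nothing runs until tuples are demanded at the root.
baptisms = [("John Smith", 1678), ("Mary Smith", 1681), ("Ann Brown", 1679)]
tree = project(select(scan(baptisms), lambda r: r[1] < 1680), 0)

for row in tree:          # pulling from the root drives the whole pipeline
    print(row)            # ('John Smith',) then ('Ann Brown',)
```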

40:32:11 Later on [1998] came Tim Mills, a phenomenal developer of code, who wanted to do a Ph.D. in information retrieval; he wanted to take existing retrieval engines and generalize them, retaining the architecture of the engine, the matching functions, the evaluation, and the use of feedback support, but applying them to collections of data other than text documents, e.g. images and photos, which might contain annotations as well; I was interested in that aspect although I had not kept up with the basic literature on information retrieval; Keith van Rijsbergen had worked with me as a Ph.D. student much earlier, had gone to Glasgow and had become a leader in the mathematical end of information retrieval and the processing that went along with it; I knew what Keith was up to and had enough programming sense to know that Tim Mills and I would get on fine; that was the case, and he submitted two months short of his three years as a Ph.D. student; one of the things he needed was a test data set to study, so we took the original Earls Colne data and provided an IR context for it within a Web site; that was not the original purpose of Tim's Ph.D. but it was proof by example that what he had done worked.  Keith van Rijsbergen did work on probabilistic retrieval when he came back to the Computer Laboratory as a Royal Society Research Fellow; it helped to make his reputation; Martin Porter, who was a former Ph.D. student with Karen Spärck Jones, worked as a Research Assistant on Keith's Royal Society project; I wasn't close enough to their work to comment in detail; I met Martin professionally later, in the mid-1980's; my former girlfriend, Anne Howie, was associated with setting up a Chinese-English palaeontological thesaurus [actually the baby of a colleague of Anne's at Monash University outside Melbourne, Pat Rich]; I got involved because the Cambridge end, which was responsible for dealing with the printing of the English side, was supported by the Literary and Linguistic Computing Centre; for various reasons things were not going well, and Martin Porter came and helped me then; we ended up publishing it; both Charles Jardine and Martin Porter are very clever, but neither is at all ambitious.
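
[Editorial sketch: an illustrative Python fragment only, not Tim Mills's system - a conventional term-weighted matching function applied to annotations attached to non-text items such as photographs, showing how a text-retrieval engine's ranking machinery can carry over to other kinds of collection; the collection and weighting scheme are invented.]

```python
# Rank annotated non-text items (e.g. photographs) against a term query using
# a simple idf-weighted matching function. Collection and weights are invented.
import math
from collections import Counter

collection = {
    "photo_017": ["church", "earls", "colne", "tower"],
    "photo_042": ["manor", "house", "earls", "colne"],
    "photo_101": ["harvest", "field"],
}

def idf(term):
    df = sum(term in terms for terms in collection.values())
    return math.log((len(collection) + 1) / (df + 1))   # a simple idf variant

def score(query, terms):
    tf = Counter(terms)
    return sum(tf[t] * idf(t) for t in query)

query = ["church", "colne"]
for item, terms in sorted(collection.items(),
                          key=lambda kv: score(query, kv[1]), reverse=True):
    print(item, round(score(query, terms), 3))
```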

49:10:00 In the 1970's the Cambridge Ring was just happening; we had a relational database engine that operated by taking a parse tree and deploying the individual nodes to locations that were supported by co-routine activations and communicated with one another; this allowed the natural data flow to control the concurrency, rather than having it imposed by recursive call from the top of the parse tree; Tim had the very smart idea that since we had this architecture, there was no reason why the node representations should even be in the same processor; if you had something like the Cambridge Ring you could do your parse, generate a parse tree and then activate the nodes at different machines in the processor bank, allowing them to communicate through the Ring; in the same way as when I got Snobol to work straight away, we deployed it on the Ring and it evaluated quite a complex relational expression without any trouble; it was a validation both of the Cambridge Ring and of the co-routine structure that we had been using; IBM's original model would not have had a cat in hell's chance of making sense of what was going on because it was too complicated; one of the other things that was happening was that the Cambridge Distributed System was being developed in the 1970's; this was the software exploitation of the Cambridge Ring communication technology and the processor bank computational model that went along with it; lots of people had one, it wasn't just us, but everyone who had one suddenly found that they could do interesting and original things.  The screen editing system goes back to the efforts that we made to mark the natural language structure of the records and the entities in the records that carried meaning at application level; there was a particular group within the Computer Laboratory, led by Charles Lang, which had a lot of experience with the PDP-11 system, and we developed an interactive text editor, first on the PDP-7 then on the PDP-11, for extending the text documents with diacritical marks; I think other people were doing this but arguably we were among the first - some of that work certainly anticipated SGML and therefore XML, so I could say it is all our fault.
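
[Editorial sketch: a toy analogue in Python of deploying the operator nodes of a query tree to separate machines - here each node runs in its own thread and the 'Ring' is simply a pair of queues, so tuples still flow between the nodes rather than being pulled by recursive calls; the data are invented.]

```python
# Operator nodes running concurrently and communicating over queues, as a toy
# stand-in for processor-bank machines linked by the Ring. Data are invented.
import threading, queue

DONE = object()   # end-of-stream marker

def scan(rows, out):
    for row in rows:
        out.put(row)
    out.put(DONE)

def select(inp, out, predicate):
    while (row := inp.get()) is not DONE:
        if predicate(row):
            out.put(row)
    out.put(DONE)

baptisms = [("John Smith", 1678), ("Mary Smith", 1681), ("Ann Brown", 1679)]
q1, q2 = queue.Queue(), queue.Queue()

threading.Thread(target=scan, args=(baptisms, q1)).start()
threading.Thread(target=select, args=(q1, q2, lambda r: r[1] < 1680)).start()

while (row := q2.get()) is not DONE:   # the consumer plays the root of the tree
    print(row)
```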

53:53:00 I took one consulting job only; I have always valued my time and have always had enough money to live on; the exception was when I bought a new house in 1977, a three-hundred-year-old thatched cottage in Knapwell, west of Cambridge, which needed a lot of work done on it; the consulting job was implementing BCPL on a Siemens processor, which was quite like an IBM machine but with a completely different operating system, for the EU in Luxembourg; in three weeks I earned a third of my annual salary and paid for all the building work.  My wife Jean's contribution to my career has been enormous; she came to the Lab in 1985; she had worked very much in these areas of concurrency and distribution; she did a Ph.D. as a member of staff at Hatfield Polytechnic; she knew all about the theory and the practice, and she programmed and made things happen in the world of concurrent and distributed computing that took off in the 1970's; by the 1980's she had made her reputation in that area; embarrassingly enough I was on the Lab interviewing panel that appointed her, though I did not know her before that; early on we started working together as we have similar areas of interest; I discovered to my shock that she would almost always rather write a paper than write code; I had regarded the paper as something one wrote when one had run out of ideas; the result of meeting Jean was that we started writing papers together; we had research students who had begun to form a group; Jean encouraged them to write and their reputation spread; we were quite early into distributed event-based systems, also federated application management and the access control structures that are needed to support it; these ideas have become more and more important; all of these things are driven by technology revolution to some extent; in the 1980's we were developing distributed database management systems, which were absolutely miserable to work with because moving data from one place to another was a total labour; all of this was made redundant by the comms revolution; suddenly there was no trouble in getting data moving from A to B; there were still concurrency and consistency issues, but we were no longer bothered by the slowness of the technology; we were exploiting the new model early and it has stood us in good stead, and we have run a happy group since 1988; Jean had never liked my thatched cottage, and in 2003 we moved to Blythburgh as our second home; the option of autonomy was no longer there, so we got married.
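
[Editorial sketch: a minimal Python illustration, not the group's actual middleware, of the publish/subscribe style behind distributed event-based systems - components register interest in kinds of event and are notified of matching publications, rather than polling shared state; the topics and handlers are invented.]

```python
# A toy topic-based publish/subscribe broker illustrating the event-based style.
# Topics and subscribers are invented; real systems add distribution and access control.
from collections import defaultdict

class EventBroker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in self.subscribers[topic]:
            callback(event)

broker = EventBroker()
broker.subscribe("sensor/temperature", lambda e: print("logger saw", e))
broker.subscribe("sensor/temperature", lambda e: print("alarm saw", e))
broker.publish("sensor/temperature", {"value": 21.5})
```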