The Thinkers behind Intelligent Machines

Author:  Dennis Shasha and Cathy Lazere   Publisher: Copernicus, an imprint of Springer-Verlag New York,  Inc.     1st edition 1995. 2nd printing 1998 ISBN: 978-0387982694 (Paperback)

Author: Dennis Shasha and Cathy Lazere

Publisher: Copernicus, an imprint of Springer-Verlag New York, Inc.

1st edition 1995. 2nd printing 1998

ISBN: 978-0387982694 (Paperback)

Building intelligent machines that think methodically and solve difficult problems has been a monumental effort spanning several generations of ambitious thinkers and tinkerers. To understand, articulate, and overcome the manifold challenges faced in this endeavour, it has taken some extraordinary geniuses to delve deep into the way thoughts are formed and information is processed by humans as well as machines. It is no wonder then if we feel that these geniuses must have been out of their minds!

Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists written by Dennis Shasha, a computer science professor at NYU, and Cathy Lazere, a finance and technology journalist, is a time-machine that lets its readers experience the early-life influences and breakthroughs of 15 remarkable computer scientists through their recollections. Published in 1995, the book provides a glimpse of the then future of computing through their vision. All 15 computer scientists covered in the book were living legends when the book was published; six have departed this world since then, and later this year in December 2024 we have the birth centenary of the oldest among them. The authors wrote it at a time when it was possible to pick the brains of these “living Isaac Newtons of computer science”. However, there cannot be a better time than now to revisit this book to understand how these 15 computer scientists sowed the seeds of subsequent breakthroughs, and how their foresight has shaped our present. Intriguingly, the last chapter in the book titled Postscript explores the question: “Twenty five years from now, which discoveries will a book like this discuss?”

Out of Their Minds (OTM) is largely based on in-person interviews and relies on quotes instead of the authors’ interpretations to bring out the personalities and their thought processes. In addition, the authors provide sufficient historical background and technical points to serve a large spectrum of readers—from the dilettanti to the cognoscenti. OTM has an exquisite References section; each of the 15 computer scientists covered in this book was asked to list a small set of his own favourite publications (papers or books). However, this book underlines the importance of documenting oral history via in-person interviews alongside the technical treatise. As we look up to these giants of their field and observe their footprints on the sands of time, it is important to look beyond their shoe sizes and try to understand what was going on in their minds and what places they were off to.1

The order of the day

Before we review the contents of OTM, it will not be out of order to make a few remarks about its scope and the organization of its contents. The authors have chosen to limit its scope only to those advances in computer science that came after computers became an engineering reality. As a result, the book covers the work of many Turing Awardees2 but much of the past theoretical work by the likes of Alan Turing is out of its orbit. The book is divided into four parts: (1) Linguists: How to Talk to Machines, (2) Algorithmists: How to Solve Problems Fast, (3) Architects: How to Build Better Machines, and (4) Sculptors of Machine Intelligence: How to Make Machines Smart. Looking back at the history of computing, this order seems quite natural, and it approximately matches the chronological order of birth years of the corresponding innovators covered in the four parts. To make intelligent machines a reality, we first need a language to communicate with the machines. Then we need algorithms to help the machines do difficult or important tasks as efficiently as possible. Once we have a machine, a language and algorithms, we can build better architectures suited to their specific requirements and squeeze out even the last bit of efficiency. Finally, we try our best to push the boundaries of what our intelligent machines are capable of. Later in this article, we will discuss if the above order remains the natural order of innovations even today but let’s not put the cart before the horse.

I must add another disclaimer right away. I am afraid my education and research background makes me biassed towards theoretical aspects of computer science over the engineering nuggets behind building intelligent machines. Hence, I may not be able to do justice to the book but I hope to do justice to the mathematically inclined readership of Bhāvanā.

Linguists

Out of Their Mind quotes Benjamin Whorf, “Language shapes the way we think, and determines what we can think about” (known as the Sapir-Whorf hypothesis), as it sets out to understand what shaped the ideas of those who developed linguistic tools for intelligent machines.

John Backus

John Backus (1924 – 2007)
John Backus (1924 – 2007) Wikimedia Commons
John Backus (1924–2007) is best known as a creator of Formula Translator or Fortran, one of the earliest high-level programming languages created in 1957. He is described as a restless inventor in the book because his work did not stop with Fortran. He invented the Backus-Naur form3 to codify the syntax or the grammatical rules of any high-level programming language. In his provocatively-titled Turing Award lecture “Can Programming be Liberated from the von Neumann Style?” in 1977, he presented a functional language (or to be pedantic, function-level language4) named FP, that paved way for subsequent work on functional languages such as Lisp and eventually a theoretical foundation for functional programming based on lambda calculus from the 1930’s. That such a proposal came from someone involved in the creation of the most popular imperative languages at that time (viz., Fortran and Algol) carried considerable weight. Based on John Backus’ in-person interview and family history, the authors speculate that his creativity was fueled by a distaste for inefficiency or unnecessary complexity, even in his own inventions.

OTM narrates a fascinating story of how John Backus as a final year undergraduate student at Columbia University was undecided about his future plans, when he went to see the public display of the IBM Selective Sequence Electronic Calculator (SSEC) on Madison Ave., and asked the tour guide if they had a job for him. He got introduced to Robert Rex Seeber Jr. and was hired on the spot after solving a puzzle. Robert ran what was called the Department of Pure Science at IBM, and Backus’ first project was on astronomical calculations about the position of the moon.5He developed Speedcoding, the first high-level programming language for the ease of programming with floating point numbers on IBM 701, IBM’s first commercial scientific computer. Speedcoding was a precursor to his proposal for Fortran, a lingua franca in scientific and high-performance computing even today. That there is a piece of Fortran code still running on Voyager 1 and 2 space probes about 12-15 billion miles away from us is a good metaphor for Backus’ far-reaching contributions!

What stands out is Backus’ deep and hands-on understanding of important economic and scientific trade-offs between the efficiency of the human programmer, the program, and the hardware. His introspection on the pursuit of science as an escapist will definitely resonate with most mathematicians—“Most scientists are scientists because they are afraid of life. It’s wonderful to be creative in science because you can do it without clashing with people and suffering the pain of relationships, and making your way in the world…. The pain in solving a problem is small potatoes compared with the pain you encounter in living.”

John McCarthy

As a young assistant professor at Dartmouth College, John McCarthy (1927–2011) co-organized (with Marvin Minsky, Nathaniel Rochester, and Claude E. Shannon) a workshop called Dartmouth Summer Research Project on Artificial Intelligence in 1956. This 2-month, 10-person workshop founded “Artificial Intelligence” as a new field with an ambitious goal to “Proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problem now reserved for humans, and improve themselves.”6

John McCarthy (1927 - 2011)
John McCarthy (1927 – 2011) Wikimedia Commons
At the Dartmouth workshop in 1956, Allen Newell and Herbert A. Simon, presented their latest work on Information Processing Language (IPL-2). It was an integral part of “Logic Theorist”,7 an automated reasoning system developed with their colleague J.C. Shaw at RAND Corporation. Through heuristic pruning of search trees implemented in IPL, an assembly language for manipulating lists, their system could find proofs for several theorems in Chapter 2 of Whitehead and Russell’s Principia Mathematica. In 1958, McCarthy presented a paper “Programs with Common Sense”8 at the Symposium on Mechanization of Thought Processes, in which he proposed a hypothetical program called “advice taker” for common sense reasoning by manipulating sentences in formal language. Interestingly, the paper contains a discussion (review and rebuttal) that will certainly remind the readers of present-day discussions on OpenReview.9 OTM shows how open discussions of “half-baked ideas” fueled a lifetime of McCarthy’s research on making common sense logical. As a means to this end, McCarthy realized that lists can represent any mathematical structure in science as well as sentence structure in languages, and invented a list processing language called Lisp in 1960. Lisp and its dialects remained favourite for building artificial intelligence systems over several decades. McCarthy did not anticipate this longevity but thought that its reason, as stated in OTM, was perhaps: “Whereas Newell, Shaw, and Simon described IPL as a language that grew complex over time, McCarthy described Lisp as a language that became simpler over time.”

OTM lays some backstory of McCarthy’s first attempt at modelling human intelligence on a machine after attending a talk by John von Neumann on self-replicating automata in 1948, and his work with Claude E. Shannon in 1951 to collect papers for a volume on machine intelligence called “Automata Studies” (since Shannon disliked flashy terminology like “Artificial Intelligence”). What the book fails to adequately explain and leaves the reader to connect the dots, is how McCarthy, after writing a PhD thesis on “Projection Operators and Partial Differential Equations” in 1951 at Princeton University, slowly shifted his focus towards machine intelligence. I wish the book had elaborated on how recursion as a natural way to implement differentiation and the universal eval function as a more transparent analog of the universal Turing machine both came from McCarthy donning his mathematician’s hat.10 Since the book focuses on McCarthy’s quest to provide logical foundations for artificial intelligence, it leaves no space for another of his inventions: time-sharing. Lester Earnest, who developed the first spell-checker, has spelled out the obvious, “We keep inventing new names for time-sharing. It came to be called servers. […] Now we call it cloud computing. That is still just time-sharing. John started it.”11

Alan C. Kay

Alan C. Kay
Alan C. Kay Wikimedia Commons
“All understanding begins with our not accepting the world as it appears.” Alan C. Kay started reading fluently by the age of three, and he had read hundreds of books by the time he went to school, only to realize that his teachers at school were teaching him only one perspective of a subject when his reading had already made him familiar with diverse perspectives. OTM does an excellent job of covering Kay’s childhood and early career that made him never distinguish between art and science. He could have even pursued a career in theatre productions and stage music, and his inspirations in computer science came from cell biology!

It is eye-opening to contrast the present-day push for diversity and inclusion in science and technology with some of Kay’s recollections about his early programming work experience in 1961—“They needed programmers. This was back in the days when programming was a low status profession and most of the programmers were women. My boss was a woman. […] I had a friend who was a black guy who did what today we call an operating system. You have to realize that IBM 1401 had 8k [8000 characters] of memory so the operating system had to be less than 1k. He had done this wonderful thing that allowed the machines to do really complex batch processing jobs and it was wonderful. I helped him work and got into the style of programming.” Kay’s exposure to diverse perspectives through books, people, and ideas seems to have played a great role in shaping his thoughts. In his other interviews, he has expressed equal appreciation for abstract mathematical ideas and hands-on user interface design.12 With his broad perspective, he really comes across as the ideal person to have led the design and development of personal computers at Xerox PARC.

Kay’s own inspiration for his object-oriented programming came from cell biology—“The big flash was to see this as biological cells.” The OTM states that this analogy helped Kay to discover three principles: “First, every cell `instance’ conforms to certain basic `master’ behaviours [Inheritance]. Second, cells are autonomous and communicate with one another using chemical messages that leave one protective membrane and enter through another one [Encapsulation]. Third, cells can differentiate—the same cell can, depending on context, become a nose, eye, or toenail cell [Polymorphism].

OTM starts the chapter on Kay with the famous visionary essay “As We May Think” by Vannevar Bush that was an early inspiration for the 14-year old boy Alan Kay. In Kay’s work, the inspiration finds a direct connection to his Dynabook proposal for a portable, personal computer for “children of all ages”, a precursor to present-day tablets. Today anyone who reads “As We May Think” would invariably see its prophecy fulfilled by some of the present-day information and communication technologies, and in particular, the internet. However, its most relevant part in the context of Kay’s work is that these technologies have made it easier for anyone to get cross-disciplinary and diverse views of any subject.

Algorithmists

OTM likens algorithms to recipes. The best ones call for a good taste, clever and optimal use of basic ingredients in correct proportions, culinary skills to put them together, knowing their limitations, and most importantly, when to stop.

Edsger W. Dijkstra

OTM gives us an idea of how Edsger W. Dijkstra (1930–2002) navigated his early career being an outlier, first as a programmer physicist, and then as a discrete mathematician. He had to publish his work on graph algorithms in the journal of numerical mathematics Numerische Mathematik, for the lack of any alternative for discrete mathematics. Dijkstra’s original paper does not contain a proof of correctness stated separately (as the book makes us believe) but it is very clear that he took a great deal of care to perfectly and unambiguously describe his algorithms, so as to make their correctness pretty much self-evident.

Edsger W. Dijkstra (1930 - 2002)
Edsger W. Dijkstra (1930 – 2002) Wikimedia Commons
OTM offers a delectable multi-course menu from Dijkstra’s algorithmic recipes on graphs such as the shortest-path algorithm to the Dining Philosopher’s Problem in synchronization of shared resources. Dijkstra’s remarks on his work and thought process show his constant demand for simplicity and correctness in everything around him, and its cultural ramifications in different social circles, e.g., programmers, mathematicians, and European-vs-American computer scientists. Dijkstra did not mince his words when expressing his clear, and oftentimes acerbic and controversial, opinions. I wish the book had dwelt a little more on Dijkstra’s contributions to structured programming, and focused on his intellectual debates on topics such as program verification13 instead of his diatribe against what he called the American way of looking at mathematics, programming, and artificial intelligence.

Another important detail missing in OTM (as it happened after its publication) is the 2002 ACM PODC Influential Paper Award for Dijkstra’s paper “Self-stabilizing systems in spite of distributed control”, shortly before his death, and which was later renamed after him as the Dijkstra Prize. Leslie Lamport wrote, describing the winning paper’s contributions: “I regard this as Dijkstra’s most brilliant work—at least, his most brilliant published paper. It’s almost completely unknown. I regard it to be a milestone in work on fault tolerance. The terms fault tolerance and reliability never appear in this paper. In fact, one reason why it’s not better known might be Dijkstra’s approach, illustrated by this quote from the paper: The appreciation [of these results] is left as an exercise for the reader.”14

There is no single recipe for a successful scientific career but what has stayed with me from OTM are Dijkstra’s three golden rules for young scientists:

  1. Never compete with colleagues.
  2. Try the most difficult thing you can do.
  3. Choose what is scientifically healthy and relevant. Don’t compromise on scientific integrity.

Michael O. Rabin

Michael O. Rabin
Michael O. Rabin Wikimedia Commons
Michael O. Rabin was born in Breslau, Germany (now Wrocław, Poland), in a religious family with a long line of rabbis. OTM describes Rabin’s early education in religious schools of Germany and Israel, and his gradual inclination through school and university to exact science, mathematics, and eventually computation through Kleene’s book Introduction to Metamathematics.

Taking the analogy of algorithms and recipes further, Rabin explains non-deterministic finite state machines as analogous to menu selection for a multi-course meal at a fancy French restaurant (as stated in OTM): “If a menu selection provides a non-deterministic patron with a good choice for every course, then he will say that the restaurant is good. Otherwise, he will say that the restaurant is bad.”

It is truly remarkable that the fundamental idea of non-determinism in computation came out of a summer job (or internship) at IBM Research by Rabin and his co-author Dana Scott, when they showed that any non-deterministic finite state machine can be simulated by a deterministic one (albeit with many more states), and it prominently appeared in their Turing Award citation for its long-lasting impact.

Rabin returned to IBM Research again the next summer, and solved a puzzle posed to him by John McCarthy using what is now known as one-way function in cryptography, i.e., functions that are easy to compute but difficult to invert. His work on the “Degree of Difficulty of Computing a Function and Hierarchy of Recursive Sets” paved the way for computational complexity theory. There is an anecdote in OTM about Rabin’s talk titled “Theoretical Impediments to Artificial Intelligence” on his work with Fischer that showed a doubly-exponential lower bound for formal proofs of theorems in Presburger arithmetic (first-order theory of natural numbers with only addition but no multiplication), and how it felt like an existential threat to the artificial intelligence community that was trying to build theorem provers for Presburger arithmetic. For mathematicians, it is reminiscent of what Gödel’s incompleteness theorem did to Hilbert’s program.

Just like non-determinism, randomization is another essential ingredient introduced by Rabin that changed the world of algorithmic recipes forever, and has found numerous applications in cryptography, communication protocols, distributed computing, and parallel computing. In 1976, Gary Miller discovered a deterministic algorithm for primality testing that runs in polynomial time, assuming the extended Riemann hypothesis. In 1980, Rabin modified this to a probabilistic algorithm without any assumption. It took another 22 years until Maninder Agarwal, Neeraj Kayal, and Nitin Saxena showed that “PRIMES is in P”.15 Essential ingredients really make or break a recipe that can be very difficult to fix without them!

Rabin was interviewed again in 2010 by Dennis Shasha, where he highlighted the symbiotic relationship between research and teaching, stating that, “Great teaching and great science really flow together and are not mutually contradictory or exclusive of each other”.16 In the same interview article, Shasha mentions Rabin as one of his three most influential teachers, and Shasha’s other book on algorithmic puzzles The Puzzling Adventures of Dr. Ecco is dedicated to Rabin.

OTM has the following remark by Rabin about whether he sees any limits of what computers can solve: “When we talk about complex tasks, I think we are completely lacking in understanding as of now. For example, we lack understanding as to how human memory works. […] I don’t think this has to do with a difference between the power of the mind and the power of the computer. It is simply that we don’t know how to write a computer program to do it.”

Donald E. Knuth17

Donald E. Knuth‘s contributions are vast and varied, yet each one of them is a monumental epitome of perfection standing the test of time. OTM speaks our mind when it starts with “Can Donald Knuth really be just one person?” Knuth’s popular book series The Art of Computer Programming (TAOCP) established design-and-analysis of algorithms as an important area in itself. His work on mathematical typography for typesetting and font design has empowered everyone to add long-lasting archival value to their work, without worrying about technological changes in printing or publishing.18

Donald E. Knuth
Donald E. Knuth Wikimedia Commons
The section “From Alfred E. Newman to von Neumann” in OTM covers Knuth’s early interest in patterns through words and music, and his unusual proposal in the Westinghouse Science Talent Search: “The potrzebie system of weights and measures”, where one potrzebie equals the thickness of the MAD magazine &hash;26, exemplifies his unique sense of humour.

I was really surprised to learn that Knuth started writing his magnum opus TAOCP initially as a book on how to write a compiler, approached by Addison-Wesley while he was still a graduate student at Caltech! A lot of credit behind its success goes to Knuth’s motivation to meticulously scour through original manuals and manuscripts, follow the documented history of any topic, and build upon it to preserve it for posterity. The idea of LR(k) bottom-up parser with lookahead came to him when he had just finished the first draft of a relevant chapter in TAOCP. “The fundamental idea came naturally, because I had just surveyed what was known before.” In another interview, Knuth has also pointed out how certain policies shape the future: “Case (Case Institute, now known as Case Western Reserve) was one of the very few institutions in the country with a really enlightened attitude that undergraduate students were allowed to touch the computers by themselves, and also write software for the whole campus. […] instead of going the way most places go, which would hire professionals to run their computer centre, Case hired its own students to play with the machines and to do the stuff everybody was doing. There were about a dozen of us there, and we turned out to be fairly good contributors to the computing industry in the long range of things.”19

OTM gives a great exposition of Knuth’s long body of work including attribute grammars, Knuth-Bendix algorithm, Knuth-Morris-Pratt algorithm, and finally how he put everything else on hold for nine years to design and implement two computer languages for digital typography, TeX and METAFONT. As Knuth said, “I do one thing at a time. This is what computer scientists call batch processing—the alternative is swapping in and out. I don’t swap in and out.”

Knuth is truly a keeper of the best traditional recipes and handing them down to the future generations. To quote from the epigraph that TAOCP, Volume 1 opens with:

It has taken us years to do, checking and rechecking countless recipes to bring you only the best, only the interesting, only the perfect. Now we can say, without a shadow of a doubt, that every single one of them, if you follow the directions to the letter, will work for you exactly as well as it did for us, even if you have never cooked before.—McCall’s Cookbook (1963).

Robert E. Tarjan

OTM describes Robert E. Tarjan‘s work as “characterized by minimalism, elegance, and an implicit reach towards generality”. For those familiar with the term “Proofs from THE BOOK” coined by Paul Erdõs, if we were to similarly define “Algorithms from THE BOOK”, Tarjan’s work will unanimously qualify for it. Tarjan credits the problems as much as their solutions, “Half the battle is in choosing the problems—not in coming up with the solutions. If you get the right problems, if you ask the right questions, you’re a long way to the solution.” In John McCarthy’s symbolic processing class for graduate students at Stanford, Tarjan and others were asked to write programs in LISP for planarity testing of graphs. A brute-force implementation using a well-known characterization of planar graphs in terms of forbidden subgraphs (known as Kuratowski’s theorem) was quite impractical. In another interview, Tarjan has mentioned how over the next few years he went from a quadratic-time algorithm of Lempel, Even, and Cederbaum that he implemented in his class to a linear-time planarity testing algorithm using depth-first search with John Hopcroft that formed his PhD thesis.20

Robert E. Tarjan
Robert E. Tarjan Wikimedia Commons
Tarjan has given us so many elegant “Algorithms from THE BOOK’’ that whatever representative examples we pick, it feels like an injustice to the ones left out. One such example missing from OTM is the median-of-medians or BFPRT algorithm, named after Manuel Blum, Robert Floyd, Vaughan Pratt, Ron Rivest, and Robert Tarjan, that has a record number of Turing awardees. Another small personal suggestion for a revised future edition of OTM would be to mention that the surprising appearance of the inverse Ackermann function in the union-find problem was a joint work by Tarjan and van Leeuwen, while analysing an algorithm proposed by van Leeuwen and van der Wiede.21 These would help underline two things. First, Tarjan’s own comment on his success that “Interaction is very important for having a cooperative spirit.” Second, Tarjan’s lasting impact in the elegant analysis—asymptotic, amortized, competitive—apart from the design of new algorithms.

In high school, Tarjan got interested in mathematics from reading “Martin Gardner’s columns in Scientific American—playing games and solving puzzles.” Around the same time, he started working on a “precomputer” at his father’s hospital doing “statistical studies of patient population” during summer breaks. He seemed to have balanced mathematical beauty and practical importance in his choice of problems in later life. The example presented in OTM of Tarjan’s work with Sarnak, Sleator, and Driscoll on persistent data structures is a good one. In a persistent data structure, one can efficiently track revisions without copying the entire data structure. “Its main application has been in temporal databases… designed to get snapshots of the past quickly and efficiently.”

Tarjan was also the very first winner of the newly inaugurated (and formerly known as) Nevanlinna Prize at the ICM 1982.

Leslie Lamport

OTM draws a parallel between Einstein’s work on special relativity and Leslie Lamport‘s work on distributed computing, since both made their respective peers rethink the commonly accepted assumptions about time and space. As the authors revisit Lamport’s childhood, we see instances of his curiosity and his not accepting any proof blindly without questioning. Lamport asked his high school teacher, “How do you know there’s only one way to factor a polynomial?” All his teacher could say was, “It just is” because he couldn’t provide an explanation.

Leslie Lamport
Leslie Lamport Wikimedia Commons

As Lamport says, “I got interested in non-Euclidean geometry and spent a lot of time at the New York Public Library. […] At some point, I read a book with deliberately fallacious proofs of theorems like `all triangles are isosceles’. That must have planted a seed in my mind that caused me to be suspicious of claimed proofs.” Just before entering college, through a summer program sponsored by the City of New York, Lamport ended up working in the computer department of the energy company Consolidated Edison, where he discovered the fun of programming on the computers that were left idle during lunch break.

Lamport had an unusual start to his career; he left mathematics after undergraduate and two years of graduate studies, as he felt “math was pointless because of its distance from reality.” He thought of rejoining graduate studies in music, taught math at a liberal arts college, then pursued research ideas in mathematical physics. Lamport eventually completed his PhD in mathematics but worked part-time on ILIAC IV, a massively parallel computer, for the Massachusetts Computer Associates (known as COMPASS, and later acquired by Applied Data Research). Around this time, Lamport switched his interests from mathematics to computer science, and his work was almost in isolation from the academic research community. It was perhaps beneficial as Lamport says, “I was outside academia so I wasn’t influenced by the current fashion of what one was supposed to do.” Working in isolation, Lamport “discovered” his famous Bakery Algorithm, named after bakery customers picking numbered tokens or tickets to line up, for the mutual exclusion problem proposed by Dijkstra a decade earlier. A lot of subsequent work by Lamport and others on distributed and fault-tolerant systems came from studying the Bakery Algorithm. Lamport credits his moment of inspiration by saying, “I’ve invented many algorithms, but the Bakery Algorithm is the only one I feel I discovered.” OTM chooses to explain Lamport’s paper “Times, Clocks, and the Ordering of Events in Distributed Systems’’ in detail, which goes really well with their Einstein analogy.

After Massachusetts Computer Associates, Lamport moved to Stanford Research Institute (SRI) International, and the book covers his work with Marshall Pease and Robert Shostak at SRI on fault-tolerant distributed systems and the Byzantine Generals problem that won the Dijkstra Prize in 2005.22 Lamport credits the use of digital signatures in their solution to his friendship with cryptographer Whitfield Diffie. OTM also touches upon Lamport’s lifelong obsession with formal verification and his hobby that he is best known for, the formatting system LaTeX that is “the lingua franca of document preparation languages for the sciences”. The book missed out on Lamport’s Paxos protocol23 from 1998 for consensus in distributed systems, and crucial to blockchain and cryptocurrencies of late. Readers might also like Lamport’s more recent interview from 2016, where he explains his lifelong research career in industry instead of academia.24 The secret of his success as shared by Lamport is modest but meaningful, “I happened to be looking at the right problems, at the right time, having the right background.’’

Stephen Cook & Leonid Levin

Stephen Cook
Stephen Cook Stephen Cook
Looking back, it seems incredible how Stephen Cook and Leonid Levin, coming from two disconnected research communities with hardly any communication, following two different lines of ideas, reached the same conclusion about the limitations of efficient computation. OTM explains how Cook’s work had its roots in logic, and building upon the work of Turing, Rabin, and his PhD advisor Hao Wang, Cook was studying the satisfiability problem for propositional logic. On the other hand, Levin’s work had its roots in the study of perebor (`brute force’ in Russian) problems in optimization along with Kolmogorov’s ideas about randomness in sequences of characters, in turn building upon Shannon’s information theory. At first sight, it might seem unfair to give both Cook and Levin only half-a-chapter in the book (well, Tarjan and Hopcroft also received their Turing Award jointly). However, the book justifies it perfectly by giving a detailed history of how two parallel streams from logic and information theory eventually merged. Hardness of computation has made subsequent strides after Cook, Karp and Levin’s work on NP-completeness, especially with the advent of Probabilistically Checkable Proofs or PCPs in 1992, that are not covered in the book. Not so surprisingly (in retrospect), much of this subsequent progress combines both propositional proof systems (proposed by Cook-Reckhow) and error correcting codes (albeit from the viewpoint of Hamming rather than that of Shannon) in a beautiful way.

Leonid Levin
Leonid Levin Wikimedia Commons
It is fascinating to read about Cook’s pastoral childhood on a farm where he milked cows, and also had a childhood inspiration in Wilson Greatbatch, the inventor of the implantable pacemaker, right in his hometown in Clarence, NY, and with whom he even managed to work. A similar childhood inspiration for Levin was Kolmogorov. The book contains a beautiful puzzle posed by Kolmogorov during a school visit, and its solution by the 15-year old Levin. It is also surprising to read their comments on the future of the famous P vs NP problem. Both Cook and Levin sound more optimistic about the possibility of P=NP being just around the corner in 1995, contrary to the general consensus now. The P vs NP problem has been an unscaled peak since then but several attempts to climb it have led to tremendous progress in our understanding of parallelism, randomness, circuit complexity, complexity theory beyond the worst-case analysis, and much more.25 As Cook comments, “It’s hard to say what progress is in a case like this.”

Architects

OTM posits that computer architecture is a lot like building architecture, except for its fast-paced innovation. “In civil architecture, fundamental new ideas arise every few centuries or so, […]. In computer architecture, such ideas arise by the half-decade.’’ The flipside is—“If you are too conservative, the market will sweep you by. If you are too radical, you will risk getting nothing out the door. If it works out, no success is sweeter.”

Frederick P. Brooks, Jr.

Frederick P. Brooks, Jr. (1931 - 2022)
Frederick P. Brooks, Jr. (1931 – 2022) Wikimedia Commons
Frederick P. Brooks, Jr. (1931–2022) led the work on Stretch and IBM System/360 systems, where several of their design decisions have since left a permanent impression—8-bit addressable characters or bytes, instruction pipelining for instruction-level parallelism in time on a single processor, maskable interrupts enabling computers to seamlessly react to external stimuli such as keystrokes, and many others. Brook summarized many of the hard lessons on software engineering in his classic text The Mythical Man-Month. For example, he criticized the practice of adding more people to expedite projects, “The bearing of a child takes nine months, no matter how many women are assigned.” What became famous as Brook’s law states it as “adding manpower to a late software project makes it later.”

In 1964, Brooks found “a real clear calling” to join academia and establish a computer science department at the University of North Carolina at Chapel Hill. Purdue had the only other computer science department at that time, and most other computer scientists worked out of either electrical engineering or mathematics departments. He argued that, “[…] the principal intellectual problems in software engineering are problems of scale […] and the university is the wrong place to do big software projects. So we decided that we would pick computer graphics software and natural language processing’’. Insisting that the computer graphics should not only simulate reality but should do it to amplify intelligence, his later work was a collaboration with biochemists and crystallographers on virtual reality to study molecular structures.

Brook’s research philosophy, in industry as well as academia, as observed in OTM is in complete contrast with Richard Feynman’s What do you care what other people think?—Solve someone else’s problem and make your solution work for them. Brooks says that it keeps you honest when someone can tell you, “What you’re doing is just as pretty as you please, but it isn’t any help to me at all.” Guided by his faith, he also adds an ethical dimension: “Not only should [research] amplify intelligence but it should also solve problems that affect people’s lives.”

Burton J. Smith

Burton J. Smith (1941–2018) is remembered as the primary architect of Denelcor’s Heterogeneous Elements Processor (HEP) and Tera Multi-Threaded Architecture (Tera MTA, later known as Cray MTA), two of the most innovative supercomputers. OTM does a great job of explaining parallelism in space and parallelism in time, and more importantly, dataflow architecture. In data flow, the availability of data determines which operation is executed and when, in contrast with control flow, where it is determined by the ordering of instructions. This approach was first developed by Jack Dennis and others in 1970s and used by Smith and his colleagues at Denelcor. Smith left his tenured faculty position at University of Colorado to join Denelcor because of an insight he had that, “The same machinery that was able to tolerate addition and multiplication latency [the dataflow idea] and asynchrony, could also tolerate memory latency.”

Burton J. Smith (1941 - 2018)
Burton J. Smith (1941 – 2018) Wikimedia Commons
The HEP design at Denelcor did not use cache for each processor. “Instead, processors issue requests to remote memories and do other work while they are waiting. Each processor may work on hundreds of tasks at once. The trick is to design processors that can switch from one task to another very fast.” To design the network and routing of data between processors and memories, Smith cleverly combined hot potato routing (also used for routing packets on the internet) with an Euler tour. Despite its clever design, HEP was a commercial failure as it could not find a good market fit at that time as a general purpose supercomputer. Smith picked up the problem again, and later co-founded Tera Computer Company to design an even more massive supercomputer Tera MTA. Despite the “vagaries of marketplace”, the dream of building general purpose massively-parallel multi-threaded supercomputers survives even today with the world’s first exascale—more than 10^{18} floating points operations per second—supercomputer Frontier (OLCF-5), that can trace its ancestry to Cray and Tera through past acquisitions.

Among Smith’s other contributions are his leadership and contribution in building a topological quantum computer at Microsoft, and his formidable technical breadth and mentorship.26

W. Daniel Hillis

W. Daniel Hillis
W. Daniel Hillis Wikimedia Commons
W. Daniel Hillis has built some iconic, inspiring, and unusual machines. One of them is a Tinkertoy computer he built in his college days—it plays tic-tac-toe and is made out of wooden parts, fishing lines, and sinkers held together by brass escutcheons, currently housed at the Computer History Museum in Mountain View, California.27 Another is the Clock of the Long Now or the 10,000-year clock, a mechanical clock designed to keep time for 10,000 years, and whose prototype is displayed at the Science Museum in London.28 On the more practical side, he co-founded Thinking Machines Corp., where he built Connection Machines, massively parallel supercomputers that grew out of his doctoral research.29 Hillis went against the general perception about the limitations of parallelism suggested by Amdahl’s law, and was instead inspired by massive parallelism in the human brain.

Hillis found the unlikeliest of collaborators in Richard Feynman, the famous theoretical physicist, who worked on implementing John Hopfield’s neural networks on the Connection Machine in his last years.30 Hillis admits that, “I was brought up with an academic bias that somehow science is better than engineering. Feynman convinced me that that wasn’t true, that there is real value in building things and creating things’’. OTM mentions the subsequent work by Hillis on simulating evolution using computer programs as organisms to discover and validate general principles and phenomena, e.g., What is the role of parasites in evolution? When does evolution favour sexual vs. asexual reproduction? However, the book misses out on some of his later work with Disney Imagineering and his award-winning popular science book titled The Pattern on the Stone: The Simple Ideas that Make Computers Work.31 Another interesting trivia about Hillis is that in 2010 he co-invented and patented the pinch-to-zoom feature for touchscreen devices.

Sculptors

OTM portrays sculptors of machine intelligence as over-optimists who dream of making machines smarter than people, even when the very same machines fail at simple tasks that are a cakewalk for even a 5-year old child. It also brings out the apparent contradiction that artificial intelligence has made tremendous progress on tasks that are difficult for humans (e.g., playing chess, medical expertise) while still lagging behind on tasks that are relatively easy for humans (e.g., housecleaning, common sense reasoning).

Edward A. Feigenbaum

Edward A. Feigenbaum, the pioneer of expert systems, had an interesting educational path to computer science that arguably brought him a broader understanding of the practical importance and the potential commercial impact of artificial intelligence. Despite his love for science, he opted for electrical engineering in college due to parental pressure and better career prospects. He came in contact with Herbert Simon by taking a senior-level course on mathematical models in social science. Feigenbaum’s subsequent PhD with Simon was in psychology, and his early career was in a business school. His thesis was on a computer simulated model of a psychological theory of human learning and memory, but he was also closely following cutting-edge developments in machine intelligence such as the Logic Theorist (an automated reasoning system by Allen Newell, Herbert Simon, and J.C. Shaw) and early programming languages such as Fortran etc.32 All of this ties very well with the previous chapters in OTM.

 Edward A. Feigenbaum
Edward A. Feigenbaum Wikimedia Commons
Feigenbaum bided his time for several years looking for the right and impactful applications, and developed the first expert system in chemical discovery called Dendral, and the subsequent medical expert system Mycin. Feigenbaum collaborated with Joshua Lederberg, a Nobel Prize winning geneticist working on exobiology—the search of life on other planets—and Carl Djerassi, a chemist, to develop the first expert system Dendral (Dendritic Algorithm) to identify unknown molecules using mass spectrometry data and the domain knowledge of chemistry. As Feigenbaum explains, “Artificial intelligence methods are the methods of choice in domains of ill-formed problems. Many computational problems of the normal variety—from physics calculations to payroll—are well-formed problems for which algorithms exist. You don’t have to do any search to solve the problem’’. Unlike the previous works that overly rely on reasoning or logic, Feigenbaum instead based his work on “equipping expert systems with sufficient precise knowledge” or what he calls the knowledge principle. Feigenbaum’s prediction for the future is: “Computer programs can claim intellectual niches that evolution did not provide for us marvellous creatures. […] We have to think here’s one intelligent agent, there’s another intelligent agent and they have complementary capabilities. We have to design our systems so that both of these are working together to produce a better result.”

Douglas B. Lenat

Douglas B. Lenat (1950 - 2023)
Douglas B. Lenat (1950 – 2023) Wikimedia Commons
OTM captures Douglas B. Lenat‘s disenchantment with mathematics and physics in college, and how he eventually chose artificial intelligence because “One was that it was positively reinforcing—you would be building something like a mental amplifier that would make you smarter, hence would enable you to do ever more and better things. The second interesting property was that it was clear researchers in the field didn’t know what the hell they were doing.” Lenat’s description of mental amplification is quite compelling. “Physics circa the 1800s and engineering even today have provided amplification to our physical selves. We can travel further and faster than we can walk. We can telecommunicate faster than we can shout. The AI goal is a kind of mental amplification so we can be smarter, be more creative, solve harder problems faster, forget less, and be reminded of more.”

Lenat’s first AI experiment was an automated mathematician, simply called AM, which was a Lisp program for automated discovery of mathematical concepts starting from 115 set theoretical concepts and 243 heuristic rules. Apart from discovering Goldbach’s conjecture, AM did many interesting explorations such as finding the smallest number with many divisors. George Polya’s reaction to this was, “This looks very much like the student of a friend of mine once did’’. Turned out the friend was Hardy and the student was Ramanujan, and Polya was referring to Ramanujan’s work on highly composite numbers!

Lenat is well-known for his work on a decades-long artificial intelligence project called Cyc about representation of a worldwide knowledge base of millions of concepts along with tens of millions of rules or assertions regarding them, enabled with an inference engine that has complex, human-like reasoning ability. Irrespective of the eventual success or failure of Cyc and the arguments on either side, it is definitely one of the boldest bets in the history of artificial intelligence. As Lenat says, “How many people have in their lives a 2 to 10 percent chance of dramatically affecting the way the world works? When one of those chances comes along, you should take it’’.

Back to the future

Looking at the recent advances in artificial intelligence, one wonders if the natural order of linguists, algorithmists, architects, and sculptors is now flipped on its head. Universal large language models have done away with the need for a specially designed language to communicate with machines. Today, first the sculptors like OpenAI have been taking the bold bets on large language models showing newer capabilities that match or surpass humans. The architects propose parameter-efficient models and build distributed platforms for their training and inference. The algorithmists design algorithms to train, fine-tune, and align models to specific requirements of usefulness and safety. Finally, the linguists try to understand the input-output of these models to explain their true capabilities and limitations with respect to the ultimate goal of Artificial General Intelligence (AGI). The language of instruction being a natural language makes things both easier and harder than before. Easier because we do not need a special language to communicate with machines, and harder because the underlying black-box model is difficult to explain using natural language.

As an interesting thought experiment, if one were to rewrite OTM today, what changes or additions would make it better? What people or areas should be included? A good starting point is an interview of the author Dennis Shasha himself.33 Shasha admits, “I avoided the database field because I felt that if I had interviewed one person, maybe other people would have been upset. Probably that was a mistake, so perhaps including the database field would have been good too.” Another important area of research that has great stories of extraordinary minds but it is not covered by OTM is cryptography (except for a passing reference to one-way functions). This is perhaps because the motivations of cryptography research cannot be easily tied to the theme of machine intelligence.

Finally, I think that the book would have been of even greater value had it also included the works of stellar women researchers in the field of computing. Grace Murray Hopper and Margaret Hamilton, and the three women Turing awardees so far, namely, Frances Allen, Barbara Liskov, Shafrira Goldwasser. Perhaps a book on similar lines, now more than twenty five years after the present book under review, as mentioned at the beginning could draw upon these sentiments and more. Nevertheless, Out of their minds was a very timely book of the time, published during the somewhat nascent period of the evolution of computer engineering, for documenting the lives and works of these early thinkers behind the intelligent machines. This is one of the few books that document oral histories of the giants in the fields of computer science and artificial intelligence. Computer History Museum’s oral history program34 and Science Lives project of the Simons Foundation35 have recorded similar oral histories of computer scientists and mathematicians more recently, whose videos and transcripts are publicly available. Academic publications only tell us a (partial) story of the path from a problem statement to its solution, whereas oral histories play a crucial role in our understanding of how the problem statements were formulated in the first place, thought processes behind their solution, and most importantly, the failed attempts as well. Out of their minds definitely underlines the importance of oral histories in this context.\blacksquare

Footnotes

  1. With due credit to the poems A Psalm of Life by Henry Wadsworth Longfellow and Writing A Résumé (English translation) by Wisława Szymborska.
  2. 11 of these 15 scientists have received the Turing award to date.
  3. See Ingerman’s suggestion about naming it Pāṇini-Backus form to give due credit, Communications of ACM https://dl.acm.org/doi/10.1145/363162.363165.
  4. https://dl.acm.org/doi/10.1145/72551.72554.
  5. https://www.computerhistory.org/collections/catalog/102657970.
  6. https://raysolomonoff.com/dartmouth/boxa/dart564props.pdf.
  7. http://shelf1.library.cmu.edu/IMLS/MindModels/logictheorymachine.pdf.
  8. http://www-formal.stanford.edu/jmc/mcc59.pdf.
  9. https://openreview.net/.
  10. https://archive.computerhistory.org/resources/access/text/2012/10/102658149-05-01-acc.pdf.
  11. https://www.latimes.com/local/obituaries/la-me-john-mccarthy-20111027-story.html.
  12. https://archive.computerhistory.org/resources/access/text/2016/08/102658340-05-01-acc.pdf.
  13. https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/EWD638.html.
  14. https://www.podc.org/influential/2002-influential-paper/.
  15. https://annals.math.princeton.edu/2004/160-2/p12.
  16. https://cacm.acm.org/news/an-interview-with-michael-rabin/.
  17. Editor’s note: See January 2023 issue of Bhāvanā for an interview with Donald Knuth.
  18. https://projecteuclid.org/journals/bulletin-of-the-american-mathematical-society-new-series/volume-1/issue-2/Mathematical-typography/bams/1183544082.full.
  19. https://www.computerhistory.org/collections/catalog/102658053.
  20. https://amturing.acm.org/pdf/TarjanTuringTranscript.pdf.
  21. https://dl.acm.org/doi/10.1145/62.2160.
  22. https://www.podc.org/dijkstra/2005-dijkstra-prize.
  23. https://dl.acm.org/doi/10.1145/279227.279229.
  24. https://archive.computerhistory.org/resources/access/text/2017/07/102717182-05-01-acc.pdf.
  25. https://www.math.ias.edu/~avi/PUBLICATIONS/MYPAPERS/W06/w06.pdf.
  26. https://www.sigarch.org/remembering-burton-smith.
  27. https://www.computerhistory.org/collections/catalog/X39.81.
  28. https://www.wired.com/1995/12/the-millennium-clock.
  29. https://dspace.mit.edu/handle/1721.1/14719.
  30. https://longnow.org/ideas/richard-feynman-and-the-connection-machine/.
  31. https://www.hachettebookgroup.com/titles/w-daniel-hillis/the-pattern-on-the-stone/9780465066933.
  32. https://archive.computerhistory.org/resources/access/text/2012/04/102658162-05-01-acc.pdf.
  33. https://sigmod.org/publications/interviews/pdf/05.profiles.shasha.pdf.
  34. https://computerhistory.org/oral-histories/
  35. https://www.simonsfoundation.org/series/science-lives/
Amit Deshpande works at Microsoft Research in Bengaluru. He is also a corresponding editor of Bhāvanā.