Alan Kay’s advice to computer science students

I’m once again going to quote a Quora answer verbatim, because I think there’s a lot of value in it. Alan Kay answered What book(s) would you recommend to a computer science student?

My basic answer is: read a lot outside of the computer field.

It is worth trying to understand what “science” means in “Computer Science” and what “engineering” means in “Software Engineering”.

“Science” in its modern sense means trying to reconcile phenomena into models that are as explanatory and predictive as possible. There can be “Sciences of the Artificial” (see the important book by Herb Simon). One way to think of this is that if people (especially engineers) build bridges, then these present phenomena for scientists to understand by making models. The fun of this is that the science will almost always indicate new and better ways to make bridges, so friendly collegial relationships between scientists and engineers can really make progress.

An example in computing is John McCarthy thinking about computers in the late 50s, the really large range of things they can do (maybe AI?), and creating a model of computing as a language that could serve as its own metalanguage (LISP). My favorite book on this is “The Lisp 1.5 Manual” from MIT Press (written by McCarthy et al.). The first part of this book is still a classic on how to think in general, and about computing in particular.

(A later book inspired by all this is “Smalltalk: the language and its implementation” (by Adele Goldberg and Dave Robson — the “Blue Book”). Also contains a complete implementation in Smalltalk written in itself, etc.)

A still later book that I like a lot that is “real computer science” is “The Art of the Metaobject Protocol” by Kiszales, Bobrow, Rivera,). The early part of this book especially is quite illuminating.

An early thesis (1970) that is real computer science is “A Control Definition Language” by Dave Fisher (CMU).

Perhaps my favorite book about computing might seem far afield, but it is wonderful and the writing is wonderful: “Computation: Finite and Infinite Machines” by Marvin Minsky (ca 1967). Just a beautiful book.

To help with “science”, I usually recommend a variety of books: Newton’s “Principia” (the ultimate science book and founding document), “The Molecular Biology of the Cell” by Bruce Alberts, et al. There’s a book of Maxwell’s papers, etc.

You need to wind up realizing that “Computer Science” is still an aspiration, not an accomplished field.

“Engineering” means “designing and building things in principled expert ways”. The level of this is very high for the engineering fields of Civil, Mechanical, Electrical, Biological, etc. Engineering. These should be studied carefully to get the larger sense of what it means to do “engineering”.

To help with “engineering” try reading about the making of the Empire State Building, Boulder Dam, the Golden Gate Bridge, etc. I like “Now It Can Be Told” by Maj Gen Leslie Groves (the honcho on the Manhattan Project). He’s an engineer, and this history is very much not from the Los Alamos POV (which he also was in charge of) but about Oak Ridge, Hanford, etc and the amazing mobilization of 600,000 plus people and lots of money to do the engineering necessary to create the materials needed.

Then think about where “software engineering” isn’t — again, you need to wind up realizing that “software engineering” in any “engineering” sense is at best still an aspiration not a done deal.

Computing is also a kind of “media” and “intermediary”, so you need to understand what these do for us and to us. Read Marshall McLuhan, Neil Postman, Innis, Havelock, etc. Mark Miller (comment below) just reminded me that I’ve recommended “Technics and Human Development,” Vol. 1 of Lewis Mumford’s “The Myth of the Machine” series, as a great predecessor of both the media environment ideas and of an important facet of anthropology.

I don’t know of a great anthropology book (maybe someone can suggest), but the understanding of human beings is the most important thing to accomplish in your education. In a comment below, Matt Gaboury recommended “Human Universals” (I think he means the book by Donald Brown.) This book certainly should be read and understood — it is not in the same class as books about a field, like “Molecular Biology of the Cell”.

I like Ed Tufte’s books on “Envisioning Information”: read all of them.

Bertrand Russell’s books are still very good just for thinking more deeply about “this and that” (“A History of Western Philosophy” is still terrific).

Multiple points of view are the only way to fight against human desires to believe and create religions, so my favorite current history book to read is: “Destiny Disrupted” by Tamim Ansary. He grew up in Afghanistan, moved to the US at age 16, and is able to write a clear illuminating history of the world from the time of Mohammed from the point of view of this world, and without special pleading.

Note: I checked out “The Art of the Metaobject Protocol,” and it recommended that if you’re unfamiliar with CLOS (the Common Lisp Object System) that you learn that before getting into this book. It recommended, “Object-Oriented Programming in Common LISP: A Programmer’s Guide to CLOS,” by Sonya Keene.

As I’ve been taking this track, reconsidering what I think about computer science, software engineering, and what I can do to advance society, making computing a part of that, I’ve been realizing that one of Kay’s points is very important. If we’re going to make things for people to use, we need to understand people well, specifically how the human system interfaces with computer systems, and how that interaction can help people better their condition in our larger system that we call “society” (this includes its economy, but it should include many other social facets), and more broadly, the systems of our planet. This means spending a significant amount of time on other things besides computer stuff. I can tell you from experience, this is a strange feeling, because I feel like I should be spending more time on technical subjects, since that’s what I’ve done in the past, and that’s been the stock in trade of my profession. I want that to be part of my thinking, but not what I eat and sleep.

Related post:

The necessary ingredients of computer science

The necessary ingredients for computer science

This is the clearest explanation I’ve seen Alan Kay give for what the ARPA community’s conception of computer science was. His answer on Quora was particularly directed at someone who was considering entering a CS major. Rather than summarize what he said, I’m just going to let him speak for himself. I’d rather preserve this if anything happens to Quora down the road. There is a lot of gold here.

As a beginner, what are the best ways to approach Computer Science?

If you are just looking to get a job in computing, don’t bother to read further.

First, there are several things to get clear with regard to any field.

  • What is the best conception of what the field is about?
  • What is the best above threshold knowledge to date?
  • How incomplete is the field; how much will it need to change?

When I’ve personally asked most people for a definition of “Computer Science” I’ve gotten back an engineering definition, not one of a science. Part of what is wrong with “CS” these days is both a confusion about what it is, and that the current notion is a weak idea.

The good news is that there is some above threshold knowledge. The sobering news is that it is hard to find in any undergrad curriculum. So it must be ferreted out these days.

Finally, most of the field is yet to be invented — or even discovered. So the strategies for becoming a Computer Scientist have to include learning how to invent, learning how to change, learning how to criticize, learning how to convince.

Most people in the liberal arts would not confuse learning a language like English and learning to read and write well in it, with the main contents of the liberal arts — which, in a nutshell, are ideas. The liberal arts spans many fields, including mathematics, science, philosophy, history, literature, etc. and we want to be fluent about reading and writing and understanding these ideas.

So programming is a good thing to learn, but it is most certainly not at the center of the field of computer science! When the first ever Turing Award winner says something, we should pay attention, and Al Perlis — who was one of if not the definer of the term said: “Computer Science is the Science of Processes”, and he meant all processes, not just those that run on digital computers. Most of the processes in the world to study are more complex than most of the ones we’ve been able to build on computers, so just looking at computers is looking at the least stuff you can look at.

Another way to figure out what you should be doing, is to realize that CS is also a “blank canvas” to “something” kind of field — it produces artifacts that can be studied scientifically, just as the building of bridges has led to “bridge science”. Gravitational and weather forces keep bridge designers honest, but analogous forces are extremely weak in computing, and this allows people who don’t know much to get away with murder (rather like the fashion field, where dumb designs can even become fads, and failures are not fatal). Getting into a “learned science of designs that happen to be dumb” is not the best path!

We (my research community) found that having an undergraduate degree in something really difficult and developed helped a lot (a) as a bullshit detector for BS in computing (of which there is a lot), (b) as a guide to what a real “Computer Science” field should be and could be like, and (c) to provide a lot of skills on the one hand and heuristic lore on the other for how to make real progress. Having a parallel interest in the arts, especially theater, provides considerable additional perspective on what UI design is really about, and also in the large, what computing should be about.

So I always advise young people -not- to major in computing as an undergraduate (there’s not enough “there there”) but instead to actually learn as much about the world and how humans work as possible. In grad school you are supposed to advance the state of the art (or at least this used to be the case), so you are in a better position with regard to an unformed field.

Meanwhile, since CS is about systems, you need to start learning about systems, and not to restrict yourself just those on computers. Take a look at biology, cities, economics, etc just to get started.

Finally, at some age you need to take responsibility for your own education. This should happen in high school or earlier (but is rare). However, you should not try to get through college via some degree factory’s notion of “certification” without having formed a personal basis for criticizing and choosing. Find out what real education is, and start working your way through it.

When I’ve looked over other material written by Kay, and what has been written about his work, I’ve seen him talk about a “literate” computing science. This ties into his long-held view that computing is a new medium, and that the purpose of a medium is to allow discussion of ideas. I’ve had a bit of a notion of what he’s meant by this for a while, but the above made it very clear in my mind that a powerful purpose of the medium is to discuss systems, and the nature of them, through the fashioning of computing models that incorporate developed notions of one or more specific kinds of systems. The same applies to the fashioning of simulations. In fact, simulations are used for the design of said computer systems, and their operating systems and programming languages, before they’re manufactured. That’s been the case for decades. This regime can include the study of computer systems, but it must not be restricted to them. This is where CS currently falls flat. What this requires, though, is some content about systems, and Kay points in some directions for where to look for it.

Computing is a means for us to produce system models, and to represent them, and thereby to develop notions of what systems are, and how they work, or don’t work. This includes regarding human beings as systems. When considering how people will interact with computers, the same perspective can apply, looking at it as how two systems will interact. I’m sure this sounds dehumanizing, but from what I’ve read of how the researchers at Xerox PARC approached developing Smalltalk and the graphical user interface as a learning environment, this is how they looked at it, though they talked about getting two “participants” (the human and the computer) to interact/communicate well with each other (See discussion of “Figure 1” in “Design Principles Behind Smalltalk”). It involved scientific knowledge of people and of computer system design. It seems this is how they achieved what they did.

— Mark Miller, https://tekkie.wordpress.com

SICP: Chapter 3 and exercises 3.59, 3.60, and 3.62

Prologue

SICP reaches a point, in Chapter 3, where for significant parts of it you’re not doing any coding. It has exercises, but they’re all about thinking about the concepts, not doing anything with a computer. It has you do substitutions to see what expressions result. It has you make diagrams that focus in on particular systemic aspects of processes. It also gets into operational models, talking about simulating logic gates, how concurrent processing can work (expressed in hypothetical Scheme code). It’s all conceptual. Some of it was good, I thought. The practice of doing substitutions manually helps you really get what your Scheme functions are doing, rather than guessing. The rest didn’t feel that engaging. One could be forgiven for thinking that the book is getting dry at this point.

It gets into some coding again in Section 3.3, where it covers building data structures. It gets more interesting with an architecture called “streams” in Section 3.5.

One thing I will note is that the only way I was able to get the code in Section 3.5 to work in Racket was to go into “Lazy Scheme.” I don’t remember what language setting I used for the prior chapters, maybe R5RS, or “Pretty Big.” Lazy Scheme does lazy evaluation on Scheme code. One can be tempted to think that this makes using the supporting structures for streams covered in this section pointless, because the underlying language is doing the delayed evaluation that this section implements in code. Anyway, Lazy Scheme doesn’t interfere with anything in this section. It all works. It just makes the underlying structure for streams redundant. For the sake of familiarity with the code this section discusses (which I think helps in preserving one’s sanity), I think it’s best to play along and use its code.

Another thing I’ll note is this section makes extensive use of knowledge derived from calculus. Some other parts of this book do that, too, but it’s emphasized here. It helps to have that background.

I reached a stopping point in SICP, here, 5 years ago, because of a few things. One was I became inspired to pursue a history project on the research and development that led to the computer technology we use today. Another is I’d had a conversation with Alan Kay that inspired me to look more deeply at the STEPS project at Viewpoints Research, and try to make something of my own out of that. The third was Exercise 3.61 in SICP. It was a problem that really stumped me. So I gave up, and looked up an answer for it on the internet, in a vain attempt to help me understand it. The answer didn’t help me. It worked when I tried the code, but I found it too confusing to understand why it produced correct results. The experience was really disappointing, and disheartening. Looking it up was a mistake. I wished that I could forget I’d seen the answer, so I could go back to working on trying to figure it out, but I couldn’t. I’d seen it. I worked on a few more exercises after that, but then I dropped SICP. I continued working on my history research, and I got into exploring some fundamental computing concepts on processors, language parsing, and the value of understanding a processor as a computing/programming model.

I tried an idea that Kay told me about years earlier, of taking something big, studying it, and trying to improve on it. That took me in some interesting directions, but I hit a wall in my skill, which I’m now trying to get around. I figured I’d continue where I left off in SICP. One way this diversion helped me is I basically forgot the answers for the stuff I did. So, I was able to come back to the problem, using a clean slate, almost fresh. I had some faded memories of it, which didn’t help. I just had to say to myself “forget it,” and focus on the math, and the streams architecture. That’s what finally helped me solve 3.60, which then made 3.61 straightforward. That was amazing.

A note about Exercise 3.59

It was a bit difficult to know at first why the streams (cosine-series and sine-series) were coming out the way they were for this exercise. The cosine-series is what’s called an even series, because its powers are even (0, 2, 4, etc.). The sine-series is what’s called an odd series, because its powers are odd (1, 3, 5, etc.). However, when you’re processing the streams, they just compute a general model of series, with all of the terms, regardless of whether the series you’re processing has values in each of the terms or not. So, cosine-series comes out (starting at position 0) as: [1, 0, -1/2, 0, 1/24, …], since the stream is computing a0x0 + a1x1 + a2x2 …, where ai is each term’s coefficient, and some of the terms are negative, in this case. The coefficients of the terms that don’t apply to the series come out as 0. With sine-series, it comes out (starting at position 0) as: [0, 1, 0, -1/6, 0, 1/120, …].

What’s really interesting is that exercises 3.59, 3.61, and 3.62 are pretty straightforward. Like with some of the prior exercises, all you have to do is translate the math into stream terms (and it’s a pretty direct translation from one to the other), and it works! You’re programming in mathland! I discovered this in the earlier exercises in this section, and it amazed me how expressive this is. I could pretty much write code as if I was writing out the math. I could think in terms of the math, not so much the logistics of implementing computational logic to make the math work. At the same time, this felt disconcerting to me, because when things went wrong, I wanted to know why, computationally, and I found that working with streams, it was difficult to conceptualize what was going on. I felt as though I was just supposed to trust that the math I expressed worked, just as it should. I realize now that’s what I should have done. It was starting to feel like I was playing with magic. I didn’t like that. It was difficult for me to trust that the math was actually working, and if it wasn’t, that it was because I was either not understanding the math, and/or not understanding the streams architecture, not because there was something malfunctioning underneath it all. I really wrestled with that, and I think it was for the good, because now that I can see how elegant it all is, it looks so beautiful!

Exercise 3.60

This exercise gave me a lot of headaches. When I first worked on it 5 years ago, I kind of got correct results with what I wrote, but not quite. I ended up looking up someone else’s solution to it, which worked. It was kind of close to what I had. I finally figured out what I did wrong when I worked on it again recently: I needed to research how to multiply power series, but in a computationally efficient manner. It turns out you need to use a method called the Cauchy product, but there’s a twist, because of the way the streams architecture works. A lot of math sources I looked up use the Cauchy product method, anyway (including a source that covers power series multiplication that I cite below), but the reason it needs to be used is it automatically collects all like terms as each term of the product is produced. The problem you need to work around is that you can’t go backwards through a stream, except by using stream-ref and indexes, and I’ve gotten the sense by going through these exercises that when it comes to doing math problems, you’re generally not supposed to be doing that, though there’s an example later in SICP where they talk about a method Euler devised for accelerating computation of power series where they use stream-ref to go backwards and forwards inside a stream. I think that’s an exception.

Here are a couple sources that I found helpful when trying to work this out:

Formal Power Series, from Wikipedia. Pay particular attention to the section on “Operations on Formal Power Series.” It gives succinct descriptions on multiplying power series, inverting them (though I’d only pay attention to the mathematical expression of this in Exercise 3.61), and dividing them (which you’ll need for Exercise 3.62).

Edit 6/4/2018: I originally had a video here, which I thought gave a good illustration of what the Cauchy product accomplishes. It also provided a nice visual aide that pointed me toward a solution to this problem in SICP. Unfortunately, the video has disappeared from YouTube. So, I’ll substitute a presentation of the Cauchy product that I used as a tool for figuring out how to do this exercise. This will not give you the answer for how to do this exercise, but it’s on the way to the answer I got.

Let A and B be power series that we’re going to multiply. We will use ai to represent the coefficients for A, and bi to represent the coefficients for B:

A = a0x0 + a1x1 + a2x2 + …

B = b0x0 + b1x1 + b2x2 + …

Multiplying these series creates a new series, C, made by creating a sum of products from terms in A and B:

A x B = C

C0 = a0x0b0x0

C1 = a1x1b0x0 + a0x0b1x1

C2 = a2x2b0x0 + a1x1b1x1 + a0x0b2x2

C3 = a3x3b0x0 + a2x2b1x1 + a1x1b2x2 + a0x0b3x3

C4 = …

A x B = C0 + C1 + C2 + C3 + C4 + …

You’ll note that, with the exception of the very first term, as we progress to computing each term, we start with the ith term in series A, and go down to the 0th term, but with series B, we start with the 0th term, and go upward to the ith term.

To reiterate, you can’t use the stream architecture in SICP to compute the product for this exercise using the exact method I’ve outlined above (when I thought about trying to do that, it get extremely convoluted, and it’s not worth pursuing), but there is a way to compute the product series that allows you to go through both streams from left to right.

What helped me find a succinct way to implement mul-series was to just focus on the coefficients:

a0b0
a1b0 + a0b1
a2b0 + a1b1 + a0b2
a3b0 + a2b1 + a1b2 + a0b3

A couple hints I’ll give are:

  • There is a way to do the Cauchy product without using mutable state in variables, nor is it necessary to do anything particularly elaborate. You can do it just using the stream architecture.
  • Think about how addition is commutative, as you look at the above pattern of computation.

The solution is a little counterintuitive, but once you find it, it’s pretty neat.

It is crucial that you get the solution for this exercise right, though, because the next two exercises (3.61 and 3.62) will give you no end of headaches if you don’t. Exercise 3.61 builds on what you create for this one, and 3.62 builds on what you create for 3.61.

Exercise 3.60 says you can try out mul-series using sin(x) and cos(x), squaring both, and then adding the product series together. This should give you a result of 1. How is “1” represented in streams, though? Well, it makes sense that it will be in the constant-term position (the zeroth position) in the stream. That’s where a scalar value would be in a power series.

There is a related concept to working with power series called a unit series. A unit series is just a list of the coefficients in a power series. If you’ve done the previous exercises where SICP says you’re working with power series, this is what you’re really working with (though SICP doesn’t mention this). It’s why in Exercise 3.61 SICP has you writing a function called “invert-unit-series”.

The unit series equivalent for the scalar value 1 is [1, 0, 0, 0, 0, …].

A note about Exercise 3.62

The exercise talks about using your div-series function to compute tan-series. Here was a good source for finding out how to do that:

Trigonometric Functions

The unit series for tan-series should come out as: [0, 1, 0, 1/3, 0, 2/15, 0, 17/315, …]

Goals for software engineering

This is going to strike people as a rant, but it’s not. It’s a recommendation to the fields of software engineering, and by extension, computer science, at large, on how to elevate the discussion in software engineering. This post was inspired by a question I was asked to answer on Quora about whether object-oriented programming or functional programming offers better opportunities for modularizing software design into simpler units.

So, I’ll start off by saying that what software engineering typically calls “OOP” is not what I consider OOP. The discussion that this question was about is likely comparing abstract data types, in concept, with functional programming (FP). What they’re also probably talking about is how well each programming model handles data structures, how the different models implement modular functionality, and which is preferable for dealing with said data structures.

Debating stuff like this is so off from what the discussion could be about. It is a fact that if an OOP model is what is desired, one can create a better one in FP than what these so-called “OOP” proponents are likely talking about. It would take some work, but it could be done. It wouldn’t surprise me if there already are libraries for FP languages that do this.

The point is what’s actually hampering the discussion about what programming model is better is the ideas that are being discussed, and more broadly, the goals. A modern, developed engineering discipline would understand this. Yes, both programming models are capable of decomposing tasks, but the whole discussion about which does it “better” is off the mark. It has a weak goal in mind.

I remember recently answering a question on Quora regarding a dispute between two people over which of two singers, who had passed away in recent years, was “better.” I said that you can’t begin to discuss which one was better on a reasonable basis until you look at what genres they were in. In this case, I said each singer used a different style. They weren’t trying to communicate the same things. They weren’t using the same techniques. They were in different genres, so there’s little point in comparing them, unless you’re talking about what style of music you like better. By analogy, each programming model has strengths and weaknesses relative to what you’re trying to accomplish, but in order to use them to best effect, you have to consider whether each is even a good fit architecturally for the system you’re trying to build. There may be different FP languages that are a better fit than others, or maybe none of them fit well. Likewise, for the procedural languages that these proponents are probably calling “OOP.” Calling one “better” than another is missing the point. It depends on what you’re trying to do.

Comparisons about programming models in software engineering tend to wind down to familiarity, at some point; which languages can function as a platform that a large pool of developers know how to use, because no one wants to be left holding the bag if developers decide to up and leave. I think this argument misses a larger problem: How do you replace the domain knowledge those developers had? The most valuable part of a developer’s knowledge base is their understanding of the intent of the software design; for example, how the target business for the system operates, and how that intersects with technical decisions that were made in the software they worked on. Usually, that knowledge is all in their heads. It’s hardly documented. It doesn’t matter that you can replace those developers with people with the same programming skills, because sure, they can read the code, but they don’t understand why it was designed the way it was, and without that, it’s going to be difficult for them to do much that’s meaningful with it. That knowledge is either with other people who are still working at the same business, and/or the ones who left, and either way, the new people are going to have to be brought up to speed to be productive, which likely means the people who are still there are going to have to take time away from their critical duties to train the new people. Productivity suffers even if the new people are experts in the required programming skills.

The problem with this approach, from a modern engineering perspective, goes back to the saying that if all someone knows how to use is a hammer, every problem looks like a nail to them. The problem is with the discipline itself; that this mentality dominates the thinking in it. And I must say, computer science has not been helping with this, either. Rather than exploring how to make the process of building a better-fit programming model easier, they’ve been teaching a strict set of programming models for the purpose of employing students in industry. I could go on about other nagging problems that are quite evident in the industry that they are ignoring.

I could be oversimplifying this, but my understanding is modern engineering has much more of a form-follows-function orientation, and it focuses on technological properties. It does not take a pre-engineered product as a given. It looks at its parts. It takes all of the requirements of a project into consideration, and then applies analysis technique and this knowledge of technological properties to finding a solution. It focuses a lot of attention on architecture, trying to make efficient use of materials and labor, to control costs. This focus tends to make the scheduling and budgeting for the project more predictable, since they are basing estimates on known constraints. They also operate on a principle that simpler models (but not too simple) fail less often, and are easier to maintain than overcomplicated ones. They use analysis tools that help them model the constraints of the design, again, using technological properties as the basis, taking cognitive load off of the engineers.

Another thing is they don’t think about “what’s easier” for the engineers. They think about “what’s easier” for the people who will be using what’s ultimately created, including engineers who will be maintaining it, at least within cost constraints, and reliability requirements. The engineers tasked with creating the end product are supposed to be doing the hard part of trying to find the architecture and materials that fit the requirements for the people who will be using it, and paying for it.

Clarifying this analogy, what I’m talking about when I say “architecture” is not, “What data structure/container should we use?” It’s more akin to the question, “Should we use OOP or FP,” but it’s more than that. It involves thinking about what relationships between information and semantics best suite the domain for which the system is being developed, so that software engineers can better express the design (hopefully as close to a “spec” as possible), and what computing engine design to use in processing that programming model, so it runs most efficiently. When I talk about “constraints of materials,” I’m analogizing that to hardware, and software runtimes, and what their speed and load capacity is, in terms of things like frequency of requests, and memory. In short, what I’m saying is that some language and VM/runtime design might be necessary for this analogy to hold.

What this could accomplish is ultimately documenting the process that’s needed—still in a formal language—and using that documentation to run the process. So, rather than requiring software engineers to understand the business process, they can instead focus on the description and semantics of the engine that allows the description of the process to be run.

What’s needed is thinking like, “What kind of system model do we need,” and an industry that supports that kind of thinking. It needs to be much more technically competent to do this. I know this is sounding like wishful thinking, since people in the field are always trying to address problems that are right in front of them, and there’s no time to think about this. Secondly, I’m sure it sounds like I’m saying that software engineers should be engaged in something that’s impossibly hard, since I’m sure many are thinking that bugs are bad enough in existing software systems, and now I’m talking about getting software engineers involved in developing the very languages that will be used to describe the processes. That sounds like I’m asking for more trouble.

I’m talking about taking software engineers out of the task of describing customer processes, and putting them to work on the syntactic and semantic engines that enable people familiar with the processes to describe them. Perhaps I’m wrong, but I think this reorientation would make hiring programming staff based on technical skills easier, in principle, since not as much business knowledge would be necessary to make them productive.

Thirdly, “What about development communities?” Useful technology cannot just stand on its own. It needs an industry, a market around it. I agree, but I think, as I already said, it needs a more technically competent industry around it, one that can think in terms of the engineering processes I’ve described.

It seems to me one reason the industry doesn’t focus on this is we’ve gotten so used to the idea that our languages and computing systems need to be complex, that they need to be like Swiss Army knives that can handle every conceivable need, because we seem to need those features in the systems that have already been implemented. They reality is the reason they’re complex is a) they have been built using semantic systems that are not well suited to the problem they’re trying to solve, and b) they’re really designed to be “catch all” systems that anticipate a wide variety of customer needs. So, the problem you’re trying to solve is but a subset of that. We’ve been coping with the “Swiss Army knife” designs of others for decades. What’s actually needed is a different knowledge set that eschews from the features we don’t need for the projects we want to complete, that focuses on just the elements that are needed, with a practice that focuses on design, and its improvement.

Very few software engineers and computer scientists have had the experience of using a language that was tailored to the task they were working on. We’ve come to think that we need feature-rich languages and/or feature-rich libraries to finish projects. I say no. That is a habit, thinking that programming languages are communication protocols not just with computers, but with software engineers. What would be better is a semantic design scheme for semantic engines, having languages on top of them, in which the project can be spec’d out, and executed.

As things stand, what I’m talking about is impractical. It’s likely there are not enough software engineers around with the skills necessary to do what I’m describing to handle the demand for computational services. However, what’s been going on for ages in software engineering has mostly been a record of failure, with relatively few successes (the vast majority of software projects fail). Discussions, like the one described in the question that inspired this post, are not helping the problem. What’s needed is a different kind of discussion, I suggest using the topic scheme I’ve outlined here.

I’m saying that software engineering (SE) needs to take a look at what modern engineering disciplines do, and do their best to model that. CS needs to wonder what scientific discipline it can most emulate, which is what’s going to be needed if SE is going to improve. Both disciplines are stagnating, and are being surpassed by information technology management as a viable solution scheme for solving computational problems. However, that leads into a different problem, which I talked about 9 years ago in IT: You’re doing it completely wrong.

Related posts:

Alan Kay: Rethinking CS education

The future is not now

— Mark Miller, https://tekkie.wordpress.com

SICP: A quick note about Exercise 1.28

It’s been 5 years since I’ve written something on SICP–almost to the day! I recently provided some assistance to a reader of my blog on this exercise. In the process, I of course had to understand something about it. Like with Exercise 1.19, I didn’t think this one was written that clearly. It seemed like the authors assumed that the students were familiar with the Miller-Rabin test, or at least modular mathematics.

The exercise text made it sound like one could implement Miller-Rabin as a modification to Fermat’s Little Theorem (FLT). From providing my assistance, I found out that one could do it as a modification to FLT, though the implementation looked a bit convoluted. When I tried solving it myself, I felt more comfortable just ignoring the prior solutions for FLT in SICP, and completely rewriting the expmod routine. Along with that, I wrote some intermediate functions, which produced results that were ultimately passed to expmod. My own solution was iterative.

I found these two Wikipedia articles, one on modular arithmetic, the other on Miller-Rabin, to be more helpful than the SICP exercise text in understanding how to implement it. Every article I found on Miller-Rabin, doing a Google search, discussed the algorithm for doing the test. It was still a bit of a challenge to translate the algorithm to Scheme code, since they assumed an imperative style, and there is a special case in the logic.

It seemed like when it came right down to it, the main challenge was understanding how to calculate factors of n – 1 (n being the candidate tested for primality), and how to handle the special case, while still remaining within the confines of what the exercise required (to do the exponentiation, use the modulus function (remainder), and test for non-trivial roots inside of expmod). Once that was worked out, the testing for non-trivial roots turned out to be pretty simple.

If you’re looking for a list of primes to test against your solution, here are the first 1,000 known primes.

Beginning the journey of becoming a computer scientist

“We need to do away with the myth that computer science is about computers. Computer science is no more about computers than astronomy is about telescopes, biology is about microscopes or chemistry is about beakers and test tubes. Science is not about tools, it is about how we use them and what we find out when we do.”
— “SIGACT trying to get children excited about CS,”
by Michael R. Fellows and Ian Parberry

There have been many times where I’ve thought about writing about this over the years, but I’ve felt like I’m not sure what I’m doing yet, or what I’m learning. So I’ve held off. I feel like I’ve reached a point now where some things have become clear, and I can finally talk about them coherently.

I’d like to explain some things that I doubt most CS students are going to learn in school, but they should, drawing from my own experience.

First, some notions about science.

Science is not just a body of knowledge (though that is an important part of it). It is also venturing into the unknown, and developing the idea within yourself about what it means to really know something, and to understand that knowing something doesn’t mean you know the truth. It means you have a pretty good idea of what the reality of what you are studying is like. You are able to create a description that somehow resembles the reality of what you’re studying to such a degree that you are able to make predictions about it, and those predictions can be confirmed within some boundaries that are either known, or can be discovered, and subsequently confirmed by others without the need to revise the model. Even then, what you have is not “the truth.” What you have might be a pretty good idea for the time being. As Richard Feynman said, in science you never really know if you are right. All you can be certain of is that you are wrong. Being wrong can be very valuable, because you can learn the limits of your common sense perceptions, and knowing that, direct your efforts towards what is more accurate, what is closer to reality.

The field of computing and computer science doesn’t have this perspective. It is not science. My CS professors used to admit this openly, but my understanding is CS professors these days aren’t even aware of this distinction anymore. In other words, they don’t know what science is, but they presume that what they practice is science. CS, as it’s taught, has some value, but what I hope to introduce people to here is the notion that after they have graduated with a CS degree, they are in kindergarten, or perhaps first grade. They have learned some basics about how to read and write, but they have not learned what is really powerful about that skill. I will share some knowledge I have gleaned as someone who has ventured as a beginner in this perspective.

The first thing to understand is that an undergraduate computer science education doesn’t tend to give you any notions about what computing is. It doesn’t explore models of computing, and in many cases, it doesn’t even present much in the way of different models of programming. It presents one model of computing (the Von Neumann model), but it doesn’t venture deeply into concepts of how the model works. It provides some descriptions of entities that are supposed to make it work, but it doesn’t tie them together so that you can see the relationships; see how the thing really works.

Science is about challenging and forming models of phenomena. A more powerful notion of computing is realized by understanding that the point of computer science is not about the application of computing to problems, but about creating, criticizing, and studying the strengths and weaknesses of different models of computing systems, and that programming is a means of modeling our ideas about them. In other words, programming languages and development environments can be used to model, explore, and test, and criticize our notions about different computing system models. It behooves the computer scientist to either choose an existing computing architecture represented by an existing language, or to create a new architecture, and a new language, that is suitable to what is being attempted, so that the focus can be maintained on what is being modeled, and testing that model without a lot of distraction.

As Alan Kay has said, the “language” part of “programming language” is really a user interface. I suggest that it is a user interface to a model of computing, a computing architecture. We can call it a programming model. As such, it presents a simulated environment of what is actually being accomplished in real terms, which enables a certain way of seeing the model you are trying to construct. (As I said earlier, we can use programming (with a language) to model our notions of computing.) In other words, you can get out of dealing with the guts of a certain computing architecture, because the runtime is handling those details for you. You’re just dealing with the interface to it. In this case, you’re using, quite explicitly, the architecture embodied in the runtime, not an API, as a means to an end. The power doesn’t lie in an API. The power you’re seeking to exploit is the architecture represented by the language runtime (a computing model).

This may sound like a confusing round-about, but what I’m saying is you can use a model of computing as a means for defining a new model of computing, whatever best suits the task. I propose that, indeed, that’s what we should be doing in computer science, if we’re not working directly with hardware.

When I talk about modeling computing systems, given their complexity, I’ve come to understand that constructing them is a gradual process of building up more succinct expressions of meaning for complex processes, with a goal of maintaining flexibility, in terms of how sub-processes that are used for higher levels of meaning can be used. As a beginner, I’ve found that starting with a powerful programming language in a minimalist configuration, with just basic primitives, was very constructive for understanding this, because it forced me to venture into constructing my own means for expressing what I want to model, and to understand what is involved in doing that. I’ve been using a version of Lisp for this purpose, and it’s been a very rewarding experience, but other students may have other preferences. I don’t mean to prejudice anyone toward one programming model. I recommend finding, or creating one that fits the kind of modeling you are pursuing.

Along with this perspective, I think, there must come an understanding of what different programming languages are really doing for you, in their fundamental architecture. What are the fundamental types in the language, and for what are they best suited? What are the constructs and facilities in the language, and what do they facilitate, in real terms? Sure, you can use them in all sorts of different ways, but to what do they lend themselves most easily? What are the fundamental functions in the runtime architecture that are experienced in the use of different features in the language? And finally, how are these things useful to you in creating the model you want to attempt? Seeing programming languages in this way provides a whole new perspective on them that has you evaluating how they facilitate your modeling practice.

Application comes along eventually, but it comes late in the process. The real point is to create the supporting architecture, and any necessary operating systems first. That’s the “hypothesis” in the science you are pursuing. Applications are tests of that hypothesis, to see if what is predicted by its creators actually bears out. The point is to develop hypotheses of computing systems so that their predictive limits can be ascertained, if they are not falsified. People will ask, “Where does practical application come in?” For the computer scientist involved in this process, it doesn’t, really. Engineers can plumb the knowledge base that’s generated through this process to develop ideas for better system software, software development systems, and application environments to deploy. However, computer science should not be in service to that goal. It can merely be useful toward it, if engineers see that it is fit to do so. The point is to explore, test, discuss, criticize, and strive to know.

These are just some preliminary thoughts on computer systems modeling. In my research, I’ve gotten indications that it’s not healthy for the discipline to be insular, just looking at its own artifacts for inspiration for new ideas. It needs to look at other fields of study to introduce unorthodox models into the discussion. I haven’t gotten into that yet. I’m still trying to understand some basic ideas of a computer system, using a more mathematical approach.

Related posts:

Does computer science have a future?

The necessary ingredients for computer science

— Mark Miller, https://tekkie.wordpress.com

If you’re entering computer science, please look at this

This presentation by Bret Victor could alternately be called “Back To The Future,” but he called it, “The Future of Programming.” What’s striking is this is still a possible future for programming, though it was all really conceivable back in 1973, the time period in which Bret set it.

I found this video through Mark Guzdial. It is the best presentation on programming and computing I’ve seen since I’ve watched what Alan Kay has had to say on the subject. Bret did a “time machine” performance set in 1973 on what was accomplished by some great engineers working on computers in the 1960s and ’70s, and generally what those accomplishments meant to computing. I’ve covered some of this history in my post “A history lesson in government R&D, Part 2,” though I de-emphasized the programming aspect.

Bret posed as someone who was presenting new information in 1973 (complete with faux overhead projector), and someone who tried to predict what the future of programming and computing would be like, given these accomplishments, and reasoning about the future, if what was then-current thinking progressed.

The main theme, which has been a bit of a revelation to me recently, is this idea of programming being an exercise in telling the computer what you want, and having the computer figure out how to deliver it, and even computers figuring out how to get what they need from each other, without the programmer having to spell out how to do these things step by step. What Bret’s presentation suggested to me is that one approach to doing this is having a library of “solvers,” operating under a set of principles (or perhaps a single principle), that the computing system can invoke at any time, in a number of combinations, in an attempt to accomplish that goal; that operational parameters are not fixed, but relatively fluid.

Alan Kay talked about Licklider’s idea of “communicating with aliens,” and how this relates to computing, in this presentation he gave at SRII a couple years ago. A “feature” of many of Alan’s videos is that you don’t get to see the slides he’s showing…which can make it difficult to follow what he’s talking about. So I’ll provide a couple reference points. At about 17 minutes in “this guy” he’s talking about is Licklider, and a paper he wrote called “Man-Computer Symbiosis.” At about 44 minutes in I believe Alan is talking about the Smalltalk programming environment.

http://vimeo.com/22463791

I happened to find this from “The Dream Machine,” by M. Mitchell Waldrop, as well, sourced from an article written by J.C.R. Licklider and Robert Taylor called, “The Computer as a Communication Device,” in a publication called “Science & Technology” in 1968, also reprinted in “In Memoriam: J.C.R. Licklider, 1915-1990,” published in Digital Systems Research Center Reports, vol. 61, 1990:

“Modeling, we believe, is basic and central to communications.” Conversely, he and Taylor continued, “[a successful communication] we now define concisely as ‘cooperative modeling’–cooperation in the construction, maintenance, and use of a model. [Indeed], when people communicate face to face, they externalize their models so they can be sure they are talking about the same thing. Even such a simple externalized model as a flow diagram or an outline–because it can be seen by all the communicators–serves as a focus for discussion. It changes the nature of communication: When communicators have no such common framework, they merely make speeches at each other; but when they have a manipulable model before them, they utter a few words, point, sketch, nod, or object.”

I remember many years ago hearing Alan Kay say that what’s happened in computing since the 1970s “has not been that exciting.” Bret Victor ably justified that sentiment. What he got across (to me, anyway) was a sense of tragedy that the thinking of the time did not propagate and progress from there. Academic computer science, and the computer industry acted as if this knowledge barely existed. The greater potential tragedy Bret expressed was, “What if this knowledge was forgotten?”

The very end of Bret’s presentation made me think of Bob Barton, a man that Alan has sometimes referred to when talking about this subject. Barton was someone who Alan seemed very grateful to have met as a graduate student, because he disabused the computer science students at the University of Utah in their notions of what computing was. Alan said that Barton freed their minds on the subject, which opened up a world of possibilities that they could not previously see. In a way Bret tried to do the same thing by leaving the ending of his presentation open-ended. He did not say that meta-programming “is the future,” just one really interesting idea that was developed decades ago. Many people in the field think they know what computing and programming are, but these are still unanswered questions. I’d add to that message that what’s needed is a spirit of adventure and exploration. Like our predecessors, we’ll find some really interesting answers along the way which will be picked up by others and incorporated into the devices the public uses, as happened before.

I hope that students just entering computer science will see this and carry the ideas from it with them as they go through their academic program. What Bret presents is a perspective on computer science that is not shared by much of academic CS today, but if CS is to be revitalized one thing it needs to do is “get” what this perspective means, and why it has value. I believe it is the perspective of a real computer scientist.

Related posts:

Does computer science have a future?

The necessary ingredients for computer science

—Mark Miller, https://tekkie.wordpress.com

The demise of Google Wave: Ignorance is not bliss.

I heard earlier this month that Google Wave was going to be shut down at the end of the year, because it was not a hit with academia (h/t to Mark Guzdial). I was surprised. What was interesting and rather shocking is one of the reasons it failed was that academics couldn’t figure out how to use it. They said it was too complex. In other words, the product required some training. I’ve used Wave and it’s about as easy to use as an e-mail application, though there are some extra concepts that need to be acquired to use it. I saw it as kind of a cross between e-mail and teleconferencing. Maybe if people had been told this they would’ve been able to pick up the concepts more easily. It’s hard for me to say.

I thought for sure Wave would be a hit somewhere. I assumed business and bloggers would adopt it. The local Lisp user group I’ve been attending has been using Wave for several months now, and we’ve made it an integral part of our meetings. I think we’ll get along without it, but we were sorry to hear that it’s going away.

As I’ve reflected on it I’ve realized that Wave has some significant shortcomings, though it was a promising technology. I first heard about it in a comment that was left on my blog a year ago in response to my post “Does computer science have a future?” Come to think of it, Google Wave’s fate is an interesting commentary on that topic. Anyway, I watched Google’s feature video where it was debuted. It reminded me a bit of Engelbart’s NLS from 1968, though it really didn’t hold a candle to NLS. It had a few similarities I could recognize, but its vision was much more limited. As the source article for this story reveals, the Google Wave team used the idea of extending the concept of e-mail as their inspiration, adding idioms of instant messaging to the metaphor. That explains some things. What I think is a shame is they could’ve done so much more with it. One thing I could think of right off the bat was having the ability to link between waves, and entries in different waves, so that people could refer back and forth to previous/subsequent conversations on related topics. I mean, how hard would that have been to add? In business terms wouldn’t this have “added value” to the product concept? What about bringing over an innovation from NLS and adding video conferencing? Maybe businesses would’ve found Wave appealing if it had some of these things. It’s hard to say, but apparently the Wave team didn’t think of them.

At our Lisp user group meeting last week we got to talking about the problems with Wave after we learned that it was going away. A topic that came up was that Google realized Wave was going to raise costs in computing resources, given the way it was designed, as a client/server model. All Waves originated on Google’s servers. An example I heard thrown out was that if one person had 10,000 friends all signed up on one wave, and they were all on there at the same time, every single keystroke made by one person would cause Google’s server to have to send the update to every one of the people signed up for the wave–10,000 people. There was something about how they couldn’t monetize the service effectively as well, especially for that kind of load.

As they talked about this it reminded me of what Alan Kay said about Engelbart’s NLS system at ETech 2003. NLS had a similar client/server system design to Google Wave. I remember he said that the scaling factor with NLS was 2N, because of its system architecture. This clearly was not going to scale well. It was perhaps because of this that several members of Engelbart’s team left to go work on the concept of personal computing at Xerox PARC. What they were after at PARC was a network scaling factor of N2, and the idea was they would accomplish this using a peer-to-peer architecture. I speculated that this tied in with the invention of Ethernet at PARC, which is a peer-to-peer networking scheme over TCP/IP. I remember Kay and a fellow engineer showed off the concept with Croquet at ETech. It was quite relevant to what I talk about here, actually, because the demo showed how updates of multiple graphical objects on one Squeak instance (Croquet was written in Squeak) were propagated very quickly to another instance over the network, without a mediating central server. And the scheme scaled to multiple computers on the same network.

Google Wave’s story is a sorry tale, a clear case where the industry’s collective ignorance of computer history has hurt technological progress. In this case it’s possible that one reason Wave did not go well is because the design team did not learn the lessons that had already been learned about 38 years ago.

—Mark Miller, https://tekkie.wordpress.com

SICP: Seeing a computer in software

As I’ve gone through SICP I’ve noticed a couple exercises that are very similar to programming problems we were given when I took Comparative Programming Languages in my sophomore year of college, when we were covering Lisp: 8 Queens (in Section 2.2.3 of SICP), and symbolic differentiation (in Section 2.3.2). I wonder if our professor got these problems out of this book, though they could’ve been classic computer science or AI problems. I’m going to focus on the symbolic differentiation exercises in this post.

The problem we were given in Comparative Programming Languages was to create a series of functions that would carry out all of the symbolic differentiation rules (for addition/subtraction, multiplication, division, and exponentiation) for expressions that had two, maybe a few more operands per operator (except for exponents), and simplify the resulting algebraic expressions. Our professor added, “I’ll give extra credit to anyone who implements the Chain Rule.” He gave us some helper functions to use, but we were expected to create the bulk of the code ourselves. In addition we could use no destructive operations (no set or setq). He wanted everything done in a pure functional style right off the bat. We had about a week or so to create a solution. We had only started covering Lisp the week before, and it was the first time in my life I had ever seen it. He never explained to us how a functional language worked. He was one of the hardest CS professors I had. What made it worse was he was terrible at teaching what he was trying to convey. He often came up with these really hard but interesting problems, though.

Creating the rules for addition/subtraction, and exponentiation were pretty easy once I had beaten my head against the wall for a while trying to understand how Lisp worked (I don’t think I ended up implementing the rules for multiplication and division). The hard part, which I didn’t even try, was simplifying the resulting expressions. I didn’t have the skill for it.

Having gone through that experience I was surprised by the treatment SICP gave to this exercise. On the first iteration it gave away almost everything I had tried to figure out myself for this assignment. I was wondering, “Gosh. This seemed like such a challenging book in the first chapter. It’s practically giving away the answers here!”

I’ll give the code for “deriv” as I think it’ll provide some clarity to what I’m talking about:

(define (deriv exp var)
   (cond ((number? exp) 0)
         ((variable? exp)
          (if (same-variable? exp var) 1 0))
         ((sum? exp)
          (make-sum (deriv (addend exp) var)
                    (deriv (augend exp) var)))
         ((product? exp)
          (make-sum
             (make-product (multiplier exp)
                           (deriv (multiplicand exp) var))
             (make-product (deriv (multiplier exp) var)
                           (multiplicand exp))))
         (else (error "unknown type expression -- DERIV" exp))))

The other functions that are needed to make this work are in the book. I modified the “else” clause, because PLTScheme doesn’t implement an “error” function. I use “display” instead.

SICP doesn’t ask the student to implement the differentiation rule for division, or the Chain Rule, just addition/subtraction, multiplication, and one of the exponentiation rules. The point wasn’t to make the exercise something that was hard to implement. It wasn’t about challenging the student on “Do you know how to do symbolic differentiation in code, recognize patterns in the algebra, and figure out how to simplify all expressions no matter what they turn out to be?” It wasn’t about making the student focus on mapping the rules of algebra into the computing domain (though there is some of that). It was about realizing the benefits of abstraction, and what a realization it is!

SICP asked me to do something which I had not been asked to do in all the years that I took CS, and had only occasionally been asked to do in my work. It asked me to take an existing structure and make it do something for which it did not appear to be intended. It felt like hacking, almost a little kludge-ish. I can see why I wasn’t asked to do this in college. What CS generally tried to do at the time was create good and proper programmers, instilling certain best practices. Part of that was writing clear, concise code, which was generally self-explanatory. A significant part of it as well was teaching structured programming, and demanding that students use it, in order to write clear code. That was the thinking.

What amazed me is SICP had me using the same “deriv” function–making no changes to it at all, except to add abstractions for new operations (such as exponentiation and subtraction)–to operate on different kinds of expressions, such as:

(+ (+ (expt x 2) (* 2 x)) 3)

(+ (+ x 3) (* 2 x) 3)

(((x + 3) + (2 * x)) + 3)

(x + 3 + 2 * x + 3) — (with operator precedence)

It could deal with all of them. All that was required was for me to change the implementation beneath the abstractions “deriv” used for each type of expression.

It brought to mind what Alan Kay said about objects–that an object is a computer. The reason I made this connection is the exercises forced me to break the semantic mapping I thought “deriv” was implementing, so that I could see “deriv” as just a function that computed a result–a specialized computer. Kay said that the abstraction is in the messages that are passed between objects in OOP, not in the objects. Here, I could see that the abstraction was in the function calls that were being made inside of “deriv”, not the code of “deriv”. The code is a pattern of use which is structured to follow certain rules. I could look at the function algebraically. In reality, the function names don’t mean anything that’s hard and fast. I thought to myself, “If I changed the implementation beneath the abstractions yet again–still not changing ‘deriv’–I might be able to do useful computing work on a different set of symbols, ones that aren’t even related to classical algebra.”

I recalled a conversation I remember reading online several years ago where a bunch of programmers were arguing over what were the best practices for creating abstractions in OOP. The popular opinion was that the objects should be named after real world objects, and that their behavior should closely model what happened in the real world. One programmer, though, had a different opinion. Whoever it was said that the goal shouldn’t be to structure objects and their relationships according to the real world equivalents, virtualizing the real world. Rather the goal should be to find computational processes that happen to model the real world equivalents well. So in other words, the goal, according to the commenter, should be to find a computer (I use the term “find” broadly, meaning to create one) that does what you need to do, and does it well, not try to take any old computing model and force it into a fashion that you think would model the real world equivalents well. The computer world is not some idealized version of the real world. It is its own world, but there are places where computing and real world processes intersect. The challenge is in finding them.

—Mark Miller, https://tekkie.wordpress.com

SICP: What is meant by “data”?

I want to discuss what’s introduced in Section 2.1.3 of SICP, because I think it’s a really important concept.

When we think of data in typical CS and IT settings, we think of strings and numbers, tables, column names (or a reference from an ordinal value), XML, etc. We think of things like “Item #63A955F006-006”, “Customer name: Barbara Owens”, “Invoice total: $2,336.00”.

How do we access that data? We typically use SQL, either directly or through stored procedures, using relations to serialize/deserialize data to/from a persistence engine that uses indexing (an RDBMS). This returns an organized series of strings and numbers, which are then typically organized into objects for a time by the programmer, or a framework. And then data is input into them, and/or extracted from them to display. We send data over a network, or put it into the persistence engine. Data goes back and forth between containers, each of which use different systems of access, but the data is always considered to be “dead” numbers and strings.

Section 2.1.2 introduces the idea of data abstraction, and I’m sure CS students have long been taught this concept, that data abstraction leads to more maintainable programs because you “program to the interface, not to the object”. This way you can change the underlying implementation without affecting too much how the rest of the program runs. SICP makes the same point.

In 2.1.2 it uses a simple abstraction, a rational number (a fraction), as an example. It defined 3 functions: make-rat, numer, and denom. Make-rat would create a rational number abstraction out of two numbers: a numerator and denominator. Numer and denom return the numerator and denominator from the abstraction, respectively. It calls make-rat a “constructor”, and numer and denom “selectors”.

In 2.1.3 it questions our notion of “data”, though it starts from a higher level than most people would. It asserts that data can be thought of as this constructor and set of selectors, so long as a governing principle is applied which says a relationship must exist between the selectors such that the abstraction operates the same way as the actual concept would. So if you use (in pseudo-code):

x = (make-rat n d)

then (numer x) / (denom x) must always produce the same result as n / d. The book doesn’t mention this at this point, but in order to be really useful, the abstraction would need to work the same way that a rational number would when arithmetic operators are applied to it.

So rather than creating a decimal out of a fraction, as would be the case in most languages:

x = 1 / 4 (which equals 0.25)

You can instead say:

x = (make-rat 1 4)

and this representation does not have to be a decimal, just the ratio 1 over 4, but it can be used in the same way arithmetically as the decimal value. In other words, it is data!

In Section 2.1.2, the abstraction used for a rational number was a “pair”, a cons structure (created by executing (cons a b)). In 2.1.3 it shows that the same thing that was done to create a rational number data abstraction can be applied to the concept of a “pair”:

This point of view can serve to define not only “high-level” data objects, such as rational numbers, but lower-level objects as well. Consider the notion of a pair, which we used in order to define our rational numbers. We never actually said what a pair was, only that the language supplied procedures cons, car, and cdr for operating on pairs. But the only thing we need to know about these three operations is that if we glue two objects together using cons we can retrieve the objects using car and cdr. That is, the operations satisfy the condition that, for any objects x and y, if z is (cons x y) then (car z) is x and (cdr z) is y. Indeed, we mentioned that these three procedures are included as primitives in our language. However, any triple of procedures that satisfies the above condition can be used as the basis for implementing pairs. This point is illustrated strikingly by the fact that we could implement cons, car, and cdr without using any data structures at all but only using procedures. Here are the definitions:

(define (cons x y)
  (define (dispatch m)
    (cond ((= m 0) x)
          ((= m 1) y)
          (else (error "Argument not 0 or 1 -- CONS" m))))
  dispatch) 

(define (car z) (z 0)) 

(define (cdr z) (z 1))

This use of procedures corresponds to nothing like our intuitive notion of what data should be. Nevertheless, all we need to do to show that this is a valid way to represent pairs is to verify that these procedures satisfy the condition given above.

This example also demonstrates that the ability to manipulate procedures as objects automatically provides the ability to represent compound data. This may seem a curiosity now, but procedural representations of data will play a central role in our programming repertoire. This style of programming is often called message passing.

So, just to make it clear, what “cons” does is construct a procedure that “cons” calls “dispatch” (I say it this way, because “dispatch” is defined in cons’s scope), and return that procedure to the caller of “cons”. The procedure that is returned can then be used as data since the above “car” and “cdr” procedures produce the same result as the “car” and “cdr” procedures we’ve been using in the Scheme language.

SICP mentions in this section that message passing will be used further as a basic tool in Chapter 3 for more advanced concepts. Note that I have talked about message passing before. It is central to the power of object-oriented programming.

Note as well that the text implies an analog between procedures, in a Lisp language, and objects. The above example is perhaps a bad way to write code to get that across. In the corresponding lecture for this section (Lecture 2b, “Compound Data”) Abelson used a lambda ( (lambda (m) (cond …)) ) instead of “(define (dispatch m) …)”. What they meant to get across is that just as you can use “cons” multiple times in Scheme to create as many pairs as you want, you can use this rendition of “cons” to do the same. Instead of creating multiple blobs of memory which just contain symbols and pointers (as pairs are traditionally understood), you’re creating multiple instances of the same procedure, each one with the different pair values that were assigned embedded inside them. These procedure instances exist as their own entities. There is not just one procedure established in memory called “dispatch”, which is called with pieces of memory to process, or anything like that. These procedure instances are entities “floating” in memory, and are activated when parameters are passed to them (as the above “car” and “cdr” do). These procedures are likewise garbage collected when nothing is referring to them anymore (they pass out of existence).

In the above example, “cons” is the constructor (of a procedure/object called “dispatch”), and “car” and “cdr” are the selectors. What we have here is a primitive form of object-orientation, similar to what is known in the Smalltalk world, with the mechanics of method dispatch exposed. The inference from this is that objects can also be data.

What SICP begins to introduce here is a blurring of the lines between code and data, though at this point it only goes one way, “Code is data.” Perhaps at some point down the road it will get to sophisticated notions of the inverse: “Data is code”. The book got into this concept in the first chapter just a little, but it was really simple stuff.

Getting back to our traditional notions of data, I hope this will begin to show how limited those notions are. If procedures and objects can be data, why can’t we transmit, receive, persist, and search for them just as we can with “dead” data (strings and numbers)? In our traditional notions of CS and IT (assuming they’re focused on using OOP) we use classes to define methods for dealing with data, and elemental interactions with other computing functions, such as data storage, display, interaction between users and other computers, etc. We have a clear notion of separating code (functionality) from data, with the exception of objects that get created while a program is running. In that case we put a temporary “persona” over the real data to make it easier to deal with, but the “persona” is removed when data is stored and/or we exit the program. What we in the whole field neglect is that objects can be data, but they can only really be such if we can do everything with them that we now do with the basic “dead” forms of data.

There is at least one solution out there that is designed for enterprises, and which provides inherent object persistence and search capabilities, called Gemstone. It’s been around for 20+ years. The thing is most people in IT have never heard of it, and would probably find the concept too scary and/or strange to accept. From what I’ve read, it sounds like a good concept. I think such a system would make application design, and even IT system design if it was done right, a lot more straightforward, at least for systems where OO architecture is a good fit. Disclaimer: I do not work for Gemstone, and I’m not attempting to advertise for them.

As I think about this more advanced concept, I can see some of where the language designers of Smalltalk were going with it. In it, objects and their classes, containing information, can be persisted and retrieved in the system very easily. A weakness I notice now in the language is there’s no representational way to create instance-specific functionality, as exists in the above Scheme example. Note that the values that are “consed” are inlined in “dispatch”. There’s a way to put instance-specific code into an object in Smalltalk, but it involves creating a closure–an anonymous object–and you would essentially have to copy the code for “dispatch”. You would have to treat Smalltalk like a functional language to get this ability. The reason this interests me is I remember Alan Kay saying once that he doesn’t like the idea of state being stored inside of an object. Well, the above Scheme example is a good one for demonstrating “stateless OOP”.

Edit 7-9-2010: After I posted this article I got the idea to kind of answer my own question, because it felt insensitive of me to just leave it hanging out there when there are some answers that people already know. The question was, “Why don’t we have systems of data that allow us to store, retrieve, search, and transmit procedures, objects, etc. as data?” I’ve addressed this issue before from a different perspective. Alan Kay briefly got into it in one of his speeches that’s been on the internet. Now I’m expanding on it.

The question was somewhat rhetorical to begin with. It’s become apparent to me as I’ve looked at my programming experience from a different perspective that a few major things have prevented it from being resolved. One is we view computers as devices for automation, not as a new medium. So the idea of computers adding meta-knowledge to information, and that this meta-knowledge would be valuable to the user is foreign, though slowly it’s becoming less so with time, particularly in the consumer space. What’s missing is a confidence that end users will be capable of, and want to, manipulate and modify that meta-knowledge. A second reason is that each language has had its own internal modeling system, and for quite a while language designers have gotten away with this, because most people weren’t paying attention to that. This is beginning to change, as we can see in the .Net and Java worlds, where a common modeling system for languages is preferred. A third reason is that with the introduction of the internet, openness to programming is seen as a security risk. This is because of a) bad system design, and b) in the name of trying to be “all things to all people” we have systems that try to compromise, introducing some risk, to promote backward compatibility with older models of interaction that became less secure when wide scale networking was introduced, and to cater to programmers’ desires for latitude and features so that the platform would be more widely accepted, so the thinking goes.

No one’s figured out a way to robustly allow any one modeling system to interact or integrate with other modeling systems. There have been solutions developed to allow different modeling systems to interact, either via. XML using networking protocols, or by developing other languages on top of a runtime. I’ve seen efforts to transmit objects between two specific runtimes: .Net and the JVM, but this area of research is not well developed. Each modeling system creates its own “walled garden”, usually with some sort of “escape hatch” to allow a program written in one system to interact with another (usually using a C interface or some FFI (foreign functionality interface)) which isn’t that satisfactory. It’s become a bit like what used to be a plethora of word processing or spreadsheet document file formats, each incompatible with the other.

Programmers find file format standards easier to deal with, because it involves loading binary data into a data structure, and/or parsing a text file into such a structure, but this limits what can be done with the information that’s in a document, because it takes away control from the recipient in how the information can be used and presented. As I’ve discussed earlier, even our notions of “document” are limiting. Even though it could serve a greater purpose, it seems not much time has been invested in trying to make different modeling systems seamless between each other. I’m not saying it’s easy, but it would help unleash the power of computing.

Code is encoded knowledge. Knowledge should be fungible such that if you want to take advantage of it in a modeling system that is different from the one it was originally encoded in, it should still be accessible through some structure that is compatible with the target system. I understand that if the modeling system is low level, or more primitive, the result is not going to look pretty, but it could still be accessed through some interface. What I think should be a goal is a system that makes such intermingling easy to create, rather than trying to create “runtime adapters”. This would involve understanding what makes a modeling system possible, and focusing on creating a “system for modeling systems”. You can see people striving for this in a way with the creation of alternative languages that run on the JVM, but which have features that are difficult to include with the Java language itself. Java is just used as a base layer upon which more advanced modeling systems can be built. This has happened with C as well, but Java at least allows (though the support is minimal) a relatively more dynamic environment. What I’m talking about would be in a similar vein, except it would allow for more “machine models” to be created, as different modeling systems expect to run on different machine architectures, regardless of whether they’re real or virtual.

I think the reason for this complexity is, as Alan Kay has said, we haven’t figured out what computing is yet. So what language designers create reduces everything down to one or more aspects of computing, which is embedded in the design of languages.

So my question is really a challenge for future reference.

— Mark Miller, https://tekkie.wordpress.com