I thought today maybe we would talk about 'grep', a well-known command in the UNIX world, something that's been around since the early 1970s. What 'grep' lets you do is search for patterns of text - arbitrary patterns of text - in one or more files, and there could be an unbounded number of files of input. Or the input could be coming from some other program, as it is, for example, if you're using UNIX pipelines. So you take some program and you pipe it into 'grep' and that way, no matter what the amount of input is, 'grep' can filter out, or show you, the things that you're interested in. And that's stuff that you can't do with a text editor very conveniently - if at all.

One of the issues with 'grep' has always been: where does that weird name come from? And so I thought, perhaps, I could tell that story, if it would be of any interest, and we'll see where we go from there.

The way it came about - you have to put yourself back in the early days of computing, before everybody present in this room, except me, was born. Let's say something like 1970-71 - the very, very early days of UNIX. The computer that UNIX ran on was a PDP-11. At that point it was probably an 11/20. It was a machine that had very, very little computing power. It didn't run very fast. It also didn't have very much memory: probably something on the order of 32K, maybe 64K bytes - and that's 64 kilobytes, not megabytes. And very small secondary storage as well, you know, a few megabytes of disk and things like that. So, very, very limited computing resources, and that meant that a lot of the software in the early days of UNIX tended to be fairly simple and straightforward. And that reflected not only the relative 'wimpiness' of the hardware but also the personal tastes of the people doing the work, primarily Ken Thompson and Dennis Ritchie.

So one of the standard programs that people use on any system is the text editor. The UNIX text editor was called 'ed', and it's not pronounced 'edd'. At least by those in the know, it's pronounced 'ee dee'. This was written by Ken Thompson, and I think it was basically a stripped-down version of an editor called QED, which Ken had worked with and done a lot of work on earlier. So, a very small, simple, straightforward editor.

The thing that you have to remember is that, in those days, you also didn't have actual video display terminals - not of the sort that we're used to today, or even 10 or 20 years ago. In fact all the computing, all of your editing and so on, was done on paper. Remember paper? If you zoom down here you can see paper! This meant that there were a lot of things that tried to minimize the use of paper. It also meant that editors worked one line at a time, or multiple lines at a time, but there was no cursor addressing, so you couldn't move around within a line. And so the 'ed' text editor reflected that kind of thing. Maybe what I should do is just take a quick look at what 'ed' looked like?

So the commands for 'ed' were single-letter commands. For example, there was a command called 'p', which stood for 'print'; there was a command called 'd', which would delete a line. There was a command called 's', which said 'substitute', so you could change, y'know, 'ABC' into 'DEF', or something like that. There was an 'append' command that simply said 'add some more text', and you could add a bunch of lines and then terminate it with something. There was, of course, a 'read' command so that you could read information from a file, and there was a 'write' command [so] that you could put it back in a file. And a handful of other things like that. So that was the essence of what it did.

One of the things that 'ed' did very nicely was this: OK, these commands apply by default to the current line. But what do you do when you want to specify more precisely which lines you're operating on? You could say things like 'line 1 to line 10, print'. So '1,10p' would print the first 10 lines. But suppose you wanted to print all of the lines in the file? There was a shorthand called '$', meaning the last line. So I could say '1,$p' and that would print all of the lines in the file. Or I could say: "Gee! I just want to see the last line". So I could say '$p' and that would give me that. I could even elide the 'p', but that's good enough. Or I could delete the last line by saying '$d'. Or I could delete the first line by saying '1d'. That is sort of the line addressing. So far not very complicated.

The thing that 'ed' added to all of that, and this is definitely Ken's influence, was the idea of regular expressions. A regular expression is a pattern of text - it's a way of specifying patterns of text. They could be literal text like the word 'print' or they could be something more complicated, like things that start with 'Prin' but might go on to 'Print' or 'Princeton' or 'Princess', or whatever. That kind of thing. The way that regular expressions were written in the 'ed' text editor was you said '/' and then you wrote the characters of the regular expression. So I could say '/print/' and that would match the next line, in what I was working on, that contained the word 'print' anywhere within it.

So the regular expressions in the 'ed' editor were somewhat different - a little more sophisticated, and complicated - than the patterns that you might find in shell wildcards, where, for example, a star means 'anything at all'. So, the same idea of patterns of text, but a slightly different specification - a different way of writing patterns, suitable for text editing. And so, then, I could say things like "I want to find the next occurrence of the word 'print' in my file". And then there I would be. And on, and on, and on, like that. OK, so that's the 'ed' text editor. We are a long way away from 'grep' at this point.
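
To make the line addressing and the '/print/' search described above concrete, here is a minimal sketch in Python. It is an illustration only, not how 'ed' was implemented: the function names `resolve` and `run` are made up for this example, only the 'p' and 'd' commands and a forward search are handled, and patterns use Python's `re` syntax, which differs in detail from the regular expressions 'ed' accepted.

```python
import re

def resolve(addr, current, last):
    """Turn an ed-style address into a 1-based line number.

    '' means the current line, '$' means the last line, digits mean that line.
    """
    if addr == "":
        return current
    if addr == "$":
        return last
    return int(addr)

def run(lines, command, current=1):
    """Apply one command such as '1,10p', '$d' or '/print/' to a list of lines.

    Returns (output, new_lines, new_current). Hypothetical helper: it covers
    only 'p', 'd' and forward search, and answers '?' for anything else.
    """
    # /re/ : move to the next line containing a match, wrapping around the end.
    if len(command) >= 2 and command.startswith("/") and command.endswith("/"):
        pattern = re.compile(command[1:-1])
        n = len(lines)
        for step in range(1, n + 1):
            idx = (current - 1 + step) % n
            if pattern.search(lines[idx]):
                return [lines[idx]], lines, idx + 1
        return ["?"], lines, current

    # [start][,end]p  or  [start][,end]d
    m = re.fullmatch(r"(\d+|\$)?(?:,(\d+|\$))?([pd])", command)
    if m is None:
        return ["?"], lines, current
    last = len(lines)
    start = resolve(m.group(1) or "", current, last)
    end = resolve(m.group(2), current, last) if m.group(2) else start
    if not (1 <= start <= end <= last):
        return ["?"], lines, current
    if m.group(3) == "p":
        return lines[start - 1:end], lines, end
    # 'd': drop the addressed lines and keep the rest.
    remaining = lines[:start - 1] + lines[end:]
    return [], remaining, min(start, len(remaining))
```

With a four-line buffer, `run(buffer, "1,$p")` returns every line, `run(buffer, "$p")` returns just the last one, and `run(buffer, "/print/", current=2)` returns the next line after line 2 that contains 'print'.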

So what's 'grep' all about? Well, it turns out that at the time that this was going on, 'ed' was the standard text editor. But, as I said, the machines you're working on are very, very wimpy. Not much computing capacity in a lot of ways. And in fact one of the limitations was that you couldn't edit a very big file, because there wasn't enough memory and 'ed' worked entirely within memory, and so you were stuck.

One of my colleagues at the time, Lee McMahon, was very interested in doing text analysis, the sort of thing that we would call today, perhaps, Natural Language Processing. Lee had been studying something that, at the time, was a very interesting question: who were the authors of some fundamental American documents called the Federalist Papers? The Federalist Papers were written by, variously, James Madison and Alexander Hamilton and John Jay in 1787 and '88, if I recall correctly. There were 85 of these documents, but they were published anonymously under the name Publius. And so we had no idea, in theory, who wrote them, and there's been a lot of scholarship trying to figure out for sure. It's well known who wrote some of them, and others are still, I think, a little uncertain. So Lee was interested in seeing whether you could actually, by textual analysis of his own devising, figure out who wrote these things.

So that's fine. But it turns out that these 85 documents were, in total, just over a megabyte - I mean, down in the noise by today's standards - and that wouldn't fit. He couldn't edit them all in 'ed'. And so what do you do? So one day he said: "I just want to go through and find all the occurrences of 'something' in the Federalist Papers so I can look at 'em!" And he said this to Ken Thompson and then went home for dinner or something like that.

And he came back the next day and Ken had written the program - and the program was called 'grep'. What 'grep' did was to go through a bunch of documents - one or more files - and simply find all of the places where a particular regular expression appeared in those things.

It turns out that one more of the commands in 'ed' is a command called 'g', and this stood for 'global'. What it said was: on every line that matches a particular regular expression - so, for example, 'print' - I can then do an 'ed' command. So I could say: "On every line that contains the word 'print', I'll just print it". So I can see what my various print statements would look like. Or I could, in some other way, say 'g' - and some other regular expression in there - and delete them. So I could delete all of the comments in a program, or something like that. So the general structure of that is 'g', followed by (in slashes) a regular expression, followed by the letter 'p' - g/re/p - and that's the genesis of where it came from.

OK, and so this is in some ways the genius of Ken Thompson. A beautiful program, written in no time at all, by taking some other program and just trimming it out and then giving it a name that stuck.
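
The g/re/p idea described above, printing every line that matches a regular expression, can be sketched in a few lines of Python. This is a hedged illustration and not Ken Thompson's program (the original was PDP-11 assembly): the function name `grep` here is just a label for the example, and the pattern syntax is Python's `re`, not the one 'ed' used. The one property it does try to mirror is that input is consumed one line at a time, so the amount of input is not limited by memory the way editing a file in 'ed' was.

```python
import re

def grep(pattern, lines):
    """Yield each line that contains a match for the regular expression.

    'lines' can be any iterable of strings: a list, an open file, or
    sys.stdin when reading from a pipeline. Lines are handled one at a
    time, so the whole input never has to be held in memory.
    """
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line

# Small in-memory example; a real run would pass a file object or sys.stdin.
sample = ["x = 1", "print(x)", "y = 2", "print(y)"]
matches = list(grep("print", sample))
```

Because `grep` accepts any iterable of lines, the same function serves both uses mentioned at the start of the talk: pass it an open file to search that file, or pass it `sys.stdin` to filter whatever another program pipes in.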

That's the story of where 'grep' came from. Let me add one thing. 25 years ago - [it] literally was the spring of 1993 - I was teaching at Princeton as a visitor, and I needed an assignment for my programming class. And I thought "Hmm!" So what I did was to tell them - the students in the class: "OK, here is the source code for 'ed'." It was at that time probably 1800 lines of C. "Your job is to take these 1800 lines of C and convert them into 'grep' as a C program. OK, and you've got a week to do it".

And I told them, at that point, that they had a couple of advantages. First, they knew what the target was. Somebody had already done 'grep', so they knew what it was supposed to look like, and all they had to do was replicate that behavior. And the other thing is that it was now written in C. The original 'grep' was written in PDP-11 assembly language. And of course, they also had one grave disadvantage: none of them were Ken Thompson.

TIL ed is pronounced ee-dee
All these years I’ve been using grep both as a program, and as a verb in general conversation (at least in computer circles), and I had no idea where it came from, or that it stood for global regex print. Wow what a great piece of history right here!
Ed lives on in the vim command line as well, try
Vim is a clone of vi, which is the visual mode of ex, which is a descendant of ed. So most of ed's commands work exactly the same way on vim's command line.
Brian rocks. It's cool to see he is still doing videos and interviews. What's even cooler are older videos back in the UNIX days - to me it's a nice comparison how much (or how little) has changed over the years.
Hearing all these stories of these OG programmers, it really gives me an inferiority complex. If you told me I had to work on a 64 KB system writing in assembly, I'd probably have a panic attack on the spot.
Here's the ed to grep assignment
Computerphile is the best
I’ve always just taken grep for granted, this was a nice piece of history. Now I want to try the project he talked about and see if I can extract it from ed.