## Recursion Over Lists and Other Data Structures

[ This section is a little out of place--need to introduce type and equality predicates first! Those have been presented in class, so this should be comprehensible anyway. Need to make this a separate hunk, and move it after the next hunk. ] [ Also need to introduce tail recursion somewhere early, and fwd ref the chapter on recursion. ] In this section we'll demonstrate the most common idioms for recursion over simple data structures--lists and trees.

Some of the examples will be implementations of standard Scheme procedures like `length`, `list`, `append`, and `reverse`. Scheme already has these procedures built in, but you should understand how they can be implemented using simpler procedures like `cdr` and `cons`. You'll inevitably have to write special-purpose procedures that are slightly different, but coded similarly. (In later chapters, we'll show some more advanced programming techniques that let you implement more general and/or efficient procedures like these.)

We'll also show a few other handy procedures for operating on lists, e.g., a list-copying routine.

Then we'll show recursion over simple binary trees of pairs. The normal style for recursion over trees in Scheme is slightly different from what you may be used to in languages like C or Pascal--and simpler.

### `length`

`length` is the standard Scheme procedure that returns the length of a list. It only counts the elements along the spine of the list (down the `cdr`'s).

It's easy to do write `length` using recursion. The length of a list is 0 if the list is empty, and otherwise it's 1 plus the length of the rest of the list. Here's the easiest way to define `length`:

```(define (length lis)
(cond ((null? lis)
0)
(else
(+ 1 (length (cdr lis))))))
```

The main thing to notice about this example is the recursive structure. The procedure can accept a pointer to either a pair `or` the empty list. The structure of the procedure corresponds directly to the recursive definition of a (proper) list. The two-part `cond` corresponds to the fact that there are two rules that characterize lists; it figures out which case we're dealing with.

We explicitly checked for the end-of-list case, but we implicitly assumed that otherwise the object being operated on is a pair. Making such assumptions might seem like bad style, but actually it's good, because it ensures that an error will be signaled if the argument to length is not the empty list or a pair--the second branch of the `cond` will be taken (erroneously), but the attempt to evaluate `(cdr lis)` will signal an error.

We could make this assumption clearer by using a three-branch `cond`, with separate branches for the two valid cases and the error case:

```(define (length lis)
(cond ((null? lis)
0)
((pair? lis)
(+ 1 (length (cdr lis))))
(else
(error "invalid argument to length")))
```

Here we've used the error-signaling procedure `error`, which stops execution and signals an error. (In most systems, the error message `"invalid argument to length"` will be printed and the user will be presented with a break prompt for debugging the problem.) Unfortunately, `error` is not supported by all Scheme systems. (Later we'll show an implementation that should work reasonably well in any Scheme system.)

Also note that in this example, we've used `lis` as the name of a list argument, rather than `list`. That's because there's a standard Scheme procedure named `list`, which will be shadowed by any local variable with the same name. (The reason for this shadowing is Scheme's unified namespace---you can't have a variable and a procedure with the same name, for reasons that will be explained later; `list` seems to be the only identifier for which this is commonly a problem.)

The above definition of `length` is not tail recursive--after calling itself, there must be a return so that `1` can be added to the value and returned. Later we'll show a more efficient, tail-recursive version of `length`, and a more general procedure called `reduce` that can be used to construct a variety of procedures whose basic algorithm is similar.

### Copying Lists

There are two common senses of copying, shallow copying, and deep copying. A shallow copy makes a copy of one object, and the copy has pointers to the same objects that the original did.

A deep copy copies not only the top-level objects in a data structure, but the ones below that, and so on recursively, so that a whole new data structure is created.

For lists, which are made up of more than one object, it is often useful to copy the spine of the list, i.e., doing a deep copy along the `cdr`'s only. We typically think of a list as being like a special kind of object, even though it's really a sequence of pair objects. It's therefore natural to copy "just the list."

If we just want to do a shallow copy, we can define `pair-copy` to copy a pair, without copying anything else.

In these examples, we'll assume we only want to copy list structure--that is a connected set of pairs. Whenever we come to something that's not a pair, we stop copying and the copy shares structure with the original. (These aren't standard Scheme procedures.)

Here's a truly shallow copy, just copying a single pair:

```(define (pair-copy pr)
(cons (car pr) (cdr pr)))
```

If we want to do a deep copy, we can use recursion to copy `car` or `cdr` values that are also pairs. The following code for `pair-tree-deep-copy` assumes that the structure to be copied is a tree of pairs. (If there is any shared structure, it will be copied each time it is reached, and the copy will not have the same structure. It will always be a tree. Preserving shared structure while copying is harder, but can be done. If there's a directed cycle, `pair-tree-deep-copy` will loop infinitely.)

```(define (pair-tree-deep-copy thing)
(if (not (pair? thing))
thing
(cons (pair-tree-deep-copy (car thing))
(pair-tree-deep-copy (cdr thing)))))
```

Notice that `pair-tree-deep-copy` works on improper as well as proper lists, but only copies the pairs. Where it gets to a non-pair value, it stops and just uses the same value in the copy, and the copy shares structure with the original.

The code for `pair-tree-deep-copy` directly reflects the kind of structure it copies. It can handle non-pairs, which are assumed to be leaves of the graph of pairs that it's copying, and it can handle pairs, which are assumed to be interior nodes of the tree. Their `car` and `cdr` values may be leaves of the tree, or other pairs.

So the recursive definition of a pair-tree is:

• a non-pair (leaf), or
• a pair whose car and cdr are pair-trees

The first rule is the base case, i.e., is the simple one that doesn't require recursion. The second is the recursive rule, which expresses the fact that an interior node's car and cdr fields can point to any kind of pair-tree: a leaf, or another interior node whose children may likewise be leaves or other interior nodes...

This is the easy way to write recursive routines over data structures--figure out a recursive description that exactly describes the expected data structures, and then use that recursive description to write a recursive description of the result you want. Then you can straightforwardly code routine that will traverse the structure and compute that result.

Generally, we write the base case first, to make it clear where recursion ends--and so that we don't forget to write it and accidentally write infinite recursions or unhandled cases. If you do this consistently, your code will be more readable and you'll make fewer mistakes.

To copy the spine of a proper list, we can use this description of the answer we want:

A copy of a list is

• the empty list if the original list is empty, or
• (if the list is nonempty) a pair whose `car` value is the same as the `car` of the original list, and whose `cdr` value is a copy of the rest of the original list.

Here's the code:

```(define (list-copy lis)
(cond ((null? lis)
'())
(else
(cons (car lis)
(list-copy (cdr lis))))
```

As usual, we only check to see if we're a the end of the list, and otherwise assume the argument is a pair. Since we take the `car` and the `cdr` of the pair in the latter case, we'll get an error if the argument is not a proper list. This is usually what we want, so that Scheme will signal an error when it gets to the part of the list with unexpected structure.

The name `list-copy` was chosen to suggest that it operates on lists, and in Scheme terminology "list" means "proper list" by default. If we want a routine that copies improper lists, we should call it something else, and write a comment saying what kinds of things it works for.

Actually, lists are so common in Scheme that we could have just called it `copy`. Most procedure names begin with the name of the kind of structure they operate on, but exceptions are made for lists and for numbers.

### `append` and `reverse`

Two handy operations on lists are `append` and `reverse`; both are standard Scheme procedures.

`append` takes any number of lists as arguments, and returns a list with all of their elements. `reverse` takes a list and returns a new list, with the same elements but in the opposite order.

Note that like most Scheme procedures, neither of these procedures is destructive--each creates a new list without side-effecting (modifying) its argument(s).

#### `append`

Append works much like `list-copy` except that we have multiple lists to deal with.

The trick to getting it right is to maintain the essential structure of `list-copy`, with the right minor differences.

For now, let's keep things simple, and just do a two-argument version of `append`, called `append2`.

Our strategy is to recurse through the first list, like `list-copy`, copying one element of the list at each step. When we get to the end, however, the base case is different--rather than terminating the list with the empty list, we just use the second list as the "rest" of the copy we're making.

Notice that the base case occurs when the first list is null--the `append` of an empty list and another list is just that other list--conceptually, we `cons` zero items onto the front of that list. Concretely, we can just return that list.

Here's the recursive characterization of the result we want

• if the first list is empty, the result is just the second list
• if the first list is nonempty, the result is a pair whose `car` is the `car` of the first list, and whose `cdr` is the `append` of the rest of the first list and (all of) the second list.

Here's a simple two-argument version of `append`:

```(define (append2 lis1 lis2)
(cond ((null? lis1)
lis2)
(else
(cons (car lis1)
(append2 (cdr lis1) lis2)))))
```

Note that `append2` copies its first list argument, but the result simply shares a pointer to the last list argument--the last list is not copied, so the result shares structure with that list. This sharing is also used by the standard Scheme function `append`, which can take any number of lists as arguments. The first n-1 lists are copied, but the last is shared.

Be sure you understand the concrete operation of the above algorithm. On the way down during recursion, we're taking the first list apart, holding onto one list element at each step. When we hit the end of the first list, recursion stops and we return the second list. On the way back up, we're consing those items onto the new list we're creating, back-to-front.

Suppose we have defined two lists, `foo` and `bar`, like this:

```(define foo '(x y z))
(define bar '(a b))
(define baz (append bar foo))
```

The result will be that `baz` shares structure with `foo`, but not with `bar`. Changes to the list via `foo` will also be visible via `baz`.

```                +----------------------------------------+
|                                        |
\|/                                       |
+---+    +---+---+     +---+---+      +---+---+     |
foo | *-+--->| * | *-+---->| * | *-+----->| * | * |     |
+---+    +-+-+---+     +-+-+---+      +-+-+---+     |
|             |             |            |
\|/           \|/           \|/           |
x             y             z            |
|
|
+---+    +---+---+     +---+---+                    |
bar | *-+--->| * | *-+---->| * | * |                    |
+---+    +-+-+---+     +-+-+---+                    |
|             |                          |
\|/           \|/                         |
a             b                          |
/|\           /|\                         |
|             |                          |
+---+    +---+---+     +---+---+                    |
baz | *-+--->| * | *-+---->| * | *-+--------------------+
+---+    +---+---+     +---+---+
```

In general, the result of `append` shares structure with the last argument passed to `append`. If you want to avoid the sharing, you can pass `append` an empty list as its last argument. For example `(append '(1 2 3) '())` will copy the list `(1 2 3)`.

If you're worried about efficiency, be aware that `append` takes time proportional to the length of the lists that must be copied, i.e., all but the last list being `append`ed. This time usually doesn't matter, but it's a consideration for performance-critical parts of your program, especially if you're appending long lists.

(It's common to `append` short lists onto the front of long lists, and then `reverse` the result if necessary.)

#### `reverse`

`reverse` returns a reversed copy of a list.

There's an easy (but slow) way to define `reverse` in terms of `append`. We just take the first element off the list, reverse the rest of the list, and append the first element to the end of the list. We do this recursively, so that each time we reverse the rest of the list, we're doing the same thing on a shorter list. When we get down to the end of the list, reversing it is a no-op: the reverse of an empty list is the empty list.

```(define (reverse lis)
(if (null? lis)
'()
(append (reverse (cdr lis))
(list (car lis)))))
```

Think about how this example actually works. `reverse` recurses down the list, calling itself on the `cdr` of the list at each recursive step, until the recursion stops at the end of the list. (This last call returns the empty list, which is the reverse of the empty list.) At each step, we use `car` to peel off one element of the list, and hold onto it until the recursive call returns.

The reversed lists are handed back up through the returns, with the cars being slapped on the rear of the list at each return step. (To add a single item to the end of the list using `append`, we must first put it in a one-element list using `list`.)

We end up constructing the new list back-to-front on the way up from the recursion. Going down recursively tears the list apart, one item at each recursive step, and coming back up adds an element to the end of the new list at each step.

This example is good to understand, both abstractly and concretely. You should understand the concrete steps involved in taking a list apart and putting it back together backwards. On the other hand, you should also recognize that the algorithm works even if you don't pay attention to that.

Once you get the hang of recursion, it's often easy to write algorithms without actually thinking about the little steps involved, or thinking much about the ordering of steps. In this case, it's easy to see that if we can reverse the rest of the list, and append the first item to the end of that, we've reversed the whole list. We don't need to think much about the ordering of the operations, because that falls naturally out of the way we pass arguments to functions. We can declare that "the `reverse` of a non-empty list is the `append` of the `reverse` of the rest of the list and (a list containing) the first item in the list", and then write the code accordingly, as a pure function---one that only depends on the values of its arguments, and has no side effects.

By writing this code recursively, we'll apply the same trick all the way down the list. Thinking a little more concretely--but not much--we can see that at each time we reverse the rest of the list, the list in question will be shorter. Somewhere we'll hit the end of the list, so we have to handle that base case. It's usually easy to see what the right thing to do is for the base case. In this case, we can declare that "the `reverse` of the empty list is the empty list," and add the appropriate branch to `append`.

This is a good example of how you can combine functions to create new functions, implementing algorithms without using sequencing or side effects. (Notice that if we had side effects, we'd have to think very carefully about the ordering of steps, to make sure that we used a data structure after certain changes, and before others. Bleah.)

(The following remarks about efficiency are fairly advanced--you shouldn't worry about these things yet if they get in the way of learning how to write programs simply and straightforwardly. You can skip or skim them and come back to them later once you've gotten the hang of Scheme, and want to tune the time-critical parts of your programs for maximum efficiency. On the other hand, you may find that thinking about the concrete details reinforces the basic ideas.) There are two problems coding `reverse` this very simple way, however---`reverse` turns out to be one of the hardest "simple" list routines to code efficiently. Later we'll show better versions that are more clever, but only very slightly more complicated. (They'll still be recursive, and won't use loops or assignment.)

[ where? (Later we need to show a linear-time version that uses list->vector and then reverses the vector into a list tail-recursively... ]

The first problem is that each call to `append` takes time proportional to the length of the list it's given. (Remember that `append` effectively copies all of the pairs in the first list it's given, making a backward copy.) We have to copy the "rest" of the list using `append`, starting at each pair in the list. On average, we copy half the list at a given recursive step, so since we do this for every pair in the list, we have an order n-squared algorithm.

Another problem is that we're doing things on the way back up from recursion, which turns out to be more expensive than doing things on the way down. As we'll explain in a later chapter, Scheme can do recursion very efficiently if everything is done in a forward direction, on the way down--Scheme can optimize away all but one of the returns, and the state-saving before the calls. (Luckily, this is easy to do.)

Since Scheme provides a built-in `reverse`, you don't have to think much about this. A good Scheme system will provide a heavily-optimized implementation of `reverse` that is linear in the length of the list being reversed.

`reverse` is very handy, and the efficiency of a built-in `reverse` is important, because it's usually best to construct a list in whichever order is easy and efficient, and then reverse the whole list if necessary. Typically, you `cons` one item onto a list at a time, or maybe `append` a few items at a time, in whatever order it's easiest to create the list. This allows you to construct the list in linear time; with a linear-time `reverse`, the overall process is still linear-time.

```==================================================================

This is the end of Hunk E.

TIME TO TRY IT OUT

At this point, you should go read Hunk F of the next chapter
and work through the examples using a running Scheme system.
Then return here and resume this chapter.

==================================================================
```

(Go to Hunk F, which starts at section Lists (Hunk F).)