Symbolic Data

So far our compound data has been made up of numbers.
Now, we will start working with arbitrary symbols.

Quotation

Lists with symbols look just like Lisp code, except we quote them.
To quote in Lisp, we use the special form #!scheme (quote exp, or the short hand 'exp.
For example, 'x evaluates to the symbol "x" instead of the variable x's value.
This is just like natural language. If I say, "Say your name," you will say your name. If I instead say, "Say 'your name'", you will literally say the words "your name."

(define a 1)
(define b 2)

(list a b)
→ (1 2)

(list 'a 'b)
→ (a b)

(list 'a b)
→ (a 2)

Now that we have quotation, we will use '() for the empty list instead of nil.
We need another primitive now: eq?. This checks if two symbols are the same.

(eq? 'a 'b)
→ #f

(eq? 'a 'a)
→ #t

Example: Symbolic Differentiation

Lets design a procedure that computes the derivative of an algebraic expression.
This will demonstrate both symbol manipulation and data abstraction.

Symbolic differentiation is of special historical significance in Lisp. It was one of the motivating examples behind the development of a computer language for symbol manipulation.

The differentiation program with abstract data

To start, we will only consider addition and multiplication.
We need three differentiation rules: constant, sum, and product.
Let's assume we already have these procedures

Procedure	Result
`(variable? e)`	Is `e` a variable?
`(same-variable? v1 v2)`	Are `v1` and `v2` the same variable?
`(sum? e)`	Is `e` a sum?
`(addend e)`	Addend of sum `e`
`(augend e)`	Augend of sum `e`
`(make-sum a1 a2)`	Construct the sum of `a1` and `a2`
`(product? e)`	Is `e` a product?
`(multiplier e)`	Multiplier of the product `e`
`(multiplicand e)`	Multiplicand of the product `e`
`(make-product m1 m2)`	Construct the product of `m1` and `m2`
- Then, we can express the differentiation rules like so:

(define (deriv exp var)
  (cond ((number? exp) 0)
        ((variable? exp)
         (if (same-variable? exp var) 1 0))
        ((sum? exp)
         (make-sum (deriv (addend exp) var)
                   (deriv (augend exp) var)))
        ((product? exp)
         (make-sum
          (make-product (multiplier exp)
                        (deriv (multiplicand exp) var))
          (make-product (deriv (multiplier exp) var)
                        (multiplicand exp))))
        (else
         (error "unknown expression type" exp))))

Representing algebraic expressions

There are many ways we could represent algebraic expressions.
The most straightforward way is parenthesized Polish notation: Lisp syntax.
We can simplify answers in the constructor just like in the rational number example.
Simplifying the same way a human would is hard, partly because the most "simplified" form is sometimes subjective.

Exercises:

Example: Representing Sets

There are a number of possible ways we could represent sets. A set is a collection of distinct objects. Our sets need to work with the following operations:

Procedure	Result
`element-of-set? x s`	Is x a member of the set s?
`adjoin-set x s`	Set containing x and the elements of set s
`union-set s t`	Set of elements that occur in either s or t
`intersection-set s t`	Set of elements that occur in both s and t
### Sets as unordered lists
- One method is to just use lists. Each time we add a new element, we have to check to make sure it isn't already in the set.
- This isn't very efficient. `element-of-set?` and `adjoin-set` are #\Theta(n)#, while `union-set` and `intersection-set` are \(\Theta(n^2)\).
- We can make `adjoin-set` \(\Theta(1)\) if we allow duplicates, but then the list can grow rapidly, which may cause an overall decrease in performance.

Exercises:

Sets as order lists

A more efficient representation is a sorted list.
To keep things simple, we will only consider sets of numbers.
Now, scanning the entire list in element-of-set? is a worst-case scenario. On average we should expect to scan half of the list.
This is still \(\Theta(n)\), but now we can make union-set and intersection-set \(\Theta(n)\) too.

Exercises:

Sets as binary trees

An even more efficient representation is a binary tree where left branches contain smaller numbers and right branches contain larger numbers.
The same set cna be represented by such a tree in many ways.
If the tree is balanced, each subtree is about half the size of the original tree. This allows us to implement element-of-set? in \(\Theta(\log(n))\) time.
We'll represent each node by (list number left-subtree right-subtree).
Efficiency hinges on the tree being balanced. We can write a procedure to balance trees, or we could use a difference data structure (B-trees or red—black trees).

Exercises:

Sets and information retrieval

The techniques discussed for sets show up again and again in information retrieval.
In a data management system, each record is identified by a unique key.
The simplest, least efficient method is to use an udneroder list with \(\Theta(n)\) access.
For fast "random access," trees are usually used.
Using data abstraction, we can start with unordered lists and then later change the constructors and selectors to use a tree representation.

Exercises:

Example: Huffman Encoding Trees

A code is a system for converting information (such as text) into another form.
ASCII is a fixed-length code: each symbol uses the same number of bits.
Morse code is variable-length: since "e" is the most common letter,