Symbolic Data
- So far our compound data has been made up of numbers.
- Now, we will start working with arbitrary symbols.
Quotation
- Lists with symbols look just like Lisp code, except we quote them.
- To quote in Lisp, we use the special form
#!scheme(quote exp, or the short hand'exp. - For example,
'xevaluates to the symbol "x" instead of the variablex's value. - This is just like natural language. If I say, "Say your name," you will say your name. If I instead say, "Say 'your name'", you will literally say the words "your name."
- Now that we have quotation, we will use
'()for the empty list instead ofnil. - We need another primitive now:
eq?. This checks if two symbols are the same.
Example: Symbolic Differentiation
- Lets design a procedure that computes the derivative of an algebraic expression.
- This will demonstrate both symbol manipulation and data abstraction.
Symbolic differentiation is of special historical significance in Lisp. It was one of the motivating examples behind the development of a computer language for symbol manipulation.
The differentiation program with abstract data
- To start, we will only consider addition and multiplication.
- We need three differentiation rules: constant, sum, and product.
- Let's assume we already have these procedures
| Procedure | Result |
|---|---|
(variable? e) |
Is e a variable? |
(same-variable? v1 v2) |
Are v1 and v2 the same variable? |
(sum? e) |
Is e a sum? |
(addend e) |
Addend of sum e |
(augend e) |
Augend of sum e |
(make-sum a1 a2) |
Construct the sum of a1 and a2 |
(product? e) |
Is e a product? |
(multiplier e) |
Multiplier of the product e |
(multiplicand e) |
Multiplicand of the product e |
(make-product m1 m2) |
Construct the product of m1 and m2 |
| - Then, we can express the differentiation rules like so: |
(define (deriv exp var)
(cond ((number? exp) 0)
((variable? exp)
(if (same-variable? exp var) 1 0))
((sum? exp)
(make-sum (deriv (addend exp) var)
(deriv (augend exp) var)))
((product? exp)
(make-sum
(make-product (multiplier exp)
(deriv (multiplicand exp) var))
(make-product (deriv (multiplier exp) var)
(multiplicand exp))))
(else
(error "unknown expression type" exp))))
Representing algebraic expressions
- There are many ways we could represent algebraic expressions.
- The most straightforward way is parenthesized Polish notation: Lisp syntax.
- We can simplify answers in the constructor just like in the rational number example.
- Simplifying the same way a human would is hard, partly because the most "simplified" form is sometimes subjective.
Exercises:
Example: Representing Sets
There are a number of possible ways we could represent sets. A set is a collection of distinct objects. Our sets need to work with the following operations:
| Procedure | Result |
|---|---|
element-of-set? x s |
Is x a member of the set s? |
adjoin-set x s |
Set containing x and the elements of set s |
union-set s t |
Set of elements that occur in either s or t |
intersection-set s t |
Set of elements that occur in both s and t |
| ### Sets as unordered lists | |
| - One method is to just use lists. Each time we add a new element, we have to check to make sure it isn't already in the set. | |
- This isn't very efficient. element-of-set? and adjoin-set are #\Theta(n)#, while union-set and intersection-set are \(\Theta(n^2)\). |
|
- We can make adjoin-set \(\Theta(1)\) if we allow duplicates, but then the list can grow rapidly, which may cause an overall decrease in performance. |
Exercises:
Sets as order lists
- A more efficient representation is a sorted list.
- To keep things simple, we will only consider sets of numbers.
- Now, scanning the entire list in
element-of-set?is a worst-case scenario. On average we should expect to scan half of the list. - This is still \(\Theta(n)\), but now we can make
union-setandintersection-set\(\Theta(n)\) too.
Exercises:
Sets as binary trees
- An even more efficient representation is a binary tree where left branches contain smaller numbers and right branches contain larger numbers.
- The same set cna be represented by such a tree in many ways.
- If the tree is balanced, each subtree is about half the size of the original tree. This allows us to implement
element-of-set?in \(\Theta(\log(n))\) time. - We'll represent each node by
(list number left-subtree right-subtree). - Efficiency hinges on the tree being balanced. We can write a procedure to balance trees, or we could use a difference data structure (B-trees or red—black trees).
Exercises:
Sets and information retrieval
- The techniques discussed for sets show up again and again in information retrieval.
- In a data management system, each record is identified by a unique key.
- The simplest, least efficient method is to use an udneroder list with \(\Theta(n)\) access.
- For fast "random access," trees are usually used.
- Using data abstraction, we can start with unordered lists and then later change the constructors and selectors to use a tree representation.
Exercises:
Example: Huffman Encoding Trees
- A code is a system for converting information (such as text) into another form.
- ASCII is a fixed-length code: each symbol uses the same number of bits.
- Morse code is variable-length: since "e" is the most common letter,