Skip to content

Symbolic Data

  • So far our compound data has been made up of numbers.
  • Now, we will start working with arbitrary symbols.

Quotation


  • Lists with symbols look just like Lisp code, except we quote them.
  • To quote in Lisp, we use the special form #!scheme (quote exp, or the short hand 'exp.
  • For example, 'x evaluates to the symbol "x" instead of the variable x's value.
  • This is just like natural language. If I say, "Say your name," you will say your name. If I instead say, "Say 'your name'", you will literally say the words "your name."
(define a 1)
(define b 2)

(list a b)
 (1 2)

(list 'a 'b)
 (a b)

(list 'a b)
 (a 2)
  • Now that we have quotation, we will use '() for the empty list instead of nil.
  • We need another primitive now: eq?. This checks if two symbols are the same.
(eq? 'a 'b)
 #f

(eq? 'a 'a)
 #t

Example: Symbolic Differentiation


  • Lets design a procedure that computes the derivative of an algebraic expression.
  • This will demonstrate both symbol manipulation and data abstraction.

Symbolic differentiation is of special historical significance in Lisp. It was one of the motivating examples behind the development of a computer language for symbol manipulation.

The differentiation program with abstract data

  • To start, we will only consider addition and multiplication.
  • We need three differentiation rules: constant, sum, and product.
  • Let's assume we already have these procedures
Procedure Result
(variable? e) Is e a variable?
(same-variable? v1 v2) Are v1 and v2 the same variable?
(sum? e) Is e a sum?
(addend e) Addend of sum e
(augend e) Augend of sum e
(make-sum a1 a2) Construct the sum of a1 and a2
(product? e) Is e a product?
(multiplier e) Multiplier of the product e
(multiplicand e) Multiplicand of the product e
(make-product m1 m2) Construct the product of m1 and m2
- Then, we can express the differentiation rules like so:
(define (deriv exp var)
  (cond ((number? exp) 0)
        ((variable? exp)
         (if (same-variable? exp var) 1 0))
        ((sum? exp)
         (make-sum (deriv (addend exp) var)
                   (deriv (augend exp) var)))
        ((product? exp)
         (make-sum
          (make-product (multiplier exp)
                        (deriv (multiplicand exp) var))
          (make-product (deriv (multiplier exp) var)
                        (multiplicand exp))))
        (else
         (error "unknown expression type" exp))))

Representing algebraic expressions

  • There are many ways we could represent algebraic expressions.
  • The most straightforward way is parenthesized Polish notation: Lisp syntax.
  • We can simplify answers in the constructor just like in the rational number example.
  • Simplifying the same way a human would is hard, partly because the most "simplified" form is sometimes subjective.

Exercises:

Example: Representing Sets


There are a number of possible ways we could represent sets. A set is a collection of distinct objects. Our sets need to work with the following operations:

Procedure Result
element-of-set? x s Is x a member of the set s?
adjoin-set x s Set containing x and the elements of set s
union-set s t Set of elements that occur in either s or t
intersection-set s t Set of elements that occur in both s and t
### Sets as unordered lists
- One method is to just use lists. Each time we add a new element, we have to check to make sure it isn't already in the set.
- This isn't very efficient. element-of-set? and adjoin-set are #\Theta(n)#, while union-set and intersection-set are \(\Theta(n^2)\).
- We can make adjoin-set \(\Theta(1)\) if we allow duplicates, but then the list can grow rapidly, which may cause an overall decrease in performance.

Exercises:

Sets as order lists

  • A more efficient representation is a sorted list.
  • To keep things simple, we will only consider sets of numbers.
  • Now, scanning the entire list in element-of-set? is a worst-case scenario. On average we should expect to scan half of the list.
  • This is still \(\Theta(n)\), but now we can make union-set and intersection-set \(\Theta(n)\) too.

Exercises:

Sets as binary trees

  • An even more efficient representation is a binary tree where left branches contain smaller numbers and right branches contain larger numbers.
  • The same set cna be represented by such a tree in many ways.
  • If the tree is balanced, each subtree is about half the size of the original tree. This allows us to implement element-of-set? in \(\Theta(\log(n))\) time.
  • We'll represent each node by (list number left-subtree right-subtree).
  • Efficiency hinges on the tree being balanced. We can write a procedure to balance trees, or we could use a difference data structure (B-trees or red—black trees).

Exercises:

Sets and information retrieval

  • The techniques discussed for sets show up again and again in information retrieval.
  • In a data management system, each record is identified by a unique key.
  • The simplest, least efficient method is to use an udneroder list with \(\Theta(n)\) access.
  • For fast "random access," trees are usually used.
  • Using data abstraction, we can start with unordered lists and then later change the constructors and selectors to use a tree representation.

Exercises:

Example: Huffman Encoding Trees


  • A code is a system for converting information (such as text) into another form.
  • ASCII is a fixed-length code: each symbol uses the same number of bits.
  • Morse code is variable-length: since "e" is the most common letter,