Lecture 1 (09/05/00) Overview: program construction and problem
solving, program transformation and program analysis, tools and
applications; two examples: longest common subsequence and image
blurring.
Lecture 2 (09/07/00) Problems in programming: clarity and simplicity
vs efficiency and language-specific details, good functional and
object-oriented programming, a simple list processing example.
Exercises: Handout E1. Reading: Chapter 5 of the textbook
"Introduction to Algorithms" by Cormen, Leiserson, and Rivest.
_______________________________________________________________________________
Handout syllabus, form to fill. Call me Annie.
What is this subject like? what is this course like?
Motivation:
Q: a central task/activity in CS?
construction of correct, efficient computer programs, productively.
anything else? and arguments? not really. cost drives everything.
Q: what subjects in CS are the basis for addressing this?
algorithms, programming
anything else? arguments? not really, but need to combine them.
Q: what attracts you to study CS?
easier to get better jobs?
not entirely; otherwise you would go into business and earn twice as much.
the subjects above, a kind of prob solving!
Q: so, what is problem solving in CS?
precisely: problem specification ----> machine executable code
e.g. finding the greatest common divisor of two numbers
factorial of a given number; Fibonacci function; Ackermann function
sort a list of numbers; searching
longest common subsequence; paragraph formatting;
...
many in network and distributed systems (routing): MST, shortest path
hardware design (circuit design, layout): multiply two num's
compilers (lex anal,parsing,sem anal,dataflow anal,code gen):
verification: reachability, SCC, topo sort
database query:
security (computing security levels): SCC, topo sort too
Q: what's important (forms challenge/difficulty) in such problem solving?
precise (down to earth, realized in bits), correct, efficient
go through problem analysis/specification
--- first understanding problem
design, code, test/debug/profile/performance tuning/(analyze/verify)
--- then solve problem
possibly re-analyze, design, code, test...
e.g., reachability, in a graph. say code in C.
is it a directed graph?
what strategies/steps? what precise data format/representation?
confidence in correctness? efficiency?
first challenge: problem analysis/specification, will see
but usually can only do well after knowing the rest well.
Q: 2 major aspects of such problem solving?
algorithms, in pseudo code, flow graph, English, some "spec" lang: UML.
programming, in a particular programming language: C, Java, Lisp, ...
latter is lower-level, and former is higher, as shown in companies.
also lower-level/non-CS courses vs ..., with data struct in middle.
so algorithm is at higher-level, more critical in ways.
but far from enough to meet the challenges:
lots of work to go from pseudo code to code.
lack systematic methods for design (mimicking a classical algorithm)
goal:
look at how to design algorithms, how analysis can guide design, and
how to do so systematically, even automatically, by transformation
not merely study designed algorithms and analysis of their efficiency,
not manually turn design / pseudo code into code.
(looking for methods, algorithms for algorithm design, at a meta level
Algorithms: steps, procedure for problem solving.
in Webster: a procedure for solving a mathematical problem (as of finding
the greatest common divisor) in a finite number of steps that
frequently involves repetition of an operation; broadly: a
step-by-step procedure for solving a problem or accomplishing some end)
means:
do design precisely/formally based on languages, by program transformation
--- programming for algorithms
look at systematic methods (steps) for such design and analysis
--- algorithms for programming
if fully succeed, complete problem solving. that will be in the future.
even if partly, significant improvements to the state of art.
already better design/analysis for many real-world problems than
done by CS experts. (much more complex than textbook algorithms)
more goal:
so can focus on interesting parts of problem analysis and problem solving.
also look at what existing methods can't do, for research.
means:
collect important problems and algorithms, from textbooks and applications
talk about best general methods known, with known derivations as examples.
try derivation yet undone (new sol) and on new problems (first sol),
may lead to new general methods
Summary of motivation/goals/problems: program construction
correctness/clarity, efficiency, productivity
(clearer code is easier to see correct or not,
and more importantly, easier to maintain for productivity/cost reasons)
issues:
specification: this course starts with executable programs.
trade-offs: (correctness <--> efficiency) <--> productivity
Methods and techniques:
transformational programming, step-wise refinement; (synthesis)
object-oriented programming, composable software; (reuse, modification)
program transformation: mostly source-to-source, optimization
basic algebraic properties: (0)
a-a =0, a and a =a, -- primitives
first(pair(a,b)) =a, -- data structures
if true a b =a -- control structures
fun fold,unfold,inlining, loop unrolling, peeling
-- abstraction, modularity, reuse
composition / fusion / stream processing / deforestation:
e.g., sum squares of numbers in a list
composing list traversal, squares, and sum (reuse code)
but removing intermediate list of squares.
specialization / partial eval / mixed comp / staging:
e.g., if f(x,y) def= x*x+y, then f(5,y)=25+y.
specialized for x=5, more efficient
especially when called repeatedly on different y's
incrementalization / finite differencing / memoization / tabulation
/ promotion & accumulation / tupling...
e.g., if sum(x) def= ..., then
if sum(x)=r, x'=x plus new element y
then sum(x')=y+r, call this sum'(y,r).
e.g., if #S=r and a is not in S, then #(S U {a}) = r + 1
crucial if #S is used in a loop body that adds elements
data representation selection: set -> list, arrays.
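A minimal sketch of the composition / fusion example listed above (sum the squares of the numbers in a list), written in C. The names Node, cons, freeNodes, squares, sum, sumSquaresComposed, and sumSquaresFused are hypothetical and chosen only for this illustration. The composed version reuses squares and sum but allocates an intermediate list; the fused version makes one traversal and allocates nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list cell for this sketch. */
typedef struct Node {
    int head;
    struct Node* tail;
} Node;

/* Allocate one new cell in front of tail. */
Node* cons(int head, Node* tail) {
    Node* cell = malloc(sizeof(Node));
    assert(cell != NULL);
    cell->head = head;
    cell->tail = tail;
    return cell;
}

/* Free every cell of a list. */
void freeNodes(Node* x) {
    while (x != NULL) {
        Node* next = x->tail;
        free(x);
        x = next;
    }
}

/* Composed version, part 1: build the intermediate list of squares. */
Node* squares(const Node* x) {
    return x == NULL ? NULL : cons(x->head * x->head, squares(x->tail));
}

/* Composed version, part 2: sum a list. */
int sum(const Node* x) {
    return x == NULL ? 0 : x->head + sum(x->tail);
}

/* Composition: clear and reusable, but allocates and frees |x| cells. */
int sumSquaresComposed(const Node* x) {
    Node* sq = squares(x);
    int r = sum(sq);
    freeNodes(sq);
    return r;
}

/* Fused version: one traversal, no intermediate list. */
int sumSquaresFused(const Node* x) {
    int r = 0;
    for (; x != NULL; x = x->tail) {
        r += x->head * x->head;
    }
    return r;
}
```

Both functions return the same value on every list; the point of fusion is to obtain the second from the first by transformation rather than by hand.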
program analysis:
abstraction:
e.g., live or dead, not value;
date only, not all values, for Y2K problem.
explain using examples in (0) above
dependencies: forward vs backward
e.g., dead-code analysis is a backward analysis
looking for date dependent computation is forward
data-flow analysis:
abstract interpretation:
set constraints:
types:
Tools:
compilers;
language-based environments;
visualization tools; ...
tool generators;
compiler vs program manipulation environment:
(like advanced compilers; different from traditional compilers)
Compilers:       Program Manipulation Environments:
stand alone      interactive
functional       reactive
batch            incremental
imperative       declarative, constraints-oriented
automatic        semi-automatic
many additional issues:
pretty print, editing, tool interaction
Applications: solving many problems systematically. two examples.
1. longest common subsequence
the following function computes the length of the longest common
subsequence of sequences $x$ and $y$ of lengths $n$ and $m$,
respectively.
(much easier to write this recursion)
lcs(n,m) = if n=0 or m=0 then 0
else if x[n]=y[m] then 1+lcs(n-1,m-1)
else max(lcs(n,m-1),lcs(n-1,m))
this function contains repeated function calls, and takes exponential
time. write a program for this function that has no repeated function
calls and takes only O(n*m) time.
Clearly, the optimized program is an exponential factor faster.
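One possible O(n*m) version, as a hedged sketch in C rather than the derivation the course will develop. It tabulates lcs(i,j) row by row and keeps only two rows. The function name lcsLength is hypothetical, the sequences are assumed to be C strings, and x[n] in the recursion corresponds to x[i-1] here because C arrays start at index 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence of strings x and y.
   prev holds row i-1 of the table lcs(i,j); cur holds row i. */
int lcsLength(const char* x, const char* y) {
    size_t n = strlen(x), m = strlen(y);
    int* prev = calloc(m + 1, sizeof(int));  /* row 0: lcs(0,j) = 0 */
    int* cur  = calloc(m + 1, sizeof(int));
    assert(prev != NULL && cur != NULL);

    for (size_t i = 1; i <= n; i++) {
        cur[0] = 0;                          /* lcs(i,0) = 0 */
        for (size_t j = 1; j <= m; j++) {
            if (x[i - 1] == y[j - 1]) {
                cur[j] = 1 + prev[j - 1];    /* 1 + lcs(i-1,j-1) */
            } else {
                /* max(lcs(i,j-1), lcs(i-1,j)) */
                cur[j] = cur[j - 1] > prev[j] ? cur[j - 1] : prev[j];
            }
        }
        int* t = prev; prev = cur; cur = t;  /* row i becomes row i-1 */
    }

    int result = prev[m];                    /* after the swap, prev is row n */
    free(prev);
    free(cur);
    return result;
}
```

Each table entry is computed once from entries already available, so there are no repeated calls and the running time is proportional to n*m.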
2. image blurring
the following program takes an n-by-n image in array a, blurs it
into array s, and takes n*n*m*m time.
(much easier to write 4 nested loops)
for i = 0 to n-m
  for j = 0 to n-m
    s[i,j]=0;
    for k = 0 to m-1
      for l = 0 to m-1
        s[i,j]=s[i,j]+a[i+k,j+l]
      end
    end
    s[i,j]=s[i,j]/(m*m)
  end
end
write an efficient program that computes exactly the same but takes
only O(n*n) time.
The optimized program is again much more difficult to write, but it
can be many many times faster depending on m.
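One way to reach O(n*n) time, sketched in C under stated assumptions: integer pixel values, row-major arrays, and sums that fit in a long. It uses a table of prefix sums (a summed-area table), which is a swapped-in technique and not necessarily the derivation the course has in mind. The names blurFast and blurSlow are hypothetical; blurSlow is the 4-loop program above, kept for comparison.

```c
#include <assert.h>
#include <stdlib.h>

/* Blur n-by-n image a into s with m-by-m windows in O(n*n) time.
   Writes only s[i*n+j] for 0 <= i,j <= n-m, like the 4-loop version. */
void blurFast(const int* a, int* s, int n, int m) {
    assert(m >= 1 && m <= n);
    int w = n + 1;
    /* p[i*w+j] = sum of a[r*n+c] over r < i, c < j; row 0 and column 0 are 0. */
    long* p = calloc((size_t)w * (size_t)w, sizeof(long));
    assert(p != NULL);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            p[(i + 1) * w + (j + 1)] = a[i * n + j]
                + p[i * w + (j + 1)] + p[(i + 1) * w + j] - p[i * w + j];
        }
    }
    for (int i = 0; i <= n - m; i++) {
        for (int j = 0; j <= n - m; j++) {
            /* window sum from four table lookups */
            long sum = p[(i + m) * w + (j + m)] - p[i * w + (j + m)]
                     - p[(i + m) * w + j] + p[i * w + j];
            s[i * n + j] = (int)(sum / (m * m));
        }
    }
    free(p);
}

/* The original 4-nested-loop version, O(n*n*m*m). */
void blurSlow(const int* a, int* s, int n, int m) {
    for (int i = 0; i <= n - m; i++) {
        for (int j = 0; j <= n - m; j++) {
            int sum = 0;
            for (int k = 0; k < m; k++)
                for (int l = 0; l < m; l++)
                    sum += a[(i + k) * n + (j + l)];
            s[i * n + j] = sum / (m * m);
        }
    }
}
```

The fast version trades O(n*n) extra space for the table; with floating-point pixels the two versions could differ in rounding, which is why the sketch assumes integers.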
_______________________________________________________________________________
move to next Wed 12-5p. will email to let you know the meeting time and place.
OK Wed. 12-2:30p, still in CS 2212.
Review: a CS central task: constructing programs
2 levels: algorithm design at higher level, programming at lower level
preview ideas of some transformations and analyses.
longest common subsequence, image blurring are bonus-like homework.
today:
example programming problems before we study design and analysis later.
CS is data processing, information processing. handle lots of data.
how to represent data, so we can talk about them, do computation on them?
We used arrays for longest common subsequence and image blurring.
Other data structures include simple lists or complicated graphs.
1. a simple list processing example.
TO SHOW ALL THE PROBLEMS WE WANT TO AVOID!
Write a program that takes a list x and an element i of x, and
returns the list without the first occurrence of i.
First a specification problem, can we update list x?
if x is used for other purposes, then we can not update it;
otherwise we can (and should, since it saves allocating new
cells and freeing the old cells, much more efficient).
Code that updates pointers is always difficult to write correctly.
So assume that we just want to return a new list.
in C:
define a list struct. for simplicity, use list of int only
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free */
typedef struct List {
    int head;
    struct List* tail;
} List;
code? there are >100 ways of writing it, using a loop.
I have one, but I am not giving it here.
try your best in the exercises
can write a recursive function (functional programming).
it is much easier to see that it is correct.
List* rest(int i, List* x) {
    /* precondition: i occurs in x, so x is never NULL here */
    if (i == x->head) {
        return x->tail;
    } else {
        List* cell = (List *)malloc(sizeof(List));   /* (1) */
        cell->head = x->head;                        /* (2) */
        cell->tail = rest(i, x->tail);               /* (3) */
        return cell;                                 /* (4) */
    }
}
can define a constructor mkList (object-oriented programming)
and replace lines (1)-(4) above with (5) below.
the code is clearer, more modular, allows more reuse.
List* mkList(int a, List* b) {
List* cell = (List *)malloc(sizeof(List));
cell->head = a;
cell->tail = b;
return cell;
}
return mkList(x->head, rest(i, x->tail));   /* (5) */
other language specific things (driver, input/output, more memory management)
driver:
int main(int argc, char* argv[]) {
...(rest(3, mkList(1,mkList(3,NULL)))); (6)
}
input and output: here, output only.
replace ... in (6) with printList
void printList(List* x) {
    List* temp = x;
    printf("[ ");
    while (temp != NULL) {
        printf("%d ", temp->head);
        temp = temp->tail;
    }
    printf("]\n");
}
memory management: free a list of int.
need to assign the list in (6) to a variable, call freeList on it after rest.
void freeList(List* x) {
    List* cur = x;
    List* next;
    while (cur != NULL) {
        next = cur->tail;   /* save the link before freeing the cell */
        free(cur);
        cur = next;
    }
}
summarize: high-level design is simple, but has too much low-level stuff
in Java: OO. should define toString() instead of a separate printList.
no need to free memory, has garbage collection.
class List {
    int head;
    List tail;
    public List (int hd, List tl) {
        head = hd;
        tail = tl;
    }
    public String toString() {
        return tail==null ? head+"\n" : head+" "+tail.toString();
    }
}
class Rest {
    static List rest (int i, List x) {
        if (i==x.head) return x.tail;
        else return (new List(x.head,rest(i,x.tail)));
    }
    public static void main (String[] args) {
        System.out.print(rest(3,new List(1,new List(3,new List(2,null)))));
    }
}
could define printList in class Rest and call it instead of System.out.print.
but would be bad OO style.
public static void printList (List x) {
    while (x!=null) {
        System.out.print(x.head+" ");
        x=x.tail;
    }
    System.out.println();
}
in Lisp:
built-in list, succinct, but strange names (car, cdr), also cannot add new; not typed
(defun rest (x l)
(if (equal x (car l)) (cdr l)
(cons (car l) (rest x (cdr l)))))
driver: (rest '3 (cons 1 (cons 3 nil))) or (rest '3 '(1 3))
return value of the exp is printed; no need to write one for output.
auto GC
in Scheme: similar to Lisp
(define (rest x l)
(if (equal? x (car l)) (cdr l)
(cons (car l) (rest x (cdr l)))))
same driver: (rest '3 '(1 3))
same output, could use (display ...)
auto GC
in ML: (typed, better than Java in terms of its polymorphism;
but types can get in way, as in Java sometimes;
and type errors can be hard to understand)
fun rest (x, head::tail) =
    if x = head then tail
    else head :: rest (x, tail);
driver: rest (3, [1, 3])
output: use output (out, makeString n)
need a loop or recursion.
auto GC
Summarize:
if you don't understand all the details, it is not a big problem,
as long as you understand the following point:
functional, object-oriented, GC -> easier for writing better code.
easier to write and easier to understand what it is doing,
which is the no 1 most important thing in programming.
GC: in many new languages, not in C
but C allows finer performance tuning.
object-oriented styles, other language specific things: important
but can be done with automatic code generation
functional: e.g., easier to write recursions on recursive data like lists
but inefficient, e.g., using a loop for rest can be 30 times faster.
What we will do:
First, we write something like below
rest(x,l) def= if x=head(l) then tail(l)
else mkList(head(l),rest(x,tail(l)))
Then, use systematic transformation to improve efficiency,
including efficient memory management.
can use well-known compiler technology to generate code in specific lang's
more good (clearer, and easier to write) and bad (even less efficient)
examples of list manipulation using functional style code:
append(a,b) def= if empty(a) then b
else mkList(head(a),append(tail(a),b))
takes |a| recursive calls, each allocating a new cell.
reverse(l) def= if empty(l) then emptyList
else append(reverse(tail(l)),mkList(head(l),emptyList))
takes quadratic time.
fac(n) def= if n=1 then 1
else n*fac(n-1)
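A hedged sketch of where such transformations lead for the quadratic reverse above: accumulating the result removes the calls to append, so the list is traversed once. This is hand-written C for illustration, not output of the method the course describes. List and mkList follow the definitions used for rest; reverseAcc is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct List {
    int head;
    struct List* tail;
} List;

List* mkList(int a, List* b) {
    List* cell = malloc(sizeof(List));
    assert(cell != NULL);
    cell->head = a;
    cell->tail = b;
    return cell;
}

/* reverseAcc(l, acc) equals append(reverse(l), acc), computed without append.
   The recursion would be a tail call, so it is written as a loop. */
List* reverseAcc(const List* l, List* acc) {
    while (l != NULL) {
        acc = mkList(l->head, acc);  /* push the next element onto the result */
        l = l->tail;
    }
    return acc;
}

/* Linear time: one new cell per element of l; l itself is not modified. */
List* reverse(const List* l) {
    return reverseAcc(l, NULL);
}
```

For example, reversing the list 1 2 3 yields a new list 3 2 1 after three allocations, where the append-based definition allocates a number of cells quadratic in the length.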
List is actually simple. What about graphs?
2. a simple graph reachability example.
given a graph and a set of nodes, find all the nodes in the graph
reachable from the given set of nodes.
If we write it in C or Java, a blackboard, like the one we used for rest, is not large enough.
But using a functional language, it is nontrivial to write graph algorithms.
But there are methods that allow us to
First, describe the precise thing we want in one line:
if a node is reachable, then we can follow an edge to
get another node that is reachable.
Second, systematically (actually there is a system that does this automatically)
transform it into 100-200 lines of C code that has
all the right data structures and manipulations.
e.g., adjacency-list representation of the graphs.
stack for depth-first search or queue for breadth-first search.
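To give a feel for the kind of code involved, here is a much shorter hand-written sketch in C, not the output of the system mentioned above. It assumes nodes are numbered 0..numNodes-1 and uses an adjacency-list representation with an explicit stack as the worklist; for reachability alone the visiting order does not matter. The names Edge, Graph, mkGraph, addEdge, and reachable are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* One outgoing edge in an adjacency list. */
typedef struct Edge {
    int to;
    struct Edge* next;
} Edge;

typedef struct {
    int numNodes;
    Edge** adj;    /* adj[u] = list of edges leaving node u */
} Graph;

Graph* mkGraph(int numNodes) {
    assert(numNodes > 0);
    Graph* g = malloc(sizeof(Graph));
    assert(g != NULL);
    g->numNodes = numNodes;
    g->adj = calloc((size_t)numNodes, sizeof(Edge*));
    assert(g->adj != NULL);
    return g;
}

/* Add a directed edge from -> to. */
void addEdge(Graph* g, int from, int to) {
    Edge* e = malloc(sizeof(Edge));
    assert(e != NULL);
    e->to = to;
    e->next = g->adj[from];
    g->adj[from] = e;
}

/* Sets reached[v] = 1 exactly for the nodes reachable from sources[0..k-1]
   (the sources themselves included), and 0 for all others. */
void reachable(const Graph* g, const int* sources, int k, int* reached) {
    /* each node is pushed at most once, so numNodes slots suffice */
    int* stack = malloc((size_t)g->numNodes * sizeof(int));
    int top = 0;
    assert(stack != NULL);

    for (int v = 0; v < g->numNodes; v++) reached[v] = 0;
    for (int i = 0; i < k; i++) {
        if (!reached[sources[i]]) {
            reached[sources[i]] = 1;
            stack[top++] = sources[i];
        }
    }
    /* if a node is reachable, every node one edge away is reachable too */
    while (top > 0) {
        int u = stack[--top];
        for (Edge* e = g->adj[u]; e != NULL; e = e->next) {
            if (!reached[e->to]) {
                reached[e->to] = 1;
                stack[top++] = e->to;
            }
        }
    }
    free(stack);
}
```

Even this small version needs a representation choice, a worklist, and a visited array, all of which are absent from the one-line specification.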
We will continue next time.