ISE 390 -- Programming Challenges -- Week 3 Notes This Weeks Goals: To discuss string libraries and representations To introduce this weeks problems on character strings. Hints from last class: One common source of presentation errors seems to be people having extra blank characters at the end of lines (before the carriage return). Try to eliminate them. Programming Ideas: The problems this week all involve using text strings in one way or another. (1) Understand something about character code designs. In ASCII, upper and lower case numbers and the digits appear sequentially, so you can iterate through by incrementing. More modern international character code designs (unicode) use two bytes per symbol and can represent any character in any language on earth. This kind of knowledge is useful if you want to test or convert between upper and lower case. (2) Whether you work in C, C++, or Java, be aware of the support provided for characters and strings through libraries or classes. The following useful functions appear in the C language *character* library, #include /* these are the most useful */ int isalpha(int c); int isupper(int c); int islower(int c); int isdigit(int c); int ispunct(int c); int toupper(int c); int tolower(int c); Check the definition of each carefully before assuming it does what you want. /* these are more specialized */ int isxdigit(int c); /* hexadecimal */ int isalnum(int c); /* number or digit */ int isspace(int c); int isprint(int c); int isgraph(int c); int iscntrl(int c); int isascii(int c); The following useful functions appear in the C language string library, #include (*) char *strcat(char *dst, const char *src); /* concatenation */ char *strncat(char *dst, const char *src, size_t n); (*) char *strchr(const char *s, int c); /* character searching, first..*/ char *strrchr(const char *s, int c); /* and last occurence */ (*) int strcmp(const char *s1, const char *s2); /* string comparison */ int strncmp(const char *s1, const char *s2, size_t n); (*) char *strcpy(char *dst, const char *src); /* copy from src to dist */ char *strncpy(char *dst, const char *src, size_t n); size_t strcspn(const char *s1, const char *s2); /* length of match */ size_t strspn(const char *s1, const char *s2); char *strdup(const char *s1); /* duplicate string */ (*) size_t strlen(const char *s); /* length of string */ char *strpbrk(const char *s1, const char *s2); /*find any char from s2*/ (*) char *strstr(const char *s1, const char *s2); /* string searching */ (*) char *strtok(char *s1, const char *s2); /* iterate through words in s1 */ char *strtok_r(char *s1, const char *s2, char **lasts); There is a second strings library which has a few more functions, but is less common. #include int strcasecmp(const char *s1, const char *s2); /*case insensive strcmp*/ int strncasecmp(const char *s1, const char *s2, int n); The C++ string class has methods for many of these operations and more: string::size() /* string length */ string::empty() /* is it empty */ string::c_str() /* return a C string, so the functions above apply */ string::operator [](size_type i) /* access the ith character */ string::append(s) /* append to string */ string::erase(n,m) /* delete a run of characters */ string::insert(size_type n, const string&s) /* insert string s at position n */ string::find(s) string::rfind(s) /* search left or right for the given string */ string::first() string::last() /* get characters, also there are iterators */ Overloaded operators exist for concatenation and string comparison. Assigned Problems: 175 -- Look for patterns in book titles. Eliminate non-alphabetial characters. 272 -- Replace " by `` or '' as appropriate. The right library functions should make very short work of it. 306 -- Encode strings through a permutation cipher. Treat strings as arrays of characters. 333 -- Implement the checksum scheme used in ISBN numbers, but eliminate non-digits from consideration. 335 -- assign IP routing addresses according to a priority scheme, maintaining which machines are up and down. 385 -- Write a "gene finding" program to identify protein sequences in DNA sequences. Use array initializations to set up the basic coding table. Problems for Discussion ----------------------- Design of Checksum Functions Why does ISBN use such a complicated checksum method? Why don't they just add up digits and take mod 11? (Hint: think about common types of errors: transpositions and substitutions) Cycles in Permutations Permutations can be thought of as rearrangement operators as well as combinatorial objects. The permutation {3,1,4,2} can be thought of as taking the first element to the third position, bumping the third element to the fourth position, bumping the fourth element into the second position, and finally the second element into the first position, completing a cycle. These rearrangements form a set of disjoint cycles for any permutation. {3,1,4,2} has one cycle, while {2,1,4,3} has two cycles and {1,2,3,4} has four cycles. Understanding the cycle structure of permutations gives us a way to think about many problems. Analyzing the generalized clock patience of problem 170 with 52 piles asked for the probability that a permutation has exactly one cycle. Question: for the code of problem 306 (Cipher), for what combinations of permutation and repeat counts do you get the input text string back as the encrypted text? Gene Recognition Why is gene recognition harder to do in practice than problem 385?