Errata List for "The Data Science Design Manual" by Steven Skiena Last addition: October 30, 2018 Non-trivial errata are denoted with a (*). I thank Prof. Robert Piche of the Tampere University of Technology for a particularly extensive list of corrections. ============================================================= Page vi reducto -> reductio Page 1 comes challenges -> come challenges Page 1 have lead -> have led Page 3 There exist an important class -> There exists an important class Page 5 line -14 disinterested -> uninterested Page 7 zero to ten stars scale -> one to ten starts scale Page 12, line 10: "FOA" should be "FOIA" Page 15, line 11: "characters" may be better written as "options". Page 16, line 10: (age and martial status) should be (age and marital status). Page 26 each face baring -> each face bearing Page 28, 83, 198, 208, 229, 230, 257, 261, 279, 360, 361, 400, 403, 413 alternate -> alternative Page 29 a French nobleman -> a French writer [“chevalier de Méré” was the pseudonym of Antoine Gombaud, who was not a nobleman] Page 31, 32, 150, 153, 354, 356 Bayes theorem -> Bayes’ theorem (*) Page 31, line -12: P(B) = 8/36, not 9/36. This error propagates through the next two corrections. (*) Page 31, line -11: Should be $P(A)=27/36$ and $P(B|A) = 8/27$. (*) Page 31 ,line -1: Should be $P(B|A)= (1 \cdot 8/36)/(27/36) = 8/27$. (*) Page 32, Figure 2.2: The CDF's jumps should occur at integer-valued abscissas. (*) Page 33 always get higher -> can only get higher (*) Page 33, line -7: The equation for $P(k = X)$ should be: $P(k=X) = C'(k) = C(X \leq k ) - C(X \leq k - \delta)$ Page 36, 109, 128, 287, 400 Alternately -> Alternatively Page 37 there is no real phenomena -> there are no real phenomena Page 40 degree to which $Y$ is a function of $X$ -> degree to which $Y$ is a linear function of $X$ Page 40 social economic status -> socioeconomic status Page 43 $R^2$ -> $r^2$ Page 45 $V(Y)$ -> $V(y)$ Page 45, line 13: In the formula 1-r^2 = V(r)/V(y), the variable r is being used for two different things. The first one is r-square and the second one is residual error. Page 49 does terrible -> does terribly Page 58 statistical methods, which -> statistical methods which Page 62 LaTex -> LaTeX Page 65: "The Freedom of Information Act (FOI) enables" should be "The Freedom of Information Act (FOIA) enables" (*) Page 79 exponentially with $k$ -> exponentially with $k^2$ Page 82, line 12: "then wrote then wrote" should be "then wrote" (*) Page 82, line 14: Delete the sentence: "The median score among my students was closer than any single guess". First it was meant to read "any other guess". But actually 2350 is closer. (*) Page 101, line -9: "92.5th percentile" should be "97.7th percentile" (*) Page 103 unlikely to not make -> unlikely to make Page 115 y and z, prefer -> y and z prefer Page 116 Issac -> Isaac Page 118 a through introduction -> a thorough introduction Page 119 et. al. -> et al. Page 119, problem 4-1: "Suppose we observe X = 5.08." X should be lower case italic x Page 121, 205, 289, 297, 299, 300, 435: Baysian -> Bayesian Page 122 on the look out -> on the lookout (*) Page 124 the most likely number of failures will be zero -> the most likely number of failures will be one [ Prob(#fail=0)=0.3677, Prob(#fail=1)= 0.3681] Page 124 the number of hits per season are drawn -> the number of hits per season is drawn Page 124, 125, 139, 373, 406 phenomenon -> phenomena Page 125 are free to arbitrary real numbers -> are free to be arbitrary real numbers Page 127 the average value of the distribution -> the mean of the distribution (*) Page 128, line -2, $\lambda$ should be $\mu$. Page 132 -- inequality of the world -> inequities of the world Page 132, Section 5.2 Line 2: "it is pays" should be "it pays" (*) Page 132, last line, The equation for $P(k = X)$ should be: $P(k=X) = C'(k) = C(X \leq k ) - C(X \leq k - \delta)$ Page 134 Monte Carlo sampling -> acceptance-rejection sampling Page 136 $1,000 bucks -> 1,000 bucks (*) Page 137, line 15: The order of effect sizes is reversed: it should be "small effects 85\% overlap, medium effects 67\% overlap, and large effects 53\% overlap". Page 138 weighted 166.2 pounds -> weighed 166.2 pounds Page 138 if I what I observed -> if what I observed Page 138 The number of samples are large enough -> The number of samples is large enough (*) Page 144, line 26: (1-.0672) should be (1-.672) Page 152, line -11:: "person will weight over" should be "person will weigh over" Page 154 Each of friends -> Each friend Page 156 changing in the way -> changing the way Page 160 peak at the dot plots -> peek at the dot plots Page 172 Thus grouping -> Grouping Page 176 we are faced -> we are faced with Page 177 \emph{Oh G-d} -> \emph{Oh G-d!} Page 185 when each bin a non-trivial number of element -> when each bin has a non-trivial number of elements Page 187 The cdf is monotonically increasing -> The cdf is monotonically nondecreasing Page 188 Nobel gases -> noble gases [2 occurrences] Page 191 to the handle of a single water pump. They changed the handle -> to a single public water pump. The authorities disabled the pump by removing its handle (*) Page 195 There are few potential downsides -> There are a few potential downsides Page 195 Knobs and features get added because they can, -> Knobs and features get added because they can be, Page 197 Stronger friendships -> Stronger associations Page 200 problems 6-11 - "Describe good practices in data visualization?" Should end with a period instead of question mark Page 200, line -9: Problem 6-13: "How would you to determine" should be "How would you determine" Page 202 a 13th-century theologian -> a 14th-century theologian (*) Page 203, line 7: "Models that do much better on testing data than training data are overfit" should be "Models that do much better on training data than testing data are overfit". Page 203 There are always a range of possible outcomes -> There is always a range of possible outcomes Page 212 reduce the performance of a model -> reduce the performance assessment of a model (*) Page 219, line 2: "false positive and false negative rates" should be "true positive and false positive rates" (**) Page 219, line 3: The true positive rate is just recall, i.e., TP / (TP + FN). The false positive rate is FP / (FP + TN). (**) Page 219, paragraphs 2 and 3: This is a mess, because I reversed left and left and right.. line 5: the sweep should go from right to left. line 8: "very left" should be "very right" line 10: "as far to the right" should be "as far as the left" line 15: "sweep our threshhold to the right" should be "sweep our threshhold to the left" Page 220, line -5: "Ours in a hard task" should be "Ours is a hard task" (*) Page 221, line 3: "none are classified as 1800" should be "almost none (1\%) are classified as 1800" Page 221, line 14: "Sparse" rows and columns in the confusion matrix refers to raw counts. Figure 7.7 has been normalized so the counts are indeterminable. Page 222 adding an constant offset -> adding a constant offset Page 225 Upon taking position of a data set -> Upon taking possession of a data set Page 225: "Observe the the evaluation" should be "Observe the evaluation" Page 236 back testing -> backtesting Page 236, problem 7-16: "following types of betable events" betable should be spelled bettable Page 238, line 8: Add last plus to get$y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_{m-1} x_{m-1}$(*) Page 239, line 6: The left variable of the equation should be$w$, not$c$. (*) Page 239, line 7: "Ax = b" should be "Aw = b". Page 243, Figure 8.4 caption: the transposition is in the center, not on the right as printed. (*) Page 243, line 1: The transpose of a matrix sort of “rotates” it by 180 degrees -> Transposing a matrix flips it about the main diagonal [ transposition is not a rotation, Lincoln^T parts his hair differently from Lincoln ] (*) Page 243 The matrix product$X Y^T$of these two vectors produces a$1 \times 1$matrix -> The matrix product$X^T Y$of these two column-vectors produces a$1 \times 1$matrix (*) Page 243$C_{ij}= \sum_{i=1}^k A_{ik}\cdot B_{kj}$->$C_{ij}= \sum_{l=1}^k A_{il} B_{lj}$(*) Page 243 For any pair of non-square matrices A and B, at most one of either AB or BA has compatible dimensions. -> [remove the statement; a counterexample is B=A^T]. (*) Page 244, line 12: Change 4 to 3 in lower-left entry of leftmost matrix: $$\Bigg( \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ \end{bmatrix} \Bigg)$$ (*) Page 248-249 We say$A^{−1}$is the multiplicative inverse of matrix$A$if$A \cdot A^{−1} = I$-> We say$A^{−1}$is the multiplicative inverse of square matrix$A$if$A A^{−1} = I$, from which it follows that$A^{−1}A=I$. (*) Page 249 such matrices are not invertible or \emph{singular}, -> such matrices are not invertible, they are \emph{singular}, Page 249 & 264 matricies -> matrices Page 250$C \cdot x$->$C X$(*) Page 251 there is not enough information to solve an undetermined system of linear equations -> there is not enough information to determine a unique solution of an undetermined system of linear equations (*) Page 257 Section 8.5, line 3:$\lambda_i \geq \lambda_{i−i} $for all$i$->$\lambda_i \geq \lambda_{i−1} $for all$i>1$(*) Page 259, Section 8.5.1, lines 15 and 16: In both the equation and the line following,$B_k^T$should be$B_k$. (*) Page 259 Should$M$contain convex numbers -> Should$M$contain complex numbers Page 264 8-10 "Ax=b?" Should end with a period instead of question mark (*) Page 264 Problem 8.17 Show that the eigenvalues of$MM^T$are the same as that of$M^T M$-> Show that for any square matrix$M$, the eigenvalues of$MM^T$are the same as those of$M^T M$Page 265 What are the size of the numerical residuals -> What are the sizes of the numerical residuals Page 268 verses -> versus Page 269 a singular point -> a single point (*) Page 270 is the projection of$y_i − f(X)$down to$X$-> is$y_i-f(x_i)$(*) Page 271, line 20: In the right hand side of the equation for$w_1$, the$(sigma_x/sigma_y)$part should be flipped to$(sigma_y/sigma_x)$. (*) Page 273 it is a linear function of its non-linear input values -> it is a linear function of the non-linear functions of input values Page 275, line -4: between 0 and 12 + 4 + 5 = 19 years, 19 --> 21 Page 280 the error function is shaped sort of like a parabola -> the error function is shaped like a parabola Page 280 has exactly one local minima -> has exactly one local minimum [and many other occurrences where plural “minima” is used in place of singular “minimum” ] Page 281 maxima or minima -> maximum or minimum Page 281 & 284 optima -> optimum (*) Page 283 Figure 9.9 add "t := t+1” inside the loop (*) Page 283, lines 3,4: there are errors in the equation, which is twice missing partial derivative symbols. 2 --> \342\210\202 ($\partial$) (*) Page 285, line 9: there are errors in the equation, which is twice missing partial derivative symbols. 2 --> âˆ‚ ($\partial$) All partial derivatives should not have a 2 in the numerator. The 1/2 factor cancels from the squared error term when differentiating. (*) Page 287 line 2 w_j^2 -> w_{j-1}^2 (*) Page 287 dependent variables -> independent variables (*) Page 287 Let$\Gamma$be our$n \times n$-> Let$\Gamma$be our$m \times m$(*) Page 287 || \lambda \Gamma w ||^2 -> \lambda || \Gamma w ||^2 Page 287 & 288 criteria -> criterion (*) Page 293 y_i is positive, i.e. y_i=1 -> y_i is negative, i.e. y_i=1 (*) Page 293, line -3: punish it aggressively when$f(y_i) \rightarrow 0$,$y_i$-->$x_i$(*) Page 294, line -1: there is an error in the summation symbol and square bracket. [ \sum --> \sum [ Page 294 Figure 9.16 caption: positive (blue) and negative (right) -> positive (blue) and negative (red) Page 301 the apples residual -> the apple's residual Page 301 the set of variables are -> the set of variables is (*) Page 308, line 25: 0.03 should be$0.03^2$Page 311 represents fair dice roll -> represents a fair dice roll Page 311 does not satisfies -> does not satisfy (*) Page 317 restricting to vectors instead of points. Recall that sets of d-dimensional vectors can be thought of as points -> restricting to unit-length vectors instead of points. Recall that d-dimensional unit-length vectors can be thought of as points Page 318 ”Thus we can compute the exact probability…” : change all instances of “plane” to “line” in this paragraph Page 319 in a particular organisms -> in a particular organism Page 319, line 6: LSH wants similar items to recieve the exact same hash code recieve --> receive (*) Page 320 node-ink diagrams -> node-link diagrams Page 321 the the adjacency matrix -> the adjacency matrix Page 321, line 7: like eigenvalue or singular value decomposition (see Section 8.5) on the the the the --> the (duplicate) Page 323, line -4: people do not see that a distance or similarity matrix is really just a graph than can take advantage of other tools. "than" -> "that" Page 323 dense graphs have a quadratic number of edges -> dense graphs have a number of edges that grows quadratically with the number of vertices Page 325, line 21: The Pagerank formula has a long dash in out-degree, should be \text{out-degree} Page 328, line -5: records in a (say) a hundred distinct subsets each ordered by similarity in a --> into Page 328, line -2: more accurate on this restricted class of items then a general model trained over all items. then --> than Page 332, line -3: sometimes called the k-mediods algorithm mediod --> medoid Page 333, line 17: k-means can proceed by reading the distances off this matrix. off --> off of Page 335 number of true sources -> the number of true sources Page 336 cultural/ethic groupings -> cultural/ethnic groupings Page 336, Figure 10.17 -- this figure is much too low resolution and blurry. Sorry. Page 338 Identifying the right position -> Identifying the correct position Page 339 Figure 10.20 on right -> on the right Page 339 is tree drawn -> is the tree drawn Page 340 this is the criteria -> this is the criterion Page 340, line -11: point in each cluster (medioid) as a representative in the general case medioid --> medoid Page 340, line -4, "... with using the fastest algorithm ," extra space between 'algorithm' and ',' Page 341, line -12: where I asked asked how many clusters asked asked --> asked (duplicate) Page 342 reflecting the similar of -> reflecting the similarity of Page 345 much better soon as -> much better as soon as (*) Page 346 Problem 10-4: angular distance is a distance metric -> angular distance is a distance metric for unit-length vectors Page 346 Problem 10.11 (a): "Might is possible this classifier" should be "Might it be possible that this classifier" Page 349 10-31: How can you ... of the data. Should end with a question mark instead of period Page 353 training test -> training set Page 354 P(X) -> p(X) [2 occurrences] Page 354, first equation (line 13):$P(B)$should be$p(B)$. Page 357 the frequency all outcomes -> the frequency of all outcomes Page 358 faction of the examples -> fraction of the examples Page 360 the complete set of observations of$x_i$are -> the complete set of observations of$x_i$is Page 360 This criteria -> This criterion Page 360, line 4:$t \in (10.5, 11.5, 17)$might be clearer as$t \in (10.5, 12.5, 17)$, since (11 + 14) / 2 = 12.5 Page 363 equality dictates -> fairness dictates (*) Page 365$C_1$if$x_i ≥ t_i$->$C_1$if$x_i < t_i$Page 367 This a natural objective -> This is a natural objective Page 369 The optimization algorithm of efficient solvers -> The optimization algorithms of efficient solvers (*) Page 369, line 6: repeated equations, one should be equal to -1 the other to 1 (*) Page 371, line -8: This take home lesson was badly botched. It should read: "Kernal fuctions are what gives SVMs their power. They project$nd$-dimensional points to$n$dimensions, while somehow still taking no more than$d\$ steps to manipulate each point." Page 373, paragraph 4, line 5: "... and F[i, j] reflects ... times work i ..." work -> word Page 373 represent genuine phenomenon -> represent genuine phenomena Page 375 a large corpora -> a large corpus Page 376 and, if so, replace them -> and, if there are, replace them Page 377 comes down to quality of -> comes down to the quality of Page 378 the full weight of these networks are -> the full weight of these networks is Page 378 split out -> spit out Page 378 The value actually passed to y is the -> The value actually passed to y is (*) Page 382 The ReLU function remains differentiable -> The ReLU function remains differentiable except at x=0 Page 384 the number of documents in corpus -> the number of documents in the corpus Page 386 training text be that would -> training text that would Page 389 a large text corpora -> a large text corpus Page 389 11-14 What are some ... from traditional machine learning No punctuation at end, should have question mark (*) Page 403 second display equation, V(a) = -> \frac{n-1}{n} V(A) = [the RHS is equal to the “divide by n” formula of sample variance, not the “divide by n-1” formula] Page 406 the chances of such phenomenon -> the chances of such a phenomenon Page 413 a largest piece that is often substantially higher -> a largest piece whose size is often substantially greater Page 419 line 7, Chapter notes: Rajarman -> Rajaraman Page 424 Be aware -> Beware