Type 'q()' to quit R.

> # Small example to illustrate rpart() on the kyphosis data
>
>
> library("rpart")
> set.seed(784)
> kyphosis[sample(nrow(kyphosis), 4), ]
   Kyphosis Age Number Start
8    absent  37      3    16
43   absent 143      9     3
52   absent   9      2    17
66   absent  17      4    10
>
> fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")
>
> fit
n= 81

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 81 17 absent (0.79012346 0.20987654)
   2) Start>=8.5 62  6 absent (0.90322581 0.09677419)
     4) Start>=14.5 29  0 absent (1.00000000 0.00000000) *
     5) Start< 14.5 33  6 absent (0.81818182 0.18181818)
      10) Age< 55 12  0 absent (1.00000000 0.00000000) *
      11) Age>=55 21  6 absent (0.71428571 0.28571429)
        22) Age>=111 14  2 absent (0.85714286 0.14285714) *
        23) Age< 111 7  3 present (0.42857143 0.57142857) *
   3) Start< 8.5 19  8 present (0.42105263 0.57894737) *
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ ., data = kyphosis, method = "class")

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.88235 0.20565
3 0.010000      4   0.76471 0.94118 0.21078
> summary(fit)
Call:
rpart(formula = Kyphosis ~ ., data = kyphosis, method = "class")
  n= 81

          CP nsplit rel error    xerror      xstd
1 0.17647059      0 1.0000000 1.0000000 0.2155872
2 0.01960784      1 0.8235294 0.8823529 0.2056488
3 0.01000000      4 0.7647059 0.9411765 0.2107780

Variable importance
 Start    Age Number
    64     24     12

Node number 1: 81 observations,    complexity param=0.1764706
  predicted class=absent   expected loss=0.2098765  P(node) =1
    class counts:    64    17
   probabilities: 0.790 0.210
  left son=2 (62 obs) right son=3 (19 obs)
  Primary splits:
      Start  < 8.5  to the right, improve=6.762330, (0 missing)
      Number < 5.5  to the left,  improve=2.866795, (0 missing)
      Age    < 39.5 to the left,  improve=2.250212, (0 missing)
  Surrogate splits:
      Number < 6.5  to the left,  agree=0.802, adj=0.158, (0 split)

Node number 2: 62 observations,    complexity param=0.01960784
  predicted class=absent   expected loss=0.09677419  P(node) =0.7654321
    class counts:    56     6
   probabilities: 0.903 0.097
  left son=4 (29 obs) right son=5 (33 obs)
  Primary splits:
      Start  < 14.5 to the right, improve=1.0205280, (0 missing)
      Age    < 55   to the left,  improve=0.6848635, (0 missing)
      Number < 4.5  to the left,  improve=0.2975332, (0 missing)
  Surrogate splits:
      Number < 3.5  to the left,  agree=0.645, adj=0.241, (0 split)
      Age    < 16   to the left,  agree=0.597, adj=0.138, (0 split)

Node number 3: 19 observations
  predicted class=present  expected loss=0.4210526  P(node) =0.2345679
    class counts:     8    11
   probabilities: 0.421 0.579

Node number 4: 29 observations
  predicted class=absent   expected loss=0  P(node) =0.3580247
    class counts:    29     0
   probabilities: 1.000 0.000

Node number 5: 33 observations,    complexity param=0.01960784
  predicted class=absent   expected loss=0.1818182  P(node) =0.4074074
    class counts:    27     6
   probabilities: 0.818 0.182
  left son=10 (12 obs) right son=11 (21 obs)
  Primary splits:
      Age    < 55   to the left,  improve=1.2467530, (0 missing)
      Start  < 12.5 to the right, improve=0.2887701, (0 missing)
      Number < 3.5  to the right, improve=0.1753247, (0 missing)
  Surrogate splits:
      Start  < 9.5  to the left,  agree=0.758, adj=0.333, (0 split)
      Number < 5.5  to the right, agree=0.697, adj=0.167, (0 split)

Node number 10: 12 observations
  predicted class=absent   expected loss=0  P(node) =0.1481481
    class counts:    12     0
   probabilities: 1.000 0.000

Node number 11: 21 observations,    complexity param=0.01960784
  predicted class=absent   expected loss=0.2857143  P(node) =0.2592593
    class counts:    15     6
   probabilities: 0.714 0.286
  left son=22 (14 obs) right son=23 (7 obs)
  Primary splits:
      Age    < 111  to the right, improve=1.71428600, (0 missing)
      Start  < 12.5 to the right, improve=0.79365080, (0 missing)
      Number < 3.5  to the right, improve=0.07142857, (0 missing)

Node number 22: 14 observations
  predicted class=absent   expected loss=0.1428571  P(node) =0.1728395
    class counts:    12     2
   probabilities: 0.857 0.143

Node number 23: 7 observations
  predicted class=present  expected loss=0.4285714  P(node) =0.08641975
    class counts:     3     4
   probabilities: 0.429 0.571

> names(fit)
 [1] "frame"               "where"               "call"
 [4] "terms"               "cptable"             "method"
 [7] "parms"               "control"             "functions"
[10] "numresp"             "splits"              "variable.importance"
[13] "y"                   "ordered"
>
>
>
> if(FALSE) {
+   postscript("kyphosis.eps",
+              horiz = FALSE, onefile = FALSE, print.it = FALSE,
+              width = 9, height = 5)
+ } else {
+   pdf("kyphosis.pdf",
+       onefile = FALSE,
+       width = 9, height = 5)
+ }
>
> par(mar = c(5, 4, 2, 2) + 0.8, mfrow = c(1, 1), xpd = TRUE)
> plot(fit)
> text(fit)
> dev.off()
null device
          1
>
> pfit <- predict(fit, kyphosis, type = "class")
> tab <- table(with(kyphosis, Kyphosis), pfit)
>
> # Error checking: the table must be square
> if(!all(dimnames(tab)[[1]] == dimnames(tab)[[2]]))
+   stop("not all dimnames are the same")
>
> # Obtain the (resubstitution) misclassification rate
> 1 - sum(diag(tab)) / sum(tab)
[1] 0.1604938
> # Other ways of obtaining the (resubstitution) misclassification rate
> rev(fit$cptable[,"rel error"])[1] * rev(fit$parms$prior)[1]
        3
0.1604938
> tail(fit$cptable[,"rel error"], 1) * tail(fit$parms$prior, 1)
        2
0.1604938
>
>
>
>
> proc.time()
   user  system elapsed
  0.190   0.027   0.198
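
The two "other ways" in the session multiply the final relative error by the prior of the last class. That prior equals the root node error here only because `present`, the minority class, happens to be the last factor level. Below is a minimal sketch of a variant that reads the root node error from the fitted tree instead. It assumes the default priors and loss matrix and no case weights; the names `root_err`, `rel_err`, `resub`, and `direct` are introduced here for illustration and are not part of the original script.

```r
library("rpart")

fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")

# Root node error: misclassified cases at the root divided by n.
# For method = "class" with default settings, frame$dev holds the
# number of misclassified cases in each node (assumption: no weights).
root_err <- fit$frame$dev[1] / fit$frame$n[1]

# Relative error of the final (largest) tree in the CP table
rel_err <- fit$cptable[nrow(fit$cptable), "rel error"]

# Resubstitution misclassification rate = rel error * root node error
resub <- unname(rel_err * root_err)

# Cross-check against a direct comparison of predictions and labels
direct <- mean(predict(fit, kyphosis, type = "class") != kyphosis$Kyphosis)
stopifnot(isTRUE(all.equal(resub, direct)))
```

Because `root_err` comes from `fit$frame`, the result does not depend on the order of the factor levels of `Kyphosis`.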