Quickly Computing OOB Error Estimates
If you’re using caret or randomForest, it can be helpful to compute the OOB (out-of-bag) error estimate. This value is shown when you print the model output; however, as far as I can tell, it is not available as a property of the model object. If I’m wrong about this, please let me know.
print(my_model$finalModel)
##
## Call:
## randomForest(x = x, y = y, mtry = param$mtry, nTree = ..1)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 0.76%
## Confusion matrix:
## A B C D E class.error
## A 3902 3 1 0 0 0.001024066
## B 15 2636 6 1 0 0.008276900
## C 0 16 2376 4 0 0.008347245
## D 0 0 51 2200 1 0.023090586
## E 0 0 1 6 2518 0.002772277
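The OOB error estimate appears in the output as "OOB estimate of error rate: 0.76%". It is the proportion of out-of-bag predictions that are incorrect: each training observation is predicted using only the trees whose bootstrap samples did not include it, and those predictions are compared with the true labels. No separate test data is involved. Fortunately, all we need for this is the model's confusion matrix, which randomForest builds from exactly these OOB predictions.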
\begin{equation}
\begin{aligned}
\text{OOB error} &= 1 - \text{accuracy} \\
&= 1 - \frac{\text{correct predictions}}{\text{all predictions}} \\
&= 1 - \frac{3902 + 2636 + 2376 + 2200 + 2518}{3902 + 3 + 1 + 15 + 2636 + 6 + 1 + 16 + 2376 + 4 + 51 + 2200 + 1 + 1 + 6 + 2518} \\
&= 1 - \frac{13632}{13737} \\
&\approx 1 - 0.9924 \\
&\approx 0.0076
\end{aligned}
\end{equation}
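To double-check this arithmetic without a fitted model, the counts can be typed in by hand from the printed confusion matrix. This is a minimal sketch: the matrix is transcribed from the output above, and the names cm, correct, total and oob_err are illustrative only.

```r
# Confusion-matrix counts transcribed from the printed output above
# (rows = true class, columns = predicted class).
cm <- matrix(
  c(3902,    3,    1,    0,    0,
      15, 2636,    6,    1,    0,
       0,   16, 2376,    4,    0,
       0,    0,   51, 2200,    1,
       0,    0,    1,    6, 2518),
  nrow = 5, byrow = TRUE,
  dimnames = list(LETTERS[1:5], LETTERS[1:5])
)

correct <- sum(diag(cm))   # 13632 correct OOB predictions (the diagonal)
total   <- sum(cm)         # 13737 OOB predictions in all
oob_err <- 1 - correct / total

round(oob_err * 100, 2)    # [1] 0.76, i.e. 0.76%
```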
This is incredibly easy to compute in R with the sum and diag functions.
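computeOOBErrEst <- function (x)
{
  # x is a fitted randomForest object; x$confusion holds the OOB
  # confusion matrix followed by a trailing class.error column
  cm <- x$confusion
  cm <- cm[, -ncol(cm)]          # drop class.error, keep only the counts
  1 - sum(diag(cm)) / sum(cm)    # 1 - (correct predictions / all predictions)
}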
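The same pattern extends to the class.error column in the printed output, which for each true class is one minus its diagonal count divided by its row total. The sketch below uses a hypothetical helper, computeClassErr, and a hypothetical stand-in list, fake_model, in place of my_model$finalModel so that it runs on its own; it assumes only that the object has a confusion element laid out like the one printed above (counts plus a trailing class.error column).

```r
# Hypothetical helper (not part of randomForest or caret): per-class OOB
# error, i.e. the share of each true class that was misclassified.
computeClassErr <- function(x) {
  cm <- x$confusion
  cm <- cm[, -ncol(cm)]              # drop the class.error column
  1 - diag(cm) / rowSums(cm)         # 1 - (correct in row / row total)
}

# Counts transcribed from the printed output (rows = true class).
counts <- matrix(
  c(3902,    3,    1,    0,    0,
      15, 2636,    6,    1,    0,
       0,   16, 2376,    4,    0,
       0,    0,   51, 2200,    1,
       0,    0,    1,    6, 2518),
  nrow = 5, byrow = TRUE,
  dimnames = list(LETTERS[1:5], LETTERS[1:5])
)

# Hypothetical stand-in for a fitted model: only the `confusion` element
# is mimicked, with a placeholder class.error column that gets dropped.
fake_model <- list(confusion = cbind(counts, class.error = 0))

res <- computeClassErr(fake_model)
res  # should reproduce the class.error column printed above
```

Because the helper drops the last column before doing any arithmetic, the placeholder values in the stand-in's class.error column do not affect the result.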
Plugging in our final model gives us the following result.
computeOOBErrEst(my_model$finalModel)
## 0.00764359
Expressed as a percentage and rounded to two decimal places, this is the 0.76% that we saw when we first printed our model output.