In the last two articles, we introduced a variety of common evaluation methods and performance measures, so that we can select the most appropriate one for estimating a learner's test error based on the characteristics of the data set and the task at hand.
However, the test error is affected by many factors, such as the randomness of the algorithm (e.g. K-Means) and differences between test sets, so the same model can produce different results each run. Moreover, the test error is only an approximation of the generalization error, not the true generalization performance of the learner.
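To make the randomness point concrete, here is a minimal sketch (not from the article) of the K-Means example it mentions: a tiny NumPy implementation run several times on the same data with different random initializations. The function name `kmeans` and all parameters are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, seed, n_iter=10):
    """Minimal K-Means sketch (illustrative, not the article's code).
    Returns the within-cluster sum of squares (inertia) after n_iter steps."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centroid (keep the old one if its cluster is empty).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return (dists.min(axis=1) ** 2).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # same data for every run

# Same data, same algorithm, different random starts -> typically
# different local optima, hence different scores each run.
inertias = [kmeans(X, k=5, seed=s) for s in range(5)]
print([round(v, 2) for v in inertias])
```

Because each run converges to its own local optimum, a single measured score is not a reliable summary of the model; this is exactly why the comparison tests discussed below work with distributions of results rather than one number.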
So how do we compare the performance of one or more learners on the same or different test sets? This is the role of the comparison test. Finally, bias and variance are important tools for explaining a learner's generalization performance. This post continues from the previous one and focuses on comparison tests, bias, and variance.