Error and overfitting

Error

The difference between a learner's prediction on a sample and the sample's true value is called the error. Definitions:

  • Training Error or Empirical Error: Error in the training set

  • Test Error: Error in the test set

  • Generalization Error: The learner's error on all new samples, which in practice is estimated by the test error (see the sketch after this list)

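To make these definitions concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (the data, model, and mean-squared-error metric are illustrative choices, not from the text). The error on the training split is the empirical error; the error on a held-out test split serves as an estimate of the generalization error.

```python
# Minimal sketch: training (empirical) error vs. test error.
# Dataset and model are illustrative assumptions, not from the text.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Hold out part of the data; error on this unseen split
# estimates the generalization error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

train_error = mean_squared_error(y_train, model.predict(X_train))  # empirical error
test_error = mean_squared_error(y_test, model.predict(X_test))     # generalization estimate

print(f"training error: {train_error:.4f}")
print(f"test error:     {test_error:.4f}")
```
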
Overfitting

Clearly, we want learners that perform well on new samples, i.e., that have small generalization error. A learner should therefore learn characteristics from the training set that are as general as possible, so that it discriminates correctly when it encounters new samples.

However, a learner may learn the training set so well that it takes characteristics peculiar to the training samples as general features; conversely, its learning capacity may be insufficient to capture even the basic characteristics of the training set. Definitions:

  • Overfitting: Learning the training samples so thoroughly that characteristics specific to them are mistaken for general properties

  • Underfitting: The learner is too weak to capture even the general properties of the training samples

In overfitting, the training error is very small but the test error is large; in underfitting, both the training error and the test error are large. Underfitting is relatively easy to overcome, for example by increasing the number of training iterations, but there is still no truly good solution to overfitting, which remains a key obstacle in machine learning. The sketch below illustrates both behaviors.
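
The following is a minimal sketch of this contrast, again assuming scikit-learn; the polynomial degrees and synthetic data are illustrative assumptions. A low-degree model typically underfits (both errors large), while a high-degree model typically overfits (tiny training error, large test error).

```python
# Minimal sketch: underfitting vs. overfitting with polynomial
# regression of increasing degree. Data and degrees are illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(30, 1))
y_train = np.sin(X_train).ravel() + rng.normal(scale=0.3, size=30)
X_test = rng.uniform(-3, 3, size=(100, 1))
y_test = np.sin(X_test).ravel() + rng.normal(scale=0.3, size=100)

for degree in (1, 4, 15):  # low degree underfits, high degree overfits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train error {train_err:.4f}, test error {test_err:.4f}")
```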

