Bias and variance are inherent properties of estimators, so we usually have to select learning algorithms and hyperparameters that keep both bias and variance as low as possible (see Bias-variance dilemma).

Another way to reduce the variance of a model is to use more training data.
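Here is a minimal sketch of this effect, using only the Python standard library. Everything in it is an illustrative assumption rather than part of the text: the target `cos(1.5πx)` on [0, 1), Gaussian label noise with standard deviation 0.3, and a straight-line least-squares fit as the estimator. `prediction_variance` refits the model on many independent training sets of the same size and measures how much its prediction at a single point `x0` varies between them.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed target function (illustrative, not taken from the text).
    return math.cos(1.5 * math.pi * x)


def sample(n, rng, noise=0.3):
    # Draw n noisy samples (x, y) with x uniform on [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x, in closed form.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def prediction_variance(n, trials=300, x0=0.5, seed=0):
    # Refit on `trials` independent training sets of size n and measure
    # how much the prediction at x0 varies from one training set to the next.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        a, b = fit_line(*sample(n, rng))
        preds.append(a + b * x0)
    return statistics.pvariance(preds)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n={n:4d}  variance of prediction at x0: {prediction_variance(n):.5f}")
```

With this setup the variance should shrink roughly in proportion to 1/n. The bias of the straight line (it cannot follow a cosine) does not shrink with more data, which is why more data only addresses the variance part of the error.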

We see that the first estimator can at best provide only a poor fit to the samples and the true function because it is too simple (high bias), the second estimator approximates it almost perfectly, and the last estimator approximates the training data perfectly but does not fit the true function very well, i.e. it is very sensitive to varying training data (high variance).
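The three regimes just described can be reproduced in a few lines. This sketch swaps in k-nearest-neighbours regression as a stand-in for the estimators in the example, with the neighbourhood size `k` controlling complexity: `k` equal to the training-set size predicts the training mean everywhere (too simple, high bias), a moderate `k` follows the curve, and `k = 1` reproduces every training target exactly (high variance). The target function, noise level, and sample sizes are assumptions chosen for illustration.

```python
import math
import random


def true_f(x):
    # Assumed "true function" (illustrative only).
    return math.cos(1.5 * math.pi * x)


def make_data(n, rng, noise=0.3):
    # n noisy samples with x drawn uniformly from [0, 1).
    xs = [rng.random() for _ in range(n)]
    return xs, [true_f(x) + rng.gauss(0.0, noise) for x in xs]


def knn_predict(train_x, train_y, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    # Mean squared error of the k-NN model on the points (xs, ys).
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def compare(ks=(100, 10, 1), n_train=100, n_test=1000, seed=0):
    # Return {k: (training MSE, test MSE)} for each neighbourhood size k.
    rng = random.Random(seed)
    tx, ty = make_data(n_train, rng)
    vx, vy = make_data(n_test, rng)
    return {k: (mse(tx, ty, tx, ty, k), mse(tx, ty, vx, vy, k)) for k in ks}


if __name__ == "__main__":
    for k, (train_err, test_err) in compare().items():
        print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Expect a training MSE of exactly 0 for `k = 1` alongside a test MSE well above that of `k = 10`: the model fits the training data perfectly but generalizes poorly.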

An estimator's generalization error can be decomposed in terms of bias, variance and noise. The bias of an estimator is its average error for different training sets.

In the simple one-dimensional problem that we have seen in the example, it is easy to see whether the estimator suffers from bias or variance.

Note that if we optimized the hyperparameters based on a validation score, the validation score is biased and no longer a good estimate of the generalization error.

If an estimator suffers from high bias, we will probably have to use an estimator or a parametrization of the current estimator that can learn more complex concepts (i.e. one with lower bias). If the training score is much greater than the validation score for the maximum number of training samples, adding more training samples will most likely improve generalization. In the following plot you can see that the SVM could benefit from more training examples. However, you should only collect more training data if the true function is too complex to be approximated by an estimator with a lower variance.
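The decomposition of the generalization error into bias, variance and noise can be checked numerically at a single input `x0`. In this sketch (our own construction; the cosine target, Gaussian noise, and k-nearest-neighbours estimator are illustrative assumptions), the squared bias is the squared gap between the average prediction over many training sets and the true value, the variance is the spread of those predictions, and the noise is the variance of the label noise. Their sum should match the directly measured expected squared error up to Monte Carlo error.

```python
import math
import random
import statistics

NOISE_SD = 0.3  # assumed standard deviation of the label noise


def true_f(x):
    # Assumed target function (illustrative only).
    return math.cos(1.5 * math.pi * x)


def knn_predict(xs, ys, x0, k):
    # Mean target of the k training points nearest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def decompose(k, n=30, trials=3000, x0=0.5, seed=0):
    # Monte Carlo estimate of bias^2, variance, noise and expected squared error
    # of a k-NN regressor at x0, over many independent training sets of size n.
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0.0, NOISE_SD) for x in xs]
        pred = knn_predict(xs, ys, x0, k)
        y0 = true_f(x0) + rng.gauss(0.0, NOISE_SD)  # fresh noisy test label
        preds.append(pred)
        sq_errors.append((y0 - pred) ** 2)
    mean_pred = statistics.fmean(preds)
    return {
        "bias2": (mean_pred - true_f(x0)) ** 2,
        "variance": statistics.pvariance(preds, mu=mean_pred),
        "noise": NOISE_SD ** 2,
        "error": statistics.fmean(sq_errors),
    }


if __name__ == "__main__":
    for k in (1, 30):
        parts = decompose(k)
        total = parts["bias2"] + parts["variance"] + parts["noise"]
        print(k, {name: round(v, 4) for name, v in parts.items()}, "sum:", round(total, 4))
```

Comparing `decompose(1)` with `decompose(30)` (every training point averaged) shows the trade-off: the first has small bias and large variance, the second the reverse.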

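A learning curve makes the bias and variance diagnoses above concrete. This sketch (standard library only; the binned-mean regressor, target function, noise level, and sizes are all illustrative assumptions) records training and validation error on growing training sets. It reports mean squared error, where lower is better, so "training score much greater than validation score" corresponds here to training MSE much lower than validation MSE.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed target function (illustrative only).
    return math.cos(1.5 * math.pi * x)


def make_data(n, rng, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    return xs, [true_f(x) + rng.gauss(0.0, noise) for x in xs]


def fit_bins(xs, ys, n_bins):
    # Piecewise-constant regressor: predict the mean target of the x-bin,
    # falling back to the global mean for bins with no training points.
    overall = statistics.fmean(ys)
    buckets = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        buckets[min(int(x * n_bins), n_bins - 1)].append(y)
    means = [statistics.fmean(b) if b else overall for b in buckets]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def mse(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def learning_curve(n_bins, sizes=(20, 200, 2000), n_val=2000, seed=0):
    # (size, training MSE, validation MSE) for growing prefixes of one sample.
    rng = random.Random(seed)
    val_x, val_y = make_data(n_val, rng)
    pool_x, pool_y = make_data(max(sizes), rng)
    curve = []
    for n in sizes:
        model = fit_bins(pool_x[:n], pool_y[:n], n_bins)
        curve.append((n, mse(model, pool_x[:n], pool_y[:n]), mse(model, val_x, val_y)))
    return curve


if __name__ == "__main__":
    for name, n_bins in (("1 bin (high bias)", 1), ("20 bins (high variance)", 20)):
        print(name)
        for n, train_err, val_err in learning_curve(n_bins):
            print(f"  n={n:4d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

With one bin (a constant model) both errors settle close together at a high value, so more data will not help and a more flexible model is needed. With 20 bins the gap is large at 20 samples and mostly closes by 2,000, which is the case in which collecting more data pays off.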

