What are variance and bias?
Variance and bias are two measures we use to evaluate a model across different datasets. They always appear together, and we have to find a way to trade them off. A good model has both low variance and low bias.
In wikipedia, we have definitions of variance and bias, formally:
The bias is an error from erroneous assumptions in the learning algorithm.
The variance is an error from sensitivity to small fluctuations in the training set.
(source: https://en.wikipedia.org/wiki/Bias–variance_tradeoff)
However, I want to make them more intuitive and practical, so we can understand them through the following definitions:
Variance measures how much the model's precision fluctuates between the training/test data and real (new) data. A model with high precision in the training and test phases but low precision on real data has high variance. High variance is also called overfitting: the model fits the training data too well but fits real data poorly.
Bias measures how well the model fits the training data itself. A model with low precision even on the training data has high bias.
High bias is also called underfitting: the model fits the data poorly already in the training phase.
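To make the distinction concrete, here is a minimal sketch (assuming scikit-learn and a synthetic dataset, both illustrative choices, not part of the original discussion) that compares training and validation error to tell the two failure modes apart:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: y = sin(x) + noise (a stand-in for any real dataset).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_err = mean_squared_error(y_train, model.predict(X_train))
val_err = mean_squared_error(y_val, model.predict(X_val))

# High error already on the training set -> high bias (underfitting).
# Low training error but much higher validation error -> high variance (overfitting).
print(f"train MSE = {train_err:.3f}, validation MSE = {val_err:.3f}")
```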
Where do they come from?
Now we will explore where variance and bias come from. For notational convenience, abbreviate $h = h(x)$. First, note that for any random variable $X$, we have:
$Var(X) = E[(X - \mu)^2] = E[X^2] - E[X]^2$
Rearranging, we get: $E[X^2] = E[(X-\mu)^2] + E[X]^2$
Since the function $h$ is deterministic: $E[h] = h$
Assume the data are generated by the true function plus noise:
$y = h(x) + \epsilon$, where the noise $\epsilon$ has mean $0$ and variance $\sigma^2$
This implies: $E[y] = E[h + \epsilon] = E[h] = h$
Let $\hat{y}$ denote our estimate of $y$.
We use the expected squared error to measure the error of the estimate:
$E[(y - \hat{y})^2] = E[y^2] - 2E[y\hat{y}] + E[\hat{y}^2]$
$= Var(y) + E[y]^2 - 2E[y\hat{y}] + Var(\hat{y}) + E[\hat{y}]^2$
$= Var(y) + Var(\hat{y}) + h^2 - 2hE[\hat{y}] + E[\hat{y}]^2$ (since the noise $\epsilon$ is independent of $\hat{y}$, we have $E[y\hat{y}] = E[y]E[\hat{y}] = hE[\hat{y}]$)
$= Var(y) + Var(\hat{y}) + (h - E[\hat{y}])^2$
$= \sigma^2 + Var(\hat{y}) + bias^2$
As you can see, $Var(\hat{y})$ and $bias^2$ both depend on $\hat{y}$, our estimate of $y$.
$Var(\hat{y})$ is the variance term we set out to find.
$bias^2 = (h - E[\hat{y}])^2$ is the squared bias term we set out to find.
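The decomposition can also be checked numerically. Below is a minimal sketch in plain NumPy (the true function, noise level, polynomial degree, and test point are all assumptions made for illustration): it repeatedly draws a fresh noisy training set, refits the estimator, and measures $Var(\hat{y})$ and $bias^2$ at a fixed input $x_0$.

```python
import numpy as np

rng = np.random.RandomState(42)

def h(x):
    # The true function (unknown in practice; assumed here for the simulation).
    return np.sin(x)

sigma = 0.3       # noise standard deviation
degree = 3        # complexity of the fitted polynomial (illustrative choice)
x0 = 1.0          # fixed test input where we evaluate bias and variance
n_repeats = 2000

preds = np.empty(n_repeats)
for i in range(n_repeats):
    # Fresh noisy training set on every round.
    x_train = rng.uniform(-3, 3, size=50)
    y_train = h(x_train) + rng.normal(0, sigma, size=50)
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    preds[i] = np.polyval(coeffs, x0)       # \hat{y} at x0 for this training set

variance = preds.var()                      # Var(\hat{y})
bias_sq = (h(x0) - preds.mean()) ** 2       # (h - E[\hat{y}])^2
print(f"Var(y_hat) = {variance:.4f}, bias^2 = {bias_sq:.4f}")
print(f"sigma^2 + Var(y_hat) + bias^2 = {sigma**2 + variance + bias_sq:.4f}")
```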
How to trade them off
As we know, variance and bias occur together, and they move in opposite directions: when variance is high, bias tends to be low, and vice versa. We have to balance the two. The common approach is to find the point where the variance and bias curves intersect as model complexity varies, as the sketch below illustrates.
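One way to picture that intersection is to sweep model complexity and watch how the training and validation errors move. A minimal sketch (assuming scikit-learn pipelines and polynomial degree as the complexity axis; both are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=1)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    # Low degree: both errors stay high (bias). High degree: training error keeps
    # dropping while validation error rises again (variance). The balance sits between.
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```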
Another way is to use regularization. The common regularization techniques are Lasso and Ridge, which we can explore in a separate topic. We can also use a few practical tips to trade off variance and bias (a regularization sketch follows the list):
- Collect more training data (fixes high variance)
- Add more features (fixes high bias)
- Remove redundant features (fixes high variance)
- Add polynomial features (fixes high bias)
- Increase lambda (fixes high variance)
- Decrease lambda (fixes high bias)
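As a sketch of the lambda knob (using scikit-learn's Ridge, whose `alpha` parameter plays the role of lambda here; the degree and alpha values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(2)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=100)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=2)

# Keep the model complexity fixed (degree-9 polynomial) and vary lambda (alpha).
for alpha in (1e-3, 1e-1, 1.0, 100.0):
    model = make_pipeline(PolynomialFeatures(9), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    val_err = mean_squared_error(y_val, model.predict(X_val))
    # Too small a lambda -> overfitting (high variance); too large -> underfitting (high bias).
    print(f"lambda={alpha:7.3f}  val MSE={val_err:.3f}")
```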