Regression is a set of methods for estimating the relationship between an outcome variable and one or more features.
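When our input consists of $d$ features, we express our prediction $\hat{y}$ as a weighted sum of the features $x_1, \ldots, x_d$ with weights $w_1, \ldots, w_d$:
$$\hat{y} = w_1 x_1 + \cdots + w_d x_d.$$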
Collecting all features into a vector $\mathbf{x} \in \mathbb{R}^d$ and all weights into a vector $\mathbf{w} \in \mathbb{R}^d$, we can express our model compactly using a dot product:
$$\hat{y} = \mathbf{w}^\top \mathbf{x}.$$
The vector $\mathbf{x}$ corresponds to the features of a single data example. To represent the whole dataset of $n$ examples we use $\mathbf{X} \in \mathbb{R}^{n\times d}$. Here $\mathbf{X}$ contains one row for every example and one column for every feature. The predictions $\mathbf{\hat{y}} \in \mathbb{R}^n$ can be expressed as a matrix-vector product:
$$\mathbf{\hat{y}} = \mathbf{X}\mathbf{w}.$$
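To make the shapes concrete, here is a minimal NumPy sketch of both views of the prediction. It assumes a linear model with no separate bias term; the toy values of `X` and `w` are made up for illustration and do not come from any dataset.

```python
import numpy as np

# Hypothetical toy data: n = 3 examples (rows), d = 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Hypothetical weight vector with one weight per feature.
w = np.array([0.5, -1.0])

# Prediction for a single example: dot product of w with one row of X.
y_hat_first = np.dot(w, X[0])

# Predictions for the whole dataset: one matrix-vector product, shape (n,).
y_hat = X @ w

print(y_hat_first)
print(y_hat)
```

The matrix form computes the same dot product for every row at once, so `y_hat[0]` matches `y_hat_first`.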
The loss function quantifies the distance between the real and predicted values of the target.
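As an illustration, the sketch below uses the mean squared error, a common choice for regression. This specific loss is an assumption, since the text above does not name one, and the function name `mean_squared_error` and the sample values are hypothetical.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Hypothetical real and predicted target values for three examples.
y_true = np.array([3.0, -1.0, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])

loss = mean_squared_error(y_true, y_pred)
print(loss)
```

The loss is zero only when every prediction equals its target, and it grows quadratically with the size of each error.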
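We need to find the $\mathbf{w}$ that minimizes the loss function. For the squared-error loss $\|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$, where $\mathbf{y} \in \mathbb{R}^n$ holds the real target values, and provided $\mathbf{X}^\top \mathbf{X}$ is invertible, the analytic solution is:
$$\mathbf{w}^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}.$$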
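A minimal NumPy sketch of the closed-form least-squares solution follows. It assumes a squared-error loss, no separate bias term, and an invertible $\mathbf{X}^\top \mathbf{X}$. The synthetic data, the seed, and `w_true` are made up for illustration; the targets are noiseless so the weights can be recovered up to floating-point error.

```python
import numpy as np

# Hypothetical synthetic dataset: n = 100 examples, d = 3 features.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))

# Hypothetical ground-truth weights used to generate noiseless targets.
w_true = np.array([2.0, -3.4, 0.5])
y = X @ w_true

# Solve the normal equations (X^T X) w = X^T y.
# np.linalg.solve avoids forming the explicit inverse, which is
# usually more accurate and cheaper than calling np.linalg.inv.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Alternative: a least-squares solver that also handles the case
# where X^T X is singular or badly conditioned.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_star)
print(w_lstsq)
```

Both approaches should agree on well-conditioned data. In practice `np.linalg.lstsq` tends to be the safer default, because it does not require $\mathbf{X}^\top \mathbf{X}$ to be invertible.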