Regression is a set of methods for estimating the relationship between an outcome variable and a set of features.

When our input consists of $d$ features, we express our prediction $\hat{y}$ as $$\hat{y} = w_1 x_1 + \ldots + w_d x_d + b$$

Collecting all features into a vector $\mathbf{x} \in \mathbb{R}^d$ and all weights into a vector $\mathbf{w} \in \mathbb{R}^d$, we can express our model compactly using a dot product: $$ \hat{y} = \mathbf{w}^T\mathbf{x} + b$$

The vector $\mathbf{x}$ corresponds to the features of a single data example. To represent the whole dataset we use the design matrix $\mathbf{X} \in \mathbb{R}^{n\times d}$, which contains one row for every example and one column for every feature. The predictions $\mathbf{\hat{y}} \in \mathbb{R}^n$ can then be expressed as a matrix-vector product, with the scalar bias $b$ added to every entry by broadcasting: $$ \mathbf{\hat{y}} = \mathbf{Xw} + b$$
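As a quick illustration of the matrix form, here is a minimal sketch; the tensors, shapes, and values are made up for illustration and are not the dataset used later.

import tensorflow as tf

# Toy design matrix with n = 3 examples and d = 2 features (illustrative values).
X_toy = tf.constant([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
w_toy = tf.constant([[2.0], [-3.4]])  # d x 1 weight vector
b_toy = 4.2                           # scalar bias

# Predictions for all examples at once; the scalar bias is broadcast to every row.
y_hat_toy = tf.matmul(X_toy, w_toy) + b_toy   # shape (3, 1)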

The loss function quantifies the distance between the real and the predicted value of the target. For a single example we use the squared error $(y - \hat{y})^2$; summing over the whole dataset, and absorbing the bias $b$ into $\mathbf{w}$ by appending a column of ones to $\mathbf{X}$, the loss becomes $$ \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2 $$
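Continuing the toy example above (a sketch with a made-up target vector y_toy), the squared loss over all examples can be computed as:

# Hypothetical targets for the three toy examples.
y_toy = tf.constant([[1.0], [0.0], [-1.0]])

# Sum of squared residuals between targets and predictions.
loss = tf.reduce_sum(tf.square(y_toy - y_hat_toy))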

We need to find the $\mathbf{w}$ that minimizes the loss. Because the loss is quadratic in $\mathbf{w}$, the minimizer has a closed-form analytic solution: $$ \mathbf{w}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$ The code below generates a synthetic dataset and verifies this formula numerically.

import tensorflow as tf


def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    # Draw features from a standard normal distribution.
    X = tf.random.normal(shape=(num_examples, w.shape[0]))
    # Compute labels and add Gaussian noise with a small standard deviation.
    y = tf.matmul(X, tf.reshape(w, (-1, 1))) + b
    y += tf.random.normal(shape=y.shape, stddev=0.01)
    return X, y

true_w = tf.constant([2, -3.4])
true_b = 4.2
X, y = synthetic_data(true_w, true_b, 1000)

def solution(X, y):
    """Closed-form least-squares estimate of [w; b]."""
    # Append a column of ones so the bias is estimated as the last coefficient.
    X = tf.concat([X, tf.ones(y.shape)], 1)
    # (X^T X)^{-1} X^T y
    A = tf.linalg.inv(tf.matmul(tf.transpose(X), X))
    B = tf.matmul(tf.transpose(X), y)
    return tf.matmul(A, B)

solution(X, y)
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[ 1.9995857],
       [-3.4000173],
       [ 4.2007246]], dtype=float32)>