Linear regression
This example begins with linear regression in the real domain and then builds up to show how linear problems can be thought of in the complex domain.
The Wikipedia entry is a useful starting point for those unfamiliar with linear regression.
Begin the example by importing packages/functions and switching off the logger to avoid superfluous messages.
from loguru import logger
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from regressioninc.linear.models import add_intercept, OLS
from regressioninc.testing.complex import ComplexGrid
logger.remove()
One of the most straightforward linear problems to understand is the equation of a line. Let’s look at a line with gradient 3 and intercept -2.
The regressand follows the equation of a line:
\[y = mx + c\]
where \(m = 3\) and \(c = -2\), with the regressor running from -5 to 4 in steps of 1.
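A minimal sketch of how such a line could be generated with NumPy is shown here. It assumes the regressors are stored as a single column, as the later `X[:, 0]` indexing suggests; the names `params` and `intercept` are borrowed from the complex-valued examples further down and are an assumption at this point.

```python
import numpy as np

# Assumed layout: one regressor stored as a column, shape (10, 1)
X = np.arange(-5, 5).reshape(-1, 1)

# Gradient m = 3 and intercept c = -2, as stated in the text
params = np.array([3])
intercept = -2

# y = m * x + c for every sample
y = np.matmul(X, params) + intercept

print("Regressors X", X[:, 0])
print("Regressand y", y)
```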
Let’s have a quick look at the values, noting that there is only a single regressor, which is in the first column (index 0) of X. The variable X holds the values of one or more regressors, so it is referred to in the plural as the regressors even when, as here, there is only one.
Regressors X [-5 -4 -3 -2 -1 0 1 2 3 4]
Regressand y [-17 -14 -11 -8 -5 -2 1 4 7 10]
And now plot the relationship between the regressors X and regressand y.
fig = plt.figure()
plt.scatter(X[:, 0], y)
plt.xlabel("Independent variable")
plt.ylabel("Dependent variable")
plt.tight_layout()
fig.show()

When performing linear regression, the aim is to calculate the parameters (also called coefficients), given the regressors X (values of the independent variable) and the regressand y (values of the dependent variable).
The most common method of linear regression is least squares, which estimates the parameters by minimising the sum of squared misfits between the observed regressands and the regressands predicted from the estimated parameters.
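Written out, and using symbols that are not in the original text (\(\beta\) for the vector of parameters, \(x_i\) for the \(i\)-th row of X and \(n\) for the number of samples), the least squares estimate can be sketched as:

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left| y_i - x_i \beta \right|^2
```

The absolute value is used so that the same expression also covers complex-valued data, where it denotes the modulus.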
Before solving, let’s add a column of 1s to our regressors X. This is to make sure a constant intercept is also solved for. The output below shows X transposed, so the column of 1s appears as the second row.
[[-5 -4 -3 -2 -1 0 1 2 3 4]
[ 1 1 1 1 1 1 1 1 1 1]]
Now use ordinary least squares to estimate the parameters.
model = OLS()
model.fit(X, y)
print(model.estimate.params)
[ 3. -2.]
Least squares correctly recovered the slope and intercept for the real-valued regression problem. Let’s look at the predicted regressands using the estimated parameters.
[-17. -14. -11. -8. -5. -2. 1. 4. 7. 10.]
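As a cross-check that does not depend on RegressionInC, the same fit can be sketched with `numpy.linalg.lstsq`. This is not a description of how `OLS` is implemented, which the text does not say. It only assumes that the intercept column of 1s is appended after the regressor, which matches the order of the printed parameters.

```python
import numpy as np

x = np.arange(-5, 5)
y = 3 * x - 2

# Design matrix: regressor in column 0, column of 1s for the intercept
X = np.column_stack([x, np.ones_like(x)])

# Least squares solution minimising the squared misfit between y and X @ params
params, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Predicted regressands from the estimated parameters
y_pred = X @ params

print(params)
print(y_pred)
```

Because the data are noise free, the estimated parameters and predictions should match the true values up to floating point rounding.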
It is also possible to have linear problems in the complex domain. These commonly occur in signal processing. Let’s define parameters and regressors X and generate the corresponding regressand y for an example problem. For the time being, there is no intercept.
Let’s have a quick look at the values, again noting that there is only a single regressor.
Regressors X [1.+1.j 2.+1.j 3.+1.j 1.+2.j 2.+2.j 3.+2.j]
Regressand y [-1. +5.j 1. +8.j 3.+11.j -4. +7.j -2.+10.j 0.+13.j]
It is a bit harder to visualise the complex-valued version, but let’s try to visualise the regressors X and regressand y.
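The printed values are consistent with a single complex parameter of 2 + 3j, since for example (1 + 1j)(2 + 3j) = -1 + 5j. The parameter value is inferred from the output rather than stated in the text, so the sketch below is an assumption about how such data could be generated.

```python
import numpy as np

# Six complex regressors in a single column, shape (6, 1)
X = np.array([1 + 1j, 2 + 1j, 3 + 1j, 1 + 2j, 2 + 2j, 3 + 2j]).reshape(-1, 1)

# Parameter inferred from the printed regressand (an assumption)
params = np.array([2 + 3j])

# No intercept: y is the regressor multiplied by the parameter
y = np.matmul(X, params)

print("Regressors X", X[:, 0])
print("Regressand y", y)
```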
fig, axs = plt.subplots(nrows=1, ncols=2)
plt.sca(axs[0])
plt.scatter(X.real, X.imag, c="tab:blue")
plt.xlim(X.real.min() - 3, X.real.max() + 3)
plt.ylim(X.imag.min() - 3, X.imag.max() + 3)
plt.title("Regressors X")
plt.sca(axs[1])
plt.scatter(y.real, y.imag, c="tab:red")
plt.xlim(y.real.min() - 3, y.real.max() + 3)
plt.ylim(y.imag.min() - 3, y.imag.max() + 3)
plt.title("Regressand y")
plt.show()

Visualising the regressors X and the regressand y this way gives a geometric picture of the linear problem in the complex domain. Multiplying the regressors by the parameters can be thought of as a scaling and a rotation of the independent variable to give the dependent variable y.
With more samples, this is easier to visualise. In the example below, the regressors and the regressand are generated again, this time with more samples. To start with, the parameter is a real number to demonstrate the scaling without any rotation. Both the regressors and regressand are plotted on the same axes, with lines showing the mapping between independent and dependent values.
grid = ComplexGrid(r1=0, r2=10, nr=11, i1=-5, i2=5, ni=11)
X = grid.flat_grid()
params = np.array([0.5])
y = np.matmul(X, params)
fig = plt.figure()
for iobs in range(y.size):
    # Join each regressor to the regressand it maps to
    plt.plot(
        [y[iobs].real, X[iobs, 0].real],
        [y[iobs].imag, X[iobs, 0].imag],
        color="k",
        lw=0.5,
    )
plt.scatter(X.real, X.imag, c="tab:blue", label="Regressor")
plt.scatter(y.real, y.imag, c="tab:red", label="Regressand")
# Switch the grid on once; calling plt.grid() twice with no arguments toggles it off again
plt.grid(True)
plt.legend()
plt.title("Complex regression")
plt.show()

Now let’s add an imaginary component to the parameter (coefficient) to demonstrate the rotational aspect.
params = np.array([0.5 + 2j])
y = np.matmul(X, params)
fig = plt.figure()
for iobs in range(y.size):
    # Join each regressor to the regressand it maps to
    plt.plot(
        [y[iobs].real, X[iobs, 0].real],
        [y[iobs].imag, X[iobs, 0].imag],
        color="k",
        lw=0.5,
    )
plt.scatter(X.real, X.imag, c="tab:blue", label="Regressor")
plt.scatter(y.real, y.imag, c="tab:red", label="Regressand")
# Switch the grid on once; calling plt.grid() twice with no arguments toggles it off again
plt.grid(True)
plt.legend()
plt.title("Complex regression")
plt.show()
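The scaling and rotation can be read off the polar form of the parameter: multiplying by a complex number multiplies the magnitude by its modulus and adds its argument to the phase. The sketch below checks this numerically for the parameter 0.5 + 2j on a few hypothetical sample points (the points are illustrative and are not taken from the grid above).

```python
import numpy as np

param = 0.5 + 2j
scale = np.abs(param)       # modulus: about 2.06
rotation = np.angle(param)  # argument: about 1.33 rad (roughly 76 degrees)

# Hypothetical sample regressors, chosen away from the origin
x = np.array([1 + 0j, 2 + 1j, 3 - 2j, 0 + 4j])
y = x * param

# Magnitudes are scaled by the modulus of the parameter
print(np.abs(y) / np.abs(x))

# Phases are shifted by the argument; taking the ratio y / x avoids wrap-around
print(np.angle(y / x))
```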

Finally, adding an intercept gives a translation.
params = np.array([0.5 + 2j])
intercept = 20 + 20j
y = np.matmul(X, params) + intercept
fig = plt.figure()
for iobs in range(y.size):
    # Join each regressor to the regressand it maps to
    plt.plot(
        [y[iobs].real, X[iobs, 0].real],
        [y[iobs].imag, X[iobs, 0].imag],
        color="k",
        lw=0.3,
    )
plt.scatter(X.real, X.imag, c="tab:blue", label="Regressor")
plt.scatter(y.real, y.imag, c="tab:red", label="Regressand")
# Switch the grid on once; calling plt.grid() twice with no arguments toggles it off again
plt.grid(True)
plt.legend()
plt.title("Complex regression")
plt.show()

As with the real-valued problem, linear regression can be used to estimate the values of the parameters for the complex-valued problem. Again, least squares is one of the most common methods. Not all least squares implementations support complex data, though some do, such as the least squares routines in SciPy. The focus of RegressionInC is to provide regression methods for complex-valued data.
Note that adding an intercept column to X allows the intercept to be solved for. RegressionInC does not automatically solve for the intercept; if one is desired, an intercept column needs to be added to the regressors X, as in the real-valued example at the top of the page.
X = add_intercept(X)
model = OLS()
model.fit(X, y)
print(model.estimate.params)
[ 0.5 +2.j 20. +20.j]
Finally, let’s compare the actual regressand y to the predicted regressand calculated from the regressors X and the estimated parameters.
     Regressor X  parameter   intercept  regressand y  predicted y
0       0.0-5.0j   0.5+2.0j  20.0+20.0j    30.0+17.5j   30.0+17.5j
1       0.0-4.0j   0.5+2.0j  20.0+20.0j    28.0+18.0j   28.0+18.0j
2       0.0-3.0j   0.5+2.0j  20.0+20.0j    26.0+18.5j   26.0+18.5j
3       0.0-2.0j   0.5+2.0j  20.0+20.0j    24.0+19.0j   24.0+19.0j
4       0.0-1.0j   0.5+2.0j  20.0+20.0j    22.0+19.5j   22.0+19.5j
..           ...        ...         ...           ...          ...
116    10.0+1.0j   0.5+2.0j  20.0+20.0j    23.0+40.5j   23.0+40.5j
117    10.0+2.0j   0.5+2.0j  20.0+20.0j    21.0+41.0j   21.0+41.0j
118    10.0+3.0j   0.5+2.0j  20.0+20.0j    19.0+41.5j   19.0+41.5j
119    10.0+4.0j   0.5+2.0j  20.0+20.0j    17.0+42.0j   17.0+42.0j
120    10.0+5.0j   0.5+2.0j  20.0+20.0j    15.0+42.5j   15.0+42.5j
[121 rows x 5 columns]
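The same estimate can be cross-checked with `numpy.linalg.lstsq`, which accepts complex input. The grid construction below is a plain NumPy stand-in for `ComplexGrid`; its ordering is inferred from the table above and is an assumption, as is appending the intercept column last.

```python
import numpy as np

# Stand-in for the 11 x 11 complex grid: real part 0..10, imaginary part -5..5
real = np.linspace(0, 10, 11)
imag = np.linspace(-5, 5, 11)
x = (real[:, None] + 1j * imag[None, :]).ravel()

# Regressand from the known parameter and intercept
y = x * (0.5 + 2j) + (20 + 20j)

# Design matrix with a trailing column of 1s for the intercept
X = np.column_stack([x, np.ones_like(x)])

# Complex least squares fit and the resulting predictions
params, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
y_pred = X @ params

print(params)
print(np.abs(y - y_pred).max())
```

With noise-free data the largest misfit should be at the level of floating point rounding, in line with the matching regressand and predicted columns in the table.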