
Solving Systems of Linear Equations

Introduction

A linear system of equations is a collection of two or more linear equations involving the same set of variables. The solution to a linear system is the set of values for the variables that simultaneously satisfy all equations in the system.

For example, this is a linear system of equations with two variables, \(x\) and \(y\): \(\begin{cases} 2x + 3y = 5 \\ 4x - y = 1 \end{cases}\)

A general linear system of equations can be written as:

\[\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \end{cases}\]

This system can be viewed as a matrix equation \(\mathbf{A} \mathbf{x} = \mathbf{b}\), where:

\[\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}\]
For a better understanding, we urge you to read about vectors and matrices in linear algebra.

In this form, the linear system can be interpreted as a linear combination of vectors. Each row of the matrix \(\mathbf{A}\) represents the coefficients of a linear equation, and the vector \(\mathbf{x}\) contains the variables. The right-hand side vector \(\mathbf{b}\) represents the constants. Solving the system involves finding the vector \(\mathbf{x}\) such that when the matrix \(\mathbf{A}\) is multiplied by \(\mathbf{x}\), the result is the vector \(\mathbf{b}\).

Existence and Uniqueness of Solution

For linear systems, the existence and uniqueness of solutions are well-defined and can be analyzed using linear algebra techniques.

  • The system \( Ax = b \) has a solution if and only if the augmented matrix \([A | b]\) has the same rank as the coefficient matrix \(A\).
  • If \(\text{rank}(A) = \text{rank}([A | b])\), the system is consistent (has at least one solution).
  • If \(\text{rank}(A) = \text{rank}([A | b]) = n\) (where \(n\) is the number of variables), the system has a unique solution.
  • If \(\text{rank}(A) = \text{rank}([A | b]) < n\), the system has infinitely many solutions.
  • If \(\text{rank}(A) < \text{rank}([A | b])\), the system is inconsistent (has no solutions).
  • For a square matrix \(A\), if \(\det(A) \ne 0\) (i.e., \(A\) is invertible), the system has a unique solution.
  • If \(\det(A) = 0\), the system may have no solutions or infinitely many solutions.
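These rank conditions can be checked numerically. Below is a minimal sketch using numpy.linalg.matrix_rank; the matrix and right-hand side are our own illustrative values:

import numpy as np

A = np.array([[2., 3.], [4., 6.]])   # singular: the second row is twice the first
b = np.array([5., 10.])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))

n = A.shape[1]   # number of variables
if rank_A < rank_Ab:
    print("inconsistent: no solution")
elif rank_A == n:
    print("consistent: unique solution")
else:
    print("consistent: infinitely many solutions")   # printed for this example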

Matrix Inversion

A naive solution to a linear system of equations \(\mathbf{A} \mathbf{x} = \mathbf{b}\) is to find the inverse of the matrix \(\mathbf{A}\) (denoted as \(\mathbf{A}^{-1}\)) and then multiply it with the vector \(\mathbf{b}\) to obtain the solution vector \(\mathbf{x}\). Mathematically, this can be expressed as:

\[\mathbf{x} = \mathbf{A}^{-1} \mathbf{b}\]

This approach requires that the matrix \(\mathbf{A}\) is square (i.e., it has the same number of rows and columns) and invertible (i.e., its determinant is non-zero). The process involves two main steps:

  1. Compute the inverse of \(\mathbf{A}\): \(\mathbf{A}^{-1}\).
  2. Multiply \(\mathbf{A}^{-1}\) by the vector \(\mathbf{b}\).

Here's the matrix equation and its solution in detail:

\[\mathbf{A} \mathbf{x} = \mathbf{b} \quad \Rightarrow \quad \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}\]

It is important to note that this method is not always practical: it does not apply when \(\mathbf{A}\) is not square, and for large systems computing the inverse is computationally expensive. In such cases, other numerical methods are preferred.

To compute the inverse of a matrix in Python, the following example might be helpful:

import numpy as np
from scipy import linalg

# Square, invertible matrix
a = np.array([[1., 2.], [3., 4.]])

# Compute its inverse with scipy.linalg.inv
inverse_of_a = linalg.inv(a)
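In practice, it is usually better to solve the system directly than to form the inverse explicitly; scipy.linalg.solve does this and is generally faster and more accurate. A short sketch, with the right-hand side b chosen for illustration:

import numpy as np
from scipy import linalg

a = np.array([[1., 2.], [3., 4.]])
b = np.array([5., 6.])

# Solves a @ x = b directly, without ever forming the inverse of a
x = linalg.solve(a, b)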

Triangular matrix

A triangular matrix is a special type of square matrix where all the entries above or below the main diagonal are zero.

Types of Triangular Matrices:

  1. Upper Triangular Matrix: All the entries below the main diagonal are zero. \(U = \left[\begin{array}{cccc} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{array}\right]\)

  2. Lower Triangular Matrix: All the entries above the main diagonal are zero. \(L = \left[\begin{array}{cccc} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{array}\right]\)

Solving a system of linear equations \(A\mathbf{x} = \mathbf{b}\) is significantly simplified when \(A\) is a triangular matrix because the equations can be solved sequentially.

For an upper triangular matrix \(U\), the system can be solved using back substitution:

Given: \(U\mathbf{x} = \mathbf{b}\)

  1. Start with the last equation \(u_{nn}x_n = b_n\) to solve for \(x_n\): \(x_n = \frac{b_n}{u_{nn}}\)

  2. Substitute \(x_n\) into the second-to-last equation to solve for \(x_{n-1}\): \(u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n = b_{n-1}\) \(x_{n-1} = \frac{b_{n-1} - u_{n-1,n}x_n}{u_{n-1,n-1}}\)

  3. Continue this process upwards to solve for \(x_{n-2}, x_{n-3}, \ldots, x_1\).

Thus, when \(A\) is a triangular matrix, the system \(A\mathbf{x} = \mathbf{b}\) can be efficiently solved by back substitution (for upper triangular matrices) or forward substitution (for lower triangular matrices).
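To make the procedure concrete, here is a minimal back-substitution routine; the function name and test values are our own:

import numpy as np

def back_substitution(U, b):
    # Solve U x = b for an upper triangular U, working upwards from the last row
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # x_i = (b_i - sum_{j > i} u_ij x_j) / u_ii
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2., 3.], [0., -1.]])
b = np.array([8., -2.])
x = back_substitution(U, b)   # [1., 2.]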

If your matrix is triangular, you can compute the solution of the linear system using solve_triangular:

import numpy as np
from scipy.linalg import solve_triangular

# Lower triangular coefficient matrix and right-hand side
a = np.array([[3, 0, 0, 0], [2, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]])
b = np.array([4, 2, 4, 2])

# Forward substitution, row by row from the top
x = solve_triangular(a, b, lower=True)

Many numerical methods aim to transform a linear system into an equivalent system with a triangular coefficient matrix. This triangular form allows for efficient solution using back substitution.

Gaussian Elimination

Gaussian elimination is a method used to solve systems of linear equations. It transforms a system into an upper triangular matrix form, which can then be easily solved by back substitution. Here’s a step-by-step breakdown of how the Gaussian elimination algorithm works:

  1. Represent the System as an Augmented Matrix:

    Consider a system of linear equations:

    \[\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \\ \end{cases}\]

    This can be written as an augmented matrix:

    \[\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right]\]
  2. Forward Elimination

    The goal of forward elimination is to transform the augmented matrix into an upper triangular form.

    Step 2.1: Partial Pivoting (Optional but Recommended). To improve numerical stability, swap rows so that the largest (by absolute value) element in the column is placed in the pivot position.

    Step 2.2: Eliminate Entries Below the Pivot:

    For each row \(i\) (starting from the first row):

    • Identify the pivot element, \(a_{ii}\).
    • For each subsequent row \(j > i\):
      • Compute the factor \(f = \frac{a_{ji}}{a_{ii}}\).
      • Update row \(j\): \(\text{Row}_j \leftarrow \text{Row}_j - f \times \text{Row}_i\).

    This process zeroes out the elements below the pivot.

  3. Back Substitution:

    Once the matrix is in upper triangular form, back substitution is used to find the values of the variables, as in the sketch following this list.
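Putting the three steps together, a compact (unoptimized) sketch of Gaussian elimination with partial pivoting might look as follows; the function name is our own:

import numpy as np

def gaussian_elimination(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)

    # Forward elimination with partial pivoting
    for i in range(n):
        # Swap the row holding the largest pivot candidate into position i
        p = i + np.argmax(np.abs(A[i:, i]))
        A[[i, p]], b[[i, p]] = A[[p, i]], b[[p, i]]
        for j in range(i + 1, n):
            f = A[j, i] / A[i, i]
            A[j, i:] -= f * A[i, i:]
            b[j] -= f * b[i]

    # Back substitution on the resulting upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

x = gaussian_elimination(np.array([[2., 3.], [4., 5.]]), np.array([8., 14.]))   # [1., 2.]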

Example

Consider the system of equations:

\[\begin{cases} 2x + 3y = 8 \\ 4x + 5y = 14 \end{cases}\]
  1. Augmented Matrix:

    \[\left[\begin{array}{cc|c} 2 & 3 & 8 \\ 4 & 5 & 14 \end{array}\right]\]
  2. Forward Elimination:
    • Pivot element: 2 (first row, first column).
    • Eliminate element in the second row, first column: \(f = \frac{4}{2} = 2\). Update second row: \(\text{Row}_2 = \text{Row}_2 - 2 \times \text{Row}_1 = [4, 5, 14] - 2 \times [2, 3, 8] = [0, -1, -2]\)

    The matrix becomes: \(\left[\begin{array}{cc|c} 2 & 3 & 8 \\ 0 & -1 & -2 \end{array}\right]\)

  3. Back Substitution:
    • From the second row: \(-y = -2 \Rightarrow y = 2\).
    • Substitute \(y = 2\) into the first row: \(2x + 3(2) = 8 \Rightarrow 2x + 6 = 8 \Rightarrow 2x = 2 \Rightarrow x = 1\)

The solution is \(x = 1\), \(y = 2\).

Gaussian elimination has a time complexity of \(O(n^3)\), where \(n\) is the number of equations. For very large systems, this can be computationally expensive.

LU Decomposition

LU decomposition is a method of decomposing a matrix \(A\) into the product of a lower triangular matrix \(L\) and an upper triangular matrix \(U\).

Given a matrix \(A\), decompose it into \(A = LU\), where:

  • \(L\) is a lower triangular matrix with ones on the diagonal.
  • \(U\) is an upper triangular matrix.
Read more about LU decomposition here.

Once decomposed, the solution of the linear system can be calculated as follows:

For \(Ax = b\):

  1. Decompose \(A\) into \(LU\).
  2. Solve \(Ly = b\) for \(y\) using forward substitution.
  3. Solve \(Ux = y\) for \(x\) using back substitution.
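With SciPy, this factor-then-solve pattern is available directly: lu_factor computes the decomposition (with partial pivoting) once, and lu_solve reuses it for any right-hand side. The values repeat the Gaussian elimination example above:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2., 3.], [4., 5.]])
b = np.array([8., 14.])

# Factor A once (internally A = P L U, with partial pivoting)
lu, piv = lu_factor(A)

# Forward substitution for L y = b, then back substitution for U x = y
x = lu_solve((lu, piv), b)   # [1., 2.]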

Cholesky decomposition

The Cholesky decomposition of a Hermitian positive-definite matrix A is a decomposition of the form

\[A = L \times L^*\]

where L is a lower triangular matrix with real and positive diagonal entries, and \(L^*\) denotes the conjugate transpose of L. Every Hermitian positive-definite matrix (and thus also every real-valued symmetric positive-definite matrix) has a unique Cholesky decomposition.

A closely related variant of the classical Cholesky decomposition is the LDL decomposition,

\[A = L \times D \times L^*\]

where L is a lower unit triangular (unitriangular) matrix, and D is a diagonal matrix. That is, the diagonal elements of L are required to be 1 at the cost of introducing an additional diagonal matrix D in the decomposition.

Read more about Cholesky decomposition here.

If A is symmetric and positive definite, then \(A x = b\) can be solved by first computing the Cholesky decomposition \(A = L \times L^*\), then solving \(L y = b\) for y by forward substitution, and finally solving \(L^* x = y\) for x by back substitution.

To compute the Cholesky decomposition of a Hermitian positive-definite matrix, use the function cholesky from scipy.linalg:

import numpy as np
from scipy.linalg import cholesky

# Hermitian positive-definite matrix
a = np.array([[1, -2j], [2j, 5]])

# Lower triangular factor L with a = L @ L.conj().T
L = cholesky(a, lower=True)
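Once the factor is available, the two triangular solves described above can be performed in one call with cho_factor and cho_solve from scipy.linalg; the matrix below is our own illustrative example:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

a = np.array([[4., 2.], [2., 3.]])   # symmetric positive definite
b = np.array([6., 5.])

c, low = cho_factor(a)       # Cholesky factorization of a
x = cho_solve((c, low), b)   # solves a @ x = b, here x = [1., 1.]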

For linear systems that can be put into symmetric form, the Cholesky decomposition (or its LDL variant) is the method of choice because of its superior efficiency and numerical stability. Compared to the LU decomposition, it is roughly twice as efficient.

QR decomposition

If A is not square, we can use QR decomposition. It is a general decomposition that applies to both square and non-square matrices. QR decomposition, also known as QR factorization or QU factorization, is a decomposition of a matrix A into a product \(A = QR\) of an orthogonal matrix Q and an upper triangular matrix R.

There are several methods for actually computing the QR decomposition, such as the Gram-Schmidt process or Householder transformations.

Read more about QR decomposition here.

Steps to Solve \(Ax = b\):

  1. QR Factorization of \(A^T\):

    First, compute the full QR factorization of the transpose of \(A\): \(A^T = QR\), where \(Q\) is an \(n \times n\) orthogonal matrix (so \(Q^T Q = I\)) and \(R\) is an \(n \times m\) matrix. Here \(A\) is assumed to be an \(m \times n\) matrix with \(m < n\) and full row rank.

  2. Split \(R\):

    \(R\) takes the form \(R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}\), where \(R_1\) is an \(m \times m\) upper triangular matrix, and \(0\) is an \((n-m) \times m\) zero matrix.

  3. Solve using QR factorization:

    To find \(x\), use the formula derived from the QR factorization: \(x = Q \begin{bmatrix} (R_1^T)^{-1} b \\ 0 \end{bmatrix}\) Here, \((R_1^T)^{-1}\) denotes the inverse of the transpose of \(R_1\).

We may compute \((R_1^T)^{-1} b\) either by Gaussian elimination or, since \(R_1^T\) is lower triangular, directly by forward substitution.
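Here is a sketch of these steps with scipy.linalg, assuming an underdetermined system with full row rank; the matrices are our own illustrative values:

import numpy as np
from scipy.linalg import qr, solve_triangular

# m = 2 equations, n = 3 unknowns
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
b = np.array([6., 15.])
m, n = A.shape

# Full QR factorization of A^T: Q is n x n, R is n x m
Q, R = qr(A.T)
R1 = R[:m, :]   # the m x m upper triangular block

# R1^T is lower triangular, so forward substitution applies
y = solve_triangular(R1.T, b, lower=True)

# x = Q [y; 0] solves A x = b (it is the minimum-norm solution)
x = Q @ np.concatenate([y, np.zeros(n - m)])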

Iterative methods

Iterative algorithms for solving \(Ax = b\) are employed when methods like Gaussian elimination are too time-consuming or require excessive memory. Unlike direct methods, iterative methods do not typically provide an exact solution after a finite number of steps; instead, they reduce the error incrementally with each step.

The convergence rate of iterative methods is influenced by the spectral properties of the coefficient matrix. Therefore, transforming the linear system into an equivalent one with more favorable spectral properties can be beneficial. A preconditioner is a matrix used to perform such a transformation.

For example, if a matrix \(M\) approximates the coefficient matrix \(A\) in some way, the transformed system

\[M^{-1} A x = M^{-1} b\]

has the same solution as the original system \(Ax = b\), but the spectral properties of its coefficient matrix \(M^{-1} A\) may be more favorable.

This part was highly inspired by this blog.

Jacobi preconditioner

The Jacobi preconditioner \(M\) is typically a diagonal matrix constructed from the diagonal elements of \(A\). Specifically, \(M=diag(A)\), where \(diag(A)\) denotes the diagonal elements of \(A\).
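A small sketch of the resulting transformed system (values chosen for illustration); since M is diagonal, applying \(M^{-1}\) just divides each equation by its diagonal coefficient, so this preconditioner is essentially free to apply:

import numpy as np

A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])

# Jacobi preconditioner: M = diag(A)
M = np.diag(np.diag(A))

# Transformed system M^{-1} A x = M^{-1} b: same solution, rescaled rows
A_pre = np.linalg.solve(M, A)
b_pre = np.linalg.solve(M, b)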

Gauss-Seidel Method

The Gauss-Seidel method for solving \(n\) linear equations begins with an initial approximation \(X_0\) to the solution, which does not need to be highly accurate. The method iteratively refines this approximation until convergence, where convergence is determined by \(\| x^{(m+1)} - x^{(m)} \|_\infty\) being less than \(\epsilon\).

Algorithm:

  1. Set \(x = X_0\)
  2. Repeat
    • Set \(maxdiff = 0\)
    • For \(i = 1\) to \(n\) do
      • Calculate \(y = (b_i - \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij} x_j ) / a_{ii}\)
      • If \(|y - x_i| > maxdiff\) then
        • Set \(maxdiff = |y - x_i|\)
      • Update \(x_i = y\)
    • End for
  3. Until \(maxdiff < \epsilon\)

\(x\) is the refined solution.
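Translated into Python, the algorithm above might look like this; the function name and stopping parameters are our own:

import numpy as np

def gauss_seidel(A, b, x0, eps=1e-10, max_iter=1000):
    x = x0.astype(float).copy()
    n = len(b)
    for _ in range(max_iter):
        maxdiff = 0.0
        for i in range(n):
            # y = (b_i - sum over j != i of a_ij x_j) / a_ii,
            # using entries of x that were already updated this sweep
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            y = (b[i] - s) / A[i, i]
            maxdiff = max(maxdiff, abs(y - x[i]))
            x[i] = y
        if maxdiff < eps:
            break
    return x

A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])
x = gauss_seidel(A, b, np.zeros(2))   # converges: A is diagonally dominant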

This section is based on the book Theory and Applications of Numerical Analysis by G.M. Phillips and P.J. Taylor (page 289).

Read more about Gauss-Seidel here.

Computing the solution of a system of linear equations or the decomposition of a matrix can be done effectively using the SciPy library, more specifically scipy.linalg.