In mathematics, a block matrix pseudoinverse is a formula for the pseudoinverse of a partitioned matrix. This is useful for decomposing or approximating many algorithms that update parameters in signal processing, which are based on the least squares method.
Consider a column-wise partitioned matrix:
$$
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}, \quad \mathbf{A} \in \mathbb{R}^{m \times n}, \quad \mathbf{B} \in \mathbb{R}^{m \times p}, \quad m \geq n + p.
$$
If the above matrix has full column rank, the Moore–Penrose inverses of it and of its transpose are
$$
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}} \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1} \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix} \left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}} \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}.
\end{aligned}
$$
This computation of the pseudoinverse requires inverting an (n + p)-square matrix and does not take advantage of the block form.
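As a concrete illustration, the following minimal NumPy sketch (an added example; the matrix sizes and data are arbitrary) computes the pseudoinverse of the partitioned matrix directly via the normal-equations formula above:

```python
import numpy as np

# Minimal sketch: the direct formula M^+ = (M^T M)^{-1} M^T with M = [A B],
# valid when M has full column rank. Sizes are arbitrary example values.
rng = np.random.default_rng(0)
m, n, p = 12, 3, 2                      # m >= n + p
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
M = np.hstack([A, B])

# Requires inverting the (n + p)-square matrix M^T M
M_pinv = np.linalg.inv(M.T @ M) @ M.T
print(np.allclose(M_pinv, np.linalg.pinv(M)))  # True (up to round-off)
```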
To reduce computational costs to n- and p-square matrix inversions and to introduce parallelism by treating the blocks separately, one derives[1]
$$
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{P}_{B}^{\perp}\mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1} \\ \mathbf{P}_{A}^{\perp}\mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\end{bmatrix} = \begin{bmatrix}\left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\end{bmatrix}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{P}_{B}^{\perp}\mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1} & \mathbf{P}_{A}^{\perp}\mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\end{bmatrix} = \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\end{bmatrix},
\end{aligned}
$$
where the orthogonal projection matrices are defined by
$$
\begin{aligned}
\mathbf{P}_{A}^{\perp} &= \mathbf{I} - \mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}, \\
\mathbf{P}_{B}^{\perp} &= \mathbf{I} - \mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}.
\end{aligned}
$$
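The decomposition can be checked numerically. The following sketch (an added illustration, not part of the original derivation) builds the projectors as defined above and compares the block formula against a direct pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 12, 3, 2                      # m >= n + p; random blocks have
A = rng.standard_normal((m, n))         # full column rank almost surely
B = rng.standard_normal((m, p))

I = np.eye(m)
# Projectors onto the orthogonal complements of range(A) and range(B)
P_A_perp = I - A @ np.linalg.inv(A.T @ A) @ A.T
P_B_perp = I - B @ np.linalg.inv(B.T @ B) @ B.T

# Block formula: stack the pseudoinverses of the projected blocks
block_pinv = np.vstack([
    np.linalg.pinv(P_B_perp @ A),
    np.linalg.pinv(P_A_perp @ B),
])

direct_pinv = np.linalg.pinv(np.hstack([A, B]))
print(np.allclose(block_pinv, direct_pinv))   # True (up to round-off)
```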
The above formulas are not necessarily valid if $\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}$ does not have full rank; for example, if $\mathbf{A} \neq \mathbf{0}$, then

$$
\begin{bmatrix}\mathbf{A} & \mathbf{A}\end{bmatrix}^{+} = \frac{1}{2} \begin{bmatrix}\mathbf{A}^{+} \\ \mathbf{A}^{+}\end{bmatrix} \neq \begin{bmatrix}\left(\mathbf{P}_{A}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{A}\right)^{+}\end{bmatrix} = \mathbf{0}.
$$
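A quick numeric check of this counterexample (an added illustration; the size of $\mathbf{A}$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))

# [A A] is rank-deficient; its pseudoinverse splits the weight equally
pinv_AA = np.linalg.pinv(np.hstack([A, A]))
half_stack = 0.5 * np.vstack([np.linalg.pinv(A), np.linalg.pinv(A)])
print(np.allclose(pinv_AA, half_stack))      # True

# The block formula would instead produce zero, since P_A_perp @ A = 0
P_A_perp = np.eye(6) - A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P_A_perp @ A, 0))          # True
```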
Application to least squares problems
Given the same matrices as above, we consider the following least squares problems, which appear as multiple-objective optimizations or constrained problems in signal processing. Based on the following results, we can implement a parallel algorithm for least squares.
Column-wise partitioning in over-determined least squares
Suppose a solution $\mathbf{x} = \begin{bmatrix}\mathbf{x}_{1} \\ \mathbf{x}_{2}\end{bmatrix}$ solves an over-determined system:

$$
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix} \begin{bmatrix}\mathbf{x}_{1} \\ \mathbf{x}_{2}\end{bmatrix} = \mathbf{d}, \quad \mathbf{d} \in \mathbb{R}^{m \times 1}.
$$
Using the block matrix pseudoinverse, we have
$$
\mathbf{x} = \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} \mathbf{d} = \begin{bmatrix}\left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\end{bmatrix} \mathbf{d}.
$$
Therefore, we have a decomposed solution:
$$
\mathbf{x}_{1} = \left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+} \mathbf{d}, \quad \mathbf{x}_{2} = \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+} \mathbf{d}.
$$
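In code, the two sub-solutions can be computed independently. A minimal sketch (illustrative sizes and data):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 20, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
d = rng.standard_normal(m)

I = np.eye(m)
P_A_perp = I - A @ np.linalg.inv(A.T @ A) @ A.T
P_B_perp = I - B @ np.linalg.inv(B.T @ B) @ B.T

# Decomposed solution: each part needs only an n- or p-square inversion
x1 = np.linalg.pinv(P_B_perp @ A) @ d
x2 = np.linalg.pinv(P_A_perp @ B) @ d

# Agrees with the joint least squares solution
x_joint, *_ = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)
print(np.allclose(np.concatenate([x1, x2]), x_joint))  # True
```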
Row-wise partitioning in under-determined least squares
Suppose a solution $\mathbf{x}$ solves an under-determined system:
$$
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix} \mathbf{x} = \begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}, \quad \mathbf{e} \in \mathbb{R}^{n \times 1}, \quad \mathbf{f} \in \mathbb{R}^{p \times 1}.
$$
The minimum-norm solution is given by
$$
\mathbf{x} = \begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} \begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}.
$$
Using the block matrix pseudoinverse, we have
$$
\mathbf{x} = \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\end{bmatrix} \begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix} = \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+} \mathbf{e} + \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+} \mathbf{f}.
$$
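As with the over-determined case, the minimum-norm solution can be assembled from two independent pieces. A minimal sketch (illustrative sizes and data):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 20, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
e = rng.standard_normal(n)
f = rng.standard_normal(p)

I = np.eye(m)
P_A_perp = I - A @ np.linalg.inv(A.T @ A) @ A.T
P_B_perp = I - B @ np.linalg.inv(B.T @ B) @ B.T

# Blockwise minimum-norm solution
x = np.linalg.pinv(A.T @ P_B_perp) @ e + np.linalg.pinv(B.T @ P_A_perp) @ f

# Agrees with the pseudoinverse of the stacked system
x_direct = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
print(np.allclose(x, x_direct))   # True
```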
Instead of $\left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}} \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}$, we need to calculate, directly or indirectly,
$$
\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}, \quad \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}.
$$
For a small, dense system, we can use singular value decomposition, QR decomposition, or Cholesky decomposition to replace the explicit matrix inversions with stable numerical routines. For a large system, we may employ iterative methods such as Krylov subspace methods.
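For instance, a Cholesky factorization of the Gram matrix avoids forming $\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}$ explicitly. The following sketch (an added illustration with arbitrary data) solves the normal equations with two triangular solves:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 8))
d = rng.standard_normal(50)

G = A.T @ A                    # Gram matrix, symmetric positive definite
L = np.linalg.cholesky(G)      # G = L L^T, L lower triangular
# Solve G x = A^T d via two triangular systems (scipy.linalg.solve_triangular
# would exploit the structure; np.linalg.solve is used here for brevity)
y = np.linalg.solve(L, A.T @ d)
x = np.linalg.solve(L.T, y)
print(np.allclose(x, np.linalg.lstsq(A, d, rcond=None)[0]))  # True
```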
Considering parallel algorithms, we can compute $\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}$ and $\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}$ in parallel, and then compute $\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}$ and $\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}$ in parallel as well.
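One possible realization of this two-stage scheme, sketched with a thread pool (an illustration, not a prescribed implementation; NumPy's BLAS calls release the GIL, so threads can run the two inversions concurrently):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def gram_inv(M):
    """Invert the Gram matrix M^T M (assumed nonsingular)."""
    return np.linalg.inv(M.T @ M)

rng = np.random.default_rng(6)
m, n, p = 200, 30, 20
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

with ThreadPoolExecutor(max_workers=2) as pool:
    # Stage 1: (A^T A)^{-1} and (B^T B)^{-1} in parallel
    inv_AtA, inv_BtB = pool.map(gram_inv, [A, B])

    # Projectors built from the stage-1 results
    I = np.eye(m)
    P_A_perp = I - A @ inv_AtA @ A.T
    P_B_perp = I - B @ inv_BtB @ B.T

    # Stage 2: (A^T P_B_perp A)^{-1} and (B^T P_A_perp B)^{-1} in parallel
    inv1, inv2 = pool.map(
        lambda MP: np.linalg.inv(MP[0].T @ MP[1] @ MP[0]),
        [(A, P_B_perp), (B, P_A_perp)],
    )
    print(inv1.shape, inv2.shape)   # (n, n) (p, p)
```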