Poisson Equation

Previously, we encountered the following iterates while calculating diffusion

\begin{equation} \left( \textbf{I} -\nu \delta t \nabla ^2\right) \textbf{u}(t+\delta t,\textbf{x})=\textbf{u}(t,\textbf{x}) \label{eq:poisson} \tag{1} \end{equation}

and the following Laplace equation in the pressure relation

\begin{equation} \Delta p(\textbf{x})=\textbf{0} \label{eq:laplace} \tag{2} \end{equation}

Using a discrete grid and the finite difference M matrix allows us to discretize Equations \eqref{eq:poisson} and \eqref{eq:laplace} as

\begin{equation} \left( \textbf{I} -\frac{\nu \delta t}{\delta x\delta y} \textbf{M}_u\right) \textbf{u}^h(t+\delta t,\textbf{x})=\textbf{u}^h(t,\textbf{x}) \label{eq:poisson_discrete} \end{equation}

\begin{equation} \frac{1}{\delta x\delta y} \textbf{M}_p p^h(\textbf{x})=\textbf{0} \label{eq:laplace_discrete} \end{equation}

whereas $$\textbf{u}^h, p^h$$ are now $$n\cdot m\cdot k$$ vectors representing all entries of the discretized grid. In our case, $$m=\frac{\text{height}}{\delta y}, k=\frac{\text{width}}{\delta x}$$ For the pressure Laplacian $$n=1$$. For the vector Laplacian we have $$n=2$$.

[TODO: Insert image of lexicographical discrete grid assignment]

These are two separate linear systems of equations with solutions

\begin{equation} u_{i,j} ^{(d+1)} =\frac{1}{4+\frac{\delta x\delta y}{\nu \delta t}}\left(u_{i-1,j}^{(d+1)} +u_{i+1,j}+u_{i,j-1}^{(d+1)} +u_{i,j+1}^{(d+1)} +\frac{\delta x\delta y}{\nu \delta t} u^{(d)} _{i,j} \right) \label{eq:poisson_discrete_solution} \tag{3} \end{equation} and \begin{equation} p ^{(d+1)} _{i,j}=\frac{1}{4}\left(p^{(d+1)} _{i-1,j}+p^{(d+1)} _{i+1,j} +p^{(d+1)} _{i,j-1}+p^{(d+1)} _{i,j+1} -\delta x\delta y p^{(d)} _{i,j} \right) \label{eq:laplace_discrete_solution} \tag{4} \end{equation}

which can be aggregated into

\begin{equation} x ^{(d+1)} _{i,j}=\frac{x^{(d+1)} _{i-1,j}+x^{(d+1)} _{i+1,j} +x^{(d+1)} _{i,j-1}+x^{(d+1)} _{i,j+1} + \alpha x^{(d)} _{i,j}}{\gamma + \eta } \label{eq:general_solution} \tag{5} \end{equation}

This general form can be transformed into \eqref{eq:poisson_discrete_solution} when $$\alpha = \frac{\delta x\delta y}{\nu \delta t}$$ and $$\gamma=4, \eta=\alpha$$. In the same vein, $$\alpha =\delta x\delta y$$ and $$\gamma=4, \eta=0$$ turns the general form into Equation \eqref{eq:laplace_discrete_solution}. The general form is handy, as we are now able to use the same routine to solve both linear systems.

Written in matrix notation, we are solving linear systems of the form

$\textbf{A}\textbf{x}=\textbf{b}$

whereas $$\textbf{A}\in \mathbb{R}^{n\times n}$$. As our grid is usually highly discretized, $$\textbf{A}$$ is going to be very large ($$n= m\cdot k$$). In addition, because our finite difference scheme is represented by a tridiagonal toeplitz matrix, $$\textbf{A}$$ is sparse (most entries are zero). In fact, sparsity is so pronounced that

$\lim_{n->\infty}\frac{3n}{n^2} \rightarrow 0$

which lets the probability of randomly encountering a non zero entry vanish as $$n\rightarrow \infty$$. Storing the entirety of $$\textbf{A}$$ as 4 byte floating point numbers, 32 GB of GPU memory would allow us a matrix size of around $$n=89442$$. If we only stored the nonzero entries on the other hand, we could load matrices of around size $$n=2666666666$$, which 29814 times as much. A nice bonus is that computing $$\textbf{x}$$ could be sped up as most entries do not have to be considered in a computation. This is not irrelevant. Nowadays, it is feasible to assume that Equations \eqref{eq:poisson_discrete_solution} and \eqref{eq:laplace_discrete_solution} will be solved hundreds of times per second. We wish therefore for a large linear systems solver with a small memory footprint and quick computation of $$\textbf{x}$$. For math or computer science undergraduates, LU or QR decompositions probably come to mind. The idea appears sound, as they can also be used with sparse matrices. They fall under the class of direct solvers, meaning in exact arithmetic their result is precisely $$\textbf{x}$$. Unfortunately for our purposes, their scaling is unsuited for large matrices. For example, a householder reflected QR scales in the order of $$O(n^3)$$, which is unwanted. Newer approaches with better complexity exist (see exercises), so direct solvers should not immediately be written off. Much more suited for large problems are iterative solvers. These are usually algorithms that starting from some initial guess $$\textbf{x}_0$$ iteratively converge to a solution. Decome Equation \label{eq:general_solution} can be transformed into an equivalent matrix formulation that fulfills

\begin{equation} \textbf{D}\textbf{x}= \left(\textbf{L}+\textbf{U}\right)\textbf{x}+\textbf{b} \label{eq:general_solution_matrix} \tag{6} \end{equation}

whereas

\begin{equation}\textbf{D}=\begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & 1 \end{pmatrix},\end{equation}

\begin{equation}\textbf{U}_{i,j}=\left\{\begin{array}{lr} 0, & \text{if } j\leq i\\ \frac{1}{\eta+\gamma}, & \text{else} \end{array}\right. ,\end{equation}

\begin{equation}\textbf{L}_{i,j}=\left\{\begin{array}{lr} 0, & \text{if } j\geq i\\ \frac{1}{\eta+\gamma}, & \text{else} \end{array}\right. \end{equation}

\begin{equation} \mathbf{b}_i=\alpha b_i \end{equation}

It can be shown that Equation \eqref{eq:general_solution} and \eqref{eq:general_solution_matrix} describe the same linear system. An iterative algorithm can now be described by repeatedly solving

\begin{equation} \textbf{D}\textbf{x}_{new}= \left(\textbf{L}+\textbf{U}\right)\textbf{x}_{old}+\textbf{b} \label{"eq:jacobi_method"} \tag{7} \end{equation}

The convergence of this method to the true solution can be shown via Banach's fixed point theorem. As $$\mathbf{D}$$ is diagonal, solving a single iteration is simple. This iteration scheme is called the Jacobi method. It can be proven that this method converges. In order to increase the convergence speed, the Jacobi method can be augmented by a weighting factor $$\omega \in \mathbb{R}^{+}$$,

\begin{equation} \textbf{x} _{new}= \left[\left(1-\omega\right)\textbf{I}+\omega\textbf{D} ^{-1}\left(\textbf{L}+\textbf{U} ^h\right) \right]\textbf{x} _{old}+\omega (\textbf{D} ^{-1}){^h}\textbf{b} \end{equation}

which defines $$\textbf{x}_{new}$$ to be a weighted average between $$\textbf{x}_{old}$$ and Equation \eqref{eq:general_solution_matrix}.

Interestingly, setting $$\omega=1$$ is not always the best choice. The optimal value of this hyperparameter is problem dependent. More analysis and an example on how to estimate $$\omega$$ can be seen here.

The Jacobi method is already sufficient to solve our Poisson equations. In GPU Gems for example, only this method was used. But in many situations, even the weighted Jacobi method converges too slowly. This problem must be addressed.

Let $$\textbf{v}$$ be the value of the current Jacobi iterate and $$\textbf{x}$$ a solution of the linear system in consideration. Then the current error is defined as

$\textbf{x}-\textbf{v}=\textbf{e}$

and the residual as

$\textbf{A}\textbf{e}=\textbf{r} \tag{8} \label{eq:residual_equation}$

Importantly, instead of continuing to look for $$\textbf{x}$$, we can now solve the residual equation for $$\textbf{e}$$. If some conditions are met, iterating over the residual equation converges faster than continuing to iterate over the original system. If $$\textbf{e}$$ is suitably approximated, we correct our current guess for the orignal equation $$\textbf{v}$$ by setting

$\textbf{x}=\textbf{v}+\textbf{e}$

which yields the wanted solution of the original problem $$\textbf{x}$$.

$\textbf{x}=\textbf{r}+\textbf{v}$

to yield $$\textbf{x}$$. Multigrid uses this observation to great effect. If an iteration scheme gets stuck and appears to only yield diminishing returns per iteration, then a multigrid algorithm starts solving the auxillary residual problem Equation \eqref{eq:residual_equation} under a different grid discretization, assuming that making the grid coarser or finer simplifies finding $$\textbf{e}$$. This assumption can be shown to be correct for many problems, as seen here. Successively coarsening or refining the grid can be done in stages. Coarsening a grid is called restriction and refining it prolongation.

Indicating an object with respect towards a specific grid size is indicated with $$^{h}$$. For example, $$\textbf{A}^h$$ indicates the original grid at discretization size $$h$$.

V-Cycle Multigrid Algorithm

Parameters: $$\textbf{A} \in \mathbb{R}^{n \times n}, \textbf{x},\textbf{b} \in \mathbb{R}^{n},\epsilon,\text{h}\in\mathbb{R}^{+}, S\in \mathbb{N}$$

Set: $$\textbf{b}^h\rightarrow\textbf{b}$$

Down Cycle
for $$i\rightarrow 1$$ to $$S$$:
$$\quad$$ Approximate $$\textbf{A}^{h} \textbf{e}^{h}=\textbf{b}^{h}$$ with initial guess $$\textbf{0}^{h}$$ to obtain $$\textbf{v}^{h}$$
$$\quad$$ $$\textbf{r}^{h}=\textbf{b} ^{h}-\textbf{A}^{h} v^{h}$$
$$\quad$$ if $$i$$ is not $$S$$:
$$\qquad$$ $$\textbf{b}_{2h}\rightarrow \text{restrict}(\textbf{r}^{h})$$
$$\qquad$$ $$h\rightarrow 2h$$

Up Cycle
for $$i\rightarrow S$$ to $$1$$:
$$\quad$$ $$h\rightarrow 0.5h$$
$$\quad$$ $$\textbf{e}^{h}\rightarrow \text{restrict}(\textbf{v}_{2h})$$
$$\quad$$ $$\textbf{v}^{h}\rightarrow \textbf{e}^{h} + \textbf{v}^{h}$$
$$\quad$$ Approximate $$\textbf{A}^{h} \textbf{r}^{h}=\textbf{b}^{h}$$ with initial guess $$\textbf{v}^{h}$$ to obtain $$\textbf{v}^{h} _{new}$$
$$\quad$$ $$\textbf{v}^{h}\rightarrow \textbf{v}^{h}_{new}$$
if $$\left( \left|\left|\textbf{A}\textbf{x}_{new}-\textbf{b} \right| \right|^2 \le \epsilon \right)$$ return $$\textbf{X} _{new}$$

For a thorough introduction into multigrid this book is recommended. In this tutorial the V-Cycle formulation is used, as experience has shown it to be well suited for rectangular Poisson problems.

Poisson's problem with varying boundary conditions simulated with our CUDA implementation. Without any kernel optimizations, for $$m\cdot k=1000000$$ grid points this simulation took 1 milliseconds per frame.