# Two mode decompositions: POD and DMD

This is a short overview of the Proper Orthogonal and Dynamic/Koopman Mode Decompositions, which are commonly used in analysis of velocity fields of fluid flows. While I worked with the theoretical side of Koopman modes, I never implemented the numerical code myself; I wrote these notes up a I was teaching myself the basics of numerics of these decompositions, and consequently used the notes for two lectures. The notes are based References at the end of the post. Caveat lector: Notes may contain gross oversimplifications — the emphasis was on understanding and not on precision. I welcome your corrections and comments below. (You can always stop by my Van Vleck office if you’re in Madison to discuss any part of this).

UPDATE: I have now posted my own implementation of several algorithms for Koopman mode decompositions. [GitHub]

#### 1. Decompositions in general

Let’s assume that we are recording evolution of a dynamical quantity $u(x,y,t)$ associated with a planar fluid flow. Commonly, $u$ contains (one or more) components of the velocity field. An objective of a decomposition is to decouple the spatial and temporal variation of the flow, and write the entire evolution as a superposition of spatial modes $\phi_k$ whose change in time is governed by scalar coefficients $a_k$:
$u(x,y,t) = \sum_{k=1}^K a_k(t) \cdot \phi_k(x, y).$
Depending on the dynamics, we might be able to perform such a decomposition exactly using finitely many modes; in real and realistic flows, this is highly unlikely. Instead, we aim to achieve the decomposition only approximately, by retaining a relatively-small number of modes $K$.

As the fluid evolves, we measure the $u$ at discrete time steps $t = t_{m}, m=1,\dots, M$ and at a finitely many spatial locations $(x_n, y_n), n = 1,\dots,N$. As a result, our measurements can be stored in a $U \in \mathbb{R}^{N \times M}$, in which columns are time-snapshots of the entire field, while rows are time series of evolution of $u$ at a particular location $(x_n, y_n)$.

The snapshot at time $t_m$, stored in the column $U_{\ast,m}$, can be written as (per the aforementioned decomposition)
$U_{\ast,m} = \underbrace{ \begin{bmatrix} \dots & \begin{bmatrix} \\ \phi_k \\ \phantom{.} \end{bmatrix} & \dots \end{bmatrix}}_{\Phi \in \mathbb{C}^{N \times K}} \cdot \begin{bmatrix} \\ a_k(m) \\ \phantom{.} \end{bmatrix},$
where we can further store coefficients $a_k(m)$ in the matrix $A \in \mathbb{C}^{M\times K}$. Therefore, we seek the decomposition of the data matrix
$U \approx \Phi \cdot A^\top, \quad \Phi \in \mathbb{C}^{N\times K},\,A \in \mathbb{C}^{M\times K}$, where $(\cdot)^\top$ denotes the conjugate transpose. The goal is to make $K$ as small as possible; in particular, much smaller than either $M,N$.

Time snapshots are rearranged into vectors of data matrix. Conversely, mode vectors can be rearranged by inverse operation in order to visualize them.

Notice now that the columns of $U$ are all linear combinations of columns of $\Phi$. Therefore, we are approximating a high-rank matrix $U$ by a $K$-rank matrix $\Phi \cdot A^\top$. The manner in which this approximation is achieved is what distinguishes POD and DMD.

#### 2. Proper Orthogonal Decomposition

Given the desired number of modes $K$, Proper Orthogonal Decomposition finds $\Phi \cdot A^\top$ of rank $K$ which attains the shortest distance $\Vert U - \Phi A^\top\Vert$. (The norm is either Frobenius or 2-norm.)

One explanation comes from statistics, where we imagine that our “true” dynamics really is composed of a small number of modes $\phi_k$ and that the remaining complexity in the data is just due to Gaussian random noise. If this is truly the case, then the approximation that minimizes the 2-norm of the error, as above, recovers the “deterministic” modes, and “filters-out” the noise. In statistics, this type of an approximation is known as Karhunen-Loève decomposition.

In fluid dynamics, the 2-norm of the fluid flow is associated with the kinetic energy stored in it. In this sense, Proper Orthogonal Decomposition finds $K$ modes that store the largest possible amount of energy present in $U$.

Computation of the best lower-rank approximation in a vector space with a scalar product is a well-known problem. How do we do this on vectors? Lower rank approximations are just those that live inside one of the axial planes. If our vector is already written in terms of the axial coordinate system, then the approximation is achieved by erasing enough coordinates. How do we know which coordinates to erase (and which axial plane to project on)? We don’t: we just erase the smallest coordinates until we achieve the target rank.

Minimum lower-rank approximation $\hat v$ is obtained by erasing coordinates of $v$

Now, our problem is more complicated than the simple image above, as, in addition to coordinates, we are looking for the appropriate orthogonal basis in which to perform the “coordinate deletion” on all columns of $U$ simultaneously. Problem is solved by computing the Singular Value Decomposition of
$U = L \Sigma R^\top$
where both $L$ and $R$ are orthonormal matrices, and $\Sigma$ is a diagonal matrix containing singular values in the decreasing order.

Calculating SVD matrices as eigenvectors and eigenvalues of the symmetrized matrices $UU^\top = L \Sigma^2 L^\top$ and $U^\top U = R \Sigma^2 R^\top$ is not advised numerically, as eigenvectors are notoriously ill-conditioned. Nonetheless, standard textbooks on numerical linear algebra, e.g., by Trefethen and Bau, or Golub and Van Loan, contain detailed account of the numerical algorithm. In a nutshell, $U$ is first reduced to a bidiagonal matrix using Householder reflections (in a finitely many steps). SVDs of such matrices can be computed iteratively in a numerically stable fashion, which is achieved by, e.g., Golub-Kahan algorithm. These are not the only two steps that can be employed, but are the most common, and are implemented in, e.g., GNU Scientific Library.

Rank of the matrix is equal to the number of nonzero elements in $R$. Notice that the $L$ matrix suits our needs for the $\Phi$ matrix in POD decomposition. To reduce the rank, we just compute $\Sigma_K$ by erasing everything but largest $K$ diagonal elements in $\Sigma$ and obtain the POD decomposition:
$\Phi = L, \quad A = R \Sigma_K.$

How do we choose the truncation order $K$? Singular values can be used to compute the magnitude of error between $U$ and its POD approximation: $\Vert U - \Phi A^\top\Vert_2^2 = \sigma_{K+1}^2$, $\Vert U - \Phi A^\top\Vert_F^2 = \sum_{k=K+1}^N \sigma_{k}^2$. Since all singular values are known, we can find the point of “diminishing returns” (loosely speaking), where increasing $K$ stops reaping big returns in capturing the energy contained in $U$.

There is usually a “sweet spot” at which increasing the rank of approximation does not reduce the error by much.

In terms of singular values $\sigma_i$, 2-norm of a matrix is given by the largest singular value, while the Frobenius norm is given by the root of the sum of squares of singular values. In approximations formed as above,. Regardless of how the error is measured, Eckart–Young theorem states that POD approximations really are the best reduced-rank approximations to $U$ in either 2- or Frobenius norm.

Now that we have the matrix $\Phi$, what is it good for? Well, first, visualization. Remember, each column of the matrix is one independent component of our measured data, which only changes in magnitude, i.e., its shape stays the same (or coherent!).

Second, we might want to write the evolution equation for $u(x,y,t)$ on a component-by component basis. If $u$ is a velocity field of a fluid, its dynamics are typically described by a PDE $\partial_t u = \mathcal{L}(u)$. However, if we establish that $u(x,y,t) = \sum_{k=1}^K a_k(t)\phi_k(x,y)$, then $\partial_t u(x,y,t) = \sum_{k=1}^K \dot a_k(t)\phi_k(x,y)$. If furthermore, modes $\phi_k$ are invariants of the right-hand side, $\mathcal{L}(\phi_k) = c_k(t) \phi_k$, then the entire evolution of the dynamics reduces to just $K$ ODEs.

Unfortunately, PDEs governing fluid motion are rarely linear and therefore there is little hope that POD modes will lead to dynamical reduction to ODEs. Note however that if the PDEs possess particular symmetries, partial success is possible, and these kinds of problems generated a lot of activities in fluid dynamics throughout 1990s. For more, see Ref [1].

#### 3. Dynamic Mode/Koopman Decomposition

In fluid dynamics, velocity field is treated both as a measured quantity and as a state quantity, i.e., the one governing the evolution. In this approach, evolution of $u_m, u_{m+1},\dots$ is nonlinear. An alternative approach allows for treating $u_m$ as a sequence of projections of an infinite-dimensional state evolved by a linear operator.

Let the “true” dynamics be described by state $s$, evolving under some unknown operator $s_{m+1} = M(s_m)$.
Our measurements, e.g., velocity field, are just static evaluations of some measurement function $f$. For example, let $f$ return the velocity field at point $(x,y)$, i.e., $u(x,y, t_m) = f( s_m )$. The velocity field at the same point, but a time step later is given by
$u(x,y,t_{m+1}) = f( s_{m+1} ) = f \circ M (s_m)$.
The operator that maps $f \to f \circ M$ is the Koopman operator; it is linear, since composition is linear in the “outer” function, but it is infinite-dimensional, since the space of functions over the states $s$ is infinite-dimensional, even when the state space is finite-dimensional.

Let’s see how that affects our analysis. Assume that $( \lambda_k, a_k ), k=1,2,\dots$ are eigenpairs of the Koopman operator. Remember, $a_k(s)$ are of the same class of objects the Koopman operator works on, so they are not vectors, but particular kinds of measurement functions taking states as inputs. If the measurement functions $f$ we care about lie in the span of these eigenfunctions, they can be written as $f = \sum_{k} \phi_k a_k$, where $\phi_k$ are combination coefficients. The evolution of $f$ is then given by $\mathcal{K}^m f = \sum_{k} \phi_k \mathcal{K}^{m} a_k = \sum_{k} \phi_k \mathcal{\lambda}^{m} a_k$. If we stack the eigenfunctions in a row-vector $\underline{a}(s)$, we can write
$\mathcal{K}^m f = \Phi \Lambda^m \underline{a}(s)^\top$.
The data matrix can be consequently written using the matrix
$A = \begin{bmatrix} a_1(s_0) & a_2(s_0) & \cdots & a_k(s_0)\\ a_1(s_1) & a_2(s_1) & \cdots & a_k(s_1)\\ & \vdots & \\ a_1(s_{m-1}) & a_2(s_{m-1}) & \cdots & a_k(s_{m-1})\\ \end{bmatrix} = \begin{bmatrix} \underline{a}(s_0) \\ \underline{a}(s_0) \Lambda \\ \vdots\\ \underline{a}(s_0) \Lambda^{m-1} \end{bmatrix}$
as $U = \Phi A^\top$ and $\mathcal{K} U = \Phi (A \Lambda)^\top = \Phi \Lambda A^\top$.
In this Koopman-mode interpretation of our data matrix, modes of importance are column-vectors of $\Phi$. They provide the axis along which the state measurement eigenfunctions $a_k$ affect our measured data.

As we can only experience the action of the Koopman operator through its transformation of the matrix $U$, for our purposes, Koopman operator $\mathcal{K}$ acts as a large, inaccessible matrix. Given the data matrix as before, Koopman operator acts by shifting its columns by one:
$\mathcal{K} \begin{bmatrix} \begin{bmatrix} \\ U_{\ast,0} \\ \phantom{.} \end{bmatrix} & \begin{bmatrix} \\ U_{\ast,1} \\ \phantom{.} \end{bmatrix} & \cdots & \begin{bmatrix} \\ U_{\ast,M-1} \\ \phantom{.} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} \\ U_{\ast,1} \\ \phantom{.} \end{bmatrix} & \begin{bmatrix} \\ U_{\ast,2} \\ \phantom{.} \end{bmatrix} & \cdots & \begin{bmatrix} \\ U_{\ast,M} \\ \phantom{.} \end{bmatrix} \end{bmatrix}$
The only new column on the right hand side is the last one: if it can be written as a linear combinations of the “old” ones, at least approximately, then this evolution can be written as
$\mathcal{K} U \approx U C$
where $C$ is a matrix in the companion form:
$C = \begin{bmatrix} 0_{1 \times M-1} & \ast \\ I_{M-1 \times M-1} & \ast \end{bmatrix}$.
If this decomposition is exact, then advancing time just iterates $C$:
$\mathcal{K}^2 U = \mathcal{K} U C = U C^2$.

We want to compute the matrix $\Phi$ from an approximate equality obtained by putting eigendecomposition derived earlier and the companion matrix together:
$\Phi \Lambda A^\top \approx \mathcal K U \approx U C.$
If $C$ can be diagonalized, it can be written as $C = V^{-1} \Lambda V$, then the equality becomes
$\Phi \Lambda A^\top = U V^{-1} \Sigma V,$
where both $\Lambda$ (eigenvalues of $\mathcal{K}$) and $\Sigma$ (eigenvalues of $C$) are diagonal. But can they be made equal? The answer is yes!

For an eigenpair $(\sigma, v)$ $C$,
$\mathcal{K} U v = U C v = \sigma U v,$
therefore $(\sigma, U v)$ is an eigenpair of $\mathcal{K}$. It follows that $\Sigma$ and $\Lambda$ can be arranged to be the same matrix:
$\Phi \Lambda A^\top = U V^{-1} \Lambda V$.
We can therefore set $\Phi = UV^{-1}$ and $A^\top = V$ to obtain our decomposition.

Notice that, unlike POD, matrix $\Phi$ is not orthonormal.

The description above is OK for conceptual purposes, but it is not the way we will want to compute $\Phi$. Instead, we interpret the matrix $U$ as a sequence of Krylov vector that lie in the output space of the (inaccessible) matrix $\mathcal{K}$.

Arnoldi iteration, applied to $U$ provides us progressively with matrices $Q$ (orthornormal) and $H$ (Hessenberg), which ideally satisfy
$\mathcal{K} = Q H Q^\top$, $U = Q R$.
Diagonalization of Hessenberg matrix $H$ is $H = W^{-1} \Lambda W$; substituting it into the derived relationships involving the companion matrix, we can obtain that
$V = W R$, which is useful only as an intermediate result, since the Arnoldi process never explicitly computes $R$. However, $R$ is lost in the final expression
$\Phi = UV^{-1} = Q R R^{-1} W^{-1} = Q W^{-1}$.

#### 4. Discussion

There are several differences between POD and DMD modes:

• POD modes are orthogonal; DMD are not
• DMD modes are dynamically invariant; POD are not, in general
• POD are computed using SVD; common application of DMD (additionally) uses Krylov decompositions
• Magnitude of the characteristic value of the POD mode relates to the fraction of L2 energy/variance-explained by the mode; characteristic value of the DMD mode determines its dynamics

In these notes I tried to give a brief introduction into how to compute POD and DMD modes. Perhaps in one of the future posts I will actually post some computations. In the mean time, Ref. [3] compares POD and DMD modes on a fabricated data set, as well as some experimentally-recorded von Karman streets.

#### 5. References

[1] Smith, Troy R., Jeff Moehlis, and Philip Holmes. “Low-Dimensional Modelling of Turbulence Using the Proper Orthogonal Decomposition: A Tutorial.” Nonlinear Dynamics 41, no. 1–3 (August 1, 2005): 275–307. doi:10.1007/s11071-005-2823-y.
[2] Rowley, Clarence W, Igor Mezić, Shervin Bagheri, Philipp Schlatter, and Dan S Henningson. “Spectral Analysis of Nonlinear Flows.” Journal of Fluid Mechanics 641 (2009): 115–27. doi:10.1017/S0022112009992059.
[3] Zhang, Qingshan, Yingzheng Liu, and Shaofei Wang. “The Identification of Coherent Structures Using Proper Orthogonal Decomposition and Dynamic Mode Decomposition.” Journal of Fluids and Structures 49 (August 2014): 53–72. doi:10.1016/j.jfluidstructs.2014.04.002.