62. Classical Filtering With Linear Algebra#

62.1. Overview#

This is a sequel to the earlier lecture Classical Control with Linear Algebra.

That lecture used linear algebra – in particular, the LU decomposition – to formulate and solve a class of linear-quadratic optimal control problems.

In this lecture, we’ll be using a closely related decomposition, the Cholesky decomposition, to solve linear prediction and filtering problems.

We exploit the useful fact that there is an intimate connection between two superficially different classes of problems:

  • deterministic linear-quadratic (LQ) optimal control problems

  • linear least squares prediction and filtering problems

The first class of problems involves no randomness, while the second is all about randomness.

Nevertheless, essentially the same mathematics solves both types of problems.

This connection, which is often termed “duality,” is present whether one uses “classical” or “recursive” solution procedures.

In fact we saw duality at work earlier when we formulated control and prediction problems recursively in lectures LQ dynamic programming problems, A first look at the Kalman filter, and The permanent income model.

A useful consequence of duality is that

  • With every LQ control problem there is implicitly affiliated a linear least squares prediction or filtering problem.

  • With every linear least squares prediction or filtering problem there is implicitly affiliated an LQ control problem.

An understanding of these connections has repeatedly proved useful in cracking interesting applied problems.

For example, Sargent [Sar87] [chs. IX, XIV] and Hansen and Sargent [HS80] formulated and solved control and filtering problems using z-transform methods.

In this lecture we investigate these ideas using mostly elementary linear algebra.

62.1.1. References#

Useful references include [Whi63], [HS80], [Orf88], [AP91], and [Mut60].

62.2. Infinite Horizon Prediction and Filtering Problems#

We pose two related prediction and filtering problems.

We let $Y_t$ be a univariate $m$th order moving average, covariance stationary stochastic process,

(62.1)# $$Y_t = d(L) u_t$$

where $d(L) = \sum_{j=0}^m d_j L^j$, and $u_t$ is a serially uncorrelated stationary random process satisfying

(62.2)# $$E u_t = 0, \qquad E u_t u_s = \begin{cases} 1 & \text{if } t = s \\ 0 & \text{otherwise} \end{cases}$$

We impose no conditions on the zeros of $d(z)$.

A second covariance stationary process is $X_t$ given by

(62.3)# $$X_t = Y_t + \varepsilon_t$$

where $\varepsilon_t$ is a serially uncorrelated stationary random process with $E \varepsilon_t = 0$, $E \varepsilon_t^2 = h$, and $E \varepsilon_t \varepsilon_s = 0$ for all distinct $t$ and $s$.

We also assume that $E \varepsilon_t u_s = 0$ for all $t$ and $s$.

The linear least squares prediction problem is to find the $L^2$ random variable $\hat X_{t+j}$ among linear combinations of $\{X_t, X_{t-1}, \ldots\}$ that minimizes $E (\hat X_{t+j} - X_{t+j})^2$.

That is, the problem is to find a $\gamma_j(L) = \sum_{k=0}^\infty \gamma_{jk} L^k$ such that $\sum_{k=0}^\infty |\gamma_{jk}|^2 < \infty$ and $E [\gamma_j(L) X_t - X_{t+j}]^2$ is minimized.

The linear least squares filtering problem is to find a $b(L) = \sum_{j=0}^\infty b_j L^j$ such that $\sum_{j=0}^\infty |b_j|^2 < \infty$ and $E [b(L) X_t - Y_t]^2$ is minimized.

Interesting versions of these problems related to the permanent income theory were studied by [Mut60].

62.2.1. Problem formulation#

These problems are solved as follows.

The covariograms of $Y$ and $X$ and their cross covariogram are, respectively,

(62.4)# $$C_X(\tau) = E X_t X_{t-\tau}, \qquad C_Y(\tau) = E Y_t Y_{t-\tau}, \qquad C_{Y,X}(\tau) = E Y_t X_{t-\tau}, \qquad \tau = 0, \pm 1, \pm 2, \ldots$$

The covariance and cross-covariance generating functions are defined as

(62.5)# $$g_X(z) = \sum_{\tau=-\infty}^\infty C_X(\tau) z^\tau, \qquad g_Y(z) = \sum_{\tau=-\infty}^\infty C_Y(\tau) z^\tau, \qquad g_{YX}(z) = \sum_{\tau=-\infty}^\infty C_{YX}(\tau) z^\tau$$

The generating functions can be computed by using the following facts.

Let $v_{1t}$ and $v_{2t}$ be two mutually and serially uncorrelated white noises with unit variances.

That is, $E v_{1t}^2 = E v_{2t}^2 = 1$, $E v_{1t} = E v_{2t} = 0$, $E v_{1t} v_{2s} = 0$ for all $t$ and $s$, and $E v_{1t} v_{1,t-j} = E v_{2t} v_{2,t-j} = 0$ for all $j \neq 0$.

Let $x_t$ and $y_t$ be two random processes given by

$$y_t = A(L) v_{1t} + B(L) v_{2t}, \qquad x_t = C(L) v_{1t} + D(L) v_{2t}$$

Then, as shown for example in [Sar87] [ch. XI], it is true that

(62.6)# $$g_y(z) = A(z) A(z^{-1}) + B(z) B(z^{-1}), \qquad g_x(z) = C(z) C(z^{-1}) + D(z) D(z^{-1}), \qquad g_{yx}(z) = A(z) C(z^{-1}) + B(z) D(z^{-1})$$

Applying these formulas to (62.1)–(62.4), we have

(62.7)# $$g_Y(z) = d(z) d(z^{-1}), \qquad g_X(z) = d(z) d(z^{-1}) + h, \qquad g_{YX}(z) = d(z) d(z^{-1})$$

The key step in obtaining solutions to our problems is to factor the covariance generating function $g_X(z)$ of $X$.

The solutions of our problems are given by formulas due to Wiener and Kolmogorov.

These formulas utilize the Wold moving average representation of the $X_t$ process,

(62.8)# $$X_t = c(L) \eta_t$$

where $c(L) = \sum_{j=0}^m c_j L^j$, with

(62.9)# $$c_0 \eta_t = X_t - \hat E[X_t \mid X_{t-1}, X_{t-2}, \ldots]$$

Here $\hat E$ is the linear least squares projection operator.

Equation (62.9) is the condition that $c_0 \eta_t$ can be the one-step-ahead error in predicting $X_t$ from its own past values.

Condition (62.9) requires that $\eta_t$ lie in the closed linear space spanned by $[X_t, X_{t-1}, \ldots]$.

This will be true if and only if the zeros of $c(z)$ do not lie inside the unit circle.

It is an implication of (62.9) that $\eta_t$ is a serially uncorrelated random process, and that a normalization can be imposed so that $E \eta_t^2 = 1$.

Consequently, an implication of (62.8) is that the covariance generating function of $X_t$ can be expressed as

(62.10)# $$g_X(z) = c(z) c(z^{-1})$$

It remains to discuss how c(L) is to be computed.

Combining (62.6) and (62.10) gives

(62.11)# $$d(z) d(z^{-1}) + h = c(z) c(z^{-1})$$

Therefore, we have already shown constructively how to factor the covariance generating function $g_X(z) = d(z) d(z^{-1}) + h$.
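To make this concrete, here is a small standalone numerical sketch (independent of the LQFilter code presented below) of the factorization for the simple example $d(L) = 1 - 2L$, $h = 0$, whose factor with zeros outside the unit circle is $c(L) = 2 - L$; it assumes no zeros of $g_X(z)$ lie exactly on the unit circle.

import Polynomials

# Standalone sketch: factor g_X(z) = d(z) d(z^{-1}) + h into c(z) c(z^{-1})
# for d(L) = 1 - 2L and h = 0, where the answer c(L) = 2 - L is known.
d = [1.0, -2.0]                     # d_0, d_1
h = 0.0
m = length(d) - 1

# coefficients of z^{-m}, ..., z^m in d(z) d(z^{-1}) + h
phi = [sum(d[k] * d[k + i] for k in max(1, 1 - i):min(m + 1, m + 1 - i)) for i in -m:m]
phi[m + 1] += h

# z^m g_X(z) is an ordinary polynomial whose roots come in (z, 1/z) pairs;
# the m roots outside the unit circle are the zeros of c(z)
r = sort(Polynomials.roots(Polynomials.Polynomial(phi)), by = abs)
outside = r[(m + 1):end]

# pin down the scale by matching g_X(1) = sum(phi) = c(1)^2
c0 = sqrt(sum(phi) / prod(abs2(1 - 1 / z) for z in outside))
c = c0 .* real.(Polynomials.coeffs(prod(Polynomials.Polynomial([1.0, -1 / z]) for z in outside)))
@show c                             # ≈ [2.0, -1.0], i.e. c(L) = 2 - L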

We now introduce the annihilation operator:

(62.12)# $$\left[ \sum_{j=-\infty}^\infty f_j L^j \right]_+ \equiv \sum_{j=0}^\infty f_j L^j$$

In words, $[\,\cdot\,]_+$ means “ignore negative powers of $L$”.

We have defined the solution of the prediction problem as $\hat E[X_{t+j} \mid X_t, X_{t-1}, \ldots] = \gamma_j(L) X_t$.

Assuming that the roots of $c(z) = 0$ all lie outside the unit circle, the Wiener-Kolmogorov formula for $\gamma_j(L)$ holds:

(62.13)# $$\gamma_j(L) = \left[ \frac{c(L)}{L^j} \right]_+ c(L)^{-1}$$
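For instance, if $c(L) = c_0 + c_1 L$ and $j = 1$, then $[c(L) L^{-1}]_+ = c_1$, so (62.13) gives $\gamma_1(L) = c_1 c(L)^{-1}$ and hence $\hat E[X_{t+1} \mid X_t, X_{t-1}, \ldots] = c_1 c(L)^{-1} X_t = c_1 \eta_t$, which is exactly the predictable part of $X_{t+1} = c_0 \eta_{t+1} + c_1 \eta_t$.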

We have defined the solution of the filtering problem as $\hat E[Y_t \mid X_t, X_{t-1}, \ldots] = b(L) X_t$.

The Wiener-Kolmogorov formula for $b(L)$ is

$$b(L) = \left( \frac{g_{YX}(L)}{c(L^{-1})} \right)_+ c(L)^{-1}$$

or

(62.14)# $$b(L) = \left[ \frac{d(L) d(L^{-1})}{c(L^{-1})} \right]_+ c(L)^{-1}$$

Formulas (62.13) and (62.14) are discussed in detail in [Whi83] and [Sar87].

The interested reader can find there several examples of the use of these formulas in economics. Some classic examples using these formulas are due to [Mut60].

As an example of the usefulness of formula (62.14), we let $X_t$ be a stochastic process with Wold moving average representation

$$X_t = c(L) \eta_t$$

where $E \eta_t^2 = 1$, $c_0 \eta_t = X_t - \hat E[X_t \mid X_{t-1}, \ldots]$, and $c(L) = \sum_{j=0}^m c_j L^j$.

Suppose that at time $t$, we wish to predict a geometric sum of future $X$’s, namely

$$y_t \equiv \sum_{j=0}^\infty \delta^j X_{t+j} = \frac{1}{1 - \delta L^{-1}} X_t$$

given knowledge of $X_t, X_{t-1}, \ldots$.

We shall use (62.14) to obtain the answer.

Using the standard formulas (62.6), we have that

$$g_{yx}(z) = \frac{c(z) c(z^{-1})}{1 - \delta z^{-1}}, \qquad g_x(z) = c(z) c(z^{-1})$$

Then (62.14) becomes

(62.15)# $$b(L) = \left[ \frac{c(L)}{1 - \delta L^{-1}} \right]_+ c(L)^{-1}$$

In order to evaluate the term in the annihilation operator, we use the following result from [HS80].

Proposition Let

  • $g(z) = \sum_{j=0}^\infty g_j z^j$ where $\sum_{j=0}^\infty |g_j|^2 < +\infty$

  • $h(z^{-1}) = (1 - \delta_1 z^{-1}) \cdots (1 - \delta_n z^{-1})$, where $|\delta_j| < 1$ for $j = 1, \ldots, n$

Then

(62.16)# $$\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \frac{g(z)}{h(z^{-1})} - \sum_{j=1}^n \frac{\delta_j g(\delta_j)}{\prod_{k=1, k \neq j}^n (\delta_j - \delta_k)} \, \frac{1}{z - \delta_j}$$

and, alternatively,

(62.17)# $$\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \sum_{j=1}^n B_j \left( \frac{z g(z) - \delta_j g(\delta_j)}{z - \delta_j} \right)$$

where $B_j = 1 \Big/ \prod_{k=1, k \neq j}^n (1 - \delta_k / \delta_j)$.
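For instance, with $n = 1$, $h(z^{-1}) = 1 - \delta z^{-1}$, and $g(z) = 1 + 2z$, both (62.16) and (62.17) reduce to $\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \frac{z g(z) - \delta g(\delta)}{z - \delta} = (1 + 2\delta) + 2z$, which is what one gets by expanding $g(z) \sum_{k \geq 0} \delta^k z^{-k}$ and discarding the negative powers of $z$.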

Applying formula (62.17) of the proposition to evaluate (62.15) with $g(z) = c(z)$ and $h(z^{-1}) = 1 - \delta z^{-1}$ gives

$$b(L) = \left[ \frac{L c(L) - \delta c(\delta)}{L - \delta} \right] c(L)^{-1}$$

or

$$b(L) = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right]$$

Thus, we have

(62.18)# $$\hat E \left[ \sum_{j=0}^\infty \delta^j X_{t+j} \,\Big|\, X_t, X_{t-1}, \ldots \right] = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right] X_t$$
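As a quick sanity check of (62.18), note that if $X_t$ is itself serially uncorrelated, so that $c(L) = 1$ and $c(\delta) = 1$, the right side collapses to $X_t$: future values of a white noise are unpredictable, so the best forecast of the geometric sum $\sum_{j=0}^\infty \delta^j X_{t+j}$ is just the current observation $X_t$.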

This formula is useful in solving stochastic versions of problem 1 of the lecture Classical Control with Linear Algebra, in which the randomness emerges because $\{a_t\}$ is a stochastic process.

The problem is to maximize

(62.19)# $$E_0 \lim_{N \to \infty} \sum_{t=0}^N \beta^t \left[ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right]$$

where $E_t$ is mathematical expectation conditioned on information known at $t$, and where $\{a_t\}$ is a covariance stationary stochastic process with Wold moving average representation

$$a_t = c(L) \eta_t$$

where

$$c(L) = \sum_{j=0}^{\tilde n} c_j L^j$$

and

$$\eta_t = a_t - \hat E[a_t \mid a_{t-1}, \ldots]$$

The problem is to maximize (62.19) with respect to a contingency plan expressing $y_t$ as a function of information known at $t$, which is assumed to be $(y_{t-1}, y_{t-2}, \ldots, a_t, a_{t-1}, \ldots)$.

The solution of this problem can be achieved in two steps.

First, ignoring the uncertainty, we can solve the problem assuming that {at} is a known sequence.

The solution is, from above,

$$c(L) y_t = c(\beta L^{-1})^{-1} a_t$$

or

(62.20)# $$(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^m A_j \sum_{k=0}^\infty (\lambda_j \beta)^k a_{t+k}$$

Second, the solution of the problem under uncertainty is obtained by replacing the terms on the right-hand side of the above expressions with their linear least squares predictors.

Using (62.18) and (62.20), we have the following solution

$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^m A_j \left[ \frac{1 - \beta \lambda_j c(\beta \lambda_j) L^{-1} c(L)^{-1}}{1 - \beta \lambda_j L^{-1}} \right] a_t$$

62.3. Finite Dimensional Prediction#

Let $(x_1, x_2, \ldots, x_T)' = x$ be a $T \times 1$ vector of random variables with mean $E x = 0$ and covariance matrix $E x x' = V$.

Here V is a T×T positive definite matrix.

We shall regard the random variables as being ordered in time, so that xt is thought of as the value of some economic variable at time t.

For example, xt could be generated by the random process described by the Wold representation presented in equation (62.8).

In this case, $V_{ij}$ is given by the coefficient on $z^{|i-j|}$ in the expansion of $g_x(z) = d(z) d(z^{-1}) + h$, which equals $\sum_{k=0}^\infty d_k d_{k+|i-j|}$, plus $h$ when $i = j$.
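As a small standalone check (not part of the lecture's LQFilter code), the following sketch builds $V$ directly from these autocovariances for the values $d = [1, -2]$, $h = 0$ used in Example 1 below; it reproduces the $5 \times 5$ matrix that construct_V returns there.

# Build V_{ij} from the autocovariances of x_t = d(L) u_t plus measurement noise h
d = [1.0, -2.0]
h = 0.0
m = length(d) - 1
# gamma(tau) = sum_k d_k d_{k+tau} for tau = 0, ..., m, with h added at tau = 0
gamma = [sum(d[k] * d[k + tau] for k in 1:(m + 1 - tau)) for tau in 0:m]
gamma[1] += h
T = 5
V = [abs(i - j) <= m ? gamma[abs(i - j) + 1] : 0.0 for i in 1:T, j in 1:T]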

We shall be interested in constructing $j$ step ahead linear least squares predictors of the form

$$\hat E\left[x_T \mid x_{T-j}, x_{T-j+1}, \ldots, x_1\right]$$

where $\hat E$ is the linear least squares projection operator.

The solution of this problem can be exhibited by first constructing an orthonormal basis of random variables $\varepsilon$ for $x$.

Since $V$ is positive definite and symmetric, we know that there exists a (Cholesky) decomposition of $V$ such that

$$V = L^{-1} (L^{-1})'$$

or

$$L V L' = I$$

where $L$ is lower triangular, and therefore so is $L^{-1}$.

Form the random variable $L x = \varepsilon$.

Then $\varepsilon$ is an orthonormal basis for $x$, since $L$ is nonsingular and $E \varepsilon \varepsilon' = L E x x' L' = I$.

It is convenient to write out the equations $L x = \varepsilon$ and $L^{-1} \varepsilon = x$

(62.21)# $$\begin{aligned} L_{11} x_1 &= \varepsilon_1 \\ L_{21} x_1 + L_{22} x_2 &= \varepsilon_2 \\ &\;\;\vdots \\ L_{T1} x_1 + L_{T2} x_2 + \cdots + L_{TT} x_T &= \varepsilon_T \end{aligned}$$

or

(62.22)# $$\sum_{j=0}^{t-1} L_{t,t-j} x_{t-j} = \varepsilon_t, \quad t = 1, 2, \ldots, T$$

We also have

(62.23)# $$x_t = \sum_{j=0}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$$

Notice from (62.23) that $x_t$ is in the space spanned by $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_1$, and from (62.22) that $\varepsilon_t$ is in the space spanned by $x_t, x_{t-1}, \ldots, x_1$.

Therefore, we have that for $t - 1 \geq m \geq 1$

(62.24)# $$\hat E[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] = \hat E[x_t \mid \varepsilon_{t-m}, \varepsilon_{t-m-1}, \ldots, \varepsilon_1]$$

For $t - 1 \geq m \geq 1$ rewrite (62.23) as

(62.25)# $$x_t = \sum_{j=0}^{m-1} L^{-1}_{t,t-j} \varepsilon_{t-j} + \sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$$

Representation (62.25) is an orthogonal decomposition of $x_t$ into a part $\sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$ that lies in the space spanned by $[x_{t-m}, x_{t-m-1}, \ldots, x_1]$ and an orthogonal component not in this space.

62.3.1. Implementation#

Code that computes solutions to LQ control and filtering problems using the methods described here and in Classical Control with Linear Algebra can be found in the file control_and_filter.jl.

Here’s how it looks

using LinearAlgebra, Statistics
using Polynomials.PolyCompat, LinearAlgebra
import Polynomials.PolyCompat: roots, coeffs

function LQFilter(d, h, y_m;
                  r = nothing,
                  beta = nothing,
                  h_eps = nothing)
    m = length(d) - 1
    m == length(y_m) ||
        throw(ArgumentError("y_m and d must be of same length = $m"))

    # define the coefficients of phi up front
    phi = zeros(2m + 1)
    for i in (-m):m
        phi[m - i + 1] = sum(diag(d * d', -i))
    end
    phi[m + 1] = phi[m + 1] + h

    # if r is given calculate the vector phi_r
    if isnothing(r)
        k = nothing
        phi_r = nothing
    else
        k = size(r, 1) - 1
        phi_r = zeros(2k + 1)

        for i in (-k):k
            phi_r[k - i + 1] = sum(diag(r * r', -i))
        end

        if isnothing(h_eps) == false
            phi_r[k + 1] = phi_r[k + 1] + h_eps
        end
    end

    # if beta is given, define the transformed variables
    if isnothing(beta)
        beta = 1.0
    else
        d = beta .^ (collect(0:m) / 2) .* d
        y_m = y_m .* beta .^ (-collect(1:m) / 2)
    end

    return (; d, h, y_m, m, phi, beta, phi_r, k)
end

function construct_W_and_Wm(lqf, N)
    (; d, m) = lqf

    W = zeros(N + 1, N + 1)
    W_m = zeros(N + 1, m)

    # terminal conditions
    D_m1 = zeros(m + 1, m + 1)
    M = zeros(m + 1, m)

    # (1) constuct the D_{m+1} matrix using the formula

    for j in 1:(m + 1)
        for k in j:(m + 1)
            D_m1[j, k] = dot(d[1:j, 1], d[(k - j + 1):k, 1])
        end
    end

    # Make the matrix symmetric
    D_m1 = D_m1 + D_m1' - Diagonal(diag(D_m1))

    # (2) Construct the M matrix using the entries of D_m1

    for j in 1:m
        for i in (j + 1):(m + 1)
            M[i, j] = D_m1[i - j, m + 1]
        end
    end
    M

    # Euler equations for t = 0, 1, ..., N-(m+1)
    phi, h = lqf.phi, lqf.h

    W[1:(m + 1), 1:(m + 1)] = D_m1 + h * I
    W[1:(m + 1), (m + 2):(2m + 1)] = M

    for (i, row) in enumerate((m + 2):(N + 1 - m))
        W[row, (i + 1):(2m + 1 + i)] = phi'
    end

    for i in 1:m
        W[N - m + i + 1, (end - (2m + 1 - i) + 1):end] = phi[1:(end - i)]
    end

    for i in 1:m
        W_m[N - i + 2, 1:((m - i) + 1)] = phi[(m + 1 + i):end]
    end

    return W, W_m
end

function roots_of_characteristic(lqf)
    (; m, phi) = lqf

    # Calculate the roots of the 2m-polynomial
    phi_poly = Poly(phi[end:-1:1])
    proots = roots(phi_poly)
    # sort the roots according to their length (in descending order)
    roots_sorted = sort(proots, by = abs)[end:-1:1]
    z_0 = sum(phi) / polyval(poly(proots), 1.0)
    z_1_to_m = roots_sorted[1:m]     # we need only those outside the unit circle
    lambda = 1 ./ z_1_to_m
    return z_1_to_m, z_0, lambda
end

function coeffs_of_c(lqf)
    (; m) = lqf
    z_1_to_m, z_0, lambda = roots_of_characteristic(lqf)
    c_0 = (z_0 * prod(z_1_to_m) * (-1.0)^m)^(0.5)
    c_coeffs = coeffs(poly(z_1_to_m)) * z_0 / c_0
    return c_coeffs
end

function solution(lqf)
    (; m) = lqf
    z_1_to_m, z_0, lambda = roots_of_characteristic(lqf)
    c_0 = coeffs_of_c(lqf)[end]
    A = zeros(m)
    for j in 1:m
        denom = 1 .- lambda ./ lambda[j]
        A[j] = c_0^(-2) / prod(denom[1:m .!= j])
    end
    return lambda, A
end

function construct_V(lqf; N = nothing)
    if isnothing(N)
        error("N must be provided!!")
    end
    if !(N isa Integer)
        throw(ArgumentError("N must be Integer!"))
    end

    (; phi_r, k) = lqf
    V = zeros(N, N)
    for i in 1:N
        for j in 1:N
            if abs(i - j) <= k
                V[i, j] = phi_r[k + abs(i - j) + 1]
            end
        end
    end
    return V
end

function simulate_a(lqf, N)
    # MVNSampler is provided by QuantEcon.jl (add `using QuantEcon` to use this helper)
    V = construct_V(lqf; N = N + 1)
    d = MVNSampler(zeros(N + 1), V)
    return rand(d)
end

function predict(lqf, a_hist, t)
    N = length(a_hist) - 1
    V = construct_V(lqf; N = N + 1)

    aux_matrix = zeros(N + 1, N + 1)
    aux_matrix[1:(t + 1), 1:(t + 1)] = Matrix(I, t + 1, t + 1)
    L_inv = cholesky(V).L                      # V = L^{-1} (L^{-1})', i.e. L^{-1} in (62.27)
    L = inv(L_inv)
    Ea_hist = L_inv * aux_matrix * L * a_hist  # L^{-1} [I 0; 0 0] L a, as in (62.27)

    return Ea_hist
end

function optimal_y(lqf, a_hist, t = nothing)
    (; beta, y_m, m) = lqf

    N = length(a_hist) - 1
    W, W_m = construct_W_and_Wm(lqf, N)

    F = lu(W, Val(true))

    L, U = F
    D = diagm(0 => 1.0 ./ diag(U))
    U = D * U
    L = L * diagm(0 => 1.0 ./ diag(D))

    J = reverse(Matrix(I, N + 1, N + 1), dims = 2)

    if isnothing(t)                      # if the problem is deterministic
        a_hist = J * a_hist

        # transform the a sequence if beta is given
        if beta != 1
            a_hist = a_hist .* beta .^ (collect(N:-1:0) / 2)
        end

        a_bar = a_hist - W_m * y_m        # a_bar from the lecture
        Uy = \(L, a_bar)                  # U @ y_bar = L^{-1}a_bar from the lecture
        y_bar = \(U, Uy)                  # y_bar = U^{-1}L^{-1}a_bar
        # Reverse the order of y_bar with the matrix J
        J = reverse(Matrix(I, N + m + 1, N + m + 1), dims = 2)
        y_hist = J * vcat(y_bar, y_m)     # y_hist : concatenated y_m and y_bar
        # transform the optimal sequence back if beta is given
        if beta != 1
            y_hist = y_hist .* beta .^ (-collect((-m):N) / 2)
        end

    else                                  # if the problem is stochastic and we look at it
        Ea_hist = predict(lqf, a_hist, t)
        Ea_hist = J * Ea_hist

        a_bar = Ea_hist - W_m * y_m       # a_bar from the lecture
        Uy = \(L, a_bar)                  # U @ y_bar = L^{-1}a_bar from the lecture
        y_bar = \(U, Uy)                  # y_bar = U^{-1}L^{-1}a_bar

        # Reverse the order of y_bar with the matrix J
        J = reverse(Matrix(I, N + m + 1, N + m + 1), dims = 2)
        y_hist = J * vcat(y_bar, y_m)     # y_hist : concatenated y_m and y_bar
    end
    return y_hist, L, U, y_bar
end
optimal_y (generic function with 2 methods)

Let’s use this code to tackle two interesting examples.

62.3.2. Example 1#

Consider a stochastic process with moving average representation

$$x_t = (1 - 2L) \varepsilon_t$$

where $\varepsilon_t$ is a serially uncorrelated random process with mean zero and variance unity.

We want to use the Wiener-Kolmogorov formula (62.13) to compute the linear least squares forecasts $\hat E[x_{t+j} \mid x_t, x_{t-1}, \ldots]$, for $j = 1, 2$.

We can do everything we want by setting d=r, generating an instance of LQFilter, then invoking pertinent methods of LQFilter

m = 1
y_m = zeros(m)
d = [1.0, -2.0]
r = [1.0, -2.0]
h = 0.0
example = LQFilter(d, h, y_m, r = d)
(d = [1.0, -2.0], h = 0.0, y_m = [0.0], m = 1, phi = [-2.0, 5.0, -2.0], beta = 1.0, phi_r = [-2.0, 5.0, -2.0], k = 1)

The Wold representation is computed by coeffs_of_c(example).

Let’s check that it “flips roots” as required

coeffs_of_c(example)
2-element Vector{Float64}:
  2.0
 -1.0
roots_of_characteristic(example)
([2.0], -2.0, [0.5])

Now let’s form the covariance matrix of a time series vector of length N and put it in V.

Then we’ll take a Cholesky decomposition of $V = L^{-1} (L^{-1})' = L_i L_i'$ and use it to form the vector of “moving average representations” $x = L_i \varepsilon$ and the vector of “autoregressive representations” $L x = \varepsilon$.

V = construct_V(example, N = 5)
5×5 Matrix{Float64}:
  5.0  -2.0   0.0   0.0   0.0
 -2.0   5.0  -2.0   0.0   0.0
  0.0  -2.0   5.0  -2.0   0.0
  0.0   0.0  -2.0   5.0  -2.0
  0.0   0.0   0.0  -2.0   5.0

Notice how the lower rows of the “moving average representations” are converging to the appropriate infinite history Wold representation

F = cholesky(V)
Li = F.L
5×5 LowerTriangular{Float64, Matrix{Float64}}:
  2.23607     ⋅         ⋅         ⋅         ⋅ 
 -0.894427   2.04939    ⋅         ⋅         ⋅ 
  0.0       -0.9759    2.01187    ⋅         ⋅ 
  0.0        0.0      -0.9941    2.00294    ⋅ 
  0.0        0.0       0.0      -0.998533  2.00073

Notice how the lower rows of the “autoregressive representations” are converging to the appropriate infinite history autoregressive representation

L = inv(Li)
5×5 LowerTriangular{Float64, Matrix{Float64}}:
 0.447214    ⋅          ⋅         ⋅         ⋅ 
 0.19518    0.48795     ⋅         ⋅         ⋅ 
 0.0946762  0.236691   0.49705    ⋅         ⋅ 
 0.0469898  0.117474   0.246696  0.499266   ⋅ 
 0.0234518  0.0586295  0.123122  0.249176  0.499817

Remark Let $\pi(z) = \sum_{j=0}^m \pi_j z^j$ and let $z_1, \ldots, z_k$ be the zeros of $\pi(z)$ that are inside the unit circle, $k < m$.

Then define

$$\theta(z) = \pi(z) \left( \frac{z_1 z - 1}{z - z_1} \right) \left( \frac{z_2 z - 1}{z - z_2} \right) \cdots \left( \frac{z_k z - 1}{z - z_k} \right)$$

The term multiplying π(z) is termed a “Blaschke factor”.

Then it can be proved directly that

$$\theta(z^{-1}) \theta(z) = \pi(z^{-1}) \pi(z)$$

and that the zeros of θ(z) are not inside the unit circle.
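Here is a quick numerical check of this remark for the simple case $\pi(z) = 1 - 2z$, whose only zero $z_1 = 0.5$ lies inside the unit circle: the Blaschke factor turns $\pi(z)$ into $\theta(z) = 2 - z$, whose zero lies outside the circle, while $\theta(z^{-1}) \theta(z)$ matches $\pi(z^{-1}) \pi(z)$.

# pi_poly has its only zero at z_1 = 0.5, inside the unit circle
pi_poly(z) = 1 - 2z
z1 = 0.5
theta(z) = pi_poly(z) * (z1 * z - 1) / (z - z1)   # multiply by the Blaschke factor

for z in exp.(im .* (0.4, 1.3, 2.7))              # a few points on the unit circle
    @assert theta(1 / z) * theta(z) ≈ pi_poly(1 / z) * pi_poly(z)
end
theta(0.0), theta(1.0)    # (2.0, 1.0): theta(z) = 2 - z, zero at z = 2 outside the circle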

62.3.3. Example 2#

Consider a stochastic process $X_t$ with moving average representation

$$X_t = (1 - \sqrt{2} L^2) \varepsilon_t$$

where $\varepsilon_t$ is a serially uncorrelated random process with mean zero and variance unity.

Let’s find a Wold moving average representation for $X_t$.

Let’s use the Wiener-Kolmogorov formula (62.13) to compute the linear least squares forecasts $\hat E[X_{t+j} \mid X_t, X_{t-1}, \ldots]$ for $j = 1, 2, 3$.

We proceed in the same way as example 1

m = 2
y_m = [0.0, 0.0]
d = [1, 0, -sqrt(2)]
r = [1, 0, -sqrt(2)]
h = 0.0
example = LQFilter(d, h, y_m, r = d)
(d = [1.0, 0.0, -1.4142135623730951], h = 0.0, y_m = [0.0, 0.0], m = 2, phi = [-1.4142135623730951, 0.0, 3.0000000000000004, 0.0, -1.4142135623730951], beta = 1.0, phi_r = [-1.4142135623730951, 0.0, 3.0000000000000004, 0.0, -1.4142135623730951], k = 2)
coeffs_of_c(example)
3-element Vector{Float64}:
  1.4142135623731025
 -0.0
 -1.0000000000000078
roots_of_characteristic(example)
([1.1892071150027195, -1.1892071150027195], -1.4142135623731136, [0.8408964152537157, -0.8408964152537157])
V = construct_V(example, N = 8)
8×8 Matrix{Float64}:
  3.0       0.0      -1.41421   0.0      …   0.0       0.0       0.0
  0.0       3.0       0.0      -1.41421      0.0       0.0       0.0
 -1.41421   0.0       3.0       0.0          0.0       0.0       0.0
  0.0      -1.41421   0.0       3.0         -1.41421   0.0       0.0
  0.0       0.0      -1.41421   0.0          0.0      -1.41421   0.0
  0.0       0.0       0.0      -1.41421  …   3.0       0.0      -1.41421
  0.0       0.0       0.0       0.0          0.0       3.0       0.0
  0.0       0.0       0.0       0.0         -1.41421   0.0       3.0
F = cholesky(V)
Li = F.L
Li[(end - 2):end, :]
3×8 Matrix{Float64}:
 0.0  0.0  0.0  -0.92582   0.0        1.46385   0.0      0.0
 0.0  0.0  0.0   0.0      -0.966092   0.0       1.43759  0.0
 0.0  0.0  0.0   0.0       0.0       -0.966092  0.0      1.43759
L = inv(Li)
8×8 LowerTriangular{Float64, Matrix{Float64}}:
 0.57735    ⋅         ⋅         ⋅        …   ⋅         ⋅         ⋅ 
 0.0       0.57735    ⋅         ⋅            ⋅         ⋅         ⋅ 
 0.308607  0.0       0.654654   ⋅            ⋅         ⋅         ⋅ 
 0.0       0.308607  0.0       0.654654      ⋅         ⋅         ⋅ 
 0.19518   0.0       0.414039  0.0           ⋅         ⋅         ⋅ 
 0.0       0.19518   0.0       0.414039  …  0.68313    ⋅         ⋅ 
 0.131165  0.0       0.278243  0.0          0.0       0.695608   ⋅ 
 0.0       0.131165  0.0       0.278243     0.459078  0.0       0.695608

62.3.4. Prediction#

It immediately follows from the “orthogonality principle” of least squares (see [AP91] or [Sar87] [ch. X]) that

(62.26)# $$\hat E[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] = \sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j} = \begin{bmatrix} L^{-1}_{t,1} & L^{-1}_{t,2} & \cdots & L^{-1}_{t,t-m} & 0 & 0 & \cdots & 0 \end{bmatrix} L x$$

This can be interpreted as a finite-dimensional version of the Wiener-Kolmogorov m-step ahead prediction formula.
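As a small numerical check of (62.26) (using the $5 \times 5$ covariance matrix from Example 1 above), the projection coefficients delivered by the Cholesky construction coincide with the usual least squares coefficients $V_{1:s,1:s}^{-1} V_{1:s,t}$ when $x_t$ is projected on $x_1, \ldots, x_s$:

using LinearAlgebra

# V from Example 1 (d(L) = 1 - 2L, h = 0), T = 5
V = [5.0 -2.0 0.0 0.0 0.0;
     -2.0 5.0 -2.0 0.0 0.0;
     0.0 -2.0 5.0 -2.0 0.0;
     0.0 0.0 -2.0 5.0 -2.0;
     0.0 0.0 0.0 -2.0 5.0]
Li = cholesky(V).L                 # L^{-1} in the notation of the text
L = inv(Li)
t, m_back = 5, 2
s = t - m_back                     # condition on x_1, ..., x_s
coef_chol = vec(L[1:s, :]' * Li[t, 1:s])   # coefficients on (x_1, ..., x_T) implied by (62.26)
coef_ls = V[1:s, 1:s] \ V[1:s, t]          # usual least squares projection coefficients
@show coef_chol[1:s] ≈ coef_ls
@show all(abs.(coef_chol[(s + 1):end]) .< 1e-12)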

We can use (62.26) to represent the linear least squares projection of the vector $x$ conditioned on the first $s$ observations $[x_s, x_{s-1}, \ldots, x_1]$.

We have

(62.27)# $$\hat E[x \mid x_s, x_{s-1}, \ldots, x_1] = L^{-1} \begin{bmatrix} I_s & 0 \\ 0 & 0_{(T-s)} \end{bmatrix} L x$$

This formula will be convenient in representing the solution of control problems under uncertainty.

Equation (62.23) can be recognized as a finite dimensional version of a moving average representation.

Equation (62.22) can be viewed as a finite dimensional version of an autoregressive representation.

Notice that even if the $x_t$ process is covariance stationary, so that $V$ is such that $V_{ij}$ depends only on $|i - j|$, the coefficients in the moving average representation are time-dependent, there being a different moving average for each $t$.

If $x_t$ is a covariance stationary process, the last row of $L^{-1}$ converges to the coefficients in the Wold moving average representation for $\{x_t\}$ as $T \rightarrow \infty$.

Further, if $x_t$ is covariance stationary, for fixed $k$ and $j > 0$, $L^{-1}_{T,T-j}$ converges to $L^{-1}_{T-k,T-k-j}$ as $T \rightarrow \infty$.

That is, the “bottom” rows of $L^{-1}$ converge to each other and to the Wold moving average coefficients as $T \rightarrow \infty$.

This last observation gives one simple and widely-used practical way of forming a finite $T$ approximation to a Wold moving average representation.

First, form the covariance matrix $E x x' = V$, then obtain the Cholesky decomposition $L^{-1} (L^{-1})'$ of $V$, which can be accomplished quickly on a computer.

The last row of $L^{-1}$ gives the approximate Wold moving average coefficients.
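Here is a minimal illustration of this recipe, using the $5 \times 5$ covariance matrix from Example 1 (whose exact Wold coefficients are $2$ and $-1$):

using LinearAlgebra

V = [5.0 -2.0 0.0 0.0 0.0;
     -2.0 5.0 -2.0 0.0 0.0;
     0.0 -2.0 5.0 -2.0 0.0;
     0.0 0.0 -2.0 5.0 -2.0;
     0.0 0.0 0.0 -2.0 5.0]
Li = cholesky(V).L            # this is L^{-1} in the notation of the text
reverse(Li[end, :])           # ≈ [2.0, -1.0, 0.0, 0.0, 0.0]: approximate Wold coefficients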

This method can readily be generalized to multivariate systems.

62.4. Combined Finite Dimensional Control and Prediction#

Consider the finite-dimensional control problem: maximize

$$E \sum_{t=0}^N \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right\}, \quad h > 0$$

where $d(L) = d_0 + d_1 L + \cdots + d_m L^m$, $L$ is the lag operator, and $\bar a = [a_N, a_{N-1}, \ldots, a_1, a_0]'$ is a random vector with mean zero and $E \bar a \bar a' = V$.

The variables $y_{-1}, \ldots, y_{-m}$ are given.

Maximization is over choices of $y_0, y_1, \ldots, y_N$, where $y_t$ is required to be a linear function of $\{y_{t-s-1},\ t + m - 1 \geq s \geq 0;\ a_{t-s},\ t \geq s \geq 0\}$.

We saw in the lecture Classical Control with Linear Algebra that the solution of this problem under certainty could be represented in feedback-feedforward form

$$U \bar y = L^{-1} \bar a + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}$$

for some $(N+1) \times m$ matrix $K$.

Using a version of formula (62.26), we can express $\hat E[\bar a \mid a_s, a_{s-1}, \ldots, a_0]$ as

$$\hat E[\bar a \mid a_s, a_{s-1}, \ldots, a_0] = \tilde U^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(s+1)} \end{bmatrix} \tilde U \bar a$$

where $I_{(s+1)}$ is the $(s+1) \times (s+1)$ identity matrix and $V = \tilde U^{-1} (\tilde U^{-1})'$, where $\tilde U$ is the upper triangular Cholesky factor of the covariance matrix $V$.

(We have reversed the time axis in dating the a’s relative to earlier)

The time axis can be reversed in representation (62.27) by replacing $L$ with $L^T$.

The optimal decision rule to use at time $0 \leq t \leq N$ is then given by the $(N - t + 1)$th row of

$$U \bar y = L^{-1} \tilde U^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(t+1)} \end{bmatrix} \tilde U \bar a + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}$$
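The function optimal_y in the implementation section above carries out this two-step procedure when a time index $t$ is supplied: it first replaces the $a$ sequence by its projection on the first $t + 1$ observations (via predict) and then solves the deterministic problem. Here is a usage sketch; it assumes the LQFilter code above has been run, and the particular a_hist numbers are made up purely for illustration.

# Usage sketch: condition on a_0, a_1, a_2 (t = 2) and solve for the optimal y path
m = 1
lqf = LQFilter([1.0, -2.0], 1.0, zeros(m), r = [1.0, -2.0])
a_hist = [1.0, 0.5, -0.2, 0.8, 0.3]            # made-up realization of a_0, ..., a_4
y_hist, L, U, y_bar = optimal_y(lqf, a_hist, 2)
y_hist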

62.5. Exercises#

62.5.1. Exercise 1#

Let $Y_t = (1 - 2L) u_t$ where $u_t$ is a mean zero white noise with $E u_t^2 = 1$. Let

$$X_t = Y_t + \varepsilon_t$$

where $\varepsilon_t$ is a serially uncorrelated white noise with $E \varepsilon_t^2 = 9$ and $E \varepsilon_t u_s = 0$ for all $t$ and $s$.

Find the Wold moving average representation for $X_t$.

Find a formula for the $A_{1j}$’s in

$$\hat E[X_{t+1} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^\infty A_{1j} X_{t-j}$$

Find a formula for the $A_{2j}$’s in

$$\hat E[X_{t+2} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^\infty A_{2j} X_{t-j}$$

62.5.2. Exercise 2#

(Multivariable Prediction) Let $Y_t$ be an $(n \times 1)$ vector stochastic process with moving average representation

$$Y_t = D(L) U_t$$

where $D(L) = \sum_{j=0}^m D_j L^j$, $D_j$ an $n \times n$ matrix, and $U_t$ an $(n \times 1)$ vector white noise with $E U_t = 0$ for all $t$, $E U_t U_s' = 0$ for all $s \neq t$, and $E U_t U_t' = I$ for all $t$.

Let $\varepsilon_t$ be an $n \times 1$ vector white noise with mean $0$ and contemporaneous covariance matrix $H$, where $H$ is a positive definite matrix.

Let $X_t = Y_t + \varepsilon_t$.

Define the covariograms as $C_X(\tau) = E X_t X_{t-\tau}'$, $C_Y(\tau) = E Y_t Y_{t-\tau}'$, $C_{YX}(\tau) = E Y_t X_{t-\tau}'$.

Then define the matrix covariance generating function, as in (61.21), only interpret all the objects in (61.21) as matrices.

Show that the covariance generating functions are given by

$$g_Y(z) = D(z) D(z^{-1})', \qquad g_X(z) = D(z) D(z^{-1})' + H, \qquad g_{YX}(z) = D(z) D(z^{-1})'$$

A factorization of $g_X(z)$ can be found (see [Roz67] or [Whi83]) of the form

$$D(z) D(z^{-1})' + H = C(z) C(z^{-1})', \qquad C(z) = \sum_{j=0}^m C_j z^j$$

where the zeros of $|C(z)|$ do not lie inside the unit circle.

A vector Wold moving average representation of $X_t$ is then

$$X_t = C(L) \eta_t$$

where $\eta_t$ is an $(n \times 1)$ vector white noise that is “fundamental” for $X_t$.

That is, $X_t - \hat E[X_t \mid X_{t-1}, X_{t-2}, \ldots] = C_0 \eta_t$.

The optimum predictor of $X_{t+j}$ is

$$\hat E[X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ \eta_t$$

If $C(L)$ is invertible, i.e., if the zeros of $\det C(z)$ lie strictly outside the unit circle, then this formula can be written as

$$\hat E[X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ C(L)^{-1} X_t$$