# Introduction

Consider the following optimization problem, where the goal is to find the optimal input signal \(\mathbf{u}(t)\in\mathbb{R}^{m}\) for \(t\in[0, T]\): \[ \begin{align} & \min_{\mathbf{u}[0,T]} \Big( \int_{0}^{T}L(\mathbf{x}(s),\mathbf{u}(s),s)ds + \phi(\mathbf{x}(T), T) \Big) \\ & \text{s.t. } ~~ \mathbf{\dot{x}}(s)=\mathbf{f}(\mathbf{x}(s),\mathbf{u}(s),s) ~~ \forall s\in[0,T] \\ & \text{I.C.} ~~ \mathbf{x}(0)=\mathbf{x}_0 \end{align} \]
Here, \(\mathbf{x}(\cdot):\mathbb{R}\rightarrow\mathbb{R}^{n}\) is the state trajectory; \(\mathbf{u}[0,T]:[0,T]\rightarrow \mathbb{R}^{m}\) is the control trajectory; \(L\) is the running cost and \(\phi\) the terminal cost; and \(\mathbf{\dot{x}}(s)=\mathbf{f}(\mathbf{x}(s),\mathbf{u}(s),s)\) describes the dynamics of a nonlinear, non-autonomous system.
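To make the setup concrete, here is a minimal sketch of how the problem data could be represented and a candidate control evaluated with a forward-Euler discretization. The scalar dynamics \(f(x,u,t)=-x+u\), running cost \(L=x^2+u^2\), and terminal cost \(\phi=x(T)^2\) are hypothetical choices made only for illustration; they are not part of the derivation above.

```python
import numpy as np

# Hypothetical problem data for a scalar example (illustration only):
# dynamics f(x, u, t) = -x + u, running cost L = x^2 + u^2, terminal cost phi = x(T)^2.

def f(x, u, t):
    return -x + u

def L(x, u, t):
    return x**2 + u**2

def phi(x, T):
    return x**2

def total_cost(u_traj, x0, T):
    """Evaluate the objective for a candidate control u[0,T] via forward Euler."""
    N = len(u_traj)
    h = T / N
    x = x0
    J = 0.0
    for k in range(N):
        J += L(x, u_traj[k], k * h) * h   # accumulate running cost
        x += f(x, u_traj[k], k * h) * h   # propagate the dynamics one step
    return J + phi(x, T)                  # add the terminal cost

# Example: the zero control is one admissible (generally suboptimal) candidate.
print(total_cost(np.zeros(100), x0=1.0, T=1.0))
```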

The Hamilton-Jacobi-Bellman (HJB) equation characterizes the solution of this optimization problem.



# Cost-to-go and the HJB equation

We define a scalar function, \(V(\mathbf{x}(t),t)\), called the *cost-to-go* function for \(t\in[0,T]\): \[ V(\mathbf{x}(t),t) = \min_{\mathbf{u}[t,T]} \Big( \int_{t}^{T}L(\mathbf{x}(s),\mathbf{u}(s),s)ds + \phi(\mathbf{x}(T), T) \Big) \] Note that for \(t=T\), \(V(\mathbf{x}(T),T)=\phi(\mathbf{x}(T), T)\).
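As a rough numerical illustration, the cost-to-go can be approximated from above by restricting the minimization to controls that are constant on \([t,T]\) and searching over a grid of values; the true cost-to-go minimizes over arbitrary control functions, so this only gives an upper bound. The sketch below reuses the same hypothetical scalar example and also checks the boundary case \(t=T\), where the cost-to-go reduces to the terminal cost.

```python
import numpy as np

# Crude upper bound on the cost-to-go V(x(t), t): restrict the minimization to
# controls that are constant on [t, T] and search over a grid of values.
# The true cost-to-go minimizes over arbitrary control functions u[t, T].
# Dynamics and costs are the same hypothetical scalar example as before.

def f(x, u, t): return -x + u
def L(x, u, t): return x**2 + u**2
def phi(x, T):  return x**2

def cost_to_go_upper_bound(x_t, t, T, n_steps=200):
    h = (T - t) / n_steps
    best = np.inf
    for u in np.linspace(-3.0, 3.0, 121):   # grid of constant controls (hypothetical range)
        x, J = x_t, 0.0
        for k in range(n_steps):
            s = t + k * h
            J += L(x, u, s) * h             # running cost
            x += f(x, u, s) * h             # forward Euler step
        best = min(best, J + phi(x, T))
    return best

print(cost_to_go_upper_bound(1.0, 0.0, 1.0))  # approximation of V(x=1, t=0)
print(cost_to_go_upper_bound(1.0, 1.0, 1.0))  # at t = T this reduces to phi(x(T), T) = 1
```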

We split the integral over \([t,T]\) into the two pieces \([t,t+h]\) and \([t+h,T]\), and split the minimization over \(\mathbf{u}[t,T]\) accordingly: \[ \begin{align} V(\mathbf{x}(t),t) &= \min_{\mathbf{u}[t,T]} \Bigg( \int_{t}^{t+h}L(\mathbf{x}(s),\mathbf{u}(s),s)ds + \int_{t+h}^{T}L(\mathbf{x}(s),\mathbf{u}(s),s)ds + \phi(\mathbf{x}(T), T) \Bigg) \\ &= \min_{\mathbf{u}[t,t+h]}\Bigg( \int_{t}^{t+h}L(\mathbf{x}(s),\mathbf{u}(s),s)ds + \underbrace{ \min_{\mathbf{u}[t+h,T]} \Big[ \int_{t+h}^{T}L(\mathbf{x}(s),\mathbf{u}(s),s)ds + \phi(\mathbf{x}(T), T) \Big] }_{ = V(\mathbf{x}(t+h),t+h)}\Bigg) \\ &= \min_{\mathbf{u}[t,t+h]}\Bigg( \int_{t}^{t+h}L(\mathbf{x}(s),\mathbf{u}(s),s)ds + V(\mathbf{x}(t+h),t+h) \Bigg) \end{align} \] Now let \(h\rightarrow 0\). Then \(\min_{\mathbf{u}[t,t+h]} \rightarrow \min_{\mathbf{u}(t)\in\mathbb{R}^{m}}\), i.e., an optimization over a function becomes an optimization over a single point, and the first integral is approximated by \(L(\mathbf{x}(t),\mathbf{u}(t),t)\,h\). Also, using a first-order Taylor approximation: \[ \begin{align} V(\mathbf{x}(t+h),t+h) &\approx V(\mathbf{x}(t),t) + \frac{\partial V}{\partial t} h + \frac{\partial V}{\partial \mathbf{x}} \cdot ( \mathbf{x}(t+h) - \mathbf{x}(t)) \approx V(\mathbf{x}(t),t) + \frac{\partial V}{\partial t} h + \frac{\partial V}{\partial \mathbf{x}}\cdot \mathbf{\dot{x}}(t)h \\ &=V(\mathbf{x}(t),t) + \frac{\partial V}{\partial t} h + \frac{\partial V}{\partial \mathbf{x}} \cdot \mathbf{f}(\mathbf{x}(t), \mathbf{u}(t),t) h \end{align} \] Summarizing: \[ \begin{align} V(\mathbf{x}(t),t) &\approx \min_{\mathbf{u}(t)} \Big( L(\mathbf{x}(t),\mathbf{u}(t),t)\,h + V(\mathbf{x}(t),t) + \frac{\partial V}{\partial t} h + \frac{\partial V}{\partial \mathbf{x}} \cdot \mathbf{f}(\mathbf{x}(t), \mathbf{u}(t),t) h \Big) \\ 0 &= \min_{\mathbf{u}(t)} \Big( L(\mathbf{x}(t),\mathbf{u}(t),t) + \frac{\partial V}{\partial t} + \frac{\partial V}{\partial \mathbf{x}} \cdot \mathbf{f}(\mathbf{x}(t), \mathbf{u}(t),t) \Big) \end{align} \] where the second line follows by subtracting \(V(\mathbf{x}(t),t)\) from both sides (it does not depend on \(\mathbf{u}(t)\)), dividing by \(h\), and letting \(h\rightarrow 0\). This is the Hamilton-Jacobi-Bellman equation, a partial differential equation for the function \(V\), with boundary condition \(V(\mathbf{x}(T),T)=\phi(\mathbf{x}(T), T)\) at \(t=T\). Since the boundary condition is given at the final time \(t=T\), dynamic programming is regarded as solving the problem backward in time.
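The backward-in-time character is easy to see in a discretized version of the recursion \(V(\mathbf{x}(t),t)=\min_{\mathbf{u}[t,t+h]}\big(\int_t^{t+h}L\,ds + V(\mathbf{x}(t+h),t+h)\big)\): impose \(V=\phi\) at \(t=T\) and sweep backward, minimizing over a control grid at each step. The sketch below uses the same hypothetical scalar example on hypothetical state and control grids; it illustrates the idea and is not a general HJB solver.

```python
import numpy as np

# Backward sweep on a grid: impose V(x, T) = phi(x, T) at the final time, then apply
# V_k(x) ~ min_u [ L(x, u, t_k) h + V_{k+1}(x + f(x, u, t_k) h) ] from k = N-1 down to 0.
# Dynamics, costs, and grids are hypothetical choices used only for illustration.

def f(x, u, t): return -x + u
def L(x, u, t): return x**2 + u**2
def phi(x, T):  return x**2

T, N = 1.0, 50
h = T / N
xs = np.linspace(-3.0, 3.0, 301)   # state grid
us = np.linspace(-3.0, 3.0, 61)    # control grid

V = phi(xs, T)                     # boundary condition at t = T
for k in reversed(range(N)):       # backward in time: t_k = k h
    t = k * h
    # next states for every (x, u) pair on the grid (one forward-Euler step)
    x_next = xs[:, None] + f(xs[:, None], us[None, :], t) * h
    # interpolate V(., t_{k+1}) at the next states (np.interp clamps outside the grid)
    V_next = np.interp(x_next.ravel(), xs, V).reshape(x_next.shape)
    Q = L(xs[:, None], us[None, :], t) * h + V_next
    V = Q.min(axis=1)              # pointwise minimization over the control grid

print(V[np.searchsorted(xs, 1.0)])  # approximation of V(x = 1, t = 0)
```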



# Linear-Quadratic Regulator Controller

The solution for the Linear-Quadratic Regulator (LQR) controller follows naturally from the HJB equation.