Friday, July 17, 2020

Convex-concave games off the shelf

If you need to solve a convex optimization problem nowadays, you are in great shape. Any problem of the form $$
\begin{alignat}{2}
&\!\inf_z & \qquad & f(z) \\
& \text{subject to} & & h(z) = 0 \\
& & & g(z) \preceq 0
\end{alignat}
$$ where $f$ and $g$ are convex and $h$ is affine can be attacked by several excellent freely available software packages: my current favorite is cvxpy, which is a joy to use. If you have a lot of variables and not a lot of constraints, you can instead solve a dual problem. It ends up looking like $$
\begin{alignat}{2}
&\!\sup_{x} & \qquad & L(x) \\
& \text{subject to} & & \tilde{h}_x(x) = 0 \\
& & & \tilde{g}_x(x) \preceq 0
\end{alignat}
$$ where $L$ is concave, assuming you get lucky and can analytically eliminate all the primal variables $z$ so that only the dual variables $x$ remain. But what if you can eliminate most, but not all, of the primal variables? You might end up with a problem like $$
\begin{alignat}{2}
&\!\sup_{x} \inf_y & \qquad & L(x, y) \\
& \text{subject to} & & \tilde{h}_x(x) = 0 \\
& & & \tilde{g}_x(x) \preceq 0 \\
& & & \tilde{h}_y(y) = 0 \\
& & & \tilde{g}_y(y) \preceq 0
\end{alignat}
$$ where $\tilde{g}_x$ and $\tilde{g}_y$ are convex, $\tilde{h}_x$ and $\tilde{h}_y$ are affine, $L(x, \cdot)$ is convex in $y$ for fixed $x$, and $L(\cdot, y)$ is concave in $x$ for fixed $y$. It feels like this problem should be easier to solve than the original if many of the primal variables have been analytically eliminated. Unfortunately, none of my favorite convex optimization toolkits will accept a problem of this form, even though interior-point methods are perfectly viable for it. Bummer.
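For concreteness, here is a minimal cvxpy sketch of the standard form from the top of the post, with made-up problem data (the dimensions, matrices, and quadratic objective are purely illustrative). This is the kind of thing the toolkits digest happily; the saddle-point form above is exactly what they won't take.

```python
import cvxpy as cp
import numpy as np

# Made-up problem data, purely for illustration.
np.random.seed(0)
n, m, p = 5, 3, 2
P = np.eye(n)                  # f(z) = 1/2 z' P z, convex
A = np.random.randn(p, n)      # h(z) = A z - b, affine
b = np.random.randn(p)
G = np.random.randn(m, n)      # g(z) = G z - h, convex (here affine)
h = np.random.randn(m)

z = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(z, P)),
                  [A @ z == b, G @ z <= h])
prob.solve()
print(prob.value, z.value)
```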

One thing I tried was to solve the inner infimum using a standard toolkit, compute the gradient of the solution with respect to the outer parameters via the sensitivity map, and then use a first-order method for the outer supremum. This did not work for me: it succeeds on toy problems, but on real problems the outer supremum converges so slowly that ill-conditioning seems likely.
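Here is a rough sketch of that style of scheme on a made-up bilinear-quadratic example, $L(x, y) = x^\top C y + \tfrac{1}{2} y^\top Q y - \tfrac{1}{2} x^\top R x$, with a simplex constraint on $y$ and a box constraint on $x$. Instead of a full sensitivity map it uses the envelope theorem for the value-function gradient, which is valid here because the inner constraints don't involve $x$; all the names ($C$, $Q$, $R$, the step size, the iteration count) are invented for the example.

```python
import cvxpy as cp
import numpy as np

# Illustrative saddle objective L(x, y) = x' C y + 1/2 y' Q y - 1/2 x' R x,
# convex in y (Q PSD) and concave in x (R PSD).  All data are made up.
np.random.seed(0)
nx, ny = 4, 6
C = np.random.randn(nx, ny)
Q = np.eye(ny)
R = np.eye(nx)

x_par = cp.Parameter(nx)
y = cp.Variable(ny)
inner = cp.Problem(cp.Minimize(x_par @ C @ y + 0.5 * cp.quad_form(y, Q)),
                   [cp.sum(y) == 1, y >= 0])   # constraints involve y only

x = np.zeros(nx)
for _ in range(500):
    x_par.value = x
    inner.solve()                              # inner infimum for fixed x
    grad = C @ y.value - R @ x                 # envelope theorem: d/dx L(x, y*(x))
    x = np.clip(x + 1e-2 * grad, -1.0, 1.0)    # projected gradient ascent on the box
```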

What I need is the power of interior-point methods to handle ill-conditioning via second-order information. I'm able to achieve this via sequential quadratic minimax programming: first, locally approximate the objective $L(x, y)$ with a quadratic around the current point and linearize the constraints. $$
\begin{alignat}{2}
&\!\sup_{\delta x} \inf_{\delta y} & \qquad & \frac{1}{2} \left(\begin{array}{c} \delta x \\ \delta y \end{array}\right)^\top \left(\begin{array}{cc} P_{xx} & P_{yx}^\top \\ P_{yx} & P_{yy} \end{array}\right) \left(\begin{array}{c} \delta x \\ \delta y \end{array} \right) + \left(\begin{array}{c} q_x \\ q_y \end{array} \right)^\top \left(\begin{array}{c} \delta x \\ \delta y \end{array} \right) \\
& \text{subject to} & & \left(\begin{array}{cc} A_x & 0 \\ 0 & A_y \end{array} \right) \left(\begin{array}{c} \delta x \\ \delta y \end{array}\right) = \left(\begin{array}{c} b_x \\ b_y \end{array}\right) \\
& & & \left(\begin{array}{cc} G_x & 0 \\ 0 & G_y \end{array} \right) \left(\begin{array}{c} \delta x \\ \delta y \end{array}\right) \preceq \left(\begin{array}{c} h_x \\ h_y \end{array}\right)
\end{alignat}
$$ The Wolfe dual converts this problem into a standard QP: $$
\begin{alignat}{2}
&\!\sup_{\delta x, \delta y, \lambda, \mu} & \qquad & \frac{1}{2} \left(\begin{array}{c} \delta x \\ \delta y \\ \lambda \\ \mu \end{array}\right)^\top \left(\begin{array}{cccc} P_{xx} & 0 & 0 & 0 \\ 0 & -P_{yy} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right) \left(\begin{array}{c} \delta x \\ \delta y \\ \lambda \\ \mu \end{array}\right) + \left(\begin{array}{c} q_x \\ 0 \\ -b_y \\ -h_y \end{array} \right)^\top \left(\begin{array}{c} \delta x \\ \delta y \\ \lambda \\ \mu \end{array} \right) \\
& \text{subject to} & & \left(\begin{array}{cccc} A_x & 0 & 0 & 0 \\ P_{yx} & P_{yy} & A_y^\top & G_y^\top \end{array} \right) \left(\begin{array}{c} \delta x \\ \delta y \\ \lambda \\ \mu \end{array}\right) = \left(\begin{array}{c} b_x \\ -q_y \end{array}\right) \\
& & & \left(\begin{array}{cccc} G_x & 0 & 0 & 0 \\ 0 & 0 & 0 & -I \end{array} \right) \left(\begin{array}{c} \delta x \\ \delta y \\ \lambda \\ \mu \end{array}\right) \preceq \left(\begin{array}{c} h_x \\ 0 \end{array}\right)
\end{alignat}
$$ If you solve this for $(\delta x, \delta y)$ you get a Newton step for your original problem. The step acceptance criterion here is tricky: if the iterate is feasible, you want to leverage the saddle point condition (see equation (11) of Essid et al.). If the iterate is infeasible, more sophistication is required; fortunately my constraints were actually linear, so doing an initial exact inner solve allowed me to iterate while staying feasible. (Note: if you solve a more general convex problem on each step, you don't need to linearize the $x$ constraints.)
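To make that concrete, here is a rough sketch (again leaning on cvxpy rather than a dedicated QP solver) of assembling and solving the Wolfe-dual QP above from the blocks $P$, $q$, $A$, $b$, $G$, $h$ of the local model at the current iterate. The function name and argument layout are mine, not from any library; the returned $(\delta x, \delta y)$ is the candidate step, which would then go through the step-acceptance logic described above.

```python
import cvxpy as cp

def minimax_qp_step(Pxx, Pyx, Pyy, qx, qy, Ax, bx, Gx, hx, Ay, by, Gy, hy):
    """Solve the Wolfe dual of the local quadratic minimax model as one QP.

    Assumes Pxx is negative semidefinite and Pyy is positive semidefinite
    (the concave-convex structure), so the objective below is concave and
    cvxpy accepts the maximization.
    """
    nx, ny = Pxx.shape[0], Pyy.shape[0]
    dx, dy = cp.Variable(nx), cp.Variable(ny)
    lam, mu = cp.Variable(Ay.shape[0]), cp.Variable(Gy.shape[0])

    obj = (0.5 * cp.quad_form(dx, Pxx) - 0.5 * cp.quad_form(dy, Pyy)
           + qx @ dx - by @ lam - hy @ mu)
    cons = [Ax @ dx == bx,
            # stationarity of the inner problem in delta-y:
            Pyx @ dx + Pyy @ dy + Ay.T @ lam + Gy.T @ mu == -qy,
            Gx @ dx <= hx,
            mu >= 0]
    cp.Problem(cp.Maximize(obj), cons).solve()
    return dx.value, dy.value
```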

YMMV!