Thibault Randrianarisoa (University of Toronto) : Deep Vecchia Gaussian processes
Seminar « Probabilités et Statistique » (Probability and Statistics)
Deep Gaussian processes (DGPs) have drawn a lot of attention in recent years. By stacking Gaussian processes into layers, they offer the flexibility of a deep model while keeping the uncertainty estimates that GPs are known for. But they also inherit, and even worsen, the computational bottleneck that limits ordinary GPs, namely the cubic cost of inverting large covariance matrices. The Vecchia approximation, by contrast, makes GP computation scalable by introducing sparsity into the spatial dependency structure of the GP, which is represented by a directed acyclic graph (DAG).
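To make the single-layer Vecchia idea concrete, here is a minimal sketch in Python, not the construction used in the talk. It assumes 1-D inputs, a squared-exponential kernel, Gaussian observation noise, and a DAG in which each point's parents are its `m` nearest neighbours among the points that come earlier in the ordering. The function names (`sq_exp_kernel`, `nearest_previous_parents`, `vecchia_loglik`, `exact_loglik`) and all parameter values are illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(x, y, lengthscale=0.3, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D locations."""
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def nearest_previous_parents(x, m):
    """Parent sets of the DAG: for point i, up to m nearest points with index < i.

    Edges only run from earlier to later indices, so the graph is acyclic.
    """
    parents = [np.array([], dtype=int)]
    for i in range(1, len(x)):
        dist = np.abs(x[:i] - x[i])
        parents.append(np.argsort(dist)[:m])
    return parents

def vecchia_loglik(y, x, m, noise=1e-2):
    """Vecchia log-likelihood: sum of log p(y_i | y_parents(i)).

    Each term only needs an (at most) m x m solve, so the cost is
    O(n m^3) instead of the O(n^3) of the exact Gaussian likelihood.
    """
    total = 0.0
    for i, pa in enumerate(nearest_previous_parents(x, m)):
        k_ii = sq_exp_kernel(x[i:i + 1], x[i:i + 1])[0, 0] + noise
        if len(pa) == 0:
            mean, var = 0.0, k_ii
        else:
            K_pp = sq_exp_kernel(x[pa], x[pa]) + noise * np.eye(len(pa))
            k_ip = sq_exp_kernel(x[i:i + 1], x[pa])[0]
            w = np.linalg.solve(K_pp, k_ip)   # kriging weights
            mean = w @ y[pa]                  # conditional mean
            var = k_ii - w @ k_ip             # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

def exact_loglik(y, x, noise=1e-2):
    """Exact zero-mean Gaussian log-likelihood via a dense Cholesky factor."""
    K = sq_exp_kernel(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))
```

When `m` equals `n - 1`, every point conditions on all earlier points, and the sum reduces to the chain-rule factorization of the exact likelihood. Smaller values of `m` trade accuracy for sparsity.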
In this talk, I'll introduce Deep Vecchia Gaussian Processes, which bring together deep GPs and single-layer Vecchia GPs. The main idea is to apply the Vecchia approximation to the layer-wise mappings themselves, rather than to the intermediate states during training, as is usually done. That one change sidesteps the random parent-set problem that has long tripped up earlier approaches.
I'll then show that, used as a prior, the model gives valid Bayesian inference while staying computationally scalable and achieving minimax-optimal contraction rates over a broad class of composite functions.