Tuesday, February 20
07:00 - 08:45 | Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room)
09:00 - 09:45 | Davide Murari: Improving the robustness of Graph Neural Networks with coupled dynamical systems ↓ Graph Neural Networks (GNNs) have established themselves as a key component in addressing diverse graph-based tasks, like node classification. Despite their notable successes, GNNs remain susceptible to input perturbations in the form of adversarial attacks. In this talk, we present a new approach to fortify GNNs against adversarial perturbations through the lens of coupled contractive dynamical systems. (TCPL 201)
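As a hedged illustration of the dynamical-systems viewpoint (a generic non-expansive graph diffusion, not the speakers' coupled construction): if a GNN layer is read as one explicit-Euler step of the contractive dynamics x' = -Lx, then a step size h <= 2/lambda_max(L) guarantees the layer cannot amplify an input perturbation.

```python
# Minimal sketch (assumed construction, not the talk's method): a graph layer as
# one explicit-Euler step of x' = -L x.  For h <= 2 / lambda_max(L) the update
# map I - hL is non-expansive, so a perturbation of the input cannot grow.
import numpy as np

def graph_laplacian(A):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(A.sum(axis=1)) - A

def diffusion_layer(X, L, h):
    """One explicit-Euler step of x' = -L x applied to node features X."""
    return X - h * (L @ X)

# Toy 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = graph_laplacian(A)
h = 1.0 / np.linalg.eigvalsh(L).max()          # well inside the non-expansive range

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))                # clean node features
X_adv = X + 0.1 * rng.standard_normal((4, 3))  # perturbed ("attacked") features

gap_in = np.linalg.norm(X - X_adv)
gap_out = np.linalg.norm(diffusion_layer(X, L, h) - diffusion_layer(X_adv, L, h))
print(gap_out <= gap_in)                       # True: the layer does not amplify the gap
```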
09:45 - 10:30 | Eldad Haber: Time dependent graph neural networks ↓ Graph Neural Networks (GNNs) have demonstrated remarkable success in modeling complex relationships in graph-structured data. A recent innovation in this field is the family of Differential Equation-Inspired Graph Neural Networks (DE-GNNs), which leverage principles from continuous dynamical systems to model information flow on graphs with built-in properties such as feature smoothing or preservation. However, existing DE-GNNs rely on first- or second-order temporal dependencies. In this talk, we propose a neural extension to those pre-defined temporal dependencies. We show that our model, called TDE-GNN, can capture a wide range of temporal dynamics that go beyond typical first- or second-order methods, and provide use cases where existing temporal models are challenged. We demonstrate the benefit of learning the temporal dependencies using our method rather than using pre-defined temporal dynamics on several graph benchmarks. (TCPL 201)
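A minimal sketch of the contrast the abstract draws, under illustrative names (`first_order_step`, `learned_temporal_step`, and `alphas` are assumptions, not the authors' code): a fixed first-order DE-GNN update versus an update driven by a learned combination of the last few states.

```python
# Hedged sketch, not the TDE-GNN implementation: fixed first-order dynamics vs.
# higher-order dynamics whose temporal weights would be learned.
import numpy as np

def sigma(Z):
    """Pointwise nonlinearity (illustrative choice)."""
    return np.tanh(Z)

def first_order_step(X, L, W, h):
    """Pre-defined first-order dynamics: X_{k+1} = X_k + h * sigma(-L X_k W)."""
    return X + h * sigma(-L @ X @ W)

def learned_temporal_step(history, L, W, alphas, h):
    """Higher-order dynamics driven by a learned combination of the last m states,
    instead of a fixed first- or second-order rule."""
    X_mix = sum(a * Xk for a, Xk in zip(alphas, history))
    return history[-1] + h * sigma(-L @ X_mix @ W)

rng = np.random.default_rng(0)
L = np.array([[1.0, -1.0], [-1.0, 1.0]])                   # Laplacian of a 2-node graph
W = rng.standard_normal((3, 3))                            # feature transform
history = [rng.standard_normal((2, 3)) for _ in range(3)]  # last m = 3 states
alphas = np.array([0.1, 0.3, 0.6])                         # would be trained in practice

print(first_order_step(history[-1], L, W, 0.1).shape)           # (2, 3)
print(learned_temporal_step(history, L, W, alphas, 0.1).shape)  # (2, 3)
```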
10:30 - 11:00 | Group Photo / Coffee Break (TCPL Foyer)
11:00 - 11:45 | Melanie Weber: Representation Trade-Offs in Geometric Machine Learning ↓ The utility of encoding geometric structure, such as known symmetries, into machine learning architectures has been demonstrated empirically, in domains ranging from biology to computer vision. However, rigorous analysis of its impact on the learnability of neural networks is largely missing. A recent line of learning-theoretic research has demonstrated that learning shallow, fully-connected neural networks, which are agnostic to data geometry, has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this talk, we ask whether knowledge of the data geometry is sufficient to alleviate the fundamental hardness of learning neural networks. We discuss learnability in several geometric settings, including equivariant neural networks, a class of geometric machine learning architectures that explicitly encode symmetries. Based on joint work with Bobak Kiani, Jason Wang, Thien Le, Hannah Lawrence, and Stefanie Jegelka. (Online)
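For readers unfamiliar with the term, here is a minimal example of a layer that "explicitly encodes symmetries", in this case permutation equivariance in the DeepSets style; it illustrates the general idea only and is not one of the architectures analysed in the talk.

```python
# A DeepSets-style linear map  f(X) = X W1 + (1/n) 1 1^T X W2  commutes with any
# permutation P of the rows:  f(P X) = P f(X).  Illustrative example only.
import numpy as np

def equivariant_linear(X, W1, W2):
    n = X.shape[0]
    return X @ W1 + (np.ones((n, n)) / n) @ X @ W2

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))
W1, W2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
P = np.eye(5)[rng.permutation(5)]            # random permutation matrix

lhs = equivariant_linear(P @ X, W1, W2)
rhs = P @ equivariant_linear(X, W1, W2)
print(np.allclose(lhs, rhs))                 # True: permutation equivariance
```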
11:55 - 13:20 | Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room)
13:30 - 14:15 | Geoffrey McGregor: Conservative Hamiltonian Monte Carlo ↓ Hamiltonian Monte Carlo (HMC) is a prominent Markov Chain Monte Carlo algorithm often used to generate samples from a target distribution by evolving an associated Hamiltonian system using symplectic integrators. HMC’s improved sampling efficacy over traditional Gaussian random walk algorithms is primarily due to its higher acceptance probability on distant proposals, which reduces the correlation between successive samples and thus requires fewer samples overall. Yet, thin high-density regions can occur in high-dimensional target distributions, which can lead to a significant decrease in the acceptance probability of HMC proposals when symplectic integrators are used. Instead, we introduce a variant of HMC called Conservative Hamiltonian Monte Carlo (CHMC), which utilizes a symmetric R-reversible second-order energy-preserving integrator to generate distant proposals with high probability of acceptance. We show that CHMC satisfies approximate stationarity with an error governed by the order of accuracy of the integrator. We also highlight numerical examples in which the improvements in convergence over HMC persist even for large step sizes and narrowing widths of high-density regions. This work is in collaboration with Andy Wan. (TCPL 201)
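For reference, a minimal sketch of standard HMC with a leapfrog (symplectic) integrator on a Gaussian target, i.e. the baseline that CHMC modifies by swapping in an energy-preserving integrator (the CHMC integrator itself is not reproduced here); all names and parameter values below are illustrative.

```python
# Standard HMC sketch on an anisotropic ("thin") Gaussian target.
import numpy as np

rng = np.random.default_rng(2)
Sigma_inv = np.diag([1.0, 100.0])                  # very unequal scales

def U(q):      return 0.5 * q @ Sigma_inv @ q      # potential = -log density
def grad_U(q): return Sigma_inv @ q

def leapfrog(q, p, step, n_steps):
    """Symplectic leapfrog integration of the Hamiltonian dynamics."""
    p = p - 0.5 * step * grad_U(q)
    for _ in range(n_steps - 1):
        q = q + step * p
        p = p - step * grad_U(q)
    q = q + step * p
    p = p - 0.5 * step * grad_U(q)
    return q, p

def hmc_step(q, step=0.1, n_steps=20):
    p = rng.standard_normal(q.shape)               # resample momentum
    q_new, p_new = leapfrog(q, p, step, n_steps)
    # Metropolis correction: accept with probability min(1, exp(-dH)).
    dH = U(q_new) + 0.5 * p_new @ p_new - U(q) - 0.5 * p @ p
    return (q_new, True) if np.log(rng.uniform()) < -dH else (q, False)

q, accepts = np.zeros(2), 0
for _ in range(1000):
    q, ok = hmc_step(q)
    accepts += ok
print("acceptance rate:", accepts / 1000)
```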
14:15 - 15:00 | Wu Lin: (Lie-group) Structured Inverse-free Second-order Optimization for Large Neural Nets ↓ Optimization is an essential ingredient of machine learning. Many optimization problems can be formulated from a probabilistic perspective to exploit the Fisher-Rao geometric structure of a probability family. By leveraging the structure, we can design new optimization methods. A classic approach to exploiting the Fisher-Rao structure is natural-gradient descent (NGD). In this talk, we show that performing NGD on a Gaussian manifold recovers Newton's method for unconstrained optimization, where the inverse covariance matrix is viewed as a preconditioning matrix. This connection allows us to develop (Lie-group) structured second-order methods by reparameterizing a preconditioning matrix and exploiting the parameterization invariance of natural gradients. We show applications where we propose structured matrix-inverse-free second-order optimizers and use them to train large-scale neural nets with millions of parameters in half-precision settings. (TCPL 201)
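A hedged sketch of the Gaussian calculation behind the NGD-Newton connection, using standard identities rather than the talk's specific parameterization: for q = N(mu, Sigma) with precision S = Sigma^{-1} and objective E_q[f(theta)],

```latex
% Bonnet's and Price's identities for q = N(mu, Sigma):
\nabla_\mu \,\mathbb{E}_q[f(\theta)] = \mathbb{E}_q[\nabla_\theta f(\theta)], \qquad
\nabla_\Sigma \,\mathbb{E}_q[f(\theta)] = \tfrac{1}{2}\,\mathbb{E}_q[\nabla^2_\theta f(\theta)].
% One natural-gradient step of size beta on E_q[f], written in (mu, S), S = Sigma^{-1}:
S \leftarrow S + \beta\,\mathbb{E}_q\!\left[\nabla^2_\theta f(\theta)\right], \qquad
\mu \leftarrow \mu - \beta\, S^{-1}\,\mathbb{E}_q\!\left[\nabla_\theta f(\theta)\right].
```

Evaluating the expectations at the mean turns the mean update into the Newton-like step mu <- mu - beta S^{-1} grad f(mu), with the inverse covariance S acting as the preconditioning matrix, which is the connection described in the abstract.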
15:00 - 15:30 | Coffee Break (TCPL Foyer)
15:30 - 16:15 | Molei Tao: Optimization and Sampling in Non-Euclidean Spaces ↓ Machine learning in non-Euclidean spaces has been rapidly attracting attention in recent years, and this talk will give some examples of progress on its mathematical and algorithmic foundations. I will begin with variational optimization, which, together with delicate interplays between continuous- and discrete-time dynamics, enables the construction of momentum-accelerated algorithms that optimize functions defined on manifolds. Selected applications, namely a generic improvement of the Transformer and a low-dimensional approximation of the high-dimensional optimal transport distance, will be described. Then I will turn the optimization dynamics into an algorithm that samples probability distributions on Lie groups. If time permits, the efficiency and accuracy of the sampler will also be quantified via a new, non-asymptotic error analysis. (TCPL 201)
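As a generic illustration of optimizing a function defined on a manifold (a plain retraction-based gradient method, not the momentum-accelerated algorithms or Lie-group samplers from the talk):

```python
# Gradient descent on the unit sphere: project the Euclidean gradient onto the
# tangent space, step, then retract back onto the manifold.  Minimizes
# f(x) = x^T A x over the sphere, whose minimum is the smallest eigenvalue of A.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
A = A + A.T                                   # symmetric test matrix

def riemannian_grad(x):
    g = 2.0 * A @ x                           # Euclidean gradient of f
    return g - (x @ g) * x                    # project onto the tangent space at x

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)              # normalize back onto the sphere

x = rng.standard_normal(5)
x /= np.linalg.norm(x)
for _ in range(2000):
    x = retract(x, -0.05 * riemannian_grad(x))

# The two printed values should nearly coincide.
print(x @ A @ x, np.linalg.eigvalsh(A).min())
```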
16:15 - 17:00 | Melvin Leok: The Connections Between Discrete Geometric Mechanics, Information Geometry, Accelerated Optimization and Machine Learning ↓ Geometric mechanics describes Lagrangian and Hamiltonian mechanics geometrically, and information geometry formulates statistical estimation, inference, and machine learning in terms of geometry. A divergence function is an asymmetric distance between two probability densities that induces differential geometric structures and yields efficient machine learning algorithms that minimize the duality gap. The connection between information geometry and geometric mechanics will yield a unified treatment of machine learning and structure-preserving discretizations. In particular, the divergence function of information geometry can be viewed as a discrete Lagrangian, the generating function of a symplectic map, of the kind that arises in discrete variational mechanics. This identification allows the methods of backward error analysis to be applied, and the symplectic map generated by a divergence function can be associated with the exact time-h flow map of a Hamiltonian system on the space of probability distributions. We will also discuss how time-adaptive Hamiltonian variational integrators can be used to discretize the Bregman Hamiltonian, whose flow generalizes the differential equation that describes the dynamics of the Nesterov accelerated gradient descent method. (TCPL 201)
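For concreteness, the standard forms usually associated with these objects (a hedged sketch; the talk's precise conventions may differ):

```latex
% Bregman Lagrangian, with Bregman divergence
% D_h(y, x) = h(y) - h(x) - <grad h(x), y - x> of a convex function h:
\mathcal{L}(x, v, t) = e^{\alpha_t + \gamma_t}
  \left( D_h\!\left(x + e^{-\alpha_t} v,\; x\right) - e^{\beta_t} f(x) \right).
% In the Euclidean case with a particular polynomial scaling, the Euler-Lagrange
% equations reduce to the continuous-time limit of Nesterov's accelerated gradient method:
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0.
```

The Legendre transform of this Lagrangian gives the Bregman Hamiltonian referenced in the abstract, and time-adaptive variational integrators discretize the resulting flow while retaining its Hamiltonian structure.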
17:00 - 17:45 | Free Discussion (TCPL Foyer)
17:55 - 19:30 | Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, on the top floor of the Sally Borden Building. (Vistas Dining Room)