## Symposium on Sparsity and Singular Structures 2024

### Minisymposia

#### Algebra in Sparsity and Singular Structures

##### Organizer: Daniel Robertz

Algebra plays a crucial role in various aspects of sparsity and singular structures. Symbolic algorithms generate exact descriptions, allow to trace parameter dependencies and reveal configurations exhibiting singular behavior. Algebraic methods are developed for classifying established structures, while new structures arise, e.g., in the study of phenomena in the natural sciences described by partial differential equations. These pose new challenges for algorithmic algebraic approaches.

## Nathalie Verdière (Wednesday, 10:30): It’s only a short step from symbolic computation to the estimation of parameters in dynamical systems!

Many phenomena, such as the evolution over time and space of epidemics, are described using dynamical systems. However, these models contain parameters that are not directly measurable by experimenters. Optimization problems are then set up to obtain their values. However, their resolution requires either an approximate value or knowledge of the feasible domain. In this presentation, we will propose a numerical method based on the use of functional differential relations obtained from differential algebra tools. The approach presented does not require any prior knowledge of their value. An application will be first given on a spatiotemporal model describing the spread of chikungunya disease. Then, other perspectives on the application of parameter estimation will be provided.

## Yassine El Maazouz (Wednesday, 11:10): The Bernoulli clock: probabilistic and combinatorial interpretations of the Bernoulli polynomials by circular convolution

We exhibit a novel elementary characterization of Bernoulli polynomials by circular convolution. Underlying this characterization, there is a combinatorial model we call the Bernoulli clock which relates Bernoulli numbers to permutation combinatorics. We use this model to show a central limit theorem. This talk is based on joint work with Jim Pitman.

## André Platzer (Wednesday, 11:50): Differential equation invariance axiomatization

This talk proves the completeness of an axiomatization for differential equation invariants described by Noetherian functions. First, the differential equation axioms of differential dynamic logic are shown to be complete for reasoning about analytic invariants. Completeness crucially exploits differential ghosts, which introduce additional variables that can be chosen to evolve freely along new differential equations. Cleverly chosen differential ghosts are the proof-theoretical counterpart of dark matter. They create new hypothetical state, whose relationship to the original state variables satisfies invariants that did not exist before. The reflection of these new invariants in the original system then enables its analysis.

An extended axiomatization with existence and uniqueness axioms is complete for all local progress properties, and, with a real induction axiom, is complete for all semianalytic invariants. This parsimonious axiomatization serves as the logical foundation for reasoning about invariants of differential equations where properties of the global behavior of a dynamical system can be analyzed solely from the logic of their local change without having to solve the dynamics. Indeed, it is precisely this logical treatment that enables the generalization of completeness to the Noetherian case. Moreover, the approach is purely axiomatic, and so the axiomatization is suitable for sound implementation in foundational theorem provers.

This talk is based on joint work with Yong Kiam Tan, LICS 2018, JACM 2020.

Speakers:

- Nathalie Verdière
- Yassine El Maazouz
- André Platzer

#### Analysis and Numerics of Nonlinear and Singular PDEs

##### Organizers: Christof Melcher, Liwei Zhang

Nonlinear and singular phenomena in PDEs are commonly seen in problems from geometry and models of mathematical physics. Their thorough understanding and numerical simulation require the combination of different tools and concepts at the interface between mathematical and numerical analysis. This minisymposium will bring together young researchers to explore the intricate mathematical foundations and computational techniques essential for tackling complex nonlinear and singular PDEs.

## Geneviève Dusson (Wednesday, 16:00): A multipoint perturbation formula for eigenvalue problems

In standard perturbation theory, one obtains approximate solutions, i.e. eigenmodes of a given Hamiltonian, in the neighbourhood of one known solution, typically the eigenmodes of a reference Hamiltonian. In this talk, I will present a way to exploit simultaneously several nearby solutions in order to obtain better approximations than in standard perturbation theory. I will illustrate how this can be used to accelerate molecular dynamics simulations.

## Yvonne Alama Bronsard (Wednesday, 16:25): Symmetric approximations to nonlinear dispersive equations in non-smooth regimes

This talk deals with the numerical approximation to nonlinear dispersive equations, such as the prototypical nonlinear Schrödinger equation. We introduce novel integration techniques allowing for the construction of schemes which perform well both in smooth and non-smooth settings. We obtain symmetric low-regularity schemes with very good structure preserving properties over long times.

#### Gaspar Kemlin (Wednesday, 16:50): Numerical simulation of the Gross-Pitaevskii equation via vortex-tracking

This talk deals with the numerical simulation of the Gross--Pitaevskii (GP) equation, for which a well-known feature is the appearance of quantized vortices with size of an order parameter ε. Without a magnetic field and with suitable initial conditions, these vortices interact, in the singular limit ε=0, through an explicit Hamiltonian dynamic. Using this analytical framework, we will present a numerical strategy based on the reduced-order Hamiltonian system to efficiently simulate the infinite-dimensional GP equation for small but finite ε. This method allows us to avoid numerical stability issues in solving the GP equation, where small values of ε typically require very fine meshes and time steps. We will also provide a mathematical justification of our method, in terms of good supercurrent approximation, as well as some numerical experiments.

## Thiago Carvalho Corso (Wednesday, 17:15)

In this talk, we discuss the Dyson equation for the density-density response function (DDRF) that plays a central role in linear response time-dependent density functional theory (LR-TDDFT). First, we present a functional analytic setting that allows for a unified treatment of the Dyson equation with general adiabatic approximations for discrete (finite and infinite) and continuum systems. In this setting, we provide a representation formula for the solution of the Dyson equation in terms of an operator version of the Casida matrix. We then present some consequences of this representation formula such as a stability criteria for the solution and a characterization of the maximal meromorphic extension of its Fourier transform. The results presented here apply to widely used adiabatic approximations such as (but not limited to) the random phase approximation (RPA) and the adiabatic local density approximation (ALDA). In particular, these results show that neither of these approximations can shift the ionization threshold of the Kohn-Sham system.

## Muhammad Hassan (Wednesday, 17:40): The numerical analysis of the coupled cluster method for electronic structure calculations

A central problem in quantum chemistry is the computation of the lowest eigenvalue of the electronic Hamiltonian– an unbounded, self-adjoint operator acting on a Hilbert space of antisymmetric functions. The main difficulty in the resolution of this problem is the very high dimensionality of the eigenfunctions, being functions of 3N variables where N denotes the number of electrons in the system. A popular strategy to deal with this complexity is to use a *low-rank, non-linear* representation of the sought-after eigenfunction. Examples include the tensor-train-based DMRG algorithm and the coupled cluster method in which the ansatz is an exponential cluster operator acting on a judiciously-chosen reference state.

The goal of this talk is to present a new well-posedness and error analysis for the *single-reference* coupled cluster method. Under the minimal assumption that the sought-after eigenvalue is non-degenerate and the associated eigenfunction is not orthogonal to a chosen reference, we prove that the continuous coupled cluster equations are locally well-posed. Under some additional structural assumptions on the associated discretisation spaces, we prove that several classes of discrete coupled cluster equations are also locally well-posed, and we derive a priori and residual-based a posteriori error estimates.

This is joint work with Yvon Maday and Yipeng Wang (LJLL, Sorbonne Université).

Speakers:

- Yvonne Alama Bronsard
- Thiago Carvalho Corso
- Genevieve Dusson
- Muhammad Hassan
- Gaspard Kemlin

#### Kinetic Equations

##### Organizers: Manuel Torrilhon, Andrea Hanke

Modelling physical systems such as gas dynamics, radiative transport or plasmas requires solving kinetic equations, which can be computationally expensive. It is possible to reduce this effort by finding and exploiting sparsities within the system of equations, which can often be motivated from the underlying physics. Our minisymposium presents recent developments in this field of study.

#### Katharina Kormann (Thursday, 16:00): A parallel low-rank Vlasov solver

Continuum Vlasov simulations can be utilized for highly accurate modelling of fully kinetic plasmas. Great progress has been made recently regarding the applicability of the method in realistic plasma configurations. However, a reduction of the high computational cost that is inherent to fully kinetic simulations would be desirable, especially at high velocity space resolutions.

For this purpose, low-rank approximations can be employed. However, low-rank approximations are hard to parallelize on distributed memory systems. In this talk, I will present a parallel low-rank solver for the full six-dimensional electromagnetic Vlasov-Maxwell equations with a compression of the particle distribution function in velocity space. Special attention is paid to mass conservation and Gauss’s law. The low-rank Vlasov solver is applied to standard benchmark problems of plasma turbulence and magnetic reconnection and compared to the full grid method. It yields accurate results at significantly reduced computational cost.

#### Marlies Pirner (Thursday, 16:30): On multi-species BGK models: Modeling and convergence to equilibrium

In this talk we focus on kinetic equations for gas mixtures since in applications one often has to deal with mixtures instead of a single gas. In particular we consider an approximation of the Boltzmann equation, the Bathnagar-Gross-Krook (BGK) equation. This equation is used in many applications because it is very efficient in numerical simulations. In this talk, I mainly present extensions of these models to gas mixtures and provide an overview concerning modeling and theoretical results, especially convergence to equilibrium.

#### Pia Stammer (Thursday, 17:00): A hybrid dynamical low-rank approach for transport equations in radiation therapy

Dose predictions in radiation therapy require the solution of high-dimensional transport equations in a heterogeneous medium and with highly forward-peaked scattering. Despite the complexity of the problem, dose calculation errors of less than 2% are clinically recommended and computation times cannot exceed a few minutes. This often prohibits the use of exact grid- and moment-based numerical methods.

We propose a hybrid dynamical low-rank approach based on a multi-level collided uncollided split, i.e., the transport equation is split through a collision source method. Uncollided particles are described using an analytical ray tracer, facilitating the use of boundary conditions and accurate physics models, whereas collided particles are represented using moment method combined with the dynamical low-rank approximation. Here the energy is treated as a pseudo-time and a rank adaptive integrator is chosen to dynamically adapt the rank in energy. We discuss how to overcome limitations of the continuous slowing down approximation, the relation between the rank and overall error, as well as possible advantages of locally refined grids.

## Andrea Hanke (Thursday, 17:30): Landau and Boltzmann collision operators in the irreducible Burnett ansatz

Numerically solving the Boltzmann equation is computationally expensive in part due to the number of variables the distribution function depends upon. Another contributor to the complexity of the Boltzmann Equation is the the quadratic collision operator describing changes in the distribution function due to colliding particle pairs. Solving it as efficiently as possible has been a topic of recent research, e.g. [(Cai and Torrilhon, "On the Holway-Weiss debate: Convergence of the Grad-moment-expansion in kinetic gas theory", 2019), (Wang and Cai, "Approximation of the Boltzmann collision operator based on hermite spectral method", 2019), (Cai and Fan and Wang, "Burnett spectral method for the spatially homogeneous Boltzmann equation",2020].

Recently, we exploited results from representation theory to find a very efficient algorithm both in terms of memory and computational time for the evaluation of Boltzmann's quadratic collision operator [(Hanke and Torrilhon, "Representation Theory Based Algorithm to Compute Boltzmann's Bilinear Collision Operator in the Irreducible Spectral Burnett Ansatz Efficiently",2023)]. With this novel approach we are also able to provide a meaningful interpretation of its structure, leading us to the separation between purely mathematical operations, represented in the coupling tensor, and physically relevant coefficients, collected in the impact tensor.

In this talk we apply the Irreducible Burnett Ansatz not only to Boltzmann's collision operator, but also to the Landau operator, which describes grazing collisions, which occur e.g. in plasma. We will take a closer look at the structure of different impact tensors and explore how they imact the evolution of moments.

Speakers:

- Andrea Hanke
- Katharina Kormann
- Marlies Pirner
- Pia Stammer

#### Mathematics of Deep Learning

##### Organizers: El Mehdi Achour, Leonardo Galli

Despite the undeniable success of deep learning in an ever-increasing number of applications, the understanding of its mathematical foundations is still very limited. The lack of comprehension and guarantees undermines the reliability of these models and the application to safety-critical fields. The minisymposium focuses on various questions related to the mathematical foundations of deep learning by exploiting tools of optimization, approximation theory, probability theory, statistics, linear algebra, geometry and others.

#### Etienne Boursier (Tuesday 10:30): Training overparameterised networks: the early alignment phenomenon and its consequences

The training of neural networks with first order methods still remains misunderstood in theory, despite compelling empirical evidence. Not only it is believed that neural networks converge towards global minimisers, but the implicit bias of optimisation algorithms makes them converge towards specific minimisers with nice generalisation properties. This talk focuses on the early alignment phase that appears in the training dynamics of two layer networks with small initialisations. During this early alignment phase, the numerous neurons align towards a few number of key directions, hence leading to some sparsity in the number of represented neurons. Although we believe this phenomenon to be key in the implicit bias of gradient methods, it also has some serious drawbacks, e.g., being at the origin of convergence towards spurious local minima of the network parameters.

## Kathlén Kohn (Tuesday, 11:00): Understanding linear convolutional neural networks via sparse factorizations of real polynomials

This talk will explain that Convolutional Neural Networks without activation parametrize polynomials that admit a certain sparse factorization. For a fixed network architecture, these polynomials form a semialgebraic set. We will investigate how the geometry of this semialgebraic set (e.g., its singularities and relative boundary) changes with the network architecture. Moreover, we will explore how these geometric properties affect the optimization of a loss function for given training data. We prove that for architectures where all strides are larger than one and generic data, the non-zero critical points of the squared-error loss are smooth interior points of the semialgebraic function space. This property is known to be false for dense linear networks or linear convolutional networks with stride one.

This talk is based on joint work with Guido Montúfar, Vahid Shahverdi, and Matthew Trager.

#### Guido Montúfar (Tuesday, 11:30): Deep linear networks trained with the Bures-Wasserstein loss

We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. We characterize critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded covariance matrices and establish convergence results for the deep model trained with gradient flow or gradient descent under certain assumptions on the initial weights. The talk is based on work with Pierre Brechet, Katerina Papagiannouli, Jing An.

#### Sebastian U. Stich (Tuesday, 12:00): On gradient methods for non-convex optimization

Gradient methods are widely used in machine learning, yet significant gaps remain between our theoretical understanding of them and their practical performance. In this talk, I will delve into two aspects:

(1) The role of the learning rate: The success of SGD is often attributed to stochastic noise in the optimization dynamics. However, we argue that stochastic noise alone cannot explain successful non-convex training, asserting that the learning rate plays a crucial role.

(2) Striving for a more intricate characterization of the complexity of minimizing smooth non-convex functions, we introduce a unique concept termed 'graded non-convexity'. This concept allows us to partition the class of non-convex problems into a nested chain of subclasses. Many traditional non-convex objectives fall within these subclasses, including partially convex problems, matrix factorizations, and neural networks. We will showcase methods that can exploit this structure.

Joint work with Amirkeivan Mohtashami, Nikita Doikov, and Martin Jaggi.

Speakers:

- Etienne Boursier
- Kathlén Kohn
- Guido Montúfar
- Sebastian U. Stich

#### Parametric Problems and Uncertainty

##### Organizers: Markus Bachmayr, Matthieu Dolbeault

In many physical problems, the quantity of interest is provided by the solution to a parametric PDE, whose parameters could account for variations in the coefficients of the equation, spatial heterogeneities, source terms, or boundary conditions. When these parameters are unknown or when statistical properties are sought for, one is led to consider each parameter as a random variable, yielding probabilistic estimates on the solution through uncertainty quantification. In both deterministic and random settings, sparse or low-rank structures can be exploited to approximate the solution as an expansion in the spatial and parametric variables.

## Karen Veroy-Grepl (Monday, 10:30): Model order reduction in the multi-scale materials setting

## Raúl Tempone (Monday, 11:00): Stochastic optimal control with exercise rate optimization and Markovian Projections: Pricing American options and importance sampling applications

#### Henrik Eisenmann (Monday, 11:30): A convergent adaptive finite element stochastic Galerkin method based on multilevel expansions

The subject of this work is a stochastic Galerkin method for second-order elliptic partial differential equations with random diffusion coefficients. It combines operator compression in the stochastic variables with adaptive finite-element approximation in the spatial variables. We provide a convergence analysis for the method. Numerical experiments illustrate optimal or close to optimal complexity.

#### Albert Cohen (Monday, 12:00): Optimal non-linear dimensionality reduction

Understanding how to optimally approximate general compact sets by finite dimensional spaces is of central interest for designing efficient inverse problems. While the concept of n-width, introduced in 1936 by Kolmogorov, is well tailored to linear methods, finding the correct analogous concept for nonlinear approximation is still the object of current research. In this talk, we shall discuss a general framework that allows to embrace various concepts of linear and nonlinear widths, present some recent results and relevant open problems.

Speakers:

- Albert Cohen
- Henrik Eisenmann
- Raúl Tempone
- Karen Veroy-Grepl

#### Sparsity and Related Structures in Signal Processing and Data Science

##### Organizers: Hartmut Führ, Michael Schaub

In this mini-symposium, the role of sparsity in the context of data processing will be explored. Sparsity and low-dimensional structures play an important role in the development of machine learning and data processing algorithms, as well as in the mathematical underpinnings of these methods. Exploiting the underlying mathematical problem structure is often necessary to enable effective computational solution in the first place.

#### Rémi Gribonval (Monday, 16:00): Sparsity, a value for the future in frugal learning?

Promoting sparse connections in neural networks is natural to control their complexity. Besides, given its thoroughly documented role in inverse problems and variable selection, sparsity also has the potential to give rise to learning mechanisms endowed with certain interpretability guarantees. Through an overview of recent explorations around this theme, I will compare and contrast classical sparse regularization for inverse problems with multilayer sparse regularization. During our journey, we will meet a rescaling-invariant lifting of the parameters of modern deep parameterizations. I will notably highlight how its properties can be leveraged to control the statistical generalization guarantees of trained networks, but also to characterize conservation laws of their training dynamics.

## Felix Voigtländer (Monday, 16:30): Approximation properties of complex-valued neural networks

## Christopher Morris (Monday, 17:00): Weisfeiler and Leman go sparse: Expressive and scalable graph neural networks

Recently, graph neural networks (GNNs) emerged as the dominant approach for machine learning on graph-structured data, leading to many successful real-world applications. However, our theoretical understanding of GNNs is lagging behind. Here, we derive connections between GNNs’ expressive power and the 1-dimensional Weisfeiler-Leman algorithm, a well-studied heuristic for the graph isomorphism problem, showing how sparsity controls the tradeoff between scalability and expressivity.

#### Konstantin Usevich (Monday, 17:30): Neural networks, flexible activation functions, and (multilayer) tensor decompositions

Neural networks are a fundamental tool for solving various machine learning tasks, such as supervised and unsupervised classification. In this talk, we are interested in neural networks with flexible activation functions (which can vary from the unit to unit) and discuss their relation to tensor decompositions. For this, we focus on the so-called decoupling approach which uses low-rank approximation of the tensor of derivatives (this framework, initially developed in the context of nonlinear system identification, has close connections to the active subspace approaches). In the decoupling framework, a single hidden layer corresponds to the well-known canonical polyadic tensor decomposition, which can be used for identification of the network structure. The multilayer case leads to a class of multilayer tensor decompositions, which are much less understood (unlike the single-layer case). We will report some recent results on algorithms for such tensor decompositions.

Speakers:

- Rémi Gribonval
- Christopher Morris
- Felix Voigtländer
- Konstantin Usevich

#### Sparsity in Optimization

##### Organizers: Christina Büsing, Arie Koster, Sonja Steffensen

Sparse variable selection or sparse network structures became increasingly important, as they arise in a natural way in applications such as compressed sensing, image and signal processing, machine learning and transport but also economic applications such as sparse portfolio selection. Here, sparsity impacts the a) the performance of the algorithms, b) the complexity of the problem and c) representation of solutions. In this minisymposium, we aim to give an insight into recent works coming from various fields of optimization which address different aspects of this particular subject.

#### Laura Vargas-Koch (Thursday, 10:30): Unsplittable flows in planar graphs

The single-source unsplittable flow (SSUF) problem asks to send flow in a digraph with capacities from a common source to different terminals with unrelated demands, each terminal being served through a single path. A seminal result of Dinitz, Garg, and Goemans showed that, whenever a fractional flow exists respecting the capacities, then there is an unsplittable one violating the capacities by at most the maximum demand. Goemans conjectured a very natural cost version of the same result, where the unsplittable flow is required to be no more expensive than the fractional one. So far there are no non-trivial graph classes for which it is known to hold. This type of problems is closely related to sparsity, since the unsplittable representation of a flow with one path per commodity is sparse compared to the fractional flow, that might use a lot of paths.

In the talk, we will see that a slight weakening of the Goeman's conjecture (with at most twice as large violations) holds for planar graphs. This result is based on a connection to a highly structured discrepancy problem, whose repeated resolution allows us to successively reduce the number of paths used for each terminal, until we obtain an unsplittable flow.

This is joint work with Vera Traub and Rico Zenklusen.

#### Clemens Thielen (Thursday, 11:10): Temporal shortest path interdiction

Network interdiction problems deal with the question of how the optimal objective value of an optimization problem on a graph or network (e.g., a shortest path or network flow problem) can be worsened as much as possible by removing a limited number of arcs. These problems represent an important research topic and have a variety of practical applications such as assessing the robustness of critical infrastructure networks or containing the spread of infectious diseases.

In many of these applications, however, the underlying graph is not static, but each arc is active only at a specific point in time. For example, train connections in a passenger transportation network are only available at specified departure times, and in the case of infectious disease spread, contacts between individuals exist only at specific times. The graphs that appear here, where the set of active arcs changes over time, are called temporal graphs in the literature.

In this talk, we connect network interdiction problems and temporal graphs by investigating the interdiction of shortest paths in temporal graphs. The resulting temporal shortest path interdiction problem generalizes one of the most important network interdiction problems to temporal graphs. It studies the question of how to remove a bounded number of arcs from a temporal graph such that the length of a shortest path between two given nodes is maximized. We characterize the complexity of the temporal shortest-path interdiction problem for various natural interpretations of "shortest" paths in temporal graphs (latest start path, earliest arrival path, shortest duration path, shortest traversal path). Although the static shortest path interdiction problem is known to be NP-hard, we show how a time-expanded network representation can be used to obtain polynomial-time algorithms for two of the four problem variants. Here, the resulting time-expanded network is "sparse" in the sense of containing much fewer arcs than time-expanded networks arising in related areas such as dynamic network flows.

#### Martin Comis (Thursday, 11:50): Multi-budgeted matching problems

The multi-budgeted matching problem (mBM) is a weighted matching problem with independent edge cost functions. For each cost function, a budget constraint requires the accumulated cost not to exceed a corresponding budget. We show that the mBM is strongly NP-hard on paths with uniform edge weights and budgets by a reduction from 3-SAT. Subsequently, we propose a dynamic program for series-parallel graphs with pseudo-polynomial run time for a fixed number of budget constraints. As an extension we show how this algorithm can be used to solve the mBM on trees using a graph transformation. Realizing that both these graph classes have a bounded treewidth in common, we introduce a dynamic program based on tree decompositions. This approach leads to a pseudo-polynomial algorithm for the mBM with fixed on graphs of bounded treewidth.

Speakers:

- Laura Vargas-Koch
- Clemens Thielen
- Martin Comis