Borjan Geshkovski (Inria Paris, Sorbonne Université)

SeMath Colloquium

Title: Some mathematical perspectives on Transformers
Abstract: Since their introduction in 2017, Transformers have revolutionized large language models and the broader field of deep learning. Central to this success is the groundbreaking self-attention mechanism. In this presentation, I’ll introduce a mathematical framework that casts this mechanism as a mean-field interacting particle system, revealing a desirable long-time clustering behavior. This perspective leads to a trove of fascinating questions with unexpected connections to Kuramoto oscillators, sphere packing, and Wasserstein gradient flows.
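To make the "interacting particle system" viewpoint concrete, here is a minimal numerical sketch of the kind of dynamics this line of work studies: tokens are treated as particles on the unit sphere whose velocities are attention-weighted averages of the other particles, projected onto the tangent space. All parameter names and values (beta, n, d, dt, steps) are illustrative assumptions, not taken from the abstract or the speaker's code.

```python
# Illustrative sketch (assumed formulation): self-attention as a continuous-time
# interacting particle system on the sphere S^{d-1}, exhibiting clustering.
import numpy as np

def attention_dynamics(n=32, d=3, beta=4.0, dt=0.05, steps=2000, seed=0):
    """Evolve n particles x_i on the sphere under
    dx_i/dt = P_{x_i}[ sum_j softmax_j(beta <x_i, x_j>) x_j ],
    where P_x projects onto the tangent space at x."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)        # initialize on the sphere
    for _ in range(steps):
        logits = beta * x @ x.T                           # pairwise inner products
        w = np.exp(logits - logits.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                 # row-wise softmax (attention weights)
        drift = w @ x                                     # attention-weighted average of particles
        drift -= np.sum(drift * x, axis=1, keepdims=True) * x   # project onto tangent space
        x += dt * drift
        x /= np.linalg.norm(x, axis=1, keepdims=True)     # stay on the sphere
    return x

if __name__ == "__main__":
    x = attention_dynamics()
    # pairwise inner products near 1 indicate the long-time clustering behavior
    print("min pairwise <x_i, x_j>:", (x @ x.T).min())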

(Host: Michael Westdickenberg)
