Floating-Point Arithmetic
Lecturers
Claude-Pierre Jeannerod, Guillaume Melquiond, and Jean-Michel
Muller.
Description
Calculations performed on a computer rely on the underlying
arithmetic of the processor, which is most often
floating-point (FP) arithmetic. In recent years, FP arithmetic has
been undergoing a major and very interesting evolution, with the
classical IEEE-754 specification being complemented by new
formats and instructions, as well as by alternative number
systems. This evolution, which is mostly driven by new
computational needs in artificial intelligence, is pushing towards
the systematic use of very low-precision FP formats. While
low-precision FP arithmetics have obvious performance advantages,
they raise many questions when it comes to certifying the quality
of the numerical computations that rely on them. Such correctness
guarantees are becoming more and more necessary in many emerging
application domains (such as autonomous vehicles), requiring in
turn new approaches to the design and analysis of FP algorithms.
This course will offer a timely and comprehensive treatment of all
these very recent and exciting developments, covering in
particular the following topics:
- Foundations of computer arithmetic: number systems,
arithmetic algorithms, hardware designs
- The latest (2019) revision of the IEEE-754 standard for FP
arithmetic, and the revision currently in preparation
- Beyond IEEE-754: low-precision arithmetics and
tensor-processing units
- New error-analysis techniques: mixed-precision analysis,
probabilistic models, statistical error analysis
- Fast and accurate function evaluation: algorithms,
certification tools, and libm implementation
- Tools for computer-assisted analysis of numerical programs
Some references
- The Mathematical-Function Computation Handbook, N. Beebe,
Springer, 2017.
- Computer Arithmetic and Formal Proofs: Verifying Floating-point
Algorithms with the Coq System, S. Boldo and G. Melquiond,
ISTE Press, 2017.
- Handbook of Floating-Point Arithmetic, J.-M. Muller et al.,
Birkhäuser, 2018.
- FP8 Formats for Deep Learning, NVIDIA, ARM, Intel, 2022.
- Mixed precision algorithms in numerical linear algebra, N. J.
Higham and T. Mary, Acta Numerica, 2022.
- Floating-Point Arithmetic, S. Boldo, C.-P. Jeannerod,
G. Melquiond, J.-M. Muller, Acta Numerica, 2023.
- Arithmetic Formats for Machine Learning, IEEE SA Working Group
P3109, 2023-2024.
- Hardware Trends Impacting Floating-Point Computations In
Scientific Applications, J. Dongarra et al., arXiv, 2024.