Floating-Point Arithmetic

Lecturers

Claude-Pierre Jeannerod, Guillaume Melquiond, and Jean-Michel Muller.

Description

Calculations performed on a computer rely on the underlying arithmetic of its processors, which is most often floating-point (FP) arithmetic. In recent years, FP arithmetic has been undergoing a major and very interesting evolution, with the classical IEEE 754 specification being complemented by new formats and instructions, but also by alternative number systems. This evolution, which is driven mostly by new computational needs in artificial intelligence, is pushing towards the systematic use of very low-precision FP formats. While low-precision FP arithmetic has obvious performance advantages, it raises many questions when it comes to certifying the quality of the numerical computations that rely on it. Such correctness guarantees are becoming increasingly necessary in many emerging application domains (such as autonomous vehicles), which in turn requires new approaches to the design and analysis of FP algorithms. This course will offer a timely and comprehensive treatment of all these very recent and exciting developments, covering in particular the following topics:

Some references