The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation, and is followed by many CPU and FPU implementations. The standard defines formats for representing floating-point numbers and special values, together with a set of floating-point operations that act on these values. It also specifies four rounding modes and five exceptions (Michael L. Overton).
2. How floating point numbers are stored in memory
An IEEE-754 float (4 bytes) or double (8 bytes) has three components (there is also an extended-precision format, typically 80 bits wide and often padded to 96 bits in memory): a sign bit telling whether the number is positive or negative, an exponent giving its order of magnitude, and a mantissa specifying the actual digits of the number. Using single-precision floats as an example, here is the bit layout:

    seeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
    bit 31                      bit 0

    s = sign bit, e = exponent, m = mantissa
In the internal representation, there is 1 bit for the sign (S), 8 bits for the exponent (E), and 23 bits for the mantissa (F). On a little-endian machine the number is stored low-order byte first, with high memory to the right:

    Byte 0      Byte 1      Byte 2      Byte 3
    bits 7-0    bits 15-8   bits 23-16  bits 31-24
    FFFFFFFF    FFFFFFFF    EFFFFFFF    SEEEEEEE
3. The difficulty of manipulating and using floating point numbers in C calculations
There are two reasons why a real number might not be exactly representable as a floating-point number. The most common situation is illustrated by the decimal number 0.1. Although it has a finite decimal representation, in binary it has an infinitely repeating representation. Thus when the base β = 2, the number 0.1 lies strictly between two floating-point numbers and is exactly represented by neither of them (Cleve Moler).
Floating-point