While fixed point binary offers a simple approach to representing real numbers, the range of numbers that we can represent is limited. A more common, but slightly more complex, approach is floating point binary.

Floating point binary representation uses a method that is very similar to standard form, which is a way of writing very large or very small decimal numbers without having to write out lots of zeroes.

Standard Form

Let's begin by considering quite a big number: the decimal number sixteen quintillion. If we were to write this number out in full, it would be 16 followed by eighteen zeroes.

Now let's think about a small value: the radius of a hydrogen atom is about one twenty-billionth of a metre. Writing that down in decimal needs a zero, a decimal point, ten zeroes and then a five.

Generally we try to avoid writing numbers like this: it takes a long time, all those zeroes use up a lot of space on the page, and it is easy to miss or add a zero by accident.

Standard form is a method we use to represent these sorts of numbers in a simpler way:

  • Sixteen quintillion can be represented as $1.6 \times 10^{19}$
  • One twenty-billionth can be represented as $5 \times 10^{-11}$

[Animated GIF: the value $1.6 \times 10^{19}$, with the decimal point "hopping" nineteen places to the right and leaving zeroes in its trail until the value is expanded to 16000000000000000000.0]

To write a number in standard form, we present a short decimal number (the “mantissa”) and then state how many places we need to move the decimal point (the “exponent”) to get back to the original number.

Moving the decimal point n places to the right means multiplying by $10^{+n}$; moving it n places to the left means multiplying by $10^{-n}$.
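As a quick check of this rule, the short Python sketch below expands a standard form value back into ordinary decimal notation. The helper name `expand` is made up for this illustration; it relies on `Decimal.scaleb` from the standard library, which shifts the decimal point by a given number of places.

```python
from decimal import Decimal


def expand(mantissa: str, exponent: int) -> str:
    """Write mantissa x 10^exponent out in full (illustrative helper)."""
    # scaleb moves the decimal point: to the right for a positive
    # exponent, to the left for a negative one.
    shifted = Decimal(mantissa).scaleb(exponent)
    # The "f" format forces plain positional notation (no E+19 suffix).
    return format(shifted, "f")


print(expand("1.6", 19))  # 16000000000000000000
print(expand("5", -11))   # 0.00000000005
```

Using `Decimal` rather than `float` here keeps every digit exact, so the output shows precisely the zeroes that standard form lets us avoid writing.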

Decimal to Floating Point

Floating point is a way of representing real numbers in binary using this same method. The difference is that instead of being based on ten, the process is based on two!

Just like standard form, floating point numbers make use of a mantissa and an exponent. With floating point it is important that the number of bits that will be used for these values has been predetermined. The worked example we are about to use will use eight bits for the mantissa and four bits for the exponent.

Method for converting decimal to floating point binary

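  • Write the value as a (twos complement) fixed point binary number.
  • Move the point so that it sits immediately after the first (sign) bit. The resulting bit pattern is the mantissa.
  • The number of “jumps” made by the point, and their direction, give the exponent: jumps to the left make a positive exponent, jumps to the right a negative one.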

For example, consider the decimal number 17.75. To express this value as a floating point binary value with an 8 bit mantissa and a 4 bit exponent, we need to:

  • Express +17.75 as a twos complement fixed point binary number. That’s 010001.11
  • Next, move the binary point five places to the left so that it sits after the sign bit. This gives the mantissa, 0.1000111 (Note: When we “move” the binary point, in reality we are performing a left or right binary shift)
  • Moving the point five places to the left divides the value by $2^{5}$, so the exponent must be +5 to restore it. In 4 bit twos complement binary this is 0101

So 17.75 in floating point (with an 8 bit mantissa and 4 bit exponent) is:

  • Mantissa: 0.1000111
  • Exponent: 0101

(In the computer, these two values would be combined and stored as 010001110101)
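The conversion method can be sketched in Python as follows. This is only an illustration under some assumptions: the function names `encode` and `twos_complement` are invented here, the value is assumed to be non-zero, and values that do not fit exactly into the mantissa raise an error rather than being rounded or truncated.

```python
from fractions import Fraction

MANTISSA_BITS = 8  # sizes taken from the worked example
EXPONENT_BITS = 4


def twos_complement(value: int, bits: int) -> str:
    """Return value as a twos complement bit string of the given width."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def encode(value) -> tuple:
    """Return (mantissa, exponent) bit strings for a non-zero value."""
    x = Fraction(value)
    if x == 0:
        raise ValueError("zero has no normalised form in this scheme")
    exponent = 0
    # Point jumps to the left: halve the value and count up.
    while x >= 1 or x < -1:
        x /= 2
        exponent += 1
    # Point jumps to the right: double the value and count down.
    while Fraction(-1, 2) <= x < Fraction(1, 2):
        x *= 2
        exponent -= 1
    # x is now in [0.5, 1) or [-1, -0.5); keep 7 bits after the point.
    scaled = x * 2 ** (MANTISSA_BITS - 1)
    if scaled.denominator != 1:
        raise ValueError("value needs more mantissa bits than are available")
    return (twos_complement(scaled.numerator, MANTISSA_BITS),
            twos_complement(exponent, EXPONENT_BITS))


print(encode(17.75))  # ('01000111', '0101')
```

The mantissa string is printed without its binary point, so `01000111` stands for 0.1000111. Each halving of the value corresponds to one jump of the point to the left, which is why five jumps produce an exponent of +5.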

Floating Point to Decimal

Consider the following floating point binary value:

  • 8 bit mantissa: 1.0001100
  • 4 bit exponent: 0111

Convert the mantissa value to decimal: -1 + (1/16) + (1/32) = -0.90625

Calculate the exponent value, which is 7.

Remember, this is like standard form except it is based on two, so multiply the mantissa value by two to the power of the exponent.

$-0.90625 \times 2^{7} = -0.90625 \times 128 = -116$
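The same calculation can be sketched in Python. The names `decode` and `from_twos_complement` are hypothetical helpers for this example, and the mantissa is passed as a bit string without its binary point, which is assumed to sit after the first bit.

```python
def from_twos_complement(bits: str) -> int:
    """Interpret a bit string as a twos complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        # A leading 1 means the most significant bit carries negative weight.
        value -= 1 << len(bits)
    return value


def decode(mantissa: str, exponent: str) -> float:
    """Return the decimal value of a floating point binary number."""
    m = from_twos_complement(mantissa)
    e = from_twos_complement(exponent)
    # The point sits after the first mantissa bit, so the integer m
    # represents m / 2^(number of bits after the point).
    fraction = m / 2 ** (len(mantissa) - 1)
    return fraction * 2.0 ** e


print(decode("10001100", "0111"))  # -116.0
```

Here the mantissa evaluates to -0.90625 and the exponent to 7, matching the worked example.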

Normalised Floating Point

It is important for floating point numbers to be normalised. Normalising ensures that all of the bits in the mantissa are used to give as precise a value as possible, and that there is only one way to represent each number.

  • Normalised floating point numbers must have only a single bit before the point.
  • Positive value mantissas must begin with 0.1 and negative value mantissas must begin with 1.0
    • It is not permissible for the bit before the point to be the same as the first bit after the point. In other words, mantissas of the form 1.1xxxxxx or 0.0xxxxxx are not valid normalised floating point numbers.

When a floating point number is normalised, the point must be moved (either to the left or right) so that the binary digits on either side of the point are different and there is only a single bit before the binary point. The exponent is adjusted by the same number of places so that the overall value does not change.
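A minimal Python sketch of this rule is shown below. It makes some simplifying assumptions: the helper names `is_normalised` and `normalise` are invented for this example, the mantissa is a bit string that already has a single bit before the point, and the exponent is kept as an ordinary integer rather than a bit string. It therefore only covers the case where the point moves to the right.

```python
def is_normalised(mantissa: str) -> bool:
    """A mantissa is normalised when its first two bits differ."""
    return mantissa[0] != mantissa[1]


def normalise(mantissa: str, exponent: int) -> tuple:
    """Shift the mantissa until it starts 01 or 10, adjusting the exponent."""
    if "1" not in mantissa:
        raise ValueError("zero cannot be normalised")
    while not is_normalised(mantissa):
        # The first bit repeats the second, so it can be dropped safely.
        # Appending a 0 keeps the width; the mantissa value doubles.
        mantissa = mantissa[1:] + "0"
        # Compensate so that mantissa x 2^exponent stays the same.
        exponent -= 1
    return mantissa, exponent


print(normalise("00010110", 4))  # ('01011000', 2)
print(normalise("11101000", 0))  # ('10100000', -2)
```

In the first call, 0.0010110 with exponent 4 and 0.1011000 with exponent 2 both represent 2.75, but only the second form is normalised.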

By admin