Note: The contents of this page were edited from the original for use in CS 31.
Binary is a number system which builds numbers from
elements called bits. Each bit can be
represented by any two mutually exclusive states. Generally,
when we write it down or code bits, we represent them with
1
and
0
. We also talk about them
being true and false, and the computer internally represents
bits with high and low voltages.
We build binary numbers the same way we build numbers in our traditional base 10 system. However, instead of a one's column, a 10's column, a 100's column (and so on) we have a one's column, a two's columns, a four's column, an eight's column, and so on, as illustrated below.
2... | 26 | 25 | 24 | 23 | 22 | 21 | 20 |
... | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
For example, to represent the number 203 in base 10, we
know we place a 3
in the
1's
column, a
0
in the
10's
column and a
2
in the
100's
column. This is
expressed with exponents in the table below.
102 | 101 | 100 |
2 | 0 | 3 |
Or, in other words, 2 × 102 + 3 × 100 = 200 + 3 = 203. To represent the same thing in binary, we would have the following table.
27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 |
1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
That equates to 27 + 26 + 23+21 + 20 = 128 + 64 + 8 + 2 + 1 = 203.
The easiest way to convert between bases is to use a computer, after all, that's what they're good at! However, it is often useful to know how to do conversions by hand.
The easiest method to convert between bases is repeated division. To convert, repeatedly divide the quotient by the base, until the quotient is zero, making note of the remainders at each step. Then, write the remainders in reverse, starting at the bottom and appending to the right each time. An example should illustrate; since we are converting to binary we use a base of 2.
Quotient | Remainder | ||
---|---|---|---|
20310 ÷ 2 = | 101 | 1 | |
10110 ÷ 2 = | 50 | 1 | ↑ |
5010 ÷ 2 = | 25 | 0 | ↑ |
2510 ÷ 2 = | 12 | 1 | ↑ |
1210 ÷ 2 = | 6 | 0 | ↑ |
610 ÷ 2 = | 3 | 0 | ↑ |
310 ÷ 2 = | 1 | 1 | ↑ |
110 ÷ 2 = | 0 | 1 | ↑ |
Reading from the bottom and appending to the right
each time gives 11001011
,
which we saw from the previous example was 203.
To represent all the letters of the alphabet we would need at least enough different combinations to represent all the lower case letters, the upper case letters, numbers and punctuation, plus a few extras. Adding this up means we need probably around 80 different combinations.
If we have two bits, we can represent four possible
unique combinations (00 01 10
11
). If we have three bits, we can represent
8 different combinations.
8 bits gives us
28 =
256
unique representations, more than enough
for our alphabet combinations. We call a group of 8 bits a
byte. Guess what size a C
char
variable type is? One
byte.
Given that a byte can represent any of the values 0
through 256, anyone could arbitrarily make up a mapping
between characters and numbers. For example, a video card
manufacturer could decide that the value 10 represents
A
, so when value 10 is sent
to the video card it displays a capital 'A' on the
screen.
To avoid this happening, the American Standard Code for Information Interchange or ASCII was invented. This is a 7-bit code, meaning there are 27 or 128 available codes.
The range of codes is divided up into two major parts;
the non-printable and the printable. Printable characters
are things like characters (upper and lower case), numbers
and punctuation. Non-printable codes are for control, and
do things like make a carriage-return, ring the terminal
bell or the special NULL
code which represents nothing at all.
127 unique characters is sufficient for American English, but becomes very restrictive when one wants to represent characters common in other languages, especially Asian languages which can have many thousands of unique characters.
To alleviate this, modern systems are moving away from ASCII to Unicode, which can use up to 4 bytes to represent a character, giving much more room!
A single byte can represent a maximum value of 256, and thus to allow
for larger values, it's often useful to use multiple bytes when storing
numbers; hopefully your bank balance in dollars will need more range than
can fit into one byte! Most modern architectures are 32-bit (or 64-bit) computers. This
means they work with 4 (or 8) bytes at a time when processing and reading
or writing to memory. We often refer to a 4-byte group as a word; this is analogous to human
language where letters (bits) make up words in a sentence, except in
computing every word has the same size! The size of a C int
variable is (usually) 32 bits.
Computers easily deal with a lot of bytes; that's one of the things that makes them so useful!
We need a way to talk about large numbers of bytes, and a natural way is to use the "International System of Units" (SI) prefixes as used in most other scientific areas. So for example, kilo refers to 103 or 1000 units, as in a kilogram has 1000 grams.
1000 is a nice round number in base 10, but in binary
it is 1111101000
which is
not a particularly "round" number. However, 1024 (or
210) is
(10000000000
), and happens
to be quite close to the base ten meaning of kilo (1000 as
opposed to 1024).
Hence 1024 bytes became known as a kilobyte. The first mass market computer was the Commodore 64, so named because it had 64 kilobytes of storage.
Today, kilobytes of memory would be small for a wrist
watch, let alone a personal computer. The next SI unit is
"mega" for
106
.
As it happens,
220
is again close to the SI base 10 definition; 1048576 as
opposed to 1000000.
The units keep increasing by powers of 10; each time it diverges further from the base SI meaning.
210 | Kilobyte |
220 | Megabyte |
230 | Gigabyte |
240 | Terrabyte |
250 | Petabyte |
260 | Exabyte |
Hexadecimal refers to a base 16 number system. We use this in computer science for only one reason, it makes it easy for humans to think about binary numbers. Computers only ever deal in binary and hexadecimal is simply a shortcut for us humans trying to work with the computer.
So why base 16? Well, the most natural choice is base 10, since we are used to thinking in base 10 from our every day number system. But base 10 does not work well with binary -- to represent 10 different elements in binary, we need four bits. Four bits, however, gives us sixteen possible combinations. So we can either take the very tricky road of trying to convert between base 10 and binary, or take the easy road and make up a base 16 number system -- hexadecimal!
Hexadecimal uses the standard base 10 numerals, but adds
A B C D E F
which refer to
10 11 12 13 14 15
(note we
start from zero).
Traditionally, any time you see a number prefixed by
0x
this will denote a
hexadecimal number.
As mentioned, to represent 16 different patterns in binary, we would need exactly four bits. Therefore, each hexadecimal numeral represents exactly four bits. You should consider it an exercise to learn the following table off by heart.
Hexadecimal | Binary | Decimal |
---|---|---|
0 | 0000 | 0 |
1 | 0001 | 1 |
2 | 0010 | 2 |
3 | 0011 | 3 |
4 | 0100 | 4 |
5 | 0101 | 5 |
6 | 0110 | 6 |
7 | 0111 | 7 |
8 | 1000 | 8 |
9 | 1001 | 9 |
A | 1010 | 10 |
B | 1011 | 11 |
C | 1100 | 12 |
D | 1101 | 13 |
E | 1110 | 14 |
F | 1111 | 15 |
Of course there is no reason not to continue the pattern
(say, assign G to the value 16), but 16 values is an excellent
trade off between the vagaries of human memory and the number of
bits used by a computer (occasionally you will also see base 8
used, for example for file permissions under UNIX). We simply
represent larger numbers of bits with more numerals. For
example, a sixteen bit variable can be represented by 0xAB12,
and to find it in binary simply take each individual numeral,
convert it as per the table and join them all together (so
0xAB12
ends up as the 16-bit
binary number
1010101100010010
). We can use
the reverse to convert from binary back to hexadecimal.
We can also use the same repeated division scheme to change the base of a number. For example, to find 203 in hexadecimal
Quotient | Remainder | ||
---|---|---|---|
20310 ÷ 16 = | 12 | 11 (0xB) | |
1210 ÷ 16 = | 0 | 12 (0xC) | ↑ |
Hence 203 in hexadecimal is
0xCB
.