An experiment on the distribution of data
- Find a list of at least 20 numbers (for example, a table of
properties of the elements in a chemistry handbook).
- These should be empirical data, not artificially constructed numbers
such as phone numbers.
- They should span several orders of magnitude, not be tightly bunched
the way student grade-point ratios (GPRs) are.
- On an axis or a ruler, plot the numbers with the first digit
removed. (Ignore the location of the decimal point.)
The distribution should be roughly uniform, since the second and later
digits are nearly random.
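A minimal sketch of this step, using synthetic log-uniform samples as a
stand-in for the empirical data the exercise asks you to collect (the
helper name `digits` is an illustration, not part of the exercise):

```python
import random

random.seed(1)
# Stand-in for empirical data: 1000 samples spanning four orders
# of magnitude (log-uniform, an assumption made for this sketch).
data = [10 ** random.uniform(0, 4) for _ in range(1000)]

def digits(x):
    """Significant digits of x with the decimal point ignored."""
    return str(x).replace(".", "")

# Remove the first digit and tally the new leading digit.
second = [int(digits(x)[1]) for x in data]
counts = {d: second.count(d) for d in range(10)}
print(counts)
```

The tallies for 0 through 9 should come out roughly equal, which is what
"plot on a ruler and see a uniform spread" amounts to.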
- Plot the numbers intact. (Ignore the location of the
decimal point.) You should find them strongly bunched toward the low end
(e.g., many more numbers begin with 1 than with 9).
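The same tally on the intact numbers shows the bunching. Again a sketch
over synthetic log-uniform samples rather than real collected data:

```python
import random

random.seed(1)
# Stand-in for empirical data spanning four orders of magnitude.
data = [10 ** random.uniform(0, 4) for _ in range(1000)]

# All samples are >= 1, so the first character is the leading digit.
first = [int(str(x)[0]) for x in data]
counts = {d: first.count(d) for d in range(1, 10)}
print(counts)
```

The count for 1 should dwarf the count for 9, matching the claim above.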
- Plot the intact numbers on a logarithmic scale.
(Ignore the location of the decimal point.)
The distribution should now be uniform!
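Plotting on a logarithmic scale with the decimal point ignored amounts
to looking at the fractional part of log10 (the mantissa). A sketch,
again under the synthetic log-uniform assumption:

```python
import math
import random

random.seed(1)
# Stand-in for empirical data spanning four orders of magnitude.
data = [10 ** random.uniform(0, 4) for _ in range(1000)]

# Position of each number on a log-scale ruler, decimal point
# ignored: the fractional part of log10(x).
mantissas = [math.log10(x) % 1 for x in data]

# Bin the mantissas into ten equal slots of the ruler.
bins = [0] * 10
for m in mantissas:
    bins[int(m * 10)] += 1
print(bins)
```

The ten bins should hold roughly equal counts: uniform on the log scale.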
- This result makes sense if you think about what happens when you put
the decimal point back in.
The number of data points between 10 and 11 ought to be roughly the same
as the number between 9 and 10, right? But if the data really span
several orders of magnitude, then the number between 10 and 11
should also be about the same as the number between 1.0 and 1.1, and that
is far fewer than the total number between 1 and 2. So, with the decimal
point ignored, far more numbers begin with 1 than with 9.
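Under this reasoning, the fraction of numbers beginning with digit d is
the width that d's interval occupies on the log-scale ruler. A short
check of those fractions (this is the Benford first-digit formula, which
the argument above implies):

```python
import math

# Fraction of the log-scale ruler between d and d+1, i.e. the
# expected share of numbers whose leading digit is d.
probs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, round(p, 3))
```

About 30% of the numbers should begin with 1 and under 5% with 9, which
is exactly the bunching the middle plot showed.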