Cool graph, but where’d the data come from?

by bledsoe on December 29, 2008

Below is a nifty graph that I first saw in a post by Clay Shirky at Boing Boing.

S&P Returns since 1825

The graph shows the annual return on the S&P stock index for every year going back to 1825. The heavy black line in the middle marks a zero return: a year listed to the right of the line had a positive return, and a year listed to the left had a negative return. The little boxes containing the years are stacked in columns, each covering a 10-percentage-point range of returns. For example, the year 2006 (at the top of the second-highest stack) had a total annual return somewhere between 10% and 20%, while the year 2000 (at the top of the third-highest stack) had a return somewhere between 0% and -10%. And yes, 2008 is all the way over to the left.

This type of graph is called a histogram, and it is a fairly common way of displaying frequency data, that is, counts of how often values fall into each of several ranges. The horizontal axis represents a series of intervals (here, annual stock returns in percent), and the vertical axis simply shows the number of years that fell into each interval. One of the things I like about histograms is that they can turn a bunch of confusing numbers into a picture that makes a very clear point. In this case, a quick look at the histogram shows that over the past 183 years, the S&P index had a positive return in more years than it had a negative return. (Here's another histogram that makes this point in a slightly different way.)
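For readers who like to see the mechanics, the binning a histogram like this relies on can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the original graph was actually made: the function names are hypothetical, the input is assumed to be a mapping from year to annual return in percent, and each bin is assumed to include its lower edge but not its upper edge (the graph itself doesn't say which side a return of exactly 10% or -10% would land on).

```python
import math


def bin_returns(returns_by_year: dict[int, float], width: float = 10.0) -> dict[float, list[int]]:
    """Group years into bins `width` percentage points wide (hypothetical helper).

    A bin is keyed by its lower edge, so key 10.0 holds years with
    10 <= return < 20 and key -10.0 holds years with -10 <= return < 0.
    """
    bins: dict[float, list[int]] = {}
    for year, ret in sorted(returns_by_year.items()):
        # math.floor rounds toward negative infinity, so -5% lands in the -10 bin.
        lower = math.floor(ret / width) * width
        bins.setdefault(lower, []).append(year)
    return bins


def histogram_counts(bins: dict[float, list[int]], width: float = 10.0) -> list[tuple[float, float, int]]:
    """Return (lower edge, upper edge, number of years) for each non-empty bin, left to right."""
    return [(lower, lower + width, len(bins[lower])) for lower in sorted(bins)]


def up_vs_down(returns_by_year: dict[int, float]) -> tuple[int, int]:
    """Count years with a positive return and with a negative return (zero counts as neither)."""
    up = sum(1 for r in returns_by_year.values() if r > 0)
    down = sum(1 for r in returns_by_year.values() if r < 0)
    return up, down


if __name__ == "__main__":
    # Made-up illustrative numbers keyed by dummy year labels, NOT actual S&P returns.
    demo = {1: 15.0, 2: -5.0, 3: 12.0, 4: 0.0, 5: -10.0}
    for lower, upper, n in histogram_counts(bin_returns(demo)):
        print(f"{lower:6.0f}% to {upper:4.0f}%  {'#' * n}")
    print("up/down years:", up_vs_down(demo))
```

The height of each column is just the length of each list of years, and comparing the total to the right of zero with the total to the left is what `up_vs_down` does. Note that the edge convention matters: a year with a return of exactly -10% goes into the -10% to 0% column here, but a different convention would move it one column to the left.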

[The graph shown above is actually a slightly more readable version of a graph originally prepared by a group called "Value Square Asset Management" at Yale University, though I couldn't figure out exactly who this group is, and it seems I wasn't the only one.]

Looking into this graph a little more, I discovered that the returns for each year are not all calculated from the same data. For the years from 1957 to 2008, the returns are those calculated for the S&P 500. Prior to 1957, the S&P 500 did not exist, but from 1923 to 1957, Standard & Poor's did have another index, the S&P 90, which was based on 90 stocks rather than 500. The returns for the years 1825 to 1923 (almost 100 of the 183 years in the graph) apparently come from a paper published in 2000, which you can download for free here. (You can also pay $31.50 for the same paper here, which I don't entirely understand.)

All of which says something else about the value of graphs. The whole point of most graphs is to visually summarize (and simplify) a bunch of data so as to present them in an understandable format, but any time you simplify something complicated (like 183 years of stock market returns), you're probably going to lose a lot of the details. Trading some complexity for clarity may well be a good trade, but it's worth keeping in mind that not every detail about a set of data is going to show up in a graph. I don't know enough about economics to say whether the differences in how returns were calculated for the three time periods discussed above substantially compromise the value of this graph, but if you look at the graph and never ask where the data come from, you never get the chance to consider the question.
