An Explanation of Statistical Tools from DocumentingExcellence.com
A consulting practice focusing on working with colleges', organizations', and individuals' utilization of quantitative and qualitative assessment tools to analyze and document their quality outcomes through providing staff development, research design and analysis, and psychometric evaluations.

Determining Cause and Effect

Statistical Control

Examining Unemployment and Crime

The following example illustrates some uses of correlation and regression, as well as the importance of controlling for exogenous variables. The example uses real data, but it is intended as an illustration, not a theoretically based explanation. So let's start with a reasonable hypothesis.

• State a research hypothesis: As unemployment rates rise, the rate of crime will increase.

Now the researcher collects data.  In this case we can use data collected by governmental agencies.

Illinois rates of crime by year
(crime rates per 100,000 population; unemployment rate in percent)

Year   Violent crime rate   Property crime rate   Unemployment rate
1975          670                 5,033                  8.5
1976          626                 4,830                  7.7
1977          631                 4,697                  7.1
1978          677                 4,943                  6.1
1979          744                 5,287                  5.8
1980          808                 5,461                  7.1
1981          793                 5,323                  7.6
1982          774                 5,066                  9.7
1983          728                 4,813                  9.6
1984          725                 4,579                  7.5
1985          715                 4,597                  7.2
1986          809                 4,746                  7.0
1987          796                 4,620                  6.2
1988          810                 4,810                  5.5
1989          846                 4,793                  5.3
1990          967                 4,968                  5.6
1991        1,039                 5,093                  6.8
1992          977                 4,788                  7.5
1993          960                 4,658                  6.9
1994          961                 4,665                  6.1
1995          996                 4,460                  5.6
1996          890                 4,430                  5.4
1997          861                 4,280                  4.9
1998          808                 4,051                  4.5
1999          690                 3,825                  4.2
2000          654                 3,585                  4.0
2001          637                 3,461                  4.7
2002          602                 3,420                  5.8
2003          556                 3,288                  6.0
2004          546                 3,174                  5.5
2005          552                 3,080                  5.1

• Next, the researcher tests the hypothesis. A simple correlation between crime rates and unemployment appears appropriate.

Using the Excel function CORREL(), the correlation between Violent Crimes per 100,000 and Unemployment is .058. This appears to be a very weak, possibly non-significant, relationship. Consulting a table of critical values for rho (the name given to this statistic), the researcher finds that this correlation is NOT STATISTICALLY SIGNIFICANT. (It is not our intention to provide full instructions on calculating this statistic or on testing it for significance. The reader is directed to any general statistics textbook.)
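
For readers without Excel, the same correlation can be sketched in a few lines of Python. This is only an illustration, not the procedure used above: the function names are our own, and the significance check uses the t statistic for a correlation with the standard two-tailed 5% table value for 29 degrees of freedom, rather than a table of critical values for rho.

```python
import math

# Violent crime rate (per 100,000) and unemployment rate (%), Illinois 1975-2005,
# copied from the table above.
VIOLENT = [670, 626, 631, 677, 744, 808, 793, 774, 728, 725,
           715, 809, 796, 810, 846, 967, 1039, 977, 960, 961,
           996, 890, 861, 808, 690, 654, 637, 602, 556, 546, 552]
UNEMP = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5,
         7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1,
         5.6, 5.4, 4.9, 4.5, 4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def pearson_r(x, y):
    """Pearson correlation: co-variation scaled by the spread of each variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a correlation against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Assumption: two-tailed 5% critical value of t for 29 degrees of freedom,
# taken from a standard t table.
T_CRITICAL_DF29 = 2.045

r = pearson_r(VIOLENT, UNEMP)
t = t_statistic(r, len(UNEMP))
print(f"r = {r:.3f}, t = {t:.2f}, significant at 5%: {abs(t) > T_CRITICAL_DF29}")
```

Swapping in the property crime column for VIOLENT runs the same check for the alternative hypothesis.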

• Test an alternative hypothesis. In this case, let's look at Property Crimes per 100,000 (PC) and unemployment (U). Once again, a correlation is calculated.

In this case the correlation between PC and U is .577. This is much stronger. Consulting the critical value table for rho, the researcher finds that the correlation is significant.

Next let's look at this relationship. In the following graph, the scatter plot shows each of the 31 pairs of observations.

An Excel procedure has calculated and drawn a linear regression line and given us the equation in the upper right corner of the graph. From it we can see that, in general, as unemployment increases so does the rate of property crime. The slope in the regression equation indicates that the estimated property crime rate rises by 270.08 for each one-point rise in the unemployment rate. In other words, for every one-percentage-point increase in unemployment, property crime appears to increase by about 270 crimes per 100,000 people in the population.
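
The regression line itself can be sketched with the ordinary least-squares formulas. The code below is a hedged illustration, not Excel's procedure: the helper name fit_line and the two example unemployment rates (6% and 7%) are our own choices.

```python
# Property crime rate (per 100,000) and unemployment rate (%), Illinois 1975-2005,
# copied from the table above.
PROPERTY = [5033, 4830, 4697, 4943, 5287, 5461, 5323, 5066, 4813, 4579,
            4597, 4746, 4620, 4810, 4793, 4968, 5093, 4788, 4658, 4665,
            4460, 4430, 4280, 4051, 3825, 3585, 3461, 3420, 3288, 3174, 3080]
UNEMP = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5,
         7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1,
         5.6, 5.4, 4.9, 4.5, 4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(UNEMP, PROPERTY)
print(f"property crime = {intercept:.1f} + {slope:.2f} * unemployment")

# The predicted difference between two unemployment rates one point apart
# is exactly the slope.
predicted_at_6 = intercept + slope * 6.0
predicted_at_7 = intercept + slope * 7.0
print(f"predicted change for a one-point rise: {predicted_at_7 - predicted_at_6:.2f}")
```

Note that the fitted line has an intercept as well as a slope; the slope alone describes the change in property crime per point of unemployment, not the level.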

When the correlation of .577 is squared, the researcher finds that about 33% of the variance is shared between the two variables. Sometimes this shared variance is called "explained variance." Thus, the researcher claims that 33% of the changes in property crime rates can be attributed to changes in the unemployment rate. (Part of this interpretation rests on logical assumptions, not statistical rationale.)
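
The "explained variance" reading of the squared correlation can be checked directly: fit the line, measure how much variation is left over in the residuals, and compare it with the total variation in property crime. The sketch below assumes the same data as the table above; the function name is illustrative only.

```python
# Property crime rate (per 100,000) and unemployment rate (%), Illinois 1975-2005,
# copied from the table above.
PROPERTY = [5033, 4830, 4697, 4943, 5287, 5461, 5323, 5066, 4813, 4579,
            4597, 4746, 4620, 4810, 4793, 4968, 5093, 4788, 4658, 4665,
            4460, 4430, 4280, 4051, 3825, 3585, 3461, 3420, 3288, 3174, 3080]
UNEMP = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5,
         7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1,
         5.6, 5.4, 4.9, 4.5, 4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def explained_variance(x, y):
    """Share of the variation in y accounted for by a straight-line fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation in y, and the part the fitted line fails to reproduce.
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_residual / ss_total


r_squared = explained_variance(UNEMP, PROPERTY)
print(f"share of variance explained: {r_squared:.3f}")
```

For a single predictor this quantity equals the square of the correlation coefficient, which is why squaring .577 gives the same figure.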

• Our researcher (R1) publishes, and waits. One knows that once findings are made public, some other researcher will come along to test them.
• And so it comes: a second researcher (R2) says, "Yes, but...."

R2 presents a different graph. In this graph the Unemployment and Property Crime data are plotted in chronological order.

It is very clear that there has been a major downward trend in both property crime and unemployment from 1975 to 2005. The three upward peaks in the unemployment rate (pink line) are not reflected in the property crime rates. Both rates have declined more or less together across the years. The correlation between the rates is the result of this shared pattern over time.

So here, then, is an alternative explanation. The original hypothesis must now be rejected. Note, however, that the correlations themselves are still accurate. The issue is that other exogenous variables, here the passage of time, were not included.
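
One way to put a number on this shared pattern is a partial correlation: the correlation between property crime and unemployment after the linear effect of the year has been removed from both. This is our own illustration, not an analysis reported by R2, and the function names are assumptions made for the sketch.

```python
import math

# Year, property crime rate (per 100,000), and unemployment rate (%),
# Illinois 1975-2005, copied from the table above.
YEARS = list(range(1975, 2006))
PROPERTY = [5033, 4830, 4697, 4943, 5287, 5461, 5323, 5066, 4813, 4579,
            4597, 4746, 4620, 4810, 4793, 4968, 5093, 4788, 4658, 4665,
            4460, 4430, 4280, 4051, 3825, 3585, 3461, 3420, 3288, 3174, 3080]
UNEMP = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5,
         7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1,
         5.6, 5.4, 4.9, 4.5, 4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the linear effect of z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_pc_u = pearson_r(PROPERTY, UNEMP)
r_pc_year = pearson_r(PROPERTY, YEARS)
r_u_year = pearson_r(UNEMP, YEARS)
partial = partial_r(r_pc_u, r_pc_year, r_u_year)

print(f"PC with U:                 {r_pc_u:.3f}")
print(f"PC with year:              {r_pc_year:.3f}")
print(f"U with year:               {r_u_year:.3f}")
print(f"PC with U, year held fixed: {partial:.3f}")
```

Both series correlate strongly with the year, and once that common trend is held fixed, very little of the original relationship between property crime and unemployment remains.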

R2 published the following summary from an SPSS regression procedure.

Regression

                    Unstandardized            Standardized
                    Coefficients              Coefficients                     95% Confidence Interval for B
                    B            Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
(Constant)          129615.530   19891.253                   6.516    .000     88870.145     170360.916
Year                -62.869      9.857        -.860          -6.378   .000     -83.062       -42.677
Unemployment rate   -4.265       63.053       -.009          -.068    .947     -133.424      124.893

a Dependent Variable: Property crime rate

The following points summarize how to interpret these findings.
• The constant (intercept, a) is a very large 129,615.5. But remember that the year variable runs from 1975 to 2005, so the intercept is the predicted value at year 0, nearly 2,000 years before the first observation.
• The unstandardized coefficient (B) for the year indicates a decline of 62.869 property crimes per 100,000 each year. While this does not sound like a big change in raw units, move to the right and notice that the standardized coefficient (Beta) is -.86. This is a very strong decline. While we have not talked about how to test significance, the next two columns, "t" and "Sig.", indicate that this variable is statistically significant.
• The unstandardized coefficient (B) for the unemployment rate is also very small, as is the standardized coefficient, and the test for significance (Sig. = .947) indicates that unemployment IS NOT STATISTICALLY SIGNIFICANT.
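
The B column of the SPSS summary can be approximately reproduced by solving the least-squares normal equations for two predictors. The sketch below is not SPSS's algorithm, just a minimal hand calculation under the assumption that the table above is the data R2 used; the function name is hypothetical, and standard errors and significance tests are left out.

```python
# Year, property crime rate (per 100,000), and unemployment rate (%),
# Illinois 1975-2005, copied from the table above.
YEARS = list(range(1975, 2006))
PROPERTY = [5033, 4830, 4697, 4943, 5287, 5461, 5323, 5066, 4813, 4579,
            4597, 4746, 4620, 4810, 4793, 4968, 5093, 4788, 4658, 4665,
            4460, 4430, 4280, 4051, 3825, 3585, 3461, 3420, 3288, 3174, 3080]
UNEMP = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5,
         7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1,
         5.6, 5.4, 4.9, 4.5, 4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the 2x2 normal equations."""
    n = len(y)
    mean_y, mean_1, mean_2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Work with deviations from the means so the intercept drops out.
    dy = [v - mean_y for v in y]
    d1 = [v - mean_1 for v in x1]
    d2 = [v - mean_2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    intercept = mean_y - b1 * mean_1 - b2 * mean_2
    return intercept, b1, b2


intercept, b_year, b_unemp = fit_two_predictors(PROPERTY, YEARS, UNEMP)
print(f"(Constant)        {intercept:12.3f}")
print(f"Year              {b_year:12.3f}")
print(f"Unemployment rate {b_unemp:12.3f}")
```

With the year in the model, the coefficient for unemployment falls from about 270 in the one-variable regression to a value close to zero, which is the numerical form of R2's "Yes, but."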

Bottom line - statistical tests can only find "truth" within the specifications of the model provided. In this example, the second graph showed a truth that the first graph did not indicate. Statistics are strong tools, but they are not omnipotent (with my apologies to philosophers who will point out the logical fallacy of this statement). Taking a moment to step back and look at patterns and alternatives is an important part of research and model building.



Copyright © 2012 by Peter T. Klassen, Ph.D. Principal, www.DocumentingExcellence.com
2 March, 2012