# Finding a common distribution - Z-scores

Variable measurement uses units that are based on a raw scale. For example, distances can be measured in feet and inches, or in centimeters and meters. The use of these raw units of measurement in statistical analysis presents two challenges. One challenge is that of comparing and testing for differences when data set distributions are different. The other challenge is that of comparing and testing for relationships when the raw scale units of measurement are different. Each of these challenges and the statistical responses are discussed below.

First, **two sets of data may have different distributions**. One may have a wide range of values while
the second is very narrowly distributed. These differences in
distributions make it difficult to judge similarities or significant differences
among categories in the data sets.

Second, **two variables may have different units in the raw scale.** In the distance example it is possible to convert from one scale to another.
However, when we want to look at more complex relationships between variables, the
raw scale units may not be convertible. For example, a person's height and weight
are measured in two different units, and one cannot be converted into the other.

In both of these challenges, converting data points from raw units to **standard deviation units** allows for statistical testing. **Z-scores are
the data label for standard deviation units.** Z-scores are sometimes referred to
as "standardized" scores. Z-scores range from negative numbers
through 0 to positive numbers: a negative z-score lies below the mean and a positive z-score lies above it.

A z-score of 0 is the mean of the z-score distribution, and it corresponds to a raw score equal to the mean average of the raw scores.

Each data point's z-score is that data point's raw score minus the mean average, divided by the standard deviation:

$$\text{z-score}_{i} = \frac{\text{raw-score}_{i} - \text{MA}}{\text{SD}}$$

Where

$\text{z-score}_{i}$ is the value for each z-score,

$\text{raw-score}_{i}$ is the raw score value for each data point,

MA is the mean average for the set of data points, and

SD is the standard deviation for the set of data points.
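The formula can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_scores` and the height values are made up for the example, and it assumes the population form of the standard deviation (`statistics.pstdev`). A sample standard deviation (`statistics.stdev`) would give slightly different values.

```python
import statistics

def z_scores(raw_scores):
    """Convert raw scores to z-scores (standard deviation units)."""
    mean_average = statistics.mean(raw_scores)          # MA
    standard_deviation = statistics.pstdev(raw_scores)  # SD (population form, an assumption)
    # z-score = (raw score - MA) / SD for every data point
    return [(x - mean_average) / standard_deviation for x in raw_scores]

# Hypothetical heights in centimeters, for illustration only
heights_cm = [150, 160, 170, 180, 190]
z = z_scores(heights_cm)

for raw, score in zip(heights_cm, z):
    print(f"raw = {raw} cm, z = {score:+.3f}")
```

The data point equal to the mean average (170 cm here) gets a z-score of 0, and the resulting z-scores have a mean of 0 and a standard deviation of 1.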

By converting to z-scores, data sets are converted to a distribution that has known characteristics: a mean of 0 and a standard deviation of 1. When the raw scores are approximately normally distributed, this makes it possible to calculate the relative probabilities of different scores and of averages of scores. In so doing one can estimate the probability that two different categories of data points come from the same population or from significantly different populations. It is this use of z-scores that allows one to use the statistical tables of areas of the Standard Normal Distribution to find the probability of higher and lower scores in a data set.
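The table lookup described above can be approximated in code. The sketch below assumes the scores are roughly normally distributed and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper names `probability_below` and `probability_above` are hypothetical, chosen only for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def probability_below(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)

def probability_above(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - standard_normal.cdf(z)

for z in (-1.0, 0.0, 1.0, 1.96):
    print(f"z = {z:+.2f}: P(lower) = {probability_below(z):.4f}, "
          f"P(higher) = {probability_above(z):.4f}")
```

A z-score of 0 splits the distribution in half, so the probability of a lower score and the probability of a higher score are both 0.5.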

By converting to z-scores, variables with different raw units are converted to comparable distributions in which the averages are set as equal (both 0) and differences in spread are adjusted to a common scale (a standard deviation of 1). When the raw data are approximately normally distributed, that common distribution is the standard normal distribution.
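To make the height and weight case concrete, the sketch below standardizes two variables measured in different raw units. The data values and the `standardize` helper are invented for illustration, and the population standard deviation is again an assumption.

```python
import statistics

def standardize(values):
    """Return z-scores for a list of raw values (population SD assumed)."""
    mean_average = statistics.mean(values)
    standard_deviation = statistics.pstdev(values)
    return [(v - mean_average) / standard_deviation for v in values]

# Hypothetical measurements for the same five people
heights_cm = [155, 162, 170, 178, 185]
weights_kg = [52, 60, 68, 80, 90]

height_z = standardize(heights_cm)
weight_z = standardize(weights_kg)

# Both variables are now in standard deviation units and can be compared directly
for h, w in zip(height_z, weight_z):
    print(f"height z = {h:+.2f}, weight z = {w:+.2f}")
```

After conversion, centimeters and kilograms no longer appear: each person's height and weight are both expressed as distances from their respective means in standard deviation units.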