Developers often ask, "How many cases do I need to do a good assessment of my tool?"
Answering is not easy, because the number of cases needed depends on how much the answers vary within the population and within the sample. Here are a few possible responses.

If you have every case in a population, no minimum number of cases is required. Any summary or difference you compute is the population value itself rather than an estimate, so no inferential statistics or significance tests are needed. Very seldom, however, do we have the entire population.

Many common inferential statistics assume a normal distribution, or at least a sampling distribution that is approximately normal. The difference between sample statistics and population parameters tends to decrease as sample size increases. It is inappropriate to analyze very small samples (fewer than about 25 cases) unless they are the whole population. Small samples (25 to 100 cases) may be analyzed with care. Samples of 100 to several thousand cases provide a foundation for solid analysis.
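
A short simulation can check the claim that sample statistics drift less from population parameters as the sample grows. The Python sketch below is illustrative only. It assumes a hypothetical population of 10,000 normally distributed scores (mean 50, SD 10), and the helper names `standard_error` and `mean_abs_error` are invented for this example. For each sample size, it compares two quantities: the average distance between a sample mean and the population mean, and the usual standard error, SD / √n. Both shrink roughly in proportion to 1/√n.

```python
import math
import random
import statistics
from typing import Sequence

# Illustrative sketch only: helper names and the simulated population are hypothetical.


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean, sd / sqrt(n) (no finite-population correction)."""
    return sd / math.sqrt(n)


def mean_abs_error(population: Sequence[float], n: int, trials: int = 500, seed: int = 0) -> float:
    """Average |sample mean - population mean| over `trials` random samples of size n,
    each drawn without replacement from `population`."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = [abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(trials)]
    return statistics.fmean(errors)


# Hypothetical population: 10,000 simulated scores with mean 50 and SD 10.
population_rng = random.Random(123)
population = [population_rng.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population SD (divides by N)

# Compare very small, small, and solid sample sizes from the text.
errors = {n: mean_abs_error(population, n) for n in (25, 100, 1000)}
for n, err in errors.items():
    print(f"n = {n:>4}: mean |error| = {err:.2f}, SE = {standard_error(sigma, n):.2f}")
```

The exact numbers depend on the random seeds, but the typical error at n = 1,000 is a small fraction of the error at n = 25. The 25 and 100 thresholds in the text are rules of thumb. The simulation shows the trend, not where the cut-offs belong.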

You can never have too many cases. This is both true and problematic. The larger the number of cases, the stronger the generalizations that can be drawn. With descriptive statistics, large data sets produce results with very small standard errors, and these results are stronger than reports based on small numbers. With inferential statistics, however, large numbers of cases tend to obscure judgments of significance and substance. With large data sets (more than about 2,500 cases), even trivially small differences may be statistically significant while their substance is minimal. To avoid this paradox, draw samples of about 1,000 cases from larger data sets when running inferential tests. This sampling also allows a confirmatory analysis based on a second, independent sample drawn from the same large data set.
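
A two-sample z test on a fixed, tiny difference can illustrate the large-sample paradox. The Python sketch below makes several simplifying assumptions:

- the two groups are the same size and share a known standard deviation;
- it uses the normal approximation rather than a t test;
- the numbers are hypothetical (a 0.5-point difference on a scale with SD 10).

The helper names `two_sample_z_p` and `split_samples` are invented for illustration. The second helper draws two non-overlapping random samples of 1,000 cases each, one for the main test and one for the confirmatory analysis.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

# Illustrative sketch only: helper names and all numbers below are hypothetical.
T = TypeVar("T")


def two_sample_z_p(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two groups of n_per_group
    cases each, assuming a common, known SD (normal approximation)."""
    se = sd * math.sqrt(2 / n_per_group)     # standard error of the difference in means
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def split_samples(cases: Sequence[T], size: int = 1000, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Draw two non-overlapping random samples of `size` cases each."""
    if 2 * size > len(cases):
        raise ValueError("data set is too small for two independent samples")
    rng = random.Random(seed)
    drawn = rng.sample(cases, 2 * size)  # 2*size distinct positions, without replacement
    return drawn[:size], drawn[size:]


# Same tiny difference (0.5 points, i.e. 0.05 SD) at two very different sample sizes.
effect_size = 0.5 / 10.0
p_small = two_sample_z_p(diff=0.5, sd=10.0, n_per_group=1_000)
p_large = two_sample_z_p(diff=0.5, sd=10.0, n_per_group=100_000)
print(f"d = {effect_size}, p (n=1,000) = {p_small:.3f}, p (n=100,000) = {p_large:.2e}")

# Hypothetical large data set of unique case IDs: primary and confirmatory samples.
primary, confirmatory = split_samples(list(range(50_000)))
print(len(primary), len(confirmatory), len(set(primary) & set(confirmatory)))
```

With 1,000 cases per group, the difference is not significant (p is roughly 0.26). With 100,000 cases per group, the same 0.05 SD difference is overwhelmingly significant, even though its substance has not changed. Because `random.Random.sample` selects distinct positions, the two samples share no cases as long as the case IDs are unique.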

