Understanding the difference between Z statistics and t statistics for fair lending analysis can be frustrating, even more so when it comes to determining which test to use. Let’s take a more detailed look at this area of statistics.
A common question is “What is the difference between Z stats and t stats, and which test should be used?” Let us start with what the two have in common. Both Z-tests and t-tests compare results obtained from one group of observations with results obtained from another group (or with a known population value), assuming that the results are normally distributed. The primary reason there are two types of tests, Z and t, is the size of these groups, together with whether the population standard deviation is known.
In order to understand the key differences between Z and t-tests, let us first define a few terms:
- Population: The entire group of all relevant observations that are possible for a given outcome of interest is called the population.
- Sample: A subset of the population (usually selected at random) that is drawn in order to study a specific outcome is called a sample.
- Parameters: Statistical characteristics of the population, including but not limited to the mean, standard deviation, and proportion. Population parameters are fixed and describe that specific population. Parameters are typically represented by Greek letters such as μ (mean), σ (standard deviation), and π (proportion), or by capital letters such as N (population size). In most cases, population parameters are hard to obtain because of the population's size and other data collection issues.
- Statistics: Statistical characteristics of a sample, including but not limited to the mean, variance, proportion, or percentage. A sample's statistics are fixed for that sample, but they do not characterize the entire population, and a different sample would produce different statistics. Therefore, if a researcher uses a sample to draw conclusions about the entire population, tests are needed to determine how reliable these statistics are with respect to the population parameters. Sample statistics are typically represented by lowercase Latin letters such as x̄ (mean), s (standard deviation), p (proportion), and n (sample size).
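The sample statistics defined above can be computed with Python's standard `statistics` module. The following is a minimal sketch. The helper names `describe_sample` and `sample_proportion` are hypothetical, and the loan amounts and approval outcomes are made-up data for illustration only. Note that `statistics.stdev` divides by n − 1, which is the usual sample estimator of σ.

```python
import statistics

def describe_sample(values):
    """Return the sample statistics n, x̄ and s for a list of numbers."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return n, x_bar, s

def sample_proportion(outcomes):
    """Return p: the share of 1s (e.g. approvals) in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)

# Hypothetical data for illustration only
loan_amounts = [120, 150, 180, 150, 150]  # loan amounts in $1,000s
approved = [1, 0, 1, 1, 0]                # 1 = approved, 0 = denied
print(describe_sample(loan_amounts))
print(sample_proportion(approved))
```

Drawing a different set of applicants from the same population would generally produce different values of x̄, s, and p.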
For simplicity, let us assume that we want to find out whether the mean of a sample (x̄) of size n is a reliable estimate of the mean (μ) of the population from which the sample was obtained. The test statistic standardizes the difference between the two means by the standard error, which is the population standard deviation divided by the square root of the sample size: Z = (x̄ − μ) / (σ / √n).
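As a quick illustration, the standardization described above can be written as a one-line function. This is a minimal sketch. The function name `z_statistic` is hypothetical, and the example assumes the population σ is known. The numbers in the usage line are made up.

```python
import math

def z_statistic(x_bar, mu, sigma, n):
    """One-sample Z statistic: (x̄ - μ) / (σ / √n)."""
    standard_error = sigma / math.sqrt(n)  # σ normalized by the sample size
    return (x_bar - mu) / standard_error

# Hypothetical: sample mean 105 from n = 36, population μ = 100 and σ = 15
print(z_statistic(105, 100, 15, 36))  # → 2.0
```

Here the sample mean lies two standard errors above the population mean.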
As a common rule of thumb, if the sample size n is greater than 30, we use Z as the test statistic. If it is 30 or fewer, we use t. In both cases, we refer to the relevant Z-table or t-table to find the p-value, and thereby the significance of our sample results.
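The table lookup above can also be done in code. The following sketch applies the n > 30 rule of thumb and returns a two-sided p-value. It uses `statistics.NormalDist` for Z. The standard library has no t distribution, so this example integrates the Student's t density numerically with Simpson's rule. That numerical approach is an assumption of this sketch, not the only way to do it. A library such as SciPy would normally be used instead. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=10_000):
    """CDF of Student's t via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|); the density is symmetric about 0
    return 0.5 + area if t >= 0 else 0.5 - area

def two_sided_p_value(stat, n):
    """Rule of thumb from the text: Z if n > 30, otherwise t with n - 1 df."""
    if n > 30:
        return 2 * (1 - NormalDist().cdf(abs(stat)))
    return 2 * (1 - t_cdf(abs(stat), n - 1))

# The same statistic is less significant under t when the sample is small
print(two_sided_p_value(2.0, 100))  # Z-based
print(two_sided_p_value(2.0, 10))   # t-based, 9 degrees of freedom
```

The t distribution has heavier tails than the normal distribution, so a small sample needs a larger statistic to reach the same significance level.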
In most cases σ is unknown. If the sample has more than 30 observations, the Central Limit Theorem makes the sampling distribution of x̄ approximately normal, and s is a close enough estimate of σ that we can substitute s for σ and still use the Z-table. If the sample has 30 or fewer observations, we still substitute s for σ, but we refer to the t-table (with n − 1 degrees of freedom) for the test statistic's value and significance.
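Substituting s for σ gives the one-sample t statistic, t = (x̄ − μ) / (s / √n), with n − 1 degrees of freedom. The following is a minimal sketch. The function name `one_sample_t` is hypothetical, and the data and μ in the usage line are made up.

```python
import math
import statistics

def one_sample_t(values, mu):
    """One-sample t statistic, using the sample s in place of the unknown σ."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom for the t-table

# Hypothetical loan amounts (in $1,000s) tested against a hypothetical μ of 130
t, df = one_sample_t([120, 150, 180, 150, 150], 130)
print(round(t, 3), df)
```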
If we are comparing two samples with each other to check whether they belong to the same population, rather than comparing a sample directly with the population, we use t-tests, since no population parameters enter the calculation.
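Comparing two samples directly can be sketched with a two-sample t statistic. The text does not say which variant to use. This example assumes Welch's version, which does not require the two groups to have equal variances and uses the Welch-Satterthwaite formula for degrees of freedom. A pooled-variance (Student's) two-sample t-test is the common alternative when equal variances are assumed. The function name and the APR figures are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean: s² / n
    var_a = statistics.variance(sample_a) / n_a
    var_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical APRs (%) for two groups of applicants
apr_group_a = [6.1, 6.4, 5.9, 6.3, 6.0, 6.2]
apr_group_b = [6.5, 6.8, 6.4, 6.9, 6.6, 6.7]
t, df = welch_t(apr_group_a, apr_group_b)
print(round(t, 3), round(df, 1))
```

The degrees of freedom are usually not a whole number. Rounding them down before using a printed t-table is the conservative choice.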
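If we are doing a test of proportions, we use the Z-test. Unlike a mean, a proportion determines its own standard deviation. For a proportion p from a sample of size n, the standard error is √(p(1 − p) / n), so there is no separate, unknown σ to estimate. The normal approximation behind the Z-test still assumes that the sample is reasonably large.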
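In fair lending work, a typical test of proportions compares approval rates between two groups. The following is a minimal sketch of a pooled two-proportion Z-test. It assumes that, under the null hypothesis of equal rates, both groups share the pooled proportion, which sets the standard error. The function name and the approval counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 (equal proportions) both groups share the pooled proportion
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical approvals: 60 of 100 applicants in group A, 50 of 100 in group B
z, p = two_proportion_z_test(60, 100, 50, 100)
print(round(z, 3), round(p, 3))
```

With these made-up counts, a 10-point gap in approval rates is not significant at the 5% level. The same gap with much larger groups would be.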