**Making Wage-and-Hour Discovery Count**

**By Valentín Estévez, Ph.D.**

Wage-and-hour litigation involves the collection and analysis of large amounts of data. The following statistical concepts are useful when requesting and analyzing these data:

**I. Parameter estimate:** Averages, medians, maximum and minimum values, standard deviations, and other statistics are examples of “parameter estimates.” A parameter estimate summarizes some of the information in the data. For example, averages and medians are ways to summarize the data by estimating its center.

*Don’t forget that it is an estimate*: Even under ideal analytical conditions, a parameter estimate will likely differ from the true value of the parameter because of random noise or variation. Every parameter estimate therefore has a “standard error”^{[1]} that helps quantify the precision of the estimate.
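As an illustration, the standard error of a sample average can be computed as the sample standard deviation divided by the square root of the number of observations. The sketch below assumes a simple random sample; the function name and the weekly-hours figures are hypothetical and not drawn from any actual matter.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of that mean.

    Assumes a simple random sample of independent observations.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean, sd / math.sqrt(n)


# Hypothetical weekly hours for five employees (illustrative only).
weekly_hours = [38.0, 42.0, 40.0, 44.0, 36.0]
mean, se = mean_and_standard_error(weekly_hours)
print(f"average = {mean:.2f} hours, standard error = {se:.2f} hours")
```

The average summarizes the data, while the standard error indicates how far that average might plausibly sit from the true population value.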

**II. Statistical significance:** If, after accounting for the estimate’s standard error, the parameter estimate differs from a hypothetical value by a sufficient margin (for our purposes, 2 or more standard deviations), the difference between the parameter estimate and the hypothetical value is considered *statistically significant.*

*Why 2 standard deviations?* The courts usually consider differences of 2 or more standard deviations as statistically significant. Under certain assumptions, differences of 2 or more standard deviations are expected to happen less than 5% of the time. Therefore, this threshold implies a belief that events that happen less than 5% of the time are rare enough to require a closer look.

*What do the books say?* Statistical theory does not define a threshold for statistical significance. Data analysts rely on the norms of their scientific disciplines when determining statistical significance. For example, economists use the 2-standard deviation threshold, while particle physicists use a stricter 5-standard deviation threshold (5-standard deviation events happen only 0.00006% of the time).
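The percentages quoted for the 2 and 5 standard deviation thresholds can be reproduced with a short calculation. The sketch below assumes the estimate follows a normal distribution and uses a two-sided test (one of the "certain assumptions" behind the 5% figure); the function names and the example numbers are hypothetical.

```python
import math


def z_statistic(estimate, hypothetical, standard_error):
    """Number of standard errors separating the estimate from the hypothetical value."""
    return (estimate - hypothetical) / standard_error


def two_sided_tail_probability(z):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2))


p2 = two_sided_tail_probability(2)  # roughly 4.6%, i.e. less than 5%
p5 = two_sided_tail_probability(5)  # roughly 0.00006%

# Hypothetical example: an estimated average of 41.5 hours is compared with
# a hypothetical value of 40 hours, given a standard error of 0.6 hours.
z = z_statistic(41.5, 40.0, 0.6)
significant = abs(z) >= 2  # the 2-standard deviation threshold

print(f"2 SD: {p2:.4%}   5 SD: {p5:.6%}   z = {z:.2f}   significant: {significant}")
```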

**III. Confidence intervals:** A confidence interval defines the range of values that is consistent with a parameter estimate. For example, a 95% confidence interval contains the values that are within 2 standard deviations of the parameter estimate. Recall that events 2 or more standard deviations away from the parameter estimate happen approximately 5% of the time. These are some common misconceptions about confidence intervals:

- Confidence intervals *do not* imply precise estimates. A higher confidence level implies that a wider range of values is consistent with the sample estimate. For example, a 99% confidence interval is about 30% wider than a 95% confidence interval.
- Confidence intervals *do not* tell you the probability that the true value of the parameter is contained in them: It is incorrect to state that a 95% confidence interval has a 95% probability of containing the true value of the parameter.
- Confidence intervals *do not* bound the parameter estimates that may be found in other data: It is wrong to say that 95% of all databases will generate parameter estimates that fall within the 95% confidence interval computed for the data at hand.
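To make the relationship between confidence level and interval width concrete, the following sketch builds intervals under a normal approximation. Under that assumption the exact 95% multiplier is about 1.96, which is commonly rounded to 2. The function name and the input values are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Two-sided confidence interval under a normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical estimate of 40 hours with a standard error of 1.5 hours.
low95, high95 = confidence_interval(40.0, 1.5, level=0.95)
low99, high99 = confidence_interval(40.0, 1.5, level=0.99)

width_ratio = (high99 - low99) / (high95 - low95)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
print(f"The 99% interval is {width_ratio - 1:.0%} wider than the 95% interval")
```

The ratio of the two widths does not depend on the estimate or its standard error, only on the two confidence levels being compared.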

**IV. Sample size:** It is common in wage-and-hour litigation for the parties to agree to use a sample instead of the full population. The parties also commonly use legal precedents or rules-of-thumb to determine the size of the sample (e.g., 5% or 10% samples). If the objective is to obtain precise estimates of the parameters of interest, here are some considerations regarding sample sizes:

- More precise estimates require larger samples: The standard error of the parameter estimate decreases with the increased size of the sample.
- More variance among members of a population requires larger samples to achieve the same level of precision: Increased variation among sample members in the key characteristics to be measured translates to larger standard errors for the parameter estimates.
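Both considerations follow from the formula for the standard error of an average, which is the standard deviation divided by the square root of the sample size. The sketch below assumes simple random sampling from a large population (no finite population correction) and a known or pilot-estimated standard deviation; the function names and figures are hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a sample average under simple random sampling."""
    return sd / math.sqrt(n)


def required_sample_size(sd, margin, z=2.0):
    """Smallest n for which z * sd / sqrt(n) is no larger than the margin."""
    return math.ceil((z * sd / margin) ** 2)


# Quadrupling the sample size halves the standard error.
se_100 = standard_error_of_mean(5.0, 100)
se_400 = standard_error_of_mean(5.0, 400)

# Doubling the variation in the population quadruples the sample needed
# to keep the estimate within 1 hour at the 2-standard deviation threshold.
n_low_variation = required_sample_size(sd=5.0, margin=1.0)
n_high_variation = required_sample_size(sd=10.0, margin=1.0)

print(se_100, se_400, n_low_variation, n_high_variation)
```

A calculation along these lines ties the sample size to the desired precision and the variation in the data, rather than to a fixed percentage of the population.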

**V. Statistically significant samples:** *Individual samples cannot be statistically significant* because statistical significance applies to the estimates from the sample, and not to the sample data itself. Rather, the difference between a parameter estimate and a hypothetical value of the parameter may or may not be considered statistically significant.

A very good introduction to statistical concepts as they apply to litigation is the “Reference Guide on Statistics” in the Reference Manual on Scientific Evidence (https://www.fjc.gov/sites/default/files/2015/SciMan3D01.pdf).

**Valentín Estévez is a Managing Director at Welch Consulting’s Washington, DC and Bryan, Texas offices. You can reach him with questions or comments at** vestevez@welchcon.com.

^{[1]} “Standard error” is a term of art in statistics, and it does not mean that there was an error in the computation. In litigation matters, standard errors are often called standard deviations.