Making Wage-and-Hour Discovery Count
By Valentín Estévez, Ph.D.

Wage-and-hour litigation involves the collection and analysis of large amounts of data. The following statistical concepts are useful when requesting and analyzing these data:

I. Parameter estimate:  Averages, medians, maximum and minimum values, standard deviations, and other statistics are examples of “parameter estimates.”  A parameter estimate summarizes some of the information in the data.  For example, averages and medians are ways to summarize the data by estimating its center.

  • Don’t forget that it is an estimate: Even under ideal analytical conditions, a parameter estimate will likely differ from the true value of the parameter because of random noise or variation:  parameter estimates have a “standard error[1]” that helps quantify the precision of the estimate.
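As an illustration, the standard error of an average can be computed from the sample standard deviation and the sample size. The sketch below assumes a simple random sample; the weekly-hours figures and the function name are hypothetical and serve only to show the arithmetic.

```python
import math
import statistics


def mean_with_standard_error(values):
    """Return (sample mean, standard error of the mean)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(values)
    # Standard error of the mean under simple random sampling.
    return mean, sd / math.sqrt(n)


# Hypothetical weekly hours for eight employees (illustrative data only).
weekly_hours = [38.0, 42.0, 40.0, 44.0, 36.0, 41.0, 39.0, 40.0]
mean, se = mean_with_standard_error(weekly_hours)
print(f"mean = {mean:.2f} hours, standard error = {se:.2f} hours")
```

The standard deviation describes how much individual employees vary; the standard error describes how much the average itself would vary from sample to sample.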

II. Statistical significance:  If, after accounting for the estimate’s standard error, the parameter estimate differs to a sufficient degree (for our purposes, 2 or more standard deviations) from a hypothetical value, the difference between the parameter estimate and the hypothetical value is considered statistically significant.

  • Why 2 standard deviations? The courts usually consider differences of 2 or more standard deviations as statistically significant.  Under certain assumptions, differences of 2 or more standard deviations are expected to happen less than 5% of the time.  Therefore, this threshold implies a belief that events that happen less than 5% of the time are rare enough to require a closer look.
  • What do the books say? Statistical theory does not define a threshold for statistical significance.  Data analysts rely on the norms of their scientific disciplines when determining statistical significance.  For example, economists use the 2-standard deviation threshold, while particle physicists use a stricter 5-standard deviation threshold (5-standard deviation events happen only 0.00006% of the time).
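The percentages quoted above can be checked directly. The sketch below assumes the estimate follows a normal distribution and counts deviations in either direction (a two-sided probability); the function name is illustrative.

```python
import math


def two_sided_tail_probability(k):
    """P(|Z| >= k) for a standard normal Z, computed with the complementary error function."""
    return math.erfc(k / math.sqrt(2))


p2 = two_sided_tail_probability(2)  # the 2-standard deviation threshold
p5 = two_sided_tail_probability(5)  # the 5-standard deviation threshold
print(f"2 standard deviations: {p2:.2%}")
print(f"5 standard deviations: {p5:.7%}")
```

Under these assumptions, the 2-standard deviation figure comes out to roughly 4.6%, slightly below the 5% benchmark, and the 5-standard deviation figure to roughly 0.00006%.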

III. Confidence intervals:  A confidence interval defines the range of values that is consistent with a parameter estimate.  For example, a 95% confidence interval contains the values that are within approximately 2 (more precisely, 1.96) standard deviations of the parameter estimate.  Recall that events 2 or more standard deviations away from the parameter estimate happen approximately 5% of the time.  These are some common misconceptions about confidence intervals:

  • Confidence intervals do not imply precise estimates. A higher confidence level implies that a wider range of values is consistent with the sample estimate.   For example, a 99% confidence interval is about 30% wider than a 95% confidence interval.
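  • Confidence intervals do not tell you the probability that the true value of the parameter is contained in them: It is incorrect to state that a 95% confidence interval has a 95% probability of containing the true value of the parameter.  The true value is fixed, so a computed interval either contains it or does not; the 95% describes how often intervals constructed this way would contain the true value over many repeated samples.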
  • Confidence intervals do not bound the parameter estimates that may be found in other data: It is wrong to say that 95% of all databases will generate parameter estimates that fall within the 95% confidence interval computed for the data at hand.
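To see why a higher confidence level means a wider interval, the sketch below builds 95% and 99% intervals around the same estimate. It assumes a normally distributed estimate; the estimate and standard error are hypothetical values, not results from any actual matter.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    # Critical value: about 1.96 for 95%, about 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical estimate of average weekly hours and its standard error.
estimate, standard_error = 40.0, 0.87

lo95, hi95 = confidence_interval(estimate, standard_error, level=0.95)
lo99, hi99 = confidence_interval(estimate, standard_error, level=0.99)
width_ratio = (hi99 - lo99) / (hi95 - lo95)

print(f"95% interval: {lo95:.2f} to {hi95:.2f}")
print(f"99% interval: {lo99:.2f} to {hi99:.2f}")
print(f"99% interval is {width_ratio - 1:.0%} wider")
```

The width ratio depends only on the two critical values, so the roughly 30% difference holds regardless of the estimate or its standard error.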

IV. Sample size:  It is common in wage-and-hour litigation for the parties to agree to use a sample instead of the full population.  The parties also commonly use legal precedents or rules-of-thumb to determine the size of the sample (e.g., 5% or 10% samples).  If the objective is to obtain precise estimates of the parameters of interest, here are some considerations regarding sample sizes:

  • More precise estimates require larger samples: The standard error of the parameter estimate decreases with the increased size of the sample.
  • More variance among members of a population requires larger samples to achieve the same level of precision: Increased variation among sample members in the key characteristics to be measured translates to larger standard errors for the parameter estimates.
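Both considerations can be made concrete with the textbook sample-size formula for estimating an average, n = (z × standard deviation / margin of error)². The sketch below assumes simple random sampling from a large population (no finite-population correction) and a standard deviation known in advance, for example from a pilot sample; all input values are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(sd, margin, level=0.95):
    """Smallest n for which the confidence interval half-width is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical: weekly hours with a standard deviation of 5 hours.
n_base = required_sample_size(sd=5.0, margin=1.0)
# Twice the precision (half the margin of error).
n_precise = required_sample_size(sd=5.0, margin=0.5)
# Twice the variation among employees, same margin of error.
n_variable = required_sample_size(sd=10.0, margin=1.0)

print(n_base, n_precise, n_variable)
```

Under these assumptions, halving the margin of error or doubling the standard deviation roughly quadruples the required sample. The formula does not depend on the sample being a fixed percentage of the population, which is why 5% or 10% rules of thumb may yield more or less precision than needed.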

V. Statistically significant samples:  Individual samples cannot be statistically significant because statistical significance applies to the estimates from the sample, and not to the sample data itself. Rather, the difference between a parameter estimate and a hypothetical value of the parameter may or may not be considered statistically significant.
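The comparison that can be statistically significant is the one between an estimate and a hypothetical value. A minimal sketch, assuming the 2-standard deviation threshold discussed above and hypothetical numbers:

```python
def standard_deviations_from(estimate, hypothetical_value, standard_error):
    """Distance between the estimate and the hypothetical value, in standard errors."""
    return (estimate - hypothetical_value) / standard_error


def is_statistically_significant(estimate, hypothetical_value, standard_error, threshold=2.0):
    """Apply the threshold to the absolute distance (a two-sided comparison)."""
    distance = standard_deviations_from(estimate, hypothetical_value, standard_error)
    return abs(distance) >= threshold


# Hypothetical: is average weekly time worked different from 40 hours?
standard_error = 0.87
print(is_statistically_significant(41.9, 40.0, standard_error))
print(is_statistically_significant(41.0, 40.0, standard_error))
```

Note that the functions take an estimate and its standard error as inputs, not the sample itself: the same sample can produce some estimates that differ significantly from their hypothetical values and others that do not.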

A very good introduction to statistical concepts as they apply to litigation is the “Reference Guide on Statistics” in the Reference Manual on Scientific Evidence (https://www.fjc.gov/sites/default/files/2015/SciMan3D01.pdf).

Valentín Estévez is a Managing Director at Welch Consulting’s Washington, DC and Bryan, Texas offices.  You can reach him with questions or comments at vestevez@welchcon.com.

 

[1] “Standard error” is a term of art in statistics, and it does not mean that there was an error in the computation.  In litigation matters, standard errors are often called standard deviations.