Making Wage-and-Hour Discovery Count
By Valentín Estévez, Ph.D.

Wage-and-hour litigation involves the collection and analysis of large amounts of data. The following statistical concepts are useful when requesting and analyzing these data:

I. Parameter estimate:  Averages, medians, maximum and minimum values, standard deviations, and other statistics are examples of “parameter estimates.”  A parameter estimate summarizes some of the information in the data.  For example, averages and medians are ways to summarize the data by estimating its center.

  • Don’t forget that it is an estimate: Even under ideal analytical conditions, a parameter estimate will likely differ from the true value of the parameter because of random noise or variation.  Every parameter estimate therefore has a “standard error”[1] that helps quantify the precision of the estimate.
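
As an illustration of how an estimate and its standard error go together, here is a minimal Python sketch that computes an average and the standard error of that average.  The data and the function name `mean_and_standard_error` are hypothetical, and the formula assumes a simple random sample of independent observations, which may not hold for every data set produced in discovery.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of that mean."""
    n = len(values)
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(values)
    # The standard error of the mean is the standard deviation divided by sqrt(n).
    return mean, sd / math.sqrt(n)


# Hypothetical weekly hours for eight employees (illustrative only).
weekly_hours = [38.5, 41.0, 44.5, 40.0, 39.0, 46.0, 42.5, 40.5]
average, standard_error = mean_and_standard_error(weekly_hours)
print(f"average = {average:.2f} hours, standard error = {standard_error:.2f}")
```

The standard error, not the standard deviation of the individual observations, is the quantity that measures how precisely the average has been estimated.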

II. Statistical significance:  If, after accounting for the estimate’s standard error, the parameter estimate differs from a hypothetical value by a sufficient amount (for our purposes, 2 or more standard deviations), the difference between the parameter estimate and the hypothetical value is considered statistically significant.

  • Why 2 standard deviations? The courts usually consider differences of 2 or more standard deviations as statistically significant.  Under certain assumptions (for example, that the estimate is approximately normally distributed), differences of 2 or more standard deviations are expected to arise by chance less than 5% of the time.  Therefore, this threshold implies a belief that events that happen less than 5% of the time are rare enough to require a closer look.
  • What do the books say? Statistical theory does not define a threshold for statistical significance.  Data analysts rely on the norms of their scientific disciplines when determining statistical significance.  For example, economists use the 2-standard deviation threshold, while particle physicists use a stricter 5-standard deviation threshold (5-standard deviation events happen only 0.00006% of the time).
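
The percentages quoted above can be checked with a short Python sketch.  It assumes the estimate is normally distributed and counts differences in either direction (a two-sided test); the function names and the numbers in the example are hypothetical.

```python
import math


def standard_deviations_apart(estimate, hypothetical_value, standard_error):
    """Distance between the estimate and the hypothetical value, in standard errors."""
    return (estimate - hypothetical_value) / standard_error


def two_sided_tail_probability(z):
    """Chance of a difference at least |z| standard deviations in either
    direction, assuming a normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Tail probabilities for the 2- and 5-standard deviation thresholds.
for threshold in (2, 5):
    share = 100 * two_sided_tail_probability(threshold)
    print(f"{threshold} standard deviations: {share:.5f}% of the time")

# Hypothetical example: estimated average of 41.5 weekly hours, standard
# error of 0.6, compared against a hypothetical value of 40 hours.
z = standard_deviations_apart(41.5, 40.0, 0.6)
print(f"difference = {z:.2f} standard deviations, significant: {abs(z) >= 2}")
```

Under these assumptions the 2-standard deviation threshold corresponds to roughly 4.6% of cases, which is why it is treated as equivalent to the "less than 5%" rule.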

III. Confidence intervals:  A confidence interval defines the range of values that is consistent with a parameter estimate.  For example, a 95% confidence interval contains the values that are within approximately 2 standard deviations (more precisely, 1.96) of the parameter estimate.  Recall that events 2 or more standard deviations away from the parameter estimate happen approximately 5% of the time.  These are some common misconceptions about confidence intervals:

  • Confidence intervals do not imply precise estimates: A higher confidence level implies that a wider range of values is consistent with the parameter estimate.  For example, a 99% confidence interval is about 30% wider than a 95% confidence interval.
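  • Confidence intervals do not tell you the probability that the true value of the parameter is contained in them: It is incorrect to state that a 95% confidence interval has a 95% probability of containing the true value of the parameter.  The true value is fixed, so a given interval either contains it or does not; the 95% describes how often intervals built by the same procedure from repeated samples would contain it.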
  • Confidence intervals do not bound the parameter estimates that may be found in other data: It is wrong to say that 95% of all databases will generate parameter estimates that fall within the 95% confidence interval computed for the data at hand.
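
To make the relationship between confidence level and interval width concrete, the following Python sketch builds a confidence interval from an estimate and its standard error.  It assumes a normally distributed estimate; the function name `confidence_interval` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Two-sided confidence interval, assuming a normally distributed estimate."""
    # Number of standard errors on each side (about 1.96 for a 95% level).
    multiplier = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical estimate: 41.5 average weekly hours, standard error of 0.6.
low95, high95 = confidence_interval(41.5, 0.6, level=0.95)
low99, high99 = confidence_interval(41.5, 0.6, level=0.99)
print(f"95% interval: {low95:.2f} to {high95:.2f}")
print(f"99% interval: {low99:.2f} to {high99:.2f}")

# Ratio of the two widths; the standard error cancels out.
width_ratio = (high99 - low99) / (high95 - low95)
print(f"the 99% interval is {100 * (width_ratio - 1):.0f}% wider")
```

The ratio of the widths does not depend on the data, only on the two confidence levels, so the same comparison applies to any estimate for which the normal approximation is reasonable.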

IV. Sample size:  It is common in wage-and-hour litigation for the parties to agree to use a sample instead of the full population.  The parties also commonly use legal precedents or rules-of-thumb to determine the size of the sample (e.g., 5% or 10% samples).  If the objective is to obtain precise estimates of the parameters of interest, here are some considerations regarding sample sizes:

  • More precise estimates require larger samples: The standard error of the parameter estimate decreases with the increased size of the sample.
  • More variance among members of a population requires larger samples to achieve the same level of precision: Increased variation among sample members in the key characteristics to be measured translates to larger standard errors for the parameter estimates.
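
Both considerations follow from the formula for the standard error of an average, which is the standard deviation divided by the square root of the sample size.  The Python sketch below inverts that formula to give a rough sample size for a target standard error.  The function name and the inputs are hypothetical, and the calculation assumes simple random sampling from a large population and ignores the finite-population correction, so it is a starting point rather than a sampling plan.

```python
import math


def required_sample_size(standard_deviation, target_standard_error):
    """Smallest n for which standard_deviation / sqrt(n) <= target_standard_error."""
    return math.ceil((standard_deviation / target_standard_error) ** 2)


# Hypothetical: weekly hours vary with a standard deviation of 10 hours.
print(required_sample_size(10, 1.0))   # target standard error of 1 hour
print(required_sample_size(10, 0.5))   # twice the precision
print(required_sample_size(20, 1.0))   # twice the variation, same precision
```

In this sketch, halving the target standard error or doubling the standard deviation each quadruples the required sample, which is one reason a fixed 5% or 10% rule of thumb may yield more or less precision than the parties expect.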

V. Statistically significant samples:  Individual samples cannot be statistically significant because statistical significance applies to the estimates from the sample, and not to the sample data itself. Rather, the difference between a parameter estimate and a hypothetical value of the parameter may or may not be considered statistically significant.

A very good introduction to statistical concepts as they apply to litigation is the “Reference Guide on Statistics” in the Reference Manual on Scientific Evidence.

Valentín Estévez is a Managing Director at Welch Consulting’s Washington, DC and Bryan, Texas offices.  You can reach him with questions or comments at vestevez@welchcon.com.

 

[1] “Standard error” is a term of art in statistics, and it does not mean that there was an error in the computation.  In litigation matters, standard errors are often called standard deviations.

 

The opinions expressed are those of the author(s) and do not necessarily reflect the views of our firm or its clients.