
Comparing Multiple Comparisons

Phil Ender, Culver City, California

Stata Conference Chicago - July 29, 2016

Phil Ender

Comparing Multiple Comparisons

1/ 23


Prologue

In ANOVA, a significant omnibus F-test only indicates that there is a significant effect somewhere among the group means. It does not indicate where the significant effects can be found. This is why many, if not most, significant ANOVAs with more than two levels are followed by post-hoc multiple comparisons.


What’s the Problem?

Computing multiple comparisons increases the probability of making a Type I error. The more comparisons you make, the greater the chance of making at least one Type I error. Multiple comparison techniques are designed to control the probability of these Type I errors.


What’s the Problem? Part 2

If n independent contrasts are each tested at α, then the probability of making at least one Type I error is $1 - (1 - \alpha)^n$. The table below gives the probability of making at least one Type I error for different numbers of comparisons when α = 0.05:

| n | 1 | 2 | 3 | 5 | 10 | 15 | 20 |
|---|---|---|---|---|---|---|---|
| probability | 0.0500 | 0.0975 | 0.1426 | 0.2262 | 0.4013 | 0.5367 | 0.6415 |

The above probabilities apply to independent contrasts. However, most sets of contrasts are not independent.
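The probabilities in the table can be reproduced directly from $1 - (1 - \alpha)^n$. The short Python sketch below does just that; the function name `familywise_error` is hypothetical, and like the formula it assumes independent contrasts.

```python
def familywise_error(alpha: float, n: int) -> float:
    """P(at least one Type I error) for n independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n


if __name__ == "__main__":
    # Reproduce the table for alpha = 0.05
    for n in (1, 2, 3, 5, 10, 15, 20):
        print(f"n = {n:>2}: {familywise_error(0.05, n):.4f}")
```

For dependent contrasts the exact value differs, which is why the procedures below are needed.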


What is the Solution?

Adjust the critical values or p-values to reduce the probability of a false positive. The goal is to protect the familywise (or experimentwise) error rate in a strong sense, i.e., regardless of which null hypotheses are actually true. Multiple comparison techniques such as Dunnett, Tukey HSD, Bonferroni, Šidák, or Scheffé do a reasonably good job of protecting the familywise error rate. Techniques such as Fisher’s least significant difference (LSD), Student-Newman-Keuls, and Duncan’s multiple range test fail to strongly protect the familywise error rate. Such procedures are said to protect the familywise error rate only in a weak sense (that is, only when all of the null hypotheses are true); avoid them if possible.


Outline of Multiple Comparisons

I. Planned Comparisons
  A. Planned Orthogonal Comparisons
  B. Planned Non-orthogonal Comparisons
II. Post-hoc Comparisons
  A. All Pairwise
  B. Pairwise versus Control Group
  C. Non-pairwise Comparisons
III. Other Comparisons


I. Planned Comparisons


Planned Orthogonal Comparisons

These are among the most powerful hypothesis tests available. There are two stringent requirements:

1. Comparisons must be planned.
2. Comparisons must be orthogonal, say, 1 vs 2, 3 vs 4, and the average of 1 & 2 vs the average of 3 & 4.

Downside: the comparisons of interest may not be orthogonal.
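Whether a set of planned contrasts is orthogonal can be checked mechanically. Two contrasts with coefficients $c_1$ and $c_2$ are orthogonal when $\sum_j c_{1j} c_{2j} / n_j = 0$, which reduces to a plain dot product when the cell sizes are equal. The sketch below, with hypothetical names (`CONTRASTS`, `is_contrast`, `are_orthogonal`), checks the three four-group contrasts from this slide.

```python
from itertools import combinations

# Coefficients for the slide's example with four groups.
CONTRASTS = {
    "1 vs 2": (1.0, -1.0, 0.0, 0.0),
    "3 vs 4": (0.0, 0.0, 1.0, -1.0),
    "avg(1,2) vs avg(3,4)": (0.5, 0.5, -0.5, -0.5),
}


def is_contrast(c, tol=1e-12):
    """A contrast's coefficients must sum to zero."""
    return abs(sum(c)) < tol


def are_orthogonal(c1, c2, ns=None, tol=1e-12):
    """Orthogonal if sum(c1_j * c2_j / n_j) == 0; equal n gives the dot product."""
    if ns is None:
        ns = [1] * len(c1)
    return abs(sum(a * b / n for a, b, n in zip(c1, c2, ns))) < tol


if __name__ == "__main__":
    for (name1, c1), (name2, c2) in combinations(CONTRASTS.items(), 2):
        print(f"{name1} / {name2}: orthogonal = {are_orthogonal(c1, c2)}")
```

With k groups at most k − 1 contrasts can be mutually orthogonal, so these three exhaust the possibilities for four groups. Note that unequal cell sizes can break orthogonality even for the same coefficients.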


Planned Non-orthogonal Comparisons

Use either the Dunn or the Šidák-Dunn adjustment. Consider C contrasts:

Dunn: $\alpha_{Dunn} = \alpha_{EW}/C$

Šidák-Dunn: $\alpha_{SD} = 1 - (1 - \alpha_{EW})^{1/C}$

If $C = 5$ and $\alpha_{EW} = .05$, then $\alpha_{Dunn} = .01$ and $\alpha_{SD} = .010206$. Basically, these are just the Bonferroni and Šidák adjustments.
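A minimal Python sketch of both adjustments follows. The per-comparison alphas come straight from the two formulas; the equivalent adjusted p-values (multiply by C and cap at 1 for Bonferroni, $1 - (1 - p)^C$ for Šidák) are included for readers who prefer to compare p-values with $\alpha_{EW}$. All function names are hypothetical.

```python
def dunn_alpha(alpha_ew: float, c: int) -> float:
    """Dunn (Bonferroni) per-comparison alpha: alpha_EW / C."""
    return alpha_ew / c


def sidak_alpha(alpha_ew: float, c: int) -> float:
    """Sidak-Dunn per-comparison alpha: 1 - (1 - alpha_EW)^(1/C)."""
    return 1.0 - (1.0 - alpha_ew) ** (1.0 / c)


def bonferroni_adjust(pvalues):
    """Adjusted p-values: min(1, C * p), compared with alpha_EW."""
    c = len(pvalues)
    return [min(1.0, c * p) for p in pvalues]


def sidak_adjust(pvalues):
    """Adjusted p-values: 1 - (1 - p)^C, compared with alpha_EW."""
    c = len(pvalues)
    return [1.0 - (1.0 - p) ** c for p in pvalues]


if __name__ == "__main__":
    print(dunn_alpha(0.05, 5), sidak_alpha(0.05, 5))
```

The Šidák value is always slightly larger (less conservative) than the Dunn value, which is why it is marginally more powerful.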


Planned Non-orthogonal Comparisons: Pairwise vs Control

Special case: pairwise versus control group. Dunnett’s test is used to compare k − 1 treatment groups with a control group. It does not require an omnibus F-test. Dunnett’s test is a t-test with critical values derived by Dunnett (1955). The critical value depends on the number of groups and the denominator (error) degrees of freedom.
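Dunnett’s statistic is an ordinary t that uses the pooled ANOVA error variance. Only the critical value is special: it is the quantile of the maximum |t| over the k − 1 comparisons. Normally you would take that value from Dunnett’s tables or from software (recent SciPy releases, for example, provide `scipy.stats.dunnett`). The Python sketch below instead approximates it by Monte Carlo simulation under the null. It assumes normal errors and equal group sizes n, and the names `dunnett_t` and `simulated_critical_value` are hypothetical.

```python
import math
import random
import statistics


def dunnett_t(control, treatments):
    """t for each treatment mean minus the control mean, using the pooled
    ANOVA error variance. Returns (list of t values, error df)."""
    groups = [control] + list(treatments)
    means = [statistics.fmean(g) for g in groups]
    df_error = sum(len(g) for g in groups) - len(groups)
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_error = sse / df_error
    t_values = [
        (m - means[0]) / math.sqrt(ms_error * (1 / len(g) + 1 / len(control)))
        for g, m in zip(groups[1:], means[1:])
    ]
    return t_values, df_error


def simulated_critical_value(k, n, alpha=0.05, reps=10_000, seed=2016):
    """Monte Carlo approximation of the two-sided Dunnett critical value for
    k groups (1 control + k - 1 treatments), each of size n, under H0."""
    rng = random.Random(seed)
    max_abs_t = []
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        t_values, _ = dunnett_t(groups[0], groups[1:])
        max_abs_t.append(max(abs(t) for t in t_values))
    max_abs_t.sort()
    return max_abs_t[int((1 - alpha) * reps)]


if __name__ == "__main__":
    # k = 4 groups of n = 6 gives k(n - 1) = 20 error df
    print(simulated_critical_value(4, 6))
```

A treatment differs from the control when its |t| exceeds the critical value. The simulated value lands between the unadjusted t critical value and the Bonferroni one, as it should.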


II. Post-hoc Comparisons


Post-hoc Comparisons: All pairwise

Tukey’s HSD (honestly significant difference) is the perennial favorite for performing all possible pairwise comparisons among group means. With k groups there are k(k − 1)/2 possible pairwise contrasts. Tukey’s HSD uses quantiles of the Studentized range distribution to adjust for the number of comparisons. Running all pairwise contrasts with large k may look like a fishing expedition.


Post-hoc Comparisons: All pairwise

Tukey HSD test:

$$q_{HSD} = \frac{\bar{Y}_i - \bar{Y}_j}{\sqrt{MS_{error}/n}}$$

Note the single n in the denominator. Tukey’s HSD requires that all groups have the same number of observations.
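The $q_{HSD}$ statistic for every pair can be computed in a few lines. The Python sketch below is a minimal illustration with hypothetical names and made-up inputs. Each |q| would then be compared with the Studentized range critical value for k means and the error df, taken from tables or, for example, from `scipy.stats.studentized_range.ppf` in recent SciPy versions.

```python
import math
from itertools import combinations


def tukey_hsd_q(means, ms_error, n):
    """q_HSD = (mean_i - mean_j) / sqrt(MS_error / n) for every pair i < j.
    Assumes every group has the same size n."""
    se = math.sqrt(ms_error / n)
    return {
        (i, j): (means[i] - means[j]) / se
        for i, j in combinations(range(len(means)), 2)
    }


if __name__ == "__main__":
    # Hypothetical example: 3 groups of n = 4, MS_error = 4
    print(tukey_hsd_q([10, 12, 15], ms_error=4, n=4))
```

The dictionary always has k(k − 1)/2 entries, one per pairwise contrast.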


What if the cell sizes are not equal?

Harmonic mean, the old-school approach:

$$\tilde{n} = \frac{k}{1/n_1 + 1/n_2 + \cdots + 1/n_k}$$

Spjøtvoll and Stoline’s modification of the HSD test:

$$q_{SS} = \frac{\bar{Y}_i - \bar{Y}_j}{\sqrt{MS_{error}/n_{min}}}$$

It uses the smaller n of the two groups being compared and refers $q_{SS}$ to the Studentized augmented range distribution for k means and the error df.
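Both unequal-n approaches can be written as short Python functions. This is a minimal sketch with hypothetical function names. The harmonic mean $\tilde{n}$ would replace n in the ordinary $q_{HSD}$ formula, whereas $q_{SS}$ uses $n_{min}$ for each pair. Critical values of the Studentized augmented range are assumed to come from tables.

```python
import math


def harmonic_mean_n(ns):
    """Old-school common n for unequal cells: k / (1/n_1 + ... + 1/n_k)."""
    return len(ns) / sum(1.0 / n for n in ns)


def spjotvoll_stoline_q(mean_i, mean_j, n_i, n_j, ms_error):
    """q_SS = (mean_i - mean_j) / sqrt(MS_error / n_min), n_min = min(n_i, n_j).
    Compare |q_SS| with the Studentized augmented range distribution."""
    return (mean_i - mean_j) / math.sqrt(ms_error / min(n_i, n_j))


if __name__ == "__main__":
    print(harmonic_mean_n([2, 4, 4, 8]))                # about 3.56
    print(spjotvoll_stoline_q(15, 10, 8, 2, ms_error=4))
```

Using $n_{min}$ is conservative: a pair involving a small group gets a larger standard error even when the other group is large.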


More on unequal cell sizes

Tukey-Kramer modification of the HSD test:

$$q_{TK} = \frac{\bar{Y}_i - \bar{Y}_j}{\sqrt{MS_{error}\,(1/n_i + 1/n_j)/2}}$$

Use the Studentized range distribution for k means with ν error degrees of freedom.
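A one-function Python sketch of $q_{TK}$ follows; the name `tukey_kramer_q` is hypothetical. When $n_i = n_j = n$, the term $(1/n_i + 1/n_j)/2$ equals $1/n$, so the statistic reduces exactly to $q_{HSD}$.

```python
import math


def tukey_kramer_q(mean_i, mean_j, n_i, n_j, ms_error):
    """q_TK = (mean_i - mean_j) / sqrt(MS_error * (1/n_i + 1/n_j) / 2).
    Compare |q_TK| with the Studentized range for k means and nu error df."""
    return (mean_i - mean_j) / math.sqrt(ms_error * (1 / n_i + 1 / n_j) / 2)


if __name__ == "__main__":
    print(tukey_kramer_q(12, 10, 4, 4, ms_error=4))   # equal n: same as q_HSD
    print(tukey_kramer_q(15, 10, 8, 2, ms_error=4))   # unequal n
```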


Post-hoc Comparisons: Pairwise vs Control

I know Dunnett’s test is intended for planned comparisons of k − 1 treatment groups with a control group. However, it is also used for post-hoc comparisons. It is marginally more powerful than the Tukey HSD because there are fewer contrasts (k − 1 rather than k(k − 1)/2). Dunnett’s test is a t-test with critical values derived by Dunnett (1955). The critical value depends on the number of groups (k) and the ANOVA error degrees of freedom.


Post-hoc Comparisons: Non-pairwise Comparisons

Example: the average of groups 1 & 2 versus the mean of group 3. Use the Scheffé adjustment. Scheffé is a very conservative adjustment that makes use of the F distribution. The Scheffé critical value is ...

$$F_{crit} = (k - 1)\,F_{\alpha}(k - 1, \nu_{error})$$

where k is the total number of groups. The contrast’s own F statistic, which has 1 numerator df, is compared with $F_{crit}$.
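The Python sketch below tests the slide’s example contrast, coefficients (0.5, 0.5, −1), with Scheffé’s critical value. To stay dependency-free it takes the tabled $F_{\alpha}(k - 1, \nu_{error})$ as an input instead of computing it. The function name and the example means, cell sizes, and $MS_{error}$ are hypothetical.

```python
def scheffe_test(means, ns, ms_error, coefs, f_table):
    """Scheffe test of one contrast.

    f_table is the tabled F_alpha(k - 1, nu_error), supplied by the caller.
    Returns (F for the contrast, Scheffe critical value, reject H0?)."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(means)
    estimate = sum(c * m for c, m in zip(coefs, means))       # psi-hat
    var_estimate = ms_error * sum(c * c / n for c, n in zip(coefs, ns))
    f_contrast = estimate ** 2 / var_estimate                   # 1 numerator df
    f_crit = (k - 1) * f_table
    return f_contrast, f_crit, f_contrast > f_crit


if __name__ == "__main__":
    # 3 groups of 5 -> nu_error = 12; tabled F_.05(2, 12) is about 3.89
    print(scheffe_test([10, 12, 15], [5, 5, 5], 4.0, (0.5, 0.5, -1.0), 3.89))
```

Because the critical value is multiplied by k − 1, any number of contrasts, including ones chosen after looking at the data, can be tested. That same multiplication is the source of Scheffé’s conservatism.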


III. Other Comparisons


If you absolutely positively have to make a few comparisons, but ...

... they don’t fit any of the approaches we’ve seen so far? Say, 15 regressions on 15 separate response variables. Try a Bonferroni or Šidák adjustment: good protection, but low power.


What if you want to make a huge number of contrasts, ...

say, 10,000 or more? Try a false discovery rate (FDR) method such as Benjamini-Hochberg. FDR control offers a way to increase power while maintaining a principled bound on error. Note that when the FDR is controlled at .05, the expected proportion of rejected tests that are spurious is at most 5%. For the Benjamini-Hochberg procedure, this guarantee holds when the test statistics are independent or positively dependent.
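The Benjamini-Hochberg step-up procedure is simple to implement. Sort the m p-values, find the largest rank i with $p_{(i)} \le i\,q/m$, and reject every hypothesis up to that rank. The following Python sketch shows one way to do it; the function name is hypothetical and the p-values in the usage line are made up.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Returns a list of booleans (True = reject H0) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank          # largest rank meeting p_(i) <= i * q / m
    rejected = [False] * m
    for idx in order[:k_max]:     # reject every hypothesis up to that rank
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p))  # BH rejects the first two
```

With the made-up p-values above, a Bonferroni threshold of .05/10 = .005 would reject only the first hypothesis. Because the procedure is step-up, a p-value that misses its own threshold is still rejected if a larger-ranked p-value passes.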


What if you don’t want to be bothered making any adjustments for multiple comparisons?

Analyze your experiment using Bayesian methods. All comparisons are made from a single (joint) posterior distribution. For each difference in means, see whether the region of practical equivalence (ROPE) falls entirely outside the 95% highest posterior density (HPD) credible interval.
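Given posterior draws of a difference in means, for example from an MCMC sampler such as JAGS or Stan, the HPD-versus-ROPE check takes only a few lines. The Python sketch below is one way to do it. It estimates the HPD interval as the narrowest interval containing 95% of the sorted draws, which is appropriate for a unimodal posterior. The simulated normal draws are a hypothetical stand-in for real sampler output, and the function names and decision labels are also hypothetical.

```python
import math
import random


def hpd_interval(draws, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws
    (sample-based HPD estimate for a unimodal posterior)."""
    s = sorted(draws)
    n = len(s)
    m = math.ceil(mass * n)  # number of draws the interval must contain
    start = min(range(n - m + 1), key=lambda i: s[i + m - 1] - s[i])
    return s[start], s[start + m - 1]


def rope_decision(draws, rope, mass=0.95):
    """Compare the HPD interval of a difference in means with a ROPE (lo, hi)."""
    lo, hi = hpd_interval(draws, mass)
    if hi < rope[0] or lo > rope[1]:
        return "credible difference"      # HPD entirely outside the ROPE
    if rope[0] <= lo and hi <= rope[1]:
        return "practically equivalent"   # HPD entirely inside the ROPE
    return "undecided"                    # HPD and ROPE overlap


if __name__ == "__main__":
    rng = random.Random(29)
    # Hypothetical stand-in for MCMC draws of mu_1 - mu_2 (not real data).
    diff = [rng.gauss(2.0, 0.5) for _ in range(20_000)]
    print(hpd_interval(diff), rope_decision(diff, rope=(-0.5, 0.5)))
```

Each pairwise difference is computed draw by draw from the same joint posterior, so no separate multiplicity adjustment is applied. The width of the ROPE is a substantive choice, not a statistical one.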


References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289–300.

Kirk, R.E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA: Brooks/Cole.

Kruschke, J.K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Amsterdam: Elsevier.


¿Questions?
