Estimated Marginal a.k.a. Least Square Means

At their simplest, estimated marginal means, a.k.a least-square means, are just effects: predictions and associated errors computed from a model at specific points on a reference grid. The "marginal" here refers to estimation at the margins of the table, i.e. either averaging over or computing at specific values of other variables, often the means of other variables. Effects.jl provides a convenient interface for computing EM means at all levels of categorical variables and the means of any continuous covariates. An example will make this clear:

using DataFrames, Effects, GLM, StatsModels, StableRNGs, StandardizedPredictors
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
                       sex=repeat(["male", "female"], inner=8),
                       weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
model_scaled = lm(@formula(weight ~ 1 + sex * age), growthdata;
                  contrasts=Dict(:age => ZScore(), :sex => DummyCoding()))

emmeans(model_scaled)
2×4 DataFrame
Rowsexageweighterr
StringFloat64Float64Float64
1female16.5112.6940.397181
2male16.5127.6820.397181

(This is the same example data as Interaction Terms in Effects.)

Notice that we could have achieved the same result by using effects and specifying the levels manually:

design = Dict(:sex => ["female", "male"], :age => 16.5)
grid = expand_grid(design)
effects!(grid, model_scaled)
2×4 DataFrame
Rowsexageweighterr
StringFloat64Float64Float64
1female16.5112.6940.397181
2male16.5127.6820.397181

emmeans is primarily a convenience function for computing effects on a convenient, pre-defined reference grid. Notably, emmeans includes information by default about all variables, even the ones analyzed only at their typical values. This differs from effects!:

design = Dict(:sex => ["female", "male"])
grid = expand_grid(design)
effects!(grid, model_scaled)
2×3 DataFrame
Rowsexweighterr
StringFloat64Float64
1female112.6940.397181
2male127.6820.397181

Pairwise Comparisons

EM means are also useful for post-hoc examination of pairwise differences. This is implemented via the empairs function:

empairs(model_scaled)
1×4 DataFrame
Rowsexageweighterr
StringFloat64Float64Float64
1female > male16.5-14.98820.561699

Or for a more advanced case:

using MixedModels
kb07 = MixedModels.dataset(:kb07)
mixed_form = @formula(rt_trunc ~ 1 + spkr * prec * load + (1|item) + (1|subj))
mixed_model = fit(MixedModel, mixed_form, kb07; progress=false)
empairs(mixed_model)
28×5 DataFrame
Rowspkrloadprecrt_truncerr
StringStringStringFloat64Float64
1new > oldnobreak-189.655128.645
2newno > yesbreak-158.7128.645
3new > oldno > yesbreak-327.682128.645
4newnobreak > maintain585.983128.645
5new > oldnobreak > maintain576.028128.73
6newno > yesbreak > maintain503.358128.645
7new > oldno > yesbreak > maintain327.809128.645
8old > newno > yesbreak30.9554128.603
9oldno > yesbreak-138.027128.603
10old > newnobreak > maintain775.638128.603
11oldnobreak > maintain765.684128.688
12old > newno > yesbreak > maintain693.013128.603
13oldno > yesbreak > maintain517.464128.603
14new > oldyesbreak-168.982128.603
15newyes > nobreak > maintain744.683128.603
16new > oldyes > nobreak > maintain734.728128.688
17newyesbreak > maintain662.058128.603
18new > oldyesbreak > maintain486.509128.603
19old > newyes > nobreak > maintain913.665128.603
20oldyes > nobreak > maintain903.71128.688
21old > newyesbreak > maintain831.04128.603
22oldyesbreak > maintain655.491128.603
23new > oldnomaintain-9.95474128.688
24newno > yesmaintain-82.625128.603
25new > oldno > yesmaintain-258.174128.603
26old > newno > yesmaintain-72.6703128.688
27oldno > yesmaintain-248.219128.688
28new > oldyesmaintain-175.549128.603

If we provide a method for computing the degrees of freedom or an appropriate value, then empairs will also generate test statistics:

empairs(model_scaled; dof=dof_residual)
1×7 DataFrame
RowsexageweighterrdoftPr(>|t|)
StringFloat64Float64Float64Float64Float64Float64
1female > male16.5-14.98820.56169912.0-26.68384.70971e-12

These degrees of freedom correspond to the denominator degrees of freedom in the associated F-tests.

Mixed-effects Models

For mixed-effects models, the denominator degrees of freedom are not clearly defined[GLMMFAQ] (nor is it clear that the asymptotic distribution is even F for anything beyond a few special cases[Bates2006]). The fundamental problem is that mixed models have in some sense "more" degrees of freedom than you would expect from naively counting parameters – the conditional modes (BLUPs) aren't technically parameters, but do somehow contribute to increasing the amount of wiggle-room that the model has for adapting to the data. We can try a few different approaches and see how they differ.

We'll use a simplified version of the model above:

mixed_form2 = @formula(rt_trunc ~ 1 + prec * load + (1|item) + (1|subj))
mixed_model2 = fit(MixedModel, mixed_form2, kb07; progress=false)
Est.SEzpσ_subjσ_item
(Intercept)2441.688184.381528.94<1e-99318.4042363.6708
prec: maintain-676.077948.5432-13.93<1e-43
load: yes148.142348.48653.060.0022
prec: maintain & load: yes17.303468.59040.250.8008
Residual725.2555
  • the default definition in MixedModels.jl for dof_residual(model) is simply nobs(model) - dof(model)
empairs(mixed_model2; dof=dof_residual)
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Int64Float64Float64
1no > yesbreak-148.142119.3221782-1.241530.214572
2nobreak > maintain676.078119.34517825.664911.71183e-8
3no > yesbreak > maintain510.632119.32217824.279451.97306e-5
4yes > nobreak > maintain824.22119.33317826.906876.86976e-12
5yesbreak > maintain658.775119.3117825.521523.85549e-8
6no > yesmaintain-165.446119.3331782-1.386410.165794
  • the leverage formula, given by nobs(model) - sum(leverage(model)) (for classical linear models, this is exactly the same as (1))
empairs(mixed_model2; dof=nobs(mixed_model) - sum(leverage(mixed_model)))
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Float64Float64Float64
1no > yesbreak-148.142119.3221704.63-1.241530.214579
2nobreak > maintain676.078119.3451704.635.664911.7235e-8
3no > yesbreak > maintain510.632119.3221704.634.279451.97767e-5
4yes > nobreak > maintain824.22119.3331704.636.906876.97054e-12
5yesbreak > maintain658.775119.311704.635.521523.87932e-8
6no > yesmaintain-165.446119.3331704.63-1.386410.165802
  • counting each conditional mode as a parameter (this is the most conservative value because it has the smallest degrees of freedom)
empairs(mixed_model2; dof=nobs(mixed_model) - sum(size(mixed_model)[2:3]))
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Int64Float64Float64
1no > yesbreak-148.142119.3221693-1.241530.21458
2nobreak > maintain676.078119.34516935.664911.72535e-8
3no > yesbreak > maintain510.632119.32216934.279451.9784e-5
4yes > nobreak > maintain824.22119.33316936.906876.9866e-12
5yesbreak > maintain658.775119.3116935.521523.8831e-8
6no > yesmaintain-165.446119.3331693-1.386410.165803
  • the infinite degrees of freedom (i.e. interpret $t$ as $z$ and $F$ as $\chi^2$ )
empairs(mixed_model2; dof=Inf)
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Float64Float64Float64
1no > yesbreak-148.142119.322Inf-1.241530.214408
2nobreak > maintain676.078119.345Inf5.664911.47106e-8
3no > yesbreak > maintain510.632119.322Inf4.279451.87356e-5
4yes > nobreak > maintain824.22119.333Inf6.906874.9548e-12
5yesbreak > maintain658.775119.31Inf5.521523.36087e-8
6no > yesmaintain-165.446119.333Inf-1.386410.16562

These values are all very similar to each other, because the $t$ distribution rapidly converges to the $z$ distribution for $t > 30$ and so amount of probability mass in the tails does not change much for a model with more than a thousand observations and 30+ levels of each grouping variable.

Multiple Comparisons Correction

A perhaps more important concern is inflation of Type-1 error rate through multiple testing. We can also provide a correction method to be applied to the p-values.

using MultipleTesting
bonferroni(pvals) = adjust(PValues(pvals), Bonferroni())
empairs(mixed_model2; dof=Inf, padjust=bonferroni)
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Float64Float64Float64
1no > yesbreak-148.142119.322Inf-1.241531.0
2nobreak > maintain676.078119.345Inf5.664918.82634e-8
3no > yesbreak > maintain510.632119.322Inf4.279450.000112414
4yes > nobreak > maintain824.22119.333Inf6.906872.97288e-11
5yesbreak > maintain658.775119.31Inf5.521522.01652e-7
6no > yesmaintain-165.446119.333Inf-1.386410.993722

A Final Note

There are many subleties in more advanced applications of EM means. These are discussed at length in the documentation for the R package emmeans. For example:

All of these problems are fundamental to EM means as a technique and not particular to the software implementation. Effects.jl does not yet support everything that R's emmeans does, but all of the fundamental statistical concerns discussed in the latter's documentation still apply.