Estimated Marginal a.k.a. Least Square Means

At their simplest, estimated marginal means, a.k.a least-square means, are just effects: predictions and associated errors computed from a model at specific points on a reference grid. The "marginal" here refers to estimation at the margins of the table, i.e. either averaging over or computing at specific values of other variables, often the means of other variables. Effects.jl provides a convenient interface for computing EM means at all levels of categorical variables and the means of any continuous covariates. An example will make this clear:

using DataFrames, Effects, GLM, StatsModels, StableRNGs, StandardizedPredictors
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
                       sex=repeat(["male", "female"], inner=8),
                       weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
model_scaled = lm(@formula(weight ~ 1 + sex * age), growthdata;
                  contrasts=Dict(:age => ZScore(), :sex => DummyCoding()))

emmeans(model_scaled)

2×4 DataFrame

Row	sex	age	weight	err
	String	Float64	Float64	Float64
1	female	16.5	112.694	0.397181
2	male	16.5	127.682	0.397181

(This is the same example data as Interaction Terms in Effects.)

Notice that we could have achieved the same result by using effects and specifying the levels manually:

design = Dict(:sex => ["female", "male"], :age => 16.5)
grid = expand_grid(design)
effects!(grid, model_scaled)

2×4 DataFrame

Row	sex	age	weight	err
	String	Float64	Float64	Float64
1	female	16.5	112.694	0.397181
2	male	16.5	127.682	0.397181

emmeans is primarily a convenience function for computing effects on a convenient, pre-defined reference grid. Notably, emmeans includes information by default about all variables, even the ones analyzed only at their typical values. This differs from effects!:

design = Dict(:sex => ["female", "male"])
grid = expand_grid(design)
effects!(grid, model_scaled)

2×3 DataFrame

Row	sex	weight	err
	String	Float64	Float64
1	female	112.694	0.397181
2	male	127.682	0.397181

Pairwise Comparisons

EM means are also useful for post-hoc examination of pairwise differences. This is implemented via the empairs function:

empairs(model_scaled)

1×4 DataFrame

Row	sex	age	weight	err
	String	Float64	Float64	Float64
1	female > male	16.5	-14.9882	0.561699

Or for a more advanced case:

using MixedModels
kb07 = MixedModels.dataset(:kb07)
mixed_form = @formula(rt_trunc ~ 1 + spkr * prec * load + (1|item) + (1|subj))
mixed_model = fit(MixedModel, mixed_form, kb07; progress=false)
empairs(mixed_model)

28×5 DataFrame

Row	spkr	load	prec	rt_trunc	err
	String	String	String	Float64	Float64
1	new > old	no	break	-189.655	128.645
2	new	no > yes	break	-158.7	128.645
3	new > old	no > yes	break	-327.682	128.645
4	new	no	break > maintain	585.983	128.645
5	new > old	no	break > maintain	576.028	128.73
6	new	no > yes	break > maintain	503.358	128.645
7	new > old	no > yes	break > maintain	327.809	128.645
8	old > new	no > yes	break	30.9554	128.603
9	old	no > yes	break	-138.027	128.603
10	old > new	no	break > maintain	775.638	128.603
11	old	no	break > maintain	765.684	128.688
12	old > new	no > yes	break > maintain	693.013	128.603
13	old	no > yes	break > maintain	517.464	128.603
14	new > old	yes	break	-168.982	128.603
15	new	yes > no	break > maintain	744.683	128.603
16	new > old	yes > no	break > maintain	734.728	128.688
17	new	yes	break > maintain	662.058	128.603
18	new > old	yes	break > maintain	486.509	128.603
19	old > new	yes > no	break > maintain	913.665	128.603
20	old	yes > no	break > maintain	903.71	128.688
21	old > new	yes	break > maintain	831.04	128.603
22	old	yes	break > maintain	655.491	128.603
23	new > old	no	maintain	-9.95474	128.688
24	new	no > yes	maintain	-82.625	128.603
25	new > old	no > yes	maintain	-258.174	128.603
26	old > new	no > yes	maintain	-72.6703	128.688
27	old	no > yes	maintain	-248.219	128.688
28	new > old	yes	maintain	-175.549	128.603

If we provide a method for computing the degrees of freedom or an appropriate value, then empairs will also generate test statistics:

empairs(model_scaled; dof=dof_residual)

1×7 DataFrame

Row	sex	age	weight	err	dof	t	Pr(>\|t\|)
	String	Float64	Float64	Float64	Float64	Float64	Float64
1	female > male	16.5	-14.9882	0.561699	12.0	-26.6838	4.70971e-12

These degrees of freedom correspond to the denominator degrees of freedom in the associated F-tests.

Mixed-effects Models

For mixed-effects models, the denominator degrees of freedom are not clearly defined^[GLMMFAQ] (nor is it clear that the asymptotic distribution is even F for anything beyond a few special cases^[Bates2006]). The fundamental problem is that mixed models have in some sense "more" degrees of freedom than you would expect from naively counting parameters – the conditional modes (BLUPs) aren't technically parameters, but do somehow contribute to increasing the amount of wiggle-room that the model has for adapting to the data. We can try a few different approaches and see how they differ.

We'll use a simplified version of the model above:

mixed_form2 = @formula(rt_trunc ~ 1 + prec * load + (1|item) + (1|subj))
mixed_model2 = fit(MixedModel, mixed_form2, kb07; progress=false)

	Est.	SE	z	p	σ_subj	σ_item
(Intercept)	2441.6881	84.3815	28.94	<1e-99	318.4042	363.6708
prec: maintain	-676.0779	48.5432	-13.93	<1e-43
load: yes	148.1423	48.4865	3.06	0.0022
prec: maintain & load: yes	17.3034	68.5904	0.25	0.8008
Residual	725.2555

the default definition in MixedModels.jl for dof_residual(model) is simply nobs(model) - dof(model)

empairs(mixed_model2; dof=dof_residual)

6×7 DataFrame

Row	load	prec	rt_trunc	err	dof	t	Pr(>\|t\|)
	String	String	Float64	Float64	Int64	Float64	Float64
1	no > yes	break	-148.142	119.322	1782	-1.24153	0.214572
2	no	break > maintain	676.078	119.345	1782	5.66491	1.71183e-8
3	no > yes	break > maintain	510.632	119.322	1782	4.27945	1.97306e-5
4	yes > no	break > maintain	824.22	119.333	1782	6.90687	6.86976e-12
5	yes	break > maintain	658.775	119.31	1782	5.52152	3.85549e-8
6	no > yes	maintain	-165.446	119.333	1782	-1.38641	0.165794

the leverage formula, given by nobs(model) - sum(leverage(model)) (for classical linear models, this is exactly the same as (1))

empairs(mixed_model2; dof=nobs(mixed_model) - sum(leverage(mixed_model)))

6×7 DataFrame

Row	load	prec	rt_trunc	err	dof	t	Pr(>\|t\|)
	String	String	Float64	Float64	Float64	Float64	Float64
1	no > yes	break	-148.142	119.322	1704.63	-1.24153	0.214579
2	no	break > maintain	676.078	119.345	1704.63	5.66491	1.7235e-8
3	no > yes	break > maintain	510.632	119.322	1704.63	4.27945	1.97767e-5
4	yes > no	break > maintain	824.22	119.333	1704.63	6.90687	6.97054e-12
5	yes	break > maintain	658.775	119.31	1704.63	5.52152	3.87932e-8
6	no > yes	maintain	-165.446	119.333	1704.63	-1.38641	0.165802

counting each conditional mode as a parameter (this is the most conservative value because it has the smallest degrees of freedom)

empairs(mixed_model2; dof=nobs(mixed_model) - sum(size(mixed_model)[2:3]))

6×7 DataFrame

Row	load	prec	rt_trunc	err	dof	t	Pr(>\|t\|)
	String	String	Float64	Float64	Int64	Float64	Float64
1	no > yes	break	-148.142	119.322	1693	-1.24153	0.21458
2	no	break > maintain	676.078	119.345	1693	5.66491	1.72535e-8
3	no > yes	break > maintain	510.632	119.322	1693	4.27945	1.9784e-5
4	yes > no	break > maintain	824.22	119.333	1693	6.90687	6.9866e-12
5	yes	break > maintain	658.775	119.31	1693	5.52152	3.8831e-8
6	no > yes	maintain	-165.446	119.333	1693	-1.38641	0.165803

the infinite degrees of freedom (i.e. interpret $t$ as $z$ and $F$ as $\chi^2$ )

empairs(mixed_model2; dof=Inf)

6×7 DataFrame

Row	load	prec	rt_trunc	err	dof	t	Pr(>\|t\|)
	String	String	Float64	Float64	Float64	Float64	Float64
1	no > yes	break	-148.142	119.322	Inf	-1.24153	0.214408
2	no	break > maintain	676.078	119.345	Inf	5.66491	1.47106e-8
3	no > yes	break > maintain	510.632	119.322	Inf	4.27945	1.87356e-5
4	yes > no	break > maintain	824.22	119.333	Inf	6.90687	4.9548e-12
5	yes	break > maintain	658.775	119.31	Inf	5.52152	3.36087e-8
6	no > yes	maintain	-165.446	119.333	Inf	-1.38641	0.16562

These values are all very similar to each other, because the $t$ distribution rapidly converges to the $z$ distribution for $t > 30$ and so amount of probability mass in the tails does not change much for a model with more than a thousand observations and 30+ levels of each grouping variable.

Multiple Comparisons Correction

A perhaps more important concern is inflation of Type-1 error rate through multiple testing. We can also provide a correction method to be applied to the p-values.

using MultipleTesting
bonferroni(pvals) = adjust(PValues(pvals), Bonferroni())
empairs(mixed_model2; dof=Inf, padjust=bonferroni)

6×7 DataFrame

Row	load	prec	rt_trunc	err	dof	t	Pr(>\|t\|)
	String	String	Float64	Float64	Float64	Float64	Float64
1	no > yes	break	-148.142	119.322	Inf	-1.24153	1.0
2	no	break > maintain	676.078	119.345	Inf	5.66491	8.82634e-8
3	no > yes	break > maintain	510.632	119.322	Inf	4.27945	0.000112414
4	yes > no	break > maintain	824.22	119.333	Inf	6.90687	2.97288e-11
5	yes	break > maintain	658.775	119.31	Inf	5.52152	2.01652e-7
6	no > yes	maintain	-165.446	119.333	Inf	-1.38641	0.993722

A Final Note

There are many subleties in more advanced applications of EM means. These are discussed at length in the documentation for the R package emmeans. For example:

the impact of transformations and link functions
derived covariates, e.g., including both x and x^2 in a model
weighting of cells in computing means

All of these problems are fundamental to EM means as a technique and not particular to the software implementation. Effects.jl does not yet support everything that R's emmeans does, but all of the fundamental statistical concerns discussed in the latter's documentation still apply.