# Effects.jl
Regression models are useful, but they can be tricky to interpret. Variable centering and contrast coding can obscure the meaning of main effects, and interaction terms, especially higher-order ones, only increase the difficulty of interpretation. Here we introduce Effects.jl, which translates the fitted model, including its estimated uncertainty, back into data space. Using Effects.jl, it is possible to generate effects plots that enable rapid visualization and interpretation of regression models.
The examples below demonstrate the use of Effects.jl with GLM.jl, but they will work with any modeling package that is based on the StatsModels.jl formula interface. The second example is borrowed in no small part from the StandardizedPredictors.jl docs.
## The Effect of Contrast Coding
Let's consider a synthetic dataset of weights (in grams) for chicks fed different types of feed, with a single categorical predictor `feed` (three levels: `A`, `B`, `C`). The simulated weights are based loosely on the R dataset `chickwts`.
using AlgebraOfGraphics, CairoMakie, DataFrames, Effects, GLM, StatsModels, StableRNGs
rng = StableRNG(42)
reps = 5
sd = 50
wtdat = DataFrame(; feed=repeat(["A", "B", "C"], inner=reps),
weight=[180 .+ sd * randn(rng, reps);
220 .+ sd * randn(rng, reps);
300 .+ sd * randn(rng, reps)])
Row | feed | weight |
---|---|---|
| | String | Float64 |
1 | A | 146.487 |
2 | A | 202.356 |
3 | A | 248.682 |
4 | A | 245.477 |
5 | A | 186.304 |
6 | B | 254.197 |
7 | B | 169.04 |
8 | B | 180.324 |
9 | B | 308.736 |
10 | B | 284.867 |
11 | C | 217.807 |
12 | C | 339.722 |
13 | C | 234.517 |
14 | C | 298.147 |
15 | C | 353.616 |
If we fit a linear model to these data using default treatment/dummy coding, then the coefficient corresponding to `feed: C` is significant.
mod_treat = lm(@formula(weight ~ 1 + feed), wtdat)
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
weight ~ 1 + feed
Coefficients:
───────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept) 205.861 25.075 8.21 <1e-05 151.227 260.495
feed: B 33.5719 35.4614 0.95 0.3625 -43.6919 110.836
feed: C 82.9008 35.4614 2.34 0.0375 5.63696 160.165
───────────────────────────────────────────────────────────────────────
If, on the other hand, we use effects (sum-to-zero) coding, then the coefficient for `feed: C` is no longer significant:
mod_eff = lm(@formula(weight ~ 1 + feed), wtdat; contrasts=Dict(:feed => EffectsCoding()))
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
weight ~ 1 + feed
Coefficients:
──────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept) 244.685 14.4771 16.90 <1e-09 213.143 276.228
feed: B -5.25232 20.4737 -0.26 0.8019 -49.8606 39.356
feed: C 44.0766 20.4737 2.15 0.0524 -0.531742 88.6849
──────────────────────────────────────────────────────────────────────────
This is in some sense unsurprising: the different coding schemes correspond to different hypotheses. In treatment coding, the hypothesis for the coefficient `feed: C` is whether feed C differs from the reference level, feed A. In effects coding, the hypothesis is whether feed C differs from the mean across all levels. In more complicated models, the hypotheses being tested – especially those for interaction terms – can become more complex and difficult to "read off" from the model summary.
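We can make this concrete with a little arithmetic on the coefficient tables above. This is an illustrative sketch using the rounded printed estimates, so the values only match approximately:

```julia
# Reconstruct the three cell means from the treatment-coded fit above:
# the intercept is mean(A); feed: B and feed: C are offsets from it.
meanA = 205.861
meanB = meanA + 33.5719
meanC = meanA + 82.9008
grand = (meanA + meanB + meanC) / 3   # grand mean of the cell means

# Treatment coding tests mean(C) - mean(A); effects coding tests mean(C) - grand.
delta_treat = meanC - meanA   # ≈ 82.9008, the treatment-coded feed: C estimate
delta_eff = meanC - grand     # ≈ 44.0766, the effects-coded feed: C estimate
```

Note that `grand` also recovers the effects-coded intercept, 244.685: the same three cell means underlie both fits.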
In spite of these differences, these models make the same predictions about the data:
fitted(mod_treat) ≈ fitted(mod_eff)
true
At a deep level, these models are actually the same model, just with different parameterizations. To get a better view of what a model is saying about the data, abstracted away from the parameterization, we can look at the model in data space. For that, we can use Effects.jl to generate the effects that the model is capturing. We do this by specifying a (subset of the) design and creating a reference grid, then computing the model's prediction and associated error at those values.
The `effects` function will compute the reference grid for a crossed design specified by a dictionary of values. As we only have one predictor in this dataset, the design is completely crossed.
design = Dict(:feed => unique(wtdat.feed))
eff_feed = effects(design, mod_eff)
Row | feed | weight | err | lower | upper |
---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
1 | A | 205.861 | 25.075 | 180.786 | 230.936 |
2 | B | 239.433 | 25.075 | 214.358 | 264.508 |
3 | C | 288.762 | 25.075 | 263.687 | 313.837 |
The effects table consists of five columns: the levels of the `feed` predictor specified in the design (`feed`), the model's prediction at those levels (`weight`), the standard error of those predictions (`err`), and the lower and upper edges of the confidence interval of those predictions (`lower`, `upper`; computed using a normal approximation based on the standard error).
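In the table above, the bounds work out to the prediction plus or minus exactly one standard error, the default interval when no coverage level is requested; wider intervals scale `err` by the appropriate normal quantile (for 95%, about 1.96). A quick check with the feed A row, using values copied from the table (illustrative arithmetic, not an API call):

```julia
# feed A row from the effects table above
weight, err = 205.861, 25.075
lower = weight - err   # 180.786, matching the table's lower bound
upper = weight + err   # 230.936, matching the table's upper bound
# a 95% interval would instead use weight ± 1.96 * err
```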
Because the effects are on the data (response) scale, they are invariant to parameterization:
effects(design, mod_treat)
Row | feed | weight | err | lower | upper |
---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
1 | A | 205.861 | 25.075 | 180.786 | 230.936 |
2 | B | 239.433 | 25.075 | 214.358 | 264.508 |
3 | C | 288.762 | 25.075 | 263.687 | 313.837 |
We can also compute effects on a subset of the levels in the data:
effects(Dict(:feed => ["A"]), mod_eff)
Row | feed | weight | err | lower | upper |
---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
1 | A | 205.861 | 25.075 | 180.786 | 230.936 |
Note that in this case, we computed the average effect of the level A, which is not directly encoded in any single model coefficient in this parameterization / contrast coding.
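To see why, note that under effects coding the level A row of the contrast matrix is [-1, -1], so the prediction for A combines three coefficients. A sketch using the rounded estimates from the mod_eff table above:

```julia
# intercept, feed: B, and feed: C from the effects-coded fit above
intercept, effB, effC = 244.685, -5.25232, 44.0766
predA = intercept - effB - effC   # ≈ 205.861, the effects prediction for level A
```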
We can of course use the estimated effects to get a nice visual summary of the model's predictions:
plt = data(eff_feed) * mapping(:feed, :weight) * (visual(Scatter) + mapping(:err) * visual(Errorbars))
draw(plt)
## Interaction Terms in Effects
Let's consider a (slightly) synthetic dataset of weights for adolescents of different ages, with predictors `age` (continuous, from 13 to 20) and `sex`, and response `weight` in pounds. The weights are based loosely on the medians from the CDC growth charts, which show that the median male and female both start off around 100 pounds at age 13, but by age 20 the median male weighs around 155 pounds while the median female weighs around 125 pounds.
using AlgebraOfGraphics, CairoMakie, DataFrames, Effects, GLM, StatsModels, StableRNGs
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
sex=repeat(["male", "female"], inner=8),
weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
Row | age | sex | weight |
---|---|---|---|
| | Int64 | String | Float64 |
1 | 13 | male | 99.3297 |
2 | 14 | male | 108.304 |
3 | 15 | male | 117.088 |
4 | 16 | male | 124.881 |
5 | 17 | male | 131.555 |
6 | 18 | male | 139.97 |
7 | 19 | male | 146.124 |
8 | 20 | male | 154.206 |
9 | 13 | female | 101.775 |
10 | 14 | female | 104.869 |
11 | 15 | female | 105.499 |
12 | 16 | female | 111.509 |
13 | 17 | female | 112.976 |
14 | 18 | female | 117.82 |
15 | 19 | female | 122.501 |
16 | 20 | female | 124.603 |
In this dataset, there's obviously a main effect of sex: males are heavier than females for every age except 13 years. But if we run a basic linear regression, we see something rather different:
mod_uncentered = lm(@formula(weight ~ 1 + sex * age), growthdata)
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
weight ~ 1 + sex + age + sex & age
Coefficients:
──────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept) 56.4387 2.88762 19.55 <1e-09 50.1471 62.7303
sex: male -56.1509 4.08372 -13.75 <1e-07 -65.0485 -47.2532
age 3.40941 0.173344 19.67 <1e-09 3.03172 3.78709
sex: male & age 4.31146 0.245146 17.59 <1e-09 3.77734 4.84559
──────────────────────────────────────────────────────────────────────────────
Is this model just a poor fit to the data? We can plot the effects and see that's not the case. For example purposes, we'll create a reference grid that does not correspond to a fully balanced design and call `effects!` to insert the effects-related columns. In particular, we'll take odd ages for males and even ages for females.
refgrid = copy(growthdata)
# keep odd ages for males, even ages for females (in Julia, true == 1)
filter!(refgrid) do row
    return mod(row.age, 2) == (row.sex == "male")
end
effects!(refgrid, mod_uncentered)
Row | age | sex | weight | err |
---|---|---|---|---|
| | Int64 | String | Float64 | Float64 |
1 | 13 | male | 100.659 | 0.72515 |
2 | 15 | male | 116.101 | 0.474722 |
3 | 17 | male | 131.543 | 0.406528 |
4 | 19 | male | 146.984 | 0.587838 |
5 | 14 | female | 104.17 | 0.587838 |
6 | 16 | female | 110.989 | 0.406528 |
7 | 18 | female | 117.808 | 0.474722 |
8 | 20 | female | 124.627 | 0.72515 |
Note that the column corresponding to the response variable `weight` has been overwritten with the effects prediction and that only the standard error is provided: the `effects!` method does less work than the `effects` convenience method.
We can add the confidence interval bounds in and plot our predictions:
refgrid[!, :lower] = @. refgrid.weight - 1.96 * refgrid.err
refgrid[!, :upper] = @. refgrid.weight + 1.96 * refgrid.err
sort!(refgrid, [:age])
plt = data(refgrid) * mapping(:age, :weight; color=:sex) *
(visual(Lines) + mapping(; lower=:lower, upper=:upper) * visual(LinesFill))
draw(plt)
We can also add in the raw data to check the model fit:
draw(plt + data(growthdata) * mapping(:age, :weight; color=:sex) * visual(Scatter))
The model seems to be doing a good job. Indeed it is, and as pointed out in the StandardizedPredictors.jl docs, the problem is that we should center the `age` variable. While we're at it, we'll also set the contrasts for `sex` to be effects coded.
using StandardizedPredictors
contrasts = Dict(:age => Center(15), :sex => EffectsCoding())
mod_centered = lm(@formula(weight ~ 1 + sex * age), growthdata; contrasts=contrasts)
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
weight ~ 1 + sex + age(centered: 15) + sex & age(centered: 15)
Coefficients:
────────────────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────────────────────
(Intercept) 111.84 0.335679 333.18 <1e-24 111.109 112.572
sex: male 4.26052 0.335679 12.69 <1e-07 3.52914 4.99191
age(centered: 15) 5.56514 0.122573 45.40 <1e-14 5.29808 5.8322
sex: male & age(centered: 15) 2.15573 0.122573 17.59 <1e-09 1.88867 2.42279
────────────────────────────────────────────────────────────────────────────────────────────
All of the estimates have now changed because the parameterization is completely different, but the predictions and thus the effects remain unchanged:
refgrid_centered = copy(growthdata)
effects!(refgrid_centered, mod_centered)
refgrid_centered[!, :lower] = @. refgrid_centered.weight - 1.96 * refgrid_centered.err
refgrid_centered[!, :upper] = @. refgrid_centered.weight + 1.96 * refgrid_centered.err
sort!(refgrid_centered, [:age])
plt = data(refgrid_centered) * mapping(:age, :weight; color=:sex) *
(visual(Lines) + mapping(; lower=:lower, upper=:upper) * visual(LinesFill))
draw(plt)
Understanding lower-level terms in the presence of interactions can be particularly tricky, and effects plots are also useful for this. For example, if we want to examine the effect of `sex` at a typical `age`, then we need some way to reduce `age` to typical values. By default, `effects[!]` will use the mean of all model terms not specified in the effects formula as representative values. Looking at `sex`, we see that
design = Dict(:sex => unique(growthdata.sex))
effects(design, mod_uncentered)
Row | sex | weight | err | lower | upper |
---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
1 | male | 127.682 | 0.397181 | 127.285 | 128.079 |
2 | female | 112.694 | 0.397181 | 112.297 | 113.091 |
correspond to the model's estimate of weights for each sex at the average age in the dataset. Like all effects predictions, this is invariant to contrast coding:
effects(design, mod_centered)
Row | sex | weight | err | lower | upper |
---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
1 | male | 127.682 | 0.397181 | 127.285 | 128.079 |
2 | female | 112.694 | 0.397181 | 112.297 | 113.091 |
## Categorical Variables and Typical Values
We can also compute effects for the different ages at typical values of `sex` for this dataset. For categorical variables, the typical values are computed on the contrasts themselves. For balanced datasets and the default use of the mean as the typical value, this reflects the average across all levels of the categorical variable.
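As a sketch of what "computed on the contrasts" means (assuming male is coded +1 and female -1, consistent with the `sex: male` coefficient above): with the balanced design, the mean of the contrast column is zero, so the prediction at the typical value falls exactly halfway between the two levels.

```julia
using Statistics

# effects-coded sex column for the balanced growthdata design: 8 males, 8 females
sex_contrast = repeat([1.0, -1.0]; inner=8)
mean(sex_contrast)   # 0.0: "typical sex" averages the two levels exactly
```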
For our model using the centered parameterization and effects coding, the effect at the center (`age = 15`) at typical values of `sex` is simply the intercept: the mean weight across both sexes at age 15.
effects(Dict(:age => [15]), mod_centered)
Row | age | weight | err | lower | upper |
---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 | Float64 |
1 | 15 | 111.84 | 0.335679 | 111.505 | 112.176 |
If we had an imbalanced dataset, then averaging across all values of the contrasts would be weighted towards the levels (here: sex) with more observations. This makes sense: the category with more observations is in some sense more "typical" for that dataset.
# create imbalance
growthdata2 = growthdata[5:end, :]
mod_imbalance = lm(@formula(weight ~ 1 + sex * age), growthdata2; contrasts=contrasts)
effects(Dict(:age => [15]), mod_imbalance)
Row | age | weight | err | lower | upper |
---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 | Float64 |
1 | 15 | 110.728 | 0.712056 | 110.016 | 111.44 |
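To see the weighting concretely: dropping the first four rows leaves 4 males and 8 females, so the mean of the effects-coded contrast column (again assuming male = +1, female = -1) is no longer zero but shifted toward female:

```julia
using Statistics

# effects-coded sex column for the imbalanced data: 4 males, then 8 females
sex_contrast = [fill(1.0, 4); fill(-1.0, 8)]
mean(sex_contrast)   # -1/3: the "typical" value sits two-thirds of the way toward female
```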
Finally, we note that the user can specify alternative functions for computing the typical values. For example, specifying `mode` for the imbalanced dataset results in effects estimates for the most frequent level of `sex`, i.e. `female`:
using StatsBase
effects(Dict(:age => [15]), mod_imbalance; typical=mode)
Row | age | weight | err | lower | upper |
---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 | Float64 |
1 | 15 | 107.58 | 0.48853 | 107.091 | 108.068 |
Note that these should be scalar-valued functions, so we can use `minimum` or `maximum` but not `extrema`:
effects(Dict(:sex => ["female", "male"]), mod_imbalance; typical=maximum)
Row | sex | weight | err | lower | upper |
---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
1 | female | 124.627 | 0.746242 | 123.881 | 125.373 |
2 | male | 154.08 | 0.96724 | 153.113 | 155.047 |
using Test
@test_throws ArgumentError effects(Dict(:sex => ["female", "male"]), mod_imbalance; typical=extrema)
Test Passed
Thrown: ArgumentError