Effects.jl

Regression models are useful but they can be tricky to interpret. Variable centering and contrast coding can obscure the meaning of main effects. Interaction terms, especially higher order ones, only increase the difficulty of interpretation. Here, we introduce Effects.jl which translates the fitted model, including estimated uncertainty, back into data space. Using Effects.jl, it is possible to generate effects plots that enable rapid visualization and interpretation of regression models.

The examples below demonstrate the use of Effects.jl with GLM.jl, but they will work with any modeling package that is based on the StatsModels.jl formula. The second example is borrowed in no small part from StandardizedPredictors.jl.

The Effect of Contrast Coding

Let's consider a synthetic dataset of weights (in grams) for chicks feed different types of feed, with a single predictor feed (categorical, with three levels A, B, C). The simulated weights are based loosely on the R dataset chickwts.

using AlgebraOfGraphics, CairoMakie, DataFrames, Effects, GLM, StatsModels, StableRNGs
rng = StableRNG(42)
reps = 5
sd = 50
wtdat = DataFrame(; feed=repeat(["A", "B", "C"], inner=reps),
                  weight=[180 .+ sd * randn(rng, reps);
                          220 .+ sd * randn(rng, reps);
                          300 .+ sd * randn(rng, reps)])
15×2 DataFrame
Rowfeedweight
StringFloat64
1A146.487
2A202.356
3A248.682
4A245.477
5A186.304
6B254.197
7B169.04
8B180.324
9B308.736
10B284.867
11C217.807
12C339.722
13C234.517
14C298.147
15C353.616

If we fit a linear model to this data using default treatment/dummy coding, then the coefficient corresponding of feed C is significant.

mod_treat = lm(@formula(weight ~ 1 + feed), wtdat)
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

weight ~ 1 + feed

Coefficients:
───────────────────────────────────────────────────────────────────────
                Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  205.861      25.075   8.21    <1e-05  151.227      260.495
feed: B       33.5719     35.4614  0.95    0.3625  -43.6919     110.836
feed: C       82.9008     35.4614  2.34    0.0375    5.63696    160.165
───────────────────────────────────────────────────────────────────────

If on the other hand, we use effects (sum-to-zero) coding, then the coefficient for feed C is no longer significant:

mod_eff = lm(@formula(weight ~ 1 + feed), wtdat; contrasts=Dict(:feed => EffectsCoding()))
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

weight ~ 1 + feed

Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)  244.685       14.4771  16.90    <1e-09  213.143      276.228
feed: B       -5.25232     20.4737  -0.26    0.8019  -49.8606      39.356
feed: C       44.0766      20.4737   2.15    0.0524   -0.531742    88.6849
──────────────────────────────────────────────────────────────────────────

This is in some sense unsurprising: the different coding schemes correspond to different hypotheses. In treatment coding, the hypothesis for the coefficient feed: C is whether feed C differs from the reference level, feed A. In effects coding, the hypothesis is whether feed C differs from the mean across all levels. In more complicated models, the hypotheses being tested – especially for interaction terms – can become more complex and difficult to "read off" from the model summary.

In spite of these differences, these models make the same predictions about the data:

fitted(mod_treat) ≈ fitted(mod_eff)
true

At a deep level, these models are the actually same model, but with different parameterizations. In order to get a better view about what a model is saying about the data, abstracted away from the parameterization, we can see what the model looks like in data space. For that, we can use Effects.jl to generate the effects that the model is capturing. We do this by specifying a (subset of the) design and creating a reference grid, then computing the model's prediction and associated error at those values.

The effects function will compute the reference grid for a crossed design specified by a dictionary of values. As we only have one predictor in this dataset, the design is completely crossed.

design = Dict(:feed => unique(wtdat.feed))
eff_feed = effects(design, mod_eff)
3×5 DataFrame
Rowfeedweighterrlowerupper
StringFloat64Float64Float64Float64
1A205.86125.075180.786230.936
2B239.43325.075214.358264.508
3C288.76225.075263.687313.837

The effects table consists of four columns: the levels of the feed predictor specified in the design (feed), the prediction of the model at those levels (weight), the standard error of those predictions err, and the lower and upper edges of confidence interval of those predictions (lower, upper; computed using a normal approximation based on the standard error).

Because the effects are on the data (response) scale, they are invariant to parameterization:

effects(design, mod_treat)
3×5 DataFrame
Rowfeedweighterrlowerupper
StringFloat64Float64Float64Float64
1A205.86125.075180.786230.936
2B239.43325.075214.358264.508
3C288.76225.075263.687313.837

We can also compute effects on a subset of the levels in the data:

effects(Dict(:feed => ["A"]), mod_eff)
1×5 DataFrame
Rowfeedweighterrlowerupper
StringFloat64Float64Float64Float64
1A205.86125.075180.786230.936

Note that in this case, we computed the average effect of the level A, which is not directly encoded in any single model coefficient in this parameterization / contrast coding.

We can of course use the estimated effects to get a nice visual of the data:

plt = data(eff_feed) * mapping(:feed, :weight) * (visual(Scatter) + mapping(:err) * visual(Errorbars))
draw(plt)

Interaction Terms in Effects

Let's consider a (slightly) synthetic dataset of weights for adolescents of different ages, with predictors age (continuous, from 13 to 20) and sex, and weight in pounds. The weights are based loosely on the medians from the CDC growth charts, which show that the median male and female both start off around 100 pounds at age 13, but by age 20 the median male weighs around 155 pounds while the median female weighs around 125 pounds.

using AlgebraOfGraphics, CairoMakie, DataFrames, Effects, GLM, StatsModels, StableRNGs
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
                       sex=repeat(["male", "female"], inner=8),
                       weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
16×3 DataFrame
Rowagesexweight
Int64StringFloat64
113male99.3297
214male108.304
315male117.088
416male124.881
517male131.555
618male139.97
719male146.124
820male154.206
913female101.775
1014female104.869
1115female105.499
1216female111.509
1317female112.976
1418female117.82
1519female122.501
1620female124.603

In this dataset, there's obviously a main effect of sex: males are heavier than females for every age except 13 years. But if we run a basic linear regression, we see something rather different:

mod_uncentered = lm(@formula(weight ~ 1 + sex * age), growthdata)
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

weight ~ 1 + sex + age + sex & age

Coefficients:
──────────────────────────────────────────────────────────────────────────────
                     Coef.  Std. Error       t  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)       56.4387     2.88762    19.55    <1e-09   50.1471    62.7303
sex: male        -56.1509     4.08372   -13.75    <1e-07  -65.0485   -47.2532
age                3.40941    0.173344   19.67    <1e-09    3.03172    3.78709
sex: male & age    4.31146    0.245146   17.59    <1e-09    3.77734    4.84559
──────────────────────────────────────────────────────────────────────────────

Is this model just a poor fit to the data? We can plot the effects and see that's not the case. For example purposes, we'll create a reference grid that does not correspond to a fully balanced design and call effects! to insert the effects-related columns. In particular, we'll take odd ages for males and even ages for females.

refgrid = copy(growthdata)
filter!(refgrid) do row
    return mod(row.age, 2) == (row.sex == "male")
end
effects!(refgrid, mod_uncentered)
8×4 DataFrame
Rowagesexweighterr
Int64StringFloat64Float64
113male100.6590.72515
215male116.1010.474722
317male131.5430.406528
419male146.9840.587838
514female104.170.587838
616female110.9890.406528
718female117.8080.474722
820female124.6270.72515

Note that the column corresponding to the response variable weight has been overwritten with the effects prediction and that only the standard error is provided: the effects! method does less work than the effects convenience method.

We can add the confidence interval bounds in and plot our predictions:

refgrid[!, :lower] = @. refgrid.weight - 1.96 * refgrid.err
refgrid[!, :upper] = @. refgrid.weight + 1.96 * refgrid.err
sort!(refgrid, [:age])

plt = data(refgrid) * mapping(:age, :weight; lower=:lower, upper=:upper, color=:sex) *
      (visual(Lines) + visual(LinesFill))
draw(plt)

We can also add in the raw data to check the model fit:

draw(plt + data(growthdata) * mapping(:age, :weight; color=:sex) * visual(Scatter))

The model seems to be doing a good job. Indeed it is, and as pointed out in the StandardizedPredictors.jl docs, the problem is that we should center the age variable. While we're at it, we'll also set the contrasts for sex to be effects coded.

using StandardizedPredictors
contrasts = Dict(:age => Center(15), :sex => EffectsCoding())
mod_centered = lm(@formula(weight ~ 1 + sex * age), growthdata; contrasts=contrasts)
StatsModels.TableRegressionModel{GLM.LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

weight ~ 1 + sex + age(centered: 15) + sex & age(centered: 15)

Coefficients:
────────────────────────────────────────────────────────────────────────────────────────────
                                   Coef.  Std. Error       t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────────────────────
(Intercept)                    111.84       0.335679  333.18    <1e-24  111.109    112.572
sex: male                        4.26052    0.335679   12.69    <1e-07    3.52914    4.99191
age(centered: 15)                5.56514    0.122573   45.40    <1e-14    5.29808    5.8322
sex: male & age(centered: 15)    2.15573    0.122573   17.59    <1e-09    1.88867    2.42279
────────────────────────────────────────────────────────────────────────────────────────────

All of the estimates have now changed because the parameterization is completely different, but the predictions and thus the effects remain unchanged:

refgrid_centered = copy(growthdata)
effects!(refgrid_centered, mod_centered)
refgrid_centered[!, :lower] = @. refgrid_centered.weight - 1.96 * refgrid_centered.err
refgrid_centered[!, :upper] = @. refgrid_centered.weight + 1.96 * refgrid_centered.err
sort!(refgrid_centered, [:age])

plt = data(refgrid_centered) * mapping(:age, :weight; lower=:lower, upper=:upper, color=:sex) *
      (visual(Lines) + visual(LinesFill))
draw(plt)

Understanding lower-level terms in the presence of interactions can be particularly tricky, and effect plots are also useful for this. For example, if we want to examine the effect of sex at a typical age, then we would need some way to reduce age to typical values. By default, effects[!] will take use the mean of all model terms not specified in the effects formula as representative values. Looking at sex, we see that

design = Dict(:sex => unique(growthdata.sex))
effects(design, mod_uncentered)
2×5 DataFrame
Rowsexweighterrlowerupper
StringFloat64Float64Float64Float64
1male127.6820.397181127.285128.079
2female112.6940.397181112.297113.091

correspond to the model's estimate of weights for each sex at the average age in the dataset. Like all effects predictions, this is invariant to contrast coding:

effects(design, mod_centered)
2×5 DataFrame
Rowsexweighterrlowerupper
StringFloat64Float64Float64Float64
1male127.6820.397181127.285128.079
2female112.6940.397181112.297113.091

Categorical Variables and Typical Values

We can also compute effects for the different ages at typical values of sex for this dataset. For categorical variables, the typical values are computed on the contrasts themselves. For balanced datasets and the default use of the mean as typical values, this reflects the average across all levels of the categorical variable.

For our model using the centered parameterization and effects coding, the effect at the center (age = 15) at typical values of sex is simply the intercept: the mean weight across both sexes at age 15.

effects(Dict(:age => [15]), mod_centered)
1×5 DataFrame
Rowageweighterrlowerupper
Int64Float64Float64Float64Float64
115111.840.335679111.505112.176

If we had an imbalanced dataset, then averaging across all values of the contrasts would be weighted towards the levels (here: sex) with more observations. This makes sense: the category with more observations is in some sense more "typical" for that dataset.

# create imbalance
growthdata2 = growthdata[5:end, :]
mod_imbalance = lm(@formula(weight ~ 1 + sex * age), growthdata2; contrasts=contrasts)
effects(Dict(:age => [15]), mod_imbalance)
1×5 DataFrame
Rowageweighterrlowerupper
Int64Float64Float64Float64Float64
115110.7280.712056110.016111.44

Finally, we note that the user can specify alternative functions for computing the typical values. For example, specifying the mode for the imbalanced dataset results in effects estimates for the most frequent level of sex, i.e. female:

using StatsBase
effects(Dict(:age => [15]), mod_imbalance; typical=mode)
1×5 DataFrame
Rowageweighterrlowerupper
Int64Float64Float64Float64Float64
115107.580.48853107.091108.068

Note that these should be scalar valued functions, so we can use minimum or maximum but not extrema:

effects(Dict(:sex => ["female", "male"]), mod_imbalance; typical=maximum)
2×5 DataFrame
Rowsexweighterrlowerupper
StringFloat64Float64Float64Float64
1female124.6270.746242123.881125.373
2male154.080.96724153.113155.047
using Test
@test_throws ArgumentError effects(Dict(:sex => ["female", "male"]), mod_imbalance; typical=extrema)
Test Passed
      Thrown: ArgumentError