API
Effects.AutoInvLink
Effects.effects
Effects.effects!
Effects.emmeans
Effects.empairs
Effects.expand_grid
Effects.pooled_sem
Effects.AutoInvLink
— TypeAutoInvLink
Singleton type indicating that the inverse link should be automatically determined from the model type.
Automatic inverse link determination is implemented using package extensions, which are available beginning in Julia 1.9.
An error is thrown if the inverse link cannot be determined. This will always occur with Julia versions prior to 1.9, and will otherwise occur when no extension has been loaded that specifies the link function for the model type.
Currently, this is only implemented for GLM.jl and MixedModels.jl
Effects.effects!
— Methodeffects!(reference_grid::DataFrame, model::RegressionModel;
eff_col=nothing, err_col=:err, typical=mean, invlink=identity,
vcov=StatsBase.vcov)
Compute the effects
as specified in formula
.
Effects are the model predictions made using values given via the reference grid. For terms present in the model, but not in the reference grid, then the typical value of those predictors is used. (In other words, effects are conditional on the typical value.) The function for computing typical values is specified via typical
. Note that this is also applied to categorical contrasts, thus yielding an average of the contrast, weighted by the balance of levels in the data set used to fit the model.
typical
can either be a single, scalar-valued function (e.g. mean
) or a dictionary matching term symbols to scalar-valued functions. The use of a dictionary allows specifying different typical
functions for different input variables. In this case, typical
functions must be provided for all term variables except the intercept. If there is a single term that should be "typified" differently than others, then the use of DataStructures.DefaultDict
may be useful to create a default typical
value with only the exception explicitly specified. For example:
using DataStructures
typical = DefaultDict(() -> mean) # default to x -> mean(x)
typical[:sex] = v -> 0.0 # typical value for :sex
By default, the column corresponding to the response variable in the formula is overwritten with the effects, but an alternative column for the effects can be specified by eff_col
. Note that eff_col
is determined first by trying StatsBase.responsename
and then falling back to the string
representation of the model's formula's left-hand side. For models with a transformed response, whether in the original formula specification or via hints/contrasts, the name will be the display name of the resulting term and not the original variable. This convention also has the advantage of highlighting another aspect of the underlying method: effects are computed on the scale of the transformed response. If this column does not exist, it is created. Pointwise standard errors are written into the column specified by err_col
.
By default (invlink=identity
), effects are computed on the scale of the transformed response. For models with an explicit transformation, that transformation is the scale of the effects. For models with a link function, the scale of the effects is the link scale, i.e. after application of the link function. For example, effects for logitistic regression models are on the logit and not the probability scale.
If the inverse link is specified via invlink
, then effects and errors are computed on the original, untransformed scale via the delta method using automatic differentiation. This means that the invlink
function must be differentiable and should not involve inplace operations.
On Julia versions 1.9 or later, the special singleton value AutoInvLink()
can be used to specify that the appropriate inverse link should be determined automatically. In that case, a direct or analytic computation of the derivative is used when possible.
Effects are computed using the model's variance-covariance matrix, which is computed by default using StatsBas.vcov
. Alternative methods such as the sandwich estimator or robust estimators can be used by specifying vcov
, which should be a function of a single argument (the model) returning the estimated variance-covariance matrix. Vcov.jl and CovarianceMatrices.jl provide several possibilities as functions of multiple arguments and so it is necessary to curry when using these functions. For example
using Vcov
myvcov(x) = Base.Fix2(vcov, Vcov.robust())
The reference grid must contain columns for all predictors in the formula. (Interactions are computed automatically.) Contrasts must match the contrasts used to fit the model; using other contrasts will lead to undefined behavior.
Interaction terms are computed in the same way for any regression model: as the product of the lower-order terms. Typical values of lower terms thus propagate up into the interaction term in the same way that any value would.
The use of typical values for excluded effects differs from other approaches such as "partial effects" used in R packages like remef
. The R package effects
is similar in approach, but due to differing languages and licenses, no source code was inspected and there is no attempt at API compatibility or even similarity.
The approach for computing effect is based on the effects plots described here:
Fox, John (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software. Vol. 8, No. 15
See also AutoInvLink
.
Effects.effects
— Methodeffects(design::AbstractDict, model::RegressionModel;
eff_col=nothing, err_col=:err, typical=mean,
lower_col=:lower, upper_col=:upper, invlink=identity,
vcov=StatsBase.vcov, level=nothing)
Compute the effects
as specified by the design
.
This is a convenience wrapper for effects!
. Instead of specifying a reference grid, a dictionary containing the levels/values of each predictor is specified. This is then expanded into a reference grid representing a fully-crossed design. Additionally, two extra columns are created representing the lower and upper edges of the confidence interval. The default level=nothing
corresponds to [resp-err, resp+err], while level=0.95
corresponds to the 95% confidence interval.
See also AutoInvLink
.
Effects.emmeans
— Methodemmeans(model::RegressionModel; eff_col=nothing, err_col=:err,
invlink=identity, levels=Dict(), dof=nothing)
Compute estimated marginal means, a.k.a. least-square (LS) means for a model.
By default, emmeans are computed for each level of each categorical variable along with the means of continuous variables. For centered terms, the center is used instead of the mean. Alternative levels can be specified with levels
.
dof
is used to compute the associated degrees of freedom for a particular margin. For regression models, the appropriate degrees of freedom are the same degrees of freedom as used for the Wald tests on the coefficients. These are typically the residual degrees of freedom (e.g., via dof_residual
). If dof
is a function, then it is called on model
and filled elementwise into the dof
column. Alternatively, dof
can be specified as a single number or vector of appropriate length. For example, in a mixed effect model with a large number of observations, dof=Inf
may be appropriate.
invlink
, eff_col
and err_col
work exactly as in effects!
.
Estimated marginal means are closely related to effects and are also known as least-square means. The functionality here is a convenience wrapper for effects
and maps onto the concept of least-square means as presented in e.g. the SAS documentation. There are several extensions available to estimated marginal means, related to when the marginalization occurs and how cells are weighted, but these are not currently supported. The documentation for the R package emmeans explains the background in more depth.
Effects.empairs
— Methodempairs(model::RegressionModel; eff_col=nothing, err_col=:err,
invlink=identity, levels=Dict(), dof=nothing, padjust=identity)
empairs(df::AbstractDataFrame; eff_col, err_col=:err, padjust=identity)
Compute pairwise differences of estimated marginal means.
The method for AbstractDataFrame
acts on the results of emmeans
, while the method for RegressionModel
is a convenience wrapper that calls emmeans
internally.
The keyword arguments are generally the same as emmeans
.
By default, pairs are computed for all combinations of categorical variables and the means/centers of continuous variables. The contrast for a pair "a" vs "b" is represented in the column for the contrasted variable with a > b
. For variables that are the same within a pair (e.g., a continuous variable), the sole value is displayed as is.
If dof
is not nothing
, then p-values are also computed using a t-distribution with the resulting degrees of freedom. The results for dof=Inf
correspond to using a z-distribution, but the column names in the returned dataframe remain t
and Pr(>|t|)
.
If padjust
is provided, then it is used to compute adjust the p-values for multiple comparisons. MultipleTesting.jl
provides a number of useful possibilities for this.
padjust
is silently ignored if dof
is not provided.
This feature is experimental and the precise column names and presentation of contrasts/differences may change without being considered breaking.
The use of invlink
is subject to a number of interpretation subtleties. The EM means are computed on the scale of the linear predictor, then transformed to the scale of invlink
. The associated errors on the transformed scale are computed via the difference method. These estimates and errors are then used to compute the pairs. Test statistics are computed on the scale of these pairs. In general, these will not be the same as test statistics computed on the original scale. These subleties are discussed in the documentation for the R package emmeans
.
Effects.expand_grid
— Methodexpand_grid(design)
Compute a fully crossed reference grid.
design
can be either a NamedTuple
or a Dict
, where the keys represent the variables, i.e., column names in the resulting grid and the values are vectors of possible values that each variable can take on.
julia> expand_grid(Dict(:x => ["a", "b"], :y => 1:3))
6×2 DataFrame
Row │ y x
│ Int64 String
─────┼───────────────
1 │ 1 a
2 │ 2 a
3 │ 3 a
4 │ 1 b
5 │ 2 b
6 │ 3 b
Effects.pooled_sem
— Methodpooled_sem(sems...)
Compute the pooled standard error of the mean.
This corresponds to the square root of the sum of the the squared SEMs, which in turn corresponds to the weighted root-mean-square of the underlying standard deviations.