API Documentation

The AbstractClassifier Interface

Lighthouse.AbstractClassifier — Type

    AbstractClassifier

An abstract type whose subtypes C<:AbstractClassifier must implement:

- Lighthouse.classes
- Lighthouse.train!
- Lighthouse.loss_and_prediction

Subtypes may additionally overload default implementations for:

- Lighthouse.onehot
- Lighthouse.onecold
- Lighthouse.is_early_stopping_exception

The AbstractClassifier interface is built upon the expectation that any multiclass label will be represented in one of two standardized forms:

- "soft label": a probability distribution vector whose ith element is the probability assigned to the ith class in classes(classifier).
- "hard label": the integer index of the corresponding class in classes(classifier).

Internally, Lighthouse converts hard labels to soft labels via onehot and soft labels to hard labels via onecold.

See also: learn!
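To make the contract above concrete, here is a minimal sketch of an AbstractClassifier subtype. ConstantClassifier is a hypothetical toy (not part of Lighthouse) that predicts one fixed soft label for every sample and performs no actual optimization:

    using Lighthouse

    struct ConstantClassifier <: Lighthouse.AbstractClassifier
        soft_label::Vector{Float32}  # one probability per class; should sum to 1
    end

    # Required: the ordered collection of class values.
    Lighthouse.classes(::ConstantClassifier) = ["negative", "positive"]

    # Required: train for a single epoch, logging the loss for each batch.
    function Lighthouse.train!(c::ConstantClassifier, batches, logger)
        for _ in batches
            batch_loss = 0.0  # a real classifier would update parameters here
            Lighthouse.log_value!(logger, "train/loss_per_batch", batch_loss)
        end
        return c
    end

    # Required: return (loss, soft_label_batch) for an input batch, where
    # soft_label_batch has one row per class and one column per sample.
    function Lighthouse.loss_and_prediction(c::ConstantClassifier,
                                            input_batch::AbstractMatrix, args...)
        n_samples = size(input_batch, 2)  # assumes one sample per column
        return 0.0, repeat(c.soft_label, 1, n_samples)
    end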
Lighthouse.classes — Function

    Lighthouse.classes(classifier::AbstractClassifier)

Return a Vector or Tuple of class values for classifier.

This method must be implemented for each AbstractClassifier subtype.
Lighthouse.train! — Function

    Lighthouse.train!(classifier::AbstractClassifier, batches, logger)

Train classifier on the iterable batches for a single epoch. This function is called once per epoch by learn!.

This method must be implemented for each AbstractClassifier subtype. Implementers should ensure that the training loss is properly logged to logger by calling Lighthouse.log_value!(logger, "train/loss_per_batch", batch_loss) for each batch in batches.
Lighthouse.loss_and_prediction — Function

    Lighthouse.loss_and_prediction(classifier::AbstractClassifier,
                                   input_batch::AbstractArray,
                                   args...)

Return (loss, soft_label_batch) given input_batch and any additional args provided by the caller; loss is a scalar, while soft_label_batch is a matrix with length(classes(classifier)) rows and one column per sample in input_batch.

Specifically, the ith column of soft_label_batch is classifier's soft label prediction for the ith sample in input_batch.

This method must be implemented for each AbstractClassifier subtype.
Lighthouse.onehot — Function

    Lighthouse.onehot(classifier::AbstractClassifier, hard_label)

Return the one-hot encoded probability distribution vector corresponding to the given hard_label. hard_label must be an integer index in the range 1:length(classes(classifier)).
Lighthouse.onecold — Function

    Lighthouse.onecold(classifier::AbstractClassifier, soft_label)

Return the hard label (an integer index in the range 1:length(classes(classifier))) corresponding to the given soft_label (a one-hot encoded probability distribution vector).

By default, this function returns argmax(soft_label).
Lighthouse.is_early_stopping_exception — Function

    Lighthouse.is_early_stopping_exception(classifier::AbstractClassifier, exception)

Return true if exception should be considered an "early-stopping exception" (e.g. Flux.Optimise.StopException), rather than rethrown from learn!.

This function returns false by default, but can be overloaded by subtypes of AbstractClassifier that employ exceptions as early-stopping mechanisms.
The learn! Interface

Lighthouse.learn! — Function

    learn!(model::AbstractClassifier, logger,
           get_train_batches, get_test_batches, votes,
           elected=majority.(eachrow(votes), (1:length(classes(model)),));
           epoch_limit=100, post_epoch_callback=(_ -> nothing),
           optimal_threshold_class::Union{Nothing,Integer}=nothing,
           test_set_logger_prefix="test_set")

Return model after optimizing its parameters across multiple epochs of training and testing, logging Lighthouse's standardized suite of classifier performance metrics to logger throughout the optimization process.

The following phases are executed at each epoch (note: in the below lists of logged values, $resource takes the values of the field names of Lighthouse.ResourceInfo):

1. Train model by calling train!(model, get_train_batches(), logger). The following quantities are logged to logger during this phase:
   - train/loss_per_batch
   - any additional quantities logged by the relevant model/framework-specific implementation of train!
2. Compute model's predictions on the test set provided by get_test_batches() (see below for details). The following quantities are logged to logger during this phase:
   - <test_set_logger_prefix>_prediction/loss_per_batch
   - <test_set_logger_prefix>_prediction/mean_loss_per_epoch
   - <test_set_logger_prefix>_prediction/$resource_per_batch
3. Compute a battery of metrics to evaluate model's performance on the test set based on the test set prediction phase. The following quantities are logged to logger during this phase:
   - <test_set_logger_prefix>_evaluation/metrics_per_epoch
   - <test_set_logger_prefix>_evaluation/$resource_per_epoch
4. Call post_epoch_callback(current_epoch).

Where...

- get_train_batches is a zero-argument function that returns an iterable of training set batches. Internally, learn! uses this function when it calls train!(model, get_train_batches(), logger).
- get_test_batches is a zero-argument function that returns an iterable of test set batches used during the current epoch's test phase. Each element of the iterable takes the form (batch, votes_locations). Internally, batch is passed to loss_and_prediction as loss_and_prediction(model, batch...), and votes_locations[i] is expected to yield the row index of votes that corresponds to the ith sample in batch.
- votes is a matrix of hard labels whose columns correspond to voters and whose rows correspond to the samples in the test set that have been voted on. If votes[sample, voter] is not a valid hard label for model, then voter will simply be considered to have not assigned a hard label to sample.
- elected is a vector of hard labels where the ith element is the hard label elected as "ground truth" out of votes[i, :].
- optimal_threshold_class is the class index (1 or 2) for which to calculate an optimal threshold for converting predicted_soft_labels to predicted_hard_labels. This is only a valid parameter when length(classes) == 2. If optimal_threshold_class is present, test set evaluation will be based on predicted hard labels calculated with this threshold; if optimal_threshold_class is nothing, predicted hard labels will be calculated via onecold(classifier, soft_label).
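Putting the pieces together, a hypothetical invocation might look like the following, reusing the toy ConstantClassifier sketched earlier. The batch shapes, voter count, and logger binding are all illustrative assumptions (logger would typically be a LearnLogger, described below):

    get_train_batches() = (rand(Float32, 4, 8) for _ in 1:10)  # ten 4×8 training batches
    # Three test batches of 8 samples each; votes_locations (8i - 7):(8i) maps
    # the ith batch's samples onto rows 1:24 of the votes matrix.
    get_test_batches() = ((rand(Float32, 4, 8), (8i - 7):(8i)) for i in 1:3)
    votes = rand(1:2, 24, 5)  # 24 voted-on test samples, 5 voters
    model = ConstantClassifier(Float32[0.5, 0.5])
    learn!(model, logger, get_train_batches, get_test_batches, votes;
           epoch_limit=5)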
Lighthouse.evaluate! — Function

    evaluate!(predicted_hard_labels::AbstractVector,
              predicted_soft_labels::AbstractMatrix,
              elected_hard_labels::AbstractVector,
              classes, logger;
              logger_prefix, logger_suffix,
              votes::Union{Nothing,AbstractMatrix}=nothing,
              thresholds=0.0:0.01:1.0,
              optimal_threshold_class::Union{Nothing,Integer}=nothing)

Return nothing after computing and logging a battery of classifier performance metrics that each compare predicted_soft_labels and/or predicted_hard_labels against elected_hard_labels.

The following quantities are logged to logger:

- <logger_prefix>/metrics<logger_suffix>
- <logger_prefix>/$resource<logger_suffix>

Where...

- predicted_soft_labels is a matrix of soft labels whose columns correspond to classes and whose rows correspond to samples in the evaluation set.
- predicted_hard_labels is a vector of hard labels where the ith element is the hard label predicted by the model for sample i in the evaluation set.
- elected_hard_labels is a vector of hard labels where the ith element is the hard label elected as "ground truth" for sample i in the evaluation set.
- thresholds are the range of thresholds used by metrics (e.g. PR curves) that are calculated on the predicted_soft_labels for a range of thresholds.
- votes is a matrix of hard labels whose columns correspond to voters and whose rows correspond to the samples in the test set that have been voted on. If votes[sample, voter] is not a valid hard label for model, then voter will simply be considered to have not assigned a hard label to sample.
- optimal_threshold_class is the class index (1 or 2) for which to calculate an optimal threshold for converting the predicted_soft_labels to predicted_hard_labels. If present, the input predicted_hard_labels will be ignored and new predicted_hard_labels will be recalculated from the new threshold. This is only a valid parameter when length(classes) == 2.
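For illustration, a minimal direct call on made-up three-sample binary data; the class names and logger binding are assumptions (logger must implement the logging interface described below):

    predicted_soft_labels = [0.9 0.1; 0.2 0.8; 0.6 0.4]  # rows: samples, columns: classes
    predicted_hard_labels = [1, 2, 1]
    elected_hard_labels = [1, 2, 2]
    evaluate!(predicted_hard_labels, predicted_soft_labels, elected_hard_labels,
              ["class_A", "class_B"], logger;
              logger_prefix="test_set_evaluation", logger_suffix="_per_epoch")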
Lighthouse.predict! — Function

    predict!(model::AbstractClassifier,
             predicted_soft_labels::AbstractMatrix,
             batches, logger::LearnLogger;
             logger_prefix::AbstractString)

Return the mean loss of all batches after using model to predict their soft labels and storing those results in predicted_soft_labels.

The following quantities are logged to logger:

- <logger_prefix>/loss_per_batch
- <logger_prefix>/mean_loss_per_epoch
- <logger_prefix>/$resource_per_batch

Where...

- model is a model that outputs soft labels when called on a batch of batches, i.e. model(batch).
- predicted_soft_labels is a matrix whose columns correspond to classes and whose rows correspond to samples in batches, and which is filled in with soft-label predictions.
- batches is an iterable of batches, where each element of the iterable takes the form (batch, votes_locations). Internally, batch is passed to loss_and_prediction as loss_and_prediction(model, batch...).
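For instance, a hypothetical prediction pass continuing the learn! sketch above (model, get_test_batches, and logger are assumed from that sketch; 24 samples, 2 classes):

    # Pre-allocate one row per test sample and one column per class;
    # predict! fills this in while iterating over the batches.
    predicted_soft_labels = zeros(Float32, 24, 2)
    mean_loss = predict!(model, predicted_soft_labels, get_test_batches(), logger;
                         logger_prefix="test_set_prediction")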
The logging interface

The following "primitives" must be defined for a logger to be used with Lighthouse:

Lighthouse.log_value! — Function

    log_value!(logger, field::AbstractString, value)

Log the given value to field.

Lighthouse.log_line_series! — Function

    log_line_series!(logger, field::AbstractString, curves, labels=1:length(curves))

Logs a series plot to logger under field, where...

- curves is an iterable of the form Tuple{Vector{Real},Vector{Real}}, where each tuple contains (x-values, y-values), as in the Lighthouse.EvaluationV1 field per_class_roc_curves
- labels is the class label for each curve, which defaults to the numeric index of each curve.

Lighthouse.log_plot! — Function

    log_plot!(logger, field::AbstractString, plot, plot_data)

Log a plot to logger under field.

- plot: the plot itself
- plot_data: an unstructured dictionary of values used in creating plot.

See also log_line_series!.

Lighthouse.step_logger! — Function

    step_logger!(logger)

Increments the logger's step, if any. Defaults to doing nothing.

These must be defined in addition to Base.flush(logger) (which can be a no-op by defining Base.flush(::MyLoggingType) = nothing).
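As a sketch of what satisfying these primitives looks like, here is a hypothetical in-memory logger; DictLogger is illustrative, not part of Lighthouse:

    # Stores every logged value in a Dict keyed by field name.
    struct DictLogger
        logged::Dict{String,Vector{Any}}
    end
    DictLogger() = DictLogger(Dict{String,Vector{Any}}())

    Lighthouse.log_value!(logger::DictLogger, field::AbstractString, value) =
        push!(get!(Vector{Any}, logger.logged, field), value)

    Lighthouse.log_line_series!(logger::DictLogger, field::AbstractString,
                                curves, labels=1:length(curves)) =
        Lighthouse.log_value!(logger, field, (curves, labels))

    Lighthouse.log_plot!(logger::DictLogger, field::AbstractString, plot, plot_data) =
        Lighthouse.log_value!(logger, field, plot_data)

    Lighthouse.step_logger!(::DictLogger) = nothing  # this logger has no step counter
    Base.flush(::DictLogger) = nothing               # nothing transient to persist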
These primitives can be used in implementations of train!, evaluate!, and predict!, as well as in the following composite logging functions, which by default call the above primitives. Loggers may provide custom implementations of these.
Lighthouse.log_event! — Function

    log_event!(logger, value::AbstractString)

Logs a string event given by value to logger. Defaults to calling log_value! with a field named event.

Lighthouse.log_evaluation_row! — Function

    log_evaluation_row!(logger, field::AbstractString, metrics)

From fields in EvaluationV1, generate and plot the composite evaluation_metrics_plot as well as spearman_correlation (if present).

Lighthouse.log_values! — Function

    log_values!(logger, values)

Logs an iterable of (field, value) pairs to logger. Falls back to calling log_value! in a loop. Loggers may specialize this method for improved performance.

Lighthouse.log_array! — Function

    log_array!(logger::Any, field::AbstractString, value)

Log an array value to field.

Defaults to log_value!(logger, mean(value)).

Lighthouse.log_arrays! — Function

    log_arrays!(logger, values)

Logs an iterable of (field, array) pairs to logger. Falls back to calling log_array! in a loop. Loggers may specialize this method for improved performance.
LearnLoggers

LearnLogger is a TensorBoard-backed logger that complies with the above logging interface. It also supports additional callback functionality via upon:

Lighthouse.LearnLogger — Type

    LearnLogger

A struct that wraps a TensorBoardLogger.TBLogger in order to enforce the following:

- all values logged to TensorBoard should be accessible to the post_epoch_callback argument to learn!
- all values that are cached during learn! should be logged to TensorBoard

To access values logged to a LearnLogger instance, inspect the instance's logged field.
Lighthouse.upon — Function

    upon(logger::LearnLogger, field::AbstractString; condition, initial)
    upon(logged::Dict{String,Any}, field::AbstractString; condition, initial)

Return a closure that can be called to check the most recent state of logger.logged[field] and trigger a caller-provided function when condition(recent_state, previously_chosen_state) is true.

For example:

    upon_loss_decrease = upon(logger, "test_set_prediction/mean_loss_per_epoch";
                              condition=<, initial=Inf)

    save_upon_loss_decrease = _ -> begin
        upon_loss_decrease(new_lowest_loss -> save_my_model(model, new_lowest_loss),
                           consecutive_failures -> consecutive_failures > 10 && Flux.stop())
    end

    learn!(model, logger, get_train_batches, get_test_batches, votes;
           post_epoch_callback=save_upon_loss_decrease)

Specifically, the form of the returned closure is f(on_true, on_false) where on_true(state) is called if condition(state, previously_chosen_state) is true. Otherwise, on_false(consecutive_falses) is called where consecutive_falses is the number of condition calls that have returned false since the last condition call returned true.

Note that the returned closure is a no-op if logger.logged[field] has not been updated since the most recent call.
Lighthouse.forward_logs — Function

    forwarding_task = forward_logs(channel, logger::LearnLogger)

Forwards logs with values supported by TensorBoardLogger to logger::LearnLogger:

- string events of type AbstractString
- scalars of type Union{Real,Complex}
- plots that TensorBoardLogger can convert to raster images

Returns the forwarding_task::Task that does the forwarding. To cleanly stop forwarding, close(channel) and wait(forwarding_task).

channel is a Channel or RemoteChannel of Pair{String,Any}; field names starting with "plot" are forwarded to TensorBoardLogger.log_image.
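A hypothetical forwarding setup (the field name and value are made up, and a LearnLogger binding logger is assumed):

    channel = Channel{Pair{String,Any}}(Inf)
    forwarding_task = Lighthouse.forward_logs(channel, logger)
    put!(channel, "train/loss_per_batch" => 0.37)  # re-logged to logger
    close(channel)          # stop accepting new logs...
    wait(forwarding_task)   # ...and let the forwarding task finish cleanly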
Base.flush — Method

    Base.flush(logger::LearnLogger)

Persist possibly transient logger state.
Performance Metrics

Lighthouse.confusion_matrix — Function

    confusion_matrix(class_count::Integer, hard_label_pairs = ())

Given the iterable hard_label_pairs whose kth element takes the form (first_classifiers_label_for_sample_k, second_classifiers_label_for_sample_k), return the corresponding confusion matrix where matrix[i, j] is the number of samples that the first classifier labeled i and the second classifier labeled j.

Note that the returned confusion matrix can be updated in-place with new labels via Lighthouse.increment_at!(matrix, more_hard_label_pairs).
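A small worked example with made-up labels:

    # Two classifiers each label three samples among 2 classes.
    hard_label_pairs = [(1, 1), (2, 1), (2, 2)]
    confusion = Lighthouse.confusion_matrix(2, hard_label_pairs)
    # confusion[2, 1] == 1: one sample that the first classifier labeled 2
    # and the second classifier labeled 1.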
Lighthouse.accuracy — Function

    accuracy(confusion::AbstractMatrix)

Returns the percentage of matching classifications out of total classifications, or NaN if all(iszero, confusion).

Note that accuracy(confusion) is equivalent to overall percent agreement between confusion's row classifier and column classifier.
Lighthouse.binary_statistics — Function

    binary_statistics(confusion::AbstractMatrix, class_index)

Treating the rows of confusion as corresponding to predicted classifications and the columns as corresponding to true classifications, return a NamedTuple with the following fields for the given class_index:

- predicted_positives
- predicted_negatives
- actual_positives
- actual_negatives
- true_positives
- true_negatives
- false_positives
- false_negatives
- true_positive_rate
- true_negative_rate
- false_positive_rate
- false_negative_rate
- precision
- f1
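Continuing the confusion-matrix example above, treating class 2 as the positive class:

    stats = Lighthouse.binary_statistics(confusion, 2)
    stats.true_positives   # predicted class 2 (row) and truly class 2 (column)
    stats.false_positives  # predicted class 2 but truly another class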
Lighthouse.cohens_kappa — Function

    cohens_kappa(class_count, hard_label_pairs)

Return (κ, p₀) where κ is Cohen's kappa and p₀ is the percent agreement given class_count and hard_label_pairs (these arguments take the same form as their equivalents in confusion_matrix).
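For example, with made-up ratings:

    # Two raters label four samples among 3 classes, agreeing on 3 of 4.
    κ, p₀ = Lighthouse.cohens_kappa(3, [(1, 1), (2, 2), (3, 1), (2, 2)])
    p₀  # 0.75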
Lighthouse.calibration_curve — Function

    calibration_curve(probabilities, bitmask; bin_count=10)

Given probabilities (the predicted probabilities of the positive class) and bitmask (a vector of Bools indicating whether or not each element actually belonged to the positive class), return (bins, fractions, totals, mean_squared_error) where:

- bins: a vector with bin_count Pairs specifying the calibration curve's probability bins
- fractions: a vector where fractions[i] is the fraction of values falling within bin[i] whose corresponding bitmask entry is true (i.e., the number of actual positives in bin[i] over the total number of values in bin[i]), or NaN if the total number of values in bin[i] is zero.
- totals: a vector where totals[i] is the total number of values within bin[i].
- mean_squared_error: the mean squared error of fractions vs. an ideal calibration curve.

This method is similar to the corresponding scikit-learn method:

https://scikit-learn.org/stable/modules/generated/sklearn.calibration.calibration_curve.html
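A hypothetical check on six made-up predictions:

    probabilities = [0.1, 0.35, 0.4, 0.65, 0.8, 0.9]  # predicted P(positive)
    bitmask = [false, true, false, true, true, true]   # actual positive-class membership
    bins, fractions, totals, mse = Lighthouse.calibration_curve(probabilities, bitmask;
                                                                bin_count=5)
    # totals sums to 6; fractions[i] is NaN for any empty bin.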
Lighthouse.EvaluationV1 — Type

    @version EvaluationV1 begin
        class_labels::Union{Missing,Vector{String}}
        confusion_matrix::Union{Missing,Array{Int64,1},Array{Int64,2}} = vec_to_mat(confusion_matrix)
        discrimination_calibration_curve::Union{Missing,GenericCurve}
        discrimination_calibration_score::Union{Missing,Float64}
        multiclass_IRA_kappas::Union{Missing,Float64}
        multiclass_kappa::Union{Missing,Float64}
        optimal_threshold::Union{Missing,Float64}
        optimal_threshold_class::Union{Missing,Int64}
        per_class_IRA_kappas::Union{Missing,Vector{Float64}}
        per_class_kappas::Union{Missing,Vector{Float64}}
        stratified_kappas::Union{Missing,
                                 Vector{@NamedTuple{per_class::Vector{Float64},
                                                    multiclass::Float64,
                                                    n::Int64}}}
        per_class_pr_curves::Union{Missing,Vector{GenericCurve}}
        per_class_reliability_calibration_curves::Union{Missing,Vector{GenericCurve}}
        per_class_reliability_calibration_scores::Union{Missing,Vector{Float64}}
        per_class_roc_aucs::Union{Missing,Vector{Float64}}
        per_class_roc_curves::Union{Missing,Vector{GenericCurve}}
        per_expert_discrimination_calibration_curves::Union{Missing,Vector{GenericCurve}}
        per_expert_discrimination_calibration_scores::Union{Missing,Vector{Float64}}
        spearman_correlation::Union{Missing,
                                    @NamedTuple{ρ::Float64, # Note: is rho not 'p' 😢
                                                n::Int64,
                                                ci_lower::Float64,
                                                ci_upper::Float64}}
        thresholds::Union{Missing,Vector{Float64}}
    end

A Legolas record representing the output metrics computed by evaluation_metrics_record and evaluation_metrics.

See Legolas.jl for details regarding Legolas record types.
Lighthouse.ObservationV1 — Type

    @version ObservationV1 begin
        predicted_hard_label::Int64
        predicted_soft_labels::Vector{Float32}
        elected_hard_label::Int64
        votes::Union{Missing,Vector{Int64}}
    end

A Legolas record representing the per-observation input values required to compute evaluation_metrics_record.
Lighthouse.evaluation_metrics — Function

    evaluation_metrics(args...; optimal_threshold_class=nothing, kwargs...)

Return the result of evaluation_metrics_record after converting the output EvaluationV1 into a Dict. For argument details, see evaluation_metrics_record.
Lighthouse._evaluation_dict — Function

    _evaluation_row_dict(row::EvaluationV1) -> Dict{String,Any}

Convert an EvaluationV1 into Dict{String,Any} results, as are output by evaluation_metrics (a format that predates the use of EvaluationV1 in Lighthouse <v0.14.0).
Lighthouse.evaluation_metrics_record — Function

    evaluation_metrics_record(observation_table, classes, thresholds=0.0:0.01:1.0;
                              strata::Union{Nothing,AbstractVector{Set{T}} where T}=nothing,
                              optimal_threshold_class::Union{Missing,Nothing,Integer}=missing)
    evaluation_metrics_record(predicted_hard_labels::AbstractVector,
                              predicted_soft_labels::AbstractMatrix,
                              elected_hard_labels::AbstractVector,
                              classes,
                              thresholds=0.0:0.01:1.0;
                              votes::Union{Nothing,Missing,AbstractMatrix}=nothing,
                              strata::Union{Nothing,AbstractVector{Set{T}} where T}=nothing,
                              optimal_threshold_class::Union{Missing,Nothing,Integer}=missing)

Returns an EvaluationV1 containing a battery of classifier performance metrics that each compare predicted_soft_labels and/or predicted_hard_labels against elected_hard_labels.

Where...

- predicted_soft_labels is a matrix of soft labels whose columns correspond to classes and whose rows correspond to samples in the evaluation set.
- predicted_hard_labels is a vector of hard labels where the ith element is the hard label predicted by the model for sample i in the evaluation set.
- elected_hard_labels is a vector of hard labels where the ith element is the hard label elected as "ground truth" for sample i in the evaluation set.
- thresholds are the range of thresholds used by metrics (e.g. PR curves) that are calculated on the predicted_soft_labels for a range of thresholds.
- votes is a matrix of hard labels whose columns correspond to voters and whose rows correspond to the samples in the test set that have been voted on. If votes[sample, voter] is not a valid hard label for model, then voter will simply be considered to have not assigned a hard label to sample.
- strata is a vector of sets of (arbitrarily typed) groups/strata for each sample in the evaluation set, or nothing. If not nothing, per-class and multiclass kappas will also be calculated per group/stratum.
- optimal_threshold_class is the class index (1 or 2) for which to calculate an optimal threshold for converting the predicted_soft_labels to predicted_hard_labels. If present, the input predicted_hard_labels will be ignored and new predicted_hard_labels will be recalculated from the new threshold. This is only a valid parameter when length(classes) == 2.

Alternatively, an observation_table that consists of rows of type ObservationV1 can be passed in place of predicted_soft_labels, predicted_hard_labels, elected_hard_labels, and votes. Supply a function to the keyword argument binarize which takes as input (soft_label, threshold) and outputs a Bool indicating whether or not the prediction is of the class of interest.

See also evaluation_metrics_plot.
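For illustration, a hypothetical binary evaluation over four made-up samples:

    predicted_soft_labels = [0.8 0.2; 0.3 0.7; 0.6 0.4; 0.1 0.9]  # rows: samples
    predicted_hard_labels = [1, 2, 1, 2]
    elected_hard_labels = [1, 2, 2, 2]
    evaluation = evaluation_metrics_record(predicted_hard_labels, predicted_soft_labels,
                                           elected_hard_labels, ["class_A", "class_B"])
    evaluation.per_class_roc_aucs  # one AUC per class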
Lighthouse.ClassV1 — Type

    @version ClassV1 begin
        class_index::Union{Int64,Symbol} = check_valid_class(class_index)
        class_labels::Union{Missing,Vector{String}}
    end

A Legolas record representing a single column class_index that holds either an integer or the value :multiclass, along with the class names associated with the integer class indices.
Lighthouse.TradeoffMetricsV1 — Type

    @version TradeoffMetricsV1 > ClassV1 begin
        roc_curve::Curve = lift(Curve, roc_curve)
        roc_auc::Float64
        pr_curve::Curve = lift(Curve, pr_curve)
        spearman_correlation::Union{Missing,Float64}
        spearman_correlation_ci_upper::Union{Missing,Float64}
        spearman_correlation_ci_lower::Union{Missing,Float64}
        n_samples::Union{Missing,Int}
        reliability_calibration_curve::Union{Missing,Curve} = lift(Curve,
                                                                   reliability_calibration_curve)
        reliability_calibration_score::Union{Missing,Float64}
    end

A Legolas record representing metrics calculated over predicted soft labels. See also get_tradeoff_metrics and get_tradeoff_metrics_binary_multirater.
Lighthouse.get_tradeoff_metrics — Function

    get_tradeoff_metrics(predicted_soft_labels, elected_hard_labels, class_index;
                         thresholds, binarize=binarize_by_threshold, class_labels=missing)

Return a TradeoffMetricsV1 calculated for the given class_index, with the following fields guaranteed to be non-missing: roc_curve, roc_auc, pr_curve, reliability_calibration_curve, reliability_calibration_score. Supply a function to the keyword argument binarize which takes as input (soft_label, threshold) and outputs a Bool indicating whether or not the prediction is of the class of interest (class_index).
Lighthouse.get_tradeoff_metrics_binary_multirater — Function

    get_tradeoff_metrics_binary_multirater(predicted_soft_labels, elected_hard_labels, class_index;
                                           thresholds, binarize=binarize_by_threshold, class_labels=missing)

Return a TradeoffMetricsV1 calculated for the given class_index. In addition to the metrics calculated by get_tradeoff_metrics, additionally calculates spearman_correlation-based metrics. Supply a function to the keyword argument binarize which takes as input (soft_label, threshold) and outputs a Bool indicating whether or not the prediction is of the class of interest (class_index).
Lighthouse.HardenedMetricsV1 — Type

    @version HardenedMetricsV1 > ClassV1 begin
        confusion_matrix::Union{Missing,Array{Int64,1},Array{Int64,2}} = vec_to_mat(confusion_matrix)
        discrimination_calibration_curve::Union{Missing,Curve} = lift(Curve,
                                                                      discrimination_calibration_curve)
        discrimination_calibration_score::Union{Missing,Float64}
        ea_kappa::Union{Missing,Float64}
    end

A Legolas record representing metrics calculated over predicted hard labels. See also get_hardened_metrics, get_hardened_metrics_multirater, and get_hardened_metrics_multiclass.
Lighthouse.get_hardened_metrics — Function

    get_hardened_metrics(predicted_hard_labels, elected_hard_labels, class_index;
                         class_labels=missing)

Return a HardenedMetricsV1 calculated for the given class_index, with the following field guaranteed to be non-missing: expert-algorithm agreement (ea_kappa).
Lighthouse.get_hardened_metrics_multirater — Function

    get_hardened_metrics_multirater(predicted_hard_labels, elected_hard_labels, class_index;
                                    class_labels=missing)

Return a HardenedMetricsV1 calculated for the given class_index. In addition to the metrics calculated by get_hardened_metrics, additionally calculates discrimination_calibration_curve and discrimination_calibration_score.
Lighthouse.get_hardened_metrics_multiclass — Function

    get_hardened_metrics_multiclass(predicted_hard_labels, elected_hard_labels,
                                    class_count; class_labels=missing)

Return a HardenedMetricsV1 calculated over all class_count classes. Calculates expert-algorithm agreement (ea_kappa) over all classes, as well as the multiclass confusion_matrix.
Lighthouse.LabelMetricsV1 — Type

    @version LabelMetricsV1 > ClassV1 begin
        ira_kappa::Union{Missing,Float64}
        per_expert_discrimination_calibration_curves::Union{Missing,Vector{Curve}} = lift(v -> Curve.(v),
                                                                                          per_expert_discrimination_calibration_curves)
        per_expert_discrimination_calibration_scores::Union{Missing,Vector{Float64}}
    end

A Legolas record representing metrics calculated over labels provided by multiple labelers. See also get_label_metrics_multirater and get_label_metrics_multirater_multiclass.
Lighthouse.get_label_metrics_multirater — Function

    get_label_metrics_multirater(votes, class_index; class_labels=missing)

Return a LabelMetricsV1 calculated for the given class_index, with the following fields guaranteed to be non-missing: per_expert_discrimination_calibration_curves, per_expert_discrimination_calibration_scores, and interrater agreement (ira_kappa).
Lighthouse.get_label_metrics_multirater_multiclass — Function

    get_label_metrics_multirater_multiclass(votes, class_count; class_labels=missing)

Return a LabelMetricsV1 calculated over all class_count classes. Calculates the multiclass interrater agreement (ira_kappa).
Lighthouse._evaluation_record — Function

    _evaluation_record(tradeoff_metrics_table, hardened_metrics_table, label_metrics_table;
                       optimal_threshold_class=missing, class_labels, thresholds,
                       optimal_threshold, stratified_kappas=missing)

Helper function to create an EvaluationV1 from tables of the constituent metrics schemas, to support evaluation_metrics_record:

- tradeoff_metrics_table: a table of TradeoffMetricsV1s
- hardened_metrics_table: a table of HardenedMetricsV1s
- label_metrics_table: a table of LabelMetricsV1s
Lighthouse._calculate_ea_kappas — Function

    _calculate_ea_kappas(predicted_hard_labels, elected_hard_labels, classes)

Return a NamedTuple with keys :per_class_kappas and :multiclass_kappa containing the Cohen's kappa per class and over all classes, respectively. The value of output key :per_class_kappas is an Array such that item i is the Cohen's kappa calculated for class i.

Where...

- predicted_hard_labels is a vector of hard labels where the ith element is the hard label predicted by the model for sample i in the evaluation set.
- elected_hard_labels is a vector of hard labels where the ith element is the hard label elected as "ground truth" for sample i in the evaluation set.
- classes are the possible classes.
Lighthouse._calculate_ira_kappas — Function

    _calculate_ira_kappas(votes, classes)

Return a NamedTuple with keys :per_class_IRA_kappas and :multiclass_IRA_kappas containing the Cohen's kappa for inter-rater agreement (IRA) per class and over all classes, respectively. The value of output key :per_class_IRA_kappas is an Array such that item i is the IRA kappa calculated for class i.

Where...

- votes is a matrix of hard labels whose columns correspond to voters and whose rows correspond to the samples in the test set that have been voted on. If votes[sample, voter] is not a valid hard label for model, then voter will simply be considered to have not assigned a hard label to sample.
- classes are all possible classes voted on.

Returns (per_class_IRA_kappas=missing, multiclass_IRA_kappas=missing) if votes has only a single voter (i.e., a single column) or if no two voters rated the same sample. Note that vote entries of 0 are taken to mean that the voter did not rate that sample.
Lighthouse._calculate_spearman_correlation — Function

    _calculate_spearman_correlation(predicted_soft_labels, votes, classes)

Return a NamedTuple with keys :ρ, :n, :ci_lower, and :ci_upper that are the Spearman correlation constant ρ and its 95% confidence interval bounds. Only valid for binary classification problems (i.e., length(classes) == 2).

Where...

- predicted_soft_labels is a matrix of soft labels whose columns correspond to the two classes and whose rows correspond to the samples in the test set that have been classified. For a given sample, the two class column values must sum to 1 (i.e., softmax has been applied to the classification output).
- votes is a matrix of hard labels whose columns correspond to voters and whose rows correspond to the samples in the test set that have been voted on. If votes[sample, voter] is not a valid hard label for model, then voter will simply be considered to have not assigned a hard label to sample. May contain a single voter (i.e., a single column).
- classes are the two classes voted on.
Utilities

Lighthouse.majority — Function

    majority([rng::AbstractRNG=Random.GLOBAL_RNG], hard_labels, among::UnitRange)

Return the majority label within among out of hard_labels:

    julia> majority([1, 2, 1, 3, 2, 2, 3], 1:3)
    2

    julia> majority([1, 2, 1, 3, 2, 2, 3, 4], 3:4)
    3

In the event of a tie, a winner is randomly selected from the tied labels via rng.
Lighthouse.area_under_curve — Function

    area_under_curve(x, y)

Calculates the area under the curve specified by the x vector and y vector using the trapezoidal rule. If the inputs are empty, returns NaN. Excludes NaN entries.
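For example:

    # Trapezoidal area under the identity line from 0 to 1:
    Lighthouse.area_under_curve([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])  # 0.5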
Lighthouse.area_under_curve_unit_square — Function

    area_under_curve_unit_square(x, y)

Calculates the area under the curve specified by the x vector and y vector for a unit square, using the trapezoidal rule. If the inputs are empty, returns missing.
Lighthouse.Curve — Type

    Curve(x, y)

Represents a (plot) curve of x and y points.

When constructing a Curve, missings are replaced with NaN, and values are converted to Float64. Curve objects c support iteration, x, y = c, and indexing, x = c[1], y = c[2].
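A brief illustration of the conversions and access patterns described above:

    c = Lighthouse.Curve([0, 0.5, 1], [0.0, missing, 1.0])
    x, y = c   # iteration: x == [0.0, 0.5, 1.0]; y's middle entry is NaN
    x == c[1]  # indexing: c[1] is the x vector, c[2] the y vector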