Metrics
Evaluation metrics for all three DeepTab task types: regression, classification, and distributional (LSS) regression.
Every metric is a DeepTabMetric subclass with three attributes the
framework reads automatically:
| Attribute | Type | Purpose |
|---|---|---|
| name | str | Key in |
| higher_is_better | bool | |
| needs_raw | bool | |
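To illustrate why a direction flag is useful, the sketch below shows one way a training loop could use higher_is_better to decide whether a new validation score beats the best one seen so far. The helpers is_improvement and best_epoch are hypothetical and not part of DeepTab; the only assumption is that each metric carries a boolean direction flag, and the score histories are made up.

```python
from typing import Optional


def is_improvement(candidate: float, best: Optional[float], higher_is_better: bool) -> bool:
    """Return True if `candidate` beats `best` given the metric's direction."""
    if best is None:  # no score recorded yet, so anything is an improvement
        return True
    return candidate > best if higher_is_better else candidate < best


def best_epoch(history, higher_is_better):
    """Index of the best score in `history` under the given direction."""
    best, best_idx = None, -1
    for idx, score in enumerate(history):
        if is_improvement(score, best, higher_is_better):
            best, best_idx = score, idx
    return best_idx


# Hypothetical validation scores across four epochs.
rmse_history = [1.9, 1.4, 1.6, 1.2]    # lower is better
r2_history = [0.61, 0.79, 0.70, 0.74]  # higher is better

print(best_epoch(rmse_history, higher_is_better=False))  # 3
print(best_epoch(r2_history, higher_is_better=True))     # 1
```

Keeping the direction on the metric itself means selection code like this never needs a per-metric special case.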
Quick Start
from deeptab.metrics import RootMeanSquaredError, CRPS, Accuracy

rmse = RootMeanSquaredError()
print(rmse.name)              # "rmse"
print(rmse.higher_is_better)  # False

# Pass to model.fit() for live training logging
from deeptab.models import MambularLSS

model = MambularLSS()
model.fit(
    X_train, y_train,
    val_metrics={
        "crps": CRPS(family="normal"),   # logged as "val_crps"
        "rmse": RootMeanSquaredError(),  # logged as "val_rmse"
    },
)

# Post-hoc evaluation
scores = model.evaluate(X_test, y_test)
# Returns e.g. {"crps": 0.32, "rmse": 1.45}

# Auto-select default metrics via the registry
from deeptab.metrics import get_default_metrics

metrics = get_default_metrics("lss", family="normal")
# [CRPS(family='normal'), RootMeanSquaredError(), MeanAbsoluteError()]
Available Metrics
Regression Metrics
| Class | | | Default | Notes |
|---|---|---|---|---|
| | | | | sklearn-backed; lower = better |
| RootMeanSquaredError | | | ✓ | Same units as target; primary regression metric |
| MeanAbsoluteError | | | ✓ | Robust to outliers |
| R2Score | | | ✓ | 1.0 = perfect; higher = better |
| | | | | % scale; avoid when targets near zero |
| | | | | Quantile regression; tau in (0, 1) |
The Default column marks the metrics returned by get_default_metrics("regression")
and reported by model.evaluate() when no metrics argument is given; the
first of these (RMSE) is the primary metric used for HPO and model selection.
All regression metrics accept 2-D LSS parameter arrays and extract the first column (predicted mean) automatically.
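The quantile-regression entry in the table above is most naturally read as the pinball (quantile) loss. The sketch below is a plain-Python illustration of that loss under this assumption, not DeepTab's implementation; the function name pinball_loss and the toy data are made up for the example.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss for quantile level tau in (0, 1).

    Per-sample loss: max(tau * (y - q), (tau - 1) * (y - q)).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau per unit;
        # over-prediction (diff < 0) costs (1 - tau) per unit.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 2.0, 2.0]

print(pinball_loss(y_true, y_pred, tau=0.5))  # 0.5, half the mean absolute error
print(pinball_loss(y_true, y_pred, tau=0.9))  # larger: under-prediction is penalised more
```

At tau = 0.5 the loss reduces to half the mean absolute error, which is a quick sanity check for any implementation.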
Classification Metrics
| Class | | | Default | Input | Notes |
|---|---|---|---|---|---|
| Accuracy | | | ✓ | labels | sklearn-backed; argmax of probability array |
| | | | | labels | |
| AUROC | | | ✓ | proba | Ranking-based; threshold-free |
| | | | | proba | Better than AUROC for imbalanced data |
| LogLoss | | | ✓ | proba | Cross-entropy over class probabilities |
| | | | | proba | MSE of probability; binary only |
| | | | | proba | 0 = perfectly calibrated; custom implementation |
The Default column marks the metrics returned by get_default_metrics("classification").
The Input column shows which prediction model.evaluate() feeds each
metric: proba metrics (auroc, auprc, log_loss, brier, ece)
receive the 2-D predict_proba output, while labels metrics receive the
1-D predict output. The dispatch is automatic, keyed on the metric name.
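The name-keyed dispatch described above can be pictured roughly as follows. This is a sketch rather than DeepTab's code: PROBA_METRIC_NAMES and select_prediction are hypothetical names, and only the five metric names are taken from the text above.

```python
# Metric names listed above as consuming probabilities.
PROBA_METRIC_NAMES = {"auroc", "auprc", "log_loss", "brier", "ece"}


def select_prediction(metric_name, labels, proba):
    """Pick the prediction a metric should receive, keyed on its name."""
    return proba if metric_name in PROBA_METRIC_NAMES else labels


# Toy binary problem: three samples, two classes.
proba = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
labels = [row.index(max(row)) for row in proba]  # argmax per row

print(select_prediction("accuracy", labels, proba))  # [0, 1, 1]
print(select_prediction("auroc", labels, proba))     # the 2-D probability rows
```

In a scheme like this, any metric whose name is not in the set falls through to hard labels, so the name a metric reports determines which input it sees.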
Distributional / LSS Metrics
| Class | | | | Notes |
|---|---|---|---|---|
| | | | | Requires distribution object; passes raw logits |
| | | | | = -NLL; higher = better |
| | | | | Vectorised via |
| | | | | Winkler score; expects [lower, upper] columns |
| | | | | Multivariate CRPS generalisation |
| | | | | poisson, zip families |
| GammaDeviance | | | | gamma, inversegamma families |
| | | | | tweedie family; |
| | | | | negativebinom family |
| BetaBrierScore | | | | beta family (proportions) |
| DirichletError | | | | dirichlet family; KL divergence |
| | | | | studentt family; proper NLL |
| | | | | inversegamma family |
| | | | | lognormal family |
| CoverageProbability | | | | Fraction of targets inside prediction interval |
| SharpnessScore | | | | Mean interval width; lower = sharper |
| | | | | MAD from uniform CDF; 0 = perfectly calibrated |
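The interval-based entries in the table above can be made concrete with a small example. The sketch below computes empirical coverage, mean interval width, and the Winkler (interval) score for central (1 - alpha) intervals in plain Python. The function names are illustrative and not DeepTab's API; the [lower, upper] ordering follows the note in the table, and the data are made up.

```python
def coverage(y_true, intervals):
    """Fraction of targets that fall inside their [lower, upper] interval."""
    hits = sum(lo <= y <= hi for y, (lo, hi) in zip(y_true, intervals))
    return hits / len(y_true)


def mean_width(intervals):
    """Average interval width; smaller means sharper predictions."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


def winkler_score(y_true, intervals, alpha):
    """Mean Winkler score for central (1 - alpha) prediction intervals."""
    total = 0.0
    for y, (lo, hi) in zip(y_true, intervals):
        score = hi - lo                        # width term
        if y < lo:
            score += (2.0 / alpha) * (lo - y)  # penalty for missing below
        elif y > hi:
            score += (2.0 / alpha) * (y - hi)  # penalty for missing above
        total += score
    return total / len(y_true)


y_true = [1.0, 2.5, 4.0, 7.0]
intervals = [(0.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 6.0)]  # [lower, upper]

print(coverage(y_true, intervals))            # 0.75
print(mean_width(intervals))                  # 1.75
print(winkler_score(y_true, intervals, 0.1))  # 6.75
```

Coverage and width pull in opposite directions, which is why the Winkler score, combining width with a miss penalty, is useful as a single summary.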
Registry
The registry maps (task, family) keys to ordered lists of default metrics.
The first entry in each list is the primary metric used by HPO and model selection.
from deeptab.metrics import get_default_metrics, get_default_metrics_dict
# Returns list of DeepTabMetric instances
get_default_metrics("regression")
# [RootMeanSquaredError(), MeanAbsoluteError(), R2Score()]
get_default_metrics("classification")
# [Accuracy(), AUROC(), LogLoss()]
get_default_metrics("lss", family="gamma")
# [GammaDeviance(), RootMeanSquaredError()]
# Returns {name: metric} dict, useful for model.evaluate()
get_default_metrics_dict("lss", family="normal")
# {"crps": CRPS(...), "rmse": RootMeanSquaredError(), "mae": MeanAbsoluteError()}
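Conceptually, a registry of this kind is a mapping from (task, family) keys to ordered lists whose first element is treated as primary. The sketch below mirrors that idea using class names as plain strings. _DEFAULTS, default_metric_names, and primary_metric_name are hypothetical, and the entries are only the ones quoted in the examples above; how DeepTab stores or resolves its defaults internally is not shown here.

```python
# Hypothetical registry keyed on (task, family); values are ordered class names.
_DEFAULTS = {
    ("regression", None): ["RootMeanSquaredError", "MeanAbsoluteError", "R2Score"],
    ("classification", None): ["Accuracy", "AUROC", "LogLoss"],
    ("lss", "normal"): ["CRPS", "RootMeanSquaredError", "MeanAbsoluteError"],
    ("lss", "gamma"): ["GammaDeviance", "RootMeanSquaredError"],
}


def default_metric_names(task, family=None):
    """Look up the ordered default list for a task and optional family."""
    try:
        return list(_DEFAULTS[(task, family)])  # copy so callers cannot mutate the registry
    except KeyError:
        raise KeyError(f"no defaults registered for task={task!r}, family={family!r}") from None


def primary_metric_name(task, family=None):
    """The first entry is the one used for HPO and model selection."""
    return default_metric_names(task, family)[0]


print(primary_metric_name("regression"))           # RootMeanSquaredError
print(primary_metric_name("lss", family="gamma"))  # GammaDeviance
```

Storing ordered lists rather than sets is what lets "first entry" carry the extra meaning of "primary metric".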
Choosing a Distribution-Specific Metric
For continuous point-estimate regression: use RMSE (default) or MAE for outlier-robustness.
For distributional (LSS) models: use CRPS as the primary metric. CRPS is a proper scoring rule: it rewards both accuracy and calibration, so it cannot be gamed by reporting an over-wide predictive distribution.
For count data (poisson, zip, negativebinom): use the matching deviance. A deviance is twice the log-likelihood gap between the saturated model and the fitted model, and it is the standard goodness-of-fit criterion for GLM-type models.
For proportions and compositional targets (beta, dirichlet): use BetaBrierScore or DirichletError, respectively.
For uncertainty quantification: combine CRPS with CoverageProbability and SharpnessScore to get a complete picture of calibration and precision.
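Two of the recommendations above can be checked numerically. The sketch below implements the well-known closed-form CRPS for a normal forecast and the mean Poisson deviance in plain Python. Both functions are illustrations written for this page, not DeepTab's implementations, and the names crps_normal and mean_poisson_deviance are assumptions.

```python
import math


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for one observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_poisson_deviance(y_true, mu):
    """Mean unit Poisson deviance, 2 * (y * log(y / mu) - (y - mu))."""
    total = 0.0
    for y, m in zip(y_true, mu):
        log_term = y * math.log(y / m) if y > 0 else 0.0  # y * log(y) -> 0 as y -> 0
        total += 2.0 * (log_term - (y - m))
    return total / len(y_true)


# Same (correct) mean, increasingly wide forecast: the CRPS gets worse.
for sigma in (0.5, 1.0, 3.0):
    print(sigma, round(crps_normal(y=2.0, mu=2.0, sigma=sigma), 4))

# The deviance is zero when predictions match the counts and positive otherwise.
print(mean_poisson_deviance([1, 3, 4], [1.0, 3.0, 4.0]))  # 0.0
print(mean_poisson_deviance([0, 1, 3], [0.5, 1.0, 2.0]))
```

With the mean held at the true value, the CRPS grows in proportion to sigma, which is the sense in which an over-wide predictive distribution is penalised.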
Writing a Custom Metric
Subclass DeepTabMetric, set name, higher_is_better, and needs_raw, then implement __call__:
from deeptab.metrics import DeepTabMetric
import numpy as np


class MedianAbsoluteError(DeepTabMetric):
    name = "mdae"
    higher_is_better = False  # lower = better
    needs_raw = False         # use transformed predictions

    def __call__(self, y_true, y_pred):
        y_pred = np.asarray(y_pred)
        # For 2-D LSS parameter arrays, use the first column (predicted mean)
        mean_pred = y_pred[:, 0] if y_pred.ndim == 2 else y_pred.ravel()
        return float(np.median(np.abs(np.asarray(y_true).ravel() - mean_pred)))


# Use it anywhere a standard metric is accepted
model.fit(X_train, y_train, val_metrics={"mdae": MedianAbsoluteError()})
scores = model.evaluate(X_test, y_test, metrics={"mdae": MedianAbsoluteError()})
See Also
Training and Evaluation: training loop and evaluation guide
Uncertainty Quantification: LSS model tutorial with metric examples
Distributions: distribution families reference