deeptab.metrics
Base Class
- class deeptab.metrics.DeepTabMetric
Abstract base class for all DeepTab evaluation metrics.
Every metric in deeptab.metrics subclasses this ABC and exposes three class-level attributes that the training loop and registry read automatically. You never need to set them yourself when using a metric, only when writing a custom one.
- name
A short, machine-readable identifier for the metric. It is used as:
  - the key in the dict returned by model.evaluate()
  - the suffix in training-log entries (e.g. val_rmse)
  - the registry lookup key in METRIC_REGISTRY
Examples: "rmse", "crps", "auroc".
- Type:
str
- higher_is_better
Tells the framework whether a larger or smaller value is preferable. This matters in two places:
  - HPO: hyperparameter search uses it to set the optimisation direction (maximise vs. minimise) when a metric is chosen as the objective.
  - Early stopping / model selection: callbacks can use it to decide whether a new checkpoint is an improvement.
False (default) means lower is better, which is appropriate for loss functions and error metrics (MSE, MAE, NLL, deviances). True means higher is better, which is appropriate for scores such as R², accuracy, AUROC, and LogScore.
- Type:
bool
- needs_raw
Controls which form of y_pred the training loop passes to this metric.
  - False (default): the metric receives already-transformed distribution parameters, i.e. the output of model.predict(X, raw=False). For example, a Normal distribution model returns [mean, std] where std > 0 is guaranteed. This is the right choice for almost every metric.
  - True: the metric receives raw model logits before the distribution’s parameter transforms are applied. NegativeLogLikelihood sets this to True because it calls distribution.compute_loss(), which applies the transforms itself. Passing already-transformed values would apply the transforms twice and produce wrong results.
- Type:
bool
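The double-transform problem described for needs_raw can be illustrated in plain Python. This is a sketch under stated assumptions: it assumes the scale parameter is made positive with a softplus transform (the actual Normal transform in DeepTab is not documented here), and TransformedMetric, RawMetric and select_y_pred are hypothetical stand-ins, not DeepTab APIs.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); assumed here as the positivity transform for the scale."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class TransformedMetric:
    """Hypothetical metric that wants [mean, std] with std > 0."""
    needs_raw = False


class RawMetric:
    """Hypothetical metric that applies the transform itself, like NegativeLogLikelihood."""
    needs_raw = True


def select_y_pred(metric, raw: List[float], transformed: List[float]) -> List[float]:
    """Hypothetical dispatch following the documented rule: raw logits only when needs_raw is True."""
    return raw if metric.needs_raw else transformed


# Hypothetical raw network output for one Normal prediction: [mean, raw_scale].
raw = [2.0, -1.0]
# The transform is applied exactly once before a needs_raw=False metric sees the values.
transformed = [raw[0], softplus(raw[1])]

# What a transform-applying metric would compute if it were wrongly handed the
# transformed values: the scale goes through softplus a second time.
double_transformed = [transformed[0], softplus(transformed[1])]

print(select_y_pred(TransformedMetric(), raw, transformed))
print(select_y_pred(RawMetric(), raw, transformed))
print(f"std after one transform:  {transformed[1]:.4f}")
print(f"std after two transforms: {double_transformed[1]:.4f}")
```

The second scale value is far from the first. That gap is the silent error the needs_raw flag is meant to prevent.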
Examples
Using a built-in metric directly:
>>> from deeptab.metrics import RootMeanSquaredError
>>> import numpy as np
>>> metric = RootMeanSquaredError()
>>> metric.name
'rmse'
>>> metric.higher_is_better
False
>>> metric(np.array([1.0, 2.0, 3.0]), np.array([1.1, 2.0, 2.9]))
0.08164965809277261
Passing metrics to model.fit() for live training logging:
>>> from deeptab.metrics import CRPS, MeanAbsoluteError
>>> model.fit(X_train, y_train,
...           val_metrics={"crps": CRPS(family="normal"),
...                        "mae": MeanAbsoluteError()})  # logs val_crps and val_mae each epoch
Writing a custom metric:
>>> from deeptab.metrics import DeepTabMetric
>>> import numpy as np
>>> class MedianAbsoluteError(DeepTabMetric):
...     name = "mdae"
...     higher_is_better = False  # lower error = better
...     needs_raw = False         # use transformed predictions
...
...     def __call__(self, y_true, y_pred):
...         y_pred = np.asarray(y_pred)
...         # 2-D parameter arrays: column 0 is the predicted mean
...         mean_pred = y_pred[:, 0] if y_pred.ndim == 2 else y_pred.ravel()
...         return float(np.median(np.abs(np.asarray(y_true).ravel() - mean_pred)))
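The following is a rough sketch of how a training loop might use name and higher_is_better. The stand-in classes only mimic the three documented attributes, so the snippet runs without DeepTab installed. The helpers log_val_metrics, is_improvement and optimisation_direction are hypothetical and are not part of deeptab.metrics.

```python
import math
from typing import Dict, Sequence


class RMSE:
    """Hypothetical stand-in mimicking a DeepTab metric's documented attributes."""
    name = "rmse"
    higher_is_better = False
    needs_raw = False

    def __call__(self, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
        return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


class R2:
    """Hypothetical stand-in for a higher-is-better metric."""
    name = "r2"
    higher_is_better = True
    needs_raw = False

    def __call__(self, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
        mean = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot


def log_val_metrics(metrics, y_true, y_pred) -> Dict[str, float]:
    """Key each result as val_<name>, matching the training-log convention described above."""
    return {f"val_{m.name}": m(y_true, y_pred) for m in metrics}


def is_improvement(metric, new: float, best: float) -> bool:
    """Early-stopping style comparison in the direction set by higher_is_better."""
    return new > best if metric.higher_is_better else new < best


def optimisation_direction(metric) -> str:
    # "maximize"/"minimize" are the direction strings many HPO libraries use; assumed here.
    return "maximize" if metric.higher_is_better else "minimize"


y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 2.0, 2.9]
logs = log_val_metrics([RMSE(), R2()], y_true, y_pred)
print(logs)
print(optimisation_direction(RMSE()), optimisation_direction(R2()))
```

The design keeps the direction logic in one boolean per metric. Checkpointing and HPO then never need to special-case individual metric names.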
Registry
- deeptab.metrics.METRIC_REGISTRY = {
      'classification': [Accuracy(), AUROC(average='macro'), LogLoss()],
      'lss:beta': [BetaBrierScore(), RootMeanSquaredError()],
      'lss:categorical': [Accuracy(), LogLoss()],
      'lss:dirichlet': [DirichletError()],
      'lss:gamma': [GammaDeviance(), RootMeanSquaredError()],
      'lss:inversegamma': [InverseGammaDeviance(), GammaDeviance()],
      'lss:johnsonsu': [CRPS(family='johnsonsu'), RootMeanSquaredError()],
      'lss:lognormal': [LogNormalNLL(), CRPS(family='lognormal'), RootMeanSquaredError()],
      'lss:mog': [CRPS(family='normal'), RootMeanSquaredError()],
      'lss:multinomial': [LogLoss()],
      'lss:negativebinom': [NegativeBinomialDeviance(default_alpha=1.0), RootMeanSquaredError()],
      'lss:normal': [CRPS(family='normal'), RootMeanSquaredError(), MeanAbsoluteError()],
      'lss:poisson': [PoissonDeviance(), RootMeanSquaredError()],
      'lss:quantile': [PinballLoss(quantile=0.5, col=0)],
      'lss:studentt': [StudentTLoss(default_df=3.0), CRPS(family='studentt')],
      'lss:tweedie': [TweedieDeviance(p=1.5), RootMeanSquaredError()],
      'lss:zip': [PoissonDeviance(), RootMeanSquaredError()],
      'regression': [RootMeanSquaredError(), MeanAbsoluteError(), R2Score()]
  }
Maps each task key ("regression", "classification", or "lss:<family>") to its ordered list of default metric instances.
- deeptab.metrics.get_default_metrics(task, family=None)
Return the default list of metrics for a given task and distribution family.
- Parameters:
  - task (str) – One of "regression", "classification", or "lss".
  - family (str | None) – Distribution family key used for LSS tasks, e.g. "normal", "gamma", "poisson". Ignored for non-LSS tasks.
- Returns:
Ordered list of metric instances. The first entry is the primary metric. Returns an empty list when the combination is unknown.
- Return type:
list[DeepTabMetric]
- deeptab.metrics.get_default_metrics_dict(task, family=None)
Like get_default_metrics(), but returns a {name: metric} dict. Convenience wrapper for code paths that store metrics as dicts.
- Return type:
dict[str,DeepTabMetric]
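The lookup behaviour documented for get_default_metrics() and get_default_metrics_dict() can be approximated as follows. This sketch rests on several assumptions. Stub stands in for real metric instances. TOY_REGISTRY is a small hypothetical subset shaped like METRIC_REGISTRY. Every metric name other than "rmse", "crps" and "auroc" is a placeholder.

```python
from typing import Dict, List, Optional


class Stub:
    """Hypothetical stand-in for a metric instance; only .name matters here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"Stub({self.name!r})"


# Keys follow the "regression" / "classification" / "lss:<family>" pattern of METRIC_REGISTRY.
# Names other than "rmse", "crps" and "auroc" are placeholders.
TOY_REGISTRY: Dict[str, List[Stub]] = {
    "regression": [Stub("rmse"), Stub("mae"), Stub("r2")],
    "classification": [Stub("accuracy"), Stub("auroc"), Stub("logloss")],
    "lss:normal": [Stub("crps"), Stub("rmse"), Stub("mae")],
}


def default_metrics(task: str, family: Optional[str] = None) -> List[Stub]:
    # family only takes part in the key for LSS tasks; it is ignored otherwise.
    key = f"lss:{family}" if task == "lss" else task
    # Unknown combinations give an empty list instead of raising.
    return list(TOY_REGISTRY.get(key, []))


def default_metrics_dict(task: str, family: Optional[str] = None) -> Dict[str, Stub]:
    # Dicts preserve insertion order, so the primary (first) metric stays first.
    return {m.name: m for m in default_metrics(task, family)}


print(default_metrics("lss", "normal"))
print(default_metrics_dict("regression"))
```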
Regression Metrics
- class deeptab.metrics.MeanSquaredError
Mean Squared Error – delegates to sklearn.metrics.mean_squared_error(). Accepts both point-prediction vectors and 2-D parameter arrays (uses the first column as the predicted mean).
- class deeptab.metrics.RootMeanSquaredError
Root Mean Squared Error – square root of sklearn.metrics.mean_squared_error().
- class deeptab.metrics.MeanAbsoluteError
Mean Absolute Error – delegates to sklearn.metrics.mean_absolute_error().
- class deeptab.metrics.R2Score
Coefficient of Determination (R²) – delegates to sklearn.metrics.r2_score(). Higher is better; a perfect prediction gives R² = 1.
- class deeptab.metrics.MeanAbsolutePercentageError
Mean Absolute Percentage Error – delegates to sklearn.metrics.mean_absolute_percentage_error(). sklearn internally clips the denominator |y_true| from below at np.finfo(np.float64).eps. Zero targets therefore do not cause a division by zero, but they can yield very large values.
- class deeptab.metrics.PinballLoss(quantile=0.5, col=0)
Pinball (Quantile) Loss – delegates to sklearn.metrics.mean_pinball_loss(). Scores predictions of a single quantile level tau in (0, 1), given by the quantile argument.
For LSS quantile family predictions, y_pred is a 2-D array where each column is a predicted quantile. Pass col to select the relevant column (default 0).
- Parameters:
  - quantile (float) – The quantile level, e.g. 0.5 for the median.
  - col (int) – Column of y_pred to use when predictions are 2-D. Default 0.
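Several regression entries reduce a 2-D y_pred to a single column: column 0 for MeanSquaredError, and col for PinballLoss. The sketch below shows that reduction together with the pinball formula that sklearn.metrics.mean_pinball_loss computes (sklearn passes the quantile level as alpha). It uses plain lists, and select_column and pinball_loss are illustrative names, not DeepTab functions.

```python
from typing import List, Sequence, Union

Row = Union[float, Sequence[float]]


def select_column(y_pred: Sequence[Row], col: int = 0) -> List[float]:
    """1-D input passes through; 2-D input (rows of parameters or quantiles) keeps column `col`."""
    return [row[col] if isinstance(row, (list, tuple)) else row for row in y_pred]


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[Row],
                 quantile: float = 0.5, col: int = 0) -> float:
    q_pred = select_column(y_pred, col)
    total = 0.0
    for y, q in zip(y_true, q_pred):
        diff = y - q
        # Under-prediction costs quantile * diff; over-prediction costs (1 - quantile) * |diff|.
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)


y_true = [1.0, 2.0, 3.0]
# Hypothetical LSS quantile output: columns are the 0.1, 0.5 and 0.9 quantiles.
quantiles = [[0.5, 1.0, 1.5], [1.5, 2.5, 3.5], [2.0, 3.0, 4.0]]
print(pinball_loss(y_true, quantiles, quantile=0.5, col=1))
print(pinball_loss(y_true, quantiles, quantile=0.9, col=2))
```

At quantile=0.5 each term is 0.5 * |y - q|, so the pinball loss equals half the mean absolute error. This gives a quick sanity check.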
Classification Metrics
- class deeptab.metrics.Accuracy
Classification accuracy – delegates to sklearn.metrics.accuracy_score(). Accepts 1-D integer labels or 2-D probability arrays (argmax is taken).
- class deeptab.metrics.F1Score(average='binary')
F1 Score – delegates to sklearn.metrics.f1_score().
- Parameters:
average (str) – Averaging strategy: "binary" (default), "macro", or "weighted".
- class deeptab.metrics.AUROC(average='macro')
Area Under the ROC Curve – delegates to sklearn.metrics.roc_auc_score().
- Parameters:
average (str) – "macro" (default) or "weighted". Ignored for binary tasks.
- class deeptab.metrics.AUPRC
Area Under the Precision-Recall Curve – delegates to sklearn.metrics.average_precision_score().
- class deeptab.metrics.LogLoss
Cross-Entropy / Log Loss – delegates to sklearn.metrics.log_loss().
- class deeptab.metrics.BrierScore
Brier Score – delegates to sklearn.metrics.brier_score_loss(). Accepts 1-D probability scores or a 2-D array. For a 2-D array, the second column (the positive-class probability) is used.
- class deeptab.metrics.ExpectedCalibrationError(n_bins=10)
Expected Calibration Error (ECE).
sklearn does not provide ECE natively, so this is a custom implementation. It bins predictions by confidence and measures the gap between mean confidence and accuracy in each bin.
- Parameters:
n_bins (int) – Number of confidence bins. Default 10.
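The binning described for ExpectedCalibrationError can be sketched with the textbook ECE definition. Confidence is the top-class probability (argmax, as in Accuracy). Predictions fall into equal-width bins on [0, 1], and the result is the sample-weighted average of |accuracy - mean confidence| per bin. This formulation is an assumption: DeepTab's custom implementation may differ in details such as bin edges. The function name is illustrative.

```python
from typing import Sequence


def expected_calibration_error(probs: Sequence[Sequence[float]], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    n = len(labels)
    conf_sum = [0.0] * n_bins  # summed confidence per bin
    correct = [0] * n_bins     # correct predictions per bin
    count = [0] * n_bins       # samples per bin
    for row, label in zip(probs, labels):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax class
        conf = row[pred]
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes into the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        correct[b] += int(pred == label)
        count[b] += 1
    ece = 0.0
    for b in range(n_bins):
        if count[b]:
            gap = abs(correct[b] / count[b] - conf_sum[b] / count[b])
            ece += count[b] / n * gap  # weight each bin by its share of samples
    return ece


probs = [[0.95, 0.05], [0.85, 0.15], [0.25, 0.75], [0.65, 0.35]]
labels = [0, 0, 1, 1]
print(f"ECE = {expected_calibration_error(probs, labels, n_bins=10):.3f}")
```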
Distributional / LSS Metrics
Proper Scoring Rules
- class deeptab.metrics.NegativeLogLikelihood(distribution)
Negative Log-Likelihood computed via the distribution’s compute_loss method.
This metric requires raw model logits (needs_raw=True) and the distribution family object, because compute_loss applies the parameter transforms internally.
- Parameters:
distribution (BaseDistribution) – The fitted distribution object (e.g. model.task_model.family).
- class deeptab.metrics.LogScore(distribution)
Log Score, defined as -NLL, so higher is better.
Convenience wrapper around NegativeLogLikelihood.
- Parameters:
distribution (BaseDistribution) – The fitted distribution object.
- class deeptab.metrics.CRPS(family='normal')
Continuous Ranked Probability Score (CRPS) for univariate distributions.
Uses vectorised properscoring routines when available, and falls back to a pure-NumPy energy-form approximation when properscoring is not installed.
Expected y_pred format (2-D array whose columns are distribution parameters):
  - Normal / StudentT / LogNormal / JohnsonSU: [loc, scale]
  - All other families: [mean, ...]; CRPS is approximated from the predicted mean only (less informative).
For the normal family, the exact Gaussian CRPS is computed.
- Parameters:
family (str) – Distribution family key (e.g. "normal", "studentt"). When provided, enables family-specific CRPS formulas.
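The exact Gaussian CRPS mentioned for the normal family has a well-known closed form. With z = (y - mu) / sigma, CRPS = sigma * (z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi)), where Phi and phi are the standard normal CDF and PDF. The stdlib-only sketch below averages this over rows of [loc, scale], the y_pred layout listed above. It is an illustration, not DeepTab's code, which uses properscoring when installed.

```python
import math
from typing import Sequence


def gaussian_crps(y: float, mu: float, sigma: float) -> float:
    """Closed-form CRPS of Normal(mu, sigma^2) evaluated at observation y."""
    z = (y - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_gaussian_crps(y_true: Sequence[float], params: Sequence[Sequence[float]]) -> float:
    """Average CRPS where each params row is [loc, scale] with scale > 0."""
    return sum(gaussian_crps(y, loc, scale) for y, (loc, scale) in zip(y_true, params)) / len(y_true)


print(gaussian_crps(0.0, 0.0, 1.0))
print(mean_gaussian_crps([0.0, 1.0], [[0.0, 1.0], [0.0, 2.0]]))
```

As sigma shrinks toward zero, the score approaches |y - mu|. CRPS therefore reduces to the absolute error for a point forecast.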
- class deeptab.metrics.IntervalScore(alpha=0.05)
Winkler Interval Score at coverage level 1 - alpha. Penalises both interval width and mis-coverage.
Expected y_pred format:
  - Column 0: lower bound of the prediction interval
  - Column 1: upper bound of the prediction interval
- Parameters:
alpha (float) – Significance level, e.g. 0.05 for a 95% prediction interval.
- class deeptab.metrics.EnergyScore
Energy Score: the multivariate generalisation of CRPS.
Suitable for multivariate / compositional distributions (e.g. MixtureOfGaussiansDistribution, DirichletDistribution). Computed via Monte-Carlo sampling from the predicted distribution when samples are provided, or via a closed-form energy distance otherwise.
For simple use-cases where y_pred is a 2-D parameter array, the energy score is approximated as the mean Euclidean distance between y_true and the predicted mean.
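The Winkler interval score has a standard form for an interval [l, u] at level 1 - alpha. The score is the width (u - l), plus (2 / alpha) * (l - y) when y < l, or plus (2 / alpha) * (y - u) when y > u. The sketch below implements that formula. It also implements the simple energy-score approximation described for 2-D parameter arrays, which is the mean Euclidean distance between y_true and the predicted mean. Function names are illustrative. The code assumes y_pred rows are [lower, upper] for the interval score and predicted-mean vectors for the energy approximation.

```python
import math
from typing import Sequence


def interval_score(y_true: Sequence[float], bounds: Sequence[Sequence[float]],
                   alpha: float = 0.05) -> float:
    """Mean Winkler interval score; bounds rows are [lower, upper]."""
    total = 0.0
    for y, (lower, upper) in zip(y_true, bounds):
        score = upper - lower                     # width term
        if y < lower:
            score += (2.0 / alpha) * (lower - y)  # penalty for falling below the interval
        elif y > upper:
            score += (2.0 / alpha) * (y - upper)  # penalty for falling above the interval
        total += score
    return total / len(y_true)


def energy_score_mean_approx(y_true: Sequence[Sequence[float]],
                             mean_pred: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance between each target vector and its predicted mean."""
    return sum(math.dist(y, m) for y, m in zip(y_true, mean_pred)) / len(y_true)


print(interval_score([1.0, 5.0, -1.0], [[0.0, 2.0]] * 3, alpha=0.1))
print(energy_score_mean_approx([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]]))
```

The 2 / alpha factor makes misses expensive at high nominal coverage. A 95% interval (alpha = 0.05) charges 40 units per unit of miss.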
Distribution-Specific Deviances
- class deeptab.metrics.PoissonDeviance
Mean Poisson Deviance.
Suitable for the poisson and zip families. Expected y_pred: the predicted mean (1-D, or the first column of a 2-D array).
- class deeptab.metrics.GammaDeviance
Mean Gamma Deviance.
Suitable for the gamma and inversegamma families. Expected y_pred: the predicted mean (1-D, or the first column of a 2-D array).
- class deeptab.metrics.TweedieDeviance(p=1.5)
Mean Tweedie Deviance.
Suitable for the tweedie family, where 1 < p < 2.
- Parameters:
p (float) – Tweedie power parameter. Defaults to 1.5.
- class deeptab.metrics.NegativeBinomialDeviance(default_alpha=1.0)
Mean Negative-Binomial Deviance.
Suitable for the negativebinom family. Expected y_pred: a 2-D array where column 0 is the predicted mean mu and column 1 (optional) is the overdispersion parameter alpha. If only one column is present, alpha falls back to the default_alpha constructor argument.
- Parameters:
default_alpha (float) – Overdispersion parameter used when y_pred has only one column. Defaults to 1.0.
- class deeptab.metrics.BetaBrierScore
Mean Squared Error of the predicted mean for Beta-distributed targets.
Suitable for the beta family. Expected y_pred: the predicted mean in (0, 1), either as a 1-D array or as the first column of a 2-D array.
- class deeptab.metrics.DirichletError
Mean KL Divergence between true and predicted Dirichlet means.
Suitable for the dirichlet family. Both y_true and y_pred are treated as probability vectors (each row must sum to 1 after clipping).
- class deeptab.metrics.StudentTLoss(default_df=3.0)
Mean Student-t negative log-likelihood (a proper scoring rule) for the studentt family.
Expected y_pred columns: [loc, scale, (df)]. If only 2 columns are present, df falls back to the default_df constructor argument.
- Parameters:
default_df (float) – Degrees-of-freedom fallback when df is not present in y_pred. Defaults to 3.0.
- class deeptab.metrics.InverseGammaDeviance
Mean Inverse-Gamma deviance for the inversegamma family.
Expected y_pred columns: [shape (alpha), scale (beta)].
The deviance is computed as -2 * (log p(y | alpha, beta) - log p(y | alpha_sat, beta_sat)). The saturated-model likelihood is taken to be 1, so each per-sample deviance reduces to -2 * log p(y | alpha, beta).
- class deeptab.metrics.LogNormalNLL
Mean Log-Normal Negative Log-Likelihood for the lognormal family.
Expected y_pred columns: [loc (log-space mean), scale (log-space std)].
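The deviances in this group follow standard unit-deviance formulas. The sketch below writes out the Poisson and Gamma versions, which are what sklearn.metrics.mean_poisson_deviance and mean_gamma_deviance compute. It also includes a negative-binomial deviance that assumes the common NB2 parametrisation, Var(y) = mu + alpha * mu^2, with the documented fallback to default_alpha when y_pred has a single column. DeepTab's exact NB parametrisation is an assumption here, and the function names are illustrative.

```python
import math
from typing import Sequence


def _xlogy_ratio(y: float, mu: float) -> float:
    """y * log(y / mu), with the usual convention 0 * log(0) = 0."""
    return y * math.log(y / mu) if y > 0 else 0.0


def poisson_deviance(y_true: Sequence[float], mu: Sequence[float]) -> float:
    return sum(2.0 * (_xlogy_ratio(y, m) - (y - m)) for y, m in zip(y_true, mu)) / len(y_true)


def gamma_deviance(y_true: Sequence[float], mu: Sequence[float]) -> float:
    # Requires strictly positive targets and predicted means.
    return sum(2.0 * (math.log(m / y) + y / m - 1.0) for y, m in zip(y_true, mu)) / len(y_true)


def negative_binomial_deviance(y_true: Sequence[float], y_pred: Sequence[Sequence[float]],
                               default_alpha: float = 1.0) -> float:
    total = 0.0
    for y, row in zip(y_true, y_pred):
        mu = row[0]
        alpha = row[1] if len(row) > 1 else default_alpha  # single-column fallback
        # log((1 + alpha*y) / (1 + alpha*mu)) via log1p for accuracy at small alpha.
        log_ratio = math.log1p(alpha * y) - math.log1p(alpha * mu)
        total += 2.0 * (_xlogy_ratio(y, mu) - (y + 1.0 / alpha) * log_ratio)
    return total / len(y_true)


print(poisson_deviance([0.0, 2.0, 4.0], [1.5, 2.0, 3.0]))
print(gamma_deviance([1.0, 2.0], [1.0, 2.5]))
print(negative_binomial_deviance([0.0, 3.0], [[1.5, 0.5], [3.0]]))
```

As alpha approaches 0, the NB deviance approaches the Poisson deviance. Larger alpha (more overdispersion) penalises the same miss less.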
Calibration & Uncertainty
- class deeptab.metrics.CoverageProbability(alpha=0.05)
Empirical coverage probability at a given 1 - alpha level. Expected y_pred columns: [lower_bound, upper_bound].
A well-calibrated model should have coverage close to 1 - alpha. Higher is not unconditionally better: the target is the nominal level.
- Parameters:
alpha (float) – Significance level, e.g. 0.05 for 95% prediction intervals.
- class deeptab.metrics.SharpnessScore
Mean prediction interval width (sharpness).
Narrower intervals are sharper (lower is better), but sharpness must be balanced against calibration. Expected y_pred columns: [lower, upper].
- class deeptab.metrics.ProbabilityIntegralTransform(n_bins=10, family='normal')
PIT uniformity test: returns the mean absolute deviation from uniformity.
The Probability Integral Transform (PIT) of a well-calibrated forecast should be uniform on [0, 1]. This metric computes the PIT values for a Normal predictive distribution and returns the mean absolute deviation (MAD) from the uniform CDF. Lower is better (0 = perfect calibration).
Expected y_pred columns: [loc, scale] (Normal distribution).
- Parameters:
  - n_bins (int) – Number of histogram bins for the PIT. Defaults to 10.
  - family (str) – Distribution family for CDF computation. Currently only "normal" is supported.
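Coverage and sharpness are simple averages over [lower, upper] rows. Normal PIT values are the predictive CDF evaluated at each target. The sketch below computes all three with the standard library. The last step is an assumption: it turns PIT values into one number by comparing the binned empirical CDF with the uniform CDF at the n_bins right bin edges. This may not match how ProbabilityIntegralTransform aggregates internally, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def coverage_probability(y_true: Sequence[float], bounds: Bounds) -> float:
    """Fraction of targets inside [lower, upper]; compare with the nominal 1 - alpha."""
    inside = sum(1 for y, (lower, upper) in zip(y_true, bounds) if lower <= y <= upper)
    return inside / len(y_true)


def sharpness(bounds: Bounds) -> float:
    """Mean interval width; lower is sharper."""
    return sum(upper - lower for lower, upper in bounds) / len(bounds)


def normal_pit(y_true: Sequence[float], params: Sequence[Tuple[float, float]]) -> List[float]:
    """PIT value F(y) under Normal(loc, scale) for each (loc, scale) row."""
    return [0.5 * (1.0 + math.erf((y - loc) / (scale * math.sqrt(2.0))))
            for y, (loc, scale) in zip(y_true, params)]


def pit_cdf_deviation(pit: Sequence[float], n_bins: int = 10) -> float:
    """Assumed aggregation: mean |binned empirical CDF - uniform CDF| at the right bin edges."""
    counts = [0] * n_bins
    for u in pit:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    total, cum, gaps = len(pit), 0, 0.0
    for k, c in enumerate(counts):
        cum += c
        gaps += abs(cum / total - (k + 1) / n_bins)
    return gaps / n_bins


bounds = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
print(coverage_probability([0.5, 3.0, 1.0], bounds), sharpness(bounds))
print(normal_pit([0.0, 1.0], [(0.0, 1.0), (0.0, 1.0)]))
print(pit_cdf_deviation([(i + 0.5) / 10 for i in range(10)]))
```

PIT values piled up in the middle bins indicate an over-dispersed forecast (intervals too wide). PIT values concentrated at 0 and 1 indicate an under-dispersed one.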