MambAttention

Hybrid Mamba + Attention architecture for complex feature interactions.

For detailed usage, configuration examples, and performance notes, see MambAttention.

API Reference

class deeptab.models.MambAttentionClassifier(model_config=None, preprocessing_config=None, trainer_config=None, observability_config=None, random_state=None)

MambAttention classifier. This class extends the SklearnBaseClassifier class and uses the MambAttention model with the default MambAttention configuration.

Parameters:

model_config (MambAttentionConfig, optional) – Architecture hyperparameters for the model. If None, a default MambAttentionConfig is used. See that class for the full list of available fields.
preprocessing_config (PreprocessingConfig, optional) – Feature preprocessing settings such as scaling, encoding, and numerical embeddings. If None, defaults from PreprocessingConfig are used.
trainer_config (TrainerConfig, optional) – Training-loop settings such as epochs, batch size, learning rate, and early stopping. If None, defaults from TrainerConfig are used.
observability_config (ObservabilityConfig, optional) – Optional logging, experiment tracking, and run-directory settings (deeptab.core.observability.ObservabilityConfig). If None, observability is disabled and the estimator emits nothing.
random_state (int, optional) – Seed for reproducible weight initialisation and data shuffling.

Examples

>>> from deeptab.models import MambAttentionClassifier
>>> from deeptab.configs import MambAttentionConfig
>>> model = MambAttentionClassifier(model_config=MambAttentionConfig(d_model=64, n_layers=8))
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, stratify=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, class_weight=None, loss_fct=None, balanced_sampler=False, sample_weight=None)

Builds the model using the provided training data.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
stratify (bool) – Whether to stratify the validation split on y so the split keeps the same class proportions. Set to False for a purely random split.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during validation.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.
class_weight (str | dict | list | ndarray | None) – Weights associated with classes for imbalanced data. "balanced" mirrors scikit-learn and uses n_samples / (n_classes * bincount(y)). A mapping {class_label: weight} or an array (ordered like np.unique(y)) sets weights explicitly. Ignored when loss_fct is an nn.Module.
loss_fct (nn.Module, str, or None, default=None) – Custom loss. An nn.Module is used as-is; a registered loss name (e.g. "focal", "bce", "cross_entropy") is built and combined with class_weight. None falls back to the default (weighted) task loss.
balanced_sampler (bool) – If True, draw class-balanced mini-batches with a WeightedRandomSampler (oversamples minority classes).
sample_weight (array-like, optional) – Explicit per-row sampling weights (length matches X). Takes precedence over balanced_sampler and drives the WeightedRandomSampler.
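
The "balanced" formula quoted for class_weight can be checked by hand. The following is a minimal standard-library sketch of n_samples / (n_classes * bincount(y)); the helper name balanced_class_weights is hypothetical and is not part of deeptab.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} using n_samples / (n_classes * count).

    Hypothetical helper for illustration only; not a deeptab function.
    """
    counts = Counter(y)          # per-class sample counts (the "bincount")
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in sorted(counts.items())
    }


# Six samples of class 0 and two of class 1: the rarer class gets the larger weight.
y = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y)
print(weights)
```

A perfectly balanced target yields a weight of 1.0 for every class, so "balanced" only changes the loss when the classes are skewed.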

Returns:

self – The built classifier.

Return type:

object

property config

The instantiated model config object backing this estimator.

Stored on the private _config attribute so it stays out of sklearn’s get_params/__init__ introspection (it is derived from model_config/_model_cls rather than a constructor parameter), while remaining readable and settable as estimator.config.

configure_observability(config)

Wire up logging backends described by config.

Can be called at any point — before or after fit(). Changes take effect on the next lifecycle event emitted (i.e. the next fit() or predict() call).

Parameters:: config (ObservabilityConfig) – Observability settings. Imports optional dependencies lazily; raises ImportError with install hints if they are absent.
Return type:: None

describe()

Return a structured description of the estimator and fitted model.

The method is safe to call before fitting. Parameter counts and feature metadata are included only after the model has been built.

Return type:: dict[str, Any]

encode(X, embeddings=None, batch_size=64)

Return dense embedding vectors from the model backbone.

Runs the fitted model’s encode method on batches of X and concatenates the results into a single tensor.

Parameters:

X (array-like or DataFrame of shape (n_samples, n_features)) – Input features to encode.
embeddings (array-like or None, optional) – Pre-computed external embeddings aligned with the rows of X.
batch_size (int, default=64) – Number of samples processed in each forward pass.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor of shape (n_samples, embedding_dim)

Raises:

ValueError – If the model has not been fitted yet.

Examples

>>> clf = MambAttentionClassifier()
>>> clf.fit(X_train, y_train)
>>> embeddings = clf.encode(X_test)        # (n_samples, embedding_dim)
>>> embeddings.shape
torch.Size([100, 64])

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,)) – The true class labels.
embeddings (array-like or list, optional) – Embeddings for unstructured data inputs.
metrics (dict, optional) –
A {name: callable} dictionary where each callable has the signature metric(y_true, y_pred) -> float. Each callable may be a DeepTabMetric instance or any plain callable. Metrics that need probability scores (e.g. AUROC, LogLoss) should accept the 2-D predict_proba output as y_pred; metrics that need class labels (e.g. Accuracy, F1) should accept the 1-D predict output.

For DeepTabMetric instances, the method inspects the name attribute to decide which prediction format to supply: probability-based metrics (auroc, auprc, log_loss, brier, ece) receive predict_proba output; all others receive predict output.

If None, defaults to the registry defaults for "classification" (Accuracy, AUROC, LogLoss).
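
The name-based dispatch described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: NamedMetric stands in for a DeepTabMetric-like object, pick_prediction is a hypothetical helper, and plain callables without a name attribute are assumed to receive class labels.

```python
# Metric names that the text above lists as probability-based.
PROBA_METRICS = {"auroc", "auprc", "log_loss", "brier", "ece"}


class NamedMetric:
    """Stand-in for a metric object exposing a `name` attribute (assumption)."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, y_true, y_pred):
        return self.fn(y_true, y_pred)


def pick_prediction(metric, labels, probabilities):
    """Return probabilities for probability-based metrics, labels otherwise."""
    if getattr(metric, "name", None) in PROBA_METRICS:
        return probabilities
    return labels


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def brier(y_true, y_proba):
    # Binary Brier score computed from the probability of class 1.
    return sum((p[1] - t) ** 2 for t, p in zip(y_true, y_proba)) / len(y_true)


y_true = [0, 1, 1, 0]
labels = [0, 1, 0, 0]                                   # 1-D, predict-style output
probabilities = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]  # 2-D, predict_proba-style

metrics = {
    "accuracy": NamedMetric("accuracy", accuracy),
    "brier": NamedMetric("brier", brier),
}
scores = {
    key: metric(y_true, pick_prediction(metric, labels, probabilities))
    for key, metric in metrics.items()
}
print(scores)
```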

Returns:

scores – {metric_name: score} dictionary.

Return type:

dict

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, stratify=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, class_weight=None, loss_fct=None, balanced_sampler=False, sample_weight=None, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
stratify (bool) – Whether to stratify the validation split on y so the split keeps the same class proportions. Set to False for a purely random split. When a TrainerConfig is set, its stratify value takes precedence.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
checkpoint_path (str, default="model_checkpoints") – Path where checkpoints are saved.
train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during validation.
dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.
rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.
class_weight (str | dict | list | ndarray | None) – Weights associated with classes for imbalanced data. "balanced" mirrors scikit-learn and uses n_samples / (n_classes * bincount(y)) so under-represented classes contribute more to the loss. A mapping {class_label: weight} or an array (ordered like np.unique(y)) sets weights explicitly. For binary targets the weights are converted to a pos_weight for BCEWithLogitsLoss; for multiclass they become the weight of CrossEntropyLoss. Ignored when loss_fct is an nn.Module.
loss_fct (nn.Module, str, or None, default=None) – Custom loss. An nn.Module is used as-is; a registered loss name (e.g. "focal", "bce", "cross_entropy") is built and combined with class_weight (see deeptab.training.losses.build_classification_loss()). None falls back to the default (weighted) task loss.
balanced_sampler (bool) – If True, draw class-balanced mini-batches with a WeightedRandomSampler (oversamples minority classes). This rebalances the data instead of (or in addition to) reweighting the loss.
sample_weight (array-like, optional) – Explicit per-row sampling weights (length matches X). Takes precedence over balanced_sampler; rows are drawn into batches in proportion to their weight.
**trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.
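
To illustrate what class-balanced sampling means for balanced_sampler and sample_weight, here is a small standard-library sketch. It assumes each row is weighted by the inverse frequency of its class and uses random.choices (sampling with replacement) as a stand-in for a WeightedRandomSampler; the helper name is hypothetical.

```python
import random
from collections import Counter


def balanced_sampling_weights(y):
    """Per-row weights so that every class carries the same total weight.

    Hypothetical helper; assumes inverse class frequency as the weight.
    """
    counts = Counter(y)
    return [1.0 / counts[label] for label in y]


# 90 majority rows and 10 minority rows.
y = [0] * 90 + [1] * 10
weights = balanced_sampling_weights(y)

# Draw row indices in proportion to their weight, with replacement.
rng = random.Random(0)
drawn = rng.choices(range(len(y)), weights=weights, k=1000)
share_minority = sum(y[i] == 1 for i in drawn) / len(drawn)
print(f"minority share in sampled rows: {share_minority:.2f}")
```

Although the minority class is 10% of the data, it makes up roughly half of the sampled rows, which is the oversampling effect the parameter description refers to.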

Returns:

self – The fitted classifier.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:: requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
Returns:: The total number of parameters in the model.
Return type:: int
Raises:: ValueError – If the model has not been built prior to calling this method.

get_params(deep=True): Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:: path (str) – Path to a file previously written by save().
Returns:: A fully reconstructed, ready-to-predict estimator of the same type that was saved.
Return type:: estimator

Examples

>>> loaded = MambAttentionClassifier.load("my_model.deeptab")
>>> predictions = loaded.predict(X_test)
>>> print(loaded.task_info_["task"])
classification
>>> print(loaded.n_features_in_)
6

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:

X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation labels.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

parameter_table(trainable_only=False)

Return one row per model parameter as a pandas DataFrame.

Parameters:: trainable_only (bool) – If True, include only parameters with requires_grad=True.
Return type:: DataFrame

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
Returns:: predictions – The predicted class labels.
Return type:: ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.
Returns:: probabilities – The predicted class probabilities.
Return type:: ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:

pretrain_epochs (int, default=15) – Number of epochs to run pretraining.
k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.
temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.
save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.
lr (float, default=1e-3) – Learning rate for the pretraining optimizer.
use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.
use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.
pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:

ValueError – If the model has not been built before calling this method.
ValueError – If the model does not contain an embedding layer.

Notes

This function requires that self.build_model() has been called beforehand.
The pretraining method uses self.task_model.estimator.embedding_layer.
The method invokes super()._pretrain() with regression mode enabled.
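
The roles of k_neighbors and temperature can be illustrated with a generic neighbour-based contrastive loss. This is a sketch of one common formulation (positives are the k nearest rows in feature space, similarities are cosine values divided by the temperature) and is an assumption, not a description of the loss deeptab actually uses; all function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(features, i, k):
    """Indices of the k rows closest to row i (squared Euclidean distance)."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(features[i], row)), j)
        for j, row in enumerate(features)
        if j != i
    ]
    return [j for _, j in sorted(dists)[:k]]


def neighbor_contrastive_loss(features, embeddings, i, k=1, temperature=0.1):
    """-log of the softmax mass that row i's embedding puts on its neighbours."""
    positives = set(k_nearest(features, i, k))
    logits = {
        j: cosine(embeddings[i], embeddings[j]) / temperature
        for j in range(len(embeddings))
        if j != i
    }
    denominator = sum(math.exp(v) for v in logits.values())
    numerator = sum(math.exp(logits[j]) for j in positives)
    return -math.log(numerator / denominator)


# Two clusters in feature space: rows 0-1 and rows 2-3.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]    # embeddings follow the clusters
scrambled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # embeddings ignore them

loss_aligned = neighbor_contrastive_loss(features, aligned, i=0)
loss_scrambled = neighbor_contrastive_loss(features, scrambled, i=0)
print(loss_aligned < loss_scrambled)
```

In this formulation a lower temperature sharpens the softmax, so embeddings are penalised more strongly when a row's feature-space neighbours are not also its closest embeddings.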

profile(X, y, dry_run=True, n_forward_passes=3, batch_size=None, random_state=0)

Build the model on a small data sample and run a dry forward pass.

Combines describe(), runtime_info(), and a timed forward pass to give a complete pre-training picture without running any gradient updates.

Parameters:

X (DataFrame or array-like) – Feature matrix. The first min(256, len(X)) rows are used for the dry-run build.
y (array-like) – Target vector aligned with X.
dry_run (bool) – When True the temporary model is discarded after profiling so the estimator’s state is left unchanged (unless the model was already built, in which case the existing model is used directly).
n_forward_passes (int) – Number of forward passes used to estimate per-batch runtime. The median is reported to reduce noise.
batch_size (int | None) – Override the batch size used for timing. Defaults to the value in trainer_config or 64.
random_state (int) – Seed passed to the dry-run build for reproducibility.

Returns:

Keys:

builds: True if the model constructed without error.
error: Exception message when builds is False, else None.
device: Device string (e.g. "cpu", "mps:0", "cuda:0").
dtype: Parameter dtype string (e.g. "float32").
total_params: Total number of model parameters.
trainable_params: Number of trainable parameters.
memory_mb: Estimated parameter memory in megabytes.
batch_shape: Shape of the first dummy batch drawn from the data module.
output_shape: Shape of the model output for that dummy batch (None on error).
loss_fct: Class name of the loss function.
forward_ms_median: Median forward-pass wall time in milliseconds (None on error).
forward_ms_min: Minimum forward-pass wall time in milliseconds (None on error).
describe: Full describe() dict (populated after build).
runtime: Full runtime_info() dict (populated after build).

Return type:

dict[str, Any]
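
The timing and memory keys above can be reproduced in spirit with the standard library. The sketch below is an assumption about how such numbers might be derived, not the library's code: it times a dummy callable in place of a model forward pass, and it assumes 4 bytes per float32 parameter and 1 MB = 1024 * 1024 bytes.

```python
import statistics
import time


def time_forward(forward, n_forward_passes=3):
    """Time repeated calls and report median and minimum wall time in ms."""
    timings_ms = []
    for _ in range(n_forward_passes):
        start = time.perf_counter()
        forward()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "forward_ms_median": statistics.median(timings_ms),
        "forward_ms_min": min(timings_ms),
    }


def parameter_memory_mb(total_params, bytes_per_param=4):
    """Estimated parameter memory; 4 bytes per parameter assumes float32."""
    return total_params * bytes_per_param / (1024 ** 2)


def dummy_forward():
    # Placeholder workload standing in for a model forward pass.
    return sum(i * i for i in range(10_000))


report = time_forward(dummy_forward, n_forward_passes=3)
report["memory_mb"] = parameter_memory_mb(262_144)
print(report)
```

Reporting the median rather than the mean keeps a single slow pass, such as a first call that triggers lazy initialisation, from dominating the estimate.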

runtime_info()

Return runtime setup information for the estimator.

The method is safe to call before fitting. Device and dtype are inferred from model parameters when a model has been built.

Return type:: dict[str, Any]

save(path=None)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: architecture/config, neural-network weights, fitted preprocessing state, feature schema, column order, task metadata, classifier classes (when available), and package versions for debugging reloads across environments.

Parameters:: path (str | None) – Destination file path (e.g. "model.pt"). When None and a run directory is active (i.e. configure_observability was called with a config that creates a run dir), the model is saved to <run_dir>/artifacts/model.deeptab automatically. When no run dir is active either, raises ValueError.
Returns:: The resolved path the bundle was written to.
Return type:: str
Raises:: ValueError – If the model has not been fitted yet, or path is None and no run directory is active.
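
The path-resolution rule described for save() can be summarised as a small decision function. This is a sketch of the documented behaviour only; resolve_save_path and its run_dir argument are hypothetical names, and the real method also checks that the model is fitted.

```python
from pathlib import Path


def resolve_save_path(path=None, run_dir=None):
    """Mirror the documented rule for choosing where the bundle is written.

    Hypothetical helper: an explicit path wins, then the active run
    directory, otherwise a ValueError is raised.
    """
    if path is not None:
        return str(path)
    if run_dir is not None:
        return str(Path(run_dir) / "artifacts" / "model.deeptab")
    raise ValueError("path is None and no run directory is active")


print(resolve_save_path("my_model.deeptab"))
print(resolve_save_path(None, run_dir="runs/exp1"))
```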

Examples

>>> model = MambAttentionClassifier()
>>> model.fit(X_train, y_train)
>>> saved_path = model.save("my_model.deeptab")
>>> loaded = MambAttentionClassifier.load(saved_path)
>>> predictions = loaded.predict(X_test)

score(X, y, embeddings=None, metric=None)

Calculate the score of the model using the specified metric.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.
metric (tuple or callable, optional) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False). If omitted, accuracy is used to match scikit-learn classifier behavior.

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters): Set the parameters of this estimator.

summary()

Return a compact human-readable model summary.

Return type:: str

property task_model

The fitted Lightning task model, or None before fitting.

This exposes the underlying TaskModel (which holds the architecture via task_model.estimator and the loss via task_model.loss_fct) as a stable, public read-only attribute.

class deeptab.models.MambAttentionRegressor(model_config=None, preprocessing_config=None, trainer_config=None, observability_config=None, random_state=None)

MambAttention regressor. This class extends the SklearnBaseRegressor class and uses the MambAttention model with the default MambAttention configuration.

Parameters:

model_config (MambAttentionConfig, optional) – Architecture hyperparameters for the model. If None, a default MambAttentionConfig is used. See that class for the full list of available fields.
preprocessing_config (PreprocessingConfig, optional) – Feature preprocessing settings such as scaling, encoding, and numerical embeddings. If None, defaults from PreprocessingConfig are used.
trainer_config (TrainerConfig, optional) – Training-loop settings such as epochs, batch size, learning rate, and early stopping. If None, defaults from TrainerConfig are used.
observability_config (ObservabilityConfig, optional) – Optional logging, experiment tracking, and run-directory settings (deeptab.core.observability.ObservabilityConfig). If None, observability is disabled and the estimator emits nothing.
random_state (int, optional) – Seed for reproducible weight initialisation and data shuffling.

Examples

>>> from deeptab.models import MambAttentionRegressor
>>> from deeptab.configs import MambAttentionConfig
>>> model = MambAttentionRegressor(model_config=MambAttentionConfig(d_model=64, n_layers=8))
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during validation.
dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.
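
The interplay of lr, lr_patience, and lr_factor can be pictured with a simplified reduce-on-plateau rule. The sketch below assumes the learning rate is multiplied by lr_factor once the validation loss has failed to improve for more than lr_patience consecutive epochs; the function is hypothetical and the scheduler deeptab actually uses may differ in detail.

```python
def reduce_on_plateau(val_losses, lr, lr_patience, lr_factor):
    """Return the learning rate in effect after each epoch (simplified)."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:
                lr *= lr_factor   # shrink the step size
                bad_epochs = 0    # start counting again
        history.append(lr)
    return history


# Epochs 3, 4 and 5 fail to beat the best loss of 0.8, so the rate drops at epoch 5.
val_losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
history = reduce_on_plateau(val_losses, lr=0.01, lr_patience=2, lr_factor=0.1)
print(history)
```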

Returns:

self – The built regressor.

Return type:

object

property config

The instantiated model config object backing this estimator.

Stored on the private _config attribute so it stays out of sklearn’s get_params/__init__ introspection (it is derived from model_config/_model_cls rather than a constructor parameter), while remaining readable and settable as estimator.config.

configure_observability(config)

Wire up logging backends described by config.

Can be called at any point — before or after fit(). Changes take effect on the next lifecycle event emitted (i.e. the next fit() or predict() call).

Parameters:: config (ObservabilityConfig) – Observability settings. Imports optional dependencies lazily; raises ImportError with install hints if they are absent.
Return type:: None

describe()

Return a structured description of the estimator and fitted model.

The method is safe to call before fitting. Parameter counts and feature metadata are included only after the model has been built.

Return type:: dict[str, Any]

encode(X, embeddings=None, batch_size=64)

Return dense embedding vectors from the model backbone.

Runs the fitted model’s encode method on batches of X and concatenates the results into a single tensor.

Parameters:

X (array-like or DataFrame of shape (n_samples, n_features)) – Input features to encode.
embeddings (array-like or None, optional) – Pre-computed external embeddings aligned with the rows of X.
batch_size (int, default=64) – Number of samples processed in each forward pass.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor of shape (n_samples, embedding_dim)

Raises:

ValueError – If the model has not been fitted yet.

Examples

>>> reg = MambAttentionRegressor()
>>> reg.fit(X_train, y_train)
>>> embeddings = reg.encode(X_test)        # (n_samples, embedding_dim)
>>> embeddings.shape
torch.Size([100, 64])

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:: scores – A dictionary with metric names as keys and their corresponding scores as values.
Return type:: dict
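
A metrics dictionary for regression maps names to callables that take (y_true, y_pred). The sketch below shows the shape of such a dictionary and of the returned scores using hand-written metrics; evaluate_predictions is a hypothetical stand-in for the step that follows predict, not the library's code.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_predictions(y_true, y_pred, metrics):
    """Apply every metric callable to the same predictions."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


metrics = {
    "mse": mean_squared_error,
    "mae": mean_absolute_error,
    "rmse": lambda t, p: math.sqrt(mean_squared_error(t, p)),
}

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]   # stands in for the output of predict(X)
scores = evaluate_predictions(y_true, y_pred, metrics)
print(scores)
```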

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
checkpoint_path (str, default="model_checkpoints") – Path where checkpoints are saved.
dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.
train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables, logged during validation.
rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.
**trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.
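
How patience, monitor, and mode combine can be shown with a simplified early-stopping rule. The sketch assumes training stops once the monitored value has gone patience consecutive epochs without a strict improvement; the function name is hypothetical and the actual callback may apply additional thresholds.

```python
def early_stopping_epoch(values, patience, mode="min"):
    """Return the 1-based epoch at which training would stop, or None."""
    best = None
    bad_epochs = 0
    for epoch, value in enumerate(values, start=1):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best
        if improved:
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# monitor='val_loss', mode='min': epochs 3, 4 and 5 do not beat 0.70.
val_loss = [0.90, 0.70, 0.72, 0.71, 0.73, 0.60]
stop_min = early_stopping_epoch(val_loss, patience=3, mode="min")

# A score that should grow uses mode='max'.
val_score = [0.50, 0.60, 0.60, 0.59]
stop_max = early_stopping_epoch(val_score, patience=2, mode="max")
print(stop_min, stop_max)
```

In the first series the run stops before reaching the better loss at epoch 6, which is the usual trade-off when choosing a small patience.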

Returns:

self – The fitted regressor.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:: requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
Returns:: The total number of parameters in the model.
Return type:: int
Raises:: ValueError – If the model has not been built prior to calling this method.

get_params(deep=True): Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:: path (str) – Path to a file previously written by save().
Returns:: A fully reconstructed, ready-to-predict estimator of the same type that was saved.
Return type:: estimator

Examples

>>> loaded = MambAttentionRegressor.load("my_model.deeptab")
>>> predictions = loaded.predict(X_test)
>>> print(loaded.task_info_["task"])
regression
>>> print(loaded.n_features_in_)
6

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:

X (array-like) – Training data.
y (array-like) – Training targets.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation targets.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

parameter_table(trainable_only=False)

Return one row per model parameter as a pandas DataFrame.

Parameters:: trainable_only (bool) – If True, include only parameters with requires_grad=True.
Return type:: DataFrame

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
Returns:: predictions – The predicted target values.
Return type:: ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:

pretrain_epochs (int, default=15) – Number of epochs to run pretraining.
k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.
temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.
save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.
lr (float, default=1e-3) – Learning rate for the pretraining optimizer.
use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.
use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.
pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:

ValueError – If the model has not been built before calling this method.
ValueError – If the model does not contain an embedding layer.

Notes

This function requires that self.build_model() has been called beforehand.
The pretraining method uses self.task_model.estimator.embedding_layer.
The method invokes super()._pretrain() with regression mode enabled.
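
To make the roles of k_neighbors and temperature concrete, the following pure-Python sketch computes a generic neighborhood-based contrastive loss. It is an assumption-laden illustration, not the loss that pretrain() implements: positives are taken to be the k nearest rows in feature space, similarity is cosine similarity, and the use_positive, use_negative, and pool_sequence switches are not modelled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_indices(features, i, k):
    """Indices of the k rows closest to row i in Euclidean distance."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: math.dist(features[i], features[j]))
    return others[:k]


def neighborhood_contrastive_loss(features, embeddings,
                                  k_neighbors=1, temperature=0.1):
    """InfoNCE-style loss: feature-space neighbours act as positives."""
    n = len(features)
    total = 0.0
    for i in range(n):
        positives = knn_indices(features, i, k_neighbors)
        # Temperature-scaled similarities of the anchor to every other row.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        pos = sum(math.exp(logits[j]) for j in positives)
        denom = sum(math.exp(v) for v in logits.values())
        total += -math.log(pos / denom)
    return total / n


features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
loss_aligned = neighborhood_contrastive_loss(features, aligned)
loss_misaligned = neighborhood_contrastive_loss(features, misaligned)
```

In this toy setup the loss is lower when rows that are close in feature space also have similar embeddings. A smaller temperature sharpens the contrast between neighbours and non-neighbours.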

profile(X, y, dry_run=True, n_forward_passes=3, batch_size=None, random_state=0)

Build the model on a small data sample and run a dry forward pass.

Combines describe(), runtime_info(), and a timed forward pass to give a complete pre-training picture without running any gradient updates.

Parameters:

X (DataFrame or array-like) – Feature matrix. The first min(256, len(X)) rows are used for the dry-run build.
y (array-like) – Target vector aligned with X.
dry_run (bool) – When True the temporary model is discarded after profiling so the estimator’s state is left unchanged (unless the model was already built, in which case the existing model is used directly).
n_forward_passes (int) – Number of forward passes used to estimate per-batch runtime. The median is reported to reduce noise.
batch_size (int | None) – Override the batch size used for timing. Defaults to the value in trainer_config or 64.
random_state (int) – Seed passed to the dry-run build for reproducibility.

Returns:

A dictionary with the following keys:

builds: True if the model constructed without error.
error: Exception message when builds is False, else None.
device: Device string (e.g. "cpu", "mps:0", "cuda:0").
dtype: Parameter dtype string (e.g. "float32").
total_params: Total number of model parameters.
trainable_params: Number of trainable parameters.
memory_mb: Estimated parameter memory in megabytes.
batch_shape: Shape of the first dummy batch drawn from the data module.
output_shape: Shape of the model output for that dummy batch (None on error).
loss_fct: Class name of the loss function.
forward_ms_median: Median forward-pass wall time in milliseconds (None on error).
forward_ms_min: Minimum forward-pass wall time in milliseconds (None on error).
describe: Full describe() dict (populated after build).
runtime: Full runtime_info() dict (populated after build).

Return type:

dict[str, Any]
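
The timing and memory entries of the report can be approximated with the standard library alone. The sketch below is not the code behind profile(): the helper names are hypothetical, the forward pass is replaced by an arbitrary callable, and the memory estimate assumes 4 bytes per parameter (float32) and 1024² bytes per megabyte.

```python
import statistics
import time


def time_forward(forward, n_forward_passes=3):
    """Call `forward` repeatedly and summarise wall time in milliseconds."""
    timings_ms = []
    for _ in range(n_forward_passes):
        start = time.perf_counter()
        forward()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "forward_ms_median": statistics.median(timings_ms),
        "forward_ms_min": min(timings_ms),
    }


def parameter_memory_mb(total_params, bytes_per_param=4):
    """Estimate parameter memory; 4 bytes per value corresponds to float32."""
    return total_params * bytes_per_param / (1024 ** 2)


# A stand-in workload takes the place of a real model forward pass.
report = time_forward(lambda: sum(i * i for i in range(10_000)))
report["memory_mb"] = parameter_memory_mb(total_params=1_000_000)
```

Reporting the median next to the minimum is useful because the first pass often includes one-off costs such as lazy initialisation, which would distort a mean.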

runtime_info()

Return runtime setup information for the estimator.

The method is safe to call before fitting. Device and dtype are inferred from model parameters when a model has been built.

Return type:: dict[str, Any]

save(path=None)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: architecture/config, neural-network weights, fitted preprocessing state, feature schema, column order, task metadata, classifier classes (when available), and package versions for debugging reloads across environments.

Parameters:: path (str | None) – Destination file path (e.g. "model.pt"). When None and a run directory is active (i.e. configure_observability was called with a config that creates a run dir), the model is saved to <run_dir>/artifacts/model.deeptab automatically. When no run dir is active either, raises ValueError.
Returns:: The resolved path the bundle was written to.
Return type:: str
Raises:: ValueError – If the model has not been fitted yet, or path is None and no run directory is active.

Examples

>>> model = MLPClassifier()
>>> model.fit(X_train, y_train)
>>> saved_path = model.save("my_model.deeptab")
>>> loaded = MLPClassifier.load(saved_path)
>>> predictions = loaded.predict(X_test)
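
The path-resolution rule described for the path parameter can be summarised in a few lines. This is a sketch of the documented behaviour only: resolve_save_path and its run_dir argument are hypothetical names, and the real method also checks that the model has been fitted.

```python
from pathlib import Path


def resolve_save_path(path=None, run_dir=None):
    """Mirror the documented fallback order for the destination path."""
    if path is not None:
        return str(Path(path))  # an explicit path always wins
    if run_dir is not None:
        # Default location inside an active run directory.
        return str(Path(run_dir) / "artifacts" / "model.deeptab")
    raise ValueError("path is None and no run directory is active")


explicit = resolve_save_path("my_model.deeptab")
fallback = resolve_save_path(run_dir="runs/exp1")
```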

score(X, y, embeddings=None, metric=<function r2_score>)

Calculate the score of the model using the specified metric.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metric (callable, default=r2_score) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred). Defaults to r2_score to match scikit-learn’s RegressorMixin convention (higher is better).

Returns:

score – The score calculated using the specified metric.

Return type:

float

Examples

>>> from sklearn.metrics import mean_squared_error, mean_absolute_error
>>> model.score(X_test, y_test)                             # R² (default)
>>> model.score(X_test, y_test, metric=mean_squared_error)  # MSE
>>> model.score(X_test, y_test, metric=mean_absolute_error) # MAE
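
Any callable with the signature metric(y_true, y_pred) fits the metric parameter. The sketch below defines two such callables in plain Python for one-dimensional targets. The function names are illustrative, and the R² helper only restates the usual definition rather than reproducing scikit-learn's implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error; lower is better."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination; higher is better."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
perfect = r2(y_true, [1.0, 2.0, 3.0])
baseline = r2(y_true, [2.0, 2.0, 2.0])  # always predicting the mean
error = rmse(y_true, [1.0, 2.0, 5.0])
```

A function such as rmse could then be passed as metric=rmse. Note that the direction of improvement depends on the metric, so scores from different metrics should not be compared with a single "higher is better" rule.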

set_params(**parameters): Set the parameters of this estimator.

summary()

Return a compact human-readable model summary.

Return type:: str

property task_model

The fitted Lightning task model, or None before fitting.

This exposes the underlying TaskModel (which holds the architecture via task_model.estimator and the loss via task_model.loss_fct) as a stable, public read-only attribute.

class deeptab.models.MambAttentionLSS(model_config=None, preprocessing_config=None, trainer_config=None, observability_config=None, random_state=None)[source]

MambAttention LSS for distributional regression. This class extends the SklearnBaseLSS class and uses the MambAttention model with the default MambAttention configuration.

Parameters:

model_config (MambAttentionConfig, optional) – Architecture hyperparameters for the model. If None, a default MambAttentionConfig is used. See that class for the full list of available fields.
preprocessing_config (PreprocessingConfig, optional) – Feature preprocessing settings such as scaling, encoding, and numerical embeddings. If None, defaults from PreprocessingConfig are used.
trainer_config (TrainerConfig, optional) – Training-loop settings such as epochs, batch size, learning rate, and early stopping. If None, defaults from TrainerConfig are used.
observability_config (ObservabilityConfig | None) – Optional logging, experiment tracking, and run-directory settings (deeptab.core.observability.ObservabilityConfig). If None, observability is disabled and the estimator emits nothing.
random_state (int, optional) – Seed for reproducible weight initialisation and data shuffling.

Examples

>>> from deeptab.models import MambAttentionLSS
>>> from deeptab.configs import MambAttentionConfig
>>> model = MambAttentionLSS(model_config=MambAttentionConfig(d_model=64, n_layers=8))
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
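train_metrics (dict[str, Callable] | None) – Dictionary of metric callables (for example torchmetrics metrics), keyed by name, to be logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary of metric callables (for example torchmetrics metrics), keyed by name, to be logged during validation.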
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built distributional regressor.

Return type:

object
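
The interaction of lr, lr_patience, and lr_factor can be illustrated with a small reduce-on-plateau simulation. It assumes a schedule in the style of PyTorch's ReduceLROnPlateau, where the rate is multiplied by the factor once the number of non-improving epochs exceeds the patience. Whether deeptab wires the scheduler exactly this way is an assumption, and the function name is hypothetical.

```python
def reduce_on_plateau(val_losses, lr=1e-3, lr_patience=2, lr_factor=0.1):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale > lr_patience:
                lr *= lr_factor  # shrink the rate and start counting again
                stale = 0
        history.append(lr)
    return history


# Three stagnant epochs in a row (more than lr_patience=2) trigger one cut.
schedule = reduce_on_plateau([1.0, 0.9, 0.95, 0.95, 0.95, 0.8])
```

With the inputs above, the rate stays at 1e-3 for four epochs and drops to roughly 1e-4 from the fifth epoch onward.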

property config

The instantiated model config object backing this estimator.

Stored on the private _config attribute so it stays out of sklearn’s get_params/__init__ introspection (it is derived from model_config/_model_cls rather than a constructor parameter), while remaining readable and settable as estimator.config.

configure_observability(config)

Wire up logging backends described by config.

Can be called at any point — before or after fit(). Changes take effect on the next lifecycle event emitted (i.e. the next fit() or predict() call).

Parameters:: config (ObservabilityConfig) – Observability settings. Imports optional dependencies lazily; raises ImportError with install hints if they are absent.
Return type:: None

describe()

Return a structured description of the estimator and fitted model.

The method is safe to call before fitting. Parameter counts and feature metadata are included only after the model has been built.

Return type:: dict[str, Any]

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:

X (array-like or DataFrame) – Input data to be encoded.
batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,)) – The true target values.
metrics (dict, optional) – A {name: callable} dictionary of metric functions with signature metric(y_true, y_pred) -> float. Each callable may be a DeepTabMetric instance or any plain callable. When a metric has needs_raw=True, raw model logits are passed instead of transformed distribution parameters. If None, the default metrics for the distribution family are used (see deeptab.metrics.get_default_metrics()).
distribution_family (str, optional) – Distribution family key (e.g. "normal", "gamma"). Inferred from the fitted model when None.

Returns:

scores – {metric_name: score} dictionary.

Return type:

dict
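
The {name: callable} contract for metrics can be sketched without the library. The example assumes, purely for illustration, that the predictions for a normal family arrive as (mean, standard deviation) pairs per sample. The actual layout of y_pred in deeptab may differ, and the helper names are hypothetical.

```python
def mean_abs_error_of_mean(y_true, y_pred):
    """Absolute error between targets and the predicted means."""
    return sum(abs(t - p[0]) for t, p in zip(y_true, y_pred)) / len(y_true)


def coverage_95(y_true, y_pred):
    """Share of targets inside mean +/- 1.96 standard deviations."""
    hits = sum(abs(t - mu) <= 1.96 * sigma
               for t, (mu, sigma) in zip(y_true, y_pred))
    return hits / len(y_true)


def apply_metrics(y_true, y_pred, metrics):
    """Evaluate every metric and return a {metric_name: score} dict."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


y_true = [1.0, 2.0, 10.0]
y_pred = [(1.0, 1.0), (2.5, 1.0), (3.0, 1.0)]  # assumed (mean, std) rows
scores = apply_metrics(
    y_true, y_pred,
    {"mae_mean": mean_abs_error_of_mean, "coverage_95": coverage_95},
)
```

Metrics such as interval coverage use the predicted spread as well as the location, which is the main reason to evaluate a distributional model with more than a point-error metric.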

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the distributional regression model using the provided training data. A separate validation set can optionally be supplied through X_val and y_val.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
family (str) – The name of the distribution family whose likelihood defines the loss function, for example "normal" or "gamma".
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
distributional_kwargs (dict, default=None) – Additional arguments that are specific to the chosen distribution family.
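train_metrics (dict[str, Callable] | None) – Dictionary of metric callables (for example torchmetrics metrics), keyed by name, to be logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary of metric callables (for example torchmetrics metrics), keyed by name, to be logged during validation.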
checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.
dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.
**trainer_kwargs (dict) – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
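
The patience, monitor, and mode parameters describe a standard early-stopping rule. The sketch below simulates that rule on a list of monitored values, assuming the convention used by PyTorch Lightning's EarlyStopping callback: training stops once the count of consecutive non-improving epochs reaches patience. The function name is hypothetical and the code is independent of deeptab.

```python
def early_stopping_epoch(values, patience=15, mode="min"):
    """Return the 1-based epoch at which training would stop, or None."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    stale = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(values, start=1):
        if mode == "min":
            improved = best is None or value < best
        else:
            improved = best is None or value > best
        if improved:
            best = value
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None  # the run finished without triggering early stopping


# val_loss stops improving after epoch 2; with patience=3 training ends at epoch 5.
stop_at = early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.81], patience=3)
```

When monitoring a quantity that should grow, such as an accuracy-like metric, mode="max" flips the comparison. max_epochs remains the upper bound in either case.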

get_default_metrics(distribution_family)

Return default evaluation metrics for the given distribution family.

Delegates to deeptab.metrics.get_default_metrics_dict(), which returns a {name: DeepTabMetric} dictionary covering all supported distribution families.

Parameters:: distribution_family (str) – Distribution family key, e.g. "normal", "gamma".
Returns:: {metric_name: callable} dictionary of metric functions.
Return type:: dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:: requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
Returns:: The total number of parameters in the model.
Return type:: int
Raises:: ValueError – If the model has not been built prior to calling this method.

get_params(deep=True): Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:: path (str) – Path to a file previously written by save().
Returns:: A fully reconstructed, ready-to-predict estimator. Exposes artifact_metadata_, architecture_metadata_, feature_schema_, input_columns_, task_info_, classes_, and versions_ attributes after loading.
Return type:: estimator

Examples

>>> loaded = MambAttentionLSS.load("my_lss_model.deeptab")
>>> predictions = loaded.predict(X_test)
>>> print(loaded.task_info_["family"])
'normal'

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:

X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation labels.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

parameter_table(trainable_only=False)

Return one row per model parameter as a pandas DataFrame.

Parameters:: trainable_only (bool) – If True, include only parameters with requires_grad=True.
Return type:: DataFrame

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
Returns:: predictions – The predicted target values.
Return type:: ndarray, shape (n_samples,) or (n_samples, n_outputs)

profile(X, y, dry_run=True, n_forward_passes=3, batch_size=None, random_state=0)

Build the model on a small data sample and run a dry forward pass.

Combines describe(), runtime_info(), and a timed forward pass to give a complete pre-training picture without running any gradient updates.

Parameters:

X (DataFrame or array-like) – Feature matrix. The first min(256, len(X)) rows are used for the dry-run build.
y (array-like) – Target vector aligned with X.
dry_run (bool) – When True the temporary model is discarded after profiling so the estimator’s state is left unchanged (unless the model was already built, in which case the existing model is used directly).
n_forward_passes (int) – Number of forward passes used to estimate per-batch runtime. The median is reported to reduce noise.
batch_size (int | None) – Override the batch size used for timing. Defaults to the value in trainer_config or 64.
random_state (int) – Seed passed to the dry-run build for reproducibility.

Returns:

A dictionary with the following keys:

builds: True if the model constructed without error.
error: Exception message when builds is False, else None.
device: Device string (e.g. "cpu", "mps:0", "cuda:0").
dtype: Parameter dtype string (e.g. "float32").
total_params: Total number of model parameters.
trainable_params: Number of trainable parameters.
memory_mb: Estimated parameter memory in megabytes.
batch_shape: Shape of the first dummy batch drawn from the data module.
output_shape: Shape of the model output for that dummy batch (None on error).
loss_fct: Class name of the loss function.
forward_ms_median: Median forward-pass wall time in milliseconds (None on error).
forward_ms_min: Minimum forward-pass wall time in milliseconds (None on error).
describe: Full describe() dict (populated after build).
runtime: Full runtime_info() dict (populated after build).

Return type:

dict[str, Any]

runtime_info()

Return runtime setup information for the estimator.

The method is safe to call before fitting. Device and dtype are inferred from model parameters when a model has been built.

Return type:: dict[str, Any]

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the architecture/config, neural-network weights, fitted preprocessing state, feature schema and column order, task metadata, distribution family, classifier classes for categorical LSS models, and package versions for debugging reloads across environments.

The bundle is built by build_save_bundle(), which is the single source of truth for artifact structure across all model variants.

Parameters:: path (str) – Destination file path (e.g. "model.pt").
Raises:: ValueError – If the model has not been fitted yet.
Return type:: None

Examples

>>> model = MambAttentionLSS()
>>> model.fit(X_train, y_train, family="normal")
>>> model.save("my_lss_model.deeptab")
>>> loaded = MambAttentionLSS.load("my_lss_model.deeptab")
>>> predictions = loaded.predict(X_test)

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metric (str, default="NLL") – The metric to compute. Currently only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
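
For intuition about the NLL score, the negative log-likelihood of a normal family can be written out directly. The sketch assumes the model yields a mean and a standard deviation per sample and that the score is averaged over samples. Both points are assumptions about deeptab, and the function name is illustrative.

```python
import math


def gaussian_nll(y_true, means, stds):
    """Average negative log-likelihood under independent normal distributions."""
    total = 0.0
    for y, mu, sigma in zip(y_true, means, stds):
        total += (0.5 * math.log(2.0 * math.pi * sigma ** 2)
                  + (y - mu) ** 2 / (2.0 * sigma ** 2))
    return total / len(y_true)


# A perfect mean with unit variance gives 0.5 * log(2 * pi).
nll_exact = gaussian_nll([0.0], [0.0], [1.0])
# Missing the target by two standard deviations adds 2.0 to the score.
nll_off = gaussian_nll([2.0], [0.0], [1.0])
```

Lower values are better. Unlike a point-error metric, the score also penalises a predicted spread that is too narrow or too wide for the observed errors.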

set_params(**parameters): Set the parameters of this estimator.

summary()

Return a compact human-readable model summary.

Return type:: str

property task_model

The fitted Lightning task model, or None before fitting.

This exposes the underlying TaskModel (which holds the architecture via task_model.estimator and the loss via task_model.loss_fct) as a stable, public read-only attribute.